The Power of Good Questions

At a recent Big Data Cloud meetup, Kaushik Das of Greenplum spoke about Big Data as an enabler of 21st-century storytelling. One of the comments concerned the Chomsky vs. Norvig argument about the patterns that can be found in random data. Correlation is not causation, a distinction that most of the press still does not grasp. Chomsky, the world's premier linguist, argues that we need to understand cause and effect, not just the correlations identified by statistical approaches such as those championed by Google's Peter Norvig.
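To make the point about patterns in random data concrete, here is a minimal Python sketch (not from the talk) of Yule's classic "nonsense correlation": pairs of independently generated random walks frequently show a strong Pearson correlation even though neither causes the other, while pairs of independent white-noise series almost never do. The helper names, the series length of 500, the 0.5 threshold, and the fixed seed are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def random_walk(n, rng):
    """Running sum of independent +1/-1 steps (a trending, non-stationary series)."""
    total, path = 0, []
    for _ in range(n):
        total += rng.choice((-1, 1))
        path.append(total)
    return path


def white_noise(n, rng):
    """Independent standard-normal draws (no trend, no memory)."""
    return [rng.gauss(0, 1) for _ in range(n)]


def strong_correlation_rate(make_series, trials=200, n=500, threshold=0.5, seed=42):
    """Fraction of pairs of independently generated series whose |r| exceeds threshold.

    Every pair is unrelated by construction, so any strong correlation is spurious.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = pearson(make_series(n, rng), make_series(n, rng))
        if abs(r) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("random walks:", strong_correlation_rate(random_walk))
    print("white noise: ", strong_correlation_rate(white_noise))
```

Running it typically shows a sizable share of random-walk pairs crossing the threshold and essentially none of the white-noise pairs: the "pattern" comes from the trending structure of the data, not from any causal link between the series.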

The debate brings into focus the difference between analytics that yields insight and analytics that merely searches for patterns. The difference lies in the quality of the questions asked. Because analytics is, in essence, pattern matching, I think one can derive more insight by asking multiple questions, and I believe this is how most practitioners approach Big Data analytics. The tools of analytics are powerful and reach further than was previously possible. We are no longer limited to averages and samples; a complete analysis of the data is now possible. This matters because behavior at the fringes can become non-linear, with spectacular impact. The financial crisis and hurricane prediction are two examples where data that behaves predictably in the middle of a normal distribution can become wild at the extremes.

Real insight becomes storytelling. I mean this in the highest sense. Storytelling that helps us understand systems, whether natural or driven by human behavior, can be tremendously powerful. At its best, it becomes predictive. At its worst, we are waiting for 100 monkeys at typewriters to produce Shakespeare. When in doubt, ask more questions. Do more analytics, and gain more perspective on the story.