Are social media companies like Facebook and LinkedIn underutilizing the data they’ve accumulated? I think marketers at all media companies, and indeed at all companies, want to know more about their customers and prospective customers. Data warehousing was an attempt to understand customers better, but it’s not adequate for today’s challenge of massive amounts of data. A different approach is needed, because even with the rapid growth of compute and storage power, there is still a chasm between what is done today and what could be done.
Smartphones and tablets do more than provide internet downloads and interesting QR links. They are the delivery mechanism for an explosion of tweets, blogs, and other unstructured data that would be useful in aggregate, if only it could be captured and analyzed. Because these datasets are so large and new data keeps arriving at an increasing rate, traditional relational databases have difficulty processing and indexing them.
The solution is still unfolding, but parallel computing structures appear to hold the best hope of attacking this mountain of data. Hadoop, an open source framework that was spun out of Yahoo!, is one tool for the job: it distributes the workload across many machines to try to turn data into information. Hadoop has spawned several startups looking to commercialize solutions to Big Data, and they are joined by several large companies like IBM and EMC who already have horses in this race.
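To make "distributing the workload" a little more concrete, here is a minimal sketch of the map, shuffle, and reduce pattern that Hadoop's MapReduce model is built around. It is plain Python running on one machine, not Hadoop's API: the function names, the sample posts, and the use of threads as stand-ins for cluster nodes are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of posts."""
    return [(word.lower(), 1) for post in chunk for word in post.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group the emitted counts by word across all chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(posts, workers=2):
    # Split the posts into one chunk per worker. On a real cluster each
    # chunk would live on a different machine; threads are a stand-in here.
    chunks = [posts[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


# Hypothetical sample data standing in for a stream of tweets.
posts = ["big data is big", "data beats opinion", "Big claims need data"]
counts = word_count(posts)
print(counts["big"], counts["data"])  # prints "3 3"
```

The point of the pattern is that the map step only ever looks at its own chunk, so adding more machines lets you chew through more posts at once, while the reduce step combines the partial results into a single answer.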