Hadoop Summit 2012 Summary

The Hadoop Summit, held June 13 and 14 of 2012, was attended by over 2,000 Big Data geeks according to the organizers.  There were over 100 sessions and keynotes, and the speakers made several particularly interesting comments.


Real-time systems, resource management, and hardware are the next Hadoop frontiers.  – Scott Burke, Yahoo!


We expect to see over 50% of the world’s data on Hadoop by 2015.  – Shaun Connolly, Hortonworks


Data will grow from 1.2ZB in 2010 to 35.2ZB in 2020.  – IDC Digital Universe Study, quoted at the summit


Yahoo! has a Big Data configuration of over 40,000 nodes!  It uses proprietary management tools to handle this monster, and it serves 3,000 users inside Yahoo!


For more normal configurations, large was considered to be 3,000–6,000 nodes in operation.


I was surprised at how many people were muttering about how to displace Oracle installations with Hadoop.  The Open Source movement is something like a religion to some people, and with Hadoop and its attendant components the community has its best shot yet at a real, application-based beachhead.


There are several tools that try to paste together a comprehensive system, and some providers like Hortonworks and Cloudera have put together stable Hadoop platforms with a set of tools to make them usable.  The established players are getting in on the action too.


IBM had a significant presence with their BigInsights version of Hadoop.  VMware was very active on the demo floor and in presentations discussing how their approach will bring some additional discipline and tools to the party.


One of the most fun parts was hearing from companies and organizations that have implemented Hadoop and are using it for real results today.  Because Hadoop is often used to boost one’s competitive position, in many cases the details were missing, but the stories were no less compelling.  @WalmartLabs presented, and was also recruiting.  Indeed, almost every presentation mentioned that the company was hiring.


It’s a party like it’s 1999.


VMware has a different take on storage for the Hadoop cluster.  By design, Hadoop uses local direct-attached storage; VMware wants to bring the efficiencies of storage networking to the Hadoop cluster.  They showed some data conceding that direct storage is cheaper than storage-networked configurations, and then talked about managing storage more efficiently with VMware.  The major benefit touted was availability through failover of critical components, like the NameNode.  They have a project, Serengeti, to manage Hadoop and provide some structure for availability.


A number of speakers addressed the issue of integrating Hadoop with existing relational databases.  Sqoop was repeatedly mentioned as a mechanism to import relational data into Hadoop, making the analytic efforts more comprehensive and useful.
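To make the Sqoop point concrete, here is a minimal sketch of the kind of import command speakers were describing.  The database host, credentials, table name, and HDFS path below are all hypothetical placeholders, not anything demonstrated at the summit.

```shell
# Hypothetical Sqoop import: copy an "orders" table from a MySQL
# database into HDFS so it can be joined with other Hadoop data.
# All connection details are made-up placeholders.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table orders \
  --target-dir /data/sales/orders \
  --num-mappers 4
```

Sqoop runs the transfer as a MapReduce job, so the `--num-mappers` setting controls how many parallel slices of the table are pulled at once.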


Finally, log files, the long-forgotten detritus of the data center, are getting some respect.  Now that there is a method (Hadoop) for using this data to predict upcoming faults and data-center problems, log data is getting new attention.  Log files are now used in security analytics to look for patterns of incursion or threats.
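The kind of pattern-spotting described above boils down to aggregating over huge volumes of log lines.  Here is a toy command-line sketch of that idea; the sample lines are invented placeholders, and at real data-center scale this counting would run as a Hadoop job rather than a pipeline on one machine.

```shell
# Toy sketch: tally log lines by severity to surface anomalies.
# The sample lines stand in for a real log stream.
printf '%s\n' \
  'ERROR disk /dev/sdb read failure' \
  'WARN temp rack 7 above threshold' \
  'ERROR disk /dev/sdb read failure' |
awk '{counts[$1]++} END {for (s in counts) print s, counts[s]}' | sort
# → ERROR 2
#   WARN 1
```

The awk aggregation here mirrors the reduce step of a MapReduce word count, which is exactly why this workload maps so naturally onto Hadoop.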


