Notes from the Churchill Club meeting, December 7, 2011
Keith Collins, SVP SAS
Gil Elbaz, CEO Factual
Ping Li, Partner Accel Partners
Luke Lonergan, CTO Greenplum/EMC
Anand Rajaraman, SVP WalmartLabs
Michael Chui, Senior Fellow McKinsey
The meeting was run as a panel answering questions from Michael Chui, who acted as moderator. Later, the questions came from the audience. Michael did a nice job of drawing comments from every member of the panel. The discussion tended to wander once audience members began asking questions, so you may notice some topics coming up more than once.
I’ve captured some of the more interesting comments in this blog. For instance, the panel distinguished Open Data, which tends to be transparent, easy to access, and community driven, from Big Data. Big Data might include Open Data, but is not limited to it. Information whose use is restricted, such as medical information, is by law not Open Data. A number of questions probed different aspects of how open data is or should be. The panel agreed that this area is still evolving, and that the rules for using abstracted or anonymized data remain unsettled because of privacy concerns and laws. Legislative change on data use and ownership is still an open issue.
Social data might include public information or Facebook information. Facebook is an example where the person owns the data and grants permission for its broader use. One application of this is Shopycat, where an agent crawls through your friends’ Facebook information to help you research what to get them.
One tool available today for Big Data analysis is Hadoop, an open source framework for distributed storage and parallel processing of very large data sets, built around its own distributed file system (HDFS). Because Hadoop is difficult to work with, a few strategies have emerged. One is to assemble a team of experts to build and manage a system around Hadoop that generates business intelligence from massive amounts of unstructured data.
The panel seemed to agree that there is a shortage of people qualified to perform these tasks. The data scientist, who acquires and analyzes data and tells stories with it, is emerging as a high-demand position. The skill set needed to realize the potential of Big Data also has to be collaborative: new ideas and behavior modeling will come from people outside traditional IT roles. The power of Big Data will be felt through new approaches applied to the large amount of data available for analysis, and democratizing access to data will help organizations exploit it more fully. Big Data also brings a need for data quality, and the tools in this area are growing.
One structural problem with Big Data is that it breaks with the traditional CPU, memory, and storage architecture. To move to a parallel framework like Hadoop, business intelligence applications need to be restructured. Because you are processing all of the data rather than just a sample, the sheer volume of information overwhelms traditional architectures.
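As a rough illustration of what that restructuring means, here is a minimal Python sketch of a word count written in the map/shuffle/reduce style that Hadoop’s MapReduce popularized. It is not Hadoop code: it runs in a single process, threads stand in for cluster nodes, and the function names and the word-count task are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def map_partition(lines: list[str]) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in this partition only."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped: Iterable[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle step: group the emitted values by key across all partitions."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: combine all of the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce_word_count(lines: list[str], partitions: int = 3) -> dict[str, int]:
    # Split the full data set (not a sample) into partitions, as a cluster would.
    chunks = [lines[i::partitions] for i in range(partitions)]
    # Hypothetical stand-in for cluster nodes: each thread maps one partition.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        mapped = list(pool.map(map_partition, chunks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    log = [
        "big data needs new tools",
        "open data is not all big data",
        "new tools for big data",
    ]
    counts = map_reduce_word_count(log)
    # Cross-check against an ordinary single-pass count over the same lines.
    assert counts == dict(Counter(w.lower() for line in log for w in line.split()))
    print(counts["data"])  # prints 4
```

The design point is that each map call sees only its own partition, so that work can be spread across many machines; only the shuffle step brings matching keys back together. An application written as one loop over an in-memory table has to be rethought along these lines before it can run on a cluster.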
One significant new source of Big Data is mobile location data from smartphones and similar devices. This brings us back to the earlier discussion of sensitive data and privacy concerns.
Applications that use the power of Big Data are still evolving. Greenplum/EMC, creator of a Big Data appliance, has created a customer recognition program called Data Heroes. It recognizes people who are making good use of data in new ways. Examples include new ways to detect credit card fraud and child abuse, and even to help the government identify tax cheats.
Another example is Kaggle, a crowdsourcing platform for data analytics, which is co-sponsoring a contest with a $3 million purse to help solve difficult problems, in particular a preventive health care problem.
New Big Data business ideas might include:
- New ways of buying local services, with coupons and local commerce targeted through geolocation.
- Mobile and Big Data intersect in ways that could be useful rather than harassing. For example, they could enable better management of a fleet of delivery trucks.
- Ways of combining public and private data to get a more complete view of a problem remain to be explored.
- Smart grid management is a natural fit for Big Data, helping to manage the nation’s energy.
- Data from security cameras has already helped security agencies worldwide, but the next generation might include millions of cheap cameras providing a more complete view of our world. Cameras could get so cheap that we could put them on pigeons. (I trust you appreciate the panelists’ humor.)
- To better monitor pollution, we could put sensors on bikes. As sensors become more affordable, they could become widespread and give us new ways of looking at the world around us.
EMC is already putting Big Data to work through an employee prize. The winner analyzed customer service issues and found a way to make EMC products more reliable.
Walmart.com is using Big Data to learn about products and customers in a social sense. They monitor blogs to see which products people find interesting, and they use social media input to compare geographic areas and see which products are popular in each.
By the way, data isn’t just black and white. If it were, it would be easier to regulate and categorize. We live in a probabilistic world with a lot of grey areas, which will make moving forward with Big Data and its regulation tricky.