IDC June 2011
Extracting Value from Chaos
By John Gantz and David Reinsel
The IDC team's report opens by discussing the mammoth amount of data that has arrived with the advent of social media and with data collection in the natural world and in man-made systems. The numbers are staggering: 1.8 zettabytes (a zettabyte is 10^21 bytes, so that’s 1.8 followed by 20 zeros), which is about 9 times what it was just 5 years ago. IDC estimates that by 2015 there will be 7.9 ZB in existence.
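To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It only reworks the numbers quoted above. The yearly rates it prints are derived here, not taken from the IDC report, and they assume smooth compound growth with 2011 (the report's publication year) as the base year for the 1.8 ZB figure. The function name `annual_rate` is made up for this illustration.

```python
from decimal import Decimal

# Figures quoted in the summary above.
ZETTABYTE = 10**21            # bytes in one zettabyte (decimal SI prefix)
size_2011_zb = Decimal("1.8") # assumption: 1.8 ZB refers to 2011
size_2015_zb = Decimal("7.9") # IDC estimate for 2015
growth_5yr = 9                # growth factor over the previous five years


def annual_rate(factor: float, years: int) -> float:
    """Constant yearly growth rate that compounds to `factor` over `years`.

    Assumes smooth compound growth, which is a simplification.
    """
    return factor ** (1 / years) - 1


# Exact byte count, using Decimal to avoid floating-point rounding.
bytes_2011 = int(size_2011_zb * ZETTABYTE)

# Yearly rates implied by the quoted figures (derived, not from IDC).
past_rate = annual_rate(growth_5yr, 5)
future_rate = annual_rate(float(size_2015_zb / size_2011_zb), 4)

print(f"{bytes_2011:,} bytes")
print(f"implied growth over the previous 5 years: {past_rate:.0%} per year")
print(f"implied growth from 2011 to 2015: {future_rate:.0%} per year")
```

Under those assumptions, the quoted figures work out to roughly 55% growth per year over the previous five years and roughly 45% per year through 2015.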
Most of this data is “unstructured”. In the digital world, structured data exists in tables and databases and can be indexed, searched and managed. Unstructured data includes things like emails, video, photos and voice conversations, and it makes up the vast majority of data in existence.
Since over 90% of digital data is unstructured, and someone will want to extract value from it, the rush is on to create tools to manage this resource. Alongside new tools to organize this information, cloud storage has emerged as a new approach to where the information is kept. These two technologies are changing the data storage environment.
Deriving value from all this data is still in its early stages. So is protecting it, with perhaps only half of sensitive data adequately protected today. The famous lost laptop with sensitive information on it, which seems to appear in the news as a regular feature, reminds us that it’s not just technology but people and their behavior that can put data at risk.
IDC describes “Big Data” as the analysis and use of large volumes of data, not just the fact that large volumes of data exist. The authors predict that cloud storage will be responsible for much of the management of these massive amounts of data, and that the move to monetize the value of this data will be led by the cloud storage vendors. Cloud vendors will collect and analyze data, and will enable third parties to use this technology for their own data monetization.
Security is another evolving requirement for Big Data. Different needs and different regulatory requirements create a hierarchy of security. IDC estimated that 28% of all data required some level of security. For instance, a YouTube upload may require only privacy for the user’s email address. Information that MIGHT be required for litigation discovery in an organization would need to be protected to some extent. Sensitive personal information held by a custodian requires a higher standard of protection, and an organization’s confidential information will often require a higher standard still. Finally, military and banking information calls for the highest protection.
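The hierarchy described above can be sketched as an ordered enumeration. This is only an illustration in Python: the tier names and the "strictest tier wins" rule in `required_tier` are assumptions made for this sketch, not terminology or policy taken from the IDC report.

```python
from enum import IntEnum


class ProtectionTier(IntEnum):
    """Hypothetical names for the five levels described above, lowest first."""
    PRIVACY = 1       # e.g. the email address behind a YouTube upload
    DISCOVERY = 2     # information that might be needed for litigation discovery
    CUSTODIAL = 3     # sensitive personal information held by a custodian
    CONFIDENTIAL = 4  # an organization's confidential information
    LOCKDOWN = 5      # military and banking information


def required_tier(tiers):
    """Return the strictest tier among those that apply to a piece of data.

    Assumption for this sketch: when several tiers apply, the highest wins.
    """
    return max(tiers)


# A record that is both discovery-relevant and custodial personal data.
record_tiers = [ProtectionTier.DISCOVERY, ProtectionTier.CUSTODIAL]
print(required_tier(record_tiers).name)
```

Using an ordered type means levels can be compared directly, so a check such as "is this data at least custodial?" becomes a simple comparison.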
IDC has also created the concept of a “digital shadow”, which covers the information about a person rather than the information that person has created. As technology and our presence in the digital world expand, so does our “digital shadow”. For instance, our Facebook page might hold data we put there ourselves, but also data put there by other people. Digital shadow information would also include who our friends are and whatever can be derived from those relationships.
The point of the paper is to reinforce and expand current storage management strategies and practices, and to consider new approaches, like cloud storage, for managing the emergence of Big Data. This change typically has to be managed with resources that are flat or only modestly increasing. Keeping up will test our best industry resources.