Flash Memory in a Big Data World

The Flash Memory Summit just concluded in Silicon Valley.  The vendors and attendees were legion and the excitement level was high.  Several vendors were fishing for employees and the industry definitely felt like it was on the growth path.  One of the common footballs being passed back and forth was “what role does flash memory play in the enterprise?”  Refinements on that theme include will SSD replace HDD?  “Will new applications, like Big Data, enable a new age for flash?”

Let’s take a quick look at flash memory first, then we can assess how it might impact Big Data applications.  Flash memory is nonvolatile semiconductor memory.  That is to say, unlike DRAM memory, flash memory remembers even after the power is turned off.  It is like DRAM memory in that they are both semiconductor technologies.  Why is flash memory getting a lot of attention now?  It’s all about price.  The cost of flash memory has plummeted.  One speaker mentioned that in the first half of 2012 prices dropped 46%, and was expected to drop about 64% for the year!!!  It was pointed out that SSDs that use flash memory now are price competitively with 15K HDDs, the high-end enterprise HDDs found in many storage arrays.  That will enable a lot of new applications and attention. 

Flash memory is the heart of the sold state disc or SSD.  But you can also get flash as a card or just chips.  SSDs are different from raw Flash in that SSDs have a controller, some software, an interface like SAS or SATA instead of a bus interface like PCIe.  The technology in flash is NAND logic semiconductor and can be found in lots of consumer electronics like your camera, cell phone and thumb drive.  It is incorporated into other components and packaging for your SSD.

It sounds great, and we can expect SSDs to take over from HDDs, right?  Not so fast.  It’s fair to say every technology has some good and bad points.  Latency is a great advantage for solid state memory, either DRAM or Flash.  Flash has some issues.  Write performance is a challenge, and can even be slower than an enterprise HDD.  Another issue for the enterprise is the traditional RAS (reliability, availability and serviceability).  The flash issue is endurance.  Heavy use, especially writes, will cause a flash memory sell to plug up and cease to function.  Flash has a limited life.  The good news is that there are workarounds.  As mentioned earlier the SSD drive isn’t just flash, there are other components, and these can be used to ameliorate some of the downsides of flash.

How does this impact Big Data?  In-memory applications will benefit from SSDs since semiconductor flash will have similar low latency advantages as DRAM but not as fast and not as expensive, so these jobs will see a benefit.  Also looking at the needs of the Big Data job is the best way to predict if there will be enough benefit to offset the expense of Flash.  For instance, Hadoop is a batch job.  There are benefits to getting faster execution of a batch job, but consider how time sensitive your requirements are since flash can be pricey. 

Another issue is the configuration.  Most of the SSDs are found in storage arrays where virtualization and Big Data might coincide.  If storage is being configured to support a NoSQL job like Cassandra, Mongo DB and the like, this could be a good use of flash.  If the configuration is a direct attached storage, in the range of traditional Hadoop clusters, it might not make sense.  LSI was making a case at the Flash Memory Summit that their hybrid drive with flash and rotating disc was a perfect solution for DAS in a Big Data cluster.  They claimed a 37% reduction in run time for a sample Terasort job in a Hadoop environment.

Other speakers feel strongly that bandwidth is the requirement for Big Data performance.  You can see why, since Big Data apps may involve moving massive amounts of data.  Moving that data quickly can be a major determinate of overall performance.    

The basics require that you understand the nature of your job to put the right storage in the right configuration, and there might be a vendor out there with some enabling software that might optimize for your situation.  One vendor was showing how their algorithms can double the performance of their SSDs over standard SSDs.  This just demonstrates how variable your results might be.  Even so, it’s nice to have options, and the Flash Memory Summit showed how even Big Data can benefit from this fast changing technology.