FAST13 USENIX Conference on File and Storage Technologies February 12–15, 2013 in San Jose, CA
For those not familiar with the geekfest called USENIX and its file and storage technology conference, it is a very scholarly affair. Papers are submitted on a variety of file and storage topics, and the authors of the top picks present their findings to the audience. The major component and system vendors are there, along with a wide variety of academic and national labs.
Let’s review a paper about using SSDs in high performance computing where there are a large number of nodes. See the reference at the end for details regarding the paper.*
The issue is how to manage two jobs on one data set. The example in the paper is a two-step process in the high-end computing world. Complex simulations are run on supercomputers, then the results are moved to separate systems where the data is analyzed. Analytics are typically done on smaller systems in batch mode. The high-end computing (HEC) systems that run the simulations are extremely expensive, and keeping them fully utilized is important. This creates a variety of issues in the data center, including the movement of data between the supercomputer and the storage farm, analytic performance, and the power required for these operations. The approach proposed is called “Active Flash”.
The HEC systems are designed for the floating point operations of the simulation, not for the typical analytic workload. As a result, the data is moved to another platform for analytics. The growth in the data (now moving to exabytes) is projected to increase costs to the point where just moving the data will be comparable in cost to the analytic processing. In addition, extrapolating the current power cost to future systems indicates that power will become the primary design constraint on new systems. The authors expect computing power to increase 1000X in the next decade while the available power envelope will grow only 10X. Clearly, something must be done.
The authors have built a Flash Translation Layer (FTL) with data analysis functions on the OpenSSD platform to test whether an Active Flash configuration can reduce both the energy and performance costs of analytics in an HEC environment. Their 18,000 compute node configuration produces 30TB of data each hour. On-the-fly data analytics are performed in the staging area, avoiding the performance and energy costs of data migration. By staging area, we mean the controller in the SSDs.
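To put the 30TB per hour figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal terabytes and an even spread of output across all 18,000 nodes; neither assumption comes from the paper, and the post does not say how many SSDs share the load, so treat the result as an average per compute node rather than a per-drive figure.

```python
TB = 10**12  # assumption: decimal terabytes (the post does not say TB vs TiB)

def per_node_rate(total_bytes_per_hour, nodes):
    """Return (bytes per hour, bytes per second) produced by an average node."""
    per_hour = total_bytes_per_hour / nodes
    return per_hour, per_hour / 3600

# Figures quoted in the post: 30TB per hour from 18,000 compute nodes
per_hour, per_sec = per_node_rate(30 * TB, 18_000)
print(f"{per_hour / 1e9:.2f} GB per node per hour")
print(f"{per_sec / 1e6:.2f} MB/s per node, averaged over the hour")
```

Under these assumptions each node averages well under 1 MB/s, which suggests the difficulty lies in the aggregate volume and the burstiness of the output rather than in any single node's sustained rate.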
High Performance Computing (HPC) workloads tend to be bursty, alternating between I/O-intensive and compute-intensive phases. It’s common for a short I/O burst to be followed by a longer period of computation. These loads are not evenly split; indeed, I/O is usually less than 5% of overall activity. This leaves the SSD controller idle much of the time, which creates an opportunity to run analytics on it. As SSD controllers move to multi-core designs, there is even more opportunity for analytics while the simulations are active.
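The idle-time argument can be sketched as a simple budget calculation. The Python below is only an illustration: the one-hour cycle and the 100 GB per burst are made-up numbers, and only the 5% I/O share comes from the post. It estimates how long the controller is free between bursts and how fast the analysis would need to run to finish before the next burst arrives.

```python
def idle_seconds(period_s, io_fraction):
    """Time in one cycle when the drive is not serving host I/O."""
    return period_s * (1.0 - io_fraction)

def required_rate(bytes_per_cycle, period_s, io_fraction):
    """Minimum analysis throughput (bytes/s) to finish before the next burst."""
    return bytes_per_cycle / idle_seconds(period_s, io_fraction)

period_s = 3600         # hypothetical: one output burst per hour
io_fraction = 0.05      # upper bound on I/O share quoted in the post
data_per_cycle = 100e9  # hypothetical: 100 GB landing on one SSD per burst

idle = idle_seconds(period_s, io_fraction)
rate = required_rate(data_per_cycle, period_s, io_fraction)
print(f"idle window: {idle:.0f} s")
print(f"analysis must sustain at least {rate / 1e6:.1f} MB/s")
```

If the controller cannot sustain the required rate for a given analysis kernel, that kernel is a poor fit for in-drive processing under these assumptions.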
The model used to identify which nodes are suitable for SSDs combines capacity, performance, and write endurance characteristics. The energy profile is modeled in a similar way to predict the energy cost and savings of different configurations. The authors’ experimental models were tested in different configurations. The Active Flash version extends the traditional FTL with analytic functions, which are enabled with an out-of-band command. The result is elegant and outperforms both the offline analytic approach and the dedicated analytic node approach.
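As a rough illustration of what a capacity, performance, and endurance check might look like, here is a minimal Python sketch. It is not the authors' model (their formulas are in the paper); the `SSD` class, the `check_ssd` function, and every number below are hypothetical, and the endurance check ignores write amplification.

```python
from dataclasses import dataclass

@dataclass
class SSD:
    capacity_bytes: float    # usable capacity
    write_bw_bytes_s: float  # sustained write bandwidth
    endurance_bytes: float   # total bytes writable over the device's life

def check_ssd(ssd, burst_bytes, burst_window_s, bursts_per_day, years):
    """Return one pass/fail flag per criterion named in the post."""
    lifetime_writes = burst_bytes * bursts_per_day * 365 * years
    return {
        "capacity": burst_bytes <= ssd.capacity_bytes,
        "performance": burst_bytes / ssd.write_bw_bytes_s <= burst_window_s,
        "endurance": lifetime_writes <= ssd.endurance_bytes,
    }

# Made-up device and workload numbers, for illustration only
drive = SSD(capacity_bytes=256e9, write_bw_bytes_s=200e6, endurance_bytes=500e12)
result = check_ssd(drive, burst_bytes=20e9, burst_window_s=180,
                   bursts_per_day=24, years=3)
print(result)
```

With these made-up numbers the drive has enough capacity and bandwidth but would wear out before three years are up, which shows why write endurance has to be part of the model alongside the other two characteristics.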
The details and formulas are in the referenced paper and are beyond the scope of my humble blog. But for those thinking of SSDs for Big Data, it appears the next market opportunity is enhancing the SSD controller for an Active Flash approach to analytics.
*The paper is #119 “Active Flash: Towards Energy-Efficient, In-Situ Data Analytics on Extreme-Scale Machines” by Devesh Tiwari (1), Simona Boboila (2), Sudharshan S. Vazhkudai (3), Youngjae Kim (3), Xiaosong Ma (1), Peter J. Desnoyers (2) and Yan Solihin (1). Affiliations: (1) North Carolina State University, (2) Northeastern University, (3) Oak Ridge National Laboratory.