NetApp presents two technical papers and sponsors USENIX’s Federated Conference Week 2014

NetApp is a silver sponsor of USENIX's Federated Conference Week 2014 and will be presenting at the USENIX Annual Technical Conference (ATC) and HotStorage in Philadelphia, PA, on June 17-20, 2014.

NetApp experts Douglas Santry and Kaladhar Voruganti will present the technical paper "Violet: A Storage Stack for IOPS/Capacity Bifurcated Storage Environments" at ATC on Thursday, June 19, from 8:30 to 10:10 AM EDT.

  • This paper describes a storage system called Violet that efficiently marries fine-grained host-side data management with capacity-optimized backend disk systems. Currently, for efficiency reasons, real-time analytics applications are forced to map their in-memory graph-like data structures onto columnar databases or other intermediate disk-friendly data structures when replicating those structures to protect them from node failures. Violet provides efficient fine-grained end-to-end data management that obviates the need for this intermediate mapping. Two key innovations allow Violet to map efficiently between fine-grained host-side data structures and a capacity-optimized backend disk system: 1) efficient detection of updates on the host, leveraging hardware in-memory transaction mechanisms, and 2) efficient streaming of fine-grained updates onto disk using a new data structure called Fibonacci Arrays (a rough sketch of the idea follows below).
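The post doesn't reproduce the paper's description of Fibonacci Arrays, so the following Python sketch is only one plausible reading of the idea: fine-grained updates are buffered in memory and spilled to sorted on-disk runs, with similar-sized runs merged so that run lengths grow in a Fibonacci-like (roughly geometric) progression and the number of runs stays small. The class name, merge policy, and buffer_limit parameter are illustrative assumptions, not the paper's design.

```python
import bisect

class FibonacciArrays:
    """Hypothetical sketch (not the Violet paper's actual design):
    buffer fine-grained updates in memory, spill them as sorted runs,
    and merge similar-sized runs so run lengths grow roughly like
    Fibonacci numbers, keeping the run count small."""

    def __init__(self, buffer_limit=8):
        self.buffer = {}      # latest in-memory value per key
        self.buffer_limit = buffer_limit
        self.runs = []        # sorted [(key, value), ...] runs, newest first

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Spill the buffer as a new sorted run, then merge the two newest
        # runs while they are within a factor of two of each other in size
        # (mimicking F(i) + F(i+1) -> F(i+2) growth).
        self.runs.insert(0, sorted(self.buffer.items()))
        self.buffer.clear()
        while len(self.runs) >= 2 and 2 * len(self.runs[0]) > len(self.runs[1]):
            newer, older = self.runs.pop(0), self.runs.pop(0)
            self.runs.insert(0, self._merge(newer, older))

    @staticmethod
    def _merge(newer, older):
        # Two-way merge of sorted runs; the newer run wins on duplicate keys.
        out, i, j = [], 0, 0
        while i < len(newer) and j < len(older):
            if newer[i][0] < older[j][0]:
                out.append(newer[i]); i += 1
            elif newer[i][0] > older[j][0]:
                out.append(older[j]); j += 1
            else:
                out.append(newer[i]); i += 1; j += 1
        return out + newer[i:] + older[j:]

    def get(self, key, default=None):
        if key in self.buffer:
            return self.buffer[key]
        for run in self.runs:                     # newest run first
            pos = bisect.bisect_left(run, (key,))
            if pos < len(run) and run[pos][0] == key:
                return run[pos][1]
        return default
```

Under this reading, lookups touch at most one buffer plus a logarithmic number of sorted runs, while writes stay sequential on disk, which is the kind of fine-grained-update-to-capacity-disk bridge the abstract describes.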

NetApp experts Srinivasan Narayanamurthy, Ranjit Kumar, and Siddhartha Nandi will present the paper "Evaluation of Codes with Inherent Double Replication for Hadoop" at HotStorage on Wednesday, June 18, from 11:00 AM to 12:15 PM EDT.

  • This paper evaluates the efficacy, in a Hadoop setting, of two coding schemes, both possessing an inherent double replication of data. The two schemes belong to the classes of regenerating and locally regenerating codes, respectively; these classes are representative of recent advances in designing codes for the efficient storage of data in a distributed setting. In comparison with triple replication, double replication permits a significant reduction in storage overhead while delivering good MapReduce performance under moderate workloads. The two coding solutions evaluated here add only moderately to the storage overhead of double replication, while simultaneously offering reliability levels similar to that of triple replication (the worked example below illustrates the overhead arithmetic).
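As a back-of-the-envelope illustration of that overhead comparison (using made-up stripe parameters, not the actual parameters of the codes evaluated in the paper), consider a code that stores every data block twice and adds a few parity blocks per stripe:

```python
def replication_overhead(copies: int) -> float:
    """Bytes stored per byte of user data under n-way replication."""
    return float(copies)

def double_rep_code_overhead(k: int, p: int) -> float:
    """Overhead of a stripe holding k data blocks, each stored twice
    (inherent double replication), plus p parity blocks."""
    return (2 * k + p) / k

print("triple replication:", replication_overhead(3))         # 3.0x
print("double replication:", replication_overhead(2))         # 2.0x
print("coded, k=5, p=2:   ", double_rep_code_overhead(5, 2))  # 2.4x
```

With these illustrative numbers the coded scheme stores 2.4x the user data: noticeably below triple replication's 3x, and only moderately above plain double replication's 2x.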

One might expect from the property of inherent data duplication that the performance of these codes in executing a MapReduce job would be comparable to that of double replication. However, a second feature of this class of codes comes into play: under both schemes analyzed here, multiple blocks from the same coded stripe must be stored on the same node. This concentration of data belonging to a single stripe negatively impacts MapReduce execution times. Much of this effect can be undone simply by adding more processors per node, and further improvements are possible if the Map task scheduler is tailored to the codes under consideration; the toy model at the end of this post illustrates both points. We present both experimental and simulation results that validate these observations.

For more information about USENIX's Federated Conference Week, please visit: https://www.usenix.org/conference/fcw14.
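To make the placement effect concrete, here is a deliberately tiny toy model (not the paper's experimental setup or simulator; the node names and parameters are invented). Each map task must run on the node storing its block, and each node runs a fixed number of tasks in parallel:

```python
import math
from collections import Counter

def makespan(task_nodes, slots_per_node):
    """Completion time, in task lengths, when every map task runs on the
    node that stores its block and each node executes `slots_per_node`
    tasks in parallel."""
    per_node = Counter(task_nodes)
    return max(math.ceil(n / slots_per_node) for n in per_node.values())

# A job reading one 3-block stripe, placed two different ways.
spread       = ["node0", "node1", "node2"]  # one block per node
concentrated = ["node0", "node0", "node0"]  # same-stripe blocks co-located

for slots in (1, 3):
    print(f"{slots} slot(s)/node: spread={makespan(spread, slots)}, "
          f"concentrated={makespan(concentrated, slots)}")
# 1 slot(s)/node: spread=1, concentrated=3
# 3 slot(s)/node: spread=1, concentrated=1
```

Co-locating a stripe's blocks serializes its map tasks on a single node, but adding parallel slots (more processors per node) recovers the spread placement's completion time, in line with the observations above.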