Cloud Challenges, Disruptive Change, and Data Protection: Information from the Tenth NetApp University Day

About 80 people attended the tenth NetApp University Day on February 27, the day before USENIX FAST. Participants included more than 20 faculty members and postdoc researchers from 18 different universities, 14 graduate students, 12 ATG engineers, and 19 technical and product leaders from various groups. We also invited five distinguished speakers and hosted a successful poster session at the end of the event. Everyone enjoyed the lively discussions among academic faculty, students, researchers, engineers, and industry leaders.

Mark Bregman, NetApp SVP and CTO, gave a “top of mind” keynote speech. Mark shared his thoughts on data management challenges in hybrid cloud environments and the need for a unified data fabric to manage data across heterogeneous clouds. EVP Dave Hitz, a NetApp cofounder, shared lessons he has learned in moving forward with NetApp’s hybrid cloud vision. The cloud is a disruptive force that is changing the way software is delivered and consumed. It’s not enough to simply port traditional enterprise systems so that they can run in the cloud; we should offer those systems’ capabilities as services for applications within the cloud. Software delivery also needs to expand beyond traditional channels: it should be available in app-store-like packaging and use a pay-as-you-go charge model.

In addition to the inspiring talks by NetApp leadership, we heard from three industry speakers. The first speaker was Tushar Bandopadhyay, a technical director and product architect from Veritas. He shared the challenges in each of the four stages of the data lifecycle. A basic requirement for good data management is the ability to classify data and derive its value. He also discussed dark data — data that is never accessed after being stored. The ability to detect this type of data could lead to major savings in the long term. Tushar also described the Data Genomics Index, which analyzes the data access patterns of many customers. The data is made available in online reports.
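Tushar did not present an implementation, but the idea of detecting dark data can be illustrated with a simple filesystem scan. The heuristic below (flagging old files whose last-access time never advanced past their last-modification time) is purely a hypothetical sketch: `atime` tracking is often disabled in production, and real classifiers draw on far richer metadata than this.

```python
import os
import time

def find_dark_data(root, min_age_days=365):
    """Flag files that appear never to have been read after being written.

    Illustrative heuristic only: relies on atime, which many systems
    disable (noatime), so production classifiers use richer metadata.
    """
    cutoff = time.time() - min_age_days * 86400
    dark = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # Never accessed after its last write, and old enough to matter.
            if st.st_atime <= st.st_mtime and st.st_mtime < cutoff:
                dark.append(path)
    return dark
```

Even a crude scan like this hints at the long-term savings Tushar described: anything it flags is a candidate for tiering to cheaper storage or deletion.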

Our second industry speaker was Brent Welch, a senior staff software engineer from Google. Brent shared how Google Cloud Storage evolved as the computational model changed. The Google File System (GFS) is a replication-based cluster filesystem that delivers reliable storage on inexpensive commodity servers and disks. On top of GFS, MapReduce, Bigtable, and Dremel were introduced. Brent related that as new computation models such as real-time and interactive analysis emerged, it became necessary to build Colossus, the next-generation Google filesystem. Google also built other data systems, such as Megastore and Spanner, to meet the requirements of new use cases. Brent also described Kubernetes, for managing containerized applications running in heterogeneous clouds. There is also a need for “Kubernetes for data” to manage large-scale datasets distributed across multiple clouds.

Our final industry speaker was Krishna Narayanaswamy, cofounder and chief scientist of Netskope, a startup in the cloud security space. He described how the adoption of cloud and mobile technologies renders traditional security tools of little use in the new environments. He highlighted three areas in need of continued research: data classification, data protection, and threat protection. Krishna stated that human-based classification is insufficient and unreliable and that machine-based classification and verification are a must. At the moment, encryption is the common approach to protecting sensitive data, but once data is encrypted, the operations or computations that can be performed on it are limited. Novel key management schemes are also required so that data owners retain control of their data while still being able to use it flexibly. Finally, new vulnerabilities arising from the use of cloud services, such as the recent MongoDB ransomware situation, require better approaches to protecting data and IT infrastructure, whether it runs on the owner’s premises or in the public cloud.

The event ended with a happy hour with food and beer. Active discussions about posters and new project ideas flowed in the hallway. We hope that everyone enjoyed the event, and we’re all looking forward to next year.

— NetApp University Day Organizing Committee 2017

NetApp ATG awards five NFF research grants

In the last month, the Advanced Technology Group (ATG) at NetApp awarded NetApp Faculty Fellowships (NFFs) to researchers at five universities. Recipients of NFF awards have demonstrated to NetApp, through a proposal and review process, that their research is likely to make significant and relevant contributions to the greater body of work and to the storage and data management industry.

The five recently awarded NetApp Faculty Fellowships include:

The benefits of an NFF award include:

  • NetApp recommends the department, university, or institution for a grant from the NetApp University Research Fund (NURF) at the Silicon Valley Community Foundation (SVCF).
  • A sponsor from NetApp is assigned to communicate with, and in some cases collaborate with, the project’s PI(s) and team for one year.
  • The PI and one graduate student are invited to attend the next annual University Day held by ATG.

For more information:

NetApp presents two technical papers and sponsors USENIX’s Federated Conference Week 2014

NetApp is a silver sponsor and will be presenting at USENIX’s Annual Technical Conference & HotStorage during Federated Conference Week 2014 in Philadelphia, PA, on June 17-20, 2014.

NetApp experts Douglas Santry and Kaladhar Voruganti will present a technical paper, “Violet: A Storage Stack for IOPs/Capacity Bifurcated Storage Environments,” at ATC on Thursday, June 19, from 8:30 – 10:10 AM EDT.

  • This paper describes a storage system called Violet that efficiently marries fine-grained host-side data management with capacity-optimized backend disk systems. Currently, for efficiency reasons, real-time analytics applications are forced to map their in-memory graph-like data structures onto columnar databases or other intermediate disk-friendly data structures when replicating those data structures to protect them from node failures. Violet provides efficient fine-grained end-to-end data management functionality that obviates the need for this intermediate mapping. Violet presents two key innovations that make the mapping between fine-grained host-side data structures and a capacity-optimized backend disk system efficient: 1) efficient detection of updates on the host that leverages hardware in-memory transaction mechanisms, and 2) efficient streaming of fine-grained updates onto disk using a new data structure called Fibonacci Arrays.
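The paper itself details Violet’s mechanisms (hardware in-memory transactions for update detection, Fibonacci Arrays for the on-disk layout). As a rough, hypothetical sketch of the general pattern only — capturing small host-side writes at fine granularity and streaming them to a capacity-optimized backend in batches — consider the toy tracker below; the class name, threshold, and in-memory “backend” are invented for illustration and are not Violet’s design.

```python
class UpdateStream:
    """Toy sketch of fine-grained update capture and batched streaming.

    Not Violet's actual implementation: this only illustrates recording
    individual touched locations (rather than whole pages or objects)
    and flushing them to backend storage as a batch.
    """
    def __init__(self, flush_threshold=4):
        self.dirty = {}            # address -> latest value
        self.flush_threshold = flush_threshold
        self.flushed_batches = []  # stand-in for the disk backend

    def write(self, addr, value):
        # Record only the touched location, not the enclosing structure.
        self.dirty[addr] = value
        if len(self.dirty) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.dirty:
            # Stream the accumulated batch to capacity-optimized
            # storage in a single sequential I/O.
            self.flushed_batches.append(sorted(self.dirty.items()))
            self.dirty = {}
```

The design point the sketch mirrors is that fine-grained capture avoids rewriting whole intermediate structures, while batching keeps the backend I/O sequential and capacity-friendly.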

NetApp experts Srinivasan Narayanamurthy, Ranjit Kumar, and Siddhartha Nandi will present a paper, “Evaluation of Codes with Inherent Double Replication for Hadoop,” at HotStorage on Wednesday, June 18, from 11:00 AM – 12:15 PM EDT.

  • This paper evaluates the efficacy, in a Hadoop setting, of two coding schemes, both possessing an inherent double replication of data. The two schemes belong to the classes of regenerating and locally regenerating codes, respectively; these classes are representative of recent advances in designing codes for the efficient storage of data in a distributed setting. In comparison with triple replication, double replication permits a significant reduction in storage overhead while delivering good MapReduce performance under moderate workloads. The two coding solutions under evaluation add only moderately to the storage overhead of double replication while simultaneously offering reliability levels similar to that of triple replication.

One might expect from the property of inherent data duplication that the performance of these codes in executing a MapReduce job would be comparable to that of double replication. However, a second feature of this class of codes comes into play: under both coding schemes analyzed here, multiple blocks from the same coded stripe must be stored on the same node. This concentration of data belonging to a single stripe negatively impacts MapReduce execution times. Much of this effect can be undone by simply adding a larger number of processors per node, and further improvements are possible if the Map task scheduler is tailored to the codes under consideration. We present both experimental and simulation results that validate these observations.

For more information about USENIX’s Federated Conference Week, please visit: https://www.usenix.org/conference/fcw14.
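The storage-overhead tradeoff discussed in the abstract reduces to simple arithmetic: r-way replication stores r bytes per byte of user data, while a generic (n, k) erasure code stores n blocks for every k data blocks. The sketch below makes that concrete; the (n, k) parameters are hypothetical round numbers for illustration, not the parameters of the codes evaluated in the paper.

```python
def replication_overhead(r):
    """Bytes stored per byte of user data under r-way replication."""
    return float(r)

def code_overhead(n, k):
    """Overhead of a generic (n, k) erasure code: n blocks are stored
    for every k blocks of user data."""
    return n / k

# Moving from triple to double replication saves a third of the storage.
triple = replication_overhead(3)   # 3.0
double = replication_overhead(2)   # 2.0
saving = (triple - double) / triple

# A hypothetical code with inherent double replication plus modest
# parity, say 10 stored blocks per 4 data blocks, lands at 2.5x --
# between double and triple replication.
coded = code_overhead(10, 4)
```

This is the sense in which such codes "add only moderately" to double replication's overhead while approaching triple replication's reliability.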