Publications

ATG creates and submits publications to various global technical conferences. Please see below for a current list of papers with authors from NetApp, including those from ATG.



FlexGroup Volumes: A Distributed WAFL File System

Ram Kesavan, Google; Jason Hennessey, Richard Jernigan, Peter Macko, Keith A. Smith, Daniel Tennant, and Bharadwaj V. R., NetApp

2019 USENIX Annual Technical Conference



Managing Response Time Tails by Sharding

P. G. Harrison, Imperial College London; N. M. Patel, NetApp Inc; J. F. Pérez, Universidad del Rosario; Z. Qiu, Imperial College London

Matrix analytic methods are developed to compute the probability distribution of response times (i.e., data access times) in distributed storage systems protected by erasure coding, which is implemented by sharding a data object into N fragments, only K < N of which are required to reconstruct the object. This leads to a partial-fork-join model with a choice of canceling policies for the redundant N−K tasks. The accuracy of the analytical model is supported by tests against simulation in a broad range of setups. At increasing workload intensities, numerical results show the extent to which increasing the redundancy level reduces the mean response time of storage reads and significantly flattens the tail of their distribution; this is demonstrated at medium-high quantiles, up to the 99th. The quantitative reduction in response time achieved by two policies for canceling redundant tasks is also shown: for cancel-at-finish and cancel-at-start, which limits the additional load introduced whilst losing the benefit of selectivity amongst fragment service times.
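The K-of-N idea in this abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's matrix-analytic model: it assumes independent, exponentially distributed fragment service times and ignores queueing and canceling policies entirely, so it only shows how a read that completes when the K-th fastest of N fragments returns gets a shorter mean and 99th percentile as N grows. The function names and parameters below are hypothetical.

```python
import random


def kth_of_n_response(n, k, rng, mean_service=1.0):
    """Time until the k-th fastest of n fragment reads completes.

    Assumption: fragment service times are i.i.d. exponential with the
    given mean, and there is no queueing (an idle system).
    """
    times = sorted(rng.expovariate(1.0 / mean_service) for _ in range(n))
    return times[k - 1]


def summarize(n, k, trials=20000, seed=0):
    """Return (mean, 99th percentile) of simulated read response times."""
    rng = random.Random(seed)
    samples = sorted(kth_of_n_response(n, k, rng) for _ in range(trials))
    mean = sum(samples) / trials
    p99 = samples[int(0.99 * trials) - 1]
    return mean, p99


if __name__ == "__main__":
    k = 4  # fragments needed to reconstruct the object
    for n in (4, 5, 6):  # total fragments, including N - K redundant ones
        mean, p99 = summarize(n, k)
        print(f"N={n} K={k} mean={mean:.3f} p99={p99:.3f}")
```

Under load, the extra N−K requests also add work to the system, which is the trade-off the canceling policies in the abstract address; this sketch does not capture that effect.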



Storage Gardening: Using a Virtualization Layer for Efficient Defragmentation in the WAFL File System

Ram Kesavan, Matthew Curtis-Maury, Vinay Devadas, and Kesari Mishra, NetApp

As a file system ages, it can experience multiple forms of fragmentation. Fragmentation of the free space in the file system can lower write performance and subsequent read performance. Client operations as well as internal operations, such as deduplication, can fragment the layout of an individual file, which also impacts file read performance. File systems that allow sub-block granular addressing can gather intra-block fragmentation, which leads to wasted free space. This paper describes how the NetApp® WAFL® file system leverages a storage virtualization layer for defragmentation techniques that physically relocate blocks efficiently, including those in read-only snapshots. The paper analyzes the effectiveness of these techniques at reducing fragmentation and improving overall performance across various storage media.



TDDFS: A Tier-Aware Data Deduplication-Based File System

Zhichao Cao, Hao Wen, University of Minnesota; Xiongzi Ge, NetApp; Jingwei Ma, Nankai University; Jim Diehl, David H. C. Du, University of Minnesota

With the rapid increase in the amount of data produced and the development of new types of storage devices, storage tiering continues to be a popular way to achieve a good tradeoff between performance and cost-effectiveness. In a basic two-tier storage system, a storage tier with higher performance and typically higher cost (the fast tier) is used to store frequently-accessed (active) data while a large amount of less-active data are stored in the lower-performance and low-cost tier (the slow tier). Data are migrated between these two tiers according to their activity. In this article, we propose a Tier-aware Data Deduplication-based File System, called TDDFS, which can operate efficiently on top of a two-tier storage environment.



Yodea: Workload Pattern Assessment Tool for Cloud Migration

Rukma Talwadker and Cijo George, NetApp

As the news around cloud repatriations gets real, many cloud technologists associate them with poor understanding of the applications and their usage patterns by the enterprises. Our solution, Yodea, is a tool cum methodology to analyze workload patterns in the light of cloud suitability. We bring forward compute patterns which can benefit from cloud economics with on-demand compute scaling. Yodea further ranks workloads in terms of their cloud suitability on the basis of these metrics. An after-the-fact analysis of storage workloads for a customer install base features 38% of the “already in cloud” volumes in the top 100 ranked list by Yodea.



Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems

Haryadi S. Gunawi and Riza O. Suminto, University of Chicago; Russell Sears and Casey Golliher, Pure Storage; Swaminathan Sundararaman, Parallel Machines; Xing Lin and Tim Emami, NetApp; Weiguang Sheng and Nematollah Bidokhti, Huawei; Caitie McCaffrey, Twitter; Gary Grider and Parks M. Fields, Los Alamos National Laboratory; Kevin Harms and Robert B. Ross, Argonne National Laboratory; Andree Jacobson, New Mexico Consortium; Robert Ricci and Kirk Webb, University of Utah; Peter Alvaro, University of California, Santa Cruz; Mingzhe Hao, Huaicheng Li, and H. Birali Runesha, University of Chicago

Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of fail-slow hardware incidents, collected from large-scale cluster deployments in 14 institutions. We show that all hardware types such as disk, SSD, CPU, memory, and network components can exhibit performance faults. We made several important observations such as faults convert from one form to another, the cascading root causes and impacts can be long, and fail-slow faults can have varying symptoms. From this study, we make suggestions to vendors, operators, and systems designers.



ChewAnalyzer: Workload-Aware Data Management Across Differentiated Storage Pools

Xiongzi Ge, NetApp, Inc.; Xuchao Xie, NUDT; David H.C. Du, University of Minnesota; Pradeep Ganesan, NetApp, Inc.; Dennis Hahn, NetApp, Inc.

26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2018)



A model-based approach to streamlining distributed training for asynchronous SGD

Sung-Han Lin, NetApp; Marco Paolieri, University of Southern California; Cheng-Fu Chou, National Taiwan University; Leana Golubchik, University of Southern California

26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2018)



Efficient Search for Free Blocks in the WAFL File System

Ram Kesavan, Matthew Curtis-Maury, and Mrinal Bhattacharjee, NetApp

The WAFL write allocator is responsible for assigning blocks on persistent storage to data in a way that maximizes both write throughput to the storage media and subsequent read performance of data. The ability to quickly and efficiently guide the write allocator toward desirable regions of available free space is critical to achieving that goal. This ability is influenced by several factors, such as any underlying RAID geometry, media-specific attributes such as erase-block size of solid state drives or zone size of shingled magnetic hard drives, and free space fragmentation. This paper presents and evaluates the techniques used by the WAFL write allocator to efficiently find regions of free space.



A Secure Cloud with Minimal Provider Trust

Amin Mosayyebzadeh and Gerardo Ravago, Boston University; Apoorve Mohan, Northeastern University; Ali Raza and Sahil Tikale, Boston University; Nabil Schear, MIT Lincoln Laboratory; Trammell Hudson, Two Sigma; Jason Hennessey, Boston University and NetApp; Naved Ansari, Boston University; Kyle Hogan, MIT; Charles Munson, MIT Lincoln Laboratory; Larry Rudolph, Two Sigma; Gene Cooperman and Peter Desnoyers, Northeastern University; Orran Krieger, Boston University

Bolted is a new architecture for a bare metal cloud with the goal of providing security-sensitive customers of a cloud the same level of security and control that they can obtain in their own private data centers. It allows tenants to elastically allocate secure resources within a cloud while being protected from other previous, current, and future tenants of the cloud. The provisioning of a new server to a tenant isolates a bare metal server, only allowing it to communicate with the tenant’s other servers once its critical firmware and software have been attested to the tenant. Tenants, rather than the provider, control the tradeoffs between security, price, and performance. A prototype demonstrates scalable end-to-end security with small overhead compared to a less secure alternative.