Publications

ATG creates and submits publications to various global technical conferences. Please see below for a current list of papers with authors from NetApp, including those from ATG.

Publication Year :


Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems

Haryadi S. Gunawi and Riza O. Suminto, University of Chicago; Russell Sears and Casey Golliher, Pure Storage; Swaminathan Sundararaman, Parallel Machines; Xing Lin and Tim Emami, NetApp; Weiguang Sheng and Nematollah Bidokhti, Huawei; Caitie McCaffrey, Twitter; Gary Grider and Parks M. Fields, Los Alamos National Laboratory; Kevin Harms and Robert B. Ross, Argonne National Laboratory; Andree Jacobson, New Mexico Consortium; Robert Ricci and Kirk Webb, University of Utah; Peter Alvaro, University of California, Santa Cruz, Mingzhe Hao, Huaicheng Li, and H. Birali Runesha, University of Chicago

Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of fail-slow hardware incidents, collected from large-scale cluster deployments in 14 institutions. We show that all hardware types such as disk, SSD, CPU, memory, and network components can exhibit performance faults. We made several important observations such as faults convert from one form to another, the cascading root causes and impacts can be long, and fail-slow faults can have varying symptoms. From this study, we make suggestions to vendors, operators, and systems designers.



ChewAnalyzer: Workload-Aware Data Management Across Differentiated Storage Pools

Xiongzi Ge, NetApp, Inc.; Xuchao Xie, NUDT; David H.C. Du, University of Minnesota; Pradeep Ganesan, NetApp, Inc.; Dennis Hahn, NetApp, Inc.;

the 26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2018)



A model-based approach to streamlining distributed training for asynchronous SGD

Sung-Han Lin, NetApp; Marco Paolieri, University of Southern California; Cheng-Fu Chou, National Taiwan University; Leana Golubchik, University of Southern California

the 26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2018)



Efficient Search for Free Blocks in the WAFL File System

Ram Kesavan, Matthew Curtis-Maury, and Mrinal Bhattacharjee (NetApp)

The WAFL write allocator is responsible for assigning blocks on persistent storage to data in a way that maximizes both write throughput to the storage media and subsequent read performance of data. The ability to quickly and efficiently guide the write allocator toward desirable regions of available free space is critical to achieving that goal. This ability is influenced by several factors, such as any underlying RAID geometry, media-specific attributes such as erase-block size of solid state drives or zone size of shingled magnetic hard drives, and free space fragmentation. This paper presents and evaluates the techniques used by the WAFL write allocator to efficiently find regions of free space.



A Secure Cloud with Minimal Provider Trust

Amin Mosayyebzadeh and Gerardo Ravago, Boston University; Apoorve Mohan, Northeastern University; Ali Raza and Sahil Tikale, Boston University; Nabil Schear, MIT Lincoln Laboratory; Trammell Hudson, Two Sigma; Jason Hennessey, Boston University and NetApp; Naved Ansari, Boston University; Kyle Hogan, MIT; Charles Munson, MIT Lincoln Laboratory; Larry Rudolph, Two Sigma; Gene Cooperman and Peter Desnoyers, Northeastern University; Orran Krieger, Boston University

Bolted is a new architecture for a bare metal cloud with the goal of providing security-sensitive customers of a cloud the same level of security and control that they can obtain in their own private data centers. It allows tenants to elastically allocate secure resources within a cloud while being protected from other previous, current, and future tenants of the cloud. The provisioning of a new server to a tenant isolates a bare metal server, only allowing it to communicate with other tenant’s servers once its critical firmware and software have been attested to the tenant. Tenants, rather than the provider, control the tradeoffs between security, price, and performance. A prototype demonstrates scalable end-to-end security with small overhead compared to a less secure alternative.



Empirical Evaluation and Enhancement of Enterprise Storage System Request Scheduling

Deng Zhou, San Diego State University; Vania Fang, NetApp; Tao Xie, Wen Pan, San Diego State University; Ram Kesavan, Tony Lin, and Naresh Patel, NetApp

ACM Transactions on Storage (TOS) Volume 14 Issue 2, May 201 Article No. 14



PASTE: A Network Programming Interface for Non-Volatile Main Memory

Michio Honda, NEC Laboratories Europe; Giuseppe Lettieri, Università di Pisa; Lars Eggert and Douglas Santry, NetApp

15th USENIX Symposium on Networked Systems Design and Implementation APRIL 9–11, 2018 RENTON, WA, USA



Optimizing Energy-Performance Trade-Offs in Solar-Powered Edge Devices

Peter G. Harrison, Imperial College London; Naresh M. Patel, NetApp

Proceedings of the 2018 ACM/SPEC International Conference on Performance Engineering April 09 – 13, 2018



Clay Codes: Moulding MDS Codes to Yield an MSR Code

Myna Vajha, Vinayak Ramkumar, Bhagyashree Puranik, Ganesh Kini, Elita Lobo, Birenjith Sasidharan, and P. Vijay Kumar, Indian Institute of Science, Bangalore; Alexandar Barg and Min Ye, University of Maryland; Srinivasan Narayanamurthy, Syed Hussain, and Siddhartha Nandi, NetApp ATG, Bangalore



Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems

Haryadi S. Gunawi and Riza O. Suminto, University of Chicago; Russell Sears and Casey Golliher, Pure Storage; Swaminathan Sundararaman, Parallel Machines; Xing Lin and Tim Emami, NetApp; Weiguang Sheng and Nematollah Bidokhti, Huawei; Caitie McCaffrey, Twitter; Gary Grider and Parks M. Fields, Los Alamos National Laboratory; Kevin Harms and Robert B. Ross, Argonne National Laboratory; Andree Jacobson, New Mexico Consortium; Robert Ricci and Kirk Webb, University of Utah; Peter Alvaro, University of California, Santa Cruz; H. Birali Runesha, Mingzhe Hao, and Huaicheng Li, University of Chicago