Lars Eggert, NetApp
The Network and Distributed System Security Symposium (NDSS)
San Diego, CA, USA
This paper is the first to evaluate the feasibility of deploying QUIC, a new UDP-based transport protocol currently undergoing IETF standardization, directly on resource-constrained IoT devices. It quantifies the storage, compute, memory, and energy requirements of the Quant QUIC stack on two different IoT platforms, and finds that a minimal standards-compliant QUIC client currently requires approximately 58 to 63 KB of flash and around 4 KB of stack, and can retrieve 5 KB of data in 4.2 to 5.1 s over 0-RTT or 1-RTT connections, using less than 16 KB of heap memory (plus packet buffers), less than 4 KB of stack memory, and less than 1.09 J of energy per transaction.
Ram Kesavan, Matthew Curtis-Maury, Vinay Devadas, and Kesari Mishra; NetApp
ACM Transactions on Storage (TOS)
Article No.: 25
As a file system ages, it can experience multiple forms of fragmentation. Fragmentation of the free space in the file system can lower write performance and subsequent read performance. Client operations as well as internal operations, such as deduplication, can fragment the layout of an individual file, which also impacts file read performance. File systems that allow sub-block granular addressing can gather intra-block fragmentation, which leads to wasted free space. Similarly, wasted space can also occur when a file system writes a collection of blocks out to object storage as a single large object, because the constituent blocks can become free at different times. The impact of fragmentation also depends on the underlying storage media. This article studies each form of fragmentation in the NetApp® WAFL®file system, and explains how the file system leverages a storage virtualization layer for defragmentation techniques that physically relocate blocks efficiently, including those in read-only snapshots. The article analyzes the effectiveness of these techniques at reducing fragmentation and improving overall performance across various storage media.
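Free-space fragmentation of the kind described above can be quantified with a simple metric: the number of maximal free extents in an allocation bitmap (a toy illustration; WAFL's actual allocation metadata and metrics are far richer than this sketch):

```python
def free_extents(bitmap):
    """Count maximal runs of free blocks in an allocation bitmap
    (1 = used, 0 = free). For a fixed number of free blocks, more
    extents means more fragmented free space, which forces smaller,
    scattered writes and hurts subsequent sequential reads."""
    extents, in_run = 0, False
    for used in bitmap:
        if not used and not in_run:
            extents += 1        # a new free run starts here
        in_run = not used       # track whether we are inside a free run
    return extents
```

For example, `free_extents([1, 0, 0, 1, 0, 1, 1, 0])` reports three free extents for three free runs, whereas a defragmented layout with the same free count would report one.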
Daniel Gruss and Erik Kraft, Graz University of Technology; Trishita Tiwari, Boston University; Michael Schwarz, Graz University of Technology; Ari Trachtenberg, Boston University; Jason Hennessey, NetApp; Alex Ionescu, CrowdStrike; and Anders Fogh, Intel
Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security
November 11 – 15, 2019
London, United Kingdom
We present a new side-channel attack that targets one of the most fundamental software caches in modern computer systems: the operating system page cache. The page cache is a pure software cache that contains all disk-backed pages, including program binaries, shared libraries, and other files. On Windows, dynamic pages are also part of this cache and can be attacked as well, e.g., data, heap, and stacks. Our side channel permits unprivileged monitoring of accesses to these pages of other processes, with a spatial resolution of 4 KB and a temporal resolution of 2 µs on Linux (≤6.7 measurements per second), and 466 ns on Windows 10 (≤223 measurements per second). We systematically analyze the side channel by demonstrating different hardware-agnostic local attacks, including a sandbox-bypassing high-speed covert channel, an ASLR break on Windows 10, and various information leakages that can be used for targeted extortion, spam campaigns, and more directly for UI redressing attacks. We also show that, as with hardware cache attacks, we can attack the generation of temporary passwords on vulnerable cryptographic implementations. Our hardware-agnostic attacks can be mitigated with our proposed security patches, but the basic side channel remains exploitable via timing measurements. We demonstrate this with a remote covert channel exfiltrating information from a colluding process through innocuous server requests.
- A copy of the paper can be found at: link.
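On Linux, the page-residency probe at the heart of this attack is the `mincore` system call, which reports whether each page of a mapping is resident in the page cache. Below is a minimal sketch via `ctypes` (assuming Linux and glibc; a real attack would additionally evict pages and probe in a loop to observe the victim's accesses):

```python
import ctypes
import mmap
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                         ctypes.POINTER(ctypes.c_ubyte)]

def resident_pages(path):
    """Return one bool per page of the file: is it resident in the
    OS page cache? Requires only read access to the file."""
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    # Map read-only; mapping by itself does not fault pages into the cache.
    addr = libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
    if addr in (None, ctypes.c_void_p(-1).value):
        raise OSError(ctypes.get_errno(), "mmap failed")
    npages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    vec = (ctypes.c_ubyte * npages)()
    if libc.mincore(ctypes.c_void_p(addr), ctypes.c_size_t(size), vec) != 0:
        raise OSError(ctypes.get_errno(), "mincore failed")
    libc.munmap(ctypes.c_void_p(addr), ctypes.c_size_t(size))
    os.close(fd)
    # Bit 0 of each vector byte indicates residency.
    return [bool(v & 1) for v in vec]
```

Pages of a shared library or binary become resident when any process touches them, which is exactly the cross-process signal the paper exploits.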
Hoda Maleki (University of Connecticut); Kyle Hogan (MIT); Reza Rahaeimehr (University of Connecticut); Ran Canetti, Mayank Varia, Jason Hennessey (Boston University and NetApp); Marten van Dijk (University of Connecticut); Haibin Zhang (UMBC)
IEEE Secure Development Conference
September 25 – 27, 2019
We initiate an effort to provide a rigorous, holistic, and modular security analysis of OpenStack. OpenStack is the prevalent open-source, non-proprietary package for managing cloud services and data centers. It is highly complex and consists of multiple inter-related components which are developed by separate, loosely coordinated groups. All of these properties make the security analysis of OpenStack both a worthy mission and a challenging one. We base our modeling and security analysis in the universally composable (UC) security framework. This allows specifying and proving security in a modular way, a crucial feature when analyzing systems of such magnitude. Our analysis has the following key features:
1) It is user-centric: It stresses the security guarantees given to users of the system in terms of privacy, correctness, and timeliness of the services.
2) It considers the security of OpenStack even when some of the components are compromised. This departs from the traditional design approach of OpenStack, which assumes that all services are fully trusted.
3) It is modular: It formulates security properties for individual components and uses them to prove security properties of the overall system.
Specifically, this work concentrates on the high-level structure of OpenStack, leaving the further formalization and more detailed analysis of specific OpenStack services to future work. In particular, we formulate ideal functionalities that correspond to some of the core OpenStack modules, and then prove security of the overall OpenStack protocol given the ideal components.
As demonstrated within, the main challenge in the high-level design is to provide adequately fine-grained scoping of permissions to access dynamically changing system resources. We demonstrate security issues with the current mechanisms in case of failure of some components, propose alternative mechanisms, and rigorously prove the adequacy of the new mechanisms within our modeling.
- A copy of the paper can be found at: link.
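The fine-grained scoping problem can be illustrated with a tiny sketch: a token that names the exact service, resource, and actions it covers cannot be replayed elsewhere by a compromised component, unlike a coarse bearer token. All names and fields below are hypothetical illustrations, not OpenStack's actual token format:

```python
def authorize(token, service, resource, action):
    """Hypothetical fine-grained authorization check: the token is bound
    to one service, one resource, and an explicit action set, so a
    component that captures it cannot use it against other services or
    resources (contrast with a bearer token valid fleet-wide)."""
    return (token["service"] == service and
            token["resource"] == resource and
            action in token["actions"])

# A token scoped to stopping/starting one VM via the compute service.
token = {"service": "compute", "resource": "vm-42",
         "actions": {"start", "stop"}}
```

Under this scoping, a compromised storage service holding `token` gains nothing: the check fails for any service, resource, or action outside the token's scope.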
Rukma Talwadker and Deepti Aggarwal, NetApp Inc.
2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
September 16-20, 2019
PORTO DE GALINHAS, BRAZIL
Software system configuration problems are fairly prevalent and continue to impair the reliability of the underlying system software. Configurations also play an important role in establishing the quality of the software. With every configuration “knob” we delegate a responsibility to the user, and we may also make the software vulnerable to failure, poor performance, and other operational issues. Efforts to facilitate a healthy configuration can be summarized by the following steps: 1) gain knowledge about what defines a configuration; 2) operationalize a mechanism to mine popular or recommended configuration defaults; and 3) leverage insights for improving software quality or for faster troubleshooting and fixing in the case of a software failure. Using PopCon, a tool that we built, we target all three aspects in a closed-loop fashion, focusing on storage system software from NetApp: the ONTAP data management software. We learn popular configurations from the deployed community, evaluate active configurations, and deliver actionable information through this tool. Our findings have been encouraging. We can report that about 99% of our ONTAP software user community gravitates toward popular configuration values. Although about 20% of the configuration parameters initially need a custom or user input, we have found that over a period of a few months, systems adopt these popular values. Also, there is a high correlation between the number of outstanding deviations from the popular values and the number of active support cases on these systems. Further, we have learned that for about 40% of the systems with support cases, deviations disappear at about the time of case closure. Finally, the PopCon capabilities presented here are simple to implement and operationalize in any software system.
- A copy of the paper can be downloaded at: link.
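The first two steps, defining a configuration and mining popular defaults, can be sketched as a per-knob mode computed over a fleet (a toy model; PopCon's actual mining and deviation scoring are more sophisticated):

```python
from collections import Counter

def popular_values(fleet):
    """fleet: {system_id: {knob: value}}. Return the most common
    (popular) value observed for each knob across the fleet."""
    per_knob = {}
    for cfg in fleet.values():
        for knob, value in cfg.items():
            per_knob.setdefault(knob, Counter())[value] += 1
    return {knob: counts.most_common(1)[0][0]
            for knob, counts in per_knob.items()}

def deviations(fleet, popular):
    """Count, per system, how many knobs deviate from the popular value;
    the paper correlates this count with active support cases."""
    return {sid: sum(1 for k, v in cfg.items() if popular.get(k) != v)
            for sid, cfg in fleet.items()}
```

A system's deviation count can then be tracked over time; the abstract's observation is that it tends to drop toward zero as support cases close.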
Ram Kesavan, Google; Jason Hennessey, Richard Jernigan, Peter Macko, Keith A. Smith, Daniel Tennant, and Bharadwaj V. R., NetApp
2019 USENIX Annual Technical Conference
The rapid growth of customer applications and datasets has led to demand for storage that can scale with the needs of modern workloads. We have developed FlexGroup volumes to meet this need. FlexGroups combine local WAFL® file systems in a distributed storage cluster to provide a single namespace that seamlessly scales across the aggregate resources of the cluster (CPU, storage, etc.) while preserving the features and robustness of the WAFL file system.
In this paper we present the FlexGroup design, which includes a new remote access layer that supports distributed transactions and the novel heuristics used to balance load and capacity across a storage cluster. We evaluate FlexGroup performance and efficacy through lab tests and field data from over 1,000 customer FlexGroups.
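A load-and-capacity balancing heuristic of the kind evaluated in the paper can be sketched as a scored placement choice over member volumes. The member fields and the weighting below are hypothetical illustrations, not the actual FlexGroup heuristics:

```python
def place_file(members, alpha=0.5):
    """Pick a member volume for a new file by scoring each member on
    remaining capacity and current idleness, then taking the best.
    members: list of {"name", "free_bytes", "total_bytes", "load"},
    with load normalized to [0, 1]. alpha weights capacity vs. load."""
    def score(m):
        free_frac = m["free_bytes"] / m["total_bytes"]
        idle_frac = 1.0 - m["load"]
        return alpha * free_frac + (1 - alpha) * idle_frac
    return max(members, key=score)["name"]
```

With equal weighting, a nearly full, busy member loses to an empty, idle one; tuning `alpha` trades capacity balance against load balance.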
P. G. Harrison, Imperial College London; N. M. Patel, NetApp Inc; J. F. Pérez, Universidad del Rosario; Z. Qiu, Imperial College London
ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS)
Volume 4 Issue 1, March 2019
Article No. 5
Matrix analytic methods are developed to compute the probability distribution of response times (i.e., data access times) in distributed storage systems protected by erasure coding, which is implemented by sharding a data object into N fragments, only K < N of which are required to reconstruct the object. This leads to a partial-fork-join model with a choice of canceling policies for the redundant N−K tasks. The accuracy of the analytical model is supported by tests against simulation in a broad range of setups. At increasing workload intensities, numerical results show the extent to which increasing the redundancy level reduces the mean response time of storage reads and significantly flattens the tail of their distribution; this is demonstrated at medium-high quantiles, up to the 99th. The quantitative reduction in response time achieved by two policies for canceling redundant tasks, cancel-at-finish and cancel-at-start, is also shown; the latter limits the additional load introduced whilst losing the benefit of selectivity amongst fragment service times.
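The partial-fork-join model under cancel-at-finish can be illustrated with a small Monte Carlo sketch. Exponential fragment service times are an assumption for illustration only, not the article's workload model:

```python
import random

def read_latency(n, k):
    """One object read under cancel-at-finish: all N fragment reads are
    issued, and the read completes when the K fastest have finished
    (the K-th order statistic of the fragment service times)."""
    times = sorted(random.expovariate(1.0) for _ in range(n))
    return times[k - 1]

def quantile(samples, q):
    """Empirical q-quantile of a list of latencies."""
    s = sorted(samples)
    return s[int(q * (len(s) - 1))]

# Compare no redundancy (N = K = 4) against two redundant fragments (N = 6).
random.seed(7)
plain = [read_latency(4, 4) for _ in range(2000)]
redundant = [read_latency(6, 4) for _ in range(2000)]
```

In this simulation the redundant configuration has both a lower mean and a much flatter 99th-percentile tail, matching the article's qualitative finding that redundancy shortens the tail of the response-time distribution.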
Ram Kesavan, Matthew Curtis-Maury, Vinay Devadas, and Kesari Mishra, NetApp
17th USENIX Conference on File and Storage Technologies (FAST)
FEBRUARY 25–28, 2019
BOSTON, MA, USA
As a file system ages, it can experience multiple forms of fragmentation. Fragmentation of the free space in the file system can lower write performance and subsequent read performance. Client operations as well as internal operations, such as deduplication, can fragment the layout of an individual file, which also impacts file read performance. File systems that allow sub-block granular addressing can gather intra-block fragmentation, which leads to wasted free space. This paper describes how the NetApp® WAFL® file system leverages a storage virtualization layer for defragmentation techniques that physically relocate blocks efficiently, including those in read-only snapshots. The paper analyzes the effectiveness of these techniques at reducing fragmentation and improving overall performance across various storage media.
Zhichao Cao and Hao Wen, University of Minnesota; Xiongzi Ge, NetApp; Jingwei Ma, Nankai University; Jim Diehl and David H. C. Du, University of Minnesota
ACM Transactions on Storage (TOS)
Volume 15 Issue 1, March 2019
With the rapid increase in the amount of data produced and the development of new types of storage devices, storage tiering continues to be a popular way to achieve a good tradeoff between performance and cost-effectiveness. In a basic two-tier storage system, a storage tier with higher performance and typically higher cost (the fast tier) is used to store frequently-accessed (active) data while a large amount of less-active data are stored in the lower-performance and low-cost tier (the slow tier). Data are migrated between these two tiers according to their activity. In this article, we propose a Tier-aware Data Deduplication-based File System, called TDDFS, which can operate efficiently on top of a two-tier storage environment.
Specifically, to achieve better performance, nearly all file operations are performed in the fast tier. To achieve higher cost-effectiveness, files are migrated from the fast tier to the slow tier if they are no longer active, and this migration is done with data deduplication. The distinctiveness of our design is that it maintains the non-redundant (unique) chunks produced by data deduplication in both tiers if possible. When a file is reloaded (called a reloaded file) from the slow tier to the fast tier, if some data chunks of the file already exist in the fast tier, then the data migration of these chunks from the slow tier can be avoided. Our evaluation shows that TDDFS achieves close to the best overall performance among various file-tiering designs for two-tier storage systems.
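The reload optimization described above, skipping chunks that already exist in the fast tier, can be sketched as follows. Fixed-size chunking and the class API are simplifications for illustration; TDDFS's actual design is richer:

```python
import hashlib

class TwoTierDedupStore:
    """Sketch of TDDFS-style migration: files are chunked and
    deduplicated when demoted to the slow tier; on reload, chunks
    already present in the fast tier are not copied back."""

    CHUNK = 4096

    def __init__(self):
        self.fast = {}   # chunk hash -> chunk bytes (fast tier)
        self.slow = {}   # chunk hash -> chunk bytes (slow tier)

    def _chunks(self, data):
        return [data[i:i + self.CHUNK] for i in range(0, len(data), self.CHUNK)]

    def demote(self, data):
        """Move a file's unique chunks to the slow tier with dedup;
        return the file's recipe (ordered list of chunk hashes)."""
        recipe = []
        for c in self._chunks(data):
            h = hashlib.sha256(c).hexdigest()
            self.slow.setdefault(h, c)   # store each unique chunk once
            recipe.append(h)
        return recipe

    def reload(self, recipe):
        """Bring a file back to the fast tier; return how many chunk
        copies were avoided because the chunk was already hot."""
        avoided = 0
        for h in recipe:
            if h in self.fast:
                avoided += 1             # chunk already in fast tier
            else:
                self.fast[h] = self.slow[h]
        return avoided
```

Reloading a file that shares chunks with already-hot files migrates only the missing chunks, which is the cross-tier dedup benefit the abstract describes.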
Rukma Talwadker and Cijo George, NetApp
2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)
10-13 Dec. 2018
As news of cloud repatriations gets real, many cloud technologists attribute them to enterprises' poor understanding of their applications and usage patterns. Our solution, Yodea, is a tool and accompanying methodology for analyzing workload patterns in light of cloud suitability. We bring forward compute patterns that can benefit from cloud economics through on-demand compute scaling. Yodea further ranks workloads by their cloud suitability on the basis of these metrics. An after-the-fact analysis of storage workloads for a customer install base places 38% of the “already in cloud” volumes in Yodea's top 100 ranked list.
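A cloud-suitability ranking of this kind can be sketched with a single burstiness metric, the peak-to-mean ratio of a utilization trace. This metric and the function names are illustrative assumptions, not Yodea's actual scoring:

```python
def burstiness(samples):
    """Peak-to-mean ratio of a utilization trace. A steady workload
    scores 1.0; a bursty one scores much higher, and bursty workloads
    benefit most from on-demand compute scaling."""
    return max(samples) / (sum(samples) / len(samples))

def rank_workloads(traces):
    """Rank workload names by descending burstiness.
    traces: {workload_name: [utilization samples]}."""
    return sorted(traces, key=lambda name: burstiness(traces[name]),
                  reverse=True)
```

A real suitability score would combine several such metrics (e.g., idle periods and growth trends), but even this one separates scale-friendly workloads from steady ones.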