Professor Jason Flinn and his students at the University of Michigan have built a prototype file system for archival data that selectively replaces file data with logs that reproduce that data. This substantially reduces the bytes written and stored for cold file data, even compared to aggressive storage efficiency mechanisms such as delta compression and chunk-based deduplication.
NetApp Faculty Fellowships (NFF) encourage leading-edge research in storage and data management and foster relationships between academic researchers and engineers. Please see below for a current list of ATG’s NFFs.
When distributed systems fail in the field, identifying the root cause and pinpointing the faulty software component or machine can be extremely hard and time-consuming. This research aims to provide an end-to-end solution that automates the diagnosis of production failures in distributed software stacks using only their unstructured log output. The work has three parts. First, it aims to design a new postmortem diagnosis tool that automatically reconstructs the extensive domain knowledge of the programmers who wrote the code; it does this by relying on the principle (“Flow Reconstruction Principle”) that programmers log events such that one can reliably reconstruct the execution flow a posteriori. However, any postmortem debugging that relies on log output hinges on the efficacy of that logging. This is the focus of the second part, which is to measure the quality of the software’s log output. Finally, the researchers intend to use this measurement to ultimately automate software logging itself.
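The idea of reconstructing execution flow from log output can be illustrated with a minimal sketch. It is not the researchers' tool: the log line format (`<timestamp> [<node>] <message>`), the `req-<n>` identifier pattern, and the `completed` terminal marker are all assumptions made up for this example. The sketch groups log lines by request identifier, orders each group by timestamp, and reports the node that logged last for any request that never reached the terminal event.

```python
import re
from collections import defaultdict

# Hypothetical log format: "<timestamp> [<node>] <free-text message>"
LINE = re.compile(r"^(?P<ts>\d+) \[(?P<node>[^\]]+)\] (?P<msg>.*)$")
# Hypothetical request identifier embedded in the message text
REQ_ID = re.compile(r"\b(req-\d+)\b")


def reconstruct_flows(lines):
    """Group log events by request id and sort each group by timestamp."""
    flows = defaultdict(list)
    for line in lines:
        parsed = LINE.match(line)
        if parsed is None:
            continue  # line does not follow the assumed format
        rid = REQ_ID.search(parsed["msg"])
        if rid is None:
            continue  # event cannot be tied to a request
        flows[rid.group(1)].append((int(parsed["ts"]), parsed["node"], parsed["msg"]))
    for events in flows.values():
        events.sort()
    return dict(flows)


def incomplete_flows(flows, terminal="completed"):
    """Map each request whose last event lacks the terminal marker to its last node."""
    return {
        rid: events[-1][1]
        for rid, events in flows.items()
        if terminal not in events[-1][2]
    }


logs = [
    "100 [frontend] received req-1",
    "101 [frontend] received req-2",
    "105 [storage-a] writing block for req-1",
    "106 [storage-b] writing block for req-2",
    "110 [frontend] completed req-1",
]
suspects = incomplete_flows(reconstruct_flows(logs))
```

Here `suspects` points at the node where the unfinished request was last seen, which is only a starting point for diagnosis; it works only to the extent that the software logs enough identifiers to tie events together.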
The aim of this project is to make cloud data secure, available and accessible to authorized users. To protect against disk crashes, multiple copies of data might be stored. The cloud service provider (CSP) should abide by the terms of the Service Level Agreement (SLA). However, the CSP can be untrusted and might delete or modify data. To protect against this, data auditing is required. Maintaining transaction logs is not enough: the CSP should be able to prove to the data owner that the data is intact and can be retrieved correctly. Proofs of storage are thus important. The data owner might also wish to delegate the auditing task to a third party. It is therefore important that the third party can perform the audit without learning the content. This is known as privacy-preserving data auditing. Most existing techniques are not practical for the dynamic case (where the client can modify data) or for the multi-server model. In this project, we aim to design practical and provably secure privacy-preserving auditing schemes. We will use techniques from authenticated data structures, signature schemes, cryptographic accumulators and secure network coding to solve the problem.
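As a rough illustration of what an authenticated data structure provides, the sketch below uses a Merkle hash tree: the data owner keeps only the root hash, challenges the provider for a block, and checks the returned block against the root using a short path of sibling hashes. This is a textbook construction chosen for the example, not the scheme this project proposes, and the function names are hypothetical. It is also not privacy preserving, since the verifier sees the challenged block, which is exactly the gap the project description highlights.

```python
import hashlib


def leaf_hash(block: bytes) -> bytes:
    # Distinct prefixes keep leaf hashes and inner-node hashes from colliding.
    return hashlib.sha256(b"\x00" + block).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_tree(blocks):
    """Return the tree as a list of levels, leaves first, root level last."""
    level = [leaf_hash(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # odd level: pair the last node with itself
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Provider side: sibling hashes needed to recompute the root for one block."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # this node was paired with itself
        path.append((level[sibling], index % 2 == 0))  # (hash, node is left child)
        index //= 2
    return path


def verify(root, block, path):
    """Auditor side: recompute the root from the block and the sibling path."""
    node = leaf_hash(block)
    for sibling, node_is_left in path:
        node = node_hash(node, sibling) if node_is_left else node_hash(sibling, node)
    return node == root


blocks = [b"block-0", b"block-1", b"block-2"]
levels = build_tree(blocks)
root = levels[-1][0]  # the only value the data owner must retain
ok = verify(root, blocks[2], prove(levels, 2))
tampered = verify(root, b"modified", prove(levels, 2))
```

With this data, `ok` is true and `tampered` is false. Supporting updates, multiple servers and content-hiding audits requires additional machinery beyond this sketch.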
A large population of customers can be affected by a sudden slowdown or abnormal behavior of an enterprise-wide application or product. Analysts and developers of large-scale systems spend considerable time dealing with functional and performance bugs. Timely identification of a significant change in application behavior can provide an early, informative warning and help prevent a negative impact on the service. In this project, we aim to develop a framework that predicts sudden system anomalies in advance by analyzing the log data generated by the system's sub-modules, and to develop an automated warning system that keeps the system from slipping toward failure. Note that this was the first year of the project, in which NetApp provided a start-up grant primarily to explore the problem and produce initial results. Even so, the one-year effort produced several achievements and insights.
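One very simple way to flag a significant change in log behavior is a rolling z-score over per-interval event counts. The sketch below is an assumption made for illustration, not the framework this project developed: the function name, the window size, the threshold, and the idea of counting error-level log lines per interval are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def early_warnings(counts, window=5, threshold=3.0):
    """Return indexes of intervals whose count is far above the recent baseline.

    counts: per-interval counts of, for example, error-level log lines.
    """
    history = deque(maxlen=window)  # the most recent `window` counts
    alerts = []
    for i, count in enumerate(counts):
        if len(history) == window:
            baseline = mean(history)
            spread = pstdev(history)
            if spread == 0:
                # Flat baseline: any increase counts as a deviation.
                score = float("inf") if count > baseline else 0.0
            else:
                score = (count - baseline) / spread
            if score > threshold:
                alerts.append(i)
        history.append(count)
    return alerts


error_counts = [2, 3, 2, 3, 2, 3, 2, 30, 3]
alerts = early_warnings(error_counts)
```

A detector this simple only reacts once the spike is visible; predicting an anomaly ahead of time, as the project intends, would need richer features from the logs than raw counts.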
The architecture of a data system is defined by the chosen data layouts, data access algorithms, data models and data flow methods. Different applications and hardware require different architecture designs for optimal performance (speed, energy). Yet, so far all data systems are static, operating within a single and narrowly defined design space (NoSQL, NewSQL, SQL) and hardware profile. Historically, a new data system architecture requires at least a decade to reach a stable design. However, hardware and applications evolve rapidly and continuously, leaving data-driven applications locked into sub-optimal systems or systems that simply do not have the desired functionality or the right data model. Our goal in this project is to make it extremely easy to design and test a new data system in a matter of a few hours or days as opposed to several years. Given a data set, a query workload and a hardware profile, a self-designing data system evolves such that its architecture matches the properties of the environment. The whole system design is generated automatically and adaptively by creating numerous individual system components that can be combined to synthesize alternative full system designs. A self-designing system continuously performs automatic synthesis of components to evaluate new designs at the lowest levels of database architectures, such as the data layout, access methods and execution strategies. This research creates opportunities to bootstrap new applications, to automatically create systems tailored for specific scenarios, to minimize system footprint and to automatically adapt to new hardware.
In computer systems, caches offer a simple and effective way to boost the performance of any downstream layer, either storage or memory. The literature is rife with cache replacement algorithms that have optimized cache utilization irrespective of the workload. Unfortunately, conventional cache replacement algorithms have been designed for datapath caches, wherein each cache miss leads to a cache insertion operation and, in most cases, a cache eviction operation as well. These cache updating operations are expensive, often unnecessary, and in many cases counter-productive to cache performance. Non-datapath caches, on the other hand, are not required to perform a cache update on every cache miss. Thus, one can apply opportunistic cache updates, whereby the decision whether to perform a cache update is made case by case. A host-side flash cache is an example of a non-datapath cache. Host-side flash caches are attractive because they can reduce the demands placed on network storage, speed up I/O performance, and provide I/O latency and throughput control.
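To make the distinction concrete, here is a minimal sketch of a non-datapath cache that updates opportunistically. The admission rule used (insert a block only on its second recent miss, tracked in a small "ghost" list of keys) is an assumption chosen for illustration, not the policy this research proposes; the class and parameter names are hypothetical.

```python
from collections import OrderedDict


class NonDatapathCache:
    """LRU cache that skips the cache update on the first miss of a key."""

    def __init__(self, capacity, ghost_capacity):
        self.capacity = capacity
        self.ghost_capacity = ghost_capacity
        self.data = OrderedDict()   # cached key -> value, least recent first
        self.ghost = OrderedDict()  # keys that missed recently (no values kept)
        self.insertions = 0         # number of cache updates performed

    def get(self, key, fetch):
        if key in self.data:
            self.data.move_to_end(key)  # hit: refresh recency
            return self.data[key]
        value = fetch(key)  # a miss is always served by the backing store
        if key in self.ghost:
            # Second recent miss: the key looks reusable, so pay for an update.
            del self.ghost[key]
            self._insert(key, value)
        else:
            # First miss: remember the key only and skip the cache update.
            self.ghost[key] = None
            if len(self.ghost) > self.ghost_capacity:
                self.ghost.popitem(last=False)
        return value

    def _insert(self, key, value):
        if len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
        self.data[key] = value
        self.insertions += 1


def backing_store(key):
    return key * 10  # stand-in for a read from network storage


cache = NonDatapathCache(capacity=2, ghost_capacity=4)
cache.get(1, backing_store)      # first miss: not inserted
cache.get(1, backing_store)      # second miss: inserted
for scanned in (2, 3, 4):        # one-time scan: never inserted
    cache.get(scanned, backing_store)
```

After this sequence the cache has performed a single insertion and still holds key 1, whereas a datapath LRU cache of the same size would have performed four insertions and evicted key 1 during the scan. Fewer insertions matter for flash in particular, where writes are costly.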
Reinventing RDMA with Remote Direct Function Access (RDFA)
The advent of fast, non-volatile main memory technologies (e.g., PCM, RRAM, or STTM) leads to new trade-offs in designing storage systems and the network protocols used to access them. Conventional application-level network protocols (e.g., the set of RPCs that constitute the NFS interface) are too slow to fully exploit the low latency these new memories offer. RDMA is a possible alternative, but its limited functionality (just read and write) means that many operations require multiple RDMA requests, each of which requires a network round trip, forfeiting any efficiency gains.
Network and Storage Stack Specialization for Performance
Over the last two years, Ilias Marinos, a doctoral student at the University of Cambridge, in collaboration with Mark Handley (UCL) and Robert Watson, has been pursuing a research project on clean-slate network-stack design and network-stack specialization, testing two hypotheses: (1) that current network-stack designs, dating from the 1980s, fail to exploit contemporary architectural features for performance and hence suffer significant penalties, and that re-architecting fundamental aspects of stack design with micro-architectural awareness will dramatically improve performance; and (2) that the generality in current network-stack designs substantially hampers application performance, whereas ‘specialized’ stacks that integrate applications with the network stack itself can offer dramatic performance improvements. The team has prototyped a clean-slate, userspace TCP stack, published at SIGCOMM 2014, illustrating these effects on high-performance network traffic for in-DRAM workloads, with substantial performance benefits (e.g., 6x throughput with a tiny fraction of CPU utilization) for HTTP and DNS workloads. The team proposes to extend this work to include a clean-slate network-storage stack based on a new userspace ‘diskmap’ facility for PCI-attached flash, to cater to workloads with footprints greater than DRAM size, e.g., high-volume HTTP/HTTPS content-delivery networks (CDNs) and RPC-based filesystem services. Where sensible and appropriate, the team proposes to open source (and upstream) artifacts developed during this work under a BSD-style license, as well as pursue collaboration opportunities with NetApp.
Characterization of Operating Systems for System Intensive Workloads
Due to continued technology scaling as predicted by Moore’s law, the number of cores per chip is doubling roughly every two years. As a result, many applications that once required large clusters or mainframes are being run on server processors that have a unified view of memory and storage. Some of these applications, such as micro-blogging, algorithmic trading and map-reduce based analytics, as well as traditional applications such as web servers and file servers, are being re-architected to run on large multicore servers. With the advent of faster memory and storage technologies, the adoption of servers for such applications has accelerated over the past few years. Along with improvements in hardware, we need improvements in software as well, particularly the OS and the hypervisor (virtual machine monitor). The aim of this project is to see how prepared our current operating systems are to handle the deluge of novel workloads that are expected to run on them, and additionally which operating system features are the most beneficial for a given class of workloads.
Scale-Checkable Cloud Systems
Distributed cloud software infrastructures (i.e., cloud systems) have emerged as a dominant backbone for many modern applications. Cloud systems such as scale-out storage systems, computing frameworks, synchronization services, and cluster management services have become “the operating system” of cloud computing, and thus users expect high reliability from these systems.