Tag Archives: fast

Warming Up Storage-Level Caches with Bonfire

Y. Zhang, G. Soundararajan, M. W. Storer, L. N. Bairavasundaram, S. Subbiah, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau

Bonfire is a mechanism for accelerating cache warmup for large caches so that application service levels can be met significantly sooner than would be possible with on-demand warmup.

Large caches in storage servers have become essential for meeting the service levels that applications require. Today, these caches frequently need to be warmed with data, owing to scenarios such as dynamic creation of cache space and server restarts that clear cache contents. When large storage caches are warmed at the rate of application I/O, warmup can take hours or even days, affecting both application performance and server load over a long period of time.

We have created Bonfire, a mechanism for accelerating cache warmup. Bonfire monitors storage server workloads, logs important warmup data, and efficiently preloads storage-level caches with warmup data. Bonfire is based on our detailed analysis of block-level data-center traces that provides insights into heuristics for warmup as well as the potential for efficient mechanisms. We show through both simulation and trace replay that Bonfire reduces both warmup time and backend server load significantly, compared to a cache that is warmed up on demand.
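As a rough sketch of the idea, the monitor-then-preload cycle can be illustrated as follows. All names and data structures here are hypothetical; Bonfire's actual warmup metadata, on-disk log format, and preload heuristics are described in the paper.

```python
from collections import OrderedDict

class WarmupLogger:
    """Record recently accessed block numbers during normal operation
    (a stand-in for Bonfire's workload-monitoring log)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block -> None, in LRU order

    def record(self, block):
        self.blocks.pop(block, None)
        self.blocks[block] = None            # move to most-recent position
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the oldest entry

def preload(cache, backend, logger, batch=64):
    """Warm the cache by bulk-reading logged blocks from the backend.
    Reads proceed in sorted block order, in large groups, so the backend
    sees (near-)sequential I/O instead of scattered demand misses."""
    blocks = sorted(logger.blocks)
    for i in range(0, len(blocks), batch):
        for b in blocks[i:i + batch]:
            cache[b] = backend[b]

# usage: log accesses, simulate a cache restart, then preload
backend = {b: f"data-{b}" for b in range(1000)}
log = WarmupLogger(capacity=100)
for b in [5, 17, 5, 900, 42]:
    log.record(b)
cache = {}                                   # cold cache after a restart
preload(cache, backend, log)
print(sorted(cache))                         # [5, 17, 42, 900]
```

The point of the sketch is the shape of the mechanism: warmup data is chosen from observed accesses rather than fetched one miss at a time.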

In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST’13)

San Jose, February 2013.

Resources

fast13-bonfire.pdf

MixApart: Decoupled Analytics for Shared Storage Systems

Madalin Mihailescu, University of Toronto and NetApp; Gokul Soundararajan, NetApp; Cristiana Amza, University of Toronto

MixApart uses an integrated data caching and scheduling solution to allow MapReduce computations to analyze data stored on enterprise storage systems.

Abstract:

Distributed file systems built for data analytics and enterprise storage systems have very different functionality requirements. For this reason, enabling analytics on enterprise data commonly introduces a separate analytics storage silo. This generates additional costs and inefficiencies in data management, e.g., whenever data needs to be archived, copied, or migrated across silos. MixApart uses an integrated data caching and scheduling solution to allow MapReduce computations to analyze data stored on enterprise storage systems. The front-end caching layer enables the local storage performance required by data analytics. The shared storage back-end simplifies data management.
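The interplay of caching and scheduling can be sketched as below. This is an illustrative toy, not MixApart's scheduler: the function names and the least-cached-node placement heuristic are assumptions made for the example.

```python
def assign_task(task_input, cache_map, nodes):
    """Schedule a map task over shared storage: prefer a node that
    already caches the input (local-disk read); otherwise pick a node
    and fetch the input from shared storage into its cache, so later
    tasks on the same data become cache hits."""
    for node in nodes:
        if task_input in cache_map.get(node, set()):
            return node, "cache hit"
    # no node caches it: place on the node with the fewest cached items
    # (a crude load proxy) and populate its cache from shared storage
    node = min(nodes, key=lambda n: len(cache_map.get(n, set())))
    cache_map.setdefault(node, set()).add(task_input)
    return node, "fetched from shared storage"

# usage: the first access to a block ingests it; repeats are local
cache_map = {"n1": {"blk1"}}
nodes = ["n1", "n2"]
print(assign_task("blk1", cache_map, nodes))  # ('n1', 'cache hit')
print(assign_task("blk2", cache_map, nodes))  # ('n2', 'fetched from shared storage')
```

The design point this illustrates is why MixApart can avoid a bulk ingest step: data moves from shared storage into the compute-side cache on demand, driven by the schedule.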

We evaluate MixApart using a 100-core Amazon EC2 cluster with micro-benchmarks and production workload traces. Our evaluation shows that MixApart provides i) up to 28% faster performance than the traditional ingest-then-compute workflows used in enterprise IT analytics, and ii) comparable performance to an ideal Hadoop setup without data ingest, at similar cluster sizes.

In Proceedings of the USENIX Conference on File and Storage Technologies (FAST’13), February 2013.

Resources

fast13-final58.pdf

iDedup: Latency-aware, inline data deduplication for primary storage

Kiran Srinivasan, Tim Bisson, Garth Goodson, and Kaladhar Voruganti.

In this paper, we propose iDedup, an inline deduplication solution for primary workloads that minimizes extra IOs and seeks.

Deduplication technologies are increasingly being deployed to reduce cost and increase space-efficiency in corporate data centers. However, prior research has not applied deduplication techniques inline to the request path for latency-sensitive primary workloads. This is primarily due to the extra latency these techniques introduce. Inherently, deduplicating data on disk causes fragmentation that increases seeks for subsequent sequential reads of the same data, thus increasing latency. In addition, deduplicating data requires extra disk IOs to access on-disk deduplication metadata. In this paper, we propose iDedup, an inline deduplication solution for primary workloads that minimizes extra IOs and seeks.

Our algorithm is based on two key insights from real-world workloads: i) spatial locality exists in duplicated primary data; and ii) temporal locality exists in the access patterns of duplicated data. Using the first insight, we selectively deduplicate only sequences of disk blocks. This reduces fragmentation and amortizes the seeks caused by deduplication. The second insight allows us to replace the expensive, on-disk, deduplication metadata with a smaller, in-memory cache. These techniques enable us to trade off capacity savings for performance, as demonstrated in our evaluation with real-world workloads. Our evaluation shows that iDedup achieves 60-70% of the maximum deduplication with less than a 5% CPU overhead and a 2-4% latency impact.
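The two insights translate into a simple write-path rule: deduplicate only when a run of consecutive incoming blocks all hit an in-memory fingerprint cache, and write everything else out normally. The sketch below is a minimal illustration of that rule, not iDedup's implementation; the class name, threshold handling, and flat "disk" model are assumptions made for the example.

```python
import hashlib

class IDedupLike:
    def __init__(self, threshold):
        self.threshold = threshold   # minimum duplicate-run length to dedupe
        self.fp_cache = {}           # fingerprint -> disk block address (in memory)
        self.disk = []               # simulated on-disk blocks

    def write(self, blocks):
        """Return per-block disk addresses. Only runs of >= threshold
        consecutive fingerprint hits are deduplicated, so any seek the
        dedup pointers cause is amortized over a sequential run."""
        fps = [hashlib.sha256(b).hexdigest() for b in blocks]
        hits = [fp in self.fp_cache for fp in fps]
        addrs, i = [], 0
        while i < len(blocks):
            j = i
            while j < len(blocks) and hits[j]:
                j += 1               # measure the run of cache hits at i
            if j - i >= self.threshold:
                # long enough: point at the existing on-disk copies
                addrs.extend(self.fp_cache[fp] for fp in fps[i:j])
                i = j
            else:
                # short run or a miss: store the block (possibly a duplicate,
                # trading capacity for sequential layout) and cache its print
                self.disk.append(blocks[i])
                self.fp_cache[fps[i]] = len(self.disk) - 1
                addrs.append(len(self.disk) - 1)
                i += 1
        return addrs

# usage: a 2-block duplicate run is deduplicated, a lone duplicate is not
d = IDedupLike(threshold=2)
first = d.write([b"a", b"b", b"c"])    # all new: [0, 1, 2]
second = d.write([b"a", b"b", b"x"])   # "a","b" dedupe to [0, 1]; "x" is new
print(second)                          # [0, 1, 3]
```

Raising the threshold trades capacity savings for better sequential read layout, which is exactly the tunable trade-off the abstract describes.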

In Proceedings of the USENIX Conference on File and Storage Technologies 2012 (FAST ’12)

Resources

idedup-FAST12.pdf

Improving the throughput of small disk requests with proximal I/O

Jiri Schindler, Sandip Shete, and Keith Smith.

This paper introduces proximal I/O, a new technique for improving random disk I/O performance in file systems.

The key enabling technology for proximal I/O is the ability of disk drives to retire multiple I/Os, spread across dozens of tracks, in a single revolution. Compared to traditional update-in-place or write-anywhere file systems, this technique can provide a nearly seven-fold improvement in random I/O performance while maintaining (near) sequential on-disk layout. This paper quantifies proximal I/O performance and proposes a simple data layout engine that uses a flash memory-based write cache to aggregate random updates until they have sufficient density to exploit proximal I/O. The results show that with a cache of just 1% of the overall disk-based storage capacity, it is possible to service 5.3 user I/O requests per revolution for a random-update workload. On an aged file system, the layout can sustain serial read bandwidth within 3% of the best case. Despite using flash memory, the overall system cost is just one third of that of a system with the requisite number of spindles to achieve the equivalent number of random I/O operations.
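The aggregate-until-dense staging logic can be sketched as follows. This is a toy model of the idea only; the class name, the fixed region size, and the density trigger are assumptions for illustration, not the paper's layout engine.

```python
from collections import defaultdict

class ProximalStager:
    """Buffer random updates in a (simulated) flash write cache and
    destage a neighborhood of disk blocks only once it holds enough
    updates to be retired together, ideally in few disk revolutions."""
    def __init__(self, region_size, density):
        self.region_size = region_size   # disk blocks per neighborhood
        self.density = density           # updates needed before destaging
        self.cache = defaultdict(dict)   # region -> {block: data}

    def update(self, block, data):
        region = block // self.region_size
        self.cache[region][block] = data
        if len(self.cache[region]) >= self.density:
            return self.destage(region)  # dense enough: flush the region
        return None                      # keep buffering in flash

    def destage(self, region):
        # all buffered updates in the region go to disk together,
        # sorted by block number so the head sweeps the tracks once
        return sorted(self.cache.pop(region).items())

# usage: the third update to region 0 triggers a grouped destage
s = ProximalStager(region_size=100, density=3)
s.update(10, b"d1")
s.update(55, b"d2")
print(s.update(70, b"d3"))   # [(10, b'd1'), (55, b'd2'), (70, b'd3')]
```

The flash cache thus converts a stream of scattered writes into occasional dense bursts that the disk can retire efficiently.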

In Proceedings of the USENIX Conference on File and Storage Technologies 2011 (FAST ’11)

Resources

  • A copy of the paper is attached to this posting.

ProximalIO.pdf

Discovery of Application Workloads from Network File Traces

Neeraja J. Yadwadkar, Chiranjib Bhattacharyya, K. Gopinath, Thirumale Niranjan, and Sai Susarla.

In this paper, we describe a trace analysis methodology based on Profile Hidden Markov Models.

An understanding of application I/O access patterns is useful in several situations. First, gaining insight into what applications are doing with their data at a semantic level helps in designing efficient storage systems. Second, it helps create benchmarks that mimic realistic application behavior closely. Third, it enables autonomic systems as the information obtained can be used to adapt the system in a closed loop.

All these use cases require the ability to extract the application-level semantics of I/O operations. Methods such as modifying application code to associate I/O operations with semantic tags are intrusive. It is well known that network file system traces are an important source of information that can be obtained non-intrusively and analyzed either online or offline. These traces are a sequence of primitive file system operations and their parameters. Simple counting, statistical analysis, or deterministic search techniques are inadequate for discovering application-level semantics in the general case, because of the inherent variation and noise in realistic traces.

In this paper, we describe a trace analysis methodology based on Profile Hidden Markov Models. We show that the methodology has powerful discriminatory capabilities that enable it to recognize applications based on the patterns in the traces, and to mark out regions in a long trace that encapsulate sets of primitive operations that represent higher-level application actions. It is robust enough that it can work around discrepancies between training and target traces such as in length and interleaving with other operations. We demonstrate the feasibility of recognizing patterns based on a small sampling of the trace, enabling faster trace analysis. Preliminary experiments show that the method is capable of learning accurate profile models on live traces in an online setting. We present a detailed evaluation of this methodology in a UNIX environment using NFS traces of selected commonly used applications such as compilations as well as on industrial strength benchmarks such as TPCC and Postmark, and discuss its capabilities and limitations in the context of the use cases mentioned above.
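The core of this style of recognition is scoring an operation sequence against per-application models and picking the best-scoring one. The sketch below uses a plain HMM forward algorithm with made-up profiles and probabilities; a real Profile HMM adds match/insert/delete states per position, which this toy omits.

```python
import math

def forward_loglik(seq, start, trans, emit):
    """Log-likelihood of an observation sequence under an HMM, computed
    with the forward algorithm: alpha[s] accumulates the probability of
    the prefix ending in state s."""
    states = list(start)
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for obs in seq[1:]:
        alpha = {s: emit[s][obs] * sum(alpha[p] * trans[p][s] for p in states)
                 for s in states}
    return math.log(sum(alpha.values()))

# Two hypothetical application profiles over NFS operation types
# (all numbers invented for the example): "compile" is read-heavy,
# "archive" is write-heavy.
uniform = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}
compile_profile = ({"a": 0.5, "b": 0.5}, uniform,
                   {"a": {"read": 0.7, "getattr": 0.2, "write": 0.1},
                    "b": {"read": 0.7, "getattr": 0.2, "write": 0.1}})
archive_profile = ({"a": 0.5, "b": 0.5}, uniform,
                   {"a": {"write": 0.7, "getattr": 0.2, "read": 0.1},
                    "b": {"write": 0.7, "getattr": 0.2, "read": 0.1}})

trace = ["read", "read", "getattr", "read", "read"]
scores = {"compile": forward_loglik(trace, *compile_profile),
          "archive": forward_loglik(trace, *archive_profile)}
print(max(scores, key=scores.get))   # compile
```

Even this toy shows why noise tolerance matters: likelihoods degrade gracefully when a trace is interleaved with a few off-profile operations, whereas exact pattern matching fails outright.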

In Proceedings of the USENIX Conference on File and Storage Technologies 2010 (FAST ’10)

Resources

  • A copy of the paper is attached to this posting.

workloads-yadwadkar.pdf

Tracking Back References in a Write-Anywhere File System

Peter Macko, Margo Seltzer, and Keith A. Smith.

In this paper, we present Backlog, an efficient implementation of explicit back references, to address the complications that advanced file system features introduce when data is moved on disk.

Many file systems reorganize data on disk, for example to defragment storage, shrink volumes, or migrate data between different classes of storage. Advanced file system features such as snapshots, writable clones, and deduplication make these tasks complicated, as moving a single block may require finding and updating dozens, or even hundreds, of pointers to it.

We present Backlog, an efficient implementation of explicit back references, to address this problem. Back references are file system meta-data that map physical block numbers to the data objects that use them. We show that by using LSM-Trees and exploiting the write-anywhere behavior of modern file systems such as NetApp® WAFL® or btrfs, we can maintain back reference meta-data with minimal overhead (one extra disk I/O per 10² block operations) and provide excellent query performance for the common case of queries covering ranges of physically adjacent blocks.
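The query the paper optimizes for, "who references physical blocks lo..hi?", can be illustrated with a toy back-reference table. The class and its in-memory dictionary are stand-ins invented for this sketch; Backlog itself buffers updates in memory and merges them into on-disk LSM-Tree runs.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

class BackRefTable:
    """Map physical block numbers to the objects that reference them,
    so data movers can find every pointer to a block without scanning
    all file, snapshot, and clone metadata."""
    def __init__(self):
        self.refs = defaultdict(set)     # block -> {(object_id, offset)}

    def add(self, block, obj, offset):
        self.refs[block].add((obj, offset))

    def remove(self, block, obj, offset):
        self.refs[block].discard((obj, offset))

    def query_range(self, lo, hi):
        """All live references to physical blocks lo..hi (inclusive);
        keyed by block, which favors physically adjacent ranges."""
        blocks = sorted(self.refs)
        out = {}
        for b in blocks[bisect_left(blocks, lo):bisect_right(blocks, hi)]:
            if self.refs[b]:             # skip fully dereferenced blocks
                out[b] = sorted(self.refs[b])
        return out

# usage: a block shared by a file and a snapshot has two back references
t = BackRefTable()
t.add(100, "inode7", 0)
t.add(101, "inode7", 1)
t.add(101, "snap3", 1)
t.remove(100, "inode7", 0)
print(t.query_range(100, 101))   # {101: [('inode7', 1), ('snap3', 1)]}
```

Keeping the table sorted by physical block is what makes range queries over adjacent blocks cheap, which matches the common defragmentation and migration use case.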

In Proceedings of the USENIX Conference on File and Storage Technologies (FAST ’10)

Resources

  • A copy of the paper is attached to this posting.

tracking-fast10.pdf

Storage and Network Deduplication Technologies

Michael Condict

This tutorial provided a detailed look at the multitude of ways that deduplication can be used to improve the efficiency of storage and networking devices.

A Tutorial Presented at the USENIX Conference on File and Storage Technologies 2010 (FAST ’10)

Abstract

Economic and environmental concerns are currently motivating a push across the computing industry to do more with less: less energy and less money. Deduplication of data is one of the most effective tools to accomplish this. Removing redundant copies of stored data reduces hardware requirements, lowering capital expenses and using less power. Avoiding sending the same data repeatedly across a network increases the effective bandwidth of the link, reducing networking expenses.

This tutorial provided a detailed look at the multitude of ways deduplication can be used to improve the efficiency of storage and networking devices. It consisted of two parts.

The first part introduced the basic concepts of deduplication and compared it to the related technique of file compression. A taxonomy of basic deduplication techniques was covered, including the unit of deduplication (file, block, or variable-length segment), the deduplication scope (file system, storage system, or cluster), in-line vs. background deduplication, trusted fingerprints, and several other design choices. The relative merits of each were analyzed.
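One entry in that taxonomy, fixed-size block deduplication with content-hash fingerprints, can be sketched in a few lines. The function names and the "store plus recipe" framing are choices made for this example, not taken from the tutorial.

```python
import hashlib

def dedup_blocks(data, block_size):
    """Fixed-size block deduplication: split the input into blocks,
    fingerprint each with a content hash, and store each distinct
    block once. Returns (store, recipe), where the recipe is the
    ordered fingerprint list needed to rebuild the input."""
    store, recipe = {}, []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)      # keep only the first copy
        recipe.append(fp)
    return store, recipe

def restore(store, recipe):
    """Rebuild the original data from the store and the recipe."""
    return b"".join(store[fp] for fp in recipe)

# usage: three identical 4-byte blocks collapse to one stored copy
data = b"abcdabcdabcdxyz!"
store, recipe = dedup_blocks(data, 4)
print(len(store), len(recipe))           # 2 4
```

Variable-length segmentation, one of the alternatives the tutorial covered, replaces the fixed split with content-defined boundaries so that an insertion early in the data does not shift every later fingerprint.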

The second part discussed advanced techniques, such as the use of fingerprints other than a content hash to uniquely identify data, techniques for deduplicating across a storage cluster, and the use of deduplication within a client-side cache.


Understanding Customer Problem Troubleshooting from Storage System Logs

Weihang Jiang, Chongfeng Hu, Shankar Pasupathy, Arkady Kanevsky, Zhenmin Li, and Yuanyuan Zhou.

This paper makes two major contributions to better understand customer problem troubleshooting.

Customer problem troubleshooting has been a critically important issue for both customers and system providers. This paper makes two major contributions to better understand this topic.

First, it provides one of the first characteristic studies of customer problem troubleshooting using a large set (636,108) of real world customer cases reported from 100,000 commercially deployed storage systems in the last two years. We study the characteristics of customer problem troubleshooting from various dimensions as well as the correlations among them. Our results show that while some failures are either benign or resolved automatically, many others can take hours or days of manual diagnosis to fix. For modern storage systems, hardware failures and misconfigurations dominate customer cases, but software failures take longer to resolve. Interestingly, a relatively significant percentage of cases arise because customers lack sufficient knowledge about the system. We observe that customer problems with attached system logs are invariably resolved much faster than those without logs.

Second, we evaluate the potential of using storage system logs to resolve these problems. Our analysis shows that a failure message alone is a poor indicator of root cause, and that combining failure messages with multiple log events can improve low-level root cause prediction by a factor of three. We then discuss the challenges in log analysis and possible solutions.
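The gain from combining a failure message with surrounding log events can be illustrated with a toy predictor. The class, the event names, and the frozen-set signature scheme are all invented for this sketch; the paper's actual prediction method is described in the text.

```python
from collections import Counter, defaultdict

class RootCausePredictor:
    """Predict a root cause from a failure message plus the set of log
    events around it, by remembering which (failure, events) signatures
    were seen in labeled historical cases."""
    def __init__(self):
        self.counts = defaultdict(Counter)   # signature -> Counter(cause)

    def train(self, failure, events, cause):
        self.counts[(failure, frozenset(events))][cause] += 1

    def predict(self, failure, events):
        votes = self.counts.get((failure, frozenset(events)))
        if not votes:
            # fall back to the failure message alone -- the weak
            # baseline the paper shows is a poor root-cause indicator
            votes = Counter()
            for (f, _), c in self.counts.items():
                if f == failure:
                    votes += c
        return votes.most_common(1)[0][0] if votes else None

# usage: the same failure message, disambiguated by context events
p = RootCausePredictor()
p.train("disk_error", {"scrub_start", "fan_alarm"}, "hardware")
p.train("disk_error", {"upgrade_begin"}, "software")
print(p.predict("disk_error", {"upgrade_begin"}))   # software
```

The same failure message maps to different root causes depending on the co-occurring events, which is exactly why the message alone predicts poorly.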

In Proceedings of the USENIX Conference on File and Storage Technologies 2009 (FAST ’09)

Resources

  • A copy of the paper is attached to this posting.

troubleshooting-fast09.pdf

Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems

Andrew W. Leung, Minglong Shao, Timothy Bisson, Shankar Pasupathy, and Ethan L. Miller.

Spyglass is a search system that provides fast and complex searches over large-scale file metadata by exploiting metadata search properties.

The scale of today’s storage systems has made it increasingly difficult to find and manage files. To address this, we have developed Spyglass, a file metadata search system that is specially designed for large-scale storage systems. Using an optimized design, guided by an analysis of real-world metadata traces and a user study, Spyglass allows fast, complex searches over file metadata to help users and administrators better understand and manage their files.

Spyglass achieves fast, scalable performance through the use of several novel metadata search techniques that exploit metadata search properties. Flexible index control is provided by an index partitioning mechanism that leverages namespace locality. Signature files are used to significantly reduce a query’s search space, improving performance and scalability. Snapshot-based metadata collection allows incremental crawling of only modified files. A novel index versioning mechanism provides both fast index updates and “back-in-time” search of metadata. An evaluation of our Spyglass prototype using our real-world, large-scale metadata traces shows search performance that is 1-4 orders of magnitude faster than existing solutions. The Spyglass index can quickly be updated and typically requires less than 0.1% of disk space. Additionally, metadata collection is up to 10× faster than existing approaches.
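The signature-file idea, letting a query skip whole index partitions that cannot contain a match, works like a Bloom filter per partition. The sketch below is a generic illustration of that technique with invented names and parameters, not Spyglass's signature format.

```python
import hashlib

class PartitionSignature:
    """Bloom-filter-style signature for one namespace partition: a bit
    array summarizing the metadata values present, so a query can skip
    the partition when may_contain() is False (no false negatives,
    occasional false positives)."""
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = 0                    # bit array packed into an int

    def _positions(self, value):
        # derive nhashes bit positions from a salted content hash
        for k in range(self.nhashes):
            h = hashlib.sha256(f"{k}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, value):
        for p in self._positions(value):
            self.bits |= 1 << p

    def may_contain(self, value):
        return all(self.bits >> p & 1 for p in self._positions(value))

# usage: index the owners seen in a partition; queries for an owner
# never seen there can usually skip the partition entirely
sig = PartitionSignature()
sig.add("alice")
print(sig.may_contain("alice"))   # True
```

Combined with namespace-locality partitioning, most queries touch only the few partitions whose signatures match, which is where the search-space reduction comes from.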

In Proceedings of the USENIX Conference on File and Storage Technologies 2009 (FAST ’09)

Resources

  • A copy of the paper is attached to this posting.

spyglass-leung-fast2009.pdf

CA-NFS: A Congestion-Aware Network File System

Alexandros Batsakis, Randal Burns, Arkady Kanevsky, James Lentini, and Thomas Talpey.

This paper presents a holistic framework for adaptively scheduling asynchronous requests in distributed file systems.

We develop a holistic framework for adaptively scheduling asynchronous requests in distributed file systems. The system is holistic in that it manages all resources, including network bandwidth, server I/O, server CPU, and client and server memory utilization. It accelerates, defers, or cancels asynchronous requests in order to improve application-perceived performance directly. We employ congestion pricing via online auctions to coordinate the use of system resources by the file system clients so that they can detect shortages and adapt their resource usage. We implement our modifications in the Congestion-Aware Network File System (CA-NFS), an extension to the ubiquitous Network File System (NFS). Our experimental results show that CA-NFS achieves a 20% improvement in execution times compared with NFS for a variety of workloads.
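The pricing mechanism can be sketched as below. The exponential pricing curve and the send-or-defer comparison are illustrative choices, one common shape for congestion pricing, and may differ from the paper's actual functions.

```python
import math

def resource_price(utilization, k=8.0):
    """Congestion price for one resource: near zero while the resource
    is idle, rising steeply as utilization approaches 1. Normalized so
    the price runs from 0.0 (idle) to 1.0 (saturated)."""
    u = min(max(utilization, 0.0), 1.0)
    return (math.exp(k * u) - 1.0) / (math.exp(k) - 1.0)

def schedule_async_write(server_utilizations, client_memory_pressure):
    """A client defers an asynchronous write when the server's most
    congested resource is pricier than the client's cost of buffering
    the write locally (its own memory pressure)."""
    server_bid = max(resource_price(u) for u in server_utilizations.values())
    client_bid = resource_price(client_memory_pressure)
    return "send now" if server_bid <= client_bid else "defer"

# idle server + loaded client -> flush; loaded server + idle client -> defer
print(schedule_async_write({"net": 0.1, "disk": 0.2}, 0.9))   # send now
print(schedule_async_write({"net": 0.95, "disk": 0.5}, 0.2))  # defer
```

Because every resource is priced on the same scale, clients can trade off dissimilar resources, server disk time against client memory, without any resource-specific protocol.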


Best Paper Award

In Proceedings of the USENIX Conference on File and Storage Technologies 2009 (FAST ’09)

Resources

ca-nfs.pdf