Tag Archives: fast08

Parity Lost and Parity Regained

A. Krioukov, L.N. Bairavasundaram, G.R. Goodson, K. Srinivasan, R. Thelen, A.C. Arpaci-Dusseau, and R.H. Arpaci-Dusseau.

This paper uses model checking to evaluate the mechanisms that parity-based RAID systems use to protect against a single sector error or block corruption, and identifies the additional protection measures needed to handle all such errors.

RAID storage systems protect data from storage errors, such as data corruption, using a set of one or more integrity techniques, such as checksums. The exact protection offered by certain techniques or a combination of techniques is sometimes unclear. We introduce and apply a formal method of analyzing the design of data protection strategies. Specifically, we use model checking to evaluate whether common protection techniques used in parity-based RAID systems are sufficient in light of the increasingly complex failure modes of modern disk drives. We evaluate the approaches taken by a number of real systems under single-error conditions, and find flaws in every scheme. In particular, we identify a parity pollution problem that spreads corrupt data (the result of a single error) across multiple disks, thus leading to data loss or corruption. We further identify which protection measures must be used to avoid such problems. Finally, we show how to combine real-world failure data with the results from the model checker to estimate the actual likelihood of data loss of different protection strategies.
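
The parity pollution problem is easy to see in a toy single-stripe model. The sketch below (Python; not the paper's model-checking formulation, and the block contents and stripe layout are made up) shows how a silently corrupted data block contaminates the parity during a full-stripe parity recalculation, after which the original data can no longer be reconstructed:

    # Toy XOR-parity stripe: a single silent corruption plus a parity
    # recalculation becomes unrecoverable data loss.

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def parity(blocks):
        p = blocks[0]
        for b in blocks[1:]:
            p = xor(p, b)
        return p

    data = [b"AAAA", b"BBBB", b"CCCC"]      # three data disks
    p = parity(data)                        # parity disk, consistent so far

    on_disk = [b"XXXX", data[1], data[2]]   # disk 0 silently corrupts its block

    # A write to block 1 triggers a full-stripe parity recalculation that
    # reads the corrupt block 0, so the parity absorbs the corruption.
    on_disk[1] = b"DDDD"
    p = parity(on_disk)

    # If disk 0 later fails, reconstruction from parity returns the corrupt
    # value rather than the original data: the single error has spread.
    reconstructed = xor(xor(p, on_disk[1]), on_disk[2])
    print(reconstructed == b"AAAA")         # False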

In Proceedings of the USENIX Conference on File and Storage Technologies 2008 (FAST ’08)

Resources

  • A copy of the paper is attached to this posting.

parity-fast08.pdf

Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage

Mark W. Storer, Kevin M. Greenan, Ethan L. Miller, and Kaladhar Voruganti.

Pergamum is a distributed network of intelligent, disk-based storage appliances that stores data reliably and energy-efficiently.

As the world moves to digital storage for archival purposes, there is an increasing demand for reliable, low-power, cost-effective, easy-to-maintain storage that can still provide adequate performance for information retrieval and auditing purposes. Unfortunately, no current archival system adequately fulfills all of these requirements. Tape-based archival systems suffer from poor random access performance, which prevents the use of inter-media redundancy techniques and auditing, and requires the preservation of legacy hardware. Many disk-based systems are ill-suited for long-term storage because their high energy demands and management requirements make them cost-ineffective for archival purposes.

Our solution, Pergamum, is a distributed network of intelligent, disk-based, storage appliances that stores data reliably and energy-efficiently. While existing MAID systems keep disks idle to save energy, Pergamum adds NVRAM at each node to store data signatures, metadata, and other small items, allowing deferred writes, metadata requests and inter-disk data verification to be performed while the disk is powered off. Pergamum uses both intra-disk and inter-disk redundancy to guard against data loss, relying on hash tree-like structures of algebraic signatures to efficiently verify the correctness of stored data. If failures occur, Pergamum uses staggered rebuild to reduce peak energy usage while rebuilding large redundancy stripes. We show that our approach is comparable in both startup and ongoing costs to other archival technologies and provides very high reliability. An evaluation of our implementation of Pergamum shows that it provides adequate performance.
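
Pergamum's central trick is that inter-disk verification can proceed from NVRAM-cached signatures while the disks stay spun down. In the sketch below (Python; a toy GF(2)-linear signature stands in for the algebraic signatures over GF(2^w) that Pergamum actually uses, and all names are illustrative), a redundancy group's parity is checked against its data blocks using only the cached signatures:

    from functools import reduce

    def signature(block: bytes) -> int:
        # toy 1-byte signature: XOR of all bytes (linear over GF(2))
        return reduce(lambda a, b: a ^ b, block, 0)

    def xor_blocks(blocks):
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    # Data and parity blocks live on spun-down disks; only their signatures
    # sit in NVRAM, cached at write time.
    data_blocks = [b"archival segment 0", b"archival segment 1"]
    parity_block = xor_blocks(data_blocks)

    nvram = {
        "disk0": signature(data_blocks[0]),
        "disk1": signature(data_blocks[1]),
        "parity": signature(parity_block),
    }

    # Because the signature is linear, sig(parity) must equal the XOR of the
    # data signatures, so group consistency can be checked without powering
    # on any disk; a mismatch would schedule a scrub of the spun-up group.
    consistent = nvram["parity"] == (nvram["disk0"] ^ nvram["disk1"])
    print("redundancy group consistent:", consistent)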

In Proceedings of the USENIX Conference on File and Storage Technologies 2008 (FAST ’08)

Resources

  • A copy of the paper is attached to this posting.

storer2008pergamum.pdf

Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics

Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky.

This paper analyzes the failure characteristics of storage subsystems, using storage logs from 39,000 commercial storage systems.

Building reliable storage systems becomes increasingly challenging as the complexity of modern storage systems continues to grow. Understanding storage failure characteristics is crucially important for designing and building a reliable storage system. While several recent studies have been conducted on understanding storage failures, almost all of them focus on the failure characteristics of one component – disks – and do not study other storage component failures.

This paper analyzes the failure characteristics of storage subsystems. More specifically, we analyzed the storage logs collected from about 39,000 storage systems commercially deployed at various customer sites. The data set covers a period of 44 months and includes about 1,800,000 disks hosted in about 155,000 storage shelf enclosures. Our study reveals many interesting findings, providing useful guidelines for designing reliable storage systems. Some of our major findings include: (1) In addition to disk failures, which contribute to 20-55% of storage subsystem failures, other components such as physical interconnects and protocol stacks also account for significant percentages of storage subsystem failures. (2) Each individual storage subsystem failure type, and storage subsystem failure as a whole, exhibits strong self-correlations. In addition, these failures exhibit “bursty” patterns. (3) Storage subsystems configured with redundant interconnects experience 30-40% lower failure rates than those with a single interconnect. (4) Spanning the disks of a RAID group across multiple shelves provides a more resilient storage subsystem than confining them to a single shelf.
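
As an illustration of what a “bursty” failure pattern means in practice (this is not the paper's statistical methodology, and the timestamps below are invented), one simple check is whether per-window failure counts are over-dispersed relative to a Poisson baseline:

    from collections import Counter

    failure_times_hours = [3, 5, 6, 6, 7, 200, 450, 452, 453, 900]  # hypothetical log
    WINDOW = 24  # group events into one-day windows

    counts = Counter(t // WINDOW for t in failure_times_hours)
    n_windows = max(counts) + 1
    per_window = [counts.get(w, 0) for w in range(n_windows)]

    mean = sum(per_window) / n_windows
    var = sum((c - mean) ** 2 for c in per_window) / n_windows

    # For independent arrivals at a constant rate, variance and mean are
    # roughly equal; variance well above the mean suggests failures cluster
    # in time rather than arriving independently.
    print(f"mean={mean:.3f} var={var:.3f} bursty={var > 2 * mean}")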

In Proceedings of the USENIX Conference on File and Storage Technologies 2008 (FAST ’08)

Resources

  • A copy of the paper is attached to this posting.

dominant-fast08.pdf

SWEEPER: An Efficient Disaster Recovery Point Identification Mechanism

Akshat Verma, Kaladhar Voruganti, Ramani Routray, and Rohit Jain.

This paper presents a technique for automatically identifying recovery points: it uses system events and user-specified RTO/RPO requirements to determine which backup copy to restore from.

Data corruption is one of the key problems at the top of the radar screen of most CIOs. Continuous Data Protection (CDP) technologies help enterprises deal with data corruption by maintaining multiple versions of data and facilitating recovery by allowing an administrator to restore to an earlier clean version of the data. The aim of the recovery process after data corruption is to quickly traverse the backup copies (old versions) and retrieve a clean copy of the data. Currently, data recovery is an ad hoc, time-consuming, and frustrating process built on sequential brute-force approaches, where recovery time is proportional to the number of backup copies examined and the time to check a backup copy for data corruption.

In this paper, we present the design and implementation of the SWEEPER architecture and backup copy selection algorithms that specifically tackle the problem of quickly and systematically identifying a good recovery point. We monitor various system events and generate checkpoint records that help in quickly identifying a clean backup copy. The SWEEPER methodology dynamically determines the selection algorithm based on user-specified recovery time and recovery point objectives, and thus allows system administrators to perform trade-offs between recovery time and data currentness. We have implemented our solution as part of a popular Storage Resource Manager product and evaluated SWEEPER under many diverse settings. Our study clearly establishes the effectiveness of SWEEPER as a robust strategy to significantly reduce recovery time.
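
To make the trade-off concrete, the sketch below (Python; the event scoring, names, and thresholds are hypothetical, not SWEEPER's actual algorithms) ranks backup copies using checkpoint records distilled from system events, and narrows the search to pre-incident copies when the recovery-time budget only permits a few integrity checks:

    # (timestamp, copy_id) for the available backup copies, oldest first
    backups = [(10, "b1"), (20, "b2"), (30, "b3"), (40, "b4")]

    # checkpoint records: (timestamp, suspicion weight) derived from system events
    events = [(25, 0.9), (27, 0.4), (35, 0.2)]

    def suspicion(copy_ts):
        # total weight of suspicious events seen before this copy was taken
        return sum(w for t, w in events if t <= copy_ts)

    def select_order(budget_checks):
        if budget_checks >= len(backups):
            # ample RTO budget: try the least-suspicious, most recent copies first
            return sorted(backups, key=lambda b: (suspicion(b[0]), -b[0]))
        # tight RTO budget: only consider copies taken before the first heavy
        # event, trading data currentness for a shorter recovery time
        cutoff = min(t for t, w in events if w > 0.5)
        return sorted((b for b in backups if b[0] < cutoff), key=lambda b: -b[0])

    def is_clean(copy_id):
        # stand-in for mounting the copy and running an integrity check
        return copy_id in {"b1", "b2"}

    for ts, copy_id in select_order(budget_checks=2):
        if is_clean(copy_id):
            print("recover from", copy_id)   # newest copy verified clean
            break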

In Proceedings of the USENIX Conference on File and Storage Technologies 2008 (FAST ’08)

Resources

  • A copy of the paper is attached to this posting.

Sweeper-fast08.pdf

AWOL: An Adaptive Write Optimizations Layer

Alexandros Batsakis, Randal Burns, Arkady Kanevsky, James Lentini, and Thomas Talpey.

This paper presents I/O performance improvements obtained by adaptively allocating memory between write buffering and read caching and by opportunistically writing dirty pages to storage.

Operating system memory managers fail to consider the population of read versus write pages in the buffer pool or outstanding I/O requests when writing dirty pages to disk or network file systems. This leads to bursty I/O patterns, which stall processes reading data and reduce the efficiency of storage. We address these limitations by adaptively allocating memory between write buffering and read caching and by writing dirty pages to disk opportunistically before the operating system submits them for write-back. We implement and evaluate our methods within the Linux® system and show performance gains of more than 30% for mixed read/write workloads.
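
A rough feel for the two ideas (the heuristics and constants below are invented, not AWOL's actual policy) is given by this sketch of a controller that shifts a fixed buffer pool between read caching and write buffering and flushes dirty pages early whenever the device is idle:

    class BufferPoolController:
        def __init__(self, total_pages, write_share=0.5):
            self.total = total_pages
            self.write_share = write_share      # fraction reserved for dirty pages

        def adapt(self, read_miss_rate, dirty_ratio):
            # shift memory toward whichever side is under more pressure
            if dirty_ratio > self.write_share and read_miss_rate < 0.2:
                self.write_share = min(0.9, self.write_share + 0.05)
            elif read_miss_rate > 0.2 and dirty_ratio < self.write_share:
                self.write_share = max(0.1, self.write_share - 0.05)
            return int(self.total * self.write_share)  # pages allowed to stay dirty

        def opportunistic_flush(self, dirty_pages, device_idle):
            # write back early while the device is idle, instead of waiting for
            # the write-back threshold and causing a burst that stalls readers
            if device_idle and dirty_pages:
                batch = dirty_pages[:64]        # small batches keep latency low
                del dirty_pages[:64]
                return batch                    # pages handed to the device
            return []

    ctrl = BufferPoolController(total_pages=4096)
    print(ctrl.adapt(read_miss_rate=0.05, dirty_ratio=0.7))    # write buffer grows
    print(len(ctrl.opportunistic_flush(list(range(200)), device_idle=True)))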

In Proceedings of the USENIX Conference on File and Storage Technologies 2008 (FAST ’08)

Resources

  • A copy of the paper is attached to this posting.

awol_fast08.pdf

An Analysis of Data Corruption in the Storage Stack

L.N. Bairavasundaram, G.R. Goodson, B. Schroeder, A.C. Arpaci-Dusseau, and R.H. Arpaci-Dusseau.

This paper analyzes real-world data on the prevalence of data corruption due to storage stack components such as disk drives, and analyzes its characteristics such as dependence on disk-drive type, spatial and temporal locality, and correlation with workload.

An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this paper, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur most frequently.

We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances including: i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise class disk drives, ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.
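
For readers new to the terminology, the sketch below (Python; the block format is hypothetical, not the studied systems' on-disk layout) shows how a per-block checksum plus a stored logical address lets the read path distinguish the first two corruption classes in the study:

    import zlib

    def write_block(addr, data):
        # store the block's checksum and its own logical address alongside it
        return {"addr": addr, "csum": zlib.crc32(data), "data": data}

    def read_block(addr, stored):
        if zlib.crc32(stored["data"]) != stored["csum"]:
            return "checksum mismatch"        # bits changed since the write
        if stored["addr"] != addr:
            return "identity discrepancy"     # intact data in the wrong place
        return "ok"

    blk = write_block(addr=42, data=b"payload")
    print(read_block(42, blk))                               # ok

    blk["data"] = b"pAyload"                                 # silent bit flip
    print(read_block(42, blk))                               # checksum mismatch

    misdirected = write_block(addr=7, data=b"payload")       # misdirected write
    print(read_block(42, misdirected))                       # identity discrepancy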

Best Student Paper Award

In Proceedings of the USENIX Conference on File and Storage Technologies 2008 (FAST ’08)

Resources

  • A copy of the paper is attached to this posting.

corruption-fast08.pdf