Tag Archives: usenix

Measurement and Analysis of Large-Scale Network File System Workloads

Andrew W. Leung, Shankar Pasupathy, Garth Goodson, and Ethan L. Miller.

We analyze the workloads of two large-scale enterprise network file servers, measured from CIFS traffic over three months.

In this paper, we present the analysis of two large-scale network file system workloads. We measured CIFS traffic for two enterprise-class file servers deployed in the NetApp data center for a three-month period. One file server was used by the marketing, sales, and finance departments and the other by the engineering department. Together, these systems represent over 22 TB of storage used by over 1500 employees, making this the first large-scale study of the CIFS protocol.

We analyzed how our network file system workloads compared to those of previous file system trace studies and took an in-depth look at access, usage, and sharing patterns. We found that our workloads were quite different from those previously studied; for example, our analysis found increased read-write file access patterns, decreased read-write ratios, more random file access, and longer file lifetimes. In addition, we found a number of interesting properties regarding file sharing, file re-use, and the access patterns of file types and users, showing that modern file system workloads have changed in the past 5–10 years. This change in workload characteristics has implications for the future design of network file systems, which we describe in the paper.
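
As a hedged illustration of the kind of analysis involved, the sketch below computes a read/write byte ratio and a sequentiality estimate from generic per-access trace records. The record layout and field names are invented for this example; they are not the authors' actual trace schema.

```python
# Hypothetical trace records: (op, path, offset, length) tuples.
from collections import defaultdict

def analyze(records):
    bytes_by_op = defaultdict(int)
    last_end = {}            # path -> end offset of the previous access
    sequential = total = 0
    for op, path, offset, length in records:
        bytes_by_op[op] += length
        if last_end.get(path) == offset:
            sequential += 1  # this access begins where the last one ended
        total += 1
        last_end[path] = offset + length
    rw_ratio = bytes_by_op["read"] / max(bytes_by_op["write"], 1)
    return rw_ratio, sequential / max(total, 1)

trace = [("read", "/a", 0, 4096), ("read", "/a", 4096, 4096),
         ("write", "/b", 0, 8192), ("read", "/a", 65536, 4096)]
print(analyze(trace))   # -> (1.5, 0.25): read/write ratio, sequential fraction
```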

In Proceedings of the USENIX Annual Technical Conference 2008 (USENIX ’08)

Resources

  • A copy of the paper is attached to this posting.

largescale-usenix08.pdf

FlexVol: Flexible, Efficient File Volume Virtualization in WAFL

J.K. Edwards, D. Ellard, C. Everhart, R. Fair, E. Hamilton, A. Kahn, A. Kanevsky, J. Lentini, A. Prakash, K.A. Smith, and E. Zayas.

We present the basic architecture of FlexVol volumes, including performance optimizations, and also describe the new features enabled by this architecture.

Virtualization is a well-known method of abstracting physical resources and of separating the manipulation and use of logical resources from their underlying implementation. We have used this technique to virtualize file volumes in the WAFL® file system, adding a level of indirection between client-visible volumes and the underlying physical storage. The resulting virtual file volumes, or FlexVol® volumes, are managed independently of lower storage layers. Multiple volumes can be dynamically created, deleted, resized, and reconfigured within the same physical storage container.
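
The indirection can be pictured with a toy model: each volume translates its own virtual block numbers to physical blocks in a shared pool, so creating, growing, or deleting a volume never disturbs its neighbors. All names below are hypothetical; this sketches the concept, not WAFL's implementation.

```python
class PhysicalPool:
    """A shared container of physical blocks (think: an aggregate)."""
    def __init__(self, nblocks):
        self.free = set(range(nblocks))

    def allocate(self):
        if not self.free:
            raise RuntimeError("pool exhausted")
        return self.free.pop()

class VirtualVolume:
    """Client-visible volume: an indirection map over the shared pool."""
    def __init__(self, pool):
        self.pool = pool
        self.map = {}                     # virtual block -> physical block

    def write(self, vbn):
        if vbn not in self.map:           # physical space is bound lazily
            self.map[vbn] = self.pool.allocate()
        return self.map[vbn]

    def delete(self):
        self.pool.free.update(self.map.values())  # space returns to the pool
        self.map.clear()

pool = PhysicalPool(1024)
vol_a, vol_b = VirtualVolume(pool), VirtualVolume(pool)
vol_a.write(0); vol_b.write(0)   # same virtual block, distinct physical blocks
vol_b.delete()                   # freed space is immediately reusable by vol_a
```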

We also exploit this new virtualization layer to provide several powerful new capabilities. We have enhanced SnapMirror®, a tool for replicating volumes between storage systems, to remap storage allocation during transfer, thus optimizing disk layout for the destination storage system. FlexClone® volumes provide writable Snapshot® copies, using a FlexVol volume backed by a Snapshot copy of a different volume. FlexVol volumes also support thin provisioning; a FlexVol volume can have a logical size that exceeds the available physical storage. FlexClone volumes and thin provisioning are a powerful combination, as they allow the creation of lightweight copies of live data sets while consuming minimal storage resources.
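
In the same toy model, a writable clone and thin provisioning both fall out of lazy, copy-on-write allocation: the clone borrows every block from its backing snapshot until a block is first overwritten, so physical usage tracks divergence rather than logical size. Again, the structures are illustrative only.

```python
import itertools

_alloc = itertools.count(1000).__next__   # stand-in physical block allocator

class CloneVolume:
    """A writable volume backed by a read-only snapshot's block map."""
    def __init__(self, snapshot_map):
        self.shared = dict(snapshot_map)  # blocks borrowed from the snapshot
        self.own = {}                     # blocks the clone has diverged on

    def read(self, vbn):
        return self.own.get(vbn, self.shared.get(vbn))

    def write(self, vbn):
        if vbn not in self.own:           # copy-on-write on first overwrite
            self.own[vbn] = _alloc()      # the snapshot's block stays intact
        return self.own[vbn]

snapshot = {0: 17, 1: 42, 2: 99}          # virtual block -> physical block
clone = CloneVolume(snapshot)
clone.write(1)                            # only block 1 consumes new storage
assert clone.read(0) == 17 and clone.read(1) != 42
```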

We present the basic architecture of FlexVol volumes, including performance optimizations that decrease the overhead of our new virtualization layer. We also describe the new features enabled by this architecture. Our evaluation of FlexVol performance shows that it incurs only a minor performance degradation compared with traditional, nonvirtualized WAFL volumes. On the industry-standard SPEC SFS benchmark, FlexVol volumes exhibit less than 4% performance overhead, while providing all the benefits of virtualization.

In Proceedings of the USENIX Annual Technical Conference 2008 (USENIX ’08)

Resources

  • A copy of the paper is attached to this posting.

flexvol_usenix08.pdf

POTSHARDS: Secure Long-Term Storage Without Encryption

Mark W. Storer, Kevin M. Greenan, Ethan L. Miller, and Kaladhar Voruganti.

POTSHARDS is an archival storage system that provides long-term recoverable security for data with very long lifetimes by using provably secure secret splitting.

Users are storing ever-increasing amounts of information digitally, driven by many factors including government regulations and the public’s desire to digitally record their personal histories. Unfortunately, many of the security mechanisms that modern systems rely upon, such as encryption, are poorly suited for storing data for indefinitely long periods of time—it is very difficult to manage keys and update cryptosystems to provide secrecy through encryption over periods of decades. Worse, an adversary who can compromise an archive need only wait for cryptanalysis techniques to catch up to the encryption algorithm used at the time of the compromise in order to obtain “secure” data.

To address these concerns, we have developed POTSHARDS, an archival storage system that provides long-term security for data with very long lifetimes without using encryption. Secrecy is achieved by using provably secure secret splitting and spreading the resulting shares across separately managed archives. Providing availability and data recovery in such a system can be difficult; thus, we use a new technique, approximate pointers, in conjunction with secure distributed RAID techniques to provide availability and reliability across independent archives. To validate our design, we developed a prototype POTSHARDS implementation, which has demonstrated "normal" storage and retrieval of user data using indexes, the recovery of user data using only the pieces a user has stored across the archives, and the reconstruction of an entire failed archive.
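
For a feel of what provably secure secret splitting means, the sketch below shows the simplest variant, an n-of-n XOR split: any fewer than n shares are statistically independent of the secret, so no amount of future cryptanalysis helps an attacker holding a partial set. POTSHARDS' actual scheme, and its approximate pointers, are more elaborate than this.

```python
import secrets

def split(secret: bytes, n: int) -> list:
    """n-of-n split: n-1 random shares, plus the XOR of all of them
    with the secret. Every share is required for reconstruction."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = secret
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    return shares + [last]

def combine(shares) -> bytes:
    out = shares[0]
    for s in shares[1:]:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out

pieces = split(b"long-lived archival record", 3)
assert combine(pieces) == b"long-lived archival record"
# Any two of the three pieces reveal nothing about the plaintext.
```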

In Proceedings of the USENIX Annual Technical Conference 2007 (USENIX ’07)

Resources

  • A copy of the paper is attached to this posting.

storer2007potshards.pdf

SnapMirror®: File System Based Asynchronous Mirroring for Disaster Recovery

Hugo Patterson, Stephen Manley, Mike Federwisch, Dave Hitz, Steve Kleiman, and Shane Owara.

We present SnapMirror, an asynchronous mirroring technology that leverages file system snapshots to ensure the consistency of the remote mirror and optimize data transfer.

Computerized data has become critical to the survival of an enterprise. Companies must have a strategy for recovering their data should a disaster such as a fire destroy the primary data center. Current mechanisms offer data managers a stark choice: rely on affordable tape but risk the loss of a full day of data and face many hours or even days to recover, or have the benefits of a fully synchronized on-line remote mirror, but pay steep costs in both write latency and network bandwidth to maintain the mirror.

In this paper, we argue that asynchronous mirroring, in which batches of updates are periodically sent to the remote mirror, can let data managers find a balance between these extremes. First, by eliminating the write latency issue, asynchrony greatly reduces the performance cost of a remote mirror. Second, by storing up batches of writes, asynchronous mirroring can avoid sending deleted or overwritten data and thereby reduce network bandwidth requirements. Data managers can tune the update frequency to trade network bandwidth against the potential loss of more data.

We present SnapMirror, an asynchronous mirroring technology that leverages file system snapshots to ensure the consistency of the remote mirror and optimize data transfer. We use traces of production filers to show that even updating an asynchronous mirror every 15 minutes can reduce data transferred by 30% to 80%. We find that exploiting file system knowledge of deletions is critical to achieving any reduction for no-overwrite file systems such as WAFL and LFS. Experiments on a running system show that using file system metadata can reduce the time to identify changed blocks from minutes to seconds compared with purely logical approaches. Finally, we show that using SnapMirror to update every 30 minutes increases the response time of a heavily loaded system by only 22%.
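
The transfer optimization at the heart of this approach can be sketched as a diff between two snapshots' block maps: only blocks that are new or rewritten since the base snapshot need to cross the wire, and deletions travel as metadata so a no-overwrite destination can free the space. The dicts below, mapping block numbers to version stamps, are stand-ins for WAFL's real on-disk structures.

```python
def diff_snapshots(base, new):
    """base, new: dict of block number -> version at snapshot time."""
    changed = [bno for bno, ver in new.items()
               if base.get(bno) != ver]                # new or rewritten
    deleted = [bno for bno in base if bno not in new]  # freed since base
    return changed, deleted

base = {1: 7, 2: 3, 3: 5}
new  = {1: 7, 2: 9, 4: 1}    # block 2 rewritten, block 3 freed, block 4 new
changed, deleted = diff_snapshots(base, new)
print(changed, deleted)      # [2, 4] [3]: only two blocks are transferred
```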

In Proceedings of the USENIX Conference on File and Storage Technologies 2002 (FAST ’02)

Resources

  • A copy of the paper is attached to this posting.

snapmirror-fast02.pdf

File System Design for an NFS File Server Appliance

Dave Hitz, James Lau, and Michael Malcolm.

This paper describes WAFL (Write Anywhere File Layout) and how WAFL uses Snapshots to eliminate the need for file system consistency checking after an unclean shutdown.

Network Appliance Corporation recently began shipping a new kind of network server called an NFS file server appliance, which is a dedicated server whose sole function is to provide NFS file service. The file system requirements for an NFS appliance are different from those for a general-purpose UNIX system, both because an NFS appliance must be optimized for network file access and because an appliance must be easy to use.

This paper describes WAFL (Write Anywhere File Layout), which is a file system designed specifically to work in an NFS appliance. The primary focus is on the algorithms and data structures that WAFL uses to implement Snapshots™, which are read-only clones of the active file system. WAFL uses a copy-on-write technique to minimize the disk space that Snapshots consume. This paper also describes how WAFL uses Snapshots to eliminate the need for file system consistency checking after an unclean shutdown.
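
A much-simplified sketch of the write-anywhere idea follows: because updates always land in fresh blocks and the root pointer is switched atomically, a Snapshot is just a saved root pointer, and after a crash the tree under the last committed root is consistent by construction, which is why no consistency check is needed. Everything here is illustrative, not WAFL's on-disk format.

```python
class Tree:
    """Copy-on-write tree of blocks; self.blocks plays the disk."""
    def __init__(self):
        self.blocks = {}
        self.next_id = 0
        self.root = self._put({})         # root maps name -> data block id

    def _put(self, content):
        bid, self.next_id = self.next_id, self.next_id + 1
        self.blocks[bid] = content        # always written to a fresh block
        return bid

    def write_file(self, root, name, data):
        new_dir = dict(self.blocks[root]) # copy the directory block
        new_dir[name] = self._put(data)   # new data block
        return self._put(new_dir)         # new root; old root is untouched

t = Tree()
snap = t.root                             # taking a Snapshot: save the root
t.root = t.write_file(t.root, "a", "v1")  # atomic switch to the new root
t.root = t.write_file(t.root, "a", "v2")
assert "a" not in t.blocks[snap]          # the Snapshot still shows old state
```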

In Proceedings of the USENIX Winter 1994 Technical Conference

Resources

  • A copy of the paper is attached to this posting.
  • A copy of a technical report (a better-formatted version of the paper) is attached to this posting.

FILE_SYSTEM_DESIGN_USENIX94.pdf

file-system-design.pdf