All posts by ATGEditor

Data ONTAP GX: A Scalable Storage Cluster

Michael Eisler, Peter Corbett, Michael Kazar, Daniel S. Nydick, and J. Christopher Wagner.

This paper presents Data ONTAP GX, a clustered Network Attached File server that is composed of a number of cooperating filers.

Data ONTAP GX is a clustered Network Attached File server composed of a number of cooperating filers. Each filer manages its own local file system, which consists of a number of disconnected flexible volumes. A separate namespace infrastructure runs within the cluster, which connects the volumes into one or more namespaces by means of internal junctions. The cluster collectively exposes a potentially large number of separate virtual servers, each with its own independent namespace, security and administrative domain. The cluster implements a protocol routing and translation layer which translates requests in all incoming file protocols into a single unified internal file access protocol called SpinNP. The translated requests are then forwarded to the correct filer within the cluster for servicing by the local file system instance. This provides data location transparency, which is used to support transparent data migration, load balancing, mirroring for load sharing and data protection, and fault tolerance. The cluster itself greatly simplifies the administration of a large number of filers by consolidating them into a single system image. Results from benchmarks (over one million file operations per second on a 24 node cluster) and customer experience demonstrate linear scaling.
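
The location transparency described above comes from resolving a path through junctions to whichever volume, and therefore whichever filer, currently holds the data. The sketch below is a toy Go model of that idea with invented names; it is not the GX namespace code or the SpinNP protocol.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical model of a junction-linked namespace: each volume lives on one
// filer and may contain junctions that graft another volume onto a directory
// path. The names here are illustrative, not the GX implementation.
type Volume struct {
	Name      string
	Filer     string             // node that currently holds the volume
	Junctions map[string]*Volume // relative directory path -> child volume
}

// resolve walks the path, following junctions, and returns the volume that
// owns the final component plus the path relative to that volume.
func resolve(root *Volume, path string) (*Volume, string) {
	vol := root
	rel := make([]string, 0)
	for _, part := range strings.Split(strings.Trim(path, "/"), "/") {
		if part == "" {
			continue
		}
		rel = append(rel, part)
		// A junction at this point hands the rest of the path to another
		// volume, which may live on a different filer.
		if child, ok := vol.Junctions[strings.Join(rel, "/")]; ok {
			vol = child
			rel = rel[:0]
		}
	}
	return vol, strings.Join(rel, "/")
}

func main() {
	projects := &Volume{Name: "projects", Filer: "filer-2", Junctions: map[string]*Volume{}}
	root := &Volume{
		Name:      "root",
		Filer:     "filer-1",
		Junctions: map[string]*Volume{"projects": projects},
	}
	vol, rel := resolve(root, "/projects/gx/design.txt")
	// The request would be translated to the internal protocol and forwarded to
	// vol.Filer; clients never see which node actually serves the data.
	fmt.Printf("route to volume %s on %s, relative path %q\n", vol.Name, vol.Filer, rel)
}
```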

In Proceedings of the USENIX Conference on File and Storage Technologies 2007 (FAST ’07)

Resources

  • A copy of the paper is attached to this posting.

GX-fast2007.pdf

Enhancing the Linux Memory Architecture to Support File Systems over Heterogeneous Devices

Alexandros Batsakis, Randal Burns, Thomas Talpey, Arkady Kanevsky, and James Lentini.

This position paper was selected for presentation at the invitation-only Linux Storage and File System Workshop.

The Linux kernel must deal with the ever-growing performance heterogeneity of network and I/O devices. In a heterogeneous environment a single, policy-based framework for memory management does not provide good write performance to all storage resources. Currently, Linux treats all memory pages uniformly without considering the capabilities of the underlying device.

New implementations of traditional file sharing mechanisms such as zero-copy NFS over RDMA make this problem more apparent. We have conducted a series of experiments using NFS over RDMA that show that write throughput for cached I/O lags far behind the available bandwidth. This is an indication that the current memory management scheme is not optimized for low-latency, high-bandwidth interconnects such as InfiniBand or 10Gb Ethernet. Although the memory manager can be bypassed via direct I/O, this solution requires changes to existing applications and often yields lower system performance because it loses the benefit of caching.
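
For readers unfamiliar with the direct I/O escape hatch mentioned above, the generic Linux sketch below (not code from the paper) shows what it asks of an application: open with O_DIRECT and supply aligned, block-multiple buffers. That burden on existing applications, plus the loss of caching, is exactly the trade-off the abstract points to.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

const align = 4096 // typical requirement: buffer address, offset, and length aligned

// alignedBuf returns an n-byte slice whose backing address is aligned to the
// given boundary, as O_DIRECT writes require.
func alignedBuf(n, boundary int) []byte {
	raw := make([]byte, n+boundary)
	off := boundary - int(uintptr(unsafe.Pointer(&raw[0]))%uintptr(boundary))
	if off == boundary {
		off = 0
	}
	return raw[off : off+n]
}

func main() {
	// O_DIRECT bypasses the page cache entirely: no write-back caching and no
	// read-ahead, so the application gives up the benefits of caching.
	f, err := os.OpenFile("direct.dat", os.O_WRONLY|os.O_CREATE|syscall.O_DIRECT, 0644)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	buf := alignedBuf(align, align)
	copy(buf, []byte("payload"))
	if _, err := f.Write(buf); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("wrote one aligned block without going through the page cache")
}
```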

In Proceedings of the Linux Storage and Filesystem Workshop 2007

Resources

  • A copy of the paper is attached to this posting.

Linux-memory-lsf07.pdf

Making enterprise storage more search-friendly

Shankar Pasupathy, Garth Goodson, and Vijayan Prabhakaran.

This work explores the types of APIs that a storage system can expose to a search engine to better enable it to do its job.

The focus of this work is to determine how to enhance storage systems to make search and indexing faster and better able to produce relevant answers. Enterprise search engines often run in appliances that must access the file system through standard network file system protocols (NFS, CIFS). As such, they are not able to take advantage of features that may be offered by the storage system. This work explores the types of APIs that a storage system can expose to a search engine to better enable it to do its job. We make the case that by exposing certain information we can make search faster and more relevant.
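
The paper discusses these APIs in general terms; the interface below is a purely hypothetical example of the kind of hook it argues for, letting an indexer ask the storage system for changes since its last crawl instead of re-walking the namespace over NFS or CIFS. All names (ChangeFeed, ChangesSince, and the sample paths) are invented for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// ChangedFile is a hypothetical record the storage system could hand an
// indexer: enough metadata to decide what to (re)index without a full crawl.
type ChangedFile struct {
	Path     string
	Modified time.Time
	Deleted  bool
}

// ChangeFeed is a hypothetical storage-side API for search engines: rather
// than walking every directory, the indexer asks which files changed since
// its last pass.
type ChangeFeed interface {
	ChangesSince(t time.Time) ([]ChangedFile, error)
}

// fakeFeed stands in for the storage system in this sketch.
type fakeFeed struct{ changes []ChangedFile }

func (f fakeFeed) ChangesSince(t time.Time) ([]ChangedFile, error) {
	var out []ChangedFile
	for _, c := range f.changes {
		if c.Modified.After(t) {
			out = append(out, c)
		}
	}
	return out, nil
}

func main() {
	feed := fakeFeed{changes: []ChangedFile{
		{Path: "/eng/specs/raid.doc", Modified: time.Now(), Deleted: false},
		{Path: "/eng/old/draft.txt", Modified: time.Now(), Deleted: true},
	}}
	lastCrawl := time.Now().Add(-24 * time.Hour)
	changed, _ := feed.ChangesSince(lastCrawl)
	for _, c := range changed {
		if c.Deleted {
			fmt.Println("drop from index:", c.Path)
		} else {
			fmt.Println("reindex:", c.Path)
		}
	}
}
```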

In Proceedings of the ACM Symposium on Operating Systems Principles 2005 (SOSP ’05)

Resources

Row-Diagonal Parity for Double Disk Failure Correction

Peter Corbett, Bob English, Atul Goel, Tomislav Grcanac, Steven Kleiman, James Leong, and Sunitha Sankar.

This paper introduces Row-Diagonal Parity (RDP), a new algorithm for protecting against double disk failures.

Row-Diagonal Parity (RDP) is a new algorithm for protecting against double disk failures. It stores all data unencoded, and uses only exclusive-or operations to compute parity. RDP is provably optimal in computational complexity, both during construction and reconstruction. Like other algorithms, it is optimal in the amount of redundant information stored and accessed. RDP works within a single stripe of blocks of sizes normally used by file systems, databases and disk arrays. It can be utilized in a fixed (RAID-4) or rotated (RAID-5) parity placement style. It is possible to extend the algorithm to encompass multiple RAID-4 or RAID-5 disk arrays in a single RDP disk array. It is possible to add disks to an existing RDP array without recalculating parity or moving data. Implementation results show that RDP performance can be made nearly equal to single parity RAID-4 and RAID-5 performance.
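
To make the construction concrete, here is a minimal sketch of row and diagonal parity for a small prime p, using one-byte blocks and nothing but XOR. It follows the scheme the abstract describes but is illustrative only, not the production RAID code.

```go
package main

import "fmt"

// Minimal sketch of the row-diagonal construction for prime p: p-1 data
// disks, one row-parity disk, one diagonal-parity disk, and p-1 rows per
// stripe, computed with XOR alone.
const p = 5 // must be prime; yields 4 data disks and 4 rows per stripe

// encode takes data[disk][row] for the p-1 data disks and returns the row
// parity column and the diagonal parity column, each with p-1 blocks.
func encode(data [p - 1][p - 1]byte) (rowPar, diagPar [p - 1]byte) {
	// Row parity: XOR across the data disks in each row (plain RAID-4/5).
	for r := 0; r < p-1; r++ {
		for d := 0; d < p-1; d++ {
			rowPar[r] ^= data[d][r]
		}
	}
	// Diagonal parity: the block in row r on disk d (the row-parity disk
	// counts as disk p-1) lies on diagonal (r+d) mod p. Diagonals 0..p-2
	// are stored; diagonal p-1 is deliberately left out.
	for r := 0; r < p-1; r++ {
		for d := 0; d < p-1; d++ {
			if q := (r + d) % p; q != p-1 {
				diagPar[q] ^= data[d][r]
			}
		}
		if q := (r + p - 1) % p; q != p-1 {
			diagPar[q] ^= rowPar[r]
		}
	}
	return rowPar, diagPar
}

func main() {
	var data [p - 1][p - 1]byte
	for d := range data {
		for r := range data[d] {
			data[d][r] = byte(16*d + r + 1) // arbitrary contents
		}
	}
	rowPar, diagPar := encode(data)
	fmt.Println("row parity:     ", rowPar)
	fmt.Println("diagonal parity:", diagPar)
	// With both parity columns, any two failed disks in the stripe can be
	// rebuilt by alternating row and diagonal reconstruction.
}
```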

Best Paper Award

In Proceedings of the USENIX Conference on File and Storage Technologies 2004 (FAST ’04)

Resources

  • A copy of the paper is attached to this posting.

rdp-fast04.pdf

Implementation and Analysis of the User Direct Access Programming Library

James Lentini, Vu Pham, Steven Sears, and Randall Smith.

In this paper, we evaluate the uDAPL interface and share our experiences developing an open source implementation using InfiniBand adapters.

The User Direct Access Programming Library (uDAPL) is a generic application programming interface (API) for network adapters capable of remote direct memory access (RDMA). The uDAPL interface allows user space applications to work with RDMA adapters using a platform and transport independent API. The uDAPL interface has been proposed for use in clustering, distributed systems, and network file systems.

In this paper we evaluate the uDAPL interface and share our experiences developing an open source implementation using InfiniBand adapters.
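
uDAPL itself is a C API; the Go sketch below is a deliberately simplified, hypothetical model of the concepts it standardizes (memory registration and one-sided RDMA writes). The names are invented for illustration and are not the uDAPL interface; the point is only to give a feel for what a transport-independent RDMA API has to cover.

```go
package main

import (
	"errors"
	"fmt"
)

// Region is a hypothetical handle to registered memory; the key is the token
// a peer presents to access it.
type Region struct {
	Key uint32
}

// Adapter models an RDMA-capable NIC: it tracks which buffers are registered
// and services one-sided writes against them.
type Adapter struct {
	next    uint32
	regions map[uint32][]byte
}

func NewAdapter() *Adapter { return &Adapter{regions: map[uint32][]byte{}} }

// Register pins a buffer and hands back a key a peer can target directly,
// without involving this host's CPU on the data path.
func (a *Adapter) Register(buf []byte) Region {
	a.next++
	a.regions[a.next] = buf
	return Region{Key: a.next}
}

// RDMAWrite models a one-sided write: the initiator names the remote region
// by key and the "adapter" places the bytes straight into it.
func (a *Adapter) RDMAWrite(local []byte, remote Region) error {
	dst, ok := a.regions[remote.Key]
	if !ok {
		return errors.New("remote region not registered")
	}
	if len(local) > len(dst) {
		return errors.New("write exceeds remote region")
	}
	copy(dst, local)
	return nil
}

func main() {
	// Loopback demo: the "server" registers a receive buffer, the "client"
	// writes into it with no intermediate copy through a socket buffer.
	server := NewAdapter()
	recv := make([]byte, 64)
	region := server.Register(recv)

	if err := server.RDMAWrite([]byte("zero-copy payload"), region); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("server buffer now holds: %q\n", recv[:17])
}
```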

In Proceedings of the Second Workshop on Novel Uses of System Area Networks 2003 (co-located with the Ninth International Symposium on High Performance Computer Architecture, HPCA-9)

Resources

  • A copy of the paper is attached to this posting.

udapl_san2.pdf

SnapMirror®: File System Based Asynchronous Mirroring for Disaster Recovery

Hugo Patterson, Stephen Manley, Mike Federwisch, Dave Hitz, Steve Kleiman, and Shane Owara.

We present SnapMirror, an asynchronous mirroring technology that leverages file system snapshots to ensure the consistency of the remote mirror and optimize data transfer.

Computerized data has become critical to the survival of an enterprise. Companies must have a strategy for recovering their data should a disaster such as a fire destroy the primary data center. Current mechanisms offer data managers a stark choice: rely on affordable tape but risk the loss of a full day of data and face many hours or even days to recover, or have the benefits of a fully synchronized on-line remote mirror, but pay steep costs in both write latency and network bandwidth to maintain the mirror. In this paper, we argue that asynchronous mirroring, in which batches of updates are periodically sent to the remote mirror, can let data managers find a balance between these extremes. First, by eliminating the write latency issue, asynchrony greatly reduces the performance cost of a remote mirror. Second, by storing up batches of writes, asynchronous mirroring can avoid sending deleted or overwritten data and thereby reduce network bandwidth requirements. Data managers can tune the update frequency to trade network bandwidth against the potential loss of more data. We present SnapMirror, an asynchronous mirroring technology that leverages file system snapshots to ensure the consistency of the remote mirror and optimize data transfer. We use traces of production filers to show that even updating an asynchronous mirror every 15 minutes can reduce data transferred by 30% to 80%. We find that exploiting file system knowledge of deletions is critical to achieving any reduction for no-overwrite file systems such as WAFL and LFS. Experiments on a running system show that using file system metadata can reduce the time to identify changed blocks from minutes to seconds compared to purely logical approaches. Finally, we show that using SnapMirror to update every 30 minutes increases the response time of a heavily loaded system by only 22%.
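
The core idea, shipping only blocks that changed between two self-consistent snapshots and skipping anything deleted in the meantime, can be sketched in a few lines. The toy below uses an invented per-block map with a write "generation" per block; it is not the SnapMirror implementation.

```go
package main

import "fmt"

// Snapshot carries a per-block map recording whether each block is allocated
// and in which generation it was last written. The mirror update ships only
// blocks that changed after the base snapshot and are still allocated, so
// deleted or overwritten data is never sent.
type Snapshot struct {
	Generation uint64
	Allocated  []bool
	WrittenAt  []uint64 // generation in which each block was last written
}

// blocksToSend compares the base snapshot already on the mirror with the new
// snapshot and returns the block numbers that must be transferred.
func blocksToSend(base, cur Snapshot) []int {
	var out []int
	for b := range cur.Allocated {
		changed := cur.WrittenAt[b] > base.Generation
		if cur.Allocated[b] && changed {
			out = append(out, b)
		}
		// Blocks freed since the base snapshot are simply never shipped;
		// knowing about deletions is what keeps transfers small on a
		// no-overwrite file system.
	}
	return out
}

func main() {
	base := Snapshot{
		Generation: 10,
		Allocated:  []bool{true, true, true, false},
		WrittenAt:  []uint64{3, 9, 10, 0},
	}
	cur := Snapshot{
		Generation: 15,
		Allocated:  []bool{true, false, true, true},
		WrittenAt:  []uint64{3, 9, 14, 15},
	}
	fmt.Println("transfer blocks:", blocksToSend(base, cur)) // prints: transfer blocks: [2 3]
}
```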

In Proceedings of the USENIX Conference on File and Storage Technologies 2002 (FAST ’02)

Resources

  • A copy of the paper is attached to this posting.

snapmirror-fast02.pdf

Logical vs. Physical File System Backup

Norman C. Hutchinson, Stephen Manley, Mike Federwisch, Guy Harris, Dave Hitz, Steven Kleiman, and Sean O’Malley.

This paper compares logical and physical backup strategies in large file systems.

As file systems grow in size, ensuring that data is safely stored becomes more and more difficult. Historically, file system backup strategies have focused on logical backup where files are written in their entirety to the backup media. An alternative is physical backup where the disk blocks that make up the file system are written to the backup media. This paper compares logical and physical backup strategies in large file systems. We discuss the advantages and disadvantages of the two approaches, and conclude by showing that while both can achieve good performance, physical backup and restore can achieve much higher throughput while consuming less CPU. In addition, physical backup and restore is much more capable of scaling its performance as more devices are added to a system.
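
The two strategies are easy to contrast in code. The sketch below is illustrative only (the archive format and the image path are hypothetical, and this is not the dump/restore code measured in the paper): logical backup walks the namespace and copies whole files, while physical backup streams the raw blocks of the device without interpreting files at all.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// logicalBackup walks the tree and writes each file out in its entirety, the
// strategy of classic dump/tar-style tools: per-file metadata work and seeks,
// but the result can be restored file by file.
func logicalBackup(root string, out io.Writer) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		fmt.Fprintf(out, "FILE %s\n", path) // stand-in for a real archive header
		_, err = io.Copy(out, f)
		return err
	})
}

// physicalBackup streams the raw blocks of the device (or image) holding the
// file system, sequentially and without interpreting files; this is what lets
// it keep devices streaming while using little CPU.
func physicalBackup(device string, out io.Writer) error {
	dev, err := os.Open(device)
	if err != nil {
		return err
	}
	defer dev.Close()
	_, err = io.Copy(out, dev)
	return err
}

func main() {
	// Back up the current directory logically and a file-system image
	// (hypothetical path) physically, discarding the output for the demo.
	if err := logicalBackup(".", io.Discard); err != nil {
		fmt.Fprintln(os.Stderr, "logical:", err)
	}
	if err := physicalBackup("/tmp/fs.img", io.Discard); err != nil {
		fmt.Fprintln(os.Stderr, "physical:", err)
	}
}
```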

In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation 1999 (OSDI ’99)

Resources

  • A copy of the paper is attached to this posting.

logical-backup-osdi99.pdf

File System Design for an NFS File Server Appliance

Dave Hitz, James Lau, and Michael Malcolm.

This paper describes WAFL (Write Anywhere File Layout) and how WAFL uses Snapshots to eliminate the need for file system consistency checking after an unclean shutdown.

Network Appliance Corporation recently began shipping a new kind of network server called an NFS file server appliance, which is a dedicated server whose sole function is to provide NFS file service. The file system requirements for an NFS appliance are different from those for a general-purpose UNIX system, both because an NFS appliance must be optimized for network file access and because an appliance must be easy to use.

This paper describes WAFL (Write Anywhere File Layout), which is a file system designed specifically to work in an NFS appliance. The primary focus is on the algorithms and data structures that WAFL uses to implement Snapshots™, which are read-only clones of the active file system. WAFL uses a copy-on-write technique to minimize the disk space that Snapshots consume. This paper also describes how WAFL uses Snapshots to eliminate the need for file system consistency checking after an unclean shutdown.
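
One detail the paper spends time on is the block-map bookkeeping that makes Snapshots cheap: each block carries a usage bit for the active file system and one per snapshot, a block is free only when every bit is clear, and copy-on-write simply redirects new writes to free blocks. The sketch below is a toy version of that idea, not WAFL itself, and the bit layout is simplified for illustration.

```go
package main

import "fmt"

// Each block has one usage bit for the active file system and one bit per
// snapshot; a block is free only when every bit is clear. A snapshot costs
// extra space only for blocks the active file system later overwrites.
const activeBit = 1 << 0 // bit 0: active file system; bits 1..31: snapshots

type BlockMap []uint32

// TakeSnapshot copies the active bit into the bit for snapshot n, so every
// block the active file system uses is now also pinned by the snapshot.
func (m BlockMap) TakeSnapshot(n uint) {
	for i, e := range m {
		if e&activeBit != 0 {
			m[i] = e | (1 << n)
		}
	}
}

// WriteBlock models copy-on-write: the old block keeps whatever snapshot bits
// it has, loses its active bit, and the new data lands in a free block.
func (m BlockMap) WriteBlock(old int) (newBlock int) {
	m[old] &^= activeBit
	for i, e := range m {
		if e == 0 { // free: held by no snapshot and not the active file system
			m[i] = activeBit
			return i
		}
	}
	return -1 // out of space in this toy map
}

func main() {
	m := BlockMap{activeBit, activeBit, 0, 0} // two blocks in use, two free
	m.TakeSnapshot(1)                         // snapshot 1 pins blocks 0 and 1
	nb := m.WriteBlock(0)                     // overwrite block 0's contents
	fmt.Printf("new data in block %d; block 0 entry now %#x (held by the snapshot only)\n", nb, m[0])
}
```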

In Proceedings of the USENIX Winter 1994 Technical Conference

Resources

  • A copy of the paper is attached to this posting.
  • A copy of a technical report (a better-formatted version of the paper) is also attached to this posting.

FILE_SYSTEM_DESIGN_USENIX94.pdf

file-system-design.pdf