Tag Archives: mascots

ChewAnalyzer: Workload-Aware Data Management Across Differentiated Storage Pools

Xiongzi Ge, NetApp, Inc.; Xuchao Xie, NUDT; David H.C. Du, University of Minnesota; Pradeep Ganesan, NetApp, Inc.; Dennis Hahn, NetApp, Inc.

The 26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2018)

September 25 – 28, 2018
Milwaukee, Wisconsin, US

In multi-tier storage systems, moving data from one tier to the next can be inefficient, and because each type of storage device has its own idiosyncrasies with respect to the workloads it can best support, unnecessary data movement may result. In this paper, we explore a fully connected storage architecture in which data can move from any storage pool to any other. We propose a Chunk-level storage-aware workload Analyzer framework, abbreviated as ChewAnalyzer, to facilitate efficient data placement. Access patterns are characterized in a flexible way by the collection of I/O accesses to a data chunk. ChewAnalyzer employs a Hierarchical Classifier [30] to analyze the chunk patterns step by step. In each classification step, the Chunk Placement Recommender suggests new data placement policies according to the device properties. Based on its analysis of access pattern changes, the Storage Manager can distribute or migrate the data chunks across the different storage pools accordingly. Our experimental results show that ChewAnalyzer improves the initial data placement and migrates data into the proper pools directly and efficiently.
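To make the idea concrete, here is a minimal Python sketch of step-by-step chunk classification in the spirit of ChewAnalyzer. The features, thresholds, and pool names are illustrative assumptions, not the classifier or the placement policies from the paper:

# Minimal sketch of hierarchical chunk classification (illustrative only).
# Feature names, thresholds, and pool labels are hypothetical; the paper's
# actual classifier, features, and policies are not reproduced here.
from dataclasses import dataclass

@dataclass
class ChunkStats:
    accesses: int       # I/O count observed in the last window
    read_ratio: float   # fraction of accesses that were reads
    avg_req_kb: float   # average request size in KiB

def classify(chunk: ChunkStats) -> str:
    """Classify a chunk step by step; each step narrows the placement."""
    # Step 1: separate hot from cold chunks by access frequency.
    if chunk.accesses < 10:
        return "capacity-hdd-pool"          # cold data: cheap, dense media
    # Step 2: among hot chunks, split by read/write mix.
    if chunk.read_ratio > 0.9:
        return "read-optimized-ssd-pool"    # read-mostly: flash reads are cheap
    # Step 3: among write-heavy chunks, split by request size.
    if chunk.avg_req_kb < 8:
        return "nvram-pool"                 # small random writes: low-latency log
    return "write-optimized-ssd-pool"

# A storage manager would periodically re-run this on fresh statistics and
# migrate a chunk whenever its recommended pool changes.
print(classify(ChunkStats(accesses=500, read_ratio=0.95, avg_req_kb=64.0)))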

Resources

A model-based approach to streamlining distributed training for asynchronous SGD

Sung-Han Lin, NetApp; Marco Paolieri, University of Southern California; Cheng-Fu Chou, National Taiwan University; Leana Golubchik, University of Southern California

The 26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2018)

September 25 – 28, 2018
Milwaukee, Wisconsin, US

The success of Deep Neural Networks (DNNs) has created significant interest in the development of software tools, hardware architectures, and cloud systems to meet the huge computational demand of their training jobs. A common approach to speeding up an individual job is to distribute training data and computation among multiple nodes, periodically exchanging intermediate results. In this paper, we address two important problems for the application of this strategy to large-scale clusters and multiple, heterogeneous jobs. First, we propose and validate a queueing model to estimate the throughput of a training job as a function of the number of nodes assigned to the job; this model targets asynchronous Stochastic Gradient Descent (SGD), a popular strategy for distributed training, and requires only data from quick, two-node profiling in addition to job characteristics (number of requested training epochs, mini-batch size, size of DNN parameters, assigned bandwidth). Throughput estimations are then used to explore several classes of scheduling heuristics to reduce response time in a scenario where heterogeneous jobs are continuously submitted to a large-scale cluster. These scheduling algorithms dynamically select which jobs to run and how many nodes to assign to each job, based on different trade-offs between service time reduction and efficiency (e.g., speedup per additional node). Heuristics are evaluated through extensive simulations of realistic DNN workloads, also investigating the effects of early termination, a common scenario for DNN training jobs.
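As a rough illustration of why such a model is useful, the following Python sketch estimates asynchronous-SGD throughput from a profiled per-mini-batch compute time, the parameter size, and the shared bandwidth. It is a simplified saturation model, not the queueing model from the paper, and it assumes gradient updates are roughly the size of the parameters:

# Illustrative back-of-the-envelope throughput model for asynchronous SGD
# (not the paper's queueing model). Assumes each of n workers alternates a
# measured compute phase with a parameter exchange over a shared link.
def throughput(n_nodes: int,
               t_compute: float,    # seconds per mini-batch, from 2-node profiling
               param_bytes: float,  # size of the DNN parameters
               bandwidth: float     # bytes/s of the shared parameter-server link
               ) -> float:
    """Estimated mini-batches/second for n_nodes asynchronous workers."""
    t_comm = 2 * param_bytes / bandwidth        # push gradients + pull weights
    per_worker = 1.0 / (t_compute + t_comm)     # rate if the link were idle
    link_cap = bandwidth / (2 * param_bytes)    # updates/s the link can carry
    # Aggregate rate is limited by whichever saturates first:
    # the workers' compute pipelines or the shared link.
    return min(n_nodes * per_worker, link_cap)

# Example: diminishing returns appear once the shared link saturates.
for n in (1, 4, 16, 64):
    print(n, round(throughput(n, t_compute=0.5,
                              param_bytes=100e6, bandwidth=10e9 / 8), 2))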

Resources

WORMStore: A Specialized Object Store for Write-Once Read-Many Workloads

Srinivasan Narayanamurthy, Kartheek Muthyala, and Gaurav Makkar

The recent increase in interest in batch analytics has resulted in extensive use of distributed frameworks such as Hadoop and Dryad. Batch analytics, as the name suggests, performs many computations on large volumes of data.

That is, large quantities of data are ingested once and read many times, mostly in large chunks, a pattern characterized as a write-once read-many (WORM) workload. The storage layer of these distributed frameworks (say, HDFS in Hadoop) uses file systems such as ext4 or XFS as native object stores to store objects as files on individual nodes of the distributed system. These general-purpose file systems were designed with broader goals such as POSIX compliance, good performance across a wide range of file sizes, and user friendliness. However, most of these features are not required for a native object store in a distributed file system.

WORMStore is a lightweight object store designed exclusively for use in distributed systems under WORM workloads. WORMStore provides advantages such as the ability to prefetch large objects, a small metadata-to-data ratio, and media-aware data/metadata placement. Because WORMStore is log-structured, it can recover after a failure. Our experiments show that WORMStore provides a 28% increase in read throughput per node in a Hadoop cluster.
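The core layout idea can be illustrated with a short Python sketch: an append-only log with an in-memory index, enforcing write-once semantics. This is a toy illustration of a log-structured WORM store, not WORMStore's actual on-disk format or API:

# Minimal sketch of a write-once read-many (WORM) object store with a
# log-structured layout: objects are appended to a single log file and
# located via an in-memory index. Because the log is append-only, the
# index can be rebuilt by rescanning the log after a crash.
import os

class WormStore:
    def __init__(self, path: str):
        self.log = open(path, "ab+")
        self.index: dict[str, tuple[int, int]] = {}  # key -> (offset, length)

    def put(self, key: str, data: bytes) -> None:
        if key in self.index:
            raise ValueError("WORM violation: object already written")
        offset = self.log.seek(0, os.SEEK_END)
        self.log.write(data)          # append-only: no in-place updates
        self.log.flush()
        self.index[key] = (offset, len(data))

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        self.log.seek(offset)
        return self.log.read(length)  # large sequential reads prefetch well

store = WormStore("/tmp/wormstore.log")
store.put("block-0001", b"x" * 4096)
print(len(store.get("block-0001")))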

  • The author’s version of the paper is attached to this posting. Please observe the following copyright:

© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • The definitive version of the paper can be found at:

Nakshatra: Towards running batch analytics on an archive

Atish Kathpal and Giridhar Yasa

Long-term retention of data has become the norm, for reasons like compliance and data preservation for future needs. With storage media continuing to become cheaper, this trend has further strengthened, as testified by the introduction of archival solutions like Amazon Glacier and Spectra Logic BlackPearl.

On the other hand, analytics and big data have become key enablers for business and research. However, analytics and archiving happen on separate storage silos. This generates additional costs and inefficiencies when part of the archived data needs to be analyzed using batch analytics platforms like Hadoop, because a) we need additional storage for data transferred from the archive to the analytics tier, and b) transfer-time costs are incurred due to data migration to the analytics tier. Moreover, accessing archived data incurs a high time to first byte, as much of the data is stored on offline media like tapes or spun-down disks. We introduce Nakshatra, a data processing framework that runs analytics directly on an archive based on offline media. To the best of our knowledge, this is the first work of its kind available in the literature. We leverage batched prefetching and scheduling techniques for improved retrieval of data and scalable analytics on archives. Our preliminary evaluation shows Nakshatra to be up to 81% faster than the traditional ingest-then-compute workflow for archived data.
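The following Python sketch illustrates the batched-prefetching idea: pending tasks are grouped by the offline medium that holds their input, so each expensive mount is amortized over many objects. The function names and the largest-batch-first policy are simplifying assumptions, not Nakshatra's actual scheduler:

# Illustrative sketch of batched prefetching for analytics on offline media
# (tape cartridges here): group pending tasks by the medium that holds their
# input so that the high time-to-first-byte of a mount is paid once per batch.
from collections import defaultdict

def schedule(tasks, locate):
    """tasks: list of (task_id, object_id); locate: object_id -> medium_id."""
    by_medium = defaultdict(list)
    for task_id, obj in tasks:
        by_medium[locate(obj)].append((task_id, obj))
    # Serve the medium with the most pending work first, amortizing the
    # mount cost over many prefetched objects.
    for medium, batch in sorted(by_medium.items(),
                                key=lambda kv: -len(kv[1])):
        mount(medium)
        for _, obj in batch:
            prefetch(obj)            # stream objects to the staging area
        for task_id, _ in batch:
            dispatch(task_id)        # compute can overlap the next mount

def mount(m):    print(f"mount {m}")
def prefetch(o): print(f"  prefetch {o}")
def dispatch(t): print(f"  run {t}")

schedule([("t1", "a"), ("t2", "b"), ("t3", "c")],
         locate={"a": "tape-7", "b": "tape-7", "c": "tape-2"}.get)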

  • The author’s version of the paper is attached to this posting. Please observe the following copyright:

© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • The definitive version of the paper can be found at:

Cooperative Storage-Level De-Duplication for I/O Reduction in Virtualized Data Centers

Min Li, Shravan Gaonkar, Ali R. Butt, Deepak Kenchammana, and Kaladhar Voruganti

This paper explores the synergy between the two layers of storage and server virtualization to exploit block sharing information.

Data centers are increasingly being redesigned for workload consolidation in order to reap the benefits of better resource utilization, power savings, and physical space savings. Among the forces driving these savings are server and storage virtualization technologies. As more consolidated workloads are concentrated on physical machines (e.g., virtual density is already very high in virtual desktop environments and will be driven to unprecedented levels by the fast-growing core counts of physical servers), the shared storage layer must respond with virtualization innovations of its own, such as de-duplication and thin provisioning. A key insight of this paper is that there is greater synergy between the two layers of storage and server virtualization to exploit block sharing information than was previously thought possible. We reveal this by developing a systematic framework to explore the interactions between storage and virtualization servers. We also quantitatively evaluate the I/O bandwidth and latency reduction that is possible between virtual machine hosts and storage servers using real-world trace-driven simulation. Moreover, we present a proof-of-concept NFS implementation that incorporates our techniques to quantify their I/O latency benefits.

In Proceedings of the IEEE 20th International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems 2012 (MASCOTS ’12)
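The cooperative idea can be sketched as follows: if the storage server exposes a per-block content fingerprint, the host can cache blocks by fingerprint, so a block deduplicated across many VM images crosses the network only once. The interface below is a hypothetical stand-in, not the protocol from the paper:

# Sketch of cooperative dedup-aware I/O reduction: the server shares
# per-block fingerprints with the host, and the host caches block data by
# fingerprint, so blocks shared across VM images are fetched only once.
import hashlib

class DedupAwareClient:
    def __init__(self, server):
        self.server = server
        self.cache: dict[str, bytes] = {}   # fingerprint -> block data

    def read(self, vm: str, lba: int) -> bytes:
        fp = self.server.fingerprint(vm, lba)   # cheap metadata round-trip
        if fp not in self.cache:                # fetch block data only on a miss
            self.cache[fp] = self.server.read_block(vm, lba)
        return self.cache[fp]

class Server:
    """Toy server: two VM images whose blocks are largely identical."""
    def __init__(self):
        base = {i: bytes([i % 251]) * 4096 for i in range(8)}
        self.images = {"vm1": base, "vm2": dict(base)}

    def fingerprint(self, vm, lba):
        # Real systems use a collision-resistant content hash like this one.
        return hashlib.sha256(self.images[vm][lba]).hexdigest()

    def read_block(self, vm, lba):
        return self.images[vm][lba]

client = DedupAwareClient(Server())
client.read("vm1", 0)
client.read("vm2", 0)     # served from the host cache: no data transfer
print(len(client.cache))  # 1 cached copy backs both VM images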

Resources

  • The author’s version of the paper is attached to this posting. Please observe the following copyright:

© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

IO-Reduction-mascots2012.pdf