Tag Archives: hotstorage12

MixApart: Decoupled Analytics for Shared Storage Systems

Madalin Mihailescu, Gokul Soundararajan, and Cristiana Amza.

In this paper, we introduce MixApart, a scalable data processing framework for shared enterprise storage systems.

Data analytics and enterprise applications have very different storage functionality requirements. For this reason, enterprise deployments of data analytics run on a separate storage silo. This may generate additional costs and inefficiencies in data management, e.g., whenever data needs to be archived, copied, or migrated across silos. We introduce MixApart, a scalable data processing framework for shared enterprise storage systems. With MixApart, a single consolidated storage back-end manages enterprise data and services all types of workloads, thereby lowering hardware costs and simplifying data management. In addition, MixApart enables the local storage performance required by analytics through an integrated data caching and scheduling solution. Our preliminary evaluation shows that MixApart can be 45% faster than the traditional ingest-then-compute workflow used in enterprise IT analytics, while requiring one third of the storage capacity of HDFS.

In Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems 2012 (HotStorage ’12)

Resources

mixapart-hotstorage12.pdf

Analyzing Compute vs. Storage Tradeoff for Video-aware Storage Efficiency

Atish Kathpal, Mandar Kulkarni, and Ajay Bakre.

In this paper, we develop cost metrics that allow us to compare storage vs. compute costs and suggest when a transcoding on-the-fly solution can be cost-effective.

Video content has an unusual storage footprint. In a video distribution environment, a master video file must be transcoded into different resolutions, bitrates, codecs, and containers so that it can be distributed to a wide variety of devices and media players over different kinds of networks. Our experiments show that when 8 master videos are transcoded into the 376 most popular formats (derived from 8 resolutions and 6 containers), the transcoded versions occupy 8 times more storage than the master video. One major challenge in storing such content efficiently is that traditional de-duplication algorithms cannot detect significant duplication between any 2 versions. Transcoding on-the-fly is a technique in which a distribution copy is created only when a user requests it. This technique saves storage at the expense of the extra compute cost and latency incurred by transcoding after a user request is received. In this paper, we develop cost metrics that allow us to compare storage vs. compute costs and suggest when a transcoding on-the-fly solution can be cost-effective. We also analyze how such a solution can be deployed in a practical storage system using access pattern information or, when such information is not available, a variant of the ski-rental [1] online algorithm.
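
To make the rent-or-buy idea concrete, here is a minimal Python sketch of a break-even ski-rental rule applied to one transcoded version. It is not the variant analyzed in the paper. The class name, the field names, and the simplification that storing a version has a single lump-sum cost are all assumptions made for illustration. Each request "rents" by transcoding on the fly, and once the cumulative transcoding spend reaches the assumed storage cost, the version is kept ("bought").

```python
from dataclasses import dataclass


@dataclass
class SkiRentalPolicy:
    """Break-even rent-or-buy rule for one transcoded version (hypothetical)."""

    transcode_cost: float  # assumed cost of one on-the-fly transcode ("rent")
    storage_cost: float    # assumed lump-sum cost of keeping the version ("buy")
    spent_on_transcoding: float = 0.0
    stored: bool = False

    def on_request(self) -> str:
        """Handle one user request and return the action taken."""
        if self.stored:
            # The version was kept earlier, so no compute is needed.
            return "serve-stored"
        # Rent: transcode on the fly for this request.
        self.spent_on_transcoding += self.transcode_cost
        # Buy: once cumulative rent reaches the purchase price, keep the output.
        if self.spent_on_transcoding >= self.storage_cost:
            self.stored = True
            return "transcode-and-store"
        return "transcode"


if __name__ == "__main__":
    policy = SkiRentalPolicy(transcode_cost=1.0, storage_cost=3.0)
    for i in range(1, 5):
        print(i, policy.on_request())
```

The intuition behind the break-even threshold is that a rarely requested version never pays for storage, while a popular one stops paying for repeated transcodes after a bounded amount of compute has been spent. A real deployment would need measured cost values and a storage cost that accrues over time.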

In Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems 2012 (HotStorage ’12)

Resources

efficiency-hotstorage12.pdf