A Policy-based Architecture for Scalable Storage Systems
The complexity of modern storage systems continues to grow, making management of these systems a first-class concern. A storage system today may need to exploit the widely varying characteristics of heterogeneous storage units to meet the simultaneous demands of many customers with differing requirements. In addition, desirable properties such as cost-effectiveness, scalability, reliability, and power-efficiency may conflict with each other. A further challenge arises from virtualization and other layers of indirection between applications and storage hardware, because application-level optimizations that exploit hardware features may not have the desired effect. Performance may be lost when the underlying physical layout does not match the assumptions made at the higher level. Worse, reliability may be reduced if an underlying deduplication system removes extra copies of data blocks that were deliberately replicated, such as critical file system metadata. Finally, existing management interfaces are not extensible, making it difficult to express novel policies. As a result, significant time and effort are spent designing, customizing, and maintaining storage solutions.
We argue that a scalable storage system must expose a more flexible mechanism for researchers or storage administrators to express the desired properties. We propose a policy-driven architecture that introduces extensibility and dynamism into the control plane of a data center’s storage system. Our proposed system consists of two parts: (1) a domain-specific policy language that allows the construction of sophisticated policies using both static and dynamic properties of the available storage devices and the storage requests; (2) an extension of the storage system’s control plane, capable of interpreting and enforcing these policies by monitoring the stream of requests and the dynamic characteristics of the storage devices. Existing work on policy-based storage management is mainly concerned with storage allocation or configuration, and focuses on static properties of the storage devices (e.g. capacity, cost, throughput, reliability). Our goal is to automatically manage the daily operation of the storage system, adjusting to changes in workload requests, power consumption, and load hotspots according to high-level policies.
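As an illustration only, the following Python sketch shows how a policy might combine static device properties (media type, cost) with dynamic ones (free space, current utilization) when placing a request. It is not the proposed policy language: the names `Device`, `Request`, `POLICY`, and `place`, the 0.8 hotspot threshold, and the sample devices are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Device:
    name: str
    kind: str           # static property: "ssd" or "hdd"
    cost_per_gb: float  # static property
    free_gb: int        # dynamic property
    utilization: float  # dynamic property: fraction of bandwidth in use


@dataclass
class Request:
    tenant: str
    size_gb: int
    latency_sensitive: bool


# A constraint is a predicate over a (request, device) pair.
Constraint = Callable[[Request, Device], bool]


def fits(req: Request, dev: Device) -> bool:
    return dev.free_gb >= req.size_gb


def avoid_hotspots(req: Request, dev: Device) -> bool:
    # Hypothetical threshold: treat devices above 80% utilization as hot.
    return dev.utilization < 0.8


def fast_media_if_needed(req: Request, dev: Device) -> bool:
    return dev.kind == "ssd" or not req.latency_sensitive


# Hypothetical policy: every constraint must hold for a device to be eligible.
POLICY: List[Constraint] = [fits, avoid_hotspots, fast_media_if_needed]


def place(req: Request, devices: List[Device],
          constraints: List[Constraint] = POLICY) -> Optional[Device]:
    """Return the cheapest eligible device (ties broken by lower utilization),
    or None if no device satisfies every constraint."""
    candidates = [d for d in devices if all(c(req, d) for c in constraints)]
    if not candidates:
        return None
    return min(candidates, key=lambda d: (d.cost_per_gb, d.utilization))


# Sample devices, invented for this example.
DEVICES = [
    Device("ssd-1", "ssd", 0.20, 500, 0.85),
    Device("ssd-2", "ssd", 0.20, 500, 0.40),
    Device("hdd-1", "hdd", 0.03, 4000, 0.30),
]
```

With these sample devices, a latency-sensitive request lands on `ssd-2` because `ssd-1` is currently hot, while a backup-style request goes to the cheaper `hdd-1`. Re-evaluating `place` as utilization changes is one simple way to picture a control plane reacting to dynamic device characteristics.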
Storage and Compute Provisioning Informed by Application Characteristics and SLOs
This research addresses the problem of provisioning infrastructure for large-scale distributed applications. The proposed approach, a tool called CProv, takes characterizations of application demands and hardware capabilities as input and automatically selects the most cost-effective storage and compute configurations that meet the SLO requirements of the application. The main objectives of this project are (1) characterizing application behavior, hardware, and performance requirements with sufficient accuracy and manageable complexity, and (2) solving the constraint optimization problem.
- Leverage expertise at NetApp to inform how the CProv tool should characterize its inputs: cluster building blocks, applications, and SLOs.
- CProv needs to predict both the architecture and the scale of the various storage and compute building blocks required to meet the input SLOs. We plan to evaluate CProv’s ability to predict the right scale by comparing it with other capacity planning tools.
- With CProv, we will study when inflection points occur in cluster architectures (e.g., scale out architectures aren’t always cost-effective for a given workload) so that administrators who provision clusters can appropriately plan for these inflections.
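The provisioning problem described above can be pictured with a minimal Python sketch: enumerate candidate configurations, discard those that miss the SLO, and keep the cheapest. This is not CProv's model. The `Block` and `SLO` types, the assumption that performance and capacity scale linearly with the number of units, and all numbers are hypothetical and chosen only to make an inflection point visible.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    name: str
    unit_cost: int         # cost of one unit, arbitrary currency
    unit_iops: int         # assumed to scale linearly with unit count
    unit_capacity_tb: int  # assumed to scale linearly with unit count
    max_units: int


@dataclass(frozen=True)
class SLO:
    min_iops: int
    min_capacity_tb: int


def cheapest_config(blocks: List[Block], slo: SLO) -> Optional[Tuple[str, int, int]]:
    """Return (block name, unit count, total cost) of the cheapest
    configuration meeting the SLO, or None if none does."""
    best = None
    for b in blocks:
        for n in range(1, b.max_units + 1):
            meets = (n * b.unit_iops >= slo.min_iops
                     and n * b.unit_capacity_tb >= slo.min_capacity_tb)
            if meets:
                cost = n * b.unit_cost
                if best is None or cost < best[2]:
                    best = (b.name, n, cost)
                break  # adding more units of this block only raises the cost
    return best


# Hypothetical building blocks.
BLOCKS = [
    Block("scale-out-node", unit_cost=10, unit_iops=20_000,
          unit_capacity_tb=10, max_units=16),
    Block("scale-up-controller", unit_cost=45, unit_iops=120_000,
          unit_capacity_tb=100, max_units=2),
]
```

Under these made-up numbers, `SLO(30_000, 15)` is met most cheaply by two scale-out nodes (cost 20), whereas `SLO(100_000, 40)` is met more cheaply by a single scale-up controller (cost 45) than by five scale-out nodes (cost 50). The crossover between those two demands is the kind of inflection point an administrator would want to anticipate.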
Automated Workload Mapping and SLO Creation
The use of consolidated storage systems requires a deep understanding of performance requirements as well as suitable tools for managing those requirements when configuring such systems. As advanced features such as Quality-of-Service (QoS) are introduced, most customers will have difficulty configuring them. This is mainly because state-of-the-art storage products integrate various kinds of devices and offer customers several interfaces to seamlessly access file system space in their infrastructure. Common features include high availability, overbooking, de-duplication, and even support for configuring priorities to achieve desired levels of Quality-of-Service for individual workloads sharing the same system. As a result, multiple QoS criteria must be taken into account and optimized in order to guarantee proper operation of the consolidated storage environment.
However, the business interest does not lie in mere storage management, but in fulfilling the requirements of applications and higher-level services, from which it is in general difficult to deduce specific storage QoS parameters. This fellowship proposal aims to create one essential pillar linking business objectives and storage-level QoS parameters through an automated mapping approach. Instead of setting priorities on certain parameters, we propose research and development tasks for describing the desired Quality-of-Service at a higher level (as Service Level Objectives, or SLOs) and automatically mapping those to storage requirements, including the configuration parameters necessary to configure the storage system. This approach simplifies the configuration of the system and also allows administrators to verify more easily whether the desired QoS level has been achieved.
This research work will create suitable application models based on workload analyses of typical business applications. As a result, tools will be provided that allow workload modeling at the application level and automatically create suitable storage QoS/SLO configurations for the workload. The fellowship will, in addition, cover the verification of the models through simulation and testing on real systems. Based on these results, adaptation mechanisms will be evaluated to dynamically verify whether the workload for the forecasted application setting fits the original model and to propose or initiate changes to the QoS/SLO configuration of storage systems.
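To make the idea of mapping an application-level objective to storage-level parameters concrete, here is a small Python sketch under strong simplifying assumptions. It is not the proposed mapping approach: the `AppModel` fields, the assumption that the I/Os of one transaction are issued one after another, the fixed share of response time budgeted to storage, and the 20% tolerance in `workload_fits` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppModel:
    ios_per_transaction: float       # hypothetical: average I/Os per transaction
    io_size_kb: float                # hypothetical: average I/O size
    storage_share_of_latency: float  # fraction of response time budgeted to storage


@dataclass(frozen=True)
class ServiceLevelObjective:
    transactions_per_second: float
    response_time_ms: float


@dataclass(frozen=True)
class StorageQoS:
    min_iops: float
    min_throughput_mb_s: float
    max_latency_ms: float


def map_slo(model: AppModel, slo: ServiceLevelObjective) -> StorageQoS:
    """Derive storage-level QoS targets from an application-level SLO."""
    iops = slo.transactions_per_second * model.ios_per_transaction
    throughput = iops * model.io_size_kb / 1024  # KB/s to MB/s
    # Assumes the I/Os of a transaction are serialized, so the storage
    # latency budget is split evenly across them.
    latency = (slo.response_time_ms * model.storage_share_of_latency
               / model.ios_per_transaction)
    return StorageQoS(iops, throughput, latency)


def workload_fits(observed_iops: float, qos: StorageQoS,
                  tolerance: float = 0.2) -> bool:
    """Check whether the observed load is within a relative tolerance of the
    modeled load; a False result would suggest revisiting the configuration."""
    return abs(observed_iops - qos.min_iops) <= tolerance * qos.min_iops


# Example values, invented for this sketch.
MODEL = AppModel(ios_per_transaction=8, io_size_kb=16, storage_share_of_latency=0.5)
QOS = map_slo(MODEL, ServiceLevelObjective(transactions_per_second=500,
                                           response_time_ms=40))
```

For these example values the derived targets are 4000 IOPS, 62.5 MB/s, and a 2.5 ms per-I/O latency bound. An observed load of 4500 IOPS would still be considered consistent with the model, while 6000 IOPS would not, which is the kind of signal an adaptation mechanism could act on.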