Scalable Coordination of Hierarchical Parallelism

Vinay Devadas and Matthew Curtis-Maury, NetApp

International Conference on Parallel Processing
August 2020
Edmonton, Canada

As core counts continue to increase, scaling software on multiprocessors becomes critical. Applications that operate on hierarchical data are especially difficult to parallelize efficiently. In such applications, correct execution relies on all threads coordinating their accesses within the hierarchy. At the same time, high-performance execution requires that this coordination happen efficiently while maximizing parallelism.

In this paper, we identify two key scalability bottlenecks in the coordination of hierarchical parallelism by studying the hierarchical data partitioning framework within the WAFL file system. We first observe that the global synchronization required to enforce the hierarchical constraints limits performance at higher core counts. We thus propose a distributed architecture, called Scheduler Pools, that divides the hierarchy into disjoint sub-hierarchies that can be managed independently in the common case, thereby reducing coordination overhead. We next observe that periodically draining all in-flight operations to allow coarse-grained operations in the hierarchy to execute leaves many CPU cycles idle. To address this, we propose a new scheme, called Hierarchy-Aware Draining, that minimizes wasted CPU cycles by draining only those regions of the hierarchy required to execute the desired operation. When implemented together in the context of WAFL, Scheduler Pools and Hierarchy-Aware Draining overcome the observed scalability bottlenecks. Our evaluation using a range of real-world benchmarks on high-end storage systems shows throughput gains of up to 33% and reductions in latency of up to 64%.
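
To make the idea behind Hierarchy-Aware Draining concrete, the following is a minimal sketch and not the WAFL implementation. It assumes a conflict rule that the abstract does not state: a coarse-grained operation on a node conflicts with work on that node's ancestors and on its subtree, and with nothing else. The names `Node`, `drain_set`, and `build_example`, and the example hierarchy, are hypothetical.

```python
class Node:
    """One node of a hypothetical partition hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def ancestors(node):
    """Return every node on the path from node's parent up to the root."""
    out = []
    cur = node.parent
    while cur is not None:
        out.append(cur)
        cur = cur.parent
    return out


def subtree(node):
    """Return node together with all of its descendants."""
    out = [node]
    for child in node.children:
        out.extend(subtree(child))
    return out


def drain_set(target):
    """Names of the nodes to drain before a coarse-grained operation on target.

    Assumed conflict rule (not taken from the paper): only the ancestors
    and the subtree of target conflict with the operation.
    """
    return {n.name for n in ancestors(target) + subtree(target)}


def build_example():
    """Build a small hypothetical hierarchy: root -> A (A1, A2), B (B1)."""
    root = Node("root")
    a = root.add_child("A")
    b = root.add_child("B")
    a.add_child("A1")
    a.add_child("A2")
    b.add_child("B1")
    return root, a, b
```

Under this assumed rule, a global drain corresponds to `drain_set(root)`, which covers every node, whereas `drain_set(a)` leaves `B` and `B1` free to keep running while the operation on `A` proceeds.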