Michael Reiter, University of North Carolina – June 2012

WACCO: A Wide-Area Cluster-Consistent Object Store

This project proposes to construct a system called WACCO, short for “Wide-Area Cluster-Consistent Objects”. WACCO manages access to stateful, deterministic objects over a logically tree-structured overlay network of proxies that is arranged to respect geography; i.e., neighbors in the tree tend to be geographically close or, more to the point, enjoy low latency between them. Each client is assigned to a nearby proxy, through which it accesses objects, and object access is governed by a protocol that offers a novel type of consistency that we dub cluster consistency. Cluster consistency is strong: it ensures sequential consistency, a consistency condition originally conceived for shared-memory systems, and it also ensures that clusters of concurrent reads see the most recent preceding update to the object they read.
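
Sequential consistency, one half of cluster consistency, requires that some single total order of all operations exists which preserves each client's own program order and in which every read returns the value of the most recent preceding write to its object. The following is a minimal brute-force sketch of that condition only, for tiny hand-written histories. The function name and the `("w"|"r", object, value)` operation encoding are hypothetical, and the additional cluster guarantee for concurrent reads is not modeled, since it depends on real-time ordering that these histories do not record.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical encoding: ("w", obj, value) is a write of `value` to `obj`;
# ("r", obj, value) is a read of `obj` that returned `value`.
Op = Tuple[str, Hashable, object]


def is_sequentially_consistent(
    histories: Dict[str, List[Op]], initial: Optional[object] = None
) -> bool:
    """Search for one interleaving of all clients' operations that keeps each
    client's program order and in which every read returns the latest value
    written to its object (or `initial` if nothing was written yet).
    Exponential in history size; suitable only for tiny examples."""
    clients = list(histories)

    def search(positions: Dict[str, int], state: Dict[Hashable, object]) -> bool:
        if all(positions[c] == len(histories[c]) for c in clients):
            return True  # every operation placed in a legal total order
        for c in clients:
            i = positions[c]
            if i == len(histories[c]):
                continue  # this client has no operations left
            kind, obj, value = histories[c][i]
            if kind == "r" and state.get(obj, initial) != value:
                continue  # this read cannot legally come next
            next_state = dict(state)
            if kind == "w":
                next_state[obj] = value
            next_positions = dict(positions)
            next_positions[c] = i + 1
            if search(next_positions, next_state):
                return True
        return False

    return search({c: 0 for c in clients}, {})


if __name__ == "__main__":
    # B sees x go from unset to 1: explainable by one total order.
    print(is_sequentially_consistent(
        {"A": [("w", "x", 1)], "B": [("r", "x", None), ("r", "x", 1)]}))  # True
    # Each client misses the other's earlier write: no total order fits.
    print(is_sequentially_consistent(
        {"A": [("w", "x", 1), ("r", "y", None)],
         "B": [("w", "y", 1), ("r", "x", None)]}))  # False
```

The second history is the classic store-buffering pattern: any total order must place one of the two writes before the other client's read, so at least one read would have to return 1.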

Scalability of services implemented using WACCO is achieved through two strategies. First, WACCO uses the logical tree structure of the overlay to aggregate read demand, allowing the response to one read to answer others. As a result, under high read concurrency, the vast majority of reads are never propagated to the location of the object; instead, most are paused in the tree awaiting others to complete, from which the return result can be “borrowed.” Second, WACCO employs migration to change dynamically where each object resides. This lets an object move closer to demand as demand fluctuates, e.g., due to diurnal patterns.
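
The read-aggregation idea can be illustrated with request coalescing along a tree. In the minimal sketch below, each proxy keeps at most one read per object in flight toward the object's location; any further reads arriving at that proxy wait and borrow the in-flight read's result. The `Proxy` class, its methods, and the simplification that the object resides at the root (or on the path to it) are all hypothetical. Writes, and the rules WACCO would need to decide which reads may safely share a response under cluster consistency, are deliberately omitted.

```python
from typing import Callable, Dict, List, Optional


class Proxy:
    """Hypothetical node in a tree overlay that coalesces concurrent reads."""

    def __init__(self, name: str, parent: Optional["Proxy"] = None) -> None:
        self.name = name
        self.parent = parent
        self.objects: Dict[str, object] = {}  # objects hosted at this proxy
        # obj_id -> callbacks waiting on the single in-flight read
        self.pending: Dict[str, List[Callable[[object], None]]] = {}
        self.forwarded = 0  # reads this proxy sent to its parent

    def read(self, obj_id: str, callback: Callable[[object], None]) -> None:
        if obj_id in self.pending:
            # A read for obj_id is already in flight: pause this one and
            # let it borrow that read's response.
            self.pending[obj_id].append(callback)
            return
        if obj_id not in self.objects and self.parent is None:
            raise LookupError(f"{obj_id!r} not found on the path to the root")
        self.pending[obj_id] = [callback]
        if obj_id in self.objects:
            return  # answered later, when the host calls serve()
        self.forwarded += 1
        self.parent.read(obj_id, lambda value: self._deliver(obj_id, value))

    def _deliver(self, obj_id: str, value: object) -> None:
        # Answer every read that was paused at this proxy.
        for callback in self.pending.pop(obj_id, []):
            callback(value)

    def serve(self, obj_id: str) -> None:
        """Host side: answer all reads pending at the object's location."""
        self._deliver(obj_id, self.objects[obj_id])


if __name__ == "__main__":
    root = Proxy("root")
    root.objects["x"] = 42
    a, b = Proxy("a", root), Proxy("b", root)
    leaf1, leaf2 = Proxy("leaf1", a), Proxy("leaf2", a)
    results: List[object] = []
    for proxy, n in [(leaf1, 4), (leaf2, 2), (b, 3)]:
        for _ in range(n):
            proxy.read("x", results.append)
    print(len(root.pending["x"]))  # 2: one aggregated read each from a and b
    root.serve("x")
    print(len(results), set(results))  # 9 {42}
```

Nine client reads reach the object's location as only two requests, because each interior proxy forwards a single read and fans the response back out to everything that queued behind it.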
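
The proposal does not say how WACCO decides where an object should move. One plausible policy, shown here purely as a hypothetical sketch, places the object at the proxy that minimizes demand-weighted latency and migrates only when the improvement exceeds a threshold, so the object does not thrash between sites. The function names, the `min_gain` parameter, and the latency table are illustrative assumptions, not WACCO's actual mechanism.

```python
from typing import Dict

# Hypothetical latency table: latency[a][b] is the one-way latency (ms)
# between proxies a and b.
Latency = Dict[str, Dict[str, float]]


def placement_cost(host: str, demand: Dict[str, int], latency: Latency) -> float:
    """Demand-weighted latency if the object lived at `host`."""
    return sum(count * latency[host][proxy] for proxy, count in demand.items())


def choose_home(current: str, demand: Dict[str, int], latency: Latency,
                min_gain: float = 0.2) -> str:
    """Return where the object should live given recently observed demand.
    Migrate only if the best candidate cuts cost by at least `min_gain`
    (a fraction of the current cost); otherwise stay put."""
    best = min(latency, key=lambda host: placement_cost(host, demand, latency))
    current_cost = placement_cost(current, demand, latency)
    best_cost = placement_cost(best, demand, latency)
    if current_cost - best_cost >= min_gain * current_cost:
        return best
    return current


if __name__ == "__main__":
    latency = {
        "us": {"us": 0, "eu": 80, "asia": 150},
        "eu": {"us": 80, "eu": 0, "asia": 120},
        "asia": {"us": 150, "eu": 120, "asia": 0},
    }
    # Demand peaks in Europe, then shifts to Asia later in the day.
    print(choose_home("us", {"us": 10, "eu": 50, "asia": 5}, latency))  # eu
    print(choose_home("eu", {"us": 5, "eu": 5, "asia": 60}, latency))   # asia
    # A small edge for "eu" does not justify moving away from "us".
    print(choose_home("us", {"us": 10, "eu": 11, "asia": 0}, latency))  # us
```

The threshold trades responsiveness to shifting demand against the cost of moving object state; a real deployment would also have to keep the consistency protocol correct while an object is in transit.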

WACCO was initially conceived to support global-scale services such as content distribution networks (CDNs) while ensuring much greater responsiveness to data updates than existing designs allow. Accordingly, WACCO’s design places a premium on supporting both frequent updates and widespread concurrent reads on a per-object basis. The proposed work includes implementing WACCO and evaluating it in the CDN domain, as well as exploring its use for applications running across geographically distributed datacenters.