Eamonn Keogh, UC Riverside – August 2018

Time Series Snippets: A New Analytics Primitive with applications to IoT Edge Computing

While most of today’s always-connected tech devices take advantage of cloud computing, many Internet of Things (IoT) developers increasingly understand the benefits of doing more analytics on the devices themselves, a philosophy known as edge computing. By performing analytic tasks directly on the sensor, edge computing can drastically reduce the bandwidth, cloud processing, and cloud storage needed.

However, even if taken to the extreme, edge computing will occasionally have to report some summary data to a central server. Thus, a common type of IoT analytical query is essentially “Send me some representative/typical data.” This query might be issued by a human attempting to understand an unexpected event at a manufacturing plant, or it might be issued by an algorithm as a subroutine in some higher-level analytics. In either case, the problem of finding representative time series subsequences has not been solved despite the ubiquity of time series in almost all human endeavors, and especially in IoT domains.

In this proposal, Prof Eamonn Keogh argues for a new definition of representative patterns called time series snippets. In many domains, time series snippets allow an extreme form of data reduction; instead of transmitting many megabytes of data from each sensor per hour, it may be possible to transmit just a few kilobytes, containing a handful of snippets with their metadata. Moreover, the proposal will attempt to show that many downstream analytic tasks, including classification, anomaly detection, and monitoring, can greatly benefit from reasoning about the snippets rather than working with the raw data.
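To make the idea of a "representative subsequence" concrete, here is a minimal sketch that picks the subsequence closest, on average, to every other subsequence of the series (a simple medoid notion of representativeness). This is an illustrative simplification, not the actual snippet definition from the proposal; the function name and the brute-force pairwise comparison are assumptions for illustration only.

```python
import numpy as np

def most_representative_subsequence(ts, m):
    """Return (index, subsequence) for the length-m subsequence of ts
    that minimizes the total distance to all other subsequences.
    A toy 'medoid' stand-in for the snippet concept, O(n^2 * m)."""
    subs = np.array([ts[i:i + m] for i in range(len(ts) - m + 1)])
    # z-normalize each subsequence so we compare shape, not offset/scale
    mu = subs.mean(axis=1, keepdims=True)
    sd = subs.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # guard against constant (flat) subsequences
    z = (subs - mu) / sd
    # pairwise Euclidean distances between all z-normalized subsequences
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    best = int(d.sum(axis=1).argmin())
    return best, ts[best:best + m]
```

In the edge-computing setting described above, a device could transmit only the returned subsequence (plus its index and length as metadata) instead of the full series.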

Stratos Idreos, Harvard – January 2018

Faster, Cheaper, and Predictable NoSQL Storage and Analytics

NoSQL key-value stores are at the heart of numerous modern data-driven applications. Prof Stratos Idreos' research shows that all existing designs are sub-optimal. His group is working towards a completely new data system design, CrimsonDB, which 1) is 10x faster than existing systems and gets faster as data grows, 2) requires fewer hardware resources (memory) to produce the same or better results, and 3) adapts automatically to new hardware and workloads, requiring no human in the loop to tune the system.

The innovation comes from a full breakdown of the "design space" of LSM-trees, the primary data structure in log-based storage. Through this breakdown, they formalize the design space so that they can reason algorithmically, and with closed-form formulas, about possible tunings, as well as discover optimization opportunities.
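To give a flavor of what closed-form reasoning over the LSM-tree design space looks like, here is a back-of-the-envelope cost model comparing the two classic compaction policies, leveling and tiering. These are simplified textbook formulas, not CrimsonDB's actual model; the function name and parameters are assumptions for illustration.

```python
import math

def lsm_costs(data_pages, buffer_pages, size_ratio, policy="leveling"):
    """Rough per-operation I/O costs for an LSM-tree.
    Returns (levels, write_amplification, worst_case_point_read),
    each measured in disk accesses. Simplified: ignores Bloom
    filters, fence pointers, and caching."""
    levels = max(1, math.ceil(math.log(data_pages / buffer_pages, size_ratio)))
    if policy == "leveling":
        # each entry is rewritten ~size_ratio times within every level
        write_amp = size_ratio * levels
        point_read = levels                # one sorted run per level to probe
    elif policy == "tiering":
        write_amp = levels                 # each entry written once per level
        point_read = size_ratio * levels   # up to size_ratio runs per level
    else:
        raise ValueError(f"unknown policy: {policy}")
    return levels, write_amp, point_read
```

Plugging in numbers makes the tuning trade-off visible: leveling pays more on writes to keep reads cheap, tiering does the opposite, and the size ratio interpolates between the two extremes, which is exactly the kind of knob a formalized design space lets one tune algorithmically.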