NetApp Faculty Fellowships (NFF) encourage leading-edge research in storage and data management and foster relationships between academic researchers and engineers.
Please see below for a current list of ATG’s NFFs.
Zoned namespace SSDs: Challenges and Opportunities
Zoned Namespaces (ZNS) are a mechanism proposed in the NVM Express Workgroup to provide features and functionality similar to those of Open-Channel SSDs, but fully integrated with the NVMe model, using a zone concept similar to that in the ZAC/ZBD extensions for SMR disks. The goals of this research are to investigate applications for ZNS SSDs, in particular (a) RAID-like functionality over ZNS SSDs, (b) strategies for file system support for ZNS, and (c) interfaces and strategies for direct application usage of ZNS SSDs.
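As a rough illustration of the zone concept, the sketch below models a single zone under the commonly described sequential-write rule: data may only be written at the zone's write pointer, and a zone must be reset before it can be rewritten. The names `Zone`, `ZoneError`, `append`, `write_at`, and `reset` are hypothetical and chosen for this example; they are not part of the NVMe interface or of any fellowship project's code.

```python
class ZoneError(Exception):
    """Raised when a write violates the toy zone's rules (hypothetical name)."""


class Zone:
    """Toy model of one zone: writes are accepted only at the write pointer."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.write_pointer = 0
        self.blocks = [None] * capacity_blocks

    def append(self, data):
        """Write one block at the write pointer and return its offset."""
        if self.write_pointer >= self.capacity:
            raise ZoneError("zone is full; reset it before writing again")
        offset = self.write_pointer
        self.blocks[offset] = data
        self.write_pointer += 1
        return offset

    def write_at(self, offset, data):
        """Reject any write that does not target the write pointer."""
        if offset != self.write_pointer:
            raise ZoneError("non-sequential write rejected")
        return self.append(data)

    def reset(self):
        """Discard the zone's contents and rewind the write pointer."""
        self.blocks = [None] * self.capacity
        self.write_pointer = 0
```

Under this model, anything layered on top (a RAID-like scheme, a file system, or an application) has to organize its updates as sequential appends plus whole-zone resets, which is the kind of constraint the research topics above would need to work within.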
Modern storage systems have been developed for decades with the security-critical foundation provided by the operating system (OS). However, they are still vulnerable to malware attacks and software defects. Adversaries can obtain OS kernel privileges or leverage software vulnerabilities to bypass, terminate, or destroy current malware detection and defense systems. For instance, encryption ransomware accounts for more than half of all malware attacks today, but current software-based defense systems often fail to enable the victims to say no to ransom collectors. Therefore, it is natural to utilize hardware techniques, which have been proven effective in defending against malware attacks.
Time Series Snippets: A New Analytics Primitive with applications to IoT Edge Computing
While most of today’s always-connected tech devices take advantage of cloud computing, many Internet of Things (IoT) developers increasingly understand the benefits of doing more analytics on the devices themselves, a philosophy known as edge computing. By performing analytic tasks directly on the sensor, edge computing can drastically reduce the bandwidth, cloud processing, and cloud storage needed.
Accelerating Internet of Things Data Analytics through Scalable Time-Series Representation Learning
Kernel methods, a class of machine learning algorithms for pattern recognition, have shown a great deal of promise in the analysis of complex, real-world data. However, kernel methods remain largely unexplored in the analysis of time-varying measurements (i.e., time series), which are becoming increasingly prevalent across scientific disciplines, industrial settings, and Internet of Things (IoT) applications. Until now, research in time-series analysis has focused on designing methods for three components, namely, (i) representation methods; (ii) comparison functions; and (iii) indexing mechanisms. Unfortunately, these components have typically been investigated and developed independently, resulting in methods that are incompatible with each other. The lack of a unified approach has hindered progress towards scalable analytics over massive time-series collections.
Remzi’s research group proposed an investigation of Cloud-Native Systems (CNSs) as a new paradigm for systems design and implementation in the era of the cloud. Rather than assuming raw hardware (CPUs, memory, disks, and networks) as the basic building block for systems and services, CNSs assume the presence of the cloud and study how systems should change in this new era. The group specifically focuses on two CNS examples: a cloud-native local file system and a cloud-native persistent key-value store.
Faster, Cheaper, and Predictable NoSQL Storage and Analytics
NoSQL key-value stores are at the heart of numerous modern data-driven applications. Prof. Stratos Idreos’s research shows that all existing designs are sub-optimal. His group is working towards a completely new data system design, CrimsonDB, which 1) is 10x faster than existing systems and gets faster for bigger data, 2) requires fewer hardware resources (memory) to produce the same or better results, and 3) adapts automatically to new hardware and workloads, requiring no human in the loop to tune the system.
Provisioning and managing SSD/NVM-based disk caching in derivative clouds
Flash-based non-volatile storage devices (SSDs) have been widely used for disk caching in data-center infrastructure setups. With virtualization-enabled hosting, SSD cache partitioning and provisioning is an important management question for resource controllers that must ensure disk I/O performance guarantees. While this problem has received considerable attention in the server-side cache management domain, the following gaps remain and we aim to explore them:
Prioritizing Attention in High-Volume, High-Dimensional Data Streams
The rise of Big Data infrastructure has supercharged the collection of high-volume, highly heterogeneous data. However, this data is increasingly too big for any cost-effective manual inspection, and, in practice, much of this “fast data” is only accessed in exceptional cases (e.g., to debug a failure). As a result, important behaviors often go unnoticed, leading to inefficiency, wasted resources, and limited visibility into complex application deployments.
Maintaining locality in a space-constrained file system
Donald Porter (UNC), Michael Bender and Rob Johnson (Stony Brook), and Martin Farach-Colton (Rutgers) have teamed up on a collaborative research project entitled “Maintaining locality in a space-constrained file system.”