35th International Conference
on Massive Storage Systems
and Technology (MSST 2019)
May 20 — 24, 2019

Sponsored by Santa Clara University,
School of Engineering


Since the conference was founded in 1974 by the leading national laboratories, MSST has been a venue for massive-scale storage system designers and implementers, storage architects, researchers, and vendors to share best practices and discuss building and securing the world's largest storage systems for high-performance computing, web-scale systems, and enterprises.
    



Hosted at
Santa Clara University
Santa Clara, CA


2019 Conference


MSST 2019, as is our tradition, will focus on distributed storage system technologies, including persistent memory, new memory technologies, long-term data retention (tape, optical disks...), solid state storage (flash, MRAM, RRAM...), software-defined storage, OS- and file-system technologies, cloud storage, big data, and data centers (private and public), with an emphasis on current challenges and future trends in storage technologies.

MSST 2019 will include a day of tutorials, two days of invited papers, and two days of peer-reviewed research papers. The conference will be held, once again, on the beautiful campus of Santa Clara University, in the heart of Silicon Valley.

Santa Clara University



Subscribe to our email list for (infrequent) information along the way.



Registration is open!


Registration Fees
Tutorial Day (Monday): $85
Invited Track (Tuesday, Wednesday): $170
Research Track (Thursday, Friday): $170

Register here



Logistics


Venue: Locatelli Center on the Santa Clara University Campus (map)

Parking: Daily and multi-day permits are available for purchase at the main gate at 500 El Camino Real ($8/day); the attendant will direct you to the Locatelli Center/Leavey Center parking lot. Daily permits may also be purchased for $5 at an unmanned kiosk in the parking lot. (map)

Driving Directions (to the campus)

Walking Directions (on campus)

Hotels near the campus
(To reduce your attendance fees, there is no
"conference hotel", so you can choose where to stay.)


Tutorials, Monday, May 20th
(Additional Parallel Sessions May Be Added)
7:30 — 9:00 Registration / Breakfast
9:00 — 9:05 Introduction
Sean Roberts, Tutorial Chair
9:05 — 12:30 IME Storage System (10:30 — 11:00 Break)
Paul Nowoczynski, DDN
DDN’s IME (aka "the Infinite Memory Engine") is an all-NAND-flash storage system that acts as a high-performance storage tier in a user’s overall storage environment. IME has been built from the ground up as a highly available, clustered storage technology that provides millions of IOPS to applications along with best-case media endurance. IME has recorded the highest overall performance for the most demanding data workloads on the independent IO500 list.

The tutorial will focus on IME’s tiering ability, along with performance demonstrations under difficult workload scenarios. Configuration, usage, and monitoring of IME will all be covered on a live cluster, giving attendees a reasonable sense of an IME environment’s look and feel.
12:30 — 1:30 Lunch
1:30 — 5:00 Expanding the World of Heterogeneous Memory Hierarchies: The Evolving Non-Volatile Memory Story (3:00 — 3:30 Break)
Bill Gervasi, Nantero
Emerging technologies and storage options are challenging the traditional system architecture hierarchies and giving designers new variables to consider. Existing options include module-level solutions such as 3D XPoint and NVDIMMs, which bring data persistence onto the memory channel, each with a variety of tradeoffs in cost, performance, and mechanical considerations. Emerging options include new non-volatile memory technologies capable of addressing the limitations of current solutions with lower latency and a predictable time to data persistence, a critical factor for high-reliability data processing applications. Meanwhile, an increasing number of systems are moving toward distributed fabric-based backbones with heterogeneous computing elements, spanning not only artificial intelligence and deep learning but also in-memory computing and non-von Neumann processing.

This tutorial is targeted at system architects who appreciate the complexity of a confusing number of options and would like some insight into managing that complexity to solve real-world problems. Some of the standards in progress, such as the NVDIMM-P, DDR5 NVRAM, and Gen-Z specifications, are new, so this is also an opportunity to learn about future developments. The tutorial will allocate time for attendees to share their own system integration stories, making it a joint learning experience for all.


Invited Track, Tuesday, May 21st
(Preliminary Program)
7:30 — 8:30 Registration / Breakfast
8:30 — 9:30 Keynote
(Session Chair: Meghan McClelland)
Mark Kryder
9:30 — 10:00 Break
10:00 — 12:00 Storage in the Age of AI
Storage, Privacy, and Security in the Age of AI
Aleatha Parker-Wood, Symantec
Machine Learning and Algorithmic Privacy at Humu
Nisha Talagala
I/O for Deep Learning at Scale
Quincey Koziol, National Energy Research Scientific Computing Center (NERSC)
Deep Learning is revolutionizing the fields of computer vision, speech recognition, and control systems. In recent years, a number of scientific domains (climate, high-energy physics, nuclear physics, astronomy, cosmology, etc.) have explored applications of Deep Learning to tackle a range of data analytics problems. As one attempts to scale Deep Learning to analyze massive scientific datasets on HPC systems, data management becomes a key bottleneck. This talk will explore leading scientific use cases of Deep Learning in climate, cosmology, and high-energy physics on NERSC and OLCF platforms, enumerate I/O challenges, and speculate about potential solutions.
12:00 — 1:15 Lunch
1:15 — 2:45 Computational Memory and Storage for AI
Changing Storage Architecture will require new Standards
Mark Carlson, Toshiba Memory Corporation
Stephen Bates, Eideticom
Storage in the New Age of AI/ML
Young Paik, Samsung (bio)
One of the hottest topics today is Artificial Intelligence/Machine Learning. Most of the attention has been on the enormous increases in computational power now possible with GPU/ASIC servers. Much less time has been spent on what is arguably just as important: the storage of the data that feeds these hungry beasts. Many technologies may be used (e.g., PCIe Gen4, erasure coding, smart storage, SmartNICs), but in designing new storage architectures it is important to recognize where they may not work well together. Young will describe some of the characteristics of machine learning systems, the methods used to process the data that feeds them, and what considerations should go into designing their storage.
2:45 — 3:00 Break
3:00 — 4:30 User Requirements of Storage at Scale
Amedeo Perazzo, SLAC National Accelerator Laboratory
NWSC Storage: A look at what users need
Chris Hoffman, National Center for Atmospheric Research
Understanding Storage System Challenges for Parallel Scientific Simulations
Dr. Bradley Settlemyer, Los Alamos National Laboratory (bio)
Computer-based simulation is critical to the study of physical phenomena that are difficult or impossible to observe directly. Examples include asteroid collisions, chaotic interactions in climate models, and massless particle interactions. Long-running simulations, such as those running on Los Alamos National Laboratory's Trinity supercomputer, generate many thousands of snapshots of the simulation state that are written to stable storage for fault tolerance and visualization/analysis. For extreme-scale simulation codes, such as the Vector Particle-in-Cell code (VPIC), improving the efficiency of storage system access is critical to accelerating scientific insight and discovery. In this talk we will discuss the structure of the VPIC software architecture, several storage system use cases associated with the VPIC simulation code, and the challenges of parallel access to the underlying storage system. We will not present solutions but instead focus on the underlying requirements of the scientific use cases, including fault tolerance and emerging data analysis workloads that directly accelerate scientific discovery.
4:30 — 4:40 Break
4:40 — 5:10 Panel: Addressing the gaps between storage needs and haves
What gaps remain between requirements and solutions in the new, AI-driven age of massive storage?
5:10 — 6:00 Lightning Talks
Sign-up board will be available all day.


Invited Track, Wednesday, May 22nd
(Preliminary Program)
7:30 — 8:30 Registration / Breakfast
8:30 — 9:30 Keynote
More than Storage
Margo Seltzer, University of British Columbia
The incredible growth and success that our field has experienced over the past half century has had the side effect of transforming systems into a constellation of siloed fields; storage is one of them. I'm going to make the case that we should return to a broad interpretation of systems, undertake bolder, higher-risk projects, and be intentional about how we interact with other fields. I'll support the case with examples from several research projects that embody this approach.
9:30 — 10:00 Break
10:00 — 12:00 Resilience at Scale
Session Chair: John Bent, DDN
Asaf Cidon, Stanford University
Paul D. Manno, Georgia Tech
Lance Evans, Cray, Inc.
Cyril Guyot, Western Digital
Sadaf Alam
12:00 — 1:15 Lunch
1:15 — 3:15 Next Generation Storage Software
(Session Chair: Meghan McClelland)
How are new algorithms and storage technologies addressing the new requirements of AI and Big Science? How are virtual file systems bridging the gap between big repositories and usability?
CERN's Virtual File System for Global-Scale Software Delivery
Jakob Blomer, CERN
Jeff Denworth, VAST Data
Zach Brown / Harriet Coverston, Versity
Grand Unified File Index: A Development, Deployment, and Performance Update
Dominic Manno, Los Alamos National Laboratory
3:15 — 3:30 Break
3:30 — 5:00 Future Storage Systems
Moore's Law coming to an end has parallels in the storage industry. What comes next? What lies beyond 10 years with respect to new non-volatile media? What software approaches can help sustain progress toward peak performance and density?
Karin Strauss, Microsoft Research
The Future of Storage Systems – a Dangerous Opportunity
Rob Peglar, Advanced Computation and Storage, LLC (bio)
Nantero NRAM carbon nanotube memory changes the foundation for next generation storage concepts
Bill Gervasi, Nantero, Inc. (bio)
Nantero NRAM, built from carbon nanotubes, defines a new class of memory class storage (MCS) devices with the performance of DRAM and centuries-long data persistence. How does the introduction of MCS change how we think of data storage hierarchies? When main memory acts as a self-serving storage layer, traditional concepts of checkpointing to slower media or maintaining energy stores for backup mechanisms evaporate, and a new model for data integrity emerges. MCS can clearly replace the volatile caches used for acceleration in mass storage media as well, and when the cache size is decoupled from considerations like the capacity of energy stores, designers are able to rethink their assumptions about cost-versus-performance calculations. With a growing trend toward fabric-based system bus architectures, including Gen-Z, OpenCAPI, CCIX, etc., the timing is right for the introduction of new paradigms for data distribution that take advantage of data persistence. This talk describes Nantero NRAM’s technical details, touches on the new JEDEC standards effort for MCS devices, and explores use cases for MCS in massive storage systems.
5:00 — 5:10 Break
5:10 — 6:00 Lightning Talks
Sign-up board will be available all day.


Research Track, Thursday, May 23rd
(Preliminary Program)
XORInc: Optimizing Data Repair and Update for Erasure-Coded Systems with XOR-Based In-Network Computation
Yingjie Tang, Fang Wang, Yanwen Xie and Xuehai Tang
Huazhong University of Science and Technology, Institute of Information Engineering, Chinese Academy of Sciences
Efficient Encoding and Reconstruction of HPC Datasets for Checkpoint/Restart
Jialing Zhang, Xiaoyan Zhuo, Aekyeung Moon, Hang Liu, Seung Woo Son
University of Massachusetts Lowell
CDAC: Content-Driven Deduplication-Aware Storage Cache
Yujuan Tan, Jing Xie, Congcong Xu, Zhichao Yan, Hong Jiang, Yajun Zhao, Min Fu, Xianzhang Chen, Duo Liu and Wen Xia
Chongqing University, HP, University of Texas Arlington, Sangfor, Harbin Institute of Technology
Data deduplication, a proven technology for effective data reduction in backup and archive storage systems, also shows promise for increasing the logical capacity of storage caches by removing redundant data. However, our in-depth evaluation of existing deduplication-aware caching algorithms reveals that, while they do improve hit ratios compared to caching algorithms without deduplication, especially when the cache block size is set to 4KB, their hit ratios drop significantly when the block size is larger than 4KB, a clear trend for modern storage systems. A slight increase in hit ratios due to deduplication may not improve overall storage performance because of the high overhead created by deduplication.

To address this problem, in this paper we propose CDAC, a Content-driven Deduplication-Aware Cache, which focuses on exploiting the blocks’ content redundancy and their intensity of content sharing among source addresses in cache management strategies. We have implemented CDAC based on LRU and ARC algorithms, called CDAC-LRU and CDAC-ARC respectively. Our extensive experimental results show that CDAC-LRU and CDAC-ARC outperform the state-of-the-art deduplication-aware caching algorithms, D-LRU and D-ARC, by up to 19.49X in read cache hit ratio, with an average of 1.95X under real-world traces when the cache size ranges from 20% to 80% of the working set size and the block size ranges from 4KB to 64 KB.
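For readers unfamiliar with the underlying technique, the sketch below illustrates the general idea of deduplication-aware caching that CDAC builds on: blocks are indexed by a content fingerprint so that many logical addresses can share a single cached copy. It is a minimal sketch with a plain LRU policy and hypothetical names, not the CDAC-LRU/CDAC-ARC algorithms from the paper.

    import hashlib
    from collections import OrderedDict

    class DedupAwareCache:
        """Toy deduplication-aware block cache with plain LRU eviction (illustrative only)."""
        def __init__(self, capacity_blocks):
            self.capacity = capacity_blocks
            self.addr_to_fp = {}            # logical block address -> content fingerprint
            self.blocks = OrderedDict()     # fingerprint -> cached block (LRU order)

        def get(self, addr):
            fp = self.addr_to_fp.get(addr)
            if fp is None or fp not in self.blocks:
                return None                 # cache miss
            self.blocks.move_to_end(fp)     # LRU touch
            return self.blocks[fp]

        def put(self, addr, data):
            fp = hashlib.sha256(data).hexdigest()
            if fp not in self.blocks:       # only unique content consumes a cache slot
                while len(self.blocks) >= self.capacity:
                    victim, _ = self.blocks.popitem(last=False)
                    self.addr_to_fp = {a: f for a, f in self.addr_to_fp.items() if f != victim}
                self.blocks[fp] = data
            else:
                self.blocks.move_to_end(fp)
            self.addr_to_fp[addr] = fp

    cache = DedupAwareCache(capacity_blocks=2)
    cache.put(0, b"same"); cache.put(1, b"same"); cache.put(2, b"other")
    print(cache.get(1))                     # hit: address 1 shares the cached copy of "same"
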
Scalable QoS for Distributed Storage Clusters using Dynamic Token Allocation
Yuhan Peng, Qingyue Liu and Peter Varman
Rice University
The paper addresses the problem of providing performance QoS guarantees in a clustered storage system. Multiple related storage objects are grouped into logical containers called buckets, which are distributed over the servers based on the placement policies of the storage system. QoS is provided at the level of buckets. The service credited to a bucket is the aggregate of the IOs received by its objects at all the servers. The service depends on individual time-varying demands and congestion at the servers.

We present a token-based, coarse-grained approach to providing IO reservations and limits to buckets. We propose pShift, a novel token allocation algorithm that works in conjunction with token-sensitive scheduling at each server to control the aggregate IOs received by each bucket on multiple servers. pShift determines the optimal token distribution based on the estimated bucket demands and server IOPS capacities. Compared to existing approaches, pShift has far smaller overhead, and can be accelerated using parallelization and approximation. Our experimental results show that pShift provides accurate QoS among the buckets with different access patterns, and handles runtime demand changes well.
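As a rough, hypothetical sketch of the kind of demand-driven token allocation described above (not pShift itself, whose optimal allocation and scheduling are more involved), the snippet below splits each bucket's IOPS reservation across servers in proportion to its estimated demand there and scales back on overloaded servers; all names and numbers are illustrative.

    def allocate_tokens(reservations, demand, capacity):
        """Toy token allocation.
        reservations: {bucket: reserved IOPS}
        demand:       {bucket: {server: estimated IOPS demand}}
        capacity:     {server: IOPS capacity}
        Returns {server: {bucket: tokens}}."""
        tokens = {s: {} for s in capacity}
        for b, resv in reservations.items():
            total = sum(demand[b].values()) or 1
            for s, d in demand[b].items():
                tokens[s][b] = resv * d / total      # demand-proportional split
        for s, cap in capacity.items():              # cap: scale down on overloaded servers
            used = sum(tokens[s].values())
            if used > cap:
                scale = cap / used
                for b in tokens[s]:
                    tokens[s][b] *= scale
        return tokens

    print(allocate_tokens({"bkt1": 1000, "bkt2": 500},
                          {"bkt1": {"s1": 300, "s2": 100}, "bkt2": {"s1": 50, "s2": 150}},
                          {"s1": 800, "s2": 800}))
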
Tiered-ReRAM: A Low Latency and Energy Efficient TLC Crossbar ReRAM Architecture
Yang Zhang, Dan Feng, Wei Tong, Jingning Liu, Chengning Wang and Jie Xu
Huazhong University of Science and Technology
CeSR: A Cell State Remapping Strategy to Reduce Raw Bit Error Rate of MLC NAND Flash
Yutong Zhao, Wei Tong, Jingning Liu, Dan Feng and Hongwei Qin
Huazhong University of Science and Technology
Retention errors and program interference errors have been recognized as the two main types of NAND flash errors. Since NAND flash cells in the erased state, which hold the lowest threshold voltage, are least likely to cause program interference and retention errors, existing schemes preprocess the raw data to increase the ratio of cells in the erased state. However, such schemes do not effectively decrease the ratio of cells at the highest threshold voltage, which are most likely to cause program interference and retention errors. In addition, we note that the dominant error type of flash varies with data hotness. Retention errors are not much of a concern for frequently updated hot data, while cold data that is rarely updated must contend with growing retention errors as P/E cycles increase. Furthermore, the effects of these two types of errors on the same cell partially counteract each other. Given the observation that retention errors and program interference errors are both cell-state-dependent, this paper presents a cell state remapping (CeSR) strategy based on the error tendencies of data with different hotness. For different types of data segments, CeSR adopts different flipping schemes to remap the cell states, achieving the least error-prone data pattern for written data of each hotness class. Evaluation shows that the proposed CeSR strategy can reduce the raw bit error rates of hot and cold data by up to 20.30% and 67.24%, respectively, compared with the state-of-the-art NRC strategy.
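The toy below only illustrates the flavor of a hotness-dependent flipping scheme: it tries a few bit-flip patterns on a data segment and keeps the one that best avoids the error-prone cell states. The MLC state encoding and the candidate flips are assumptions for illustration and are not the CeSR schemes evaluated in the paper.

    from itertools import product

    ERASED, HIGHEST = 0b11, 0b00   # assumed MLC state encoding (lowest / highest voltage)

    def cell_states(data, flip_low, flip_high):
        """Split each byte into 2-bit MLC cell states, optionally flipping the
        low/high bit of every cell (a toy stand-in for a remapping scheme)."""
        states = []
        for byte in data:
            for shift in (6, 4, 2, 0):
                s = (byte >> shift) & 0b11
                if flip_low:  s ^= 0b01
                if flip_high: s ^= 0b10
                states.append(s)
        return states

    def choose_remapping(data, hot):
        """Hot data avoids the interference-prone highest-voltage state;
        cold data maximizes cells left in the erased state."""
        best, best_score = None, None
        for flip_low, flip_high in product((False, True), repeat=2):
            st = cell_states(data, flip_low, flip_high)
            score = -st.count(HIGHEST) if hot else st.count(ERASED)
            if best_score is None or score > best_score:
                best, best_score = (flip_low, flip_high), score
        return best

    print(choose_remapping(b"\xff\x00\x3c\x3c", hot=True))
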
vPFS+: Managing I/O Performance for Diverse HPC Applications
Ming Zhao and Yiqi Xu
Arizona State University, VMware
Parity-Only Caching for Robust Straggler Tolerance
Mi Zhang, Qiuping Wang, Zhirong Shen and Patrick P. C. Lee
The Chinese University of Hong Kong
Mitigate HDD Fail-Slow by Pro-actively Utilizing System-level Data Redundancy with Enhanced HDD Controllability and Observability
Jingpeng Hao, Yin Li, Xubin Chen and Tong Zhang
Rensselaer Polytechnic Institute
This paper presents a design framework aiming to mitigate occasional HDD fail-slow. Due to their mechanical nature, HDDs may occasionally suffer from spikes of abnormally high internal read retry rates, leading to temporarily significant degradation of speed (especially read latency). Intuitively, one could expect that existing system-level data redundancy (e.g., RAID or distributed erasure coding) may be opportunistically utilized to mitigate HDD fail-slow. Nevertheless, current practice tends to use system-level redundancy merely as a safety net, i.e., reconstructing data sectors via system-level redundancy only after the costly intra-HDD read retry fails. This paper shows that one could much more effectively mitigate occasional HDD fail-slow by more pro-actively utilizing existing system-level data redundancy, in complement to (or even in replacement of) intra-HDD read retry. To enable this, HDDs should support a higher degree of controllability and observability over their internal read retry operations. Assuming a very simple form of enhanced HDD controllability and observability, this paper presents design solutions and a mathematical formulation framework to facilitate the practical implementation of such a pro-active strategy for mitigating occasional HDD fail-slow. Using RAID as a test vehicle, our experimental results show that the proposed design solutions can effectively mitigate RAID read latency degradation even when HDDs suffer from read retry rates as high as 1% or 2%.
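To make the contrast with the read-retry safety net concrete, here is a minimal sketch, assuming a RAID-5-like single-parity stripe and a hypothetical per-disk retry-rate signal, of a read path that reconstructs a slow disk's chunk from its peers instead of waiting out internal retries. It is illustrative only and not the paper's design framework.

    def xor_blocks(blocks):
        """XOR a list of equal-length byte chunks."""
        out = bytearray(len(blocks[0]))
        for b in blocks:
            for i, byte in enumerate(b):
                out[i] ^= byte
        return bytes(out)

    def read_chunk(stripe, target, retry_rate, threshold=0.01):
        """Toy RAID-5 read path: if the target disk's observed retry rate is high,
        reconstruct its chunk from the peers (data + parity) instead of letting the
        disk grind through internal read retries. 'stripe' maps disk -> chunk bytes."""
        if retry_rate[target] < threshold:
            return stripe[target]                       # normal fast-path read
        peers = [chunk for disk, chunk in stripe.items() if disk != target]
        return xor_blocks(peers)                        # reconstruct via parity

    stripe = {"d0": b"\x01\x02", "d1": b"\x04\x08",
              "parity": bytes([0x01 ^ 0x04, 0x02 ^ 0x08])}
    assert read_chunk(stripe, "d0", {"d0": 0.02, "d1": 0.0, "parity": 0.0}) == b"\x01\x02"
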
Accelerating Relative-error Bounded Lossy Compression for HPC Datasets with Precomputation-Based Mechanisms
Xiangyu Zou, Tao Lu, Wen Xia, Xuan Wang, Weizhe Zhang, Sheng Di, Dingwen Tao and Franck Cappello
Harbin Institute of Technology, Marvell Technology Group, Argonne National Laboratory, University of Alabama
Scientific simulations in high-performance computing (HPC) environments are producing vast volumes of data, which may cause a severe I/O bottleneck at runtime and a huge burden on storage space for post-analysis. In this work, we develop efficient precomputation-based mechanisms in the SZ lossy compression framework for HPC datasets. Our mechanisms avoid a costly logarithmic transformation and identify quantization factor values via a fast table lookup, greatly accelerating relative-error-bounded compression with excellent compression ratios. In addition, our mechanisms reduce traversal operations for Huffman decoding, and thus significantly accelerate the decompression process in SZ. Experiments with four well-known real-world scientific simulation datasets show that our solution can improve the compression rate by about 30% and the decompression rate by about 70% in most cases, making our lossy compression strategy a best-in-class choice.
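A rough illustration of the precomputation idea (not the actual SZ implementation): a conservative absolute error bound for each value can be looked up from its floating-point exponent via a precomputed table, avoiding a per-value logarithm. The table granularity and names below are assumptions.

    import struct

    REL_EB = 1e-3   # relative error bound (illustrative)

    # Precompute, for every possible double exponent, a conservative absolute error
    # bound for values whose magnitude falls in [2^(e-1023), 2^(e-1022)).
    ABS_EB_TABLE = [(2.0 ** (e - 1023)) * REL_EB if 0 < e < 2047 else 0.0
                    for e in range(2048)]

    def abs_error_bound(v):
        """Look up a per-value absolute error bound from the exponent bits,
        avoiding a per-value logarithm."""
        bits = struct.unpack("<Q", struct.pack("<d", v))[0]
        exp = (bits >> 52) & 0x7FF
        return ABS_EB_TABLE[exp]

    v = 123.456
    assert abs_error_bound(v) <= abs(v) * REL_EB   # conservative: never exceeds the true bound
    print(abs_error_bound(v), abs(v) * REL_EB)
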
Towards Virtual Machine Image Management for Persistent Memory
Jiachen Zhang, Lixiao Cui, Peng Li, Xiaoguang Liu and Gang Wang
Nankai University
vNVML: An Efficient Shared Library for Virtualizing and Sharing Non-volatile Memories
Chih Chieh Chou, Jaemin Jung, Narasimha Reddy, Paul Gratz and Doug Voigt
Texas A&M University, Hewlett Packard Enterprise


Research Track, Friday, May 24th
(Preliminary Program)
DFPE: Explaining Predictive Models for Disk Failure Prediction
Yanwen Xie, Dan Feng, Fang Wang, Xuehai Tang, Jizhong Han and Xinyan Zhang
Huazhong University of Science and Technology, Chinese Academy of Sciences
Parallel all the time: Plane Level Parallelism Exploration for High Performance SSD
Congming Gao, Liang Shi, Jason Chun Xue, Cheng Ji, Jun Yang and Youtao Zhang
Chongqing University, East China Normal University, City University of Hong Kong, University of Pittsburgh
Solid state drives (SSDs) are constructed with a multi-level parallel organization, including channels, chips, dies, and planes. Among these levels, plane-level parallelism, the last level of parallelism in SSDs, has the strictest restrictions: only operations of the same type that access the same address in different planes can be processed in parallel. In order to maximize access performance, several previous works have exploited plane-level parallelism for host accesses and internal operations of SSDs. However, our preliminary studies show that plane-level parallelism is far from well utilized and can be further improved, because its strict restrictions are hard to satisfy. In this work, a plane-to-die parallel optimization framework is proposed to exploit plane-level parallelism by smartly satisfying these strict restrictions all the time. Achieving this objective poses at least two challenges. First, because host access patterns are complex, receiving multiple same-type requests to different planes at the same time is uncommon. Second, many internal activities, such as garbage collection (GC), may violate the restrictions. To address these challenges, two schemes are proposed in the SSD controller: first, a die-level write construction scheme ensures that each write operation always writes N pages of data; second, a die-level GC scheme activates GC in the unit of all planes in the same die. Combining die-level writes and die-level GC, write accesses from both host write operations and GC-induced valid page movements can be processed in parallel at all times. As a result, the GC cost and average write latency can be significantly reduced. Experimental results show that the proposed framework significantly improves write performance without impacting read performance.
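A minimal sketch of the batching step behind a die-level write construction scheme, assuming a hypothetical controller callback that issues one multi-plane program per die; the same-address restriction and GC handling from the paper are not modeled here.

    PLANES_PER_DIE = 4   # illustrative

    class DieWriteBuffer:
        """Toy die-level write construction: buffer host/GC writes until one page
        per plane is available, then issue them as a single multi-plane program
        so all planes in a die are kept busy in parallel."""
        def __init__(self, flush_fn, planes=PLANES_PER_DIE):
            self.planes = planes
            self.pending = []
            self.flush_fn = flush_fn        # callback: issue a multi-plane program

        def write(self, page):
            self.pending.append(page)
            if len(self.pending) == self.planes:
                self.flush_fn(self.pending)  # one page per plane of the same die
                self.pending = []

    buf = DieWriteBuffer(lambda pages: print("multi-plane program:", pages))
    for p in ["A", "B", "C", "D", "E"]:
        buf.write(p)                         # flushes A-D together; E waits for the next batch
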
SES-Dedup: a Case for Low-Cost ECC-based SSD Deduplication
Zhichao Yan, Hong Jiang and Yujuan Tan
University of Texas-Arlington, Chongqing University
Metadedup: Deduplicating Metadata in Encrypted Deduplication via Indirection
Jingwei Li, Patrick P. C. Lee, Yanjing Ren and Xiaosong Zhang
University of Electronic Science and Technology of China, The Chinese University of Hong Kong
Encrypted deduplication combines encryption and deduplication in a seamless way to provide confidentiality guarantees for the physical data in deduplication storage, yet it incurs substantial metadata storage overhead due to the additional storage of keys. We present a new encrypted deduplication storage system called Metadedup, which suppresses metadata storage by also applying deduplication to metadata. Its idea builds on indirection, which adds another level of metadata chunks that record metadata information. We find that metadata chunks are highly redundant in real-world workloads and hence can be effectively deduplicated. In addition, metadata chunks can be protected under the same encrypted deduplication framework, thereby providing confidentiality guarantees for metadata as well. We evaluate Metadedup through microbenchmarks, prototype experiments, and trace-driven simulation. Metadedup has limited computational overhead in metadata processing, and only adds 6.19% of performance overhead on average when storing files in a networked setting. Also, for real-world backup workloads, Metadedup saves the metadata storage by up to 97.46% at the expense of only up to 1.07% of indexing overhead for metadata chunks.
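The sketch below illustrates the indirection idea in general terms: per-file key/fingerprint records are grouped into metadata chunks, which are fingerprinted and deduplicated like data chunks, so files sharing content also share metadata storage. Chunk sizes, serialization, and names are illustrative assumptions, not Metadedup's actual format.

    import hashlib, json

    class MetaDedupStore:
        """Toy illustration of metadata deduplication via indirection."""
        def __init__(self, records_per_chunk=4):
            self.n = records_per_chunk
            self.meta_chunks = {}      # metadata-chunk fingerprint -> serialized records

        def store_file(self, records):
            """records: list of (data-chunk fingerprint, wrapped key) pairs.
            Returns the file recipe: a short list of metadata-chunk fingerprints."""
            recipe = []
            for i in range(0, len(records), self.n):
                blob = json.dumps(records[i:i + self.n]).encode()
                fp = hashlib.sha256(blob).hexdigest()
                self.meta_chunks.setdefault(fp, blob)   # identical metadata chunks stored once
                recipe.append(fp)
            return recipe

    store = MetaDedupStore()
    shared = [("fp%d" % i, "key%d" % i) for i in range(4)]
    r1 = store.store_file(shared + [("fpX", "keyX")])
    r2 = store.store_file(shared + [("fpY", "keyY")])
    print(len(store.meta_chunks))   # 3, not 4: the shared metadata chunk is stored once
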
A Performance Study of Lustre File System Checker: Bottlenecks and Potentials
Dong Dai, Om Rameshwar Gatla and Mai Zheng
UNC Charlotte, Iowa State University
Lustre, one of the most popular parallel file systems in high-performance computing (HPC), provides a POSIX interface and maintains a large set of POSIX-related metadata, which can be corrupted by hardware failures, software bugs, configuration errors, etc. The Lustre file system checker (LFSCK) is the remedy tool that detects metadata inconsistencies and restores a corrupted Lustre to a valid state, and hence is critical for reliable HPC.

Unfortunately, in practice, LFSCK runs slowly in large deployments, making system administrators reluctant to use it as a routine maintenance tool. Consequently, cascading errors may lead to unrecoverable failures, resulting in significant downtime or even data loss. Given that HPC is rapidly marching to exascale and much larger Lustre file systems are being deployed, it is critical to understand the performance of LFSCK.

In this paper, we study the performance of LFSCK to identify its bottlenecks and analyze its performance potential. Specifically, we design an aging method based on real-world HPC workloads to age Lustre to representative states, and then systematically evaluate and analyze how LFSCK runs on such an aged Lustre by monitoring the utilization of various resources. From our experiments, we find that the design and implementation of LFSCK are sub-optimal: it suffers from a scalability bottleneck on the metadata server (MDS), a relatively high fan-out ratio in network utilization, and unnecessary blocking among internal components. Based on these observations, we discuss potential optimizations and present some preliminary results.
FastBuild: Accelerating Docker Image Building for Efficient Development and Deployment of Containers
Zhuo Huang, Song Wu, Song Jiang and Hai Jin
Huazhong University of Science and Technology, The University of Texas at Arlington
Wear-aware Memory Management Scheme for Balancing Lifetime and Performance of Multiple NVM Slots
Chunhua Xiao, Linfeng Cheng, Lei Zhang, Duo Liu and Weichen Liu
Chongqing University, Nanyang Technological University
Adjustable flat layouts for Two-Failure Tolerant Storage Systems
Thomas Schwarz
Marquette University
LIPA: a Learning-based Indexing and Prefetching Approach for data deduplication
Guangping Xu, Chi Wan Sung, Quan Yu, Hongli Lu and Bo Tang
Tianjin University of Technology, City University of Hong Kong, WHUT, TJUT
AZ-Code: An Efficient Availability Zone Level Erasure Code to Provide High Fault Tolerance in Cloud Storage Systems
Xin Xie, Chentao Wu, Junqing Gu, Han Qiu, Jie Li, Minyi Guo, Xubin He, Yuanyuan Dong and Yafei Zhao
Shanghai Jiao Tong University, Temple University, Alibaba Group
As data in modern cloud storage systems grows dramatically, it is common to partition data and store it in different Availability Zones (AZs). Multiple AZs not only provide high fault tolerance (e.g., rack-level tolerance or disaster tolerance), but also reduce network latency. Replication and Erasure Codes (EC) are typical data redundancy methods to provide high reliability for storage systems. Compared with the replication approach, erasure codes achieve much lower monetary cost with the same fault-tolerance capability. However, the recovery cost of EC is extremely high in a multi-AZ environment, especially because of its high bandwidth consumption in data centers. LRC is a widely used EC that reduces the recovery cost, but it sacrifices storage efficiency. MSR codes are designed to decrease the recovery cost with high storage efficiency, but their computation is too complex.

To address this problem, in this paper we propose an erasure code for multiple availability zones (called AZ-Code), a hybrid code that takes advantage of both MSR and LRC codes. AZ-Code utilizes a specific MSR code as the local parity layout and a typical RS code to generate the global parities. In this way, AZ-Code keeps recovery cost low while maintaining high reliability. To demonstrate the effectiveness of AZ-Code, we evaluate various erasure codes via mathematical analysis and experiments in Hadoop systems. The results show that, compared to traditional erasure coding methods, AZ-Code saves recovery bandwidth by up to 78.24%.
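As a drastically simplified illustration of why AZ-level local parities save cross-AZ recovery bandwidth, the toy below uses plain XOR as a stand-in for both the local (MSR) and global (RS) codes; a single lost block is repaired entirely within its AZ. Real AZ-Code uses the stronger codes to tolerate more failures with better storage efficiency, and all names here are hypothetical.

    def xor(blocks):
        """XOR a list of equal-length byte chunks."""
        out = bytearray(len(blocks[0]))
        for b in blocks:
            for i, x in enumerate(b):
                out[i] ^= x
        return bytes(out)

    def encode(az_data):
        """Toy multi-AZ layout: one local parity per AZ plus one global parity
        (XOR stands in for the MSR local code and the RS global code)."""
        local = {az: xor(blocks) for az, blocks in az_data.items()}
        global_parity = xor([b for blocks in az_data.values() for b in blocks])
        return local, global_parity

    def repair_in_az(az_blocks, local_parity, lost_idx):
        """A single lost block is rebuilt from survivors inside the same AZ,
        so no cross-AZ recovery bandwidth is consumed."""
        survivors = [b for i, b in enumerate(az_blocks) if i != lost_idx]
        return xor(survivors + [local_parity])

    az_data = {"AZ1": [b"\x01", b"\x02"], "AZ2": [b"\x04", b"\x08"]}
    local, global_parity = encode(az_data)          # global parity covers AZ-level failures
    assert repair_in_az(az_data["AZ1"], local["AZ1"], 0) == b"\x01"
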
BFO: Batch-File Operations on Massive Files for Consistent Performance Improvement
Yang Yang, Qiang Cao and Hong Jiang
Huazhong University of Science and Technology, University of Texas at Arlington
Existing local file systems, designed to support only a typical single-file access pattern, can deliver poor performance when accessing a batch of files, especially small files. This single-file pattern essentially serializes accesses to batched files one by one, resulting in a large number of non-sequential, random, and often dependent I/Os between file data and metadata at the storage end. We first experimentally analyze the root cause of this inefficiency in batch-file accesses. Then, we propose a novel batch-file access approach, referred to as BFO for its set of optimized Batch-File Operations, by developing novel BFOr and BFOw operations for the fundamental read and write processes, respectively, using a two-phase access for metadata and data jointly. BFO offers dedicated interfaces for batch-file accesses and additional processes that integrate into existing file systems without modifying their structures and procedures. We implement a BFO prototype on ext4, one of the most popular file systems. Our evaluation results show that the batch-file read and write performance of BFO is consistently higher than that of traditional approaches regardless of access patterns, data layouts, and storage media, with synthetic and real-world file sets. BFO improves read performance by up to 22.4x and 1.8x with HDD and SSD, respectively, and boosts write performance by up to 111.4x and 2.9x with HDD and SSD, respectively. BFO also demonstrates consistent performance advantages when applied to four representative applications: Linux cp, Tar, GridFTP, and Hadoop.
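A user-space sketch of the two-phase, BFOr-style access pattern, for intuition only: metadata for the whole batch is gathered first, then data is read in inode order so the device sees a more sequential stream than a per-file open/read loop. The paper implements the operations inside ext4 rather than in user space, and the function below is a hypothetical stand-in.

    import os

    def batch_read(paths):
        """Toy two-phase batch read: phase 1 collects all metadata, phase 2 reads
        data ordered by inode number to reduce random I/O at the storage end."""
        # Phase 1: metadata for the whole batch
        metas = [(os.stat(p).st_ino, p) for p in paths]
        # Phase 2: data, in inode order
        contents = {}
        for _, p in sorted(metas):
            with open(p, "rb") as f:
                contents[p] = f.read()
        return contents

    # e.g., contents = batch_read(["/tmp/a", "/tmp/b", "/tmp/c"])
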
Fighting with Unknowns: Estimating the Performance of Scalable Distributed Storage Systems with Minimal Measurement Data
Moo-Ryong Ra and Hee Won Lee
AT&T Labs Research
Pattern-based Write Scheduling and Read Balance-oriented Wear-leveling for Solid State Drives
Jun Li, Xiaofei Xu, Xiaoning Peng and Jianwei Liao
Southwest University
When NVMe over Fabrics Meets Arm: Performance and Implications
Yichen Jia, Eric Anger and Feng Chen
Southwest University, Louisiana State University, ARM Inc.
Long-Term JPEG Data Protection and Recovery for NAND Flash-Based Solid-State Storage
Yu-Chun Kuo, Ruei-Fong Chiu and Ren-Shuo Liu
Department of Electrical Engineering, National Tsing Hua University
Economics of Information Storage: The Value in Storing the Long Tail
James Hughes
University of California, Santa Cruz


2019 Organizers
Conference Co-Chairs     Dr. Ahmed Amer,  Dr. Sam Coleman
Invited Track Program Co-Chairs     Dr. Glenn K. Lockwood,  Dr. Michal Simon
Research Track Program Co-Chairs     James Hughes,  Thomas Schwarz
Research Track Program Committee
Communications Chair     Meghan Wingate McClelland
Local Arrangements Chair     Prof. Yuhong Liu
Registration Chair     Prof. Behnam Dezfouli


Page Updated April 18, 2019