33rd International Conference
on Massive Storage Systems
and Technology (MSST 2017)
May 15 — 19, 2017

Sponsored by Santa Clara University,
School of Engineering

Since the conference was founded by the leading national laboratories, MSST has been a venue for massive-scale storage system designers and implementers, storage architects, researchers, and vendors to share best practices and discuss building and securing the world's largest storage systems for high-performance computing, web-scale systems, and enterprises.

Hosted at
Santa Clara University
Santa Clara, CA

MSST Preview (interview with Dr. Matt O'Keefe):
Inside HPC
HPCWire Article

2017 Conference

MSST (2017), as is our custom, will dedicate five days to computer-storage technology, including a day of tutorials, two days of invited papers, two days of peer-reviewed research papers, and a vendor exposition. The conference will be held, once again, on the beautiful campus of Santa Clara University, in the heart of Silicon Valley.


— 2017 Registration —

(Register for one, two, or all three tracks.)
Tutorial (1 day)
Invited Track (2 days)
Research Track (2 days)
Register Here


Venue: Locatelli Center on the Santa Clara University Campus (map)

Parking: Daily and multi-day permits are available for purchase at the main gate at 500 El Camino Real (where the guard will direct you to the parking lot nearest the Locatelli Center), or daily permits may be purchased at an unmanned machine in the Leavey parking lot. (map)

Driving Directions (to the campus)

Walking Directions (on campus)

Hotels near the campus

Subscribe to our email list for (infrequent) information along the way.

2017 Program

Tutorial, Monday, May 15th
9:00am — 5:00pm (lunch 12:30pm — 1:30pm)

Instructors: Sean Roberts (bio) and Stefano Maffulli (bio)

With over 2000 developers from 130 different companies worldwide, OpenStack is one of the largest collaborative software-development projects. Because of its size, it is characterized by a huge diversity in social norms and technical conventions. These can significantly slow the integration of newcomers' changes into the open source project.

We've designed a training program to help new developers succeed more quickly at integrating their own roadmap into that of the open source project. We have taken a slice of an existing successful two-day training program and broken out the session dealing with development interaction. This seven-hour live class teaches students to navigate the intricacies of a project's technical teams and social interactions using Legos. It is a lot of fun and very informative about the way upstream development teams, companies, and individual technical contributors behave and react to milestones. For more background, read:

Invited Track, Tuesday, May 16th
(Preliminary Program)
7:30 — 8:30 Registration / Breakfast
Managing Extreme Scale Genomic Datasets Today While Planning for Future Growth, including Management that Minimizes Cost while Maximizing Utility
Jack Collins, Ph.D., National Institutes of Health
Emerging Open Source Storage System Design for Hyperscale Computing
An Update on MarFS in Production
David Bonnie, Los Alamos National Laboratory (bio)
With MarFS in production at LANL since fall 2016, we have gained new insights, learned lessons, and expanded our future plans. We'll discuss the various hurdles we had to clear to deploy such an ambitious system with minimal manpower. Further, we'll delve into the challenges, triumphs, and defeats on the road to a new tier of inexpensive scalable storage.
Integrating File/Object/Analytic Name Spaces for HPC
Carrie Spear, NASA
Bridging Big - Small, Fast - Slow with Campaign Storage
Peter Braam, Campaign Storage, LLC (bio)
Economic considerations and technology developments are necessitating widely usable tiered storage. Untroubled by the worries of transparency and performance, Campaign Storage, invented at Los Alamos National Laboratory, offers radical revisions of old workflows and adapts to new technologies. But it also leverages widely available technologies and interfaces to offer stability from the ground up and blend in with the past. We'll discuss how a simple combination of components can support scalability, data analytics, and efficient integration with memory-based storage.
12:30 — 1:30 Lunch
Leveraging Compression, Encryption, and Erasure Coding Chip Hardware Support to Construct Large Scale Storage Systems
SPARC Chip Support for Compression, Encryption, and SQL
Stephen Phillips, Oracle SPARC
Storage acceleration with ISA-L
Greg Tucker, Intel
As distributed storage adds advanced features such as erasure coding, dedup, compression and encryption, the computational requirements can limit performance. ISA-L is an optimized software library for storage algorithms intended to maximize efficiency by targeting the highest cycle-per-byte functions in modern storage systems.
Building High Speed Erasure Coding Libraries for ARM and x86 Processors
Per Simonsen, Memoscale (bio)
Library optimizations as well as development of new erasure coding algorithms have been keys to unlocking higher levels of erasure coding performance. Learn about performance improvements achieved on ARM and x86 processors.
Panel — The Limits of Open Source in Large-Scale Storage Systems Design
Peter Braam, Campaign Storage, LLC (bio)
Matthew O'Keefe, Oracle (bio)
Sean Roberts, OpenStack Consortium (bio)
Michael Declerck, Oracle
Building Extreme-Scale SQL and NoSQL Processing Environments
The World's Largest SQL Processing Systems
Ken Ritchhart, Oracle
Exadata: Design of an Extreme-Scale SQL Processing System
Matthew O'Keefe, Oracle (bio)
SQL is the language for data in business and many science applications today. In this talk, we will outline Oracle’s Exadata design and architecture and how it achieves very high performance, capacity, and resilience in production.
Design Decisions and Trade-offs in Apache Accumulo
Aaron Cordova, Koverse (bio)
NoSQL databases make some strong assertions about reasons for deviating from the conventional relational model. In this talk, Aaron Cordova will present the design decisions and trade-offs made in building Apache Accumulo, a highly scalable open source implementation of Google's BigTable, and will discuss how these decisions have enabled Accumulo to achieve extreme scalability in several dimensions.
Short Talks
Attendees and vendors can sign up in advance, or at the conference, to give 5-15 minute works-in-progress or summary updates on work of interest to conference attendees.
Fragmentation at Low Utilization in the Lustre File System
John Kaitschuck, Seagate
Designing and Managing Large, Long-Lived Archives Efficiently
Donna Harland, Oracle Optimized Solutions
Donna Shawhan, Oracle StorageTek Archive Solutions
Discover how to easily design and manage your large archives for the long term. This short talk will highlight the key components and features that are essential for a secure, future-proof long-lived archive. Understand how to leverage and scale the optimal mix of storage, ranging from flash to disk to tape to cloud. Get tips on how to architect an environment that provides you easy access to and infinite retention for your data, no matter where it is stored or how old.

Invited Track, Wednesday, May 17th
(Preliminary Program)
7:30 — 8:30 Breakfast
Keynote 2
Memory Driven Computing
Kimberly Keeton, Hewlett Packard Enterprise (bio)
Data growth and data analytics requirements are outpacing the compute and storage technologies that have provided the foundation of processor-driven architectures for the last five decades. This divergence requires a deep rethinking of how we build systems, and points towards a memory-driven architecture, where memory is the key resource and everything else, including processing, revolves around it.

Memory-driven computing (MDC) brings together byte-addressable persistent memory, a fast memory fabric, task-specific processing, and a new software stack to address these data growth and analysis challenges. At Hewlett Packard Labs, we are exploring MDC hardware and software design through The Machine. This talk will review the trends that motivate MDC, illustrate how MDC benefits applications, provide highlights from our Machine-related work in data management and programming models, and outline challenges that MDC presents for the storage community.
Storage Innovation in Large HPC Data Centers
Ian Randall, ECMWF
Ellen Salmon, NASA
Storage Development at CERN
Dr. Michal Simon, CERN (bio)
The storage group in CERN’s IT department provides coherent storage services for the physics community at CERN, including the experiments at the Large Hadron Collider. In this talk we give a status update on the storage technologies, workloads, and storage analytics at CERN. Moreover, we will discuss strategic developments such as a new archive backend for our EOS disk pools, a Raft-based implementation of the EOS namespace, and the latest security enhancements to our data access protocol (xroot).
Panel — How Large HPC Data Centers Can Leverage Public Cloud for Computing and Storage
Ian Randall, ECMWF
Ellen Salmon, NASA
Dr. Michal Simon, CERN
Supporting Extreme-Scale Name Spaces with NAS Technology
ZFS for Extreme-Scale NAS
Jason Schaffer, Oracle
OnTap Clustering Enhancements
TBD, NetApp
Lightning-fast File Operations for Extreme Scale Name Spaces: Techniques for Applying Structure to Unstructured Data
Bryan Pham, Cloudtenna (bio)
Bryan Pham, CTO and co-founder of Cloudtenna, will illustrate the power of separating content and metadata. When metadata is extracted into a database, the index powers web-scale file access, search, and audit in ways not previously possible.
12:30 — 1:30 Lunch
Storage System Designs Leveraging Hardware Support
Accelerating Ceph data services with Intel QuickAssist Technology and ISA-L
Tushar Gohad, Intel
Ceph is an open-source, unified, distributed storage system designed with scale in mind. Ceph's performance doesn't suffer as your data storage grows, which makes it a good fit for Big Data projects. The Ceph community has recently been focused on adding key enterprise features such as erasure coding, compression, and encryption. In this presentation, we’ll talk about how Intel QuickAssist and ISA-L based offloads can help accelerate these functions in Ceph.
Tiered Erasure - When Flat Doesn't Fit
David Bonnie, Los Alamos National Laboratory (bio)
Storage systems continue to demand the seemingly impossible triad: faster, cheaper, and more reliable. As systems scale up, all three become increasingly hard to balance, with reliability generally taking the back seat. While flat protection schemas work well for small systems, they all compromise too much of the triad at the tens to hundreds of petabyte scale. This discussion will focus on the genesis of the tiered erasure system used in MarFS and how it leverages hardware accelerated erasure to implement a fast, safe, and efficient storage paradigm.
How Can Large Scale Storage Systems Support Containerization?
Unsolved Storage Issues in Linux Container Interfaces
Dr. James Bottomley, IBM (bio)
With the addition of the superblock namespace (essentially a user namespace for the kernel to filesystem boundary), much of the stage is now set for fixing one of the biggest underlying container problems: that of translating unprivileged container writes into real filesystem uid/gids. This talk will examine how this system works, why it is necessary, what solutions have been proposed so far, how the upstream discussions are going, and what still needs to be added for orchestration systems to make use of it.
Learning from ZFS to Scale Storage on and under Containers
Evan Powell, Entrepreneur and Hacker (bio)
What is so new about the container environment that a new class of storage software is emerging to address these use cases? And can container orchestration systems themselves be part of the solution? As is often the case in storage, metadata matters here. We are implementing, in open source, some approaches partly inspired by ZFS to enable much more efficient scale-out block storage for containers, storage that is itself containerized. The goal is to enable storage to be treated in many regards as just another application while, of course, also providing storage services to stateful applications in the environment.
Big Software-RAID Storage in Zoned Virtual Environments
Scott Sinno, NASA (bio)
This presentation will describe the evolution and growth of the NCCS's flagship virtualization project known as "ADAPT". The ADAPT system is a KVM/QEMU-based virtualized environment hosting 11 PB of highly cost-effective disk storage in the form of JBODs, heavily leveraging Linux's "mdadm" software-based RAID for data integrity and reliability. The ADAPT environment enforces a true zoned architecture, such that nodes within a zone have no visibility whatsoever to network or storage resources in other zones. This is accomplished by providing each zone its own distinct set of virtualized fileservers, which access their media as logical block devices presented by their hypervisors.
Trends in Non-Volatile Media
Basic Principles and Challenges of STT-MRAM for Embedded Memory Applications
Luc Thomas, Headway (bio)
Spin-Transfer-Torque Magnetic Random Access Memory (STT-MRAM) is emerging as a leading candidate for a variety of embedded memory applications ranging from embedded NVM to working memory and last level cache. In this talk, we will discuss the basic principles of STT-MRAM, as well as recent advances that bring perpendicular STT-MRAM closer to mass production. We will also address the specific challenges facing STT-MRAM for standalone and embedded applications, and its place in the emerging NVM landscape.
Persistent Memory Programming: The Current State of the Ecosystem
Andy Rudoff, Intel (bio)
In this presentation, Andy will report on the latest developments around persistent memory programming. He’ll describe current discussions in the SNIA NVM Programming Technical Work Group, the current state of operating system support, and recent tool and library development, and finally he’ll describe some of the upcoming challenges for high performance persistent memory use.
Short Talks
Attendees and vendors can sign up in advance, or at the conference, to give 5-15 minute works-in-progress or summary updates on work of interest to conference attendees.

Research Track, Thursday, May 18th - Friday, May 19th
(Preliminary List of Papers)
Ouroboros Wear-Leveling: A Two-Level Hierarchical Wear-Leveling Model for NVRAM
Qingyue Liu and Peter Varman, Rice University
A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key Value Stores
Ting Yao, Jiguang Wan, Qingxin Gui, Fei Wu and Changsheng Xie, Huazhong University of Science and Technology, China
Ping Huang and Xubin He, Temple University
Experience from Two Years of Visualizing Flash with SSDPlayer
Gala Yadgar and Roman Shor, Technion, Israel
A Page-Based Storage Framework for Phase Change Memory
Peiquan Jin, Zhangling Wu and Lihua Yue, University of Science and Technology of China
HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud
Huijun Wu and Sherif Sakr, The University of New South Wales, Australia
Chen Wang and Liming Zhu, Data61, CSIRO, Australia
Yinjin Fu, PLA University of Science and Technology, China
Kai Lu, National University of Defense Technology, China
Larger, Cheaper, but Faster: SSD-SWD Hybrid Storage Boosted by a New SMR-oriented Cache Framework
Chunling Wang, Dandan Wang, Yunpeng Chai, Chuanwen Wang and Diansen Sun, Renmin University of China
Hibachi: A Cooperative Hybrid Cache with NVRAM and DRAM for Storage Arrays
Ziqi Fan, Fenggang Wu, Jim Diehl and David Du, University of Minnesota
Dongchul Park, Intel Corporation
Doug Voigt, Hewlett Packard Enterprise
Experiences with a Distributed Deduplication API
Fred Douglis, Andrew Huber, Donna Lewis and Rachel Traylor, Dell EMC
LaLDPC: Latency aware LDPC for Read Performance Improvement of Solid State Drives
Yajuan Du, Huazhong University of Science and Technology and City University of Hong Kong
Deqing Zou and Hai Jin, Huazhong University of Science and Technology
Qiao Li and Liang Shi, Chongqing University
Chun Jason Xue, City University of Hong Kong
Near-Optimal Offline Cleaning for Flash-Based SSDs
Mansour Shafaei and Peter Desnoyers, Northeastern University
Near-Data Processing for Differentiable Learning Machines
Hyeokjun Choe, Seil Lee, Hyunha Nam, Seongsik Park, Seijoon Kim and Sungroh Yoon, Seoul National University, South Korea
Eui-Young Chung, Yonsei University, South Korea
A Write-friendly Hashing Scheme for Non-volatile Memory Systems
Pengfei Zuo and Yu Hua, Huazhong University of Science and Technology
LX-SSD: Enhancing the Lifespan of NAND Flash-based Memory via Recycling Invalid Pages
Ke Zhou, Shaofu Hu and Yuhong Zhao, Huazhong University of Science and Technology, China
Ping Huang, Temple University
Understanding Write Behaviors of Storage Backends in Ceph Object Store
Dong-Yun Lee, Kisik Jeong, Sang-Hoon Han, and Jin-Soo Kim, SungKyunKwan University, South Korea
Joo-Young Hwang and Sangyeun Cho, Samsung Electronics Co., Ltd., South Korea
Campaign Storage
Peter Braam, Campaign Storage (bio)
David Bonnie, Los Alamos National Laboratory (bio)
Content-aware Trace Collection and I/O Deduplication for Smartphones
Bo Mao, Suzhen Wu, Xiao Chen and Weijian Yang, Xiamen University, China
Hong Jiang, University of Texas at Arlington
Native OS Support for Persistent Memory with Regions
Mohammad Chowdhury and Raju Rangaswami, Florida International University
FRD: A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance
Sejin Park, SK Telecom, South Korea
Chanik Park, POSTECH, South Korea
A Cost-efficient Rewriting Scheme to Improve Restore Performance in Deduplication Systems
Jie Wu, Yu Hua, Pengfei Zuo and Yuanyuan Sun, Huazhong University of Science and Technology, China
SMORE: A Cold Data Object Store for SMR-drives
Peter Macko, James Kelley, David Slik, Keith A. Smith and Maxim G. Smith, NetApp, Inc.
John Haskins Jr., Qualcomm
FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance
Yujuan Tan, Jian Wen and Baiping Wang, Chongqing University, China
Zhichao Yan and Hong Jiang, University of Texas at Arlington
Witawas Srisa-An, University of Nebraska, Lincoln
Hao Luo, Nimble Storage
Performance Analysis of Containerized Applications on Local and Remote Storage
Qiumin Xu and Murali Annavaram, University of Southern California
Manu Awasthi, IIT Gandhinagar, India
Krishna Malladi and Jingpei Yang, Samsung Semiconductor, Inc.
Janki Bhimani, Northeastern University
BCStore: Bandwidth-Efficient In-memory KV-Store with Batch Coding
Shenglong Li, Quanlu Zhang, Zhi Yang and Yafei Dai, Peking University, China
Improving the Performance of Backup Candidate File Selection using Inode Bitmap
Sosuke Matsui, Tsuyoshi Miyamura, Noriko Tanemura, Terue Watanabe and Norie Iwasaki, IBM Japan
DsDs: Data Store Driven Scheduling of Applications For Energy and Performant Efficient Micro-Clouds System
Frezewd Lemma Tena and Christof Fetter, Technical University of Dresden, Germany

2017 Organizers
Conference Chair     Dr. Sam Coleman
Tutorial Chair     Sean Roberts
Program Chair     Dr. Matthew O'Keefe
Research General Chair     Dr. Ahmed Amer
Research Program Chairs     Dr. Thomas Schwarz, Dr. Aleatha Parker-Wood
Research Track Program Committee
SCU Arrangements     Dr. Ahmed Amer
Industry Chair     Dr. James Reaney
Communications Chair     Meghan Wingate McClelland
Registration Chairs     JoAnne Holliday, Yi Fang

Page Updated April 26, 2017