30th International Conference on Massive Storage Systems and Technology (MSST 2014)

Santa Clara, California, USA
June 2–6, 2014

Sponsored by Santa Clara University, Computer Engineering

Technically Co-Sponsored by IEEE Computer Society

MSST 2014 Speaker

Daniel Duffy and John Schnase, NASA Goddard

Meeting the Big Data Challenges of Climate Science through Cloud-Enabled Climate Analytics-as-a-Service

Climate science is a big data domain experiencing unprecedented growth. To address its big data challenges, we are moving toward Climate Analytics-as-a-Service (CAaaS), a specialization of business process-as-a-service (BPaaS), which is itself an evolving extension of the IaaS, PaaS, and SaaS models enabled by cloud computing. In this presentation, we describe two projects that demonstrate this shift.

MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS. The MERRA (Modern-Era Retrospective Analysis for Research and Applications) reanalysis integrates observational data with numerical models to produce a global, temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product of growing importance to scientists doing climate change research and to a wide range of decision support applications. MERRA/AS enables MapReduce analytics over the MERRA data collection by bringing together the following elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data-proximal analytics; (2) scalable data management; (3) software appliance virtualization; (4) adaptive analytics; and (5) a domain-harmonized API. The effectiveness of MERRA/AS has been demonstrated in several applications.
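To make the MapReduce idea behind MERRA/AS concrete, here is a minimal, framework-free Python sketch of a map, shuffle, and reduce pass that computes monthly means of one climate variable. It is an illustration under stated assumptions, not the MERRA/AS API: the flat `(variable, year, month, value)` record layout, the function names, and the sample values are all hypothetical, and real MERRA data are gridded fields stored in files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical flat record: (variable, year, month, value). Real MERRA data
# are gridded fields in files; this layout is only for illustration.
Record = Tuple[str, int, int, float]
Key = Tuple[str, int, int]
Partial = Tuple[float, int]  # (partial sum, count)


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[Key, Partial]]:
    """Emit one (key, (value, 1)) pair per input record."""
    for variable, year, month, value in records:
        yield (variable, year, month), (value, 1)


def shuffle(pairs: Iterable[Tuple[Key, Partial]]) -> Dict[Key, List[Partial]]:
    """Group intermediate pairs by key, as a MapReduce framework would."""
    groups: Dict[Key, List[Partial]] = defaultdict(list)
    for key, partial in pairs:
        groups[key].append(partial)
    return groups


def reduce_phase(groups: Dict[Key, List[Partial]]) -> Dict[Key, float]:
    """Combine partial sums and counts into a mean for each key."""
    result: Dict[Key, float] = {}
    for key, partials in groups.items():
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        result[key] = total / count
    return result


def monthly_means(records: Iterable[Record]) -> Dict[Key, float]:
    """Run the three phases in sequence on a single machine."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample values for a 2-meter temperature variable (kelvin).
    sample = [
        ("T2M", 2010, 1, 270.0),
        ("T2M", 2010, 1, 274.0),
        ("T2M", 2010, 2, 275.5),
    ]
    print(monthly_means(sample))
```

Emitting `(sum, count)` pairs rather than per-record means keeps the reduce step associative, so partial results computed near different pieces of the data can be combined in any order before the final division.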

NASA’s High-Performance Science Cloud (HPSC) is an example of the type of compute-storage fabric required to support CAaaS. The HPSC combines several technologies in use within the NASA Center for Climate Simulation (NCCS): (1) a virtualized, high-speed InfiniBand network; (2) combined high-performance file system and object storage; and (3) virtual system environments tailored to data-intensive science applications. At the center of the HPSC is a large object storage environment that combines computation with data storage. Users can access it much like a traditional file system, and it also supports data-proximal processing with technologies such as the Hadoop Distributed File System (HDFS). Surrounding the storage is a cloud of high-performance compute resources with many processing cores and large memory, coupled to the storage through an InfiniBand network. Using technologies such as Single Root I/O Virtualization (SR-IOV), virtual systems can be provisioned on the compute resources with extremely high-speed network connectivity to the storage and to other virtual systems.
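Data-proximal processing amounts to moving computation to the nodes that already hold the data rather than moving the data to the computation. The following is a simplified, hypothetical sketch of locality-aware task placement in that spirit; it is not the Hadoop or HDFS scheduler and not the HPSC's actual mechanism. The block names, node names, and the greedy least-loaded policy are assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def assign_tasks(block_replicas: Dict[str, Set[str]], nodes: List[str]) -> Dict[str, str]:
    """Assign one task per data block, preferring nodes that store a replica.

    Hypothetical greedy policy: among available replica holders, pick the
    least-loaded node (ties broken by name). If no replica holder is
    available, fall back to the least-loaded node overall, which implies a
    remote read of the block.
    """
    load = {n: 0 for n in nodes}
    assignment: Dict[str, str] = {}
    for block in sorted(block_replicas):
        local = [n for n in nodes if n in block_replicas[block]]
        candidates = local if local else nodes
        chosen = min(candidates, key=lambda n: (load[n], n))
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


if __name__ == "__main__":
    # Hypothetical layout: n3 holds b3 but is not available for scheduling.
    replicas = {"b1": {"n1", "n2"}, "b2": {"n2"}, "b3": {"n3"}}
    print(assign_tasks(replicas, ["n1", "n2"]))
```

With these inputs, blocks b1 and b2 run locally on n1 and n2, while b3, whose only replica sits on an unavailable node, falls back to a remote read on n1.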

These technologies are providing a new tier in the data and analytic services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and to new mobility-driven applications and modes of work. In our experience, CAaaS lowers the barriers to, and the risk of, organizational change; fosters innovation and experimentation; and provides the agility required to meet our customers' increasing and changing needs.

Page Updated April 25, 2017