Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - SAGE (SAGE)

Summary

Exascale is characterized not just by exaflop computational capability, but also by the massive volumes of data generated by simulations running on such systems and, increasingly, by large scientific experiments, crowdsourcing, and expanding sensor networks. Such data must be analysed to derive the insights that make innovation and understanding possible across a vast spectrum of domains such as physics, computational biology, neuroscience, pharmaceutics, energy, and industrial manufacturing, which is critical for societal, scientific and technological progress. The SAGE project, which incorporates research and innovation in hardware and enabling software, will significantly improve the performance of data access and enable computation and analysis to be performed closer to the data wherever it resides in the architecture, drastically minimising data movement between compute and data storage infrastructures. With a seamless view of data throughout the platform, incorporating multiple tiers of storage from memory to disk to long-term archives, it will enable Application Programming Interfaces and programming models to use such a platform easily and to apply the data analytics techniques best suited to the problem space.

The following are the overall objectives of the SAGE project:

• Provides a next-generation multi-tiered object-based data storage system (hardware and enabling software) supporting current and future-generation persistent storage media (solid-state and disk) within an I/O hierarchy. We term this “Percipient Storage”.
• The project:
o Redefines the storage subsystem as an integral part of the computational infrastructure.
o Provides integrated computational capability anywhere in the storage system.

• Significantly improves the overall scientific output through advancements in systemic I/O performance and latency, and drastically reduces data movement, hence improving energy efficiency, by:
o providing the ability to flexibly move appropriate computational workloads to where the data resides
o providing a storage architecture built from the ground up to handle Exascale I/O
o providing the potential to use resources in the computational cluster as part of the storage system

• Provides a roadmap of technologies supporting data access for both Exascale/Exabyte and High Performance Data Analytics (HPDA) requirements:
o Targeting scalability to 500-1000 PB, with bandwidth on the order of 60 TB/s and a storage system energy footprint of less than approximately 5 kW per petabyte;
o With flexible and efficient usage of HPDA application environments regardless of the compute node’s architecture and implementation.
• Investigates and documents the requirements of relevant HPC applications and their storage use cases as part of a co-design approach.

• Provides programming models and access methods for the SAGE architecture and validates their usability, including (but not limited to) legacy applications and ‘Big-Data’ data access and analysis methods.

• Validates the full system in a relevant environment, for a relevant set of applications and benchmarks, on a SAGE prototype integrated into an HPC data centre, validating performance, scalability, energy efficiency and the reduction in data transport requirements.

Once accomplished, these objectives will firmly establish European excellence in the areas of Exascale storage, data centric computing, HPDA, and the emerging field of Big Data Extreme Computing (BDEC), and significantly impact computational scientific research.
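
The capacity, bandwidth and energy targets listed in the objectives can be combined into a rough back-of-envelope check. The sketch below is illustrative only: the function names are hypothetical, decimal units (1 PB = 1000 TB) are an assumption, and the "full pass" time is a derived figure that the report itself does not state.

```python
# Back-of-envelope check of the storage targets quoted in the objectives.
# Assumption: decimal units (1 PB = 1000 TB); the report does not say
# which convention it uses.

CAPACITIES_PB = (500, 1000)   # target capacity range, in petabytes
BANDWIDTH_TB_PER_S = 60       # target aggregate bandwidth, in TB/s
POWER_KW_PER_PB = 5           # upper bound on the footprint, in kW per PB


def power_budget_mw(capacity_pb, kw_per_pb=POWER_KW_PER_PB):
    """Upper bound on storage-system power draw, in megawatts."""
    return capacity_pb * kw_per_pb / 1000


def full_pass_hours(capacity_pb, tb_per_s=BANDWIDTH_TB_PER_S):
    """Hours needed to stream the full capacity once at the target bandwidth."""
    seconds = capacity_pb * 1000 / tb_per_s
    return seconds / 3600


for capacity in CAPACITIES_PB:
    print(f"{capacity} PB: at most {power_budget_mw(capacity):.1f} MW, "
          f"one full pass in about {full_pass_hours(capacity):.1f} h")
```

Under these assumptions, the stated footprint corresponds to a power envelope of roughly 2.5 to 5 MW across the targeted capacity range.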

Work performed

The work carried out during the first 18 months of the SAGE project defined the overall architecture of the SAGE system, including the architecture and design of its individual software components (the Mero object store and its API, the ecosystem tools, use case access, programming models, visualization utilities and runtimes). We also completed the formal co-design activity with all the use cases. The SAGE prototype hardware has been implemented and deployed at the Juelich Supercomputing Center for validation with the use cases.

Final results

During its course, the project has defined the following vision for its impact on the European Extreme Scale HPC (and associated Big Data) ecosystem:

SAGE to lay the foundation for a European storage platform to be #1 at Extreme Scale

In the course of pursuing this vision, the project has already highlighted progress beyond the existing state of the art by:

(1) Providing a working object storage base platform (Mero) and its API, Clovis, specifically suited to Extreme scale HPC.
(2) Providing prototype system hardware for a multi-tier storage system with more than 3 tiers (device types) of storage and in-built compute capability, already deployed in the evaluation environment.
(3) Providing the concept and architecture of ecosystem tools (HSM, data integrity checking, performance analysis and debugging, data analytics, programming models, runtimes and visualization utilities) suitable for multi-tier Extreme scale HPC storage systems such as SAGE.
(4) Co-designing use cases with such an architecture for the first time.

We are already in discussion with, and have active interest from, communities outside the SAGE consortium that wish to use the outcomes of work in SAGE (e.g. Mero). We also have an early product based on Mero that is currently under evaluation. These early successes give us ample confidence that we will achieve wider use and coverage for SAGE components within the international community, helping to lay the foundations for a European Extreme scale storage platform.
We are also working closely with organisations such as EXDCI and ETP4HPC to understand and help shape the goals and ambitions for Europe in the area of Extreme scale HPC over the next few years.

Website & more info

More info: http://www.sagestorage.eu.