Report

Periodic Reporting for period 1 - PROCESS (PROviding Computing solutions for ExaScale ChallengeS)

Teaser

The research problem addressed by the PROCESS project arises from the huge increase in computational power expected with the upcoming class of exascale systems, where scientific research in health, engineering, and global consumer services faces new challenges for massively...

Summary

The research problem addressed by the PROCESS project arises from the huge increase in computational power expected with the upcoming class of exascale systems, where scientific research in health, engineering, and global consumer services faces new challenges for massively data-driven activities.
PROCESS (PROviding Computing solutions for ExaScale challengeS), a collaborative EU research and innovation project, will deliver ground-breaking service prototypes and tools, specially developed to speed up the uptake of future extreme-scale data processing offerings and to boost scientific progress and disruptive business innovation.
PROCESS outputs will be implemented as a mature, modular, generalizable open-source solution for user-friendly exascale data services. The services will be thoroughly validated in real-world settings, both in scientific research and in industry pilot deployments, with potentially huge impact for society.
PROCESS focuses on helping key players in the new data-driven ecosystem: top-level HPC, e-infrastructure and big data centres on the one hand, and scientific communities and companies with mission-critical extreme-scale data challenges on the other, thereby enabling the uptake of these powerful systems for addressing grand challenges with huge societal impact.

Work performed

Creating the PROCESS platform: a web platform prototype for end users and application developers, including configurable computational pipelines for workflow and job submission

Data Service Prototype:
a) development of the initial, revised and evaluated PROCESS architecture, with automated deployment, configuration and remediation of application and network services using defined service deployment templates;
b) alpha release of the Data Service, introducing containerized micro-infrastructures; and
c) completion of the service pilots' requirements for the prototypes.

Stakeholder involvement and sustainability:
Work focused on three main areas:
a) establishing internal workflows for project quality assurance and requirements fulfillment;
b) initiating scientific and business dissemination, with a set of 28 activities and 14 publications; and
c) preparing future exploitation: market analysis, customer behaviour and windows of opportunity. The value proposition and unique selling points (USP) were formulated and are ready for validation.

Validation
Work concentrated on three lines:
a) release and validation of the first prototype;
b) initial performance modelling of the PROCESS solution towards exascale prediction; and
c) a reference exascale architecture.

Final results

The solutions provided by PROCESS enable the use of the world's most powerful supercomputers for data-intensive tasks. In this respect they exceed many other solutions in the domain, thereby advancing the state of the art.

The invocation protocols used by services today are not suitable for transferring significant volumes of data, as they mix the invocation with the actual data transfer. New data delivery models need to be researched in which the invocation protocol is separated from data movement, with the aim of reducing the execution time of workflows, especially for streaming applications. The problem becomes even more challenging when data is distributed across research infrastructures (RIs), loosely coupled, and stored in a variety of storage resources ranging from a simple file system to heterogeneous cloud storage.
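
To make this separation concrete, the following minimal Python sketch (an illustration only, not the PROCESS implementation) passes just a data reference through the lightweight invocation path, while the worker pulls the bytes directly from storage. The Invocation type, the submit/worker_execute split and the fetch_from_storage helper are hypothetical names introduced for this example.

from dataclasses import dataclass

@dataclass
class Invocation:
    task: str       # name of the processing step to run
    data_uri: str   # reference to the input data, not the data itself

def submit(invocation: Invocation) -> str:
    # Control plane: only metadata crosses the wire; in a real system this
    # would be an HTTP/RPC request to a scheduler or service endpoint.
    return worker_execute(invocation)

def worker_execute(inv: Invocation) -> str:
    # Data plane: the transfer happens here, close to the compute resource,
    # and can use a bulk protocol suited to large volumes.
    data = fetch_from_storage(inv.data_uri)
    return f"{inv.task} processed {len(data)} bytes"

def fetch_from_storage(uri: str) -> bytes:
    # Placeholder for a real bulk transfer (e.g. GridFTP or an object store).
    return b"example payload"

print(submit(Invocation(task="histogram", data_uri="s3://bucket/dataset-001")))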

Processing data adds information to it, which increases the knowledge that can be extracted from it. A pronounced difference exists between the various data usage scenarios, i.e. shared vs. private data. Simple batch systems assume that data processing tasks are independent of each other and thus do not preserve any order. This can be problematic with interdependent tasks (such as those in scientific workflows). Within scientific problem-solving systems, models of computation vary too: a dataflow model of computation runs a task only when all of its input data is available, while a Petri-net model runs tasks depending on token transmission as a means of flow control. The shift towards data-centric computing means that data processing needs to be managed alongside the management of computational tasks.
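
A minimal sketch of the dataflow firing rule mentioned above: each task runs only once all of its inputs exist, regardless of the order in which tasks were declared. The run_dataflow function and the example task names are hypothetical, introduced purely for illustration.

def run_dataflow(tasks, data):
    # tasks: name -> (input names, output name, function)
    # data: dict of already-available data items, updated in place
    pending = dict(tasks)
    while pending:
        # A task is ready exactly when all of its inputs are available.
        ready = [name for name, (inputs, _, _) in pending.items()
                 if all(i in data for i in inputs)]
        if not ready:
            raise RuntimeError("deadlock: remaining tasks lack inputs")
        for name in ready:
            inputs, output, fn = pending.pop(name)
            data[output] = fn(*(data[i] for i in inputs))

# Usage: "analyze" fires only after both upstream outputs exist, even
# though it is declared first.
tasks = {
    "analyze": (("raw", "clean"), "result",
                lambda raw, clean: sum(raw) + sum(clean)),
    "load":    ((), "raw", lambda: [1, 2, 3]),
    "clean":   (("raw",), "clean", lambda raw: [x * 2 for x in raw]),
}
data = {}
run_dataflow(tasks, data)
print(data["result"])  # 6 + 12 = 18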

Exascale data may be impossible to transfer over significant distances through the internet; therefore, methods that allow such data to be preprocessed in situ (or close to where it is stored) need to be investigated, designed and evaluated.
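
One way to picture such in-situ preprocessing (a hypothetical sketch under simplified assumptions, not the PROCESS design) is to ship a reduction to each storage site and move only the small summaries, never the raw records:

def reduce_at_site(site_records, predicate):
    # Runs at the storage site: filter and aggregate the raw records locally.
    matching = [r for r in site_records if predicate(r)]
    return {"count": len(matching), "total": sum(matching)}

# Simulated records held at three remote research infrastructures (RIs).
sites = {
    "ri-a": [1, 5, 9, 12],
    "ri-b": [3, 7, 20],
    "ri-c": [2, 8],
}

# Only the per-site summaries (a few bytes each) cross the network;
# the raw records never leave their site.
summaries = {name: reduce_at_site(records, lambda r: r > 4)
             for name, records in sites.items()}
print(sum(s["total"] for s in summaries.values()))  # 26 + 27 + 8 = 61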

PROCESS outputs will be showcased through fit-for-purpose solutions to the demanding requirements of exemplary use cases:

· Exascale learning on medical image data, demonstrating the value of big data as a diagnostic support tool.
· The Square Kilometre Array e-infrastructure, enabling the management of petabyte-per-second data streams.
· Supporting innovation based on global disaster risk data.
· Airline ancillary revenue management serving a hundred million customers per year.
· Agro-Copernicus, validating long-term agro-business modelling and simulation using Earth observation datasets that grow by more than 7 petabytes per month.

The positive impacts of PROCESS rest on three principles: leapfrogging beyond the current state of the art, ensuring broad research and innovation impact, and supporting the long tail of science and broader innovation. In practical terms, PROCESS outputs will make exascale data services more intuitive and easier to use for broader communities, fostering wider uptake and expanding European e-infrastructure user bases to secure stronger impact and sustainability.

Website & more info

More info: https://www.process-project.eu.