Periodic Reporting for period 2 - INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation)

Summary

The INDIGO-DataCloud project developed an open source software platform providing data and computing solutions for scientific communities, resource centers and public or private Cloud providers.

The project addressed several technological issues that prevented easy and efficient exploitation of Cloud resources in many scientific domains, such as:
• missing consistent authentication and authorization policies across both applications and infrastructures;
• the difficulty of actually finding and using data and computing resources necessary for a given problem;
• the problem of negotiating and guaranteeing clear Quality of Service policies;
• the issue of expressing in a simple way high-level requirements that go well beyond the simple concept of a “Virtual Machine”;
• the trouble in expressing complex scientific workflows in Cloud infrastructures;
• the partial solutions currently available to integrate legacy applications into Cloud-based scientific portals, mobile appliances, or to write complex web-based front-ends exploiting advanced Cloud features;
• the problematic and often un-scalable deployment of applications necessitating Cloud resources;
• the difficulty in interfacing with both public and private Cloud infrastructures, avoiding proprietary lock-ins and licensing issues.

INDIGO-DataCloud tackled these problems in two ways: on the one hand, by developing building blocks, or tools, that respond to the requirements of many scientific communities; on the other hand, by applying these tools to concrete scientific use cases and applications and deploying them to both public and private e-infrastructures. The outcome is faster scientific results and better, easier use of data and compute resources across Europe and elsewhere. These high-level objectives are summarized by the INDIGO motto: “Better Software for Better Science”.

Work performed

In 30 months, the INDIGO-DataCloud project produced a large body of innovative results, detailed in the 44 Deliverables published by the project. The INDIGO work was driven by research communities and constantly reviewed by them, as they continued to integrate INDIGO components into their applications. Most importantly, INDIGO published two software releases, open source and freely available for download, together with installation, configuration and usage instructions, from the project website at https://www.indigo-datacloud.eu. The first release of the INDIGO software stack (codenamed “INDIGO-1 MidnightBlue”) was made public in August 2016. This release, built on several months of alpha and beta testing, included 38 software components covering the areas of Data Center Solutions, Data Services, Automated Solutions and High-level User-Oriented Services. The second and final release of the INDIGO software stack (codenamed “INDIGO-2 ElectricIndigo”) was officially made available in April 2017. ElectricIndigo includes 40 modular components distributed via 170 software packages and 50 ready-to-use Docker containers. It supports modern Linux-based operating systems such as CentOS 7 and Ubuntu 16.04, and popular open source Cloud frameworks such as OpenStack Newton and OpenNebula 5.x. ElectricIndigo allows applications to connect seamlessly to public or private Cloud infrastructures and allows resource providers and scientific communities alike to address challenging problems and deliver new services.

In the area of dissemination and exploitation, INDIGO collaborated with several commercial companies such as ATOS, T-Systems, IBM, INDRA and Santer-REPLY to facilitate the adoption and enhancement of INDIGO components. An ambitious goal the project set for itself was to promote INDIGO solutions as key components of a future European Open Science Cloud (EOSC): in fact, INDIGO is one of the three projects that, together with EGI-Engage and EUDAT2020, coordinated the successful preparation and submission of the EOSC-hub project, which will contribute to the EOSC implementation by enabling seamless and open access to a system of research data and services provided across countries and multiple disciplines. In EOSC-hub, INDIGO nominates the overall Technical Coordinator of the project; many INDIGO components at Technology Readiness Level (TRL) 8 or above will also find a place in the EOSC unified service catalogue, and INDIGO solutions are at the foundation of several EOSC-hub Thematic Services and Competence Centers. Beyond EOSC-hub, two further Horizon2020 projects that directly derive from INDIGO-DataCloud were also submitted and positively evaluated: eXtreme-DataCloud (XDC) and DEEP-HybridDataCloud. These two follow-on projects will introduce novel features and services to existing TRL6 INDIGO components in many areas, with the objective of bringing them to TRL8 and eventually including them in the EOSC service catalogue.

Final results

INDIGO-DataCloud achieved significant advancements compared to the state of the art. In particular, the project developed a comprehensive open source Cloud architecture, which provides many new functionalities previously unavailable in open source and, in several cases, also in proprietary Cloud offerings. These functionalities abstract from underlying IaaS technologies through the consistent use of both de jure and de facto standards. This allows interoperability with hybrid (public/private) infrastructures, or with e-infrastructures of different types (Grid, Cloud, HPC). The project also supports multiple existing authentication technologies (such as OAuth, SAML or OpenID-Connect), addresses the need for unified data access, and provides a flexible and scalable way to authorize or deny access to distributed Cloud resources. The INDIGO platform hides the complexity and differences of physical storage systems and works seamlessly in geographically distributed infrastructures, with optimized access to data through a template-based orchestration system and ways to automate deployment, scalability and monitoring of complex services, be they long-running or workload-based services. The project also introduced support for new services at the infrastructure level, for example extending Container support for popular open source Cloud frameworks, providing advanced resource scheduling mechanisms, and introducing QoS and data lifecycle support in storage systems. At the user interface level, the project is developing a completely programmable web framework, capable of interfacing with existing applications, mobile developments, complex workflows and big data analytics, and above all capable of supporting all the advanced data and compute capabilities of the INDIGO platform.

These advancements are fully described in the INDIGO Service Catalogue and in scientific applications making use of INDIGO components, both described on the project website. The key impact of the project is toward easy and efficient usage of both public and private compute and data resources, toward the development of cost-efficient, state-of-the-art scientific services and applications that are interoperable across diverse infrastructures, and ultimately toward producing results in many scientific domains in a faster, more effective way. INDIGO-developed solutions have, for instance, enabled new advances in understanding how the basic building blocks of matter (quarks) interact, using supercomputers; in understanding how new molecules involved in life work, using GPUs; and in easily building complex new repositories to preserve and consult digital heritage.

Website & more info

More info: https://www.indigo-datacloud.eu.