
Periodic Reporting for period 1 - DEEP-HybridDataCloud (Designing and Enabling E-infrastructures for intensive Processing in a Hybrid DataCloud)

Teaser

The DEEP-HybridDataCloud project aims to support intensive computing techniques that require specialised HPC hardware, such as GPUs or low-latency interconnects, to explore very large datasets. In particular, the project focuses on machine learning, deep learning and...

Summary

The DEEP-HybridDataCloud project aims to support intensive computing techniques that require specialised HPC hardware, such as GPUs or low-latency interconnects, to explore very large datasets. In particular, the project focuses on machine learning, deep learning and post-processing of data, providing a comprehensive framework that streamlines the development of such applications and covers the whole application lifecycle. Leveraging technology from previous projects (such as INDIGO-DataCloud), the project integrates all the components required to provide added-value services focused on machine learning, deep learning and post-processing over distributed pan-European e-Infrastructures, following a hybrid-cloud approach.

Moreover, under the common label of DEEP as a Service (DEEPaaS), the project provides a framework delivering a set of building blocks that enable the easy development of applications requiring cutting-edge techniques: artificial intelligence (machine learning and deep learning), parallel post-processing of very large data, and analysis of massive online data streams. To this aim, the DEEP-HybridDataCloud architecture implements a catalogue (or marketplace, available online at https://marketplace.deep-hybrid-datacloud.eu/) that allows the publication and reuse of the developed modules, as well as their deployment on European e-Infrastructures. These modules can be easily deployed as a service through the DEEPaaS component, thus providing an environment to deploy scientific applications following a Service Oriented Architecture (SOA).
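The core DEEPaaS idea, wrapping a trained model behind a web-service endpoint so that it can be consumed like any other service, can be sketched with a toy example. Note that the handler, the /predict path, the payload shape and the stand-in "model" below are illustrative assumptions for the sake of the sketch, not the actual DEEPaaS API:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Toy stand-in "model": labels a number as positive or negative.
# A real DEEPaaS module would wrap a trained ML/DL model instead.
def predict(x):
    return {"input": x, "label": "positive" if x >= 0 else "negative"}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and run the model on it
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        result = predict(payload["x"])
        body = json.dumps(result).encode()
        # Return the prediction as a JSON response
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

# Expose the model as a service on a local ephemeral port
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: call the prediction endpoint like any other web service
url = "http://127.0.0.1:%d/predict" % server.server_port
req = Request(url, data=json.dumps({"x": -3}).encode(),
              headers={"Content-Type": "application/json"})
with urlopen(req) as resp:
    answer = json.load(resp)
print(answer)  # {'input': -3, 'label': 'negative'}
server.shutdown()
```

The value of this service-oriented pattern is that the scientific application becomes accessible to any HTTP client, independently of the framework or hardware used to train the underlying model.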

Therefore, the DEEP-HybridDataCloud project not only aims to solve problems of the scientific community, but also to bring knowledge closer to society, offering a way to deploy scientific applications (based on the aforementioned techniques) as a service and making their functionality available to a broader public.

Work performed

Apart from the usual governance definition and setup of the necessary collaborative tools and related activities, the project started with an initial phase of planning and requirements elicitation. By involving user communities from the early stages, DEEP-HybridDataCloud gathered a set of user stories that were further refined into technical requirements, leading to the definition of the project architecture, as reported in the corresponding deliverables (D4.2, D5.1, D6.2). These requirements have evolved according to user needs and are being tracked using the JIRA online tool (https://jira.deep-hybrid-datacloud.eu). In parallel, the joint research activities performed an extensive and exhaustive state-of-the-art survey (D4.1 and D6.1) of their related fields, providing a solid document on which to base subsequent activities.

During this phase, extensive work was performed on all software quality and maintenance related activities (such as software quality assurance, continuous integration, software maintenance and software release), involving both software developers and users from the initial phases. This methodology of involving all stakeholders in the design of the procedures was followed in order to obtain a high degree of acceptance of the software-related procedures while avoiding excessive burden on the software development teams. Moreover, these activities are being carried out in coordination with other projects, as reflected in the publication of the "A set of common software quality assurance baseline criteria for research projects" document (available at http://hdl.handle.net/10261/160086). Compliance with these criteria is checked in a fully automated way, following a Continuous Integration approach, further reducing the load on the development teams and ensuring a high degree of conformity in the merged code. As a matter of fact, the whole software stack developed within the project is compliant with the SQA criteria, thus ensuring a streamlined release process and strengthening its quality throughout the development cycle. In addition, the initial setup of development testbeds and software development tools was also carried out, ensuring that the deployed services follow the existing recommendations for integration into the EOSC realm (such as following the AARC blueprint architecture for AAI).

This initial planning and setup phase has ensured a streamlined development process across all the relevant development work packages. Close interaction with the use cases has been carried out in Work Package 6 (DEEP as a Service) in order to ensure that the developments were aligned with user expectations. Interaction with other research projects (such as XDC and EOSC-Hub) has been carried out in order to ensure the complementarity of the projects and to define and follow integration paths. External user communities have also been engaged, incorporating several external use cases to test the developed solutions. All these steps have resulted in the first DEEP-HybridDataCloud release and the first platform prototype (codenamed DEEP-Genesis and announced in January 2019).

Regarding dissemination activities, several scientific publications have emanated from the project work, in addition to participation in workshops and conferences. We have organised various dissemination activities, the most relevant being the Menendez Pelayo International University summer course entitled "New challenges in Data Science: Big Data and Deep Learning on Data Clouds" (in cooperation with eXtreme-DataCloud), held in Santander in June 2018. A second workshop aimed at the EOSC communities is planned for 2019, and a special session is expected at the EGI.eu conference.

Final results

DEEP-HybridDataCloud is providing a comprehensive framework for the development of intensive computing applications on top of pan-European e-Infrastructures, following a Service Oriented Architecture.

The next period of the DEEP-HybridDataCloud project will focus on the final promotion to production of the developed services, opening them to further external communities. In this regard, one of the most immediate objectives for this next phase is the integration of the DEEP services within the EOSC framework. Currently, the DEEP-HybridDataCloud services are deployed on top of the project's internal testbed and offered to the research communities linked to the project through pilot applications. The project will address the integration of the individual services into the EOSC portal (an activity that started during this period and is currently on hold pending ongoing EOSC-Hub activities) and will explore external exploitation activities, as well as the on-boarding of external use cases.

Website & more info

More info: https://deep-hybrid-datacloud.eu.