Periodic Reporting for period 1 - ECOSCALE (Energy-efficient Heterogeneous COmputing at exaSCALE)

Summary

As HPC architectures have evolved, the HPC market has undergone a paradigm shift. The adoption of low-cost, Linux-based clusters offers significant computing performance and the ability to run a wide array of applications. These new classes of HPC applications are becoming increasingly performance- and power-hungry, pushing HPC systems to their limits. At the same time, existing HPC systems cannot deliver exascale performance because of their power limitations.

ECOSCALE tackles this challenge by proposing a scalable programming environment and hardware architecture tailored to the characteristics and trends of current and future HPC applications, significantly reducing data traffic, energy consumption and delays. ECOSCALE proposes a novel heterogeneous, energy-efficient architecture, programming model and runtime system that follow a hierarchical approach in which the system is partitioned into multiple autonomous Workers (i.e. compute nodes). Workers are interconnected in a tree-like structure to form larger Partitioned Global Address Space (PGAS) partitions, which can in turn be hierarchically interconnected via an MPI protocol.
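The hierarchy described above can be illustrated with a minimal Python sketch. It is not ECOSCALE code: the `Worker` class, its fields and the `transport` function are hypothetical names introduced here, and the sketch assumes only what the text states, namely that Workers inside one PGAS partition share an address space while separate partitions communicate via MPI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    """Hypothetical identifier for a Worker (compute node)."""
    partition_id: int  # PGAS partition the Worker belongs to
    worker_id: int     # index of the Worker inside that partition


def transport(src: Worker, dst: Worker) -> str:
    """Pick the communication path between two Workers (illustrative only)."""
    if src == dst:
        return "local"  # same Worker: ordinary local access
    if src.partition_id == dst.partition_id:
        return "pgas"   # same partition: shared global address space
    return "mpi"        # different partitions: message passing


a = Worker(partition_id=0, worker_id=0)
b = Worker(partition_id=0, worker_id=3)
c = Worker(partition_id=1, worker_id=0)
print(transport(a, a), transport(a, b), transport(a, c))
```

The point of the sketch is the two-level decision: address-space sharing is confined to a partition, so only traffic that crosses partition boundaries pays the cost of explicit messages.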

ECOSCALE introduced the UNILOGIC (Unified Logic) architecture, an extension of the UNIMEM architecture proposed within the EUROSERVER FP7 project. The architecture supports shared, partitioned reconfigurable resources that can be accessed by any Worker in a PGAS partition and, more importantly, automated hardware synthesis of these resources from an OpenCL-based programming model.

ECOSCALE’s target is to make it technically feasible and economically viable to design and deploy an innovative, heterogeneous, energy-efficient HPC architecture and programming environment that can reach exascale, taking full advantage of the novel architectural features and increasing programmers’ productivity. ECOSCALE aims to achieve this objective by identifying potential solutions through a co-design process between hardware architecture, HPC middleware and HPC applications.

The ECOSCALE 256-core heterogeneous prototype will provide a small-scale HPC system interconnecting several nodes. The initial results from the prototype will be fed into the simulation environment, which is designed to demonstrate that, when the projected performance and power trends in technology (process, 3D integration, NVM, etc.) are also taken into account, the ECOSCALE architecture can achieve the following technical objectives:

• Scalability sustainable at exascale: An exascale system will contain on the order of 100 million cores running billions of tasks, so scalability is a critical issue. ECOSCALE collaborates closely with the ExaNeSt FETHPC project, which focuses on a scalable infrastructure providing low-latency, high-throughput communication.
• Energy-efficiency at exascale: Extrapolating from top HPC systems, we would need about 1 GW to sustain exaflop computation, which is far too expensive and inefficient. To reach exascale, performance has to grow by a factor of close to 1000, while power consumption must stay within the range of 20 MW.
• High Programmability: Several existing HPC applications must run efficiently on the proposed architecture with minimal extra programming effort.
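The energy-efficiency objective can be made concrete with a short calculation. The sketch below uses only the figures quoted above (1 exaflop, about 1 GW extrapolated, a 20 MW target); the variable names are illustrative, and "exaflop" is taken to mean 10^18 floating-point operations per second.

```python
EXAFLOP = 1e18          # floating-point operations per second
EXTRAPOLATED_W = 1e9    # about 1 GW, extrapolated from top HPC systems
POWER_BUDGET_W = 20e6   # 20 MW target

# Efficiency implied by each power figure, in GFLOPS per watt.
extrapolated_gflops_per_watt = EXAFLOP / EXTRAPOLATED_W / 1e9
required_gflops_per_watt = EXAFLOP / POWER_BUDGET_W / 1e9

# How much more work per watt the target demands.
efficiency_gain = EXTRAPOLATED_W / POWER_BUDGET_W

print(f"extrapolated: {extrapolated_gflops_per_watt:.0f} GFLOPS/W")
print(f"required:     {required_gflops_per_watt:.0f} GFLOPS/W")
print(f"gain needed:  {efficiency_gain:.0f}x")
```

Under these figures the 20 MW budget corresponds to 50 GFLOPS/W, a 50-fold improvement in energy efficiency over the 1 GW extrapolation.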

In parallel, ECOSCALE carries out support activities promoting widespread acceptance and adoption of this highly innovative platform. These activities form an integral part of the project and have been under way since its start.

The project objectives are described in more detail in the Project Technical Report.

Work performed

The work carried out in the first reporting period is described in detail in the Periodic Technical Report (PTR). In summary, ECOSCALE finished the Specification phase in M7 and is now in the middle of the Implementation phase (M5-M28), as shown in the third attached figure (ecoscale_timeplan.jpg). The first attached figure (ecoscale_architecture.jpg) shows the various components of the ECOSCALE infrastructure. Within WP4, the hardware for the first ECOSCALE prototype, based on Trenz boards, was procured and the first UNILOGIC architecture was deployed at TSI. T4.3 and T4.4 are the central technical tasks; they extend the UNIMEM architecture from EUROSERVER in order to provide a distributed reconfigurable environment deployed on multiple Xilinx Ultrascale+ FPGAs.

In parallel, WP5 has provided the first reconfiguration tool flow and the associated FPGA design, which can be used to partially program Xilinx Ultrascale+ FPGAs at runtime in order to load new HW-accelerated tasks. The second attached figure (ecoscale_reconfiguration.jpg) shows that the reconfigurable logic of the Ultrascale+ FPGA is partitioned into HW slots/pages, which the ECOSCALE reconfiguration tool allocates when a new accelerated task from the accelerated library is loaded onto the FPGA. The tool can relocate an implemented task to adjacent HW slots without re-synthesizing and re-implementing (Place and Route) the task, which would be a very time-consuming process.
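The slot-based allocation described above can be sketched as follows. This is a simplified illustration rather than the ECOSCALE reconfiguration tool: the function names, the list-of-slots model and the first-fit placement policy are all assumptions, since the report does not describe the tool's actual allocation policy.

```python
from typing import List, Optional


def find_adjacent_slots(slots: List[Optional[str]], needed: int) -> Optional[int]:
    """Return the start index of the first run of `needed` free slots, or None."""
    if needed <= 0:
        raise ValueError("needed must be positive")
    run_start, run_len = 0, 0
    for i, owner in enumerate(slots):
        if owner is None:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == needed:
                return run_start
        else:
            run_len = 0  # an occupied slot breaks the run
    return None


def load_task(slots: List[Optional[str]], name: str, needed: int) -> bool:
    """Place a pre-implemented task into adjacent free slots (first fit)."""
    start = find_adjacent_slots(slots, needed)
    if start is None:
        return False
    for i in range(start, start + needed):
        slots[i] = name
    return True


slots: List[Optional[str]] = [None] * 6   # hypothetical FPGA with six HW slots
load_task(slots, "fft", 2)                # takes slots 0-1
load_task(slots, "stencil", 3)            # takes slots 2-4
slots[0] = slots[1] = None                # "fft" is unloaded again
print(load_task(slots, "sort", 2), slots)
```

Because each task is treated as a fixed-size block that fits any run of adjacent free slots, placement reduces to a search over the slot list, which mirrors why relocation avoids a new synthesis and Place and Route run.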

WP6 has provided a first version of the runtime system, which is responsible for managing and allocating the reconfigurable resources of the ECOSCALE architecture. The runtime system orchestrates the operations in the UNILOGIC architecture, helping the programmer use the architecture transparently and more efficiently.

The core functionality of the two ECOSCALE applications has been rewritten in OpenCL in order to offload tasks onto the FPGAs of the ECOSCALE prototype. The initial porting onto the first version of the ECOSCALE platform has been completed, but significant tuning and improvement are still required to take full advantage of the reconfigurable resources.

We have proactively started working on WP7 in order to ensure smooth integration of the final ECOSCALE results. In particular, we have started integrating the various components to provide an early-stage prototype based on the initial versions of all the ECOSCALE components from WP4, WP5 and WP6, and we have run the first simple applications.

The work carried out in each WP is further summarized in the Periodic Technical Report.

Final results

ECOSCALE does not simply provide another offloading engine; it goes beyond the state of the art by proposing and developing a Global Distributed Reconfigurable Logic in which reconfigurable resources are transparently shared between applications. Moreover, ECOSCALE provides for the first time a heterogeneous infrastructure, runtime system and tool flow that can automatically load HW tasks anywhere in the reconfigurable resources of the ECOSCALE system. This functionality is offered to the programmer through a user-friendly programming model that extends OpenCL in order to access multiple distributed reconfigurable resources.

The target of ECOSCALE is to provide the technological advances needed to arrive at the first energy-efficient exascale system by unifying and extending existing architectures and programming environments and integrating them with novel reconfigurable systems. The expected potential impact of ECOSCALE is described in the DoA; no update is needed.

Website & more info

More info: http://www.ecoscale.eu/.