Opendata, web and dolomites

Report

Teaser, summary, work performed and final results

Periodic Reporting for period 2 - NoMaD (The Novel Materials Discovery Laboratory)

Summary

Materials matter: the new commercial products and services needed to improve lives and overcome global challenges depend on novel or improved materials. For example, to make better use of energy generated from renewable sources, such as wind or solar power, we need more efficient batteries and novel catalysts to turn electricity into fuels. However, finding the right material is difficult because the number of possible materials is practically infinite, and the experimental “trial and error” methods typically used for materials design are costly and time-consuming. Data are now often called the “raw material” of the 21st century, and computational materials science can make it easier to choose the best material for a specific application. Predicting materials requires linking the data created at high-performance computing (HPC) centers with novel data analytics based on artificial intelligence methods. This will advance materials science, identify new physical phenomena, and help industry to improve existing products and technologies and develop novel ones.

Scientists have already generated a lot of valuable computational data about materials, spending billions of CPU hours at many HPC centers all over the world. Using these data is currently difficult because they are stored in separate databases. They were also generated with many different programs and storage formats, so results are not easy to compare directly.

To address these challenges, NOMAD creates, collects, stores, and cleanses a large volume of computational materials science data, derived from the most important materials science programs available today. In addition, NOMAD develops tools for mining these data in order to find structures, correlations, and novel information that could not be discovered by studying smaller data sets. Together, the large volume of data and the innovative tools are enabling researchers in basic science and engineering to advance materials science.

Work performed

NOMAD has addressed the main obstacles to using computational data in materials science and engineering, and in doing so has advanced Open Science. Firstly, NOMAD brought together existing data, previously scattered around the world, building on the existing NOMAD Repository. The Repository is a database where computational materials scientists from around the world can store and share their data for free. The Repository has now grown to over 50 million open access calculations, making it the largest database of its kind. In keeping with Open Science best practices, NOMAD has led the way in making sure the data are Findable, Accessible, Interoperable and Reusable (FAIR). For this, expert scientists of the project developed 40 software programs to convert data from many different formats into a single format, making the data easier to combine and compare. The data in this new, single format are stored in the NOMAD Archive, which is continuously updated as scientists perform and upload more calculations to the growing Repository. NOMAD partner institutes have also helped to establish a new, non-profit initiative, FAIR Data Infrastructure e.V., that will support extensive, sustainable data sharing in the future.
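As a rough illustration of what converting many formats into a single one involves, the sketch below normalizes the output of two invented programs into one common record. Everything here is an assumption made for the example: the input layouts, the names `CommonRecord`, `parse_code_a`, `parse_code_b`, and `normalize`, and the choice of electronvolts as the common energy unit are hypothetical and do not represent NOMAD's actual converters or Archive format.

```python
# Illustrative only: the two input layouts and the common schema are invented
# for this sketch and are not NOMAD's actual formats or converters.
from dataclasses import dataclass

HARTREE_TO_EV = 27.211386245988  # CODATA 2018 conversion factor


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical single format shared by all programs."""
    program: str
    formula: str
    total_energy_ev: float


def parse_code_a(text: str) -> CommonRecord:
    """Hypothetical 'code A' output: 'key = value' lines, energy in eV."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    fields = {key.strip(): value.strip() for key, value in pairs}
    return CommonRecord("code_a", fields["formula"], float(fields["energy_eV"]))


def parse_code_b(text: str) -> CommonRecord:
    """Hypothetical 'code B' output: 'formula energy' on one line, energy in Hartree."""
    formula, energy_hartree = text.split()
    return CommonRecord("code_b", formula, float(energy_hartree) * HARTREE_TO_EV)


# One converter per source program; supporting a new program means adding an entry.
PARSERS = {"code_a": parse_code_a, "code_b": parse_code_b}


def normalize(program: str, raw_output: str) -> CommonRecord:
    """Dispatch to the converter for the given program."""
    return PARSERS[program](raw_output)
```

Once both outputs share the same fields and units, records from different programs can be compared or combined directly, which is the point of a single format.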

The amount of data in the Repository and Archive is massive. For it to be truly useful for R&D, scientists need efficient tools to search and analyze the data. NOMAD provides such tools through the NOMAD Encyclopedia, Advanced Graphics, and the Big Data Analytics Toolkit. The Encyclopedia is a user-friendly, public access point to NOMAD’s extensive data that lets users see, compare, explore, and comprehend computations for a large variety of materials. Advanced Graphics tools help experts and the general public alike to visualize complex, multi-dimensional data and experience the world of materials first-hand through virtual-reality simulations. The Analytics Toolkit presently offers more than 15 advanced tools for performing Big Data analytics to discover patterns and other useful information in NOMAD’s massive collection of data. Advanced users can also create their own automated tools based on NOMAD software. All of these tools and services have been made available using HPC infrastructure and services, letting academic and industrial users alike make the most of European HPC capabilities and resources.

In addition, NOMAD has carried out case studies to show how the tools and services can be used to solve challenges with societal impact. For example, NOMAD researchers generated and screened a massive database of materials looking for the best materials to use in polymer solar cells for sustainable energy generation. The most promising materials are now being made to test their laboratory performance.
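The report does not describe how the screening was done, so the following is only a generic sketch of the idea of screening a database against a property target. The `Candidate` type, the use of a band gap as the screened property, the window limits, and the ranking rule are all assumptions chosen for illustration, not the criteria used by NOMAD researchers.

```python
# Illustrative only: the property, window, and ranking rule are assumptions,
# not the criteria used in the NOMAD case study.
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    band_gap_ev: float  # hypothetical computed property


def screen(candidates, gap_min_ev, gap_max_ev, top_n):
    """Keep candidates whose band gap lies in the window, ranked by
    closeness to the middle of the window, and return the best top_n."""
    target = (gap_min_ev + gap_max_ev) / 2
    in_window = [c for c in candidates if gap_min_ev <= c.band_gap_ev <= gap_max_ev]
    return sorted(in_window, key=lambda c: abs(c.band_gap_ev - target))[:top_n]
```

In practice a screen like this would run over many computed properties at once, and the short list it returns would be what goes on to laboratory testing.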

Impressively, the NOMAD team has published over 60 articles in top journals in just three years, with many more to come. NOMAD researchers have spoken at major international conferences about NOMAD’s contribution to the data revolution, e.g. at the Global Internet of Things Summit or the Platform for Advanced Scientific Computing Conference, giving over 215 invited presentations. More than 25 training events have also been held, making sure that NOMAD expertise is widely shared for maximal benefit.

Final results

NOMAD has significantly advanced the state of the art in computational materials science:
• The Archive is the only materials science database in the world that contains data from all important programs used worldwide, in a single format.
• For the first time, scientists can see, compare, explore, and comprehend computational materials science data through the Encyclopedia.
• Big Data analytics, not previously possible with data in many formats in databases scattered around the world, have been developed and proven on test datasets.
• Advanced Graphics and virtual-reality simulations have been developed to help scientists better understand and use materials science data.

One of the most important achievements of NOMAD has been the change in the scientific culture towards extensive data sharing. Before NOMAD, data were not widely shared. Now, through NOMAD, the materials science community has uploaded over 50 million calculations for open access reuse, and more are uploaded every day. While there are now other databases, they are restricted to a single program and serve only the user community generating the data. NOMAD supports all important programs, with support for new ones on request, and is open to anyone in the world.

NOMAD is ensuring that Europe leads the way in novel materials discovery, in collaboration with international networks. NOMAD is also training the next generation of scientists and engineers who will advance computational materials science, to make sure European science and industry remain competitive in global markets.

The most exciting part of NOMAD has only just begun!

Scientists are now using NOMAD tools and services for novel materials discovery. By making these tools freely and openly available, so that others can build on our work, we are improving access to and use of computational materials science data to advance basic science and drive innovation in a broad range of industries from sustainable energy to transport to healthcare and more. For example, NOMAD is currently collaborating with industry to investigate materials for green chemical production that would decrease carbon dioxide emissions and increase renewable energy usage.

Website & more info

More info: http://nomad-coe.eu/.