News

Final report of the DeepRain project

The DeepRain project was launched to develop new approaches that combine modern machine learning methods with high-performance IT systems for data processing and dissemination, in order to produce high-resolution spatial maps of precipitation over Germany. The foundation of this project was the multi-year archive of ensemble forecasts from the numerical weather model COSMO of the German Weather Service (DWD). Six transdisciplinary research institutions worked together in DeepRain to develop an end-to-end processing chain which could potentially be used in a future operational weather forecasting context.

The project proposal had identified several challenges that had to be overcome in this regard. Besides the technical challenges of establishing a novel data fusion of rather diverse data sets (numerical model data, radar data, ground-based station observations), building scalable machine learning solutions, and optimising the performance of data processing and machine learning, there were various scientific challenges related to the small local-scale structures of precipitation events, the difficulty of finding robust evaluation methods for precipitation forecasts, and non-Gaussian precipitation statistics combined with highly imbalanced data sets.

When DeepRain started, the application of machine learning to weather and climate data was still very new, and there were hardly any publications or software codes available to build upon. DeepRain thus pioneered the use of modern deep learning models in the domain of weather forecasting. Concurrently, the number of publications in this new field has grown exponentially over the past three years; very often these were studies conducted in North America or China. Global players like Google, Amazon, NVIDIA, and Microsoft have in the meantime established groups of scientists and engineers to advance research on “Weather AI” and to develop (marketable) weather and climate applications with deep learning.
The DeepRain project was therefore very timely: it established a baseline for machine learning in weather and climate in Germany, and it allowed the consortium to explore the potential of deep learning in the context of the enormous data processing required and to keep pace with international developments in this rapidly growing field of research.


While DeepRain could not complete the final deliverable, i.e. the construction of a prototype end-to-end workflow for high-resolution precipitation forecasts based on deep learning, all of the related research questions have been answered and all the necessary building blocks for such a workflow have been developed. In particular, modern datacube technology has been used successfully to establish four- to six-dimensional atmospheric simulation datacubes based on DWD data, available for extraction and analytics.


In addition to the anticipated challenges described above, several severe issues materialised during the project: 1) a large-scale data loss due to hardware failures in spring 2021, 2) the Covid-19 pandemic from March 2020 until now, and 3) difficulties in finding highly skilled personnel – especially in times when most work had to be done in a home office setting.

The main accomplishments of DeepRain are:

  • Petabyte-scale data transfer of archived COSMO-DE EPS forecasts from tape drives of DWD and of the RADKLIM dataset from the Open Data server to the JUST file system at JSC/FZ Jülich, organisation and cleaning of these data, and granting data access to all project partners,
  • Parallelized processing of COSMO-EPS and RADKLIM data (ensemble statistics, remapping for data fusion and for ingestion to rasdaman),
  • Implementation of Rasdaman datacube array database servers at FZ Jülich and ingestion of several TBytes of weather data,
  • Establishing links from the Jülich Rasdaman servers to the EarthServer datacube federation,
  • Further development of Rasdaman to accelerate data ingestion and retrieval, define new user-defined functions for the analysis of topographic data, define a new coordinate reference system for rotated-pole coordinates, and prepare for interfacing with machine learning workflows,
  • Development of statistical downscaling techniques and machine learning models to:
    • Generate dichotomous and quantitative precipitation forecasts at station sites,
    • Generate area forecasts at the RADKLIM radar data resolution,
  • Exploration of new verification statistics based on partial correlations and regression boosting.
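As an illustration of how such datacube holdings can be queried downstream, the following sketch composes an OGC WCPS query string in Python. The coverage name, time axis label, and server endpoint are hypothetical placeholders, not the actual DeepRain configuration.

```python
# Sketch: composing an OGC WCPS query for a rasdaman datacube server.
# The coverage name "cosmo_precip", the "ansi" time axis, and the endpoint
# URL below are illustrative placeholders, not the project's real setup.

def build_wcps_query(coverage: str, timestamp: str, fmt: str = "csv") -> str:
    """Build a WCPS query that extracts one time slice of a coverage."""
    return (
        f'for $c in ({coverage}) '
        f'return encode($c[ansi("{timestamp}")], "{fmt}")'
    )

query = build_wcps_query("cosmo_precip", "2019-06-01T12:00:00Z")

# Sending the query to a server would look roughly like this
# (requires the `requests` package; endpoint URL is hypothetical):
# import requests
# r = requests.post("https://example.org/rasdaman/ows",
#                   data={"service": "WCS", "version": "2.0.1",
#                         "request": "ProcessCoverages", "query": query})
```

Such query strings can then be issued from notebooks or scripts, which is how datacube access plugs into analysis and machine learning workflows.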

In this report, we provide a detailed overview of the work and the achievements within the DeepRain project. The report is organised in five sections: In Section 1, we present the deliverable plan from the project proposal and report the delivery state of each task separately to allow for a compact comparison between the project plan and its output. In Section 2, we then outline in detail the work carried out for each work package individually. The expected outcome of the project achievements as well as possible future benefits are discussed in Section 3. In Section 4, we give a general overview of the progress made in the research fields related to DeepRain; specifically, these are: machine learning for precipitation forecasting, precipitation forecast evaluation methods, big data handling, and FAIR data practices. Finally, Section 5 lists all the journal publications, data sets, software packages, and planned submissions resulting from the DeepRain project.

Section 6 contains the list of references.

Link to full final report: https://hdl.handle.net/2128/33144

Progress in the machine learning section of the project!

After preparing ensemble statistics of the COSMO-DE EPS numerical weather predictions, we compared artificial neural networks with the classically used linear regression as post-processing models for precipitation at several weather stations. After testing numerous potential neural network architectures, we found that we can consistently outperform linear regression and improve significantly over the raw numerical weather prediction result. Look forward to a publication soon.

Paper on “Can deep learning beat numerical weather prediction?” published

Today, the Open Access article by Martin Schultz et al. was published in the Philosophical Transactions of the Royal Society A, theme issue “Machine learning for weather and climate modelling”. The paper discusses whether it is possible to completely replace the current numerical weather models and data assimilation systems with deep learning approaches. It is available at https://royalsocietypublishing.org/doi/10.1098/rsta.2020.0097.

Fifth DeepRain Project Meeting

In this project meeting, which again took place virtually, the project progress of the past months was discussed. There was some pleasing progress to report here, although the interdisciplinary collaboration proved more difficult than hoped in some cases due to the pandemic situation.

  • The University of Bonn has carried out an analysis of how large-scale weather data affect the occurrence of precipitation at the Münster and Osnabrück measuring stations. For this purpose, logistic regression was used. Depending on the season, this approach yields a significant improvement compared to a purely local forecast.
  • The DWD tested different approaches for regression with a generalised linear model and, in particular, investigated to what extent it makes a difference whether the input variables are selected separately for each measuring station or jointly for all of them. In fact, even with the generalised approach, similarly good results can be obtained for the estimation of precipitation as with the separate procedure. However, the yes-no decision of whether it will rain or not becomes less accurate at some stations.
  • Jacobs University used Jupyter notebooks to demonstrate how queries to the Rasdaman database can be integrated into data analysis and machine learning workflows. Furthermore, Jülich was supported in joining the EarthServer datacube federation.
  • Forschungszentrum Jülich reported progress in the development of machine learning workflows, much of which has now been parallelized, greatly increasing the throughput of data and computations. Data management for the huge amount of weather data is now largely consolidated; in the main, radar data still needs to be finalized for processing.
  • The University of Osnabrück successfully applied neural networks to learn the relation of next-day precipitation amounts to current weather data at a small set of measurement stations. The neural networks outperformed classical regression. Implementing the machine learning workflow on the Jülich supercomputer presented some challenges due to the need for an efficient and flexible data handling tool that works on the vast amount of raw data available in the project.
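To illustrate the dichotomous (rain / no-rain) forecasting approach mentioned above, here is a minimal sketch using logistic regression on synthetic data. The predictors, sample sizes, and scikit-learn setup are illustrative assumptions, not the project's actual inputs or code.

```python
# Minimal sketch of a dichotomous (rain / no-rain) forecast via logistic
# regression, on purely synthetic data. The two predictors stand in for
# large-scale weather variables; none of this is the project's real setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))          # illustrative large-scale predictors
# Synthetic truth: rain becomes more likely when both predictors are high.
p_rain = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] + 1.0 * X[:, 1] - 0.5)))
y = rng.random(n) < p_rain

# Train on the first 1500 samples, evaluate accuracy on the rest.
model = LogisticRegression().fit(X[:1500], y[:1500])
accuracy = model.score(X[1500:], y[1500:])

# A trivial "always predict the majority class" baseline for comparison:
baseline = max(y[1500:].mean(), 1 - y[1500:].mean())
```

Comparing against such a trivial baseline mirrors the comparison with a purely local forecast described above.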

Guest lecture at DWD on station-based time series predictions

Today Felix Kleinert gave a guest lecture on “MLAir & IntelliO3: Station-based time series forecasts using neural networks” at the AI Forum at the German Weather Service. The Python framework for station-based time series forecasts, MLAir, currently under development, was presented and applied to ground-level ozone concentrations.

DeepRain Progresses

Owing to the continued Covid-19 situation, the DeepRain partners have now adopted a monthly schedule for online meetings. For specific technical and scientific discussions, smaller teams meet weekly. Over the past month, the data import of the COSMO ensemble weather model at JSC has been improved, and the benchmarking tests for data ingestion and data extraction have been successfully completed. New instances of the Rasdaman array databases (“data cubes”) have been installed on JSC resources, including one instance of the Rasdaman Enterprise server, which will soon become a node in an international data federation. The JSC team further supported Osnabrück with their implementation of machine learning workflows on the Jülich supercomputer systems.

Deep Learning in Rain Prediction

Predicting rain (precipitation forecasting) requires considering several factors that are generated by weather models. The relation between these factors and the forecasted precipitation is complex and non-linear. To improve forecasts, we therefore use neural networks that learn to combine the factors based on training data. As an outcome of the DeepRain project, we developed several deep-learning-based approaches using networks with different numbers of layers and different complexities in terms of the number of neurons. We compared these to the more classical approach that assumes a linear combination in a single layer, and showed for two weather stations in Germany that deep learning can outperform classical precipitation forecasts. The best performance was reached using three layers, indicating that non-linear combinations of the factors produced by weather models are important.
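The comparison described above can be sketched as follows; the data are synthetic and scikit-learn serves as a stand-in for the project's actual models, so all names, sizes, and settings are illustrative assumptions.

```python
# Sketch of the described comparison: a linear baseline versus neural
# networks of increasing depth. Synthetic data stand in for the factors
# generated by weather models; this is not the project's real code.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
n = 3000
X = rng.normal(size=(n, 5))          # model-derived "factors"
# Non-linear synthetic target: interactions between factors matter,
# and negative values are clipped, as precipitation is non-negative.
y = np.maximum(0.0, X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2
               + 0.1 * rng.normal(size=n))

X_tr, X_te, y_tr, y_te = X[:2000], X[2000:], y[:2000], y[2000:]

errors = {}
linear = LinearRegression().fit(X_tr, y_tr)
errors["linear"] = mean_squared_error(y_te, linear.predict(X_te))

for depth in (1, 3):                 # one versus three hidden layers
    net = MLPRegressor(hidden_layer_sizes=(32,) * depth,
                       max_iter=2000, random_state=0).fit(X_tr, y_tr)
    errors[f"mlp_{depth}"] = mean_squared_error(y_te, net.predict(X_te))
```

Because the synthetic target mixes its inputs non-linearly, the deeper network should achieve a clearly lower test error than the linear baseline, mirroring the finding reported above.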

Fourth DeepRain Project Meeting

Virtual meeting icon made by surang from www.flaticon.com

Due to the Covid-19 situation, the 4th DeepRain project meeting, which was scheduled to take place from March 10-12, 2020, had to be converted into a series of web meetings with the project partners. The main focus of the meeting was to prepare input for the upcoming project report in April and to plan specific actions for the next 6 months. Over the past months, a lot of progress has been made with respect to the data preparation and machine learning workflows, but a couple of issues remain in terms of implementing these workflows on the JSC supercomputing system. This has so far hindered the project from generating meaningful rainfall predictions through machine learning. All project partners are working together to get these workflows in place, and we are optimistic that, once these issues are solved, it will be relatively easy to scale the solutions to much larger datasets. For the coming months, the focus will be on performance improvements for data imports, on applying machine learning to station time series data, and on exploiting parallelisation at all levels to optimize workflows, tools, and data processing on the HPC system.

Data transfer of DWD ensemble weather model to Jülich completed

Today, the last byte of more than 439 Terabytes of weather model data from the COSMO Ensemble Prediction System of the German Weather Service DWD has been received and stored at the Jülich Supercomputing Centre. The DeepRain team now has 7 years of hourly weather data from 20 ensemble members available. These data are an essential foundation for the project. They are used to train the advanced neural networks which are developed at the University of Osnabrück and to evaluate the predictive power of the machine learning methods at the University of Bonn.

Participation in ESiWACE2 container hackathon

From left: Amirpasha Mozaffari (JSC), Tomas Aliaga (CSCS), Jan Vogelsang (JSC), Bing Gong (JSC) in front of Switzerland National Supercomputer Center (CSCS).

Bing Gong, Amirpasha Mozaffari, and Jan Vogelsang participated in the ESiWACE2 container hackathon for modellers at the Swiss National Supercomputing Centre (CSCS) in Lugano, Switzerland, from 3 to 6 December 2019.

Containerization is a method to bundle an application together with all its configuration files, libraries, and dependencies, so that it runs efficiently and reliably across different computing environments.

In three days, they containerized the machine learning workflow on a local machine and then ported the scripts to run on the HPC system (see the application details: https://github.com/eth-cscs/ContainerHackathon/tree/wfppdl/wfppdl). CSCS hosts Piz Daint, the fastest supercomputer in Europe, ranked 6th in the Top500 list of November 2019. Our team managed to containerize the workflow, run the script successfully on Piz Daint, and pass the scalability test.

DeepRain third project meeting

Meeting participants behind a rain collector in the backyard of the Institute for Geosciences at the University of Bonn

The DeepRain team has met for its third project meeting at the Institute for Geosciences at the University of Bonn. Fitting the project’s topic and the season, the rain collector in the backyard of the institute measured 8 mm of rainfall during the three days of the meeting
(source: https://www.ifgeo.uni-bonn.de/abteilungen/meteorologie/messdaten/wetterdaten-bonn-endenich/messwerte).
After the first project year, a lot has been accomplished in terms of managing the vast amount of data that are needed in the project, but also with respect to exploring suitable machine learning strategies and statistical methods. The project meeting provided a great opportunity to bring all this information together and discuss the strategy for the coming months. Predicting rainfall is difficult (we knew this from the start), but it is also challenging to understand if and when a rainfall prediction is actually “good” and what this means. Even a simple question like “rain or no rain?” is actually not easy to answer because of measurement and model uncertainties. It will thus require some more time before DeepRain can generate its first meaningful data products.

Workshop “Machine Learning in weather and climate modelling” in Oxford

Martin Schultz and Lukas Leufen attended a workshop on “Machine Learning in weather and climate modelling” at Corpus Christi College in Oxford. This workshop assembled more than 100 top-notch climate scientists and experts in HPC computer science and machine learning to present ongoing work and discuss the way forward. It became clear from the start that machine learning can likely play an important role in almost all stages of a weather and climate modelling workflow. Much-discussed topics were the perceived need to impose physical constraints on machine learning algorithms and to quantify uncertainties. Martin Schultz’s presentation on the IntelliAQ and DeepRain projects was well received, and the positive response confirmed the research strategy followed by these projects.

Master thesis “Deep Hyperresolution for Weather Forecasting”

At Osnabrück University, Jonas Rebstadt successfully finished his studies with a master thesis titled “Deep Hyperresolution for Weather Forecasting”. The goal was to develop a system that can increase the precision of rain forecasts without an exorbitantly higher computational demand. The approach presented in the thesis increases the spatial resolution of a forecast model currently in operational use at the Deutscher Wetterdienst (DWD) by training a neural network with higher-resolution radar images as targets.

Master Thesis “Deep Learning for Future Frame Prediction of Weather Maps”

Severin Hußmann successfully finished his master studies at the Humboldt University of Berlin. His master thesis “Deep Learning for Future Frame Prediction of Weather Maps” focuses on applying data-driven deep learning methods to the field of weather forecasting, specifically air temperature over Europe. A future frame prediction model from the computer vision field is trained with the three input variables air temperature, surface pressure, and 500 hPa geopotential in order to predict the air temperature itself. The experiments show that the model can make better hourly and even several-hour predictions than the persistence assumption.
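The persistence baseline mentioned above (the forecast for time t+h is simply the value observed at time t) can be computed as in this small sketch; the temperature series here is synthetic and purely illustrative.

```python
# Sketch of the persistence baseline used as a reference in such studies:
# the forecast for time t+h is the value observed at time t. The hourly
# temperature series below is synthetic (daily cycle plus noise).
import numpy as np

rng = np.random.default_rng(1)
hours = 240
t = np.arange(hours)
temperature = (15 + 5 * np.sin(2 * np.pi * t / 24)
               + rng.normal(scale=0.5, size=hours))

def persistence_mae(series: np.ndarray, lead: int) -> float:
    """Mean absolute error of the persistence forecast at a given lead time."""
    return float(np.mean(np.abs(series[lead:] - series[:-lead])))

mae_1h = persistence_mae(temperature, 1)
mae_6h = persistence_mae(temperature, 6)
# A learned model "beats persistence" if its MAE is below these values.
```

For a smooth daily cycle the persistence error grows with lead time, which is why beating persistence at several-hour leads is a meaningful result.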

Data Transfer Milestone: 100th Terabyte

Progress in data transfer: an essential aspect of the DeepRain project is the large amount of data used for training and evaluation of machine learning methods. A total of over 430 terabytes of data are currently being transferred from the German Weather Service to Forschungszentrum Jülich to be used on JSC supercomputers for deep learning. Today, the 100th terabyte was successfully transferred and integrated into the storage systems at JSC. This is an important milestone, as enough data is now available to carry out the first meaningful deep learning and analyses.

Data Storage at JSC allocated

Jülich Supercomputing Centre (JSC) has allocated two large data projects with a volume of several hundred terabytes for the DeepRain project. The first 30 TByte of meteorological model data have been successfully transferred from the German Weather Service to JSC and a prototype workflow for processing of these data has been established.

European Geoscience Union Conference

The DeepRain presentation by Martin Schultz at the European Geoscience Union conference in Vienna was well received. Machine learning attracts a lot of attention now in the research field of weather and climate. Fruitful discussions followed the talk, which may lead to future collaborations.

Second Project Meeting

The DeepRain team has just completed its second project meeting and the project partners return to their home institutions. The meeting was organized at the Institute for Cognitive Sciences at the University of Osnabrück and included a brief tutorial on Deep Learning, which is the expertise of Prof. Gordon Pipa’s research group. Prof. Pipa’s team presented their plans for neural network architectures that will be used to learn rainfall patterns from the ensemble model runs of the German Weather Service. The DeepRain partners continued their discussions about Terabyte-scale data management and debated validation methods and error characteristics, and how these may affect the performance of the neural networks. Careful analysis of errors and understanding the merits and limitations of deep learning in the context of weather data are key objectives of the DeepRain project.

DeepRain Cooperation Agreement has been finalized

The DeepRain cooperation agreement has been finalized and signed by all project partners. This constitutes the formal basis for a fruitful collaboration among Forschungszentrum Jülich as coordinator, the German Weather Service (DWD), the Universities of Osnabrück and Bonn, and Jacobs University in Bremen. “The DeepRain project adopts the principles of Open Science and Open Data. Therefore, the collaboration agreement imposes as few constraints as possible, but some rules are necessary,” says Dr. Martin Schultz, who coordinates the project.