Progress in the machine learning section of the project!

After preparing ensemble statistics from the COSMO-DE EPS numerical weather predictions, we compared artificial neural networks with linear regression, the classical choice, as post-processing models for precipitation at several weather stations. After testing numerous neural network architectures, we found that we can consistently outperform linear regression and significantly improve on the raw numerical weather prediction. Look out for a publication soon.

Paper on “Can deep learning beat numerical weather prediction?” published

Today, the Open Access article by Martin Schultz et al. was published in Philosophical Transactions of the Royal Society A, in the theme issue “Machine learning for weather and climate modelling”. The paper discusses whether it is possible to completely replace the current numerical weather models and data assimilation systems with deep learning approaches. It is available at https://royalsocietypublishing.org/doi/10.1098/rsta.2020.0097.

Fifth DeepRain Project Meeting

In this project meeting, which again took place virtually, the project progress of the past months was discussed. There was pleasing progress to report, although in some cases the pandemic made the interdisciplinary collaboration more difficult than hoped.

  • The University of Bonn has carried out an analysis of how large-scale weather data affect the occurrence of precipitation at the Münster and Osnabrück measuring stations. For this purpose, a so-called logistic regression was used. Depending on the season, this approach results in a significant improvement compared to a purely local forecast.
  • The DWD tested different approaches to regression with a generalised linear model and, in particular, investigated whether it makes a difference if the input variables are selected separately for each measuring station or jointly for all of them. In fact, even with the joint approach, similarly good precipitation estimates can be obtained as with the separate procedure. However, the yes/no decision of whether it will rain becomes less accurate at some stations.
  • Jacobs University used Jupyter notebooks to demonstrate how queries to the Rasdaman database can be integrated into data analysis and machine learning workflows. Furthermore, Jacobs University supported Jülich in joining the EarthServer data cube federation.
  • Forschungszentrum Jülich reported progress in the development of machine learning workflows, large parts of which have now been parallelized, greatly increasing the throughput of data and computations. Data management for the huge amount of weather data is now largely consolidated; mainly the radar data still need to be prepared for processing.
  • The University of Osnabrück successfully applied neural networks to learn the relation between next-day precipitation amounts and current weather data at a small set of measurement stations. The neural networks outperformed classical regression. Implementing the machine learning workflow on the Jülich supercomputer presented some challenges because of the need for an efficient and flexible data handling tool that can work with the vast amount of raw data available in the project.
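The list above mentions logistic regression for the occurrence of precipitation. As a rough illustration of that technique only (the Bonn setup, its predictors and its software are not described here), the following minimal Python sketch fits a logistic regression for the probability of rain by plain gradient descent. The two predictors and the toy data are hypothetical standardized anomalies of large-scale fields, invented for this example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability of rain for one predictor vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical predictors: [humidity anomaly, vertical-motion anomaly].
X = [[-1.5, -1.0], [-1.0, -0.5], [-0.5, 0.0], [0.0, -0.5],
     [0.5, 0.5], [1.0, 0.0], [1.5, 1.0], [0.2, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = rain observed, 0 = dry (toy labels)

w, b = fit_logistic(X, y)
p_wet = predict_proba(w, b, [1.5, 1.0])
p_dry = predict_proba(w, b, [-1.5, -1.0])
print(f"P(rain | moist, rising) = {p_wet:.2f}, P(rain | dry, sinking) = {p_dry:.2f}")
```

The model returns a probability rather than a hard label; a yes/no rain forecast follows by thresholding it, for example at 0.5 (an illustrative choice that would normally be tuned).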

Guest lecture at DWD on station-based time series predictions

Today, Felix Kleinert gave a guest lecture on “MLAir & IntelliO3: Station-based time series forecasts using neural networks” at the AI Forum of the German Weather Service. He presented MLAir, a Python framework for station-based time series forecasts that is currently under development, and its application to ground-level ozone concentrations.

DeepRain Progresses

Owing to the continued Covid-19 situation, the DeepRain partners have now adopted a monthly schedule for online meetings. For specific technical and scientific discussions, smaller teams meet weekly. Over the past month, the JSC team improved the data import of the COSMO ensemble weather model and successfully completed the benchmarking tests for data ingestion and data extraction. New instances of the Rasdaman array database (“data cubes”) have been installed on JSC resources, including one instance of the Rasdaman Enterprise server, which will soon become a node in an international data federation. The JSC team further supported Osnabrück with the implementation of machine learning workflows on the Jülich supercomputer systems.

Deep Learning in Rain Prediction

Predicting rain (precipitation forecasting) requires considering several factors that are generated by weather models. The relation between these factors and the precipitation is considered to be complex and non-linear. To improve forecasts, we therefore use neural networks that learn how to combine these factors from training data. As an outcome of the DeepRain project, we developed several deep learning approaches using networks with different numbers of layers and different complexities in terms of the number of neurons. We compared these to the more classical approach, which assumes a linear combination with a single layer, and showed for two weather stations in Germany that deep learning can outperform classical precipitation forecasts. The best performance was reached with 3 layers, indicating that non-linear mixing of the factors produced by weather models is important.
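The point about linear versus non-linear mixing can be made concrete with a small, hypothetical example. The Python sketch below uses arbitrary weights and assumes a ReLU activation (the activations of the DeepRain networks are not stated here). It shows that stacking purely linear layers still yields one single linear combination, whereas a non-linear activation lets a 3-layer network represent relations that no single linear layer can.

```python
def dense(x, W, b):
    """Fully connected layer: y_i = sum_j W[i][j] * x[j] + b[i]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def relu(v):
    return [max(0.0, v_i) for v_i in v]

def identity(v):
    return list(v)

def mlp(x, layers, activation=relu):
    """Hidden layers use `activation`; the output layer stays linear."""
    for W, b in layers[:-1]:
        x = activation(dense(x, W, b))
    W, b = layers[-1]
    return dense(x, W, b)

def collapse(layers):
    """Merge a stack of purely linear layers into one equivalent (W, b)."""
    W, b = layers[0]
    for W2, b2 in layers[1:]:
        new_W = [[sum(W2[i][k] * W[k][j] for k in range(len(W)))
                  for j in range(len(W[0]))] for i in range(len(W2))]
        new_b = [sum(W2[i][k] * b[k] for k in range(len(b))) + b2[i]
                 for i in range(len(W2))]
        W, b = new_W, new_b
    return W, b

# Three layers with arbitrary illustrative weights: 2 inputs -> 2 -> 2 -> 1.
layers = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, -0.5]),
    ([[1.0, -1.0], [0.5, 2.0]], [0.1, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]

# Without a non-linearity, the 3-layer stack equals a single linear layer.
W_lin, b_lin = collapse(layers)
for x in ([0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.3, -2.0]):
    deep = mlp(x, layers, activation=identity)[0]
    single = dense(x, W_lin, b_lin)[0]
    assert abs(deep - single) < 1e-9

def relu_net(x):
    return mlp(x, layers)[0]

# An affine function would satisfy f(a) + f(-a) - 2 f(0) == 0; ReLU breaks this.
print(relu_net([1.0, 0.0]) + relu_net([-1.0, 0.0]) - 2 * relu_net([0.0, 0.0]))
```

The printed value is about 1.25 rather than 0, showing that the ReLU network is not an affine function of its inputs.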

Fourth DeepRain Project Meeting


Due to the Covid-19 situation, the 4th DeepRain project meeting, which was scheduled to take place from 10 to 12 March 2020, had to be converted into a series of web meetings with the project partners. The main focus of the meeting was to prepare input for the upcoming project report in April and to plan specific actions for the next 6 months. Over the past months, a lot of progress has been made with respect to the data preparation and machine learning workflows, but a couple of issues remain regarding the implementation of these workflows on the JSC supercomputing system. This has so far prevented the project from generating meaningful rainfall predictions through machine learning. All project partners are working together to get these workflows in place, and we are optimistic that, once these issues are solved, it will be relatively easy to scale the solutions to much larger datasets. For the next month, the focus will be on performance improvements for data imports, on applying machine learning to station time series data, and on exploiting parallelisation at all levels to optimize workflows, tools and data processing on the HPC system.

Data transfer of DWD ensemble weather model data to Jülich completed

Today, the last byte of more than 439 terabytes of weather model data from the COSMO Ensemble Prediction System of the German Weather Service (DWD) was received and stored at the Jülich Supercomputing Centre. The DeepRain team now has 7 years of hourly weather data from 20 ensemble members available. These data are an essential foundation for the project. They are used to train the advanced neural networks developed at the University of Osnabrück and to evaluate the predictive power of the machine learning methods at the University of Bonn.

Participation in ESiWACE2 container hackathon

From left: Amirpasha Mozaffari (JSC), Tomas Aliaga (CSCS), Jan Vogelsang (JSC), Bing Gong (JSC) in front of the Swiss National Supercomputing Centre (CSCS).

Bing Gong, Amirpasha Mozaffari, and Jan Vogelsang participated in the ESiWACE2 container hackathon for modelers at the Swiss National Supercomputing Centre (CSCS) in Lugano, Switzerland, from 3 to 6 December 2019.

Containerization is a method of bundling an application together with all the configuration files, libraries and dependencies it requires, so that it runs efficiently and without errors across different computing environments.

In three days, they containerized the machine learning workflow on a local machine and then ported the scripts to run on HPC systems (see the application details: https://github.com/eth-cscs/ContainerHackathon/tree/wfppdl/wfppdl). CSCS is the home of Piz Daint, the fastest supercomputer in Europe, which was ranked 6th in the November 2019 TOP500 list. Our team managed to run the containerized workflow successfully on Piz Daint and passed the scalability test.

DeepRain third project meeting

Meeting participants behind a rain collector in the backyard of the Institute for Geosciences at the University of Bonn

The DeepRain team met for its third project meeting at the Institute for Geosciences at the University of Bonn. Fitting the project’s topic and the season, the rain collector in the backyard of the institute measured 8 mm of rainfall during the 3 days of the meeting (source: https://www.ifgeo.uni-bonn.de/abteilungen/meteorologie/messdaten/wetterdaten-bonn-endenich/messwerte).
After the first project year, a lot has been accomplished in terms of managing the vast amount of data that are needed in the project, but also with respect to exploring suitable machine learning strategies and statistical methods. The project meeting provided a great opportunity to bring all this information together and discuss the strategy for the coming months. Predicting rainfall is difficult (we knew this from the start), but it is also challenging to understand if and when a rainfall prediction is actually “good” and what this means. Even a simple question like “rain or no rain?” is actually not easy to answer because of measurement and model uncertainties. It will thus require some more time before DeepRain can generate its first meaningful data products.

Workshop “Machine Learning in weather and climate modelling” in Oxford

Martin Schultz and Lukas Leufen attended a workshop on “Machine Learning in weather and climate modelling” at Corpus Christi College in Oxford. This workshop assembled more than 100 top-notch climate scientists and experts in HPC, computer science and machine learning to present ongoing work and discuss the way forward. It became clear from the start that machine learning can likely play an important role in almost all stages of a weather and climate modelling workflow. Much-discussed topics were the perceived need to impose physical constraints on machine learning algorithms and to quantify uncertainties. Martin Schultz’s presentation on the IntelliAQ and DeepRain projects was well received, and the positive response confirmed the research strategy followed by these projects.

Master thesis “Deep Hyperresolution for Weather Forecasting”

At Osnabrück University, Jonas Rebstadt successfully completed his studies with a master thesis titled “Deep Hyperresolution for Weather Forecasting”. The goal was to develop a system that increases the precision of rain forecasts without an exorbitant increase in computational demand. The approach presented in the thesis aims to increase the spatial resolution of a forecast model developed by the Deutscher Wetterdienst (DWD) and currently in operational use, by training a neural network with higher-resolution radar images as the target.

Master Thesis “Deep Learning for Future Frame Prediction of Weather Maps”

Severin Hußmann successfully completed his master studies at Humboldt University of Berlin. His master thesis “Deep Learning for Future Frame Prediction of Weather Maps” focuses on applying data-driven deep learning methods to weather forecasting, specifically to air temperature over Europe. A future frame prediction model from the field of computer vision is trained with three input variables (air temperature, surface pressure, and 500 hPa geopotential) to predict the air temperature itself. The experiments show that the model outperforms the persistence assumption for predictions one hour ahead and even several hours ahead.
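The persistence assumption mentioned above simply uses the last observed value as the forecast for later times. The minimal Python sketch below shows such a baseline together with an RMSE-based skill score; the temperature values and model predictions are made up for illustration, and the metric used in the thesis may differ.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def persistence_forecast(series, lead):
    """Predict series[t + lead] by series[t]; lead must be >= 1."""
    return series[:-lead]

def skill_score(model_rmse, reference_rmse):
    """1 = perfect, 0 = no better than the reference, < 0 = worse."""
    return 1.0 - model_rmse / reference_rmse

# Hypothetical hourly air temperatures (°C) at one grid point.
temps = [10.0, 11.0, 13.0, 12.0, 14.0]
lead = 1                                          # forecast one hour ahead
observed = temps[lead:]                           # [11, 13, 12, 14]
persistence = persistence_forecast(temps, lead)   # [10, 11, 13, 12]
model = [11.5, 12.5, 12.5, 13.5]                  # hypothetical model output

ref_err = rmse(persistence, observed)    # sqrt(2.5), about 1.58
model_err = rmse(model, observed)        # 0.5
print(round(skill_score(model_err, ref_err), 3))  # 0.684
```

A positive skill score means the model beats persistence; for longer lead times the same functions apply with a larger `lead`.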

Data Transfer Milestone: 100th Terabyte

Progress in data transfer: an essential aspect of the DeepRain project is the large amount of data used for training and evaluation of machine learning methods. A total of over 430 terabytes of data are currently being transferred from the German Weather Service to Forschungszentrum Jülich to be used on JSC supercomputers for deep learning. Today, the 100th terabyte was successfully transferred and integrated into the storage systems at JSC. This is an important milestone, as enough data are now available to carry out the first meaningful deep learning experiments and analyses.

Data Storage at JSC allocated

Jülich Supercomputing Centre (JSC) has allocated two large data projects with a volume of several hundred terabytes for the DeepRain project. The first 30 TByte of meteorological model data have been successfully transferred from the German Weather Service to JSC and a prototype workflow for processing of these data has been established.

European Geosciences Union Conference

The DeepRain presentation by Martin Schultz at the European Geosciences Union conference in Vienna was well received. Machine learning is currently attracting a lot of attention in weather and climate research. Fruitful discussions followed the talk, which may lead to future collaborations.

Second Project Meeting

The DeepRain team has just completed its second project meeting, and the project partners are returning to their home institutions. The meeting was organized at the Institute for Cognitive Sciences at the University of Osnabrück and included a brief tutorial on deep learning, the area of expertise of Prof. Gordon Pipa’s research group. Prof. Pipa’s team presented their plans for neural network architectures that will be used to learn rainfall patterns from the ensemble model runs of the German Weather Service. The DeepRain partners continued their discussions about terabyte-scale data management and debated validation methods, error characteristics, and how these may affect the performance of the neural networks. Careful analysis of errors and understanding the merits and limitations of deep learning in the context of weather data are key objectives of the DeepRain project.

DeepRain Cooperation Agreement has been finalized

The DeepRain cooperation agreement has been finalized and signed by all project partners. This constitutes the formal basis for a fruitful collaboration among Forschungszentrum Jülich as coordinator, the German Weather Service (DWD), the Universities of Osnabrück and Bonn, and Jacobs University in Bremen. “The DeepRain project adopts the principles of Open Science and Open Data. Therefore, the collaboration agreement imposes as few constraints as possible, but some rules are necessary,” says Dr. Martin Schultz, who coordinates the project.

Rasdaman Training Workshop

Back from the Rasdaman training workshop on the campus of Jacobs University in Bremen, the DeepRain coordinator Dr. Martin Schultz sounds rather satisfied: “This workshop was very helpful to all ten participants. Not only did we learn a lot about the amazing technology behind Rasdaman and the thorough design concept, which not only follows but actually sets standards for geographic data processing, but it was also good to find a bit of time to work on samples of the actual data that will be used during the project. It was great to learn that we can obtain even more data from DWD than we had initially thought. At JSC we now have to put our heads together to organize the transfer and management of half a petabyte of weather model data. DeepRain clearly is one of the most exciting projects I have worked on in my career.”
