Oral Abstract

Oral Contribution (O9.3) Gijs Molenaar (pythonic.nl)

Theme: Data processing pipelines

Easy to deploy and easy to modify data reduction pipelines using KERN and CWL

Radio telescopes are producing more data than ever before. The data rates of future telescopes, like SKA, will be of such a great scale making it infeasible to keep the end-user astronomer involved in the imaging and calibration step.

However, by making the pipelines open access, modular, easy to deploy and easy to modify, we believe that involving the end-user astronomer in the calibration decision process might not be that far-fetched. Creating technical and logistical pipeline infrastructure is pivotal, and will eventually result in better data products and better science.

Our solution consists of multiple layers. The base layer is the packaging of all the relevant software, and regularly releases a bundle of these packages as software distributions. This created a solid basis for reproducible science. Secondly, to address the portability and deployment issue, we adopt container technology with singularity currently as the primary choice. Lastly, to recombine these packages into data reduction pipelines, we embrace the Common Workflow Language (CWL). CWL is an open and free standard software package which is implemented by numerous pipeline running frameworks.

Decomposing the data reduction problem in this matter enables recombining and modifying pipelines in a high-level abstract way, while also enabling implicit parallelisation due to the functional (programming) nature of the standard. Furthermore, it represents all available software as with uniform interfaces resulting in simplifying storage, representation and visualisation of both parameters and (intermediate) results.

At ASTRON and SKA South Africa, successful experiments have been completed based on the described ensemble during recent years. These are slowly being transferred into production. Still, there is much more work to be done. The software stack needs improvements to make it future ready, and the packaging requires continuous effort in need of financing.

This talk is intended towards astronomers and sysadmins who struggle to maintain- or intend to set up a multi-user multi-machine cluster for medium to large scale data reduction.