Poster Abstract

P10.18 Aard Keimpema (Joint Institute for VLBI ERIC)

Theme: Data processing pipelines

Efficient remote interactive pipelines using CASA and Jupyter

The size of astronomical datasets has increased dramatically over the years; terabyte-sized datasets
are no longer an exception. This trend will only accelerate: the SKA is expected to produce nearly 1 PB
of archivable data per day. Because of the data volumes involved, data reduction will need to be done
close to where the data is archived, i.e. remotely in a science data centre. While most of this processing
can be performed with very little user interaction using (semi-)automated pipelines, there are always projects
that require a large amount of user interaction, either because they are innovative applications not
handled by existing pipelines or because of the need to recover every bit of potential sensitivity.

In this talk I will present the Jupyter kernel we created for CASA, which allows CASA tasks to be
executed from within a Jupyter notebook. Jupyter notebooks are an ideal platform for users to execute
and modify pipelines remotely. The notebook format has the great advantage that all steps of the data
reduction are preserved inside the notebook. This makes the whole data reduction process self-documenting
and fully repeatable. It also allows users to easily make changes to their pipeline and then rerun only
the affected pipeline steps.

The Jupyter kernel is distributed together with a custom build of CASA as Docker and Singularity images.
An online demonstration of the software is running at http://jupyter.jive.nl.
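To illustrate, a containerised deployment of this kind is typically started along the following lines. This is a hedged sketch only: the image name, tag, and mount paths below are hypothetical placeholders, not the actual published image.

```shell
# Pull the container image (image name and tag are hypothetical placeholders).
docker pull example/casa-jupyter:latest

# Start the notebook server, exposing Jupyter's default port 8888 and
# mounting a local data directory so notebooks can reach the measurement sets.
docker run -p 8888:8888 -v /path/to/data:/data example/casa-jupyter:latest
```

The Singularity image would be launched analogously with `singularity run`, which is the usual choice on shared HPC systems where Docker is unavailable.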