Oral Abstract

Oral Contribution (O1.3) Sara Nieto (European Space Astronomy Centre (ESAC))

Theme: Data discovery across heterogeneous datasets

A Science discovery portal for EUCLID data: The EUCLID scientific archive system

Euclid is an ESA mission and a milestone in the exploration of the dark Universe. Euclid will map the 3D distribution of up to two billion galaxies and the dark matter associated with them. It will thus measure the large-scale structure of the Universe across 10 billion light years, revealing the history of its expansion and the growth of structures during the last three-quarters of its history. In total, Euclid will produce up to 26 PB of observations per year.

The Euclid Archive System (EAS) is a joint development between ESA and the Euclid Consortium and is led by the Science Data Centres of the Netherlands and the ESDC (ESAC Science Data Centre). The EAS is composed of three subsystems: the Data Processing System (DPS), the Distributed Storage System (DSS) and the Science Archive System (SAS). The SAS is being built at the ESDC and is intended to provide access to the most valuable scientific data, currently estimated at 10 PB of images, catalogues and spectra after six years of mission.

The heterogeneous nature of Euclid data, which combines Euclid observations with ground-based images obtained from several telescopes, together with the fact that Euclid will generate pixel data as well as catalogues and spectra, makes the SAS a key driver for scientific discoveries in cosmology. Accordingly, the tools provided as part of the SAS, such as the image visualizer, catalogue query service, overlay and cutout, require big-data technologies to enable the analysis of large data sets in ways that would not be possible by downloading the data. The main technologies explored so far are JupyterLab, Greenplum, Postgres-XL and Apache Spark.
We will describe how Euclid, in the context of the ESDC and in collaboration with the Gaia archive, is approaching this challenge in order to meet the scientific goals of the community and enable data discovery across different data sets.