Theme: Data science challenges: tools from statistics to machine learning
Six-Dimensional Streaming Algorithm for Cluster Finding in N-Body Simulations
Cosmological N-body simulations are crucial for understanding how the Universe evolves. Studying the large-scale distribution of matter in these simulations and comparing it to observations usually involves detecting dense clusters of particles called "halos," which are gravitationally bound and expected to form galaxies. However, traditional cluster finders are computationally expensive and require large amounts of memory. Previous work by Liu et al. established the connection between cluster detection and memory-efficient streaming algorithms by implementing heavy hitter algorithms as halo finders. In subsequent work (Ivkin et al.), they developed a more robust streaming halo finder that uses state-of-the-art GPU computing techniques to improve performance and scalability. The main drawback of this method is that it maps particles onto a discrete spatial grid, discarding all other information, such as the particles’ velocities. Because of the complicated motion and substructure of halos, multiple halos frequently occupy the same grid cell. Furthermore, the velocity distribution of such nearby halos can reveal mergers or close passages of galaxies that are worth studying. In this project we analyze data from the Millennium Simulation Project to motivate the inclusion of velocity in our streaming algorithm. We demonstrate a method of doing so that finds the same halos as before while also detecting these interesting velocity distributions.
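To illustrate why binning on a spatial grid alone can merge distinct halos, here is a minimal Python sketch. It uses the Misra-Gries heavy-hitter summary as a stand-in streaming algorithm (not necessarily the algorithm used by Liu et al. or Ivkin et al.), and the cell widths `dx` and `dv`, the helper names, and the toy particle data are all hypothetical. Two clumps that share a spatial cell but move in opposite directions collapse into one heavy cell on a 3D grid, while binning in all six phase-space coordinates keeps them apart.

```python
def misra_gries(stream, k):
    """Misra-Gries summary with at most k - 1 counters. Each estimate is at
    most the true count and at least (true count - n/k) for a stream of length n."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters, dropping any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def cell_3d(p, dx):
    """Spatial grid cell of a particle p = (x, y, z, vx, vy, vz)."""
    return tuple(int(c // dx) for c in p[:3])


def cell_6d(p, dx, dv):
    """Phase-space cell: the spatial cell plus a velocity cell of width dv."""
    return cell_3d(p, dx) + tuple(int(v // dv) for v in p[3:])


def heavy_cells(particles, key, k, min_count):
    """Stream each particle's cell through Misra-Gries and keep the cells whose
    estimated count is at least min_count (hypothetical reporting rule)."""
    summary = misra_gries((key(p) for p in particles), k)
    return {cell: c for cell, c in summary.items() if c >= min_count}


# Hypothetical toy data: two 50-particle clumps at the same position moving in
# opposite x-directions, plus 100 isolated background particles.
halo_a = [(5.5 + 0.001 * i, 5.5, 5.5, 100.0, 0.0, 0.0) for i in range(50)]
halo_b = [(5.5, 5.5 + 0.001 * i, 5.5, -100.0, 0.0, 0.0) for i in range(50)]
background = [(20.5 + i, 0.5, 0.5, 0.0, 0.0, 0.0) for i in range(100)]
particles = halo_a + background[:50] + halo_b + background[50:]

found_3d = heavy_cells(particles, lambda p: cell_3d(p, dx=1.0), k=10, min_count=10)
found_6d = heavy_cells(particles, lambda p: cell_6d(p, dx=1.0, dv=50.0), k=10, min_count=10)

print(sorted(found_3d))  # [(5, 5, 5)]
print(sorted(found_6d))  # [(5, 5, 5, -2, 0, 0), (5, 5, 5, 2, 0, 0)]
```

With n = 200 particles and k = 10, each halo cell's estimate is at least its true count minus 20, while every background cell's estimate is at most 1, so the `min_count` filter cleanly separates halos from background in both cases. The memory cost stays bounded by k - 1 counters regardless of how many distinct 6D cells appear, which is what makes adding velocity dimensions compatible with a streaming approach.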