In situ and in-transit analysis of cosmological simulations

Brian Friesen, Ann Almgren, Zarija Lukic, Gunther Weber, Dmitriy Morozov, Vincent Beckner, Marcus Day.
Computational Astrophysics and Cosmology, vol. 3, pages 1-18, 2016.
DOI: 10.1186/s40668-016-0017-2
Modern cosmological simulations have reached the trillion-element scale, rendering data storage and subsequent analysis formidable tasks. To address this circumstance, we present a new MPI-parallel approach for analysis of simulation data while the simulation runs, as an alternative to the traditional workflow consisting of periodically saving large data sets to disk for subsequent ‘offline’ analysis. We demonstrate this approach in the compressible gasdynamics/N-body code Nyx, a hybrid MPI+OpenMP code based on the BoxLib framework, used for large-scale cosmological simulations. We have enabled on-the-fly workflows in two different ways: one is a straightforward approach consisting of all MPI processes periodically halting the main simulation and analyzing each component of data that they own (‘in situ’). The other consists of partitioning processes into disjoint MPI groups, with one performing the simulation and periodically sending data to the other ‘sidecar’ group, which post-processes it while the simulation continues (‘in-transit’). The two groups execute their tasks asynchronously, stopping only to synchronize when a new set of simulation data needs to be analyzed. For both the in situ and in-transit approaches, we experiment with two different analysis suites with distinct performance behavior: one which finds dark matter halos in the simulation using merge trees to calculate the mass contained within iso-density contours, and another which calculates probability distribution functions and power spectra of various fields in the simulation. Both are common analysis tasks for cosmology, and both result in summary statistics significantly smaller than the original data set. We study the behavior of each type of analysis in each workflow in order to determine the optimal configuration for the different data analysis algorithms.