The scatter in the galaxy-halo connection: a machine learning analysis. (arXiv:2202.14006v1 [astro-ph.GA])
<a href="http://arxiv.org/find/astro-ph/1/au:+Stiskalek_R/0/1/0/all/0/1">Richard Stiskalek</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Bartlett_D/0/1/0/all/0/1">Deaglan J. Bartlett</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Desmond_H/0/1/0/all/0/1">Harry Desmond</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Anbajagane_D/0/1/0/all/0/1">Dhayaa Anbajagane</a>

We apply machine learning, a powerful method for uncovering complex
correlations in high-dimensional data, to the galaxy-halo connection of
cosmological hydrodynamical simulations. The mapping between galaxy and halo
variables is stochastic in the absence of perfect information, but conventional
machine learning models are deterministic and hence cannot capture its
intrinsic scatter. To overcome this limitation, we design an ensemble of neural
networks with a Gaussian loss function that predict probability distributions,
allowing us to model statistical uncertainties in the galaxy-halo connection as
well as its best-fit trends. We extract a number of galaxy and halo variables
from the Horizon-AGN and IllustrisTNG100-1 simulations and quantify the extent
to which knowledge of some subset of one enables prediction of the other. This
allows us to identify the key features of the galaxy-halo connection and
investigate the origin of its scatter in various projections. We find that
while halo properties beyond mass account for up to 50 per cent of the scatter
in the halo-to-stellar mass relation, the prediction of stellar half-mass
radius or total gas mass is not substantially improved by adding further halo
properties. We also use these results to investigate semi-analytic models for
galaxy size in the two simulations, finding that assumptions relating galaxy
size to halo size or spin are not successful.
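
As a rough illustration of the approach described above (a minimal sketch, not the authors' code), the snippet below shows a Gaussian negative log-likelihood loss of the kind a network predicting a mean and variance could be trained with, and one common way to merge the Gaussian predictions of several ensemble members into a single mean and variance. The function names `gaussian_nll` and `combine_ensemble`, the log-variance parameterisation, and the equal-weight mixture rule are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def gaussian_nll(y, mu, log_var):
    """Mean negative log-likelihood of targets y under N(mu, exp(log_var)).

    Predicting log-variance (an assumption of this sketch) keeps the
    variance positive without any explicit constraint.
    """
    y, mu, log_var = (np.asarray(a, dtype=float) for a in (y, mu, log_var))
    per_sample = 0.5 * (np.log(2.0 * np.pi) + log_var
                        + (y - mu) ** 2 / np.exp(log_var))
    return float(per_sample.mean())


def combine_ensemble(mus, variances):
    """Merge per-member Gaussian predictions into one mean and variance.

    Treats the ensemble as an equal-weight mixture of Gaussians
    (hypothetical choice): the mixture mean is the average of the member
    means, and the mixture variance adds the spread between members to
    the average predicted variance.
    Inputs have shape (n_members, n_samples).
    """
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = mus.mean(axis=0)
    var = (variances + mus ** 2).mean(axis=0) - mean ** 2
    return mean, var


if __name__ == "__main__":
    # Two hypothetical ensemble members predicting one target each.
    mean, var = combine_ensemble([[1.0], [3.0]], [[1.0], [1.0]])
    print(mean, var)
    print(gaussian_nll([0.0], [0.0], [0.0]))
```

In this sketch the predicted variance plays the role of the intrinsic scatter at fixed input features, while disagreement between ensemble members inflates the combined variance.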
