ERGO-ML — Comparing IllustrisTNG and HSC galaxy images via contrastive learning. (arXiv:2310.19904v1 [astro-ph.GA])
Lukas Eisert, Connor Bottrell, Annalisa Pillepich, Rhythm Shimakawa, Vicente Rodriguez-Gomez, Dylan Nelson, Eirini Angeloudi, Marc Huertas-Company

Modern cosmological hydrodynamical galaxy simulations provide tens of
thousands of reasonably realistic synthetic galaxies across cosmic time.
However, quantitatively assessing the level of realism of simulated universes
in comparison to the real one is difficult. In this paper of the ERGO-ML series
(Extracting Reality from Galaxy Observables with Machine Learning), we utilize
contrastive learning to directly compare a large sample of simulated and
observed galaxies based on their stellar-light images. This eliminates the need
to specify summary statistics and allows us to exploit the full information
content of the observations. We produce survey-realistic galaxy mock datasets
resembling real Hyper Suprime-Cam (HSC) observations using the cosmological
simulations TNG50 and TNG100. Our focus is on galaxies with stellar masses
between $10^9$ and $10^{12} M_\odot$ at $z=0.1-0.4$. This allows us to evaluate
the realism of the simulated TNG galaxies in comparison to actual HSC
observations. We apply the self-supervised contrastive learning method NNCLR to
the images from both simulated and observed datasets (g, r, and i bands). This
results in a 256-dimensional representation space, encoding all relevant
observable galaxy properties. Firstly, this allows us to identify simulated
galaxies that closely resemble real ones by seeking similar images in this
multi-dimensional space. More powerfully, we quantify the alignment between
the representations of these two image sets, finding that the majority
($\gtrsim 70$ per cent) of the TNG galaxies align well with observed HSC
images. However, a subset of simulated galaxies with larger sizes, steeper
Sersic profiles, smaller Sersic ellipticities, and larger asymmetries appears
unrealistic. We also demonstrate the utility of our derived image
representations by inferring properties of real HSC galaxies using simulated
TNG galaxies as the ground truth.
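
The idea of finding simulated galaxies that resemble an observed one by searching for neighbours in the representation space can be illustrated with a minimal sketch. This is not the authors' pipeline: the random vectors stand in for learned 256-dimensional representations, cosine similarity is an assumed choice of metric, and the helper names (`normalize`, `cosine_similarity`, `nearest_simulated`) are hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(unit_a, unit_b):
    """Dot product of two unit-length vectors."""
    return sum(x * y for x, y in zip(unit_a, unit_b))


def nearest_simulated(observed_vec, simulated_vecs, k=3):
    """Return indices of the k simulated representations most similar to the observed one."""
    query = normalize(observed_vec)
    scored = [
        (cosine_similarity(query, normalize(sim)), idx)
        for idx, sim in enumerate(simulated_vecs)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy data (assumption): random vectors in place of real image representations.
rng = random.Random(0)
DIM = 256
simulated = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(50)]

# A fake "observed" galaxy: simulated galaxy 7 plus a small perturbation.
observed = [x + 0.01 * rng.gauss(0.0, 1.0) for x in simulated[7]]

matches = nearest_simulated(observed, simulated, k=3)
print(matches[0])  # index of the closest simulated counterpart
```

In practice the representations would come from the trained encoder rather than a random generator, and a dedicated nearest-neighbour library would be preferable for samples of tens of thousands of galaxies.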
