Estimating Galaxy Parameters with Self-Organizing Maps and the Effect of Missing Data
Valentina La Torre (Tufts University), Anna Sajina (Tufts University), Andy D. Goulding (Princeton University), Danilo Marchesini (Tufts University), Rachel Bezanson (University of Pittsburgh), Alan N. Pearl (University of Pittsburgh), Laerte Sodré Jr (Universidade de São Paulo)
arXiv:2403.18888v1 Announce Type: new
Abstract: Current and upcoming large-volume galaxy surveys require machine learning techniques to maximize their scientific return. This study explores the use of Self-Organizing Maps (SOMs) to estimate galaxy parameters, with a focus on handling cases of missing data and providing realistic probability distribution functions for the parameters. We train a SOM with a simulated mass-limited lightcone assuming a ugrizYJHKs+IRAC dataset, mimicking the Hyper Suprime-Cam (HSC) Deep joint dataset. For parameter estimation, we derive SOM likelihood surfaces that account for photometric errors, yielding total (statistical and systematic) uncertainties. We explore the effects of missing data, including which bands are particularly critical to the accuracy of the derived parameters. We demonstrate that parameter recovery is significantly better when the missing bands are “filled-in” than when they are completely omitted. We propose a practical method for such recovery of missing data.
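To make the idea of a SOM likelihood surface with missing bands concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a chi-squared likelihood over the observed bands only, a likelihood-weighted fill-in for the missing band, and a likelihood-weighted mean for the parameter estimate. The function names, the toy prototype array, and the per-cell mass values are all hypothetical.

```python
import numpy as np

def som_likelihood(prototypes, flux, flux_err):
    """Normalized likelihood of each SOM cell for one observed galaxy.

    prototypes: (n_cells, n_bands) cell weight vectors of a trained SOM (toy values here)
    flux, flux_err: (n_bands,) observed photometry; NaN marks a missing band
    Assumption: Gaussian photometric errors, chi-squared over observed bands only.
    """
    observed = np.isfinite(flux) & np.isfinite(flux_err)
    resid = (prototypes[:, observed] - flux[observed]) / flux_err[observed]
    chi2 = np.sum(resid**2, axis=1)
    # Subtract the minimum chi2 before exponentiating for numerical stability.
    like = np.exp(-0.5 * (chi2 - chi2.min()))
    return like / like.sum()

def fill_missing(prototypes, flux, like):
    """Replace NaN bands with the likelihood-weighted prototype value (assumed scheme)."""
    filled = flux.copy()
    missing = ~np.isfinite(flux)
    filled[missing] = like @ prototypes[:, missing]
    return filled

def estimate_parameter(cell_values, like):
    """Likelihood-weighted mean and standard deviation of a per-cell parameter."""
    mean = like @ cell_values
    std = np.sqrt(like @ (cell_values - mean) ** 2)
    return mean, std

# Toy example: 3 SOM cells, 3 bands, third band missing for the observed galaxy.
prototypes = np.array([[1.0, 2.0, 3.0],
                       [2.0, 4.0, 6.0],
                       [5.0, 5.0, 5.0]])
cell_mass = np.array([9.0, 10.0, 11.0])  # hypothetical log stellar mass per cell
flux = np.array([2.0, 4.0, np.nan])
flux_err = np.array([0.1, 0.1, 0.1])

like = som_likelihood(prototypes, flux, flux_err)
filled = fill_missing(prototypes, flux, like)
mass_mean, mass_std = estimate_parameter(cell_mass, like)
print(like, filled, mass_mean, mass_std)
```

In this toy case the two observed bands match the second cell exactly, so nearly all of the likelihood falls on that cell and the missing band is filled with a value close to that cell's prototype. A real application would use the full SOM grid and the survey's actual photometric errors.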