Using Multivariate Imputation by Chained Equations to Predict Redshifts of Active Galactic Nuclei. (arXiv:2203.00087v1 [astro-ph.IM])
<a href="http://arxiv.org/find/astro-ph/1/au:+Gibson_S/0/1/0/all/0/1">Spencer James Gibson</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Narendra_A/0/1/0/all/0/1">Aditya Narendra</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Dainotti_M/0/1/0/all/0/1">Maria Giovanna Dainotti</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Bogdan_M/0/1/0/all/0/1">Malgorzata Bogdan</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Pollo_A/0/1/0/all/0/1">Agniezska Pollo</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Poliszczuk_A/0/1/0/all/0/1">Artem Poliszczuk</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Rinaldi_E/0/1/0/all/0/1">Enrico Rinaldi</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Liodakis_I/0/1/0/all/0/1">Ioannis Liodakis</a>

Redshift measurement of active galactic nuclei (AGNs) remains a
time-consuming and challenging task, as it requires follow-up spectroscopic
observations and detailed analysis. Hence, there is an urgent need
for alternative redshift estimation techniques. The use of machine learning
(ML) for this purpose has been growing over the last few years, primarily due
to the availability of large-scale galactic surveys. However, because of
observational errors, these data sets often have missing entries for a
significant fraction of sources, rendering that fraction unusable for ML
regression applications. In this study, we demonstrate the performance of an
imputation technique called Multivariate Imputation by Chained Equations
(MICE), which addresses the issue of missing data entries by imputing them from
the available information in the catalog. We use the Fermi-LAT Fourth Data Release
Catalog (4LAC) and impute 24% of the catalog. Subsequently, we follow the
methodology described in Dainotti et al. (2021) and create an ML model for
estimating the redshift of 4LAC AGNs. We present results that highlight the
positive impact of the MICE imputation technique on the performance of the ML
models and on the obtained redshift estimation accuracy.
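The chained-equations idea behind MICE can be illustrated with a minimal sketch, assuming a purely numeric feature table in which `NaN` marks a missing entry. This is not the authors' implementation: the abstract does not say which per-variable models, number of cycles, or number of imputations were used, so the linear-regression conditional models, the function `mice_impute`, and its parameters are hypothetical choices for illustration. Each incomplete column is regressed in turn on all other columns, its missing entries are redrawn from that fit plus residual noise, and the cycle repeats; doing this several times with different random draws yields multiple completed data sets. The data below are synthetic, not 4LAC.

```python
import numpy as np


def mice_impute(X, n_imputations=5, n_cycles=10, seed=0):
    """Return `n_imputations` completed copies of X (NaN marks missing).

    Hypothetical, simplified MICE: each incomplete column is modelled in turn
    by ordinary least squares on all other columns, and its missing entries are
    replaced by the prediction plus Gaussian noise with the residual standard
    deviation. (Full MICE would also draw the regression coefficients.)
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    rng = np.random.default_rng(seed)
    completed = []
    for _ in range(n_imputations):
        Z = X.copy()
        # Initial fill: column means of the observed values.
        col_means = np.nanmean(X, axis=0)
        Z[missing] = np.take(col_means, np.where(missing)[1])
        for _ in range(n_cycles):
            for j in range(X.shape[1]):
                miss_j = missing[:, j]
                if not miss_j.any():
                    continue
                obs_j = ~miss_j
                # Predictors: all other columns (with current fills) plus an intercept.
                A = np.column_stack([np.ones(len(Z)), np.delete(Z, j, axis=1)])
                coef, *_ = np.linalg.lstsq(A[obs_j], Z[obs_j, j], rcond=None)
                resid = Z[obs_j, j] - A[obs_j] @ coef
                n_params = A.shape[1]
                sigma = resid.std(ddof=n_params) if obs_j.sum() > n_params else 0.0
                # Stochastic draw keeps the spread of imputed values realistic.
                Z[miss_j, j] = A[miss_j] @ coef + rng.normal(0.0, sigma, miss_j.sum())
        completed.append(Z)
    return completed


# Usage on synthetic data (NOT 4LAC): three correlated features, with about
# 24% of the entries in the last two columns removed at random.
rng = np.random.default_rng(1)
n = 200
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + rng.normal(scale=0.1, size=n)
x2 = -x0 + rng.normal(scale=0.1, size=n)
X_true = np.column_stack([x0, x1, x2])
X_obs = X_true.copy()
mask = rng.random((n, 2)) < 0.24
X_obs[:, 1:][mask] = np.nan

imputations = mice_impute(X_obs, n_imputations=5)
# Quick check only: average the completed copies and compare with the truth.
# A full MICE analysis would instead fit the downstream model on each copy
# and pool the results.
X_mean = np.mean(imputations, axis=0)
holes = np.isnan(X_obs)
rmse = np.sqrt(np.mean((X_mean[holes] - X_true[holes]) ** 2))
print(f"RMSE on imputed entries: {rmse:.3f}")
```

Swapping the least-squares step for another regressor changes only the conditional model, not the chained structure. As a library alternative, scikit-learn's `IterativeImputer` is inspired by MICE; running it with `sample_posterior=True` and several `random_state` values gives multiple imputations in a similar spirit.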