Outlier Prediction and Training Set Modification to Reduce Catastrophic Outlier Redshift Estimates in Large-Scale Surveys. (arXiv:1911.04572v4 [astro-ph.GA] UPDATED)
<a href="http://arxiv.org/find/astro-ph/1/au:+Wyatt_M/0/1/0/all/0/1">M. Wyatt</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Singal_J/0/1/0/all/0/1">J. Singal</a>

We present results of using individual galaxies’ probability distributions
over redshift to identify potential catastrophic outliers in empirical
photometric redshift estimation. In the course of developing this approach, we
devise a method for modifying the redshift distribution of training sets that
improves both the baseline accuracy of high-redshift (z>1.5) estimation and
catastrophic outlier mitigation. We demonstrate both
using two real test data sets and one simulated test data set spanning a wide
redshift range (0<z<4). The results presented here inform an example 'prescription'
that can be applied as a realistic photometric redshift estimation scenario for
a hypothetical large-scale survey. We find that with appropriate optimization,
we can identify a significant percentage (>30%) of catastrophic outlier
galaxies while simultaneously incorrectly flagging only a small percentage (<7%
and in many cases <3%) of non-outlier galaxies as catastrophic outliers. We
also find that our modification of the training set redshift distribution
yields a significant (>10) percentage point decrease in outlier galaxies for
z>1.5, with only a small (<3) percentage point increase in outlier galaxies for z<1.5
compared to the unmodified training set. In addition, we find that this
modification can in some cases cause a significant (~20) percentage point
decrease in non-outlier galaxies that are incorrectly identified as outliers,
while in other cases it causes only a small (<1) percentage increase in this
metric.
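
The two flagging metrics quoted above (the fraction of catastrophic outliers
that are identified, and the fraction of non-outliers that are incorrectly
flagged) can be illustrated with a minimal sketch. Everything named here is an
assumption made for illustration: the abstract does not state the outlier
criterion, so the sketch uses the commonly adopted |z_phot - z_spec| / (1 + z_spec) > 0.15,
and it takes the per-galaxy flags as given rather than deriving them from a
redshift probability distribution. The toy numbers are invented and are not
results from the paper.

```python
def is_catastrophic(z_phot, z_spec, threshold=0.15):
    """Assumed criterion (not from the abstract):
    |z_phot - z_spec| / (1 + z_spec) > threshold."""
    return abs(z_phot - z_spec) / (1.0 + z_spec) > threshold


def flagging_rates(z_phot, z_spec, flagged, threshold=0.15):
    """Return (fraction of true outliers that were flagged,
               fraction of non-outliers that were flagged)."""
    outlier = [is_catastrophic(p, s, threshold) for p, s in zip(z_phot, z_spec)]
    n_out = sum(outlier)
    n_non = len(outlier) - n_out
    caught = sum(1 for o, f in zip(outlier, flagged) if o and f)
    false_flags = sum(1 for o, f in zip(outlier, flagged) if f and not o)
    frac_caught = caught / n_out if n_out else float("nan")
    frac_false = false_flags / n_non if n_non else float("nan")
    return frac_caught, frac_false


# Hypothetical toy sample: five galaxies with known (spectroscopic) redshifts,
# photometric estimates, and flags from some unspecified outlier predictor.
z_spec = [0.5, 1.0, 2.0, 3.0, 0.2]
z_phot = [0.52, 1.05, 0.4, 2.9, 1.0]
flagged = [False, True, True, False, False]

frac_caught, frac_false = flagging_rates(z_phot, z_spec, flagged)
```

In this toy sample two galaxies meet the assumed outlier criterion, one of
which is flagged, and one of the three non-outliers is flagged in error. A
useful predictor keeps the first fraction high while keeping the second low.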
