Noisy Labels for Weakly Supervised Gamma Hadron Classification. (arXiv:2108.13396v1 [cs.LG])
<a href="http://arxiv.org/find/cs/1/au:+Pfahler_L/0/1/0/all/0/1">Lukas Pfahler</a>, <a href="http://arxiv.org/find/cs/1/au:+Bunse_M/0/1/0/all/0/1">Mirko Bunse</a>, <a href="http://arxiv.org/find/cs/1/au:+Morik_K/0/1/0/all/0/1">Katharina Morik</a>

Gamma hadron classification, a central machine learning task in gamma-ray
astronomy, is conventionally tackled with supervised learning. However, the
supervised approach requires annotated training data to be produced in
sophisticated and costly simulations. We propose to instead solve gamma hadron
classification with a noisy label approach that only uses unlabeled data
recorded by the real telescope. To this end, we employ the significance of
detection as a learning criterion that addresses this form of weak
supervision. We show that models based on the significance of
detection deliver state-of-the-art results, despite being trained exclusively
with noisy labels; put differently, our models do not require the costly
simulated ground-truth labels that astronomers otherwise employ for classifier
training. Our weakly supervised models also exhibit competitive performance on
imbalanced data sets that stem from a variety of other application domains. In
contrast to existing work on class-conditional label noise, we assume that only
one of the class-wise noise rates is known.
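
The abstract names the significance of detection as the learning criterion without defining it. In gamma-ray astronomy this quantity is commonly computed with the formula of Li and Ma (1983, Eq. 17) from event counts in an "on" region around the source and one or more "off" regions that contain background only. The sketch below assumes that formula and shows how it can score a classifier's gamma-ness cut using only counts from real observations, with on-region events acting as noisily labeled gamma candidates. The function names, the `alpha` exposure-ratio convention, the signed output and the threshold scan are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence


def li_ma_significance(n_on: float, n_off: float, alpha: float) -> float:
    """Signed significance of detection, assuming Li & Ma (1983), Eq. 17.

    n_on:  events counted in the on region (gammas + hadronic background)
    n_off: events counted in the off region(s) (assumed background only)
    alpha: on-exposure / off-exposure ratio, e.g. 0.2 for five off regions
    """
    total = n_on + n_off
    if total == 0:
        return 0.0

    # x * ln(y) with the convention 0 * ln(0) = 0 for empty regions
    def xlogy(x: float, y: float) -> float:
        return x * math.log(y) if x > 0 else 0.0

    term_on = xlogy(n_on, (1 + alpha) / alpha * n_on / total)
    term_off = xlogy(n_off, (1 + alpha) * n_off / total)
    # Clip tiny negative values caused by floating-point rounding
    s = math.sqrt(max(0.0, 2.0 * (term_on + term_off)))
    # Assumed convention: report a deficit (negative excess) with a minus sign
    excess = n_on - alpha * n_off
    return s if excess >= 0 else -s


def significance_at_threshold(scores_on: Sequence[float], scores_off: Sequence[float],
                              alpha: float, threshold: float) -> float:
    """Significance of the events a classifier keeps as gamma-like (score >= threshold)."""
    n_on = sum(1 for s in scores_on if s >= threshold)
    n_off = sum(1 for s in scores_off if s >= threshold)
    return li_ma_significance(n_on, n_off, alpha)


def best_threshold(scores_on: Sequence[float], scores_off: Sequence[float],
                   alpha: float) -> float:
    """Pick the score cut maximizing significance; no ground-truth labels are used."""
    candidates = sorted(set(scores_on) | set(scores_off))
    return max(candidates,
               key=lambda t: significance_at_threshold(scores_on, scores_off, alpha, t))


if __name__ == "__main__":
    print(round(li_ma_significance(100, 100, 0.2), 2))  # 10.84
```

Because the criterion depends only on on/off counts, a model or a decision threshold can be compared without simulated labels; a learning method would need a smooth surrogate of these counts, which this sketch does not attempt.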
