Pulsar Candidate Identification Using Semi-Supervised Generative Adversarial Networks. (arXiv:2010.07457v2 [astro-ph.IM] UPDATED)
<a href="http://arxiv.org/find/astro-ph/1/au:+Balakrishnan_V/0/1/0/all/0/1">Vishnu Balakrishnan</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Champion_D/0/1/0/all/0/1">David Champion</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Barr_E/0/1/0/all/0/1">Ewan Barr</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Kramer_M/0/1/0/all/0/1">Michael Kramer</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Sengar_R/0/1/0/all/0/1">Rahul Sengar</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Bailes_M/0/1/0/all/0/1">Matthew Bailes</a>

Machine learning methods are increasingly helping astronomers identify new
radio pulsars. However, they require a large amount of labelled data, which is
time-consuming to produce and can be biased. Here we describe a Semi-Supervised
Generative Adversarial Network (SGAN) which achieves better classification
performance than standard supervised algorithms while training on datasets that
are mostly unlabelled. Trained on only 100 labelled candidates and 5000
unlabelled candidates, it achieved an accuracy and mean F-score of 94.9%,
compared to our standard supervised baseline, which scored 81.1% and 82.7%
respectively. Our final model, trained on a much larger labelled dataset,
achieved an accuracy and mean F-score of 99.2% and a recall rate of 99.7%. This
technique allows for high-quality classification during the early stages of
pulsar surveys on new instruments, when limited labelled data is available. We
open-source our work
along with a new pulsar-candidate dataset produced from the High Time
Resolution Universe – South Low Latitude Survey. This dataset has the largest
number of pulsar detections of any public dataset and we hope it will be a
valuable tool for benchmarking future machine learning models.
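
As a rough illustration of the metrics quoted above, the sketch below computes accuracy, recall on the pulsar class, and a mean F-score from binary labels. It assumes that "mean F-score" means the unweighted average of the per-class F-scores (pulsar and non-pulsar), which the abstract does not spell out. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the authors' released code.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, pulsar-class recall and mean F-score for binary labels.

    Labels are assumed to be 1 = pulsar, 0 = non-pulsar. The mean F-score
    is taken here as the unweighted average of the two per-class F-scores
    (an assumption about the abstract's definition).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pulsars found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-pulsars rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed pulsars

    def f_score(hits, false_pos, false_neg):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is empty.
        denom = 2 * hits + false_pos + false_neg
        return 2 * hits / denom if denom else 0.0

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_pulsar = f_score(tp, fp, fn)
    # For the non-pulsar class the roles of FP and FN are swapped.
    f_non_pulsar = f_score(tn, fn, fp)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "mean_f_score": (f_pulsar + f_non_pulsar) / 2,
    }


# Toy example with made-up labels, purely for illustration.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

Because pulsar candidates are heavily outnumbered by non-pulsar candidates in real surveys, accuracy alone can look high even when many pulsars are missed, which is why recall and a per-class F-score are worth reporting alongside it.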
