Classification of Multiwavelength Transients with Machine Learning. (arXiv:1811.08446v2 [astro-ph.IM] UPDATED)
<a href="http://arxiv.org/find/astro-ph/1/au:+Sooknunan_K/0/1/0/all/0/1">K. Sooknunan</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Lochner_M/0/1/0/all/0/1">M. Lochner</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Bassett_B/0/1/0/all/0/1">Bruce A. Bassett</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Peiris_H/0/1/0/all/0/1">H. V. Peiris</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Fender_R/0/1/0/all/0/1">R. Fender</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Stewart_A/0/1/0/all/0/1">A. J. Stewart</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Pietka_M/0/1/0/all/0/1">M. Pietka</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Woudt_P/0/1/0/all/0/1">P. A. Woudt</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+McEwen_J/0/1/0/all/0/1">J. D. McEwen</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Lahav_O/0/1/0/all/0/1">O. Lahav</a>

With the advent of powerful telescopes such as the Square Kilometre Array and
the Vera C. Rubin Observatory, we are entering an era of multiwavelength
transient astronomy that will lead to a dramatic increase in data volume.
Machine learning techniques are well suited to address this data challenge and
rapidly classify newly detected transients. We present a multiwavelength
classification algorithm consisting of three steps: (1) interpolation and
augmentation of the data using Gaussian processes; (2) feature extraction using
wavelets; and (3) classification with random forests. Augmentation provides
improved performance at test time by balancing the classes and adding diversity
into the training set. In the first application of machine learning to the
classification of real radio transient data, we apply our technique to the
Green Bank Interferometer and other radio light curves. We find we are able to
accurately classify most of the 11 classes of radio variables and transients
after just eight hours of observations, achieving an overall test accuracy of
78 percent. We fully investigate the impact of the small sample size of 82
publicly available light curves and use data augmentation techniques to
mitigate the effect. We also show that, on a significantly larger simulated
representative training set, the algorithm achieves an overall accuracy of
97 percent, illustrating that the method is likely to provide excellent
performance on future surveys. Finally, we demonstrate the effectiveness of
simultaneous multiwavelength observations by showing how incorporating just one
optical data point into the analysis improves the accuracy of the
worst-performing class by 19 percent.
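
The first two steps of the pipeline described above can be illustrated with a
minimal sketch. It is not the authors' implementation: the squared-exponential
kernel, the fixed hyperparameters (length, amp, noise), the Haar wavelet, and
all function names are assumptions chosen so the example runs with the Python
standard library alone. Only the Gaussian process posterior mean is computed
(no augmentation draws), and the random forest step is left out; the resulting
feature vectors would be the input to such a classifier.

```python
import math

def rbf(a, b, length, amp):
    """Squared-exponential kernel between two time stamps (assumed kernel)."""
    return amp * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_interpolate(t, y, grid, length=1.0, amp=1.0, noise=1e-2):
    """GP posterior mean on `grid`, given irregularly sampled (t, y)."""
    mean = sum(y) / len(y)
    # Kernel matrix of the observations, with noise variance on the diagonal.
    K = [[rbf(a, b, length, amp) + (noise if i == j else 0.0)
          for j, b in enumerate(t)] for i, a in enumerate(t)]
    alpha = solve(K, [v - mean for v in y])
    return [mean + sum(rbf(g, a, length, amp) * w for a, w in zip(t, alpha))
            for g in grid]

def haar_features(signal):
    """Full orthonormal Haar decomposition; len(signal) must be a power of two.

    Returns the coarsest approximation followed by detail coefficients,
    ordered from coarse to fine.
    """
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details

# Toy light curve: irregular sampling of a smooth signal (made-up data).
t = [0.0, 0.7, 1.9, 3.1, 4.2, 5.5, 6.8]
y = [math.sin(v) for v in t]
grid = [6.8 * i / 15 for i in range(16)]   # 16 evenly spaced times

smooth = gp_interpolate(t, y, grid)        # step 1: interpolation
features = haar_features(smooth)           # step 2: wavelet features
```

Resampling onto a regular grid first is what makes the wavelet transform
applicable to unevenly sampled light curves; every object then yields a
feature vector of the same length, which a classifier can consume directly.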
