Photometric Classifications of Evolved Massive Stars: Preparing for the Era of Webb and Roman with Machine Learning. (arXiv:2102.02829v2 [astro-ph.SR] UPDATED)
<a href="http://arxiv.org/find/astro-ph/1/au:+Dorn_Wallenstein_T/0/1/0/all/0/1">Trevor Z. Dorn-Wallenstein</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Davenport_J/0/1/0/all/0/1">James R.A. Davenport</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Huppenkothen_D/0/1/0/all/0/1">Daniela Huppenkothen</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Levesque_E/0/1/0/all/0/1">Emily M. Levesque</a>

In the coming years, next-generation space-based infrared observatories will
significantly increase our samples of rare massive stars, representing a
tremendous opportunity to leverage modern statistical tools and methods to test
massive stellar evolution in entirely new environments. Such work is only
possible if the observed objects can be reliably classified. Spectroscopic
observations are infeasible with more distant targets, and so we wish to
determine whether machine learning methods can classify massive stars using
broadband infrared photometry. We find that a Support Vector Machine classifier
is capable of coarsely classifying massive stars with labels corresponding to
hot, cool, and emission line stars with high accuracy, while rejecting
contaminating low mass giants. Remarkably, 76% of emission line stars can be
recovered without the need for narrowband or spectroscopic observations. We
classify a sample of ${\sim}2500$ objects with no existing labels, and identify
fourteen candidate emission line objects. Unfortunately, despite the high
precision of the photometry in our sample, the heterogeneous origins of the
labels for the stars in our sample severely inhibit our classifier from
distinguishing classes of stars with more granularity. Ultimately, no large and
homogeneously labeled sample of massive stars currently exists. Without
significant efforts to robustly classify evolved massive stars — which is
feasible given existing data from large all-sky spectroscopic surveys —
shortcomings in the labeling of existing data sets will hinder efforts to
leverage the next generation of space observatories.
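
The "76% of emission line stars can be recovered" figure reads as a per-class recall: the fraction of true members of a class that the classifier assigns to that class. A minimal sketch of that calculation follows. The function name `per_class_recall` and the toy labels are assumptions made up for illustration; they are not the authors' code or data, and the sketch does not reproduce the Support Vector Machine itself.

```python
from collections import Counter


def per_class_recall(true_labels, predicted_labels):
    """Return {class: fraction of that class's true members predicted correctly}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Number of objects that truly belong to each class.
    totals = Counter(true_labels)
    # Number of objects per class whose predicted label matches the true label.
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Toy labels invented for illustration only (hypothetical, not the paper's sample).
true_labels = ["hot", "hot", "cool", "cool",
               "emission", "emission", "emission", "emission",
               "contaminant"]
predicted_labels = ["hot", "hot", "cool", "hot",
                    "emission", "emission", "emission", "cool",
                    "contaminant"]

recall = per_class_recall(true_labels, predicted_labels)
for cls, value in sorted(recall.items()):
    print(f"{cls}: {value:.2f}")
```

In this toy case three of the four emission line stars are recovered, so the recall for that class is 0.75. Reporting recall per class, rather than a single overall accuracy, keeps a rare class from being hidden by the more numerous ones.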
