Effectively using unsupervised machine learning in next generation astronomical surveys. (arXiv:1911.06823v1 [astro-ph.IM])
<a href="http://arxiv.org/find/astro-ph/1/au:+Reis_I/0/1/0/all/0/1">Itamar Reis</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Rotman_M/0/1/0/all/0/1">Michael Rotman</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Poznanski_D/0/1/0/all/0/1">Dovi Poznanski</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Prochaska_J/0/1/0/all/0/1">J. Xavier Prochaska</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Wolf_L/0/1/0/all/0/1">Lior Wolf</a>

In recent years, many works have shown that unsupervised Machine Learning (ML)
can help detect unusual objects and uncover trends in large astronomical
datasets, but a few challenges remain. We show here, for example, that
different methods, or even small variations of the same method, can produce
significantly different outcomes. While intuitively somewhat surprising, this
can naturally occur when applying unsupervised ML to high-dimensional data,
where there can be many reasonable yet different answers to the same question.
In such a case the outcome of any single unsupervised ML method should be
considered a sample from a conceivably wide range of possibilities. We
therefore suggest an approach that eschews finding an optimal outcome, instead
facilitating the production and examination of many valid ones. This can be
achieved by incorporating unsupervised ML into data visualisation portals. We
present here such a portal that we are developing, applied to the sample of
SDSS spectra of galaxies. The main feature of the portal is interactive 2D maps
of the data. Different maps are constructed by applying dimensionality
reduction to different subspaces of the data, so that each map contains
different information that in turn gives a different perspective on the data.
The interactive maps are intuitive to use, and we demonstrate how peculiar
objects and trends can be detected by means of a few button clicks. We believe
that including tools in this spirit in next-generation astronomical surveys
will be important for making unexpected discoveries, either by professional
astronomers or by citizen scientists, and will generally enable the benefits of
visual inspection even when dealing with very complex and extensive datasets.
Our portal is available online at galaxyportal.space.
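
The idea of building different maps from different subspaces of the data can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which dimensionality reduction method the portal uses, so plain PCA stands in for it; the random array stands in for SDSS spectra; and the names `pca_map`, `build_maps`, "blue" and "red" are hypothetical.

```python
import numpy as np


def pca_map(features, n_components=2):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def build_maps(data, subspaces):
    """Return one 2D map per named subspace (a slice of feature columns)."""
    return {name: pca_map(data[:, cols]) for name, cols in subspaces.items()}


rng = np.random.default_rng(0)
# Synthetic stand-in for spectra: 200 objects x 60 flux bins (assumption).
spectra = rng.normal(size=(200, 60))
# Hypothetical subspaces: two halves of the wavelength range.
subspaces = {"blue": slice(0, 30), "red": slice(30, 60)}

maps = build_maps(spectra, subspaces)
for name, coords in maps.items():
    print(name, coords.shape)
```

Each entry of `maps` holds 2D coordinates for the same objects, computed from a different set of features, so the same object can sit among different neighbours in each map. That is the sense in which each map offers a different perspective on the data.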
