The Classification of Optical Galaxy Morphology Using Unsupervised Learning Techniques. (arXiv:2206.06165v1 [cs.LG])
<a href="http://arxiv.org/find/cs/1/au:+Fielding_E/0/1/0/all/0/1">Ezra Fielding</a>, <a href="http://arxiv.org/find/cs/1/au:+Nyirenda_C/0/1/0/all/0/1">Clement N. Nyirenda</a>, <a href="http://arxiv.org/find/cs/1/au:+Vaccari_M/0/1/0/all/0/1">Mattia Vaccari</a>

The advent of large-scale, data-intensive astronomical surveys has caused the
viability of human-based galaxy morphology classification methods to come into
question. Put simply, too much astronomical data is being produced for
scientists to visually label. Attempts have been made to crowd-source this work
by recruiting volunteers from the general public. However, even these efforts
will soon fail to keep up with data produced by modern surveys. Unsupervised
learning techniques do not require existing labels to classify data and could
pave the way to unplanned discoveries. Therefore, this paper aims to implement
unsupervised learning algorithms to classify the Galaxy Zoo DECaLS dataset
without human supervision. First, a convolutional autoencoder was implemented
as a feature extractor. The extracted features were then clustered via k-means,
fuzzy c-means and agglomerative clustering to provide classifications. The
results were compared to the volunteer classifications of the Galaxy Zoo DECaLS
dataset. Agglomerative clustering generally produced the best results; however,
the performance gain over k-means clustering was not significant. With the
appropriate optimizations, this approach could be used to provide
classifications for the better-performing Galaxy Zoo DECaLS decision tree
questions. Ultimately, this unsupervised learning approach provided valuable
insights and results that were useful to scientists.
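
The clustering stage described above can be illustrated with a minimal
sketch of fuzzy c-means, one of the three algorithms named in the abstract.
This is not the authors' implementation. The function name `fuzzy_c_means`,
the fuzzifier `m=2.0`, the iteration limits, and the synthetic 8-dimensional
array `latent` (a stand-in for features produced by a convolutional
autoencoder) are all assumptions made for illustration only.

```python
import numpy as np


def fuzzy_c_means(features, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Hypothetical helper: soft-cluster the rows of `features`.

    Returns (centers, memberships); memberships[i, j] is the degree to
    which sample i belongs to cluster j, and each row sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_samples = features.shape[0]

    # Random initial memberships, normalised so each row sums to 1.
    u = rng.random((n_samples, n_clusters))
    u /= u.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the samples.
        w = u ** m
        centers = (w.T @ features) / w.sum(axis=0)[:, None]

        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero

        # Standard membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break

    return centers, u


# Assumption: two well-separated Gaussian blobs stand in for autoencoder
# latent vectors; real features would come from the trained encoder.
rng = np.random.default_rng(42)
latent = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 8)),
    rng.normal(5.0, 0.5, size=(50, 8)),
])

centers, memberships = fuzzy_c_means(latent, n_clusters=2)

# A hard classification can be taken as the highest-membership cluster.
hard_labels = memberships.argmax(axis=1)
```

Unlike k-means, the soft memberships give each galaxy a degree of belonging
to every cluster, which loosely resembles the vote fractions found in
volunteer classifications. That resemblance is an observation about the
technique, not a claim made in the abstract.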
