Margin-free classification and new class detection using finite Dirichlet mixtures. (arXiv:2103.14138v1 [stat.AP])
<a href="http://arxiv.org/find/stat/1/au:+John_P/0/1/0/all/0/1">Prince John</a>, <a href="http://arxiv.org/find/stat/1/au:+Brazzale_A/0/1/0/all/0/1">Alessandra R. Brazzale</a>, <a href="http://arxiv.org/find/stat/1/au:+Suveges_M/0/1/0/all/0/1">Maria Süveges</a>

We present a margin-free finite mixture model which allows us to
simultaneously classify objects into known classes and to identify possible new
object types using a set of continuous attributes. This application is
motivated by the need to classify a particular kind of star, known as variable
stars, and possibly to detect new types among them. We first suitably transform
the physical attributes of the stars onto the simplex to achieve scale
invariance while maintaining their dependence structure. This allows us to
compare data collected by different sky surveys, which may use different
scales. The model then combines a mixture of Dirichlet mixtures, representing
the known classes, with the semi-supervised classification strategy of Vatanen
et al. (2012) for outlier detection. In line with previous work on
semiparametric model-based clustering, the individual Dirichlet distributions
can be seen as providing the baseline patterns of the data. These are then
combined to model the complex distributions of the attributes within each
class. The model is estimated using a hierarchical two-step
procedure which combines a suitably adapted version of the
Expectation-Maximization (EM) algorithm with Bayes’ rule. We validate our model
on a reliable sample of periodic variable stars available in the literature
(Dubath et al., 2011), achieving an overall classification accuracy of 71.95%,
and a sensitivity of 86.11% and a specificity of 99.79% for new class detection.
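A minimal Python sketch of the classification step outlined above, assuming the per-class Dirichlet mixtures have already been fitted (the adapted EM step is omitted). The closure transform `to_simplex`, the fixed log-density threshold used to flag a possible new class, and all parameter values are hypothetical illustrations rather than the authors' procedure. In particular, the threshold rule is a simple stand-in for, not an implementation of, the Vatanen et al. (2012) strategy. Known classes are scored with Bayes' rule.

```python
import math

def to_simplex(x):
    """Hypothetical closure transform: divide strictly positive attributes
    by their sum, so rescaling all attributes leaves the result unchanged."""
    if any(v <= 0 for v in x):
        raise ValueError("attributes must be strictly positive")
    total = sum(x)
    return [v / total for v in x]

def dirichlet_logpdf(p, alpha):
    """Log-density of a Dirichlet(alpha) distribution at simplex point p."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, p))

def mixture_logpdf(p, weights, alphas):
    """Log-density of a finite Dirichlet mixture, computed with log-sum-exp."""
    terms = [math.log(w) + dirichlet_logpdf(p, a) for w, a in zip(weights, alphas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def classify(x, priors, class_mixtures, log_density_threshold):
    """Return the most probable known class, or "new" if no class mixture
    assigns the point a log-density above the (hypothetical) threshold."""
    p = to_simplex(x)
    log_dens = {c: mixture_logpdf(p, *class_mixtures[c]) for c in class_mixtures}
    if max(log_dens.values()) < log_density_threshold:
        return "new"
    # Bayes' rule: posterior is proportional to prior times class density.
    log_post = {c: math.log(priors[c]) + ld for c, ld in log_dens.items()}
    return max(log_post, key=log_post.get)

# Hypothetical fitted model: class "A" is a two-component mixture,
# class "B" a single Dirichlet; each entry is (weights, alphas).
EXAMPLE_PRIORS = {"A": 0.5, "B": 0.5}
EXAMPLE_MIXTURES = {
    "A": ([0.5, 0.5], [(10, 2, 2), (6, 2, 3)]),
    "B": ([1.0], [(2, 2, 10)]),
}

if __name__ == "__main__":
    for x in ([8, 1, 1], [80, 10, 10], [1, 1, 8], [1, 8, 1]):
        print(x, classify(x, EXAMPLE_PRIORS, EXAMPLE_MIXTURES, -5.0))
```

With these made-up parameters, `[8, 1, 1]` and its rescaled copy `[80, 10, 10]` both fall in class A, `[1, 1, 8]` in class B, and `[1, 8, 1]`, which neither class explains well, is flagged as new. Working with log-densities and log-sum-exp avoids numerical underflow when Dirichlet densities become extreme near the boundary of the simplex.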
