Multiple Stellar Populations in NGC 2808: a Case Study for Cluster Analysis. (arXiv:1906.04983v1 [astro-ph.SR])
<a href="http://arxiv.org/find/astro-ph/1/au:+Pasquato_M/0/1/0/all/0/1">Mario Pasquato</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Milone_A/0/1/0/all/0/1">Antonino Milone</a>

In the massive globular cluster NGC 2808, RGB stars form at least five
distinct groups in the so-called chromosome map photometric plane, arguably
corresponding to different stellar populations. While a human expert can
separate the groups by eye relatively easily, algorithmic approaches are
desirable for reproducibility and for handling a larger sample of globular
clusters. Unfortunately, cluster analysis algorithms often produce
unsatisfactory results. Here we apply a range of non-parametric clustering
algorithms to the NGC 2808 RGB dataset: partitioning (k-means, Partitioning
Around Medoids – PAM), hierarchical (AGglomerative NESting – AGNES, DIvisive
ANAlysis – DIANA), and density-based (Density-Based Spatial Clustering of
Applications with Noise – DBSCAN, Ordering Points To Identify the Clustering
Structure – OPTICS). For each algorithm we discuss different choices of the
relevant hyperparameters and their impact on the resulting clustering. We find
that AGNES produces the results most similar to the expectations of a
human expert, although the outcome depends on the linkage, i.e. the
prescription used for joining adjacent groups. Among the linkage prescriptions we tested, Ward’s method performs
best, and average linkage obtains comparable results only if outliers are
removed beforehand. We recommend using AGNES with Ward’s method or similar
linkages in future studies to automatically identify stellar populations in the
chromosome map plane.
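
As a rough illustration of the recommended approach, the sketch below implements agglomerative clustering with Ward's criterion in plain Python: at each step it merges the two groups whose union gives the smallest increase in the within-group sum of squares. It is not the authors' code. The function names (`ward_cost`, `merge`, `ward_agglomerative`), the number of groups `k`, and the synthetic two-dimensional points are all assumptions made for illustration; the points are random blobs, not the NGC 2808 chromosome map data. The naive pair search is slow for large samples, so a library implementation would be preferable in practice.

```python
import random


def ward_cost(a, b):
    """Increase in within-group sum of squares if groups a and b are merged."""
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return a["n"] * b["n"] / (a["n"] + b["n"]) * d2


def merge(a, b):
    """Return the union of groups a and b with its size-weighted centroid."""
    n = a["n"] + b["n"]
    centroid = tuple(
        (a["n"] * x + b["n"] * y) / n
        for x, y in zip(a["centroid"], b["centroid"])
    )
    return {"n": n, "centroid": centroid, "members": a["members"] + b["members"]}


def ward_agglomerative(points, k):
    """Merge groups bottom-up with Ward's criterion until k groups remain."""
    # Every point starts as its own group (the agglomerative starting state).
    groups = [
        {"n": 1, "centroid": tuple(p), "members": [i]}
        for i, p in enumerate(points)
    ]
    while len(groups) > k:
        best = None
        # Naive search over all pairs for the cheapest merge.
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                cost = ward_cost(groups[i], groups[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = merge(groups[i], groups[j])
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)]
        groups.append(merged)
    # Convert the group membership lists into one label per input point.
    labels = [0] * len(points)
    for label, g in enumerate(groups):
        for member in g["members"]:
            labels[member] = label
    return labels


# Synthetic stand-in data (an assumption, NOT the NGC 2808 photometry):
# three well-separated Gaussian blobs of 20 points each.
rng = random.Random(0)
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
    for cx, cy in centres
    for _ in range(20)
]
labels = ward_agglomerative(points, k=3)
```

Here `k` is fixed by hand. Swapping `ward_cost` for a different merge criterion is how other linkage prescriptions, such as average linkage, would enter this sketch.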
