Robust statistical tools to identify multiple stellar populations in globular clusters in the presence of measurement errors. A case study: NGC 2808. (arXiv:2110.08269v1 [astro-ph.GA])
G. Valle, M. Dell'Omodarme, E. Tognelli

The presence of multiple stellar populations (MP), defined by patterns in the
stellar element abundances, is nowadays considered a distinctive feature of
globular clusters. However, while data availability and quality have improved
in recent decades, this is not always true of the techniques adopted for their
analysis, raising problems of objectivity and reproducibility of the claims.
Using NGC 2808 as a test case, we show the use of well-established statistical
clustering methods. We focus the analysis on the red giant branch (RGB) phase,
where two data sets are available from recent literature for low- and
high-resolution spectroscopy. We adopt both hierarchical clustering and
partition methods. We explicitly address the usually neglected problem of
measurement errors. The results of the clustering algorithms were subjected to
silhouette width analysis to compare the performance of the split into
different numbers of MP. For both data sets the results are at odds with those
reported in the literature. Two MP are detected for both data sets, while the
literature reports five and four MP from high- and low-resolution spectroscopy,
respectively. The silhouette analysis suggests that the population
sub-structure is reliable for the high-resolution spectroscopy data, while the
actual existence of MP is questionable for the low-resolution spectroscopy
data. The discrepancy with literature claims can be explained by the difference
in the methods adopted for MP characterisation. By means of Monte Carlo
simulations and multimodality statistical tests we show that the often adopted
study of the histogram of the differences in some key elements is prone to
multiple false positive findings. The adoption of statistically grounded
methods, which use all the available information to subset the data and
explicitly address the problem of data uncertainty, is of paramount importance
for presenting more robust and reproducible research.
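
As an illustration of the silhouette width comparison mentioned above, the
following is a minimal sketch in Python and not the authors' code. It computes
the standard silhouette width, s(i) = (b - a) / max(a, b), for a fixed
labelling and compares a split into two groups with a split into four. The
toy coordinates, the label vectors, the function names and the Gaussian
perturbation used to mimic measurement errors are all assumptions made for
the example; none of them comes from the NGC 2808 data sets or from the
abstract.

```python
import math
import random
import statistics


def silhouette_widths(points, labels):
    """Return the silhouette width s(i) of every point for a given labelling."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def mean_silhouette(points, labels):
    """Average silhouette width, used here to rank competing splits."""
    return statistics.mean(silhouette_widths(points, labels))


def perturbed_mean_silhouettes(points, labels, sigma, n_draws, seed=0):
    """Hypothetical helper: redraw each coordinate from a Gaussian of width
    sigma (an assumed measurement error) and recompute the mean silhouette."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_draws):
        noisy = [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]
        results.append(mean_silhouette(noisy, labels))
    return results


# Toy data (hypothetical): two compact groups in a two-abundance plane.
toy_points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
]
two_labels = [0, 0, 0, 0, 1, 1, 1, 1]   # split into two groups
four_labels = [0, 0, 1, 1, 2, 2, 3, 3]  # finer split into four groups

mean_two = mean_silhouette(toy_points, two_labels)
mean_four = mean_silhouette(toy_points, four_labels)
draws = perturbed_mean_silhouettes(toy_points, two_labels, sigma=0.3, n_draws=200)

print("mean silhouette, 2 groups:", round(mean_two, 3))
print("mean silhouette, 4 groups:", round(mean_four, 3))
print("median over perturbed draws:", round(statistics.median(draws), 3))
```

On this toy data the split into two groups scores a higher average silhouette
width than the split into four, which is the kind of comparison used to choose
among different numbers of MP. The perturbed draws give a rough idea of how
much an assumed measurement error erodes the separation for a fixed labelling.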
