Mass and Age determination of the LAMOST data with different Machine Learning methods. (arXiv:2205.06144v1 [astro-ph.GA])
<a href="http://arxiv.org/find/astro-ph/1/au:+Li_Q/0/1/0/all/0/1">Qi-Da Li</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Wang_H/0/1/0/all/0/1">Hai-Feng Wang</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Luo_Y/0/1/0/all/0/1">Yang-Ping Luo</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Li_Q/0/1/0/all/0/1">Qing Li</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Deng_L/0/1/0/all/0/1">Li-Cai Deng</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Ting_Y/0/1/0/all/0/1">Yuan-Sen Ting</a>

We present a catalog of 948,216 stars with mass labels and a catalog of
163,105 red clump (RC) stars with both mass and age labels. The training
dataset is built by cross-matching LAMOST DR5 with high-resolution
asteroseismology data, and mass and age are predicted with the random forest
method or the convex hull algorithm. We extract the stellar parameters that
correlate strongly with mass and age. On the test dataset, the median relative
error of the predicted mass for the large sample is 0.03, while the median
relative errors of the mass and age of the red clump stars are 0.04 and 0.07,
respectively. We also compare the predicted ages of the red clump stars with
recent works and find that the final uncertainty of the RC sample could reach
18% for age and 9% for mass. The final precision of the mass for the large
sample, which contains different types of stars, could reach 13% without
considering systematics. These results imply that this method could be widely
used in the future. Moreover, we explore the performance of different machine
learning methods on our sample, including Bayesian linear regression (BYS),
gradient boosting decision tree (GBDT), multilayer perceptron (MLP), multiple
linear regression (MLR), random forest (RF), and support vector regression
(SVR). We find that the nonlinear models generally perform better than the
linear models, with GBDT and RF performing comparatively better.
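
The abstract quotes its test-set accuracy as a median relative error. As a minimal sketch of how such a figure could be computed, the snippet below assumes the metric is the median of |predicted - reference| / |reference| over paired values; the abstract does not spell out the exact definition, so this is an assumption. The function name `median_relative_error` and the sample masses are hypothetical and are not taken from the catalog.

```python
from statistics import median


def median_relative_error(predicted, reference):
    """Median of |predicted - reference| / |reference| over paired values.

    Assumed definition for illustration; reference values must be nonzero
    (stellar masses and ages are positive, so this holds in practice).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one pair of values is required")
    return median(abs(p - r) / abs(r) for p, r in zip(predicted, reference))


# Made-up masses in solar units, for illustration only (not catalog values).
reference_mass = [1.00, 1.20, 0.90, 1.50, 2.00]
predicted_mass = [1.03, 1.17, 0.90, 1.56, 1.90]

print(round(median_relative_error(predicted_mass, reference_mass), 3))
```

The median is less sensitive to a few badly predicted stars than the mean, which may be one reason to prefer it when summarizing errors over a large, heterogeneous sample.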

