A Morphological Model to Separate Resolved–unresolved Sources in the DESI Legacy Surveys: Application in the LS4 Alert Stream
Chang Liu, Adam A. Miller, Joshua S. Bloom, Robert A. Knop, Peter E. Nugent
arXiv:2505.17174v1 Announce Type: new
Abstract: Separating resolved and unresolved sources in large imaging surveys is a fundamental step to enable downstream science, such as searching for extragalactic transients in wide-field time-domain surveys. Here we present our method to effectively separate point sources from the resolved, extended sources in the Dark Energy Spectroscopic Instrument (DESI) Legacy Surveys (LS). We develop a supervised machine-learning model based on the Gradient Boosting algorithm \texttt{XGBoost}. The features input to the model are purely morphological and are derived from the tabulated LS data products. We train the model using $\sim 2\times10^5$ LS sources in the COSMOS field with HST morphological labels and evaluate the model performance on LS sources with spectroscopic classification from the DESI Data Release 1 ($\sim 2\times10^7$ objects) and the Sloan Digital Sky Survey Data Release 17 ($\sim 3\times10^6$ objects), as well as on $\sim 2\times10^8$ Gaia stars. A significant fraction of LS sources are not observed in every LS filter, and we therefore build a “Hybrid” model as a linear combination of two \texttt{XGBoost} models, each containing features combining aperture flux measurements from the “blue” ($gr$) and “red” ($iz$) filters. The Hybrid model shows a reasonable balance between sensitivity and robustness, and achieves higher accuracy and flexibility compared to the LS morphological typing. With the Hybrid model, we provide classification scores for $\sim 3\times10^9$ LS sources, making this the largest ever machine-learning catalog separating resolved and unresolved sources. The catalog has been incorporated into the real-time pipeline of the La Silla Schmidt Southern Survey (LS4), enabling the identification of extragalactic transients within the LS4 alert stream.
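The abstract describes the Hybrid model as a linear combination of two classifier scores, one from the “blue” ($gr$) features and one from the “red” ($iz$) features, but it does not give the weights or say how a source missing one filter set is handled. The sketch below is only an illustration of that idea: the function name `hybrid_score`, the weight `w_blue`, and the fallback to whichever score is available are all assumptions, not the authors' implementation.

```python
from typing import Optional


def hybrid_score(
    blue: Optional[float],
    red: Optional[float],
    w_blue: float = 0.5,  # illustrative weight; not a value from the paper
) -> Optional[float]:
    """Linearly combine two classifier scores, each assumed to lie in [0, 1].

    `blue` and `red` stand in for the outputs of two hypothetical models built
    on gr and iz features. None marks a source without the needed coverage.
    """
    if not 0.0 <= w_blue <= 1.0:
        raise ValueError("w_blue must lie in [0, 1]")
    if blue is None and red is None:
        return None  # no usable score for this source
    if blue is None:
        return red  # assumed fallback: use the only available score
    if red is None:
        return blue  # assumed fallback: use the only available score
    # Convex combination keeps the result inside [0, 1].
    return w_blue * blue + (1.0 - w_blue) * red
```

Constraining the weight to $[0, 1]$ keeps the combined value on the same scale as the two inputs, so a single threshold can be applied to it regardless of which filters a source was observed in.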
2025-05-26