Defects and Inconsistencies in Solar Flare Data Sources: Implications for Machine Learning Forecasting
Ke Hu, Kevin Jin, Victor Verma, Weihao Liu, Ward Manchester IV, Lulu Zhao, Tamas Gombosi, Yang Chen
arXiv:2512.13417v2 Announce Type: replace
Abstract: Machine learning models for forecasting solar flares have been trained and evaluated using a variety of data sources, including Space Weather Prediction Center (SWPC) operational and science-quality data. Typically, data from these sources is minimally processed before being used to train and validate a forecasting model. However, predictive performance can be affected if defects and inconsistencies between these data sources are ignored. For a set of commonly used data sources, along with the software that queries and outputs processed data, we identify their defects and inconsistencies, quantify their extent, and show how they can affect predictions from data-driven machine-learning forecasting models. We also outline procedures for fixing these issues or at least mitigating their impacts. Finally, based on thorough comparisons of the effects of data sources on the trained forecasting model’s predictive skill scores, we offer recommendations for using different data products in operational forecasting.
2026-02-02