Dimensionality Reduction Techniques for Statistical Inference in Cosmology
Minsu Park, Marco Gatti, Bhuvnesh Jain
arXiv:2409.02102v3 Announce Type: replace
Abstract: We explore linear and non-linear dimensionality reduction techniques for statistical inference of parameters in cosmology. Given the importance of compressing the increasingly complex data vectors used in cosmology, we address questions that impact the constraining power achieved, such as: Are currently used methods effectively lossless? Under what conditions do nonlinear methods, typically based on neural nets, outperform linear methods? Through theoretical analysis and experiments with simulated weak lensing data vectors, we compare three standard linear methods and neural-network-based methods. We propose two linear methods that outperform all others while using fewer computational resources: a variation of the MOPED algorithm we call e-MOPED and an adaptation of Canonical Correlation Analysis (CCA), which is a method new to cosmology but well known in statistics. Both e-MOPED and CCA utilize simulations spanning the full parameter space, and rely on the sensitivity of the data vector to the parameters of interest. The gains we obtain are significant compared to compression methods used in the literature: up to 30% in the Figure of Merit for $\Omega_m$ and $S_8$ in a realistic Simulation Based Inference analysis that includes statistical and systematic errors. We also recommend two modifications that improve the performance of all methods. First, include components in the compressed data vector that may not target the key parameters but still enhance the constraints on them due to their correlations. The gain is significant, above 20% in the Figure of Merit. Second, compress Gaussian and non-Gaussian statistics separately; we include two summary statistics of each type in our analysis.
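To make the idea of a simulation-driven linear compression concrete, here is a minimal sketch of textbook CCA applied to pairs of simulated data vectors and parameters. It is not the adaptation proposed in the paper, whose details the abstract does not give. The function name `cca_projection`, the toy linear model, and all sizes and noise levels are assumptions chosen only for illustration.

```python
import numpy as np

def cca_projection(data, params):
    """Textbook CCA: find directions in data space that are maximally
    correlated with the parameters (hypothetical helper, not the paper's code).

    data:   (n_sims, n_data) simulated data vectors
    params: (n_sims, n_params) parameters used for each simulation
    Returns the (n_data, n_params) projection and squared canonical correlations.
    """
    d = data - data.mean(axis=0)
    p = params - params.mean(axis=0)
    n = d.shape[0]
    c_dd = d.T @ d / (n - 1)   # data covariance
    c_pp = p.T @ p / (n - 1)   # parameter covariance
    c_dp = d.T @ p / (n - 1)   # data-parameter cross-covariance

    # Whiten the data with a Cholesky factor so the problem becomes a
    # symmetric eigenproblem: L^-1 C_dp C_pp^-1 C_pd L^-T u = rho^2 u.
    chol = np.linalg.cholesky(c_dd)
    w = np.linalg.solve(chol, c_dp)
    sym = w @ np.linalg.solve(c_pp, w.T)
    eigvals, eigvecs = np.linalg.eigh(sym)

    # Keep one component per parameter, ordered by decreasing correlation.
    top = np.argsort(eigvals)[::-1][: p.shape[1]]
    proj = np.linalg.solve(chol.T, eigvecs[:, top])  # undo the whitening
    return proj, eigvals[top]

# Toy example (assumed linear model, not weak lensing data):
# 2 parameters drawn across a prior range, 20-dimensional noisy data vectors.
rng = np.random.default_rng(0)
params = rng.uniform(size=(2000, 2))
mixing = rng.normal(size=(2, 20))
data = params @ mixing + 0.1 * rng.normal(size=(2000, 20))

proj, rho2 = cca_projection(data, params)
compressed = (data - data.mean(axis=0)) @ proj  # shape (2000, 2)
```

In this sketch each 20-dimensional data vector is reduced to two numbers, one per parameter. Because the projection is estimated from simulations drawn across the whole parameter range, it does not depend on derivatives evaluated at a single fiducial point.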
2025-02-05