Predicting conditional probability distributions of redshifts of Active Galactic Nuclei using Hierarchical Correlation Reconstruction. (arXiv:2206.06194v1 [cs.LG])
<a href="http://arxiv.org/find/cs/1/au:+Duda_J/0/1/0/all/0/1">Jarek Duda</a>

While prediction usually focuses on point values, real data often only allows
predicting conditional probability distributions, with predictive capability
bounded by the conditional entropy $H(Y|X)$. If uncertainty is estimated as
well, the predicted value can be treated as the center of a Gaussian or
Laplace distribution, an idealization that can be far from the complex
conditional distributions of real data. This article applies the Hierarchical
Correlation Reconstruction (HCR) approach to inexpensively predict quite
complex (e.g. multimodal) conditional probability distributions by independent
MSE estimation of multiple moment-like parameters, from which the conditional
distribution can be reconstructed. Using linear regression for this purpose
yields interpretable models, with coefficients describing the contributions of
features to conditional moments. This article extends the original approach,
especially by using Canonical Correlation Analysis (CCA) for feature
optimization and l1 “lasso” regularization, focusing on the practical problem
of predicting the redshift of Active Galactic Nuclei (AGN) based on the Fourth
Fermi-LAT Data Release 2 (4LAC) dataset.
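
The idea of estimating moment-like parameters by independent least-squares
fits and then reconstructing a density can be illustrated with a minimal
sketch. Everything below is an assumption made for illustration and is not
taken from the abstract: the target is normalized to [0, 1] by its empirical
ranks, the moment-like parameters are expectations of orthonormal shifted
Legendre polynomials, the function names (legendre_basis, fit_hcr,
predict_density) are hypothetical, and the data are synthetic. CCA feature
optimization and lasso regularization are omitted.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_basis(u, degree):
    """Orthonormal shifted Legendre polynomials f_0..f_degree on [0, 1]."""
    u = np.asarray(u, dtype=float)
    cols = []
    for j in range(degree + 1):
        coef = np.zeros(j + 1)
        coef[j] = 1.0  # select the j-th Legendre polynomial
        cols.append(np.sqrt(2 * j + 1) * legendre.legval(2 * u - 1, coef))
    return np.stack(cols, axis=-1)


def fit_hcr(X, u, degree):
    """Least-squares fit of each f_j(u), j = 1..degree, on the features.

    Each column of the returned matrix is an independent linear model
    (intercept first) for one moment-like parameter a_j(x) ~ E[f_j(u) | x].
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    targets = legendre_basis(u, degree)[:, 1:]
    beta, *_ = np.linalg.lstsq(X1, targets, rcond=None)
    return beta  # shape: (n_features + 1, degree)


def predict_density(beta, x, grid, floor=1e-3):
    """Reconstruct rho(u | x) = 1 + sum_j a_j(x) f_j(u) on a uniform grid.

    The polynomial can dip below zero, so it is clipped at `floor` and
    renormalized; the grid is assumed uniform on [0, 1], so the mean of
    the values approximates the integral.
    """
    a = np.concatenate([[1.0], np.r_[1.0, x] @ beta])  # a_0 = 1
    rho = legendre_basis(grid, len(a) - 1) @ a
    rho = np.maximum(rho, floor)
    return rho / rho.mean()


# Synthetic data (assumption): the target is bimodal given the feature.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, size=(n, 1))
y = X[:, 0] * rng.choice([-1.0, 1.0], size=n) + 0.1 * rng.normal(size=n)

# Normalize the target to [0, 1] using empirical ranks.
u = (np.argsort(np.argsort(y)) + 0.5) / n

beta = fit_hcr(X, u, degree=4)
grid = (np.arange(200) + 0.5) / 200
rho = predict_density(beta, np.array([0.8]), grid)
```

In this sketch each column of beta can be read as the contribution of the
features to one moment-like parameter, which is the sense in which such
linear models are interpretable. The density is expressed in the normalized
variable, so mapping it back to the original target would require the
inverse of the empirical distribution used for normalization.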
