Towards Bayesian Data Compression. (arXiv:2010.10375v1 [physics.data-an])
<a href="http://arxiv.org/find/physics/1/au:+Harth_Kitzerow_J/0/1/0/all/0/1">Johannes Harth-Kitzerow</a>, <a href="http://arxiv.org/find/physics/1/au:+Leike_R/0/1/0/all/0/1">Reimar Leike</a>, <a href="http://arxiv.org/find/physics/1/au:+Arras_P/0/1/0/all/0/1">Philipp Arras</a>, <a href="http://arxiv.org/find/physics/1/au:+Ensslin_T/0/1/0/all/0/1">Torsten A. En&#xdf;lin</a>

In order to handle the large data sets omnipresent in modern science,
efficient compression algorithms are necessary. General-purpose lossless and
lossy compression algorithms exist, each suited to different situations. Here,
a Bayesian data compression (BDC) algorithm that adapts to the specific data
set is derived. BDC compresses a data set while conserving its posterior
structure, with minimal information loss given the prior knowledge of the
signal, the quantity of interest. BDC works hand in hand with the signal
reconstruction from the data. Its basic form is valid for Gaussian priors and
likelihoods and generalizes to non-linear settings with the help of Metric
Gaussian Variational Inference. BDC requires the storage of effective
instrument response functions for the compressed data, together with a
corresponding noise description, which encode the posterior covariance
structure. Their memory demand counteracts the compression gain. To mitigate
this, sparsity of the compressed responses can be enforced by separating the
data into patches and compressing them separately. The applicability of the
method is demonstrated on synthetic data and radio-astronomical data. The
algorithm still needs further improvement, however, as the compression
currently takes longer to compute than the inference with the original data.
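The core idea of such posterior-preserving compression can be illustrated in the linear Gaussian case: project the data onto the eigenmodes of the signal-to-noise matrix that carry the most information about the signal, and store an effective response and noise covariance for the compressed data. The sketch below is an illustrative toy with assumed quantities (a random response `R`, unit signal prior `S`, white noise), not the authors' BDC implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear Gaussian model d = R s + n (illustrative assumption, not the
# paper's setup): signal s with prior covariance S, white noise covariance N.
n_s, n_d = 16, 64                      # signal and data dimensions
R = rng.normal(size=(n_d, n_s))        # instrument response
S = np.eye(n_s)                        # signal prior covariance
sigma2 = 0.5
N = sigma2 * np.eye(n_d)               # noise covariance

# With white noise, the eigenvectors of R S R^T / sigma2 with the largest
# eigenvalues span the data subspace that is informative about the signal.
M = (R @ S @ R.T) / sigma2
eigvals, eigvecs = np.linalg.eigh(M)   # ascending eigenvalues
k = n_s                                # keep all informative modes (lossless here)
V = eigvecs[:, -k:]                    # compression matrix: n_d -> k numbers

# Compressed data, effective response, and effective noise covariance.
s_true = rng.normal(size=n_s)
d = R @ s_true + rng.normal(scale=np.sqrt(sigma2), size=n_d)
d_c, R_c, N_c = V.T @ d, V.T @ R, V.T @ N @ V

def wiener_mean(resp, noise, data):
    """Posterior mean for a linear Gaussian model (Wiener filter)."""
    D = np.linalg.inv(np.linalg.inv(S) + resp.T @ np.linalg.inv(noise) @ resp)
    return D @ resp.T @ np.linalg.inv(noise) @ data

m_full = wiener_mean(R, N, d)          # reconstruction from original data
m_comp = wiener_mean(R_c, N_c, d_c)    # reconstruction from compressed data
rel_err = np.linalg.norm(m_full - m_comp) / np.linalg.norm(m_full)
print(n_d, "->", k, "numbers; relative reconstruction error:", rel_err)
```

Shrinking `k` below the signal dimension makes the compression lossy, and the effective response `R_c` and noise `N_c` must be stored alongside the compressed data, which is the memory overhead the abstract refers to.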

