Unifying supervised learning and VAEs — automating statistical inference in high-energy physics. (arXiv:2008.05825v2 [cs.LG] UPDATED)
Thorsten Glüsenkamp (http://arxiv.org/find/cs/1/au:+Glusenkamp_T/0/1/0/all/0/1)

A KL-divergence objective on the joint distribution of data and labels makes it possible to unify supervised learning, variational autoencoders (VAEs), and semi-supervised learning under the single umbrella of variational inference. This viewpoint has several advantages. For VAEs, it clarifies the interpretation of the encoder and decoder parts. For supervised learning, it reiterates that the training procedure approximates the true posterior over the labels and can always be viewed as approximate likelihood-free inference. This point is rarely discussed, even though the derivation is well known in the literature. In the context of semi-supervised learning, it motivates an extended supervised scheme that allows a goodness-of-fit p-value to be calculated with posterior predictive simulations. Flow-based networks with a standard normal base distribution are crucial. We discuss how they allow coverage to be defined rigorously for arbitrary joint posteriors on $\mathbb{R}^n \times \mathcal{S}^m$, which encompasses posteriors over directions. Finally, systematic uncertainties are naturally incorporated in the variational viewpoint. With the three ingredients of (1) systematics, (2) coverage, and (3) goodness-of-fit, flow-based neural networks have the potential to replace a large part of the statistical toolbox of the contemporary high-energy physicist.
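
To make the supervised part of the claim concrete, here is a minimal, hypothetical sketch (not the paper's implementation): a conditional flow consisting of a single affine transform of a standard-normal base, trained on simulated (data, label) pairs. Minimizing -log q(z|x) over samples from the joint p(x, z) is, up to a constant, a Monte Carlo estimate of the expected KL divergence between the true posterior p(z|x) and q(z|x), which is the sense in which supervised training performs approximate likelihood-free posterior inference. The toy simulator, network sizes, and dimensions below are invented for illustration.

```python
import math
import torch
import torch.nn as nn

class AffineConditionalFlow(nn.Module):
    """q(z|x): z = mu(x) + exp(log_sigma(x)) * eps, with eps ~ N(0, I)."""
    def __init__(self, x_dim, z_dim, hidden=64):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim),
        )

    def log_prob(self, z, x):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        eps = (z - mu) * torch.exp(-log_sigma)            # pull the label back to base space
        log_base = -0.5 * (eps ** 2).sum(-1) - 0.5 * self.z_dim * math.log(2 * math.pi)
        log_det = -log_sigma.sum(-1)                       # log |d eps / d z| of the affine map
        return log_base + log_det

def simulate(n, z_dim=2):
    """Toy stand-in for a physics simulation: labels z and smeared observations x."""
    z = torch.randn(n, z_dim)
    x = z + 0.5 * torch.randn(n, z_dim)
    return x, z

flow = AffineConditionalFlow(x_dim=2, z_dim=2)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for _ in range(2000):
    x, z = simulate(256)
    loss = -flow.log_prob(z, x).mean()                     # supervised / forward-KL objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```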

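The coverage statement can be sketched for the Euclidean part alone (the construction on $\mathbb{R}^n \times \mathcal{S}^m$ additionally handles spherical factors). Continuing the hypothetical sketch above and reusing its `flow` and `simulate`: because the base distribution is standard normal, the squared base-space radius of the true label follows a $\chi^2$ distribution with $n$ degrees of freedom for a perfectly calibrated posterior, so central credible regions of any nominal probability are contours of that radius and their empirical coverage can be checked directly.

```python
import torch
from scipy.stats import chi2

def empirical_coverage(flow, simulate, nominal=0.68, n_events=5000):
    """Fraction of simulated events whose true label lies inside the nominal credible region."""
    x, z = simulate(n_events)
    with torch.no_grad():
        mu, log_sigma = flow.net(x).chunk(2, dim=-1)
        eps = (z - mu) * torch.exp(-log_sigma)         # true labels mapped to the base space
    r2 = (eps ** 2).sum(-1)                             # squared base-space radius
    threshold = chi2.ppf(nominal, df=flow.z_dim)        # region boundary from the chi^2 quantile
    return (r2 <= threshold).float().mean().item()

# For a well-calibrated posterior this should be close to the nominal value, e.g. 0.68:
print(empirical_coverage(flow, simulate, nominal=0.68))
```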
