Bayesian cross validation for gravitational-wave searches in pulsar-timing array data. (arXiv:1904.05355v1 [astro-ph.IM])
<a href="http://arxiv.org/find/astro-ph/1/au:+Wang_H/0/1/0/all/0/1">Haochen Wang</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Taylor_S/0/1/0/all/0/1">Stephen R. Taylor</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Vallisneri_M/0/1/0/all/0/1">Michele Vallisneri</a>

Gravitational-wave data analysis demands sophisticated statistical noise
models in a bid to extract highly obscured signals from data. In Bayesian model
comparison, we choose among a landscape of models by comparing their marginal
likelihoods. However, this computation is numerically fraught and can be
sensitive to arbitrary choices in the specification of parameter priors. In
Bayesian cross validation, we characterize the fit and predictive power of a
model by computing the Bayesian posterior of its parameters in a training
dataset, and then use that posterior to compute the averaged likelihood of a
different testing dataset. The resulting cross-validation scores are
straightforward to compute; they are insensitive to prior tuning; and they
penalize unnecessarily complex models that overfit the training data at the
expense of predictive performance. In this article, we discuss cross validation
in the context of pulsar-timing-array data analysis, and we exemplify its
application to simulated pulsar data (where it successfully selects the correct
spectral index of a stochastic gravitational-wave background), and to a pulsar
dataset from the NANOGrav 11-year release (where it convincingly favors a model
that represents a transient feature in the interstellar medium). We argue that
cross validation offers a promising alternative to Bayesian model comparison,
and we discuss its use for gravitational-wave detection, by selecting or
refuting models that include a gravitational-wave component.
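
The procedure described above (fit a posterior on a training dataset, then average the likelihood of a testing dataset over that posterior) can be illustrated with a minimal sketch. This is not the authors' pipeline and involves no pulsar-timing model: it assumes a toy polynomial regression with known white-noise level, a Gaussian prior on the coefficients, and an analytic conjugate posterior standing in for the stochastic sampling a real analysis would need. The function names `posterior_samples` and `cv_score`, the noise and prior widths, and the even/odd train/test split are all hypothetical choices made for illustration.

```python
import numpy as np

def posterior_samples(x, y, degree, noise_sigma, prior_sigma, n_samples, rng):
    """Draw polynomial coefficients from the conjugate Gaussian posterior
    (toy stand-in for MCMC sampling of a real noise/signal model)."""
    A = np.vander(x, degree + 1, increasing=True)
    precision = A.T @ A / noise_sigma**2 + np.eye(degree + 1) / prior_sigma**2
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
    mean = cov @ A.T @ y / noise_sigma**2
    return rng.multivariate_normal(mean, cov, size=n_samples)

def cv_score(x_test, y_test, samples, noise_sigma):
    """Log of the testing-set likelihood averaged over posterior samples."""
    degree = samples.shape[1] - 1
    A = np.vander(x_test, degree + 1, increasing=True)
    resid = y_test[None, :] - samples @ A.T  # shape (n_samples, n_test)
    loglike = (-0.5 * np.sum((resid / noise_sigma) ** 2, axis=1)
               - y_test.size * np.log(noise_sigma * np.sqrt(2.0 * np.pi)))
    # log-mean-exp keeps the average numerically stable
    peak = loglike.max()
    return peak + np.log(np.mean(np.exp(loglike - peak)))

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, x.size)  # simulated data: linear truth
train = np.arange(x.size) % 2 == 0                # assumed split: even = train

scores = {}
for degree in (0, 1, 9):  # too simple, correct, and overly flexible models
    s = posterior_samples(x[train], y[train], degree, 0.3, 10.0, 2000, rng)
    scores[degree] = cv_score(x[~train], y[~train], s, 0.3)

for degree, score in scores.items():
    print(f"degree {degree}: log CV score = {score:.2f}")
```

Comparing the printed scores across the candidate degrees plays the role that comparing marginal likelihoods plays in Bayesian model comparison. In this sketch the prior enters only through the training posterior, which is the sense in which such scores can be less sensitive to prior width than the marginal likelihood.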
