Estimating the galaxy two-point correlation function using a split random catalog. (arXiv:1905.01133v1 [astro-ph.CO])
E. Keihänen, H. Kurki-Suonio, V. Lindholm, A. Viitanen, A.-S. Suur-Uski, V. Allevato, E. Branchini, F. Marulli, P. Norberg, D. Tavagnacco, S. de la Torre, J. Valiviita, M. Viel, J. Bel, M. Frailis, A. G. Sánchez

The two-point correlation function of the galaxy distribution is a key
cosmological observable that allows us to constrain the dynamical and
geometrical state of our Universe. To measure the correlation function we need
to know both the galaxy positions and the expected galaxy density field. The
expected field is commonly specified using a Monte Carlo sampling of the volume
covered by the survey and, to minimize additional sampling errors, this random
catalog has to be much larger than the data catalog. Correlation function
estimators compare data-data pair counts to data-random and random-random pair
counts, where random-random pairs usually dominate the computational cost.
Future redshift surveys will deliver spectroscopic catalogs of tens of millions
of galaxies. Given the large number of random objects required to guarantee
sub-percent accuracy, it is of paramount importance to improve the efficiency
of the algorithm without degrading its precision. We show both analytically and
numerically that, when calculating random-random pairs, splitting the random
catalog into subcatalogs of the same size as the data catalog and excluding
pairs across different subcatalogs yields the smallest error at fixed
computational cost. For a random catalog fifty times larger than the data
catalog, this reduces the computation time by a factor of more than ten without
affecting estimator variance or bias.
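
As an illustration of the split-random-catalog scheme described above, here is a
minimal Python sketch of a Landy-Szalay estimate in which the random-random term
is accumulated only within subcatalogs of the random catalog. The function names
are illustrative, the pair counters are deliberately naive O(N^2) loops, and the
use of the Landy-Szalay form is an assumption for concreteness; a production
analysis would use an optimized pair-counting code.

# Minimal sketch, assuming brute-force pair counting and 3D Cartesian positions;
# illustrative only, not the authors' implementation.
import numpy as np

def cross_pairs(a, b, edges):
    """Histogram of separations between every point of a and every point of b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.histogram(d.ravel(), bins=edges)[0]

def auto_pairs(a, edges):
    """Histogram of separations over distinct pairs within one catalog."""
    counts = np.zeros(len(edges) - 1, dtype=np.int64)
    for i in range(len(a) - 1):
        d = np.linalg.norm(a[i + 1:] - a[i], axis=-1)
        counts += np.histogram(d, bins=edges)[0]
    return counts

def xi_ls_split(data, randoms, n_split, edges):
    """Landy-Szalay estimate with the random catalog split into n_split
    subcatalogs; random-random pairs are counted only within each subcatalog."""
    nd, nr = len(data), len(randoms)
    subs = np.array_split(randoms, n_split)

    dd = auto_pairs(data, edges)                    # data-data pairs
    dr = cross_pairs(data, randoms, edges)          # data-random pairs, full random catalog
    rr = sum(auto_pairs(s, edges) for s in subs)    # random-random pairs, no cross-subcatalog pairs

    # Normalize each histogram by the number of pairs actually counted.
    dd_n = dd / (nd * (nd - 1) / 2)
    dr_n = dr / (nd * nr)
    rr_n = rr / sum(len(s) * (len(s) - 1) / 2 for s in subs)

    return (dd_n - 2 * dr_n + rr_n) / rr_n

Choosing n_split = N_r / N_d, so that each subcatalog has roughly the size of the
data catalog, corresponds to the configuration the abstract identifies as optimal
at fixed computational cost.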

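As a back-of-envelope check on the quoted speed-up (an estimate inferred from the
abstract, not a figure taken from the paper): with N_r = 50 N_d and brute-force
counting, the data-random term costs N_d N_r = 50 N_d^2 pair evaluations, while
the random-random term costs N_r^2 / 2 = 1250 N_d^2 in the standard approach but
only 50 x N_d^2 / 2 = 25 N_d^2 when counted within 50 subcatalogs:

% Pair-evaluation budget, assuming brute-force counting and N_r = 50 N_d
% (back-of-envelope estimate, not a number from the paper).
\begin{align*}
  \text{standard:} &\quad
    \underbrace{\tfrac{1}{2}N_d^2}_{\text{DD}}
    + \underbrace{50\,N_d^2}_{\text{DR}}
    + \underbrace{1250\,N_d^2}_{\text{RR}}
    \;\approx\; 1300\,N_d^2, \\
  \text{split into 50 subcatalogs:} &\quad
    \underbrace{\tfrac{1}{2}N_d^2}_{\text{DD}}
    + \underbrace{50\,N_d^2}_{\text{DR}}
    + \underbrace{25\,N_d^2}_{\text{RR}}
    \;\approx\; 76\,N_d^2,
\end{align*}

a reduction in total pair-counting cost by a factor of roughly 17, consistent
with the "more than ten" quoted above.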