Modeling high-dimensional dependence among astronomical data. (arXiv:2006.06268v2 [stat.ME] UPDATED)
<a href="http://arxiv.org/find/stat/1/au:+Vio_R/0/1/0/all/0/1">Roberto Vio</a>, <a href="http://arxiv.org/find/stat/1/au:+Nagler_T/0/1/0/all/0/1">Thomas W. Nagler</a>, <a href="http://arxiv.org/find/stat/1/au:+Andreani_P/0/1/0/all/0/1">Paola Andreani</a>

Establishing the relationship among a set of experimental quantities is a fundamental
issue in many scientific disciplines. In the 2D case, the classical approach is
to compute the linear correlation coefficient from a scatterplot. This method,
however, implicitly assumes a linear relationship between the variables. Such
an assumption is not always correct. With the use of the partial correlation
coefficients, an extension to the multidimensional case is possible. However,
the problem of the assumed mutual linear relationship of the variables remains.
A relatively recent approach that makes it possible to avoid this problem is
the modeling of the joint probability density function (PDF) of the data with
copulas. These are functions that contain all the information on the
relationship between two random variables. Although in principle this approach
can also work with multidimensional data, theoretical as well as computational
difficulties often limit its use to the 2D case. In this paper, we consider an
approach based on so-called vine copulas, which overcomes this limitation and
at the same time is amenable to a theoretical treatment and feasible from the
computational point of view. We applied this method to published data on the
near-IR and far-IR luminosities and atomic and molecular masses of the Herschel
reference sample, a volume-limited sample in the nearby Universe. We determined
the relationship of the luminosities and gas masses and showed that the far-IR
luminosity can be considered as the key parameter relating the other three
quantities. Once the far-IR luminosity is removed from the 4D relation, the
residual relation among the other three quantities is negligible. This may be
interpreted as the correlation between the
gas masses and near-IR luminosity being driven by the far-IR luminosity, likely
by the star formation activity of the galaxy.
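
The limitation of the linear correlation coefficient mentioned above can be illustrated with a minimal sketch on synthetic data (not the Herschel reference sample, and not the vine-copula method of the paper). The function names `pearson` and `pseudo_observations`, the sample size, and the exponential transformation are all assumptions chosen for illustration. The sketch compares the linear coefficient computed on the raw values with the same coefficient computed on rank-based pseudo-observations, which is the scale on which copulas are usually fitted.

```python
import math
import random


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pseudo_observations(x):
    """Map a sample to (0, 1) via scaled ranks, rank / (n + 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


# Synthetic, illustrative data: y is a strictly increasing but strongly
# nonlinear function of x, so the two variables are perfectly dependent.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [math.exp(2.0 * v) for v in x]

# Linear correlation on the raw values understates the dependence.
r_linear = pearson(x, y)

# The same coefficient on the pseudo-observations (Spearman's rho) depends
# only on the ranks, i.e. only on the copula, and recovers the dependence.
r_rank = pearson(pseudo_observations(x), pseudo_observations(y))

print(f"linear correlation, raw data:      {r_linear:.3f}")
print(f"correlation of pseudo-observations: {r_rank:.3f}")
```

Because the ranks are unchanged by any strictly increasing transformation of the variables, the second coefficient is insensitive to the nonlinearity that degrades the first. This is only a 2D, rank-correlation illustration; a vine-copula analysis of several variables would build on the same pseudo-observations but model the full dependence structure.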

