Assessment of Supervised Machine Learning for Atmospheric Retrieval of Exoplanets. (arXiv:2004.10755v1 [astro-ph.EP])
<a href="http://arxiv.org/find/astro-ph/1/au:+Nixon_M/0/1/0/all/0/1">Matthew C. Nixon</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Madhusudhan_N/0/1/0/all/0/1">Nikku Madhusudhan</a>
Atmospheric retrieval of exoplanets from spectroscopic observations requires
an extensive exploration of a highly degenerate and high-dimensional parameter
space to accurately constrain atmospheric parameters. Retrieval methods
commonly conduct Bayesian parameter estimation and statistical inference using
sampling algorithms such as Markov Chain Monte Carlo (MCMC) or Nested Sampling.
Recently, several attempts have been made to use machine learning algorithms
either to complement or replace fully Bayesian methods. While much progress has
been made, these approaches are still at times unable to accurately reproduce
results from contemporary Bayesian retrievals. The goal of our present work is
to investigate the efficacy of machine learning for atmospheric retrieval. As a
case study, we use the Random Forest supervised machine learning algorithm,
which has previously been applied with some success to atmospheric retrieval
of the hot Jupiter WASP-12b using its near-infrared transmission spectrum. We
reproduce previous results using the same approach and the same semi-analytic
models, and subsequently extend this method to develop a new algorithm that
results in a closer match to a fully Bayesian retrieval. We combine this new
method with a fully numerical atmospheric model and demonstrate excellent
agreement with a Bayesian retrieval of the transmission spectrum of another hot
Jupiter, HD 209458b. Despite this success and the high computational
efficiency achieved, we still find that the machine learning approach is
computationally prohibitive for the high-dimensional parameter spaces that
Bayesian retrievals routinely explore with modest computational resources. We
discuss the trade-offs and potential avenues for the future.