CMacIonize 2.0: a novel task-based approach to Monte Carlo radiation transfer. (arXiv:2006.15147v1 [astro-ph.IM])
Bert Vandenbroucke, Peter Camps

(Context) Monte Carlo radiative transfer (MCRT) is a widely used technique to
model the interaction between radiation and a medium, and plays an important
role in astrophysical modelling and in comparing those models with
observations. (Aims) In this work, we present a novel approach to MCRT that
addresses the challenging memory access patterns of traditional MCRT
algorithms, which hinder optimal performance of MCRT simulations on modern
hardware with a complex memory architecture. (Methods) We reformulate the MCRT
photon packet life cycle as a task-based algorithm, whereby the computation is
broken down into small tasks that are executed concurrently. Photon packets are
stored in intermediate buffers, and tasks propagate photon packets through
small parts of the computational domain, moving them from one buffer to another
in the process. (Results) Using the implementation of the new algorithm in the
photoionization MCRT code CMacIonize 2.0, we show that the decomposition of the
MCRT grid into small parts leads to a significant performance gain during the
photon packet propagation phase, which constitutes the bulk of an MCRT
algorithm, as a result of better usage of memory caches. Our new algorithm is a
factor of 2 to 4 faster than an equivalent traditional algorithm and shows good
strong scaling up to 30 threads. We briefly discuss how our new algorithm could
be adjusted or extended to other astrophysical MCRT applications. (Conclusions)
We show that optimising the memory access patterns of a memory-bound algorithm
such as MCRT can yield significant performance gains.
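
The buffer-and-task scheme summarised under (Methods) can be illustrated with a small serial sketch. It is not CMacIonize's implementation. Everything in it is an assumption for illustration: a 1D slab with pure absorption (no scattering), a point source at its centre, the hypothetical helpers `propagate_task` and `run`, and a simple FIFO scheduler standing in for a concurrent task system. The grid is split into subdomains, each with its own packet buffer. A task drains one buffer, moving packets only through that subdomain's cells, and hands any packet that crosses a boundary to the neighbouring subdomain's buffer, which then gets its own task.

```python
import math
import random
from collections import deque
from dataclasses import dataclass

# Hypothetical toy setup: a 1D slab [0, 1] split into NUM_SUBDOMAINS subdomains,
# each holding NUM_CELLS_PER_SUBDOMAIN cells of equal width.
NUM_SUBDOMAINS = 4
NUM_CELLS_PER_SUBDOMAIN = 8
NUM_CELLS = NUM_SUBDOMAINS * NUM_CELLS_PER_SUBDOMAIN
CELL_WIDTH = 1.0 / NUM_CELLS


@dataclass
class PhotonPacket:
    cell: int       # index of the cell the packet is about to cross
    direction: int  # +1 (right) or -1 (left); no scattering in this toy
    tau: float      # optical depth still to travel before absorption


def propagate_task(sub, buffer, opacity, absorbed):
    """One task: propagate every packet in `buffer` through subdomain `sub` only.

    Returns (target_subdomain, packet) pairs for packets that left `sub`;
    target_subdomain is None when the packet escaped the whole slab.
    """
    lo = sub * NUM_CELLS_PER_SUBDOMAIN
    hi = lo + NUM_CELLS_PER_SUBDOMAIN
    outgoing = []
    while buffer:
        p = buffer.pop()
        while True:
            dtau = opacity[p.cell] * CELL_WIDTH  # optical depth of one cell crossing
            if p.tau <= dtau:
                absorbed[p.cell] += 1  # packet is absorbed inside this cell
                break
            p.tau -= dtau
            p.cell += p.direction
            if not 0 <= p.cell < NUM_CELLS:
                outgoing.append((None, p))  # left the slab entirely
                break
            if not lo <= p.cell < hi:
                outgoing.append((p.cell // NUM_CELLS_PER_SUBDOMAIN, p))
                break
    return outgoing


def run(num_packets, opacity, seed=42):
    rng = random.Random(seed)
    buffers = [[] for _ in range(NUM_SUBDOMAINS)]
    mid = NUM_CELLS // 2  # point source at x = 0.5, on the edge between two cells
    for _ in range(num_packets):
        direction = rng.choice((-1, 1))
        cell = mid if direction == 1 else mid - 1
        tau = -math.log(1.0 - rng.random())  # sample tau from p(tau) = exp(-tau)
        buffers[cell // NUM_CELLS_PER_SUBDOMAIN].append(PhotonPacket(cell, direction, tau))

    absorbed = [0] * NUM_CELLS
    escaped = 0
    tasks_run = 0
    queue = deque(s for s in range(NUM_SUBDOMAINS) if buffers[s])
    queued = set(queue)
    # Serial scheduler: a real task-based code would hand these tasks to a
    # pool of threads and protect the shared buffers accordingly.
    while queue:
        sub = queue.popleft()
        queued.discard(sub)
        for target, p in propagate_task(sub, buffers[sub], opacity, absorbed):
            if target is None:
                escaped += 1
            else:
                buffers[target].append(p)  # hand the packet to the neighbouring buffer
                if target not in queued:
                    queue.append(target)
                    queued.add(target)
        tasks_run += 1
    return absorbed, escaped, tasks_run


if __name__ == "__main__":
    absorbed, escaped, tasks_run = run(10_000, [10.0] * NUM_CELLS)
    print(f"absorbed={sum(absorbed)} escaped={escaped} tasks={tasks_run}")
```

Each task reads and writes only the opacity and absorption entries of one subdomain. This locality is the property the abstract credits for the better use of memory caches. In this toy the subdomains are tiny and Python hides any cache effect, so the sketch shows the data flow between buffers, not the speedup.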