Parthenon — a performance portable block-structured adaptive mesh refinement framework. (arXiv:2202.12309v3 [cs.DC] UPDATED)
Philipp Grete, Joshua C. Dolence, Jonah M. Miller, Joshua Brown, Ben Ryan, Andrew Gaspar, Forrest Glines, Sriram Swaminarayan, Jonas Lippuner, Clell J. Solomon, Galen Shipman, Christoph Junghans, Daniel Holladay, James M. Stone, Luke F. Roberts

On the path to exascale, the landscape of computer device architectures and
corresponding programming models has become much more diverse. While various
low-level performance portable programming models are available, support at the
application level lags behind. To address this issue, we present Parthenon, a
performance portable block-structured adaptive mesh refinement (AMR) framework
derived from the well-tested and widely used Athena++ astrophysical
magnetohydrodynamics code, but generalized to serve as the foundation for a
variety of downstream multi-physics codes. Parthenon adopts the Kokkos
programming model and provides several levels of abstraction, from
multi-dimensional variables, to packages that define and separate components,
to the launching of parallel compute kernels. Parthenon allocates all data in
device memory to reduce data movement, supports the logical packing of
variables and mesh blocks to reduce kernel launch overhead, and employs
one-sided, asynchronous MPI calls to reduce communication overhead in
multi-node simulations. Using a hydrodynamics miniapp, we demonstrate weak and
strong scaling on various architectures, including AMD and NVIDIA GPUs, Intel
and AMD x86 CPUs, IBM Power9 CPUs, and Fujitsu A64FX CPUs. At the largest
scale, on Frontier (the first TOP500 exascale machine), the miniapp reaches a
total of $1.7\times10^{13}$ zone-cycles/s on 9,216 nodes (73,728 logical GPUs)
at ~92% weak scaling parallel efficiency (starting from a single node).
Combined with being an open, collaborative project, this makes Parthenon an
ideal framework for targeting exascale simulations in which downstream
developers can focus on their specific application rather than on the
complexity of handling massively parallel, device-accelerated AMR.
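
For illustration only, and not Parthenon's actual interface: a minimal Kokkos sketch of the kind of portable kernel launch over a device-resident, multi-dimensional variable that the abstract refers to. The view name, its extents, and the kernel label here are hypothetical; the same source compiles for CUDA, HIP, OpenMP, or serial execution depending on the enabled Kokkos backend.

    #include <Kokkos_Core.hpp>

    int main(int argc, char *argv[]) {
      Kokkos::initialize(argc, argv);
      {
        // Hypothetical cell-centered variable, allocated directly in
        // device memory (the default memory space of the backend).
        const int nx = 64, ny = 64, nz = 64;
        Kokkos::View<double ***> u("u", nz, ny, nx);

        // Portable 3D kernel launch over all zones of the variable.
        Kokkos::parallel_for(
            "init_u",
            Kokkos::MDRangePolicy<Kokkos::Rank<3>>({0, 0, 0}, {nz, ny, nx}),
            KOKKOS_LAMBDA(const int k, const int j, const int i) {
              u(k, j, i) = 1.0;
            });
        Kokkos::fence();
      }
      Kokkos::finalize();
      return 0;
    }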
