High Performance Computing for gravitational lens modeling: single vs double precision on GPUs and CPUs. (arXiv:1902.03252v1 [astro-ph.IM])
<a href="http://arxiv.org/find/astro-ph/1/au:+Rexroth_M/0/1/0/all/0/1">Markus Rexroth</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Schafer_C/0/1/0/all/0/1">Christoph Schäfer</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Fourestey_G/0/1/0/all/0/1">Gilles Fourestey</a>, <a href="http://arxiv.org/find/astro-ph/1/au:+Kneib_J/0/1/0/all/0/1">Jean-Paul Kneib</a>
Strong gravitational lensing is a powerful probe of cosmology and the dark
matter distribution. Efficient lensing software is already necessary to fully
exploit its potential, and the performance demands will only increase with the
upcoming generation of telescopes. In this paper, we study the possible impact
of High Performance Computing techniques on a performance-critical part of the
widely used lens modeling software LENSTOOL. We implement the algorithm once as
a highly optimized CPU version and once with graphics card acceleration for a
simple parametric lens model. In addition, we study the impact of finite
machine precision on the lensing algorithm. While double precision is the
default choice for scientific applications, we find that single precision can
be sufficiently accurate for our purposes and lead to a large speedup. Therefore,
we develop and present a mixed precision algorithm which only uses double
precision when necessary. We measure the performance of the different
implementations and find that the use of High Performance Computing techniques
dramatically improves the code performance both on CPUs and GPUs. Compared to
the current LENSTOOL implementation on 12 CPU cores, we obtain speedup factors
of up to 170. We achieve this best performance by running our mixed precision
algorithm on a high-end GPU of a type common in modern supercomputers. We also
show that these techniques reduce the energy consumption by up to 98%.
Furthermore, we demonstrate that a highly competitive speedup can be reached
with consumer GPUs. While they are an order of magnitude cheaper than the
high-end graphics cards, they are rarely used for scientific computations due
to their low double precision performance. Our mixed precision algorithm
unlocks their full potential. The consumer GPU delivers a speedup which is only
a factor of four lower than the best speedup achieved by a high-end GPU.
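
The idea of falling back to double precision only when necessary can be illustrated with a minimal sketch. It is not the LENSTOOL implementation: the lens model (a singular isothermal sphere), the function names, the fallback criterion, and the tolerance `cancel_tol` are all assumptions chosen for illustration. Single precision is emulated in pure Python by rounding every intermediate result to 32 bits. The lens equation subtracts the deflection from the image position, so close to the Einstein radius the two terms nearly cancel and single precision loses most of its digits; the sketch detects that case and recomputes in double precision.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def source_position_single(theta_x, theta_y, theta_e):
    """Lens equation for a singular isothermal sphere, emulating float32.

    Every intermediate result is rounded to single precision.
    The image position must not be the origin (r would be zero).
    """
    tx, ty, te = f32(theta_x), f32(theta_y), f32(theta_e)
    r = f32(math.sqrt(f32(f32(tx * tx) + f32(ty * ty))))
    ax = f32(f32(te * tx) / r)  # deflection angle, x component
    ay = f32(f32(te * ty) / r)  # deflection angle, y component
    return f32(tx - ax), f32(ty - ay)


def source_position_double(theta_x, theta_y, theta_e):
    """Same lens equation evaluated in ordinary double precision."""
    r = math.hypot(theta_x, theta_y)
    return (theta_x - theta_e * theta_x / r,
            theta_y - theta_e * theta_y / r)


def source_position_mixed(theta_x, theta_y, theta_e, cancel_tol=1e-3):
    """Try single precision first; fall back to double on heavy cancellation.

    The criterion is a hypothetical one: if the source position is much
    smaller than the image position, the subtraction cancelled most of the
    leading digits, so the float32 result is not trusted.
    """
    bx, by = source_position_single(theta_x, theta_y, theta_e)
    if math.hypot(bx, by) < cancel_tol * math.hypot(theta_x, theta_y):
        return source_position_double(theta_x, theta_y, theta_e), "double"
    return (bx, by), "single"


if __name__ == "__main__":
    # Far from the Einstein radius: single precision is kept.
    print(source_position_mixed(3.0, 4.0, 1.0))
    # Just outside the Einstein radius: the sketch falls back to double.
    print(source_position_mixed(1.0005, 0.0, 1.0))
```

In a real GPU code the single and double precision paths would be separate kernels or code branches rather than emulated rounding, and the fallback criterion would have to be derived from the accuracy actually required by the model.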