EnuMath 2019

08:30 MS39: Flexible software design and performance tuning for modern HPC architectures (Part 1)
Chair: Dominik Goeddeke

08:30 25 mins	Strategies for the vectorized Block Conjugate Gradients method
Nils-Arne Dreier, Christian Engwer
Abstract: Block Krylov methods have recently attracted a lot of attention. Because of their increased arithmetic intensity, they offer a promising way to improve performance on modern hardware. Recently, Frommer et al. presented a block Krylov framework that combines the advantages of block Krylov methods and data-parallel methods. We review this framework and apply it to the Block Conjugate Gradients method to solve linear systems with multiple right-hand sides. In doing so, we consider challenges that arise on modern hardware, such as limited memory bandwidth, the use of SIMD instructions, and communication overhead. We present a performance model that predicts the efficiency of different Block CG variants and compare its predictions with experimental numerical results.
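For reference, here is a minimal NumPy sketch of the textbook block CG iteration (as formulated by O'Leary) for a symmetric positive definite matrix `A` and a block `B` of `s` right-hand sides. It is not the Frommer et al. framework or any of the vectorized variants from the talk. The function name `block_cg`, the per-column stopping test, and the lack of breakdown handling are assumptions made for illustration. Each iteration applies `A` to all `s` search directions at once, which is the source of the higher arithmetic intensity, and then solves small `s x s` systems for the step coefficients.

```python
import numpy as np


def block_cg(A, B, tol=1e-10, maxiter=200):
    """Textbook block CG (O'Leary's formulation) for SPD A and an n x s block B.

    Hypothetical helper for illustration: there is no breakdown handling for
    the case where the residual columns become linearly dependent.
    """
    X = np.zeros_like(B, dtype=float)
    R = B - A @ X
    P = R.copy()
    RtR = R.T @ R                                # s x s Gram matrix of the residuals
    b_norms = np.linalg.norm(B, axis=0)
    for _ in range(maxiter):
        Q = A @ P                                # one block product: s vectors per pass over A
        alpha = np.linalg.solve(P.T @ Q, RtR)    # s x s step coefficients
        X += P @ alpha
        R -= Q @ alpha
        if np.all(np.linalg.norm(R, axis=0) <= tol * b_norms):
            break                                # every right-hand side has converged
        RtR_new = R.T @ R
        beta = np.linalg.solve(RtR, RtR_new)     # s x s search-direction update
        P = R + P @ beta
        RtR = RtR_new
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, s = 60, 4
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)                  # well-conditioned SPD test matrix
    B = rng.standard_normal((n, s))              # four right-hand sides
    X = block_cg(A, B)
    print(np.linalg.norm(A @ X - B, axis=0))     # per-column residual norms
```

With `s = 1` the iteration reduces to standard CG. A production variant would also have to handle rank deficiency in the residual block, for example by deflation.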
08:55 25 mins	Leveraging generative programming for performance portability of high order DG methods
Dominic Kempf, René Heß, Steffen Müthing, Peter Bastian
Abstract: Explicit SIMD vectorization is one of the key challenges in achieving good floating-point performance on modern HPC platforms. Compiler-based auto-vectorization is typically only applicable if the problem exhibits a favorable structure. Domain knowledge can, however, be used to identify alternative sources of parallelism for SIMD vectorization, at the cost of writing code that is tuned specifically to the PDE problem at hand and to the target architecture. We solve the maintainability issue of such codes through generative programming: explicitly vectorized finite element assembly kernels for the discretization framework dune-pdelab are generated from a DSL (UFL) that describes the finite element assembly problem. We demonstrate the power and flexibility of the approach for high-order DG methods on hexahedra, exploiting the tensor-product structure of basis functions and quadrature formulae through sum factorization. In the code generation process, we focus on powerful intermediate representations that allow transformation-based performance tuning via autotuning. We show performance studies for the Intel Skylake microarchitecture using the AVX-512 instruction set.
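Sum factorization can be illustrated independently of any code generator. The NumPy sketch below evaluates a tensor-product polynomial u(x, y, z) = sum_{ijk} U_ijk phi_i(x) phi_j(y) phi_k(z) at every point of a tensor-product Gauss-Legendre rule in two ways: first by forming the full 3D basis matrix, then by three successive 1D contractions. The Legendre basis, the sizes, and the function names are hypothetical choices, and none of the generated dune-pdelab kernels from the talk are reproduced. With p basis functions and q points per direction, the naive form costs O(q^3 p^3) operations, while the factorized form costs O(q p^3 + q^2 p^2 + q^3 p).

```python
import numpy as np


def basis_at_points(p, q):
    """Hypothetical helper: 1D Legendre basis (degree < p) at q Gauss points, shape (q, p)."""
    x, _weights = np.polynomial.legendre.leggauss(q)
    return np.polynomial.legendre.legvander(x, p - 1)


def evaluate_naive(Phi, U):
    # Assemble the full tensor-product matrix, then apply it: O(q^3 p^3) work.
    K = np.einsum('ai,bj,ck->abcijk', Phi, Phi, Phi)
    return np.einsum('abcijk,ijk->abc', K, U)


def evaluate_sum_factorized(Phi, U):
    # Contract one direction at a time: O(q p^3 + q^2 p^2 + q^3 p) work.
    T = np.einsum('ck,ijk->ijc', Phi, U)      # z-direction
    T = np.einsum('bj,ijc->ibc', Phi, T)      # y-direction
    return np.einsum('ai,ibc->abc', Phi, T)   # x-direction


if __name__ == "__main__":
    p, q = 4, 5
    Phi = basis_at_points(p, q)
    U = np.random.default_rng(0).standard_normal((p, p, p))
    print(np.max(np.abs(evaluate_naive(Phi, U) - evaluate_sum_factorized(Phi, U))))
```

Both functions produce the same values. The factorized version never forms the q^3 x p^3 matrix, and this is the structure that a generated kernel can then vectorize.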
09:20 25 mins	FLEXI: A Discontinuous Galerkin Framework for Hyperbolic PDEs on High Performance Systems
Nico Krais, Andrea Beck, Thomas Bolemann, Claus-Dieter Munz, Philipp Offenhaeuser
Abstract: This talk presents current challenges and progress in the development of the hyperbolic PDE solution framework FLEXI. The solver uses the discontinuous Galerkin spectral element method (DGSEM) for the spatial discretization of systems of hyperbolic PDEs and is mainly aimed at large-scale, scale-resolving simulations in computational fluid dynamics. DGSEM lends itself well to high-performance applications because of its high-order accuracy, the tensor-product structure of the spatial operator, and its compact stencil, which inherently allows for efficient parallelization. Combined with a low-storage explicit time-stepping scheme, the resulting algorithm can be implemented efficiently on modern HPC systems. Current development combines improvements to the numerical method, e.g. more efficient treatment of aliasing, with optimized use of the computational hardware. Examples of algorithmic optimization are discussed, e.g. communication latency hiding, load balancing, and I/O at large scale. The resulting scaling properties of the framework will be demonstrated both for artificial scaling tests and for real applications. Besides parallel performance, optimization efforts at node level are presented, using the vectorization of a numerical integration algorithm on a Cray XC40 system as an example.
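The abstract pairs DGSEM with a low-storage explicit time integrator. The sketch below shows what "low-storage" means for a 2N-storage Runge-Kutta scheme: only the solution `U` and a single update register `dU` are kept, and both are overwritten in place at every stage. The abstract does not say which scheme FLEXI uses. The coefficients here are those of Williamson's three-stage, third-order scheme, chosen only because they are short, and the function names are hypothetical.

```python
import numpy as np

# Williamson's 3-stage, third-order 2N-storage coefficients (an assumed stand-in,
# not necessarily the scheme used in FLEXI).
A = (0.0, -5.0 / 9.0, -153.0 / 128.0)
B = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)
C = (0.0, 1.0 / 3.0, 3.0 / 4.0)


def lsrk3_step(f, t, U, dt, dU):
    """Advance U in place by one step; only the registers U and dU are stored."""
    for a, b, c in zip(A, B, C):
        dU *= a                          # a = 0 in the first stage resets the register
        dU += dt * f(t + c * dt, U)      # f must return a new array, not a view of U
        U += b * dU
    return U


def integrate(f, U0, t_end, n_steps):
    """Hypothetical driver: fixed step size from t = 0 to t_end."""
    U = np.array(U0, dtype=float)
    dU = np.zeros_like(U)                # second (and only extra) storage register
    dt = t_end / n_steps
    for n in range(n_steps):
        lsrk3_step(f, n * dt, U, dt, dU)
    return U


if __name__ == "__main__":
    decay = lambda t, u: -u              # u' = -u, exact solution exp(-t)
    for n_steps in (20, 40, 80):
        err = abs(integrate(decay, [1.0], 1.0, n_steps)[0] - np.exp(-1.0))
        print(n_steps, err)
```

For u' = -u, halving the step size should reduce the error at t = 1 by roughly a factor of 8, which is consistent with third-order accuracy. In a DG solver, `U` would hold all degrees of freedom, so the memory footprint stays at two solution-sized arrays regardless of the number of stages.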
09:45 25 mins	Augmented Lagrangian Preconditioning with FreeFEM
Pierre Jolivet
Abstract: Hydrodynamic linear stability analysis of large-scale three-dimensional configurations may require solving the nonlinear steady Navier-Stokes equations with Newton's method and computing the eigenmodes of the linearized equations with the largest growth rates, using a shift-and-invert spectral transformation and a Krylov-Schur algorithm. The shifted linearized Navier-Stokes problem, which is the bottleneck of this approach, is solved with an iterative Krylov subspace method preconditioned by the modified augmented Lagrangian preconditioner. We will show how this can be implemented in a high-level language, FreeFEM, on top of PETSc and SLEPc. The approach is also extended to fluid-structure interaction studies.
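The augmented Lagrangian idea can be checked on a small dense saddle-point matrix K = [[A, B^T], [B, 0]]. Adding gamma B^T W^{-1} B to the (1,1) block, together with the matching right-hand-side term, leaves the solution unchanged, and for large gamma it makes the Schur complement well approximated by -(1/gamma) W. The NumPy sketch below assumes W = I, exact solves, a random SPD `A`, and a random full-rank `B`. It computes the spectrum of P^{-1} K for an upper block-triangular preconditioner P. This is the ideal (unmodified) augmented Lagrangian preconditioner, not the modified variant or the FreeFEM/PETSc implementation from the talk. Apart from the eigenvalue 1, the eigenvalues equal gamma*mu/(1 + gamma*mu), where mu runs over the eigenvalues of B A^{-1} B^T, so they cluster at 1 as gamma grows.

```python
import numpy as np


def al_preconditioned_spectrum(A, B, gamma):
    """Hypothetical helper: eigenvalues of P^{-1} K for the augmented system.

    K = [[A_gamma, B^T], [B, 0]] and P = [[A_gamma, B^T], [0, -(1/gamma) I]],
    where A_gamma = A + gamma B^T B (i.e. W = I is assumed).
    """
    n, m = A.shape[0], B.shape[0]
    A_gamma = A + gamma * B.T @ B
    K = np.block([[A_gamma, B.T], [B, np.zeros((m, m))]])
    P = np.block([[A_gamma, B.T], [np.zeros((m, n)), -np.eye(m) / gamma]])
    return np.linalg.eigvals(np.linalg.solve(P, K))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 40, 10
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)          # SPD stand-in for the velocity block
    B = rng.standard_normal((m, n))      # full-rank stand-in for the divergence block
    for gamma in (1.0, 1e3):
        spread = np.max(np.abs(al_preconditioned_spectrum(A, B, gamma) - 1.0))
        print(f"gamma={gamma:g}: max |lambda - 1| = {spread:.3f}")
```

The trade-off is that A_gamma becomes harder to solve as gamma grows. This is why practical variants, such as the modified preconditioner named in the abstract, replace the exact A_gamma solve with cheaper approximations.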