European Numerical Mathematics and
Advanced Applications Conference 2019
30 September - 4 October 2019, Egmond aan Zee, The Netherlands
15:45   Numerical Linear Algebra (Part 1)
Chair: Chandrasekhar Venkataraman
25 mins
On the Dirichlet-to-Neumann coarse space for solving the Helmholtz problem using domain decomposition
Niall Bootland, Victorita Dolean
Abstract: We examine the use of the Dirichlet-to-Neumann coarse space within an additive Schwarz method to solve the Helmholtz equation in 2D. In particular, we focus on the selection of how many eigenfunctions should go into the coarse space. We find that wave-number-independent convergence of a preconditioned iterative method can be achieved in certain special cases with an appropriate and novel choice of threshold in the selection criterion. However, this property is lost in a more general setting, including the heterogeneous problem. Nonetheless, the approach converges in a small number of iterations for the homogeneous problem, even for relatively large wave numbers, and is robust to the number of subdomains used.
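The two-level coarse correction underlying such methods can be sketched generically: collect selected eigenvectors as columns of a matrix Z and apply a Galerkin coarse solve. The snippet below is a minimal illustration only; it uses a symmetric positive definite stand-in matrix (the true Helmholtz operator is indefinite), and the median-based threshold is a hypothetical placeholder, not the authors' novel selection criterion.

```python
import numpy as np

# Illustrative stand-in system (the real Helmholtz matrix is indefinite;
# an SPD matrix is used here only to keep the sketch simple).
rng = np.random.default_rng(42)
n = 50
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)

# Generic eigenfunction selection: keep eigenvectors whose eigenvalues
# fall below a threshold (the paper's actual criterion is more specific).
vals, vecs = np.linalg.eigh(A)
threshold = np.median(vals)          # hypothetical threshold
Z = vecs[:, vals < threshold]        # coarse-space basis

# Coarse correction: P = Z (Z^T A Z)^{-1} Z^T A is an A-orthogonal projection
# onto the coarse space, the building block of a two-level preconditioner.
P = Z @ np.linalg.solve(Z.T @ A @ Z, Z.T @ A)
```

In a two-level additive Schwarz method, this coarse correction is combined with local subdomain solves; it is the projection property of P that transfers global (low-frequency) information between subdomains.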
25 mins
Accurate and Reproducible CG Method on GPUs
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki
Abstract: This talk presents an accurate and reproducible implementation of the Conjugate Gradient (CG) method on GPUs. As floating-point computations suffer from rounding errors, the result may be inaccurate, and it may differ depending on the order of the arithmetic operations (i.e., be non-reproducible). In CG methods, a loss of accuracy can increase the number of iterations needed for convergence. Moreover, a loss of reproducibility can make it difficult to reproduce the same convergence behavior across different environments (e.g., on CPU and GPU). An accurate and reproducible implementation is therefore desirable. To achieve reproducibility as well as tunable accuracy, we utilize an accurate matrix-multiplication method proposed by Ozaki et al. (the Ozaki scheme). The method consists of an error-free transformation for matrix multiplication and an accurate summation method: a matrix multiplication is transformed into a summation of multiple matrix multiplications of matrices split from the input matrices. These matrix multiplications can be computed using standard floating-point operations (i.e., DGEMM for double-precision matrices), and it does not matter whether the DGEMM itself is reproducible. This is a significant advantage from the viewpoint of development cost. The method can achieve not only correct rounding but also a chosen accuracy on demand by changing the number of split matrices used in the computation. The accuracy also depends on the absolute range of the floating-point values in the input matrices as well as on the inner-product dimension. Furthermore, if the summation is computed by a reproducible method, the accurate matrix-multiplication result is also reproducible. In this study, we use NearSum, a correctly rounded summation algorithm by Rump et al.
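The order dependence of floating-point summation, and hence the non-reproducibility addressed here, is easy to demonstrate (a minimal illustration, not the authors' code):

```python
import math

# The same three numbers summed in two different orders give
# two different double-precision results:
xs = [1e16, 1.0, -1e16]
left_to_right = (xs[0] + xs[1]) + xs[2]   # the 1.0 is absorbed by 1e16
regrouped     = (xs[0] + xs[2]) + xs[1]   # cancellation happens first

print(left_to_right)  # 0.0
print(regrouped)      # 1.0

# A correctly rounded summation (math.fsum) is order-independent:
print(math.fsum(xs))  # 1.0
```

On a GPU, the reduction order of a dot product depends on thread scheduling and block size, so exactly this effect makes plain DOT and SpMV kernels non-reproducible across runs and devices.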
The method is applicable to any inner-product-based operation, including the kernels required in non-preconditioned CG methods: sparse matrix-vector multiplication (SpMV), inner product (DOT), and some vector-vector operations. The most time-consuming portion of non-preconditioned CG methods is SpMV. The SpMV with the Ozaki scheme can be implemented using the sparse-matrix dense-matrix multiplication routine (SpMM) available in the NVIDIA cuSPARSE library. With SpMM, the theoretical overhead of the Ozaki scheme compared to the standard floating-point operation is 4d, where d is the number of split vectors or matrices. Here, the factor of 4 represents the number of memory references for the splitting process of the input matrix. Since the SpMV handles the same matrix on each iteration, the splitting is required only once, so the overhead per iteration is just d. Furthermore, the cost of the d SpMMs could be reduced further by implementing a kernel that computes them simultaneously, although this would have to be written from scratch. In our presentation, we will demonstrate the accuracy, reproducibility, and performance on an NVIDIA Titan V, a Volta-architecture GPU.
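The core idea of the error-free splitting can be sketched for a dot product. The sketch below shows one level of splitting (d = 2 per vector); the actual Ozaki scheme splits into d parts as needed and operates on matrices, and the splitting constants here follow the usual Dekker-style construction rather than the authors' exact implementation.

```python
import math
import numpy as np
from fractions import Fraction

def split(v):
    """One level of Ozaki-style splitting: v = hi + lo, where hi keeps only
    the leading bits so that elementwise products hi*hi are exact in fp64."""
    n = len(v)
    beta = math.ceil((math.log2(n) + 53) / 2)
    tau = math.ceil(math.log2(np.max(np.abs(v))))
    sigma = 2.0 ** (beta + tau)
    hi = (v + sigma) - sigma   # error-free extraction of high-order bits
    return hi, v - hi

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)

xh, xl = split(x)
yh, yl = split(y)
# hi*hi products are exact; the remaining terms are tiny, so an exact
# (and therefore order-independent) summation of all pieces is accurate.
pieces = np.concatenate([xh * yh, xh * yl, xl * yh, xl * yl])
acc = math.fsum(pieces)

# Reference value via exact rational arithmetic:
exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x.tolist(), y.tolist()))
```

Because math.fsum returns the correctly rounded sum of its inputs, permuting `pieces` cannot change `acc`, which is exactly the reproducibility property exploited in the talk (there via NearSum).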
25 mins
The Master-Slave splitting extended to power flow problems on integrated networks with an unbalanced distribution system
Marieke Kootte, Kees Vuik
Abstract: Electrical power systems are complex systems and are traditionally modeled as two separate systems: power is generated in the transmission system and converted at several substations to the distribution systems. The increasing amount of generation at the distribution level, such as locally produced photovoltaic power, can eventually affect the transmission network. An integrated model of both systems can help in studying these effects and in preventing harmful events on the power system. Transmission and distribution systems differ significantly from each other. Whereas transmission systems are assumed to be balanced and are therefore modeled as single-phase systems, distribution systems are in general unbalanced and should be modeled in three phases. Furthermore, the condition of distribution lines, the lower voltage level, the radial structure, and the presence of unbalanced loading lead to different solution techniques. Connecting these two systems into an integrated system poses complications for both the solution method and the connection model. One way to do power flow simulations on integrated transmission-distribution networks is the Master-Slave splitting. This method splits the integrated network and iterates between the separate transmission (master) and distribution (slave) networks. In this paper, we extend the method to hybrid networks: networks consisting of a balanced transmission network and an unbalanced distribution network. Such a network requires a transformation on the boundary between the two separate networks. We explain two approaches to running the Master-Slave splitting on a hybrid network and compare these approaches on accuracy, computational time, and convergence through test simulations. The Master-Slave splitting is of interest when distribution and transmission systems have different characteristics, are in geographically distinct locations, and are not able or allowed to share data of the entire network with each other.
The extension to hybrid networks makes this method generally applicable and an interesting choice for power flow simulations on integrated networks.
REFERENCES
[1] H. Sun, Q. Guo, B. Zhang, Y. Guo, Z. Li, and J. Wang. Master-Slave-Splitting Based Distributed Global Power Flow Method for Integrated Transmission and Distribution Analysis. IEEE Transactions on Smart Grid, 6(3):1484-1492, May 2015.
[2] G.N. Taranto and J.M. Marinho. A Hybrid Three-Phase Single-Phase Power Flow Formulation. IEEE Transactions on Power Systems, 23(3):1063-1070, 2008.
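The alternating structure of a Master-Slave splitting can be illustrated with a deliberately simplified scalar fixed-point sketch. Everything below is hypothetical: real power flow involves nonlinear three-phase network equations and boundary transformations, whereas this toy uses made-up scalar relations purely to show the master/slave iteration pattern.

```python
# Toy Master-Slave iteration on a two-network caricature: the master
# (transmission) computes a boundary voltage from the power drawn by the
# slave; the slave (distribution) recomputes its power draw from that
# voltage. All constants are invented for illustration (per-unit values).

def master_solve(p_slave):
    """Transmission side: boundary voltage sags with slave demand."""
    return 1.0 - 0.05 * p_slave

def slave_solve(v_boundary):
    """Distribution side: impedance-type load, power grows with voltage."""
    return v_boundary ** 2 / 0.5    # hypothetical load admittance

v = 1.0                             # flat-start boundary voltage
for it in range(100):
    p = slave_solve(v)              # slave step with master's voltage
    v_new = master_solve(p)         # master step with slave's power
    if abs(v_new - v) < 1e-12:      # boundary quantities have converged
        break
    v = v_new

print(f"converged in {it} iterations: V = {v_new:.6f}, P = {p:.6f}")
```

Only the boundary quantities (voltage and power here) are exchanged between the two solvers, which is why the method suits operators who cannot share full network data.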
25 mins
Preconditioning vs. Prehandling of ill-conditioned problems - how to solve large-scale Poisson problems on low-precision hardware
Stefan Turek, Peter Zajac, Dirk Ribbrock
Abstract: Solving large-scale Poisson problems with large numbers of unknowns, for instance as a basic component (for "pressure-Poisson problems") in incompressible flow solvers, remains a challenging numerical and computational task which often requires small mesh sizes to guarantee sufficient (problem-dependent) accuracy. However, the condition number of the corresponding stiffness matrices, resulting from finite element, finite difference, or finite volume discretizations, typically scales with O(h^-2), independent of the space dimension, with h representing the mesh size. Moreover, a higher polynomial order of the FEM basis functions or large jumps in the coefficients (for instance due to density or viscosity changes) may further increase the condition number. This "bad" condition number typically leads to numerical problems under successive grid refinement even in double-precision arithmetic, and is much more visible in single- or half-precision calculations, which are typical for GPUs and new AI hardware components (such as TPUs). So, the central question reads: Is it possible at all to solve large-scale Poisson problems on low-precision hardware? First of all, we illustrate the above-mentioned problems with several numerical examples and motivate the idea of "prehandling of matrices": we assume the existence of (linear) transformations of the original linear system of equations into an equivalent formulation which, in exact arithmetic, yields the same solution of the linear systems, but with a much smaller condition number, so that the use of low-precision hardware might become feasible. Here, the prehandling procedure is required to be of complexity O(n) or O(n log n) (with n denoting the problem size), and the resulting matrix should be of similar sparsity as the original one. In our talk, we provide preliminary numerical results as a proof of concept and discuss the challenges and open problems.
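The O(h^-2) growth of the condition number is easy to observe for the 1D model problem (a minimal numpy check for illustration; the talk of course targets large-scale problems and higher dimensions):

```python
import numpy as np

def poisson_1d_cond(n):
    """Condition number of the 1D finite-difference Poisson matrix
    tridiag(-1, 2, -1) on n interior points (mesh size h = 1/(n+1))."""
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.linalg.cond(A)

# Halving h (doubling n) should roughly quadruple the condition number,
# consistent with the O(h^-2) scaling.
for n in (16, 32, 64):
    print(n, poisson_1d_cond(n))
```

At half precision (about 3-4 significant decimal digits), a condition number in the thousands already consumes the entire working accuracy, which is exactly the motivation for reducing the condition number by prehandling before the solve.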