Lowain Project

WRF Microphysics Driver: Arithmetic Intensity

The graph refers to the computation of the WRF Microphysics Driver in single precision, which reduces data traffic across the processor-memory interface compared with double precision.

The arithmetic intensity of the computation (the flop/byte ratio, F/B) is shown as a function of the cache size, measured in single-precision numbers; a virtual computer with a variable cache size was used. The y-axis also shows the Summit performance extrapolated with the roofline model for the given F/B ratio (note that Summit's single-precision peak is 400 PFlop/s).
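The roofline extrapolation mentioned above can be sketched as follows. This is a minimal illustration of the general roofline model, not the exact procedure used to produce the graph. The 400 PFlop/s peak is taken from the text; the bandwidth value is a purely hypothetical placeholder, and the function and constant names are assumptions introduced here.

```python
# Minimal roofline sketch: attainable performance is limited either by the
# compute peak or by memory bandwidth multiplied by the arithmetic intensity.

SUMMIT_SP_PEAK_FLOPS = 400e15        # single-precision peak stated in the text
HYPOTHETICAL_BANDWIDTH_BPS = 1.0e16  # placeholder value, NOT a Summit figure


def roofline_performance(intensity_flop_per_byte,
                         peak_flops=SUMMIT_SP_PEAK_FLOPS,
                         bandwidth_bytes_per_s=HYPOTHETICAL_BANDWIDTH_BPS):
    """Return the attainable flop/s for a given F/B ratio."""
    memory_bound = intensity_flop_per_byte * bandwidth_bytes_per_s
    return min(peak_flops, memory_bound)


if __name__ == "__main__":
    for fb in (0.25, 1.0, 10.0, 100.0):
        print(f"F/B = {fb:6.2f} -> {roofline_performance(fb):.3e} flop/s")
```

At low F/B the result grows linearly with the intensity (memory-bound regime); once the product exceeds the peak, the result stays at the peak (compute-bound regime).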

The upper bundle of curves corresponds to the assumption that one evaluation of the EXP function takes as long as 15 multiplications (MPY), a realistic assumption for single precision.
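The weighting of EXP evaluations can be illustrated with a short sketch. It assumes that the operation counts and the number of bytes moved are already known; the function names and the sample numbers are hypothetical and do not come from the experiment.

```python
# Sketch of an F/B ratio in which each EXP evaluation is charged as a fixed
# number of multiply-equivalent operations (15 in the assumption above).

EXP_COST_IN_MPY = 15  # 1 EXP = 15 MPY, as assumed in the text


def effective_flops(n_simple_ops, n_exp, exp_cost=EXP_COST_IN_MPY):
    """Total operation count with EXP calls weighted by exp_cost."""
    return n_simple_ops + exp_cost * n_exp


def arithmetic_intensity(n_simple_ops, n_exp, bytes_moved,
                         exp_cost=EXP_COST_IN_MPY):
    """Flop/byte ratio for the weighted operation count."""
    return effective_flops(n_simple_ops, n_exp, exp_cost) / bytes_moved


if __name__ == "__main__":
    # Hypothetical counts: 100 simple operations, 10 EXP calls, 500 bytes moved.
    print(arithmetic_intensity(100, 10, 500))  # 0.5
```

Passing a different `exp_cost` shows how strongly the resulting F/B depends on the cost assigned to the EXP function.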

The horizontal magenta line (SP SpMV) is an estimate of the F/B of a single-precision sparse matrix-vector product, the basic kernel of HPCG. The Driver data for 1 EXP = 15 MPY are not much higher.
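One common way to obtain such an estimate is sketched below. It is not necessarily how the value in the graph was derived. The sketch assumes a CSR-like storage with 4-byte values and 4-byte column indices, two flops (one multiply, one add) per stored nonzero, and it ignores traffic for the vectors and row pointers; all of these are assumptions made here for illustration.

```python
# Back-of-the-envelope F/B estimate for a single-precision sparse
# matrix-vector product, counting only the matrix traffic.


def spmv_intensity(value_bytes=4, index_bytes=4, flops_per_nonzero=2):
    """Estimated flop/byte ratio per stored nonzero."""
    bytes_per_nonzero = value_bytes + index_bytes
    return flops_per_nonzero / bytes_per_nonzero


if __name__ == "__main__":
    print(spmv_intensity())               # 0.25 under the assumptions above
    print(spmv_intensity(value_bytes=8))  # double-precision values for comparison
```

Under these assumptions the ratio stays well below 1 flop/byte regardless of matrix size, which is why SpMV serves as a reference point for memory-bound computation.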

The left vertical Volta-100 line shows the cache size per core of the Nvidia Volta-100. The right vertical line shows the total size of all caches in one Volta-100 package. Note that the grid sizes used in the experiment are quite small by current standards; larger grids would shift the increase of the F/B to much larger cache sizes.

Consequently, with caches of realistic sizes, the F/B of the Microphysics Driver is almost as low as that of HPCG, which makes LOWAIN a promising architecture for WRF (and NWP) computations.