We provide these floating point performance numbers as a guide to the rates users should expect while using PETSc. We have done our best to provide fair and accurate values, but we do not guarantee any of the numbers presented here.
See the "Profiling" chapter of the PETSc users manual for instructions on techniques to obtain accurate performance numbers with PETSc
Single Processor Performance
In many PDE application codes one must solve systems of linear equations
arising from the discretization of multicomponent PDEs; the resulting sparse matrices
naturally have a block structure.
PETSc has special sparse matrix storage formats and routines that exploit this block structure to deliver much higher (two to three times higher) floating point computation rates. Below we give the floating point rates for the matrix-vector product for a 1503 by 1503 sparse matrix with a block size of three, arising from a simple oil reservoir simulation.
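As an illustration of the blocked kernels involved, the sketch below assembles a small block-diagonal matrix in PETSc's BAIJ (block AIJ) format with block size three and applies MatMult(). This is a minimal sketch written against a recent PETSc release; the block values and preallocation are arbitrary placeholders, not the reservoir matrix used for the numbers below.

#include <petscmat.h>

int main(int argc, char **argv)
{
  const PetscInt bs = 3, m = 1503; /* block size and dimension quoted above */
  Mat            A;
  Vec            x, y;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* BAIJ storage keeps each nonzero as a dense bs x bs block, so MatMult()
     operates on small dense blocks rather than individual scalars */
  PetscCall(MatCreateSeqBAIJ(PETSC_COMM_SELF, bs, m, m, 1, NULL, &A));
  for (PetscInt i = 0; i < m / bs; i++) {
    PetscScalar blk[9] = {4, -1, 0, -1, 4, -1, 0, -1, 4}; /* arbitrary 3x3 diagonal block */
    PetscCall(MatSetValuesBlocked(A, 1, &i, 1, &i, blk, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &y));
  PetscCall(VecSet(x, 1.0));
  PetscCall(MatMult(A, x, y)); /* the blocked matrix-vector product */

  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&y));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

Running such a program with the logging option (-log_summary in older releases, -log_view in current ones) reports the measured flop rate for the MatMult event.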
The next table depicts performance for the entire linear solve using GMRES(30) and ILU(0) preconditioning.
These tests were run using the code src/sles/examples/tutorials/ex10.c with the options
mpiexec -n 1 ex10 -f0 arco1 -f1 arco1 -pc_type ilu -ksp_gmres_unmodifiedgramschmidt -optionsleft -mat_baij -matload_block_size 3 -log_summary
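For reference, the sketch below reproduces the essential steps of that run: load the matrix in block format, then solve with GMRES(30) and ILU(0). It is a minimal sketch written against the current KSP interface rather than the older SLES interface used when these numbers were gathered; the file name and the all-ones right-hand side are placeholders, and it is intended for a single process since ILU is a sequential preconditioner.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         x, b;
  KSP         ksp;
  PC          pc;
  PetscViewer viewer;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Load the matrix from a PETSc binary file into BAIJ format with block size 3 */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "arco1", FILE_MODE_READ, &viewer));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetType(A, MATBAIJ));
  PetscCall(MatSetBlockSize(A, 3));
  PetscCall(MatLoad(A, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0)); /* placeholder right-hand side */

  /* GMRES restarted at 30 with ILU(0) preconditioning, as in the options above */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPGMRES));
  PetscCall(KSPGMRESSetRestart(ksp, 30));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCILU)); /* ILU with zero levels of fill by default */
  PetscCall(KSPSetFromOptions(ksp)); /* allow run-time overrides from the options database */
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

Rates like those in the table can be read from the logging output, which breaks the solve down into events such as MatMult, PCApply, and KSPSolve.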
Parallel Performance for Euler Solver
Scalability for Laplacian
A typical "model" problem people work with in numerical analysis for PDEs is the
Laplacian. Discretization of the Laplacian in two dimensions with finite differences
is typically done using the "five point" stencil. This results in a very sparse
(at most five nonzeros per row), ill-conditioned matrix.
Because the matrix is so sparse and has no block structure, it is difficult to get very good sequential or parallel floating point performance, especially for small problems. Here we demonstrate the scalability of the parallel PETSc matrix-vector product for the five point stencil on two grids. These runs were performed on three machines: an IBM SP2 with the Power2 Super chip and two memory cards at ANL, the Cray T3E at NERSC, and the Origin2000 at NCSA.
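For concreteness, the sketch below assembles the five-point Laplacian on an n by n grid in PETSc's parallel AIJ (compressed sparse row) format and applies the matrix-vector product. It is a minimal sketch against a recent PETSc release; the grid size n and the simple row-block distribution are placeholders, not the grids or layouts used in the runs reported here.

#include <petscmat.h>

int main(int argc, char **argv)
{
  PetscInt n = 100, N, rstart, rend; /* n x n grid; n is an arbitrary placeholder */
  Mat      A;
  Vec      x, y;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL));
  N = n * n;

  /* Five-point Laplacian in (MPI)AIJ format; PETSc distributes the rows across processes */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N));
  PetscCall(MatSetType(A, MATAIJ));
  PetscCall(MatSeqAIJSetPreallocation(A, 5, NULL));
  PetscCall(MatMPIAIJSetPreallocation(A, 5, NULL, 2, NULL));

  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  for (PetscInt row = rstart; row < rend; row++) {
    PetscInt i = row % n, j = row / n; /* grid coordinates of this row */
    PetscCall(MatSetValue(A, row, row, 4.0, INSERT_VALUES));
    if (i > 0)     PetscCall(MatSetValue(A, row, row - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, row, row + 1, -1.0, INSERT_VALUES));
    if (j > 0)     PetscCall(MatSetValue(A, row, row - n, -1.0, INSERT_VALUES));
    if (j < n - 1) PetscCall(MatSetValue(A, row, row + n, -1.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* The parallel matrix-vector product being measured */
  PetscCall(MatCreateVecs(A, &x, &y));
  PetscCall(VecSet(x, 1.0));
  PetscCall(MatMult(A, x, y));

  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&y));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

With at most five nonzeros per row, each row of MatMult performs roughly ten flops while streaming about sixty bytes of matrix data (values plus column indices in AIJ format), so memory bandwidth rather than the processor's peak flop rate limits the achievable performance.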
Since PETSc is intended for much more general problems than the Laplacian, we do not consider the Laplacian a particularly important benchmark; we include it due to interest from the community.
Notes: The problem here is simply too small to parallelize on a distributed memory computer.