NAS benchmarks
Encyclopedia
The NAS Parallel Benchmarks (NPB) are a set of benchmark
s targeting performance evaluation of highly parallel
supercomputer
s. They are developed and maintained by the NASA
Advanced Supercomputing (NAS) Division (formerly the NASA Numerical Aerodynamic Simulation Program) based at the NASA Ames Research Center
. NAS solicits performance results for NPB from all sources.
, the LINPACK Benchmark
and the NAS Kernel Benchmark Program, were usually specialized for vector computers. They generally suffered from inadequacies including parallelism-impeding tuning restrictions and insufficient problem sizes, which rendered them inappropriate for highly parallel systems. Equally unsuitable were full-scale application benchmarks due to high porting cost and unavailability of automatic software parallelization tools. As a result, NPB were developed in 1991 and released in 1992 to address the ensuing lack of benchmarks applicable to highly parallel machines.
In the light of these guidelines, it was deemed the only viable approach to use a collection of "paper-and-pencil" benchmarks that specified a set of problems only algorithmically and left most implementation details to the implementer's discretion under certain necessary limits.
NPB 1 defined eight benchmarks, each in two problem sizes dubbed Class A and Class B. Sample codes written in Fortran 77 were supplied. They used a small problem size Class S and were not intended for benchmarking purposes.
NPB 2, released in 1996, came with source code implementations for five out of eight benchmarks defined in NPB 1 to supplement but not replace NPB 1. It extended the benchmarks with an up-to-date problem size Class C. It also amended the rules for submitting benchmarking results. The new rules included explicit requests for output files as well as modified source files and build scripts to ensure public availability of the modifications and reproducibility of the results.
NPB 2.2 contained implementations of two more benchmarks. NPB 2.3 of 1997 was the first complete implementation in MPI
. It shipped with serial versions of the benchmarks consistent with the parallel versions and defined a problem size Class W for small-memory systems. NPB 2.4 of 2002 offered a new MPI implementation and introduced another still larger problem size Class D. It also augmented one benchmark with I/O
-intensive subtypes.
, Java
and High Performance Fortran
. These new parallel implementations were derived from the serial codes in NPB 2.3 with additional optimizations. NPB 3.1 and NPB 3.2 added three more benchmarks, which, however, were not available across all implementations; NPB 3.3 introduced a Class E problem size. Based on the single-zone NPB 3, a set of multi-zone benchmarks taking advantage of the MPI/OpenMP hybrid programming model were released under the name NPB-Multi-Zone (NPB-MZ) for "testing the effectiveness of multi-level and hybrid parallelization paradigms and tools".
Benchmark (computing)
In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it...
s targeting performance evaluation of highly parallel
Parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently . There are several different forms of parallel computing: bit-level,...
supercomputer
Supercomputer
A supercomputer is a computer at the frontline of current processing capacity, particularly speed of calculation.Supercomputers are used for highly calculation-intensive tasks such as problems including quantum physics, weather forecasting, climate research, molecular modeling A supercomputer is a...
s. They are developed and maintained by the NASA
NASA
The National Aeronautics and Space Administration is the agency of the United States government that is responsible for the nation's civilian space program and for aeronautics and aerospace research...
Advanced Supercomputing (NAS) Division (formerly the NASA Numerical Aerodynamic Simulation Program) based at the NASA Ames Research Center
NASA Ames Research Center
The Ames Research Center , is one of the United States of America's National Aeronautics and Space Administration 10 major field centers.The centre is located in Moffett Field in California's Silicon Valley, near the high-tech companies, entrepreneurial ventures, universities, and other...
. NAS solicits performance results for NPB from all sources.
Motivation
Traditional benchmarks that existed before NPB, such as the Livermore loopsLivermore loops
Livermore loops is a benchmark for parallel computers. It was created by Francis H. McMahon from scientific source code run on computers at Lawrence Livermore National Laboratory...
, the LINPACK Benchmark
LINPACK
LINPACK is a software library for performing numerical linear algebra on digital computers. It was written in Fortran by Jack Dongarra, Jim Bunch, Cleve Moler, and Gilbert Stewart, and was intended for use on supercomputers in the 1970s and early 1980s...
and the NAS Kernel Benchmark Program, were usually specialized for vector computers. They generally suffered from inadequacies including parallelism-impeding tuning restrictions and insufficient problem sizes, which rendered them inappropriate for highly parallel systems. Equally unsuitable were full-scale application benchmarks due to high porting cost and unavailability of automatic software parallelization tools. As a result, NPB were developed in 1991 and released in 1992 to address the ensuing lack of benchmarks applicable to highly parallel machines.
NPB 1
The first specification of NPB recognized that the benchmarks should feature- new parallel-aware algorithmic and software methods,
- genericness and architecture neutrality,
- easy verifiability of correctness of results and performance figures,
- capability of accommodating new systems with increased power,
- and ready distributability.
In the light of these guidelines, it was deemed the only viable approach to use a collection of "paper-and-pencil" benchmarks that specified a set of problems only algorithmically and left most implementation details to the implementer's discretion under certain necessary limits.
NPB 1 defined eight benchmarks, each in two problem sizes dubbed Class A and Class B. Sample codes written in Fortran 77 were supplied. They used a small problem size Class S and were not intended for benchmarking purposes.
NPB 2
Since its release, NPB 1 displayed two major weaknesses. Firstly, due to its "paper-and-pencil" specification, computer vendors usually highly tuned their implementations so that their performance became difficult for scientific programmers to attain. Secondly, many of these implementation were proprietary and not publicly available, effectively concealing their optimizing techniques. Secondly, problem sizes of NPB 1 lagged behind the development of supercomputers as the latter continued to evolve.NPB 2, released in 1996, came with source code implementations for five out of eight benchmarks defined in NPB 1 to supplement but not replace NPB 1. It extended the benchmarks with an up-to-date problem size Class C. It also amended the rules for submitting benchmarking results. The new rules included explicit requests for output files as well as modified source files and build scripts to ensure public availability of the modifications and reproducibility of the results.
NPB 2.2 contained implementations of two more benchmarks. NPB 2.3 of 1997 was the first complete implementation in MPI
Message Passing Interface
Message Passing Interface is a standardized and portable message-passing system designed by a group of researchers from academia and industry to function on a wide variety of parallel computers...
. It shipped with serial versions of the benchmarks consistent with the parallel versions and defined a problem size Class W for small-memory systems. NPB 2.4 of 2002 offered a new MPI implementation and introduced another still larger problem size Class D. It also augmented one benchmark with I/O
Input/output
In computing, input/output, or I/O, refers to the communication between an information processing system , and the outside world, possibly a human, or another information processing system. Inputs are the signals or data received by the system, and outputs are the signals or data sent from it...
-intensive subtypes.
NPB 3
NPB 3 retained the MPI implementation from NPB 2 and came in more flavors, namely OpenMPOpenMP
OpenMP is an API that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most processor architectures and operating systems, including Linux, Unix, AIX, Solaris, Mac OS X, and Microsoft Windows platforms...
, Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
and High Performance Fortran
High Performance Fortran
High Performance Fortran is an extension of Fortran 90 with constructs that support parallel computing, published by the High Performance Fortran Forum . The HPFF was convened and chaired by Ken Kennedy of Rice University...
. These new parallel implementations were derived from the serial codes in NPB 2.3 with additional optimizations. NPB 3.1 and NPB 3.2 added three more benchmarks, which, however, were not available across all implementations; NPB 3.3 introduced a Class E problem size. Based on the single-zone NPB 3, a set of multi-zone benchmarks taking advantage of the MPI/OpenMP hybrid programming model were released under the name NPB-Multi-Zone (NPB-MZ) for "testing the effectiveness of multi-level and hybrid parallelization paradigms and tools".
The benchmarks
As of NPB 3.3, eleven benchmarks are defined as summarized in the following table.Benchmark | Name derived from | Available since | Description | Remarks |
---|---|---|---|---|
MG | MultiGrid | NPB 1 | Approximate the solution to a three-dimensional discrete Poisson equation Discrete Poisson equation In mathematics, the discrete Poisson equation is the finite difference analog of the Poisson equation. In it, the discrete Laplace operator takes the place of the Laplace operator... using the V-cycle multigrid method Multigrid method Multigrid methods in numerical analysis are a group of algorithms for solving differential equations using a hierarchy of discretizations. They are an example of a class of techniques called multiresolution methods, very useful in problems exhibiting multiple scales of behavior... |
|
CG | Conjugate Gradient | Estimate the smallest eigenvalue of a large sparse Sparse matrix In the subfield of numerical analysis, a sparse matrix is a matrix populated primarily with zeros . The term itself was coined by Harry M. Markowitz.... symmetric positive-definite matrix Positive-definite matrix In linear algebra, a positive-definite matrix is a matrix that in many ways is analogous to a positive real number. The notion is closely related to a positive-definite symmetric bilinear form .... using the inverse iteration Inverse iteration In numerical analysis, inverse iteration is an iterative eigenvalue algorithm. It allows to find an approximateeigenvector when an approximation to a corresponding eigenvalue is already known.... with the conjugate gradient method Conjugate gradient method In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite. The conjugate gradient method is an iterative method, so it can be applied to sparse systems that are too... as a subroutine for solving systems of linear equations |
||
FT | Fast Fourier Transform | Solve a three-dimensional partial differential equation Partial differential equation In mathematics, partial differential equations are a type of differential equation, i.e., a relation involving an unknown function of several independent variables and their partial derivatives with respect to those variables... (PDE) using the fast Fourier transform Fast Fourier transform A fast Fourier transform is an efficient algorithm to compute the discrete Fourier transform and its inverse. "The FFT has been called the most important numerical algorithm of our lifetime ." There are many distinct FFT algorithms involving a wide range of mathematics, from simple... (FFT) |
||
IS | Integer Sort Integer sorting In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by numeric keys, each of which is an integer. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numbers or text strings... |
Sort small integers using the bucket sort Bucket sort Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, and is a cousin of... |
||
EP | Embarrassingly Parallel Embarrassingly parallel In parallel computing, an embarrassingly parallel workload is one for which little or no effort is required to separate the problem into a number of parallel tasks... |
Generate independent Gaussian random variate Random variate A random variate is a particular outcome of a random variable: the random variates which are other outcomes of the same random variable would have different values. Random variates are used when simulating processes driven by random influences... s using the Marsaglia polar method Marsaglia polar method The polar method is a pseudo-random number sampling method for generating a pair of independent standard normal random variables... |
||
BT | Block Tridiagonal | Solve a synthetic system of nonlinear PDEs using three different algorithms involving block tridiagonal, scalar pentadiagonal Pentadiagonal matrix In linear algebra, a pentadiagonal matrix is a matrix that is nearly diagonal; to be exact, it is a matrix in which the only nonzero entries are on the main diagonal, and the first two diagonals above and below it... and symmetric successive over-relaxation Successive over-relaxation In numerical linear algebra, the method of successive over-relaxation is a variant of the Gauss–Seidel method for solving a linear system of equations, resulting in faster convergence. A similar method can be used for any slowly converging iterative process.It was devised simultaneously by David... (SSOR) solver kernels, respectively |
|
|
SP | Scalar Pentadiagonal | |||
LU | Lower-Upper symmetric Gauss-Seidel | |||
UA | Unstructured Adaptive | NPB 3.1 | ||
DC | Data Cube Data cube In computer programming contexts, a data cube is a three- dimensional array of values, commonly used to describe a time series of image data... operator |
|||
DT | Data Traffic | NPB 3.2 |
External links
- NAS Parallel Benchmarks Changes (official website)