
Intel Math Kernel Library

Linear Solvers Basics

Document Number: 308659-001 World Wide Web: http://developer.intel.com

Legal Information

Version: -001
Version Information: Contains information derived from the Intel MKL Reference Manual for the 8.0 gold release (document number 630813-019)
Date: 07/05

The information in this document is subject to change without notice and Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. This document and the software described in it are furnished under license and may only be used or copied in accordance with the terms of the license. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. The information in this document is provided in connection with Intel products and should not be construed as a commitment by Intel Corporation. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The software described in this document may contain software defects which may cause the product to deviate from published specifications. Current characterized software defects are available on request. 
Intel, the Intel logo, Intel SpeedStep, Intel NetBurst, Intel NetStructure, MMX, Intel386, Intel486, Celeron, Intel Centrino, Intel Xeon, Intel XScale, Itanium, Pentium, Pentium II Xeon, Pentium III Xeon, Pentium M, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. * Other names and brands may be claimed as the property of others. Copyright 2005, Intel Corporation. Portions Copyright 2001 Hewlett-Packard Development Company, L.P.


Contents
Overview
Sparse Linear Systems
    Matrix Fundamentals
    Direct Method
        Fill-In and Reordering of Sparse Matrices
    Sparse Matrix Storage Formats
        Storage Formats for the PARDISO Solver
        Sparse Storage Formats for Sparse BLAS Levels 2-3
            CSR Format
            CSC Format
            Coordinate Format
            Diagonal Storage Scheme
            Skyline Storage Scheme
Interval Linear Systems
    Intervals
    Interval vectors and matrices
        Interval Linear Systems
    Preconditioning
    Inverting interval matrices
References


Linear Solvers Basics

The Intel Math Kernel Library (Intel MKL) provides Fortran routines and functions that perform a wide variety of operations on vectors and matrices, including sparse matrices and interval matrices. The library also includes discrete Fourier transform routines, as well as vector mathematical and vector statistical functions with Fortran and C interfaces.

Several Intel MKL domains (for example, LAPACK routines, Sparse Solver routines, and Interval Arithmetic routines) provide functionality aimed at solving systems of linear equations. Using them requires an understanding of the basic concepts and methods for solving such problems, as well as knowledge of the specific data storage formats involved. The LAPACK routines of Intel MKL include routines for solving systems of linear equations, factoring and inverting matrices, and estimating condition numbers. The direct sparse solver routines in Intel MKL solve symmetric and symmetrically structured sparse systems with real or complex coefficients. The iterative sparse solver in Intel MKL uses Sparse BLAS level 2 and 3 routines and works with different sparse data formats that take advantage of vector and matrix sparsity and allow you to store only the non-zero elements of vectors and matrices. The Interval Solver routines included in Intel MKL can be used to solve interval systems of linear equations and related problems.

This document (also included as Appendix A in the Intel MKL Reference Manual) describes basic terms and definitions used in solving linear systems, with emphasis on sparse systems and sparse data storage formats, and presents general concepts and approaches used for solving interval linear systems.


Intel Math Kernel Library Reference Manual

Overview
Many applications in science and engineering require the solution of a system of linear equations. This problem is usually expressed mathematically by the matrix-vector equation, Ax = b, where A is an m-by-n matrix, x is an n-element column vector, and b is an m-element column vector. The matrix A is usually referred to as the coefficient matrix, and the vectors x and b are referred to as the solution vector and the right-hand side, respectively. Basic concepts related to solving linear systems with sparse matrices are described in the Sparse Linear Systems section that follows. If the coefficients in matrix A and the right-hand sides in vector b are not defined exactly but rather belong to known intervals, the system is called an interval linear system. Some basic definitions and concepts used in solving interval linear systems are described in the Interval Linear Systems section below.

Sparse Linear Systems


In many real-life applications, most of the elements in A are zero. Such a matrix is referred to as sparse. Conversely, matrices with very few zero elements are called dense. For sparse matrices, computing the solution to the equation Ax = b can be made much more efficient with respect to both storage and computation time, if the sparsity of the matrix can be exploited. The more an algorithm can exploit the sparsity without sacrificing the correctness, the better the algorithm. Generally speaking, computer software that finds solutions to systems of linear equations is called a solver. A solver designed to work specifically on sparse systems of equations is called a sparse solver. Solvers are usually classified into two groups - direct and iterative.
Iterative solvers start with an initial approximation to a solution and attempt to estimate the difference between the approximation and the true result. Based on the difference, an iterative solver calculates a new approximation that is closer to the true result than the initial approximation. This process is repeated until the difference between the approximation and the true result is sufficiently small. The main drawback of iterative solvers is that the rate of convergence depends greatly on the values in the matrix A. Consequently, it is not possible to predict how long it will take for an iterative solver to produce a solution. In fact, for ill-conditioned matrices, the iterative process may not converge to a solution at all. However, for well-conditioned matrices it is possible for iterative solvers to converge to a solution very quickly. Consequently, for the right applications, iterative solvers can be very efficient. Direct solvers, on the other hand, typically factor the matrix A into the product of two triangular matrices and then perform a forward and backward triangular solve.


This approach makes the time required to solve a system of linear equations relatively predictable, based on the size of the matrix. In fact, for sparse matrices, the solution time can be predicted based on the number of non-zero elements in the matrix A.

Matrix Fundamentals
A matrix is a rectangular array of either real or complex numbers. A matrix is denoted by a capital letter; its elements are denoted by the same lower case letter with row/column subscripts. Thus, the value of the element in row i and column j in matrix A is denoted by a(i,j). For example, a 3 by 4 matrix A, is written as follows:

    | a(1,1)  a(1,2)  a(1,3)  a(1,4) |
A = | a(2,1)  a(2,2)  a(2,3)  a(2,4) |
    | a(3,1)  a(3,2)  a(3,3)  a(3,4) |
Note that with the above notation, we assume the standard Fortran programming language convention of starting array indices at 1 rather than the C programming language convention of starting them at 0.

A matrix in which all of the elements are real numbers is called a real matrix. A matrix that contains at least one complex number is called a complex matrix. A real or complex matrix A with the property that a(i,j) = a(j,i) is called a symmetric matrix. A complex matrix A with the property that a(i,j) = conj(a(j,i)) is called a Hermitian matrix. Note that programs that manipulate symmetric and Hermitian matrices need only store half of the matrix values, since the values of the non-stored elements can be quickly reconstructed from the stored values.

A matrix that has the same number of rows as it has columns is referred to as a square matrix. The elements in a square matrix that have the same row index and column index are called the diagonal elements of the matrix, or simply the diagonal of the matrix.

The transpose of a matrix A is the matrix obtained by flipping the elements of the array about its diagonal. That is, we exchange the elements a(i,j) and a(j,i). For a complex matrix, if we both flip the elements about the diagonal and take the complex conjugate of each element, the resulting matrix is called the Hermitian transpose or conjugate transpose of the original matrix. The transpose and Hermitian transpose of a matrix A are denoted by A^T and A^H, respectively.

A column vector, or simply a vector, is an n-by-1 matrix, and a row vector is a 1-by-n matrix. A real or complex matrix A is said to be positive definite if the vector-matrix product x^T Ax is greater than zero for all non-zero vectors x. A matrix that is not positive definite is referred to as indefinite.


An upper (or lower) triangular matrix is a square matrix in which all elements below (or above) the diagonal are zero. A unit triangular matrix is an upper or lower triangular matrix with all 1s along the diagonal. A matrix P is called a permutation matrix if, for any matrix A, the result of the matrix product PA is identical to A except for interchanging the rows of A. For a square matrix, it can be shown that if PA is a permutation of the rows of A, then AP^T is the same permutation of the columns of A. Additionally, it can be shown that the inverse of P is P^T. In order to save space, a permutation matrix is usually stored as a linear array, called a permutation vector, rather than as a full array. Specifically, if the permutation matrix maps the i-th row of a matrix to the j-th row, then the i-th element of the permutation vector is j. A matrix with non-zero elements only on the diagonal is called a diagonal matrix. As is the case with a permutation matrix, it is usually stored as a vector of values, rather than as a matrix.
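The permutation-vector convention can be illustrated with a short sketch (Python is used here purely for illustration, with 0-based indices rather than the Fortran 1-based convention used by the library; the function names are our own):

```python
def permute_rows(perm, A):
    """Apply a permutation vector to the rows of a matrix (a list of rows).

    perm[i] = j means row i of A is mapped to row j of the result,
    matching the convention described in the text."""
    out = [None] * len(A)
    for i, row in enumerate(A):
        out[perm[i]] = row
    return out

def invert_perm(perm):
    """Since the inverse of a permutation matrix P is P^T, the inverse
    permutation vector is simply the reverse mapping."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv
```

Applying `invert_perm(perm)` after `permute_rows(perm, A)` restores A, mirroring the fact that the inverse of P is P^T.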

Direct Method
For solvers that use the direct method, the basic technique employed in finding the solution of the system Ax = b is to first factor A into triangular matrices. That is, find a lower triangular matrix L and an upper triangular matrix U, such that A = LU. Having obtained such a factorization (usually referred to as an LU decomposition or LU factorization), the solution to the original problem can be rewritten as follows.

Ax = b
LUx = b
L(Ux) = b

This leads to the following two-step process for finding the solution to the original system of equations:

1. Solve the system of equations Ly = b.
2. Solve the system Ux = y.

Solving the systems Ly = b and Ux = y is referred to as a forward solve and a backward solve, respectively. If a symmetric matrix A is also positive definite, it can be shown that A can be factored as LL^T where L is a lower triangular matrix. Similarly, a Hermitian matrix A that is positive definite can be factored as A = LL^H. For both symmetric and Hermitian matrices, a factorization of this form is called a Cholesky factorization.
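The forward and backward solves can be sketched as follows (an illustrative Python sketch operating on dense 0-based matrices, not the library's interface):

```python
def forward_solve(L, b):
    """Solve Ly = b for a lower triangular L (list of rows)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

def backward_solve(U, y):
    """Solve Ux = y for an upper triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

def lu_solve(L, U, b):
    """Two-step solve of LUx = b: forward solve, then backward solve."""
    return backward_solve(U, forward_solve(L, b))
```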


In a Cholesky factorization, the matrix U in an LU decomposition is either L^T or L^H. Consequently, a solver can increase its efficiency by storing only L and one-half of A, and not computing U. Therefore, users who can express their application as the solution of a system of positive definite equations will gain a significant performance improvement over using a general representation. For matrices that are symmetric (or Hermitian) but not positive definite, there are still some significant efficiencies to be had. It can be shown that if A is symmetric but not positive definite, then A can be factored as A = LDL^T, where D is a diagonal matrix and L is a unit lower triangular matrix. Similarly, if A is Hermitian, it can be factored as A = LDL^H. In either case, we again only need to store L, D, and half of A, and we need not compute U. However, the backward solve phase must be amended to solving L^T x = D^(-1) y rather than L^T x = y.
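The amended LDL^T solve described above can be sketched as follows (illustrative Python, our own function name; L is unit lower triangular, and D is passed as the vector of its diagonal entries):

```python
def ldlt_solve(L, D, b):
    """Solve (L D L^T) x = b: forward solve Ly = b, then backward solve
    L^T x = D^(-1) y, as described in the text."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        # L has a unit diagonal, so no division is needed here
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    z = [y[i] / D[i] for i in range(n)]            # z = D^(-1) y
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # row i of the unit upper triangular L^T has entries L[j][i], j > i
        x[i] = z[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))
    return x
```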

Fill-In and Reordering of Sparse Matrices


Two important concepts associated with the solution of sparse systems of equations are fill-in and reordering. The following example illustrates these concepts. Consider the system of linear equations Ax = b, where A is the symmetric positive definite sparse matrix defined by the following:

        |  9    3/2    6    3/4    3  |        | 1 |
        | 3/2   1/2    *     *     *  |        | 2 |
    A = |  6     *    12     *     *  |    b = | 3 |
        | 3/4    *     *    5/8    *  |        | 4 |
        |  3     *     *     *    16  |        | 5 |
A star (*) is used to represent zeros and to emphasize the sparsity of A. The Cholesky factorization of A is: A = LLT, where L is the following:

        |  3     *     *     *    *  |
        | 1/2   1/2    *     *    *  |
    L = |  2    -2     2     *    *  |
        | 1/4  -1/4  -1/2   1/2   *  |
        |  1    -1    -2    -3    1  |

Notice that even though the matrix A is relatively sparse, the lower triangular matrix L has no zeros below the diagonal. If we computed L and then used it for the forward and backward solve phases, we would do as much computation as if A had been dense. The situation of L having non-zeros in places where A has zeros is referred to as fill-in. Computationally, it would be more efficient if a solver could exploit the non-zero structure of A in such a way as to reduce the fill-in when computing L. By doing this, the solver would only need to compute the non-zero entries in L. Toward this end, consider permuting the rows and columns of A. As described in the Matrix Fundamentals section, the permutation of the rows of A can be represented as a permutation matrix P. The result of permuting the rows is the product of P and A. Suppose, in the above example, we swap the first and fifth rows of A, then swap the first and fifth columns of A, and call the resulting matrix B. Mathematically, we can express the process of permuting the rows and columns of A to get B as B = PAP^T. After permuting the rows and columns of A, we see that B is given by the following:

        | 16    *     *     *     3  |
        |  *   1/2    *     *    3/2 |
    B = |  *    *    12     *     6  |
        |  *    *     *    5/8   3/4 |
        |  3   3/2    6    3/4    9  |
Since B is obtained from A by simply switching rows and columns, the numbers of non-zero entries in A and B are the same. However, when we find the Cholesky factorization, B = LLT, we see the following:

        |  4      *      *      *       *    |
        |  *    1/√2     *      *       *    |
    L = |  *      *     2√3     *       *    |
        |  *      *      *    √10/4     *    |
        | 3/4   3/√2    √3    3/√10  √15/20  |


The fill-in associated with B is much smaller than the fill-in associated with A. Consequently, the storage and computation time needed to factor B is much smaller than that needed to factor A. Based on this, we see that an efficient sparse solver needs to find a permutation matrix P that minimizes the fill-in for the factorization B = PAP^T, and then use the factorization of B to solve the original system of equations. Although the above example is based on a symmetric positive definite matrix and a Cholesky decomposition, the same approach works for a general LU decomposition. Specifically, let P be a permutation matrix, B = PAP^T, and suppose that B can be factored as B = LU. Then

Ax = b
PAx = Pb
PA(P^T P)x = Pb        (since P^T P = I)
(PAP^T)(Px) = Pb
B(Px) = Pb
LU(Px) = Pb
It follows that if we obtain an LU factorization for B, we can solve the original system of equations by a three-step process:

1. Solve Ly = Pb.
2. Solve Uz = y.
3. Set x = P^T z.
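The three-step process can be sketched end to end, keeping P as a permutation vector (an illustrative Python sketch with 0-based indices; the function name is our own):

```python
def permuted_lu_solve(perm, L, U, b):
    """Solve Ax = b given B = PAP^T = LU.

    perm is the permutation vector for P (perm[i] = j: row i -> row j).
    Step 1: solve Ly = Pb; step 2: solve Uz = y; step 3: x = P^T z."""
    n = len(b)
    Pb = [0.0] * n
    for i in range(n):
        Pb[perm[i]] = b[i]                      # form Pb
    y = [0.0] * n
    for i in range(n):                          # forward solve Ly = Pb
        y[i] = (Pb[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):              # backward solve Uz = y
        z[i] = (y[i] - sum(U[i][j] * z[j] for j in range(i + 1, n))) / U[i][i]
    return [z[perm[i]] for i in range(n)]       # x = P^T z undoes the permutation
```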

If we apply this three-step process to the current example, we first need to perform the forward solve of the system of equations Ly = Pb:

|  4      *      *      *       *    | | y1 |   | 5 |
|  *    1/√2     *      *       *    | | y2 |   | 2 |
|  *      *     2√3     *       *    | | y3 | = | 3 |
|  *      *      *    √10/4     *    | | y4 |   | 4 |
| 3/4   3/√2    √3    3/√10  √15/20  | | y5 |   | 1 |

This gives y = ( 5/4, 2√2, √3/2, 16/√10, -979/(4√15) )^T.

The second step is to perform the backward solve, Uz = y, or, in this case, since we are using a Cholesky factorization, L^T z = y:

| 4    *     *      *      3/4    | | z1 |   |     5/4      |
| *  1/√2    *      *      3/√2   | | z2 |   |     2√2      |
| *    *    2√3     *       √3    | | z3 | = |     √3/2     |
| *    *     *    √10/4   3/√10   | | z4 |   |    16/√10    |
| *    *     *      *    √15/20   | | z5 |   | -979/(4√15)  |

This gives z = ( 123/2, 983, 1961/12, 398, -979/3 )^T.

The third and final step is to set x = P^T z. This gives

x^T = ( -979/3, 983, 1961/12, 398, 123/2 ).

Sparse Matrix Storage Formats


As discussed above, it is more efficient to store only the non-zeros of a sparse matrix. This assumes that the sparsity is large, i.e., the number of non-zero entries is a small percentage of the total number of entries. If there is only an occasional zero entry, the cost of exploiting the sparsity actually slows down the computation when compared to simply treating the matrix as dense, meaning that all the values, zero and non-zero, are used in the computation. There are a number of common storage schemes used for sparse matrices, but most of the schemes employ the same basic technique. That is, compress all of the non-zero elements of the matrix into a linear array, and then provide some number of auxiliary arrays to describe the locations of the non-zeros in the original matrix.


Storage Formats for the PARDISO Solver


The compression of the non-zeros of a sparse matrix A into a linear array is done by walking down each column (column major format) or across each row (row major format) in order, and writing the non-zero elements to a linear array in the order that they appear in the walk. When storing symmetric matrices, it is necessary to store only the upper triangular half of the matrix (upper triangular format) or the lower triangular half of the matrix (lower triangular format). The Intel MKL direct sparse solver uses a row major upper triangular storage format. That is, the matrix is compressed row-by-row and for symmetric matrices only non-zeros in the upper triangular half of the matrix are stored. The Intel MKL storage format accepted for the PARDISO software for sparse matrices consists of three arrays, which are called the values, columns, and rowIndex arrays. The following table describes the arrays in terms of the values, row, and column positions of the non-zero elements in a sparse matrix A.
values     A real or complex array that contains the non-zero entries of A.
           The non-zero values of A are mapped into the values array using the
           row major, upper triangular storage mapping described above.

columns    Element i of the integer array columns contains the number of the
           column in A that contained the value in values(i).

rowIndex   Element j of the integer array rowIndex gives the index into the
           values array that contains the first non-zero element in row j of A.

The length of the values and columns arrays is equal to the number of non-zeros in A. Since the rowIndex array gives the location of the first non-zero within a row, and the non-zeros are stored consecutively, then we would like to be able to compute the number of non-zeros in the i-th row as the difference of rowIndex(i) and rowIndex(i+1). In order to have this relationship hold for the last row of A, we need to add an entry (dummy entry) to the end of rowIndex whose value is equal to the number of non-zeros in A, plus one. This makes the total length of the rowIndex array one larger than the number of rows of A. NOTE. The Intel MKL sparse storage scheme uses the Fortran programming language convention of starting array indices at 1, rather than the C programming language convention of starting at 0.
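The mapping just described can be sketched in a few lines (illustrative Python only, not the library's interface; the function names are our own, and the returned index arrays are 1-based as in the text):

```python
def dense_to_pardiso_upper(A):
    """Compress the upper triangle of a symmetric dense matrix into the
    three PARDISO-style arrays values, columns, rowIndex (1-based)."""
    values, columns, rowIndex = [], [], []
    for i, row in enumerate(A):
        rowIndex.append(len(values) + 1)        # first stored entry of this row
        for j in range(i, len(row)):            # upper triangle only
            if row[j] != 0:
                values.append(row[j])
                columns.append(j + 1)           # 1-based column number
    rowIndex.append(len(values) + 1)            # dummy entry: nnz + 1
    return values, columns, rowIndex

def row_nnz(rowIndex, i):
    """Non-zeros in (1-based) row i, via the difference noted in the text."""
    return rowIndex[i] - rowIndex[i - 1]
```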


With the above in mind, consider storing the symmetric matrix discussed in the example from the previous section.

        |  9    3/2    6    3/4    3  |
        | 3/2   1/2    *     *     *  |
    A = |  6     *    12     *     *  |
        | 3/4    *     *    5/8    *  |
        |  3     *     *     *    16  |
In this case, A has nine non-zero elements, so the lengths of the values and columns arrays will be nine. Also, since the matrix A has five rows, the rowIndex array is of length six. The actual values for each of the arrays for the example matrix are as follows:

Table A-1

Storage Arrays for a Symmetric Example Matrix


values   = ( 9   3/2   6   3/4   3   1/2   12   5/8   16 )
columns  = ( 1    2    3    4    5    2     3    4     5 )
rowIndex = ( 1    6    7    8    9   10 )

For a non-symmetric or non-Hermitian matrix, all of the non-zeros need to be stored. Consider the non-symmetric matrix B defined by the following:

        |  1   -1    *   -3    *  |
        | -2    5    *    *    *  |
    B = |  *    *    4    6    4  |
        | -4    *    2    7    *  |
        |  *    8    *    *   -5  |


We see that B has 13 non-zeros, and we store B as follows:


Table A-2 Storage Arrays for a Non-Symmetric Example Matrix
values   = ( 1  -1  -3  -2   5   4   6   4  -4   2   7   8  -5 )
columns  = ( 1   2   4   1   2   3   4   5   1   3   4   2   5 )
rowIndex = ( 1   4   6   9  12  14 )

In the current version of Intel MKL, the direct sparse solvers cannot solve non-symmetric systems of equations. However, they can solve symmetrically structured systems of equations. A symmetrically structured system of equations is one where the pattern of non-zeros is symmetric. That is, a matrix has a symmetric structure if a(i,j) is non-zero if and only if a(j,i) is non-zero. From the point of view of the solver software, a non-zero element of a matrix is anything that is stored in the values array. In that sense, we can turn any non-symmetric matrix into a symmetrically structured matrix by carefully adding zeros to the values array. For example, suppose we consider the matrix B to have the following set of non-zero entries:

        |  1   -1    *   -3    *  |
        | -2    5    *    *    0  |
    B = |  *    *    4    6    4  |
        | -4    *    2    7    *  |
        |  *    8    0    *   -5  |

Now B can be considered to be symmetrically structured with 15 non-zero entries. We represent the matrix as:
Table A-3 Storage Arrays for a Symmetrically Structured Example Matrix
values   = ( 1  -1  -3  -2   5   0   4   6   4  -4   2   7   8   0  -5 )
columns  = ( 1   2   4   1   2   5   3   4   5   1   3   4   2   3   5 )
rowIndex = ( 1   4   7  10  13  16 )
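The padding step itself is mechanical; it can be sketched as follows (illustrative Python on a simple {(row, col): value} dictionary, our own helper name, 1-based index pairs as in the text):

```python
def symmetrize_structure(entries):
    """Pad a sparse matrix, given as a dict {(row, col): value}, with
    explicit stored zeros so that its non-zero *pattern* becomes
    symmetric: whenever (i, j) is stored, (j, i) is stored too."""
    padded = dict(entries)
    for (i, j) in entries:
        padded.setdefault((j, i), 0)    # add a stored zero if missing
    return padded
```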

Storage Format Restrictions. The storage format for the sparse solver must conform to two important restrictions:

First, the non-zero values in a given row must be placed into the values array in the order in which they occur in the row (from left to right). Second, no diagonal element can be omitted from the values array for any symmetric or structurally symmetric matrix.


The second restriction implies that when dealing with symmetric or structurally symmetric matrices that have zeros on the diagonal, the zero diagonal elements must be explicitly represented in the values array.
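Both restrictions are easy to verify programmatically; a sketch of such a check (illustrative Python, our own function name, operating on 1-based PARDISO-style arrays) might look like this:

```python
def check_pardiso_format(values, columns, rowIndex, symmetric):
    """Check the two restrictions stated above: column numbers ascending
    within each row, and (for symmetric or structurally symmetric input)
    every diagonal element explicitly stored."""
    nrows = len(rowIndex) - 1
    for i in range(nrows):
        start, end = rowIndex[i] - 1, rowIndex[i + 1] - 1   # 0-based slice
        cols = columns[start:end]
        if cols != sorted(cols):
            return False            # non-zeros out of left-to-right order
        if symmetric and (i + 1) not in cols:
            return False            # missing diagonal element in row i+1
    return True
```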

Sparse Storage Formats for Sparse BLAS Levels 2-3


This section describes in detail the sparse data structures supported in the current version of the Intel MKL Sparse BLAS level 2 and 3.

CSR Format
The Intel MKL compressed sparse row (CSR) format for sparse matrices consists of four arrays, which are called the values, columns, pointerB, and pointerE arrays. The following table describes the arrays in terms of the values, row, and column positions of the non-zero elements in a sparse matrix A.
values     A real or complex array that contains the non-zero entries of A.
           The non-zero values of A are mapped into the values array using the
           row major storage mapping described above.

columns    Element i of the integer array columns contains the number of the
           column in A that contained the value in values(i).

pointerB   Element j of this integer array gives the index into the values
           array that contains the first non-zero element in row j of A. Note
           that this index is equal to pointerB(j) - pointerB(1) + 1.

pointerE   An integer array that contains row indices such that
           pointerE(j) - pointerB(1) is the index into the values array that
           contains the last non-zero element in row j of A.

The length of the values and columns arrays is equal to the number of non-zeros in A. The length of the pointerB and pointerE arrays is equal to the number of rows in A. The previously defined matrix B

        |  1   -1    *   -3    *  |
        | -2    5    *    *    *  |
    B = |  *    *    4    6    4  |
        | -4    *    2    7    *  |
        |  *    8    *    *   -5  |


can be represented in the CSR format as:


Table A-4 Storage Arrays for an Example Matrix in CSR Format
values   = ( 1  -1  -3  -2   5   4   6   4  -4   2   7   8  -5 )
columns  = ( 1   2   4   1   2   3   4   5   1   3   4   2   5 )
pointerB = ( 1   4   6   9  12 )
pointerE = ( 4   6   9  12  14 )

This storage format is used in the NIST Sparse BLAS library [Rem05]. Note that the storage format accepted for the PARDISO software and described above (see Storage Formats for the PARDISO Solver) is a variation of the CSR format. The PARDISO format has a restriction: all non-zero elements must be stored contiguously, that is, the set of non-zero elements of row j must immediately follow the set of non-zero elements of row j-1. The CSR format has no such restriction. This flexibility can be useful, for example, if there is a need to operate on different submatrices of the matrix at the same time. In this case, it is enough to define the arrays pointerB and pointerE for each needed submatrix so that they all point into the same values array. Comparing the array rowIndex from Table A-2 with the arrays pointerB and pointerE from Table A-4, it is easy to see that pointerB(i) = rowIndex(i) and pointerE(i) = rowIndex(i+1) for i = 1, ..., 5. This makes it possible to call a routine that takes values, columns, pointerB, and pointerE as input parameters with a sparse matrix stored in the format accepted for PARDISO.
For example, a routine with the interface

Subroutine name_routine(..., values, columns, pointerB, pointerE, ...)

can be called with the arguments values, columns, rowIndex in the following way:

call name_routine(..., values, columns, rowIndex, rowIndex(2), ...)
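The relationship between the two flavors can be sketched with a matrix-vector product routine (illustrative Python only, not the MKL interface; function names are our own, and index arrays are 1-based as in the text):

```python
def csr_matvec(values, columns, pointerB, pointerE, x):
    """y = A*x for the four-array CSR format."""
    n = len(pointerB)
    y = [0.0] * n
    for i in range(n):
        # 0-based slice of the stored entries of (1-based) row i+1
        for k in range(pointerB[i] - 1, pointerE[i] - 1):
            y[i] += values[k] * x[columns[k] - 1]
    return y

def csr3_matvec(values, columns, rowIndex, x):
    """The PARDISO three-array variant is the special case
    pointerB(i) = rowIndex(i), pointerE(i) = rowIndex(i+1)."""
    return csr_matvec(values, columns, rowIndex[:-1], rowIndex[1:], x)
```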

NOTE. Intel MKL Sparse BLAS level 2 provides routines for both flavors of the CSR format.


CSC Format
The compressed sparse column format (CSC), often called the Harwell-Boeing sparse matrix format, is similar to the CSR format, but the columns are used instead of the rows. In other words, the CSC format for a matrix is identical to the CSR format for its transpose. By analogy with the CSR format, Intel MKL Sparse BLAS level 2 provides routines for two variations of the CSC format. The variation accepted for the PARDISO software consists of three arrays, which are called the values, rows, and colIndex arrays. The following table describes these arrays:
values     A real or complex array that contains the non-zero entries of A.
           The non-zero values of A are mapped into the values array using the
           column major storage mapping described above.

rows       Element i of the integer array rows contains the number of the row
           in A that contained the value in values(i).

colIndex   Element j of the integer array colIndex gives the index into the
           values array that contains the first non-zero element in column j
           of A.

The length of the values and rows arrays is equal to the number of non-zero elements in A. For example, the sparse matrix B

        |  1   -1    *   -3    *  |
        | -2    5    *    *    *  |
    B = |  *    *    4    6    4  |
        | -4    *    2    7    *  |
        |  *    8    *    *   -5  |

can be represented in the CSC format for PARDISO as follows:


Table A-5 Storage Arrays for an Example Matrix in the Harwell-Boeing format
values   = ( 1  -2  -4  -1   5   8   4   2  -3   6   7   4  -5 )
rows     = ( 1   2   4   1   2   5   3   4   1   3   4   3   5 )
colIndex = ( 1   4   7   9  12  14 )
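The column-major walk that produces these arrays can be sketched as follows (illustrative Python, our own function name, 1-based index arrays as in the text):

```python
def dense_to_csc(A):
    """Build the three-array CSC (Harwell-Boeing style) representation
    of a dense matrix: walk down each column in order, recording the
    non-zeros and their row numbers."""
    values, rows, colIndex = [], [], []
    ncols = len(A[0])
    for j in range(ncols):
        colIndex.append(len(values) + 1)    # first stored entry of column j+1
        for i in range(len(A)):
            if A[i][j] != 0:
                values.append(A[i][j])
                rows.append(i + 1)          # 1-based row number
    colIndex.append(len(values) + 1)        # dummy entry, as in the row variant
    return values, rows, colIndex
```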


Coordinate Format
The coordinate format is the most flexible and simplest format for sparse matrix representation. Only the non-zero entries are provided, and the coordinates of each non-zero entry are given explicitly. Many commercial libraries support matrix-vector multiplication for sparse matrices in the coordinate format. The Intel MKL coordinate format consists of three arrays, which are called the values, rows, and columns arrays, and a parameter nnz, which is the number of non-zero entries in A. All three arrays are dimensioned as nnz. The following table describes the arrays in terms of the values, row, and column positions of the non-zero elements in a sparse matrix A.
values     A real or complex array that contains the non-zero entries of A,
           given in any order.

rows       Element i of the integer array rows contains the number of the row
           in A that contained the value in values(i).

columns    Element i of the integer array columns contains the number of the
           column in A that contained the value in values(i).

For example, the sparse matrix C

        |  1   -1   -3    0    0  |
        | -2    5    0    0    0  |
    C = |  0    0    4    6    4  |
        | -4    0    2    7    0  |
        |  0    8    0    0   -5  |

can be represented in the coordinate format as follows:


Table A-6    Storage Arrays for an Example Matrix in the Coordinate Format

values  = ( 1  -1  -3  -2   5   4   6   4  -4   2   7   8  -5 )
rows    = ( 1   1   1   2   2   3   3   3   4   4   4   5   5 )
columns = ( 1   2   3   1   2   3   4   5   1   3   4   2   5 )
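Because the triplets may come in any order, operating on the coordinate format amounts to a simple accumulation; a matrix-vector product can be sketched as follows (illustrative Python, our own function name, 1-based index arrays as in the text):

```python
def coo_matvec(values, rows, columns, x, m):
    """y = A*x for the coordinate format with m rows: each stored
    triplet (value, row, column) is simply accumulated into y."""
    y = [0.0] * m
    for v, i, j in zip(values, rows, columns):
        y[i - 1] += v * x[j - 1]
    return y
```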

Diagonal Storage Scheme


If the matrix A has a few diagonals, then this structure can be used to reduce the amount of information needed for the location of the non-zero elements. This storage scheme is particularly useful in many applications where the matrix arises from a finite element or finite difference discretization. The Intel MKL diagonal storage scheme consists of two arrays, which are called the


values and distance arrays, and two parameters: ndiag, which is the number of non-empty diagonals, and lval, which is the declared leading dimension in the calling (sub)program. The following table describes the arrays values and distance:

values     A real or complex two-dimensional array dimensioned as lval by
           ndiag. It contains the non-zero diagonals of A. The key point of
           the storage is that each element in values retains the row number
           of the corresponding element in the original matrix. To achieve
           this, diagonals in the lower triangular part of A are padded from
           the top, and those in the upper triangular part are padded from
           the bottom. Note that the number of elements padded for diagonal
           i is the absolute value of distance(i).

distance   An integer array dimensioned as ndiag. Element i of the array
           distance contains the distance between diagonal i and the main
           diagonal. The distance is positive if the diagonal is above the
           main diagonal, and negative if the diagonal is below the main
           diagonal. The main diagonal has a distance equal to zero.

The sparse matrix C given above can be stored in the diagonal storage scheme as follows:
    distance = ( -3  -1   0   1   2 )

             |  *   *   1  -1  -3 |
             |  *  -2   5   0   0 |
    values = |  *   0   4   6   4 |
             | -4   2   7   0   * |
             |  8   0  -5   *   * |

where the asterisks denote padded elements. Clearly, only the upper or the lower triangle needs to be stored if the matrix is symmetric, Hermitian, or skew-symmetric. The diagonals can be stored in any order when the sparse diagonal representation is used with the Intel MKL sparse matrix-matrix or matrix-vector multiplication routines. However, all elements of the array distance must be sorted in increasing order when the representation is used with the Intel MKL sparse triangular solver routines.
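The padding rule can be made concrete with a short sketch. The helper dense_to_diag below is hypothetical (not an MKL routine) and assumes lval equals the matrix order; element (i, k) of values is A(i, i + distance(k)) when that position exists, and a pad marker otherwise, which pads lower diagonals from the top and upper diagonals from the bottom automatically:

```python
PAD = None  # stands for the '*' padding elements in the example above

def dense_to_diag(a, distance):
    """Diagonal storage: values[i][k] holds the element of diagonal
    distance[k] that lies in row i of a (0-based here), or PAD."""
    n = len(a)
    return [[a[i][i + d] if 0 <= i + d < n else PAD for d in distance]
            for i in range(n)]

# The example matrix C from the text.
C = [[ 1, -1, -3,  0,  0],
     [-2,  5,  0,  0,  0],
     [ 0,  0,  4,  6,  4],
     [-4,  0,  2,  7,  0],
     [ 0,  8,  0,  0, -5]]

values = dense_to_diag(C, [-3, -1, 0, 1, 2])
```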

Skyline Storage Scheme


The skyline storage scheme is important for direct sparse solvers, and it is well suited for Cholesky or LU decomposition when no pivoting is required.


The skyline storage scheme accepted in Intel MKL can store only a triangular matrix or the triangular part of a matrix. It consists of two arrays, called the values and pointers arrays. The following table describes these arrays:
values      A real or complex array. For a lower triangular matrix, it contains the set of elements from each row of A starting from the first non-zero element up to and including the diagonal element; for an upper triangular matrix, it contains the set of elements from each column of A starting from the first non-zero element down to and including the diagonal element. Zero elements encountered within these sets are stored as well.

pointers    An integer array dimensioned as m+1, where m is the number of rows for a lower triangle (columns for an upper triangle). pointers(i) - pointers(1) + 1 gives the location in values of the first non-zero element of row (column) i. The value of pointers(m+1) is set to nnz + pointers(1), where nnz is the number of elements in the array values.

Note that Intel MKL Sparse BLAS does not support general matrices in the routines operating with the skyline storage format. For example, the lower triangle of the matrix C given above can be stored as follows:

    values   = ( 1  -2  5  4  -4  0  2  7  8  0  0  -5 )
    pointers = ( 1   2  4  5   9  13 )

and the upper triangle of this matrix C can be stored as follows:


    values   = ( 1  -1  5  -3  0  4  6  7  4  0  -5 )
    pointers = ( 1   2  4   7  9  12 )

This storage format is supported by the NIST Sparse BLAS library [Rem05].
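The lower-triangle arrays above can be reproduced with the following sketch; dense_to_skyline_lower is a hypothetical helper, not an MKL routine. For each row it stores the span from the first non-zero element through the diagonal, interior zeros included; the upper-triangle variant is analogous, with columns in place of rows:

```python
def dense_to_skyline_lower(a):
    """Skyline storage of the lower triangle of a, with 1-based
    pointers as in the manual."""
    values, pointers = [], [1]
    for i, row in enumerate(a):
        # first stored position: the first non-zero, or the diagonal itself
        first = next(j for j in range(i + 1) if row[j] != 0 or j == i)
        values.extend(row[first:i + 1])
        pointers.append(len(values) + 1)
    return values, pointers

# The example matrix C from the text.
C = [[ 1, -1, -3,  0,  0],
     [-2,  5,  0,  0,  0],
     [ 0,  0,  4,  6,  4],
     [-4,  0,  2,  7,  0],
     [ 0,  8,  0,  0, -5]]

values, pointers = dense_to_skyline_lower(C)
```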


Interval Linear Systems


Intervals
An interval is a compact connected subset of the real axis R. It is thus completely defined by two numbers, namely, its lower endpoint and upper endpoint (sometimes called the left and right endpoints, respectively), so that [a, b] denotes the interval { x ∈ R : a ≤ x ≤ b }. The set of all real intervals is denoted by IR. In mathematical notation, taking the lower and upper endpoints of an interval is usually denoted by

inf [ a, b ] = a , sup [ a, b ] = b .
In the discussion below, intervals and interval objects are denoted by boldface letters; the lower and upper endpoints of an interval x are written inf x and sup x (in print they are often designated by underscores and overscores), so that x = [inf x, sup x]. Every interval a is uniquely determined by its midpoint

    mid a = (inf a + sup a) / 2

and radius

    rad a = (sup a − inf a) / 2,

the latter being equivalent to the width wid a = sup a − inf a = 2 rad a. Intervals of the form [a, a] that have equal lower and upper endpoints, that is, intervals of zero width, are called degenerate (also point or thin); they coincide with ordinary real numbers, so that R ⊂ IR can be implied. On the contrary, intervals with nonzero width are called thick intervals.

Since intervals are sets, set-theoretical relations and operations between them are applicable, for example, inclusion, intersection, and so on. In particular, a point t ∈ R is a member of the interval a (written t ∈ a) if inf a ≤ t ≤ sup a, and the inclusion a ⊆ b holds if and only if inf b ≤ inf a and sup a ≤ sup b.

Intervals and interval objects (vectors, matrices, etc.) are a convenient tool to represent the so-called bounded uncertainty and ambiguity, when only the lower and upper bounds of the possible variation of some value are known. In this sense, intervals provide an alternative to probabilistic and fuzzy approaches for describing quantitative uncertainty. Arithmetic operations, such as addition, subtraction, multiplication and division, can be extended to intervals according to the fundamental principle

    a * b := { a * b : a ∈ a, b ∈ b },   * ∈ { +, −, ·, / },     (1)

which makes it possible to define the so-called classical interval arithmetic. Note that the empty interval [ ] is often incorporated into computer interval arithmetic structures as well.
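Principle (1) can be sketched in a few lines. For +, −, ·, and / (with a divisor that does not contain zero), the minimum and maximum over the four endpoint combinations give the exact result interval, so a minimal illustrative interval class might look as follows (the class and method names are assumptions for this sketch):

```python
from itertools import product

class Interval:
    """A closed real interval [lo, hi] with the classical arithmetic (1)."""
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def _combine(self, other, op):
        # For these operations it suffices to combine the four endpoints.
        ends = [op(a, b) for a, b in product((self.lo, self.hi),
                                             (other.lo, other.hi))]
        return Interval(min(ends), max(ends))

    def __add__(self, other): return self._combine(other, lambda a, b: a + b)
    def __sub__(self, other): return self._combine(other, lambda a, b: a - b)
    def __mul__(self, other): return self._combine(other, lambda a, b: a * b)
    def __truediv__(self, other):
        assert not (other.lo <= 0 <= other.hi), "divisor must not contain 0"
        return self._combine(other, lambda a, b: a / b)

    @property
    def mid(self): return (self.lo + self.hi) / 2
    @property
    def rad(self): return (self.hi - self.lo) / 2
```

For example, Interval(1, 2) * Interval(-3, 4) yields the interval [-6, 8].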


Interval vectors and matrices


An interval vector is an ordered tuple of intervals arranged vertically (a column vector) or horizontally (a row vector). So, if a1, a2, . . . , an are intervals, then

    a = (a1, a2, . . . , an)^T

is a column vector, and

    a = (a1, a2, . . . , an)

is a row vector.
The set of all interval n-vectors is denoted later in the text by IR^n.

[Figure: a two-dimensional interval vector shown as a box with sides parallel to the x1 and x2 axes.]

Interval vectors can be associated with their geometric images, namely, rectangular boxes in the space R^n whose sides are parallel to the coordinate axes. For this reason, interval vectors are often called boxes for brevity.

An interval matrix is a rectangular table composed of intervals:

    A := | a11  a12  . . .  a1n |
         | a21  a22  . . .  a2n |
         | . . . . . . . . . .  |
         | am1  am2  . . .  amn |

or A = (aij). Interval vectors can be identified with interval matrices of size n × 1 (column vectors) or 1 × n (row vectors). The set of all interval m × n matrices is denoted by IR^(m×n). Arithmetic operations between interval vectors and matrices can be introduced on the basis of the relation that generalizes (1) (see [Alefeld83], [Neumaier90]).

An interval square matrix A ∈ IR^(n×n) is referred to as regular (nonsingular) if and only if all the point matrices A ∈ A are regular (nonsingular), that is, have nonzero determinants. Otherwise, the interval matrix A ∈ IR^(n×n) is called singular, which means that it contains at least one singular point matrix.

Generally, recognizing whether an interval matrix is regular or singular is an NP-hard problem, which implies that there may be no relatively simple (polynomially complex) algorithms that completely solve the problem in reasonable time. For practical needs, it is important to have a set of workable sufficient criteria for testing regularity of a wide range of interval matrices. Intel MKL provides routines that implement the Ris-Beeck spectral criterion, the Rump singular value criterion, and the Rohn-Rex singular value criterion for testing regularity/singularity of interval matrices.

Sometimes a related property, called strong regularity, needs to be checked for interval matrices. Strong regularity requires that the product of the interval matrix by its midpoint inverse be regular. The routine ?gerbr enables you to check strong regularity judging by the value of its output parameter sr.
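As an illustration of such a sufficient test, the sketch below checks the classical strong-regularity condition: an interval matrix with midpoint mid(A) and radius rad(A) is strongly regular when the spectral radius of |mid(A)^(-1)| · rad(A) is less than 1 (see [Neumaier90]). The function name is an assumption for this sketch, and it is not the ?gerbr implementation:

```python
import numpy as np

def is_strongly_regular(Amid, Arad):
    """Sufficient test: the interval matrix [Amid - Arad, Amid + Arad]
    is strongly regular (hence regular) if rho(|inv(Amid)| @ Arad) < 1."""
    G = np.abs(np.linalg.inv(Amid)) @ Arad
    return max(abs(np.linalg.eigvals(G))) < 1.0
```

For a tight interval matrix around a well-conditioned midpoint the test passes, while widening the radii eventually makes it fail.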

Interval Linear Systems


Solving systems of linear algebraic equations of the form

    a11 x1 + a12 x2 + . . . + a1n xn = b1,
    a21 x1 + a22 x2 + . . . + a2n xn = b2,
    . . . . . . . . .                                            (2)
    am1 x1 + am2 x2 + . . . + amn xn = bm,

or, concisely,

    A x = b

with an m × n matrix A and a right-hand side m-vector b, is one of the key problems in science and engineering. If aij and bi are not defined exactly but rather belong to known intervals aij and bi, respectively, the system is called an interval linear system and can be written as


    a11 x1 + a12 x2 + . . . + a1n xn = b1,
    a21 x1 + a22 x2 + . . . + a2n xn = b2,
    . . . . . . . . .                                            (3)
    am1 x1 + am2 x2 + . . . + amn xn = bm,

with intervals aij and bi, or in the short form

    A x = b                                                      (4)

with an interval matrix A = (aij) and an interval right-hand side vector b = (bi). An interval linear system (3)-(4) is considered as a set of point linear systems of the same form Ax = b with parameters aij ∈ aij and bi ∈ bi. When the aij and bi vary within the intervals aij and bi, the solutions of the corresponding point systems Ax = b with A = (aij) and b = (bi) form a set in the space R^n, namely

    Ξ(A, b) := { x ∈ R^n : (∃A ∈ A)(∃b ∈ b)(Ax = b) }.           (5)

The set (5), made up of solutions of all the point systems Ax = b with A ∈ A and b ∈ b, is called the solution set of the interval linear system (3)-(4). For independent aij and bi, 1 ≤ i, j ≤ n, the solution set is usually a solid polyhedron in R^n, sometimes star-shaped, as in the figure below.

[Figure: a star-shaped solution set in the (x1, x2) plane.]


NOTE. The set Ξ(A, b) described above is often called the united solution set, since there exists a variety of other solution sets for interval systems of equations (see [Shary02]).

An exact description of the solution set is practically impossible for dimensions n larger than several tens, since its complexity grows exponentially with n. On the other hand, such an exact description is not really necessary in most cases. Usually, one needs to compute some estimates, in a prescribed sense, of the solution set. The most popular problem in practice is that of outer (superset) interval estimation:

    For an interval system of linear equations A x = b, find an
    interval enclosure of the solution set Ξ(A, b).                  (6)

Frequently, a component-wise form of the problem (6) is considered:

    For an interval system of linear equations A x = b, find estimates
    of min { xv : x ∈ Ξ(A, b) } from below and of
    max { xv : x ∈ Ξ(A, b) } from above, v = 1, 2, . . . , n.        (7)

In particular, the Intel MKL ?gepps routines operate with this type of problem statement. The problem (6)-(7) is historically one of the first and most popular problems in modern interval analysis. An extensive bibliography on it can be found, for example, in [Alefeld83], [Kearfott96], and [Neumaier90]. Thus, solving an interval linear system is understood here as computing an outer interval estimate of the solution set of the interval linear system (3)-(4). The matrix A of the system is usually assumed to be square and nonsingular.

Unlike classical computational linear algebra, solving interval linear systems proves to be very hard computationally in general. Computing the optimal (smallest) interval enclosure of the solution set in (6), or, equivalently, computing exact estimates of the solution set in (7), is an NP-hard problem (see [Kreinovich97]) if there are no restrictions on the widths of the intervals in the system and/or on the structure of the nonzero elements of the matrix A. Moreover, the problem remains NP-hard even if the requirements on the solution are weakened to computing estimates of the solution set to within a predetermined absolute or relative accuracy. From the practical standpoint, NP-hardness means that, with high probability, a general problem cannot be solved in time polynomial in the problem size.


For this reason, the numerical algorithms employed in Intel MKL for solving interval linear systems are divided into two classes, depending on whether or not they provide a guaranteed accuracy of the result. Fast algorithms compute an enclosure of the solution set in a reasonable time, but without any accuracy guarantees. Optimal, or sharp, algorithms may take a long time to complete, but the results they obtain are less crude and may satisfy prescribed accuracy requirements. Intel MKL includes interval solver routines that implement algorithms of both types. For example, fast methods, such as the interval Gauss method, the interval Householder method, the Hansen-Bliek-Rohn method, and the Krawczyk iteration, are implemented in the routines ?gegas, ?gehss, ?gehbs, and ?gekws, respectively. The parameter partitioning method (PPS-method), implemented in the ?gepps routine, is an example of a sharp method. The routine ?trtrs falls under both categories owing to the very special structure of the matrices it handles.
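As an illustration of a fast enclosure method, the following sketch implements a basic Krawczyk iteration with interval quantities carried as (midpoint, radius) pairs. It is not the Intel MKL ?gekws implementation; the function name and the crude initial box are assumptions made for this example:

```python
import numpy as np

def krawczyk_enclosure(Amid, Arad, bmid, brad, iters=20):
    """Outer enclosure of the solution set of [A]x = [b] by Krawczyk
    iteration; interval quantities are (midpoint, radius) pairs."""
    C = np.linalg.inv(Amid)               # midpoint-inverse preconditioner
    xt = C @ bmid                         # approximate point solution
    # z = C*(b - A*xt): a point matrix applied to an interval vector
    zmid = C @ (bmid - Amid @ xt)
    zrad = np.abs(C) @ (brad + Arad @ np.abs(xt))
    # G = I - C*A: an interval matrix
    Gmid = np.eye(len(xt)) - C @ Amid
    Grad = np.abs(C) @ Arad
    # start from a deliberately crude box around xt and squeeze it
    mid, rad = xt.copy(), np.full_like(xt, 10.0)
    for _ in range(iters):
        d = mid - xt                      # midpoint of X - xt
        # K(X) = xt + z + G*(X - xt); the interval product is enclosed by
        # midpoint Gmid@d and radius |Gmid|@rad + Grad@(|d| + rad)
        kmid = xt + zmid + Gmid @ d
        krad = zrad + np.abs(Gmid) @ rad + Grad @ (np.abs(d) + rad)
        lo = np.maximum(mid - rad, kmid - krad)   # intersect K(X) with X
        hi = np.minimum(mid + rad, kmid + krad)
        mid, rad = (lo + hi) / 2.0, (hi - lo) / 2.0
    return mid - rad, mid + rad
```

For a well-conditioned midpoint matrix and narrow intervals, the iteration contracts quickly to a tight box that contains every point solution.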

Preconditioning
Preconditioning of the interval linear system (4) is multiplying both the matrix A and the right-hand side vector b by a point matrix, with the intention of improving the properties of the system. The system A x = b is thus replaced by the system

    (CA) x = Cb,

where C is some square point matrix. Preconditioning is widely used in classical computational linear algebra, and many interval solver algorithms (for example, the interval Gauss method, the interval Gauss-Seidel method, and some others) also require suitable preconditioning prior to their use. One of the most widely used preconditioning methods for interval linear systems is preconditioning by the inverse of the midpoint matrix, often called midpoint-inverse preconditioning. In Intel MKL, midpoint-inverse preconditioning is implemented in the routine ?gemip.

Inverting interval matrices


Given an interval square matrix A, an enclosure of the set of the inverses of all point matrices contained in A is called the inverse interval matrix A^(-1), that is, an enclosure of the set

    { A^(-1) : A ∈ A }.

In classical linear algebra, the solution of a system of linear algebraic equations Ax = b with a square nonsingular matrix A can be expressed as the product of the inverse by the right-hand side vector, x = A^(-1) b.


In interval analysis, the similar product A^(-1) b also produces an enclosure of the solution set Ξ(A, b) of the interval linear system A x = b. However, this method usually causes substantial overestimation and is not recommended; using specialized procedures for outer estimation of the solution set is preferable. Nevertheless, computing tight enclosures of inverse interval matrices is essential in sensitivity-like analysis of equation systems and similar tasks. Computing the inverse interval matrix may be carried out as finding an enclosure of the solution set of the interval matrix equation

    A Y = I,

where I is the identity matrix, by applying any method for solving interval linear systems n times, once for every column of the matrix Y. Note also that direct iterative procedures for finding the inverse interval matrix exist, such as the Schulz method (see [Herzberger94]), which is included in Intel MKL as the ?geszi routine.

References

[Intel MKL]     Intel Math Kernel Library Reference Manual, document number 630813-019, 2005.
[Rem05]         K. Remington. A NIST FORTRAN Sparse Blas User's Guide. Available at http://math.nist.gov/~KRemington/fspblas/.
[Alefeld83]     G. Alefeld and J. Herzberger. Introduction to Interval Computations. Academic Press, New York, 1983.
[Neumaier90]    A. Neumaier. Interval Methods for Systems of Equations. Cambridge University Press, Cambridge, 1990.
[Shary02]       S. P. Shary. A new technique in systems analysis under interval uncertainty and ambiguity. Reliable Computing, 2002, Vol. 8, pp. 321-419.
[Kearfott96]    R. B. Kearfott. Rigorous Global Search: Continuous Problems. Kluwer, Dordrecht, 1996.
[Kreinovich97]  V. Kreinovich, A. Lakeyev, J. Rohn, and P. Kahl. Computational Complexity and Feasibility of Data Processing and Interval Computations. Kluwer, Dordrecht, 1997.
[Herzberger94]  J. Herzberger. Iterative methods for the inclusion of the inverse of a matrix. In: Topics in Validated Computations, J. Herzberger, ed. Elsevier, Amsterdam, 1994.