
SCIENCE CHINA

Physics, Mechanics & Astronomy


• Article • March 2014 Vol.57 No.3: 477–489
doi: 10.1007/s11433-013-5203-5

Parallel computing study for the large-scale generalized eigenvalue problems in modal analysis

FAN XuanHua1*, CHEN Pu2*, WU RuiAn1 & XIAO ShiFu1
1 Institute of Systems Engineering, CAEP, Mianyang 621900, China;
2 Department of Mechanics and Aerospace Engineering, College of Engineering, Peking University, Beijing 100871, China

Received June 11, 2012; accepted May 30, 2013; published online January 7, 2014

In this paper we study the algorithms and their parallel implementation for solving large-scale generalized eigenvalue problems in modal analysis. Three predominant subspace algorithms, i.e., Krylov-Schur method, implicitly restarted Arnoldi method and Jacobi-Davidson method, are modified with some complementary techniques to make them suitable for modal analysis. Detailed descriptions of the three algorithms are given. Based on these algorithms, a parallel solution procedure is established via the PANDA framework and its associated eigensolvers. Using the solution procedure on a machine equipped with up to 4800 processors, the parallel performance of the three predominant methods is evaluated via numerical experiments with typical engineering structures, where the maximum testing scale attains twenty million degrees of freedom. The speedup curves for different cases are obtained and compared. The results show that the three methods are good for modal analysis in the scale of ten million degrees of freedom with a favorable parallel scalability.

modal analysis, parallel computing, eigenvalue problems, Krylov-Schur method, implicitly restarted Arnoldi method,
Jacobi-Davidson method
PACS number(s): 46.15.-x, 46.40.-f, 43.40.At, 02.60.-x

Citation: Fan X H, Chen P, Wu R A, et al. Parallel computing study for the large-scale generalized eigenvalue problems in modal analysis. Sci China-Phys Mech Astron, 2014, 57: 477–489, doi: 10.1007/s11433-013-5203-5

*Corresponding author (FAN XuanHua, email: somuchfan@gmail.com; CHEN Pu, email: chenpu@pku.edu.cn)
© Science China Press and Springer-Verlag Berlin Heidelberg 2014 phys.scichina.com link.springer.com

1 Introduction

Modal analysis is an important numerical tool for obtaining the dynamic properties of engineering structures. Mathematically, modal analysis is equivalent to computing a number of the lowest eigenpairs of the generalized eigenvalue problem

$Kx = \lambda Mx, \quad K, M \in \mathbb{R}^{n \times n},$ (1)

using various algorithms, where $K$ and $M$ denote the stiffness and mass matrices, respectively. Both $K$ and $M$ can be obtained by finite element discretization, and they are usually large, sparse, and symmetric positive definite.

For some large complex structures in civil engineering, aeronautics and space realms, the number of degrees of freedom of the finite element model, which corresponds to the order of $K$ and $M$, can reach ten million or more when the dynamic characteristics are to be described accurately. With millions of degrees of freedom, the computation of eq. (1) becomes difficult. Both algorithms and parallel computing techniques should therefore be considered.

Researchers have proposed and implemented a large variety of algorithms [1–9] for large-scale eigenvalue problems. All these algorithms start with the techniques of subspace iterations and projections. The general framework of these methods is to generate a sequence of subspaces $V_1, V_2, \ldots$ of small dimensions commensurate in size with the number of desired eigenvalues, project the large matrices onto these small subspaces, and then use the small projected problems to compute some of the eigenpairs of the original eigenvalue problem [1,2]. Of these algorithms, three predominant methods, i.e., the Implicitly Restarted Arnoldi (IRA) method [3,10], the Krylov-Schur (K-S) method [6] and the Jacobi-Davidson (J-D) method [5], have been regarded as the most successful and flexible ones for finding a few eigenpairs of a large, sparse matrix. IRA and K-S belong to Krylov subspace methods and J-D belongs to Davidson-based methods. For the standard Hermitian eigenvalue problems, IRA and K-S are also named Implicitly Restarted Lanczos method [11] and Krylov-Spectral method [6,12], respectively.

However, applications or comparisons of these methods on large-scale modal analysis are seldom reported in the literature. We know that Peter Arbenz and his collaborators [13] did a parallel modal analysis of an aircraft carrier using several algorithms in 2005, the order $n$ of $K$ and $M$ being 1892644 and the number of parallel CPU processors being 256. They believed that was the most challenging problem at that time.

The goal of our work is to propose a strategy of parallel modal analysis by modifying the existing, mature algorithms with some complementary techniques. These techniques include spectral transformation, solution strategy of linear systems for the three methods (IRA, K-S and J-D), as well as restarting and deflation techniques for the J-D method. The rest of the paper is organized as follows. Sect. 2 gives a brief review of the subspace algorithms, based on which three predominant algorithms (i.e., IRA, K-S and J-D) are chosen for modal analysis. Detailed algorithms with modifications for modal analysis are given. Sect. 3 discusses the parallel implementation, including pre-processing modeling, generation of $K$ and $M$, parallel process management and solution with eigensolvers. Two representative examples with different scales in modal analysis are given to compare the performance of the three algorithms in sect. 4. Finally, the paper ends with a brief conclusion in sect. 5.

2 Algorithm design for modal analysis

For large-scale eigenvalue problems, subspace-based algorithms were initially presented for standard eigenvalue problems $Ax = \lambda x$. Most of them derive from the power method and Rayleigh quotient iteration [14–16], which quickly provide good approximations to the largest eigenvalues. However, what we deal with in modal analysis is a generalized eigenvalue equation, and the eigenpairs most relevant to modal analysis are the lowest ones, which affect the dynamic behavior of engineering structures. Therefore, the key problem in algorithm design for modal analysis is to modify these algorithms so that they can solve the corresponding generalized eigenvalue equation and approximate the lowest eigenvalues efficiently.

2.1 A brief review of subspace-based methods

Subspace-based methods differ from each other in the ways the subspaces are generated. The dimension of subspaces is either fixed or variable. The classical algorithms working with fixed dimensions mainly include power method [1,2,14], Rayleigh quotient iteration [1,2,15] and subspace iteration [1,16]. Starting from a subspace $V_k$, these methods generate the next subspace $V_{k+1}$ of the same dimension by applying a linear operator $A$ on $V_k$. As $k$ increases, $V_k$ contains better approximate eigenvectors corresponding to the eigenvalues of $A$ with a larger magnitude.

A further class of subspace methods involves those whose dimensions increase as the iteration proceeds. Usually one starts with a subspace of dimension one and increases the dimension at each iteration step. These methods are in general more efficient than fixed-dimension ones and have become the mainstream for large-scale eigenvalue problems. The most popular subclass in this class is the Krylov subspace method [1,2,4], which can be traced back to the Lanczos method [17] for symmetric matrices and the Arnoldi method [18] for nonsymmetric matrices. There are many later developments of the two basic Krylov subspace methods. A significant improvement within this subclass is Sorensen's IRA/Lanczos method [10,11]. Later, Stewart [6] proposed the K-S method by expanding the Arnoldi decomposition to a general Krylov decomposition. The IRA/Lanczos method and the K-S method are mathematically equivalent and are recognized as the most successful algorithms among Krylov subspace methods.

There exists another subclass with increasing subspace dimension, but without using Krylov subspaces. A Newton iteration step or an approximate Newton iteration step can be applied to expand the subspace. The typical representatives of this subclass are Davidson-based methods [1,2,5,19]. Based on the standard Davidson method [19], some generalized Davidson methods [1,2,20–22] were presented by using different preconditioning. In 1996, Sleijpen and van der Vorst [5] presented a J-D algorithm, which sped up the development of Davidson-based methods. So far, the J-D method has been extended to various eigenvalue problems [23–26]. For details of the J-D algorithm, refer to refs. [2,27].

With the above algorithms, a number of codes were developed for the numerical solution of large-scale eigenvalue problems. Hernandez et al. [28] made a survey of freely available software tools, including a list of libraries, programs or subroutines. Among these codes, ANASAZI [29] and SLEPc [30] are recognized as two predominant tools that come closest to a robust, efficient and general purpose code. So far, both ANASAZI and SLEPc are being actively developed.

2.2 Krylov subspace methods for modal analysis

Krylov subspace methods [1,14] start from a single vector space $\mathcal{K}_1(A,v) = \mathrm{span}\{v\}$, and then expand the Krylov subspace $\mathcal{K}_k(A,v) = \mathrm{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}$ in the $k$-th iteration to $\mathcal{K}_{k+1}(A,v)$. We focus our study on IRA and K-S methods in this section. Since the implicitly restarted Lanczos method and the Krylov-Spectral method can be viewed as specialized variants of IRA and K-S, we describe the IRA and K-S algorithms in a general sense for applications in other realms. Our description starts with the basic Arnoldi method [2,10] shown in Algorithm 1.

Algorithm 1 Basic Arnoldi method

Input: matrix $A$ ($n \times n$), number of steps $m$ ($m \ll n$), and initial vector $v_1$ of norm 1.
Output: $V_m$ ($n \times m$), $H_m$ ($m \times m$ upper Hessenberg matrix), $f_m$ and $\beta$ so that $AV_m - V_mH_m = f_m e_m^T$, $\beta = \|f_m\|_2$.
1: for $j = 1, 2, \ldots, m$
2: Increase Krylov vector: $w = Av_j$.
3: Orthogonalize $w$ with respect to $V_j$ and obtain $h_{ij}$ ($i \le j$):
4: $f_j = w - h_{1j}v_1 - \cdots - h_{ij}v_i - \cdots - h_{jj}v_j$ with $h_{ij} = w^T v_i$.
5: $h_{j+1,j} = \|f_j\|_2$.
6: if $h_{j+1,j} = 0$ or $j = m$, $\beta = h_{j+1,j}$, stop.
7: $v_{j+1} = f_j / h_{j+1,j}$.
8: end for
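
The steps of Algorithm 1 can be sketched in a few lines of dense NumPy. This is only an illustration under stated assumptions: the function name `arnoldi` is hypothetical, the matrix is dense, and one extra re-orthogonalization pass is added for numerical safety, which Algorithm 1 itself does not contain.

```python
import numpy as np

def arnoldi(A, v1, m):
    """m-step Arnoldi factorization A V - V H = f e_m^T (dense sketch)."""
    n = v1.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        Vj = V[:, :j + 1]
        w = A @ V[:, j]              # step 2: new Krylov vector
        h = Vj.T @ w                 # step 3: h_ij = w^T v_i for i <= j
        f = w - Vj @ h               # step 4: remove the components along V_j
        c = Vj.T @ f                 # extra re-orthogonalization pass
        f = f - Vj @ c               # (an addition, not part of Algorithm 1)
        H[:j + 1, j] = h + c
        beta = np.linalg.norm(f)     # step 5
        if beta == 0.0 or j == m - 1:
            # step 6: breakdown or last step; return what has been built
            return V[:, :j + 1], H[:j + 1, :j + 1], f, beta
        H[j + 1, j] = beta
        V[:, j + 1] = f / beta       # step 7

# Small usage example with a random nonsymmetric matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
V, H, f, beta = arnoldi(A, rng.standard_normal(50), 10)
```

The returned quantities satisfy the relation stated in the Output line of Algorithm 1, which is a convenient check when experimenting with the restart schemes discussed next.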

When $A$ is a symmetric matrix, the orthogonalization in step 3 of Algorithm 1 becomes a three-term recurrence formula [1,18], i.e., $f_j = w - h_{j-1,j}v_{j-1} - h_{jj}v_j$, due to the symmetry. Thus the upper Hessenberg matrix $H_m$ becomes a tridiagonal one and Algorithm 1 turns into a basic Lanczos algorithm. It is desirable for $\beta$ to become small enough in Algorithm 1, because this indicates that the eigenvalues of $H_m$ are accurate approximations to the eigenvalues of $A$. However, a small $\beta$ may not appear until $m$ becomes very large, which leads to difficulties of storage and numerical orthogonality. In this case, a restarting technique should be considered. The IRA method offers an implicit restarting technique which combines the implicitly shifted QR scheme with an $m$-step Arnoldi factorization to obtain a truncated form of the Arnoldi factorization [2,10]. Implicit restarting provides a means to extract interesting information from large Krylov subspaces while avoiding the storage and numerical difficulties associated with the basic approach. The IRA method has been remarkably successful and has been implemented in the widely used ARPACK package [31].

However, for the use of IRA in modal analysis, two aspects should be emphasized [7]. One is the reduction of generalized eigenvalue problems to standard ones; the other is the fast convergence to small eigenvalues, which are of interest for modal analysis. These two aspects can be treated with a spectral transformation technique, Shift-and-Invert (S-I) or Cayley transformations [32], for example. For eq. (1), the S-I transformation leads to

$Kx = \lambda Mx \;\Rightarrow\; (K - \sigma M)^{-1}Mx = \dfrac{1}{\lambda - \sigma}\,x,$ (2)

where $\sigma$ denotes a user-selected shift. Let $A = (K - \sigma M)^{-1}M$ and $\theta = 1/(\lambda - \sigma)$, and we can theoretically transform (1) to a standard problem

$Ax = \theta x,$ (3)

which can be directly solved via IRA. Meanwhile, we can select an effective shift, for example, a little lower than the minimum eigenvalue of the original problem. Then the fast convergence to the larger eigenvalues in eq. (3) is transformed to that of the smaller ones in eq. (1), due to the inverse relation between $\theta$ and $\lambda$.

The S-I spectral transformation provides a powerful tool in the treatment of modal analysis, but brings a new problem. In each Krylov subspace iteration, the new subspace vector $w = Av_j$ (step 2 of Algorithm 1) becomes $w = (K - \sigma M)^{-1}Mv_j$, and an equivalent form is

$(K - \sigma M)w = Mv_j.$ (4)

Obtaining $w$ requires solving the linear system of eq. (4). Due to $\sigma$, the coefficient matrix $K - \sigma M$ is usually ill-conditioned or even nearly singular. Thus an iterative solution to eq. (4) becomes difficult [1,2]; instead a direct method is often recommended. Considering the symmetry of the coefficient matrix $(K - \sigma M)$, a sparse LU or a Cholesky factorization [33,34] will do the job. The implementations of these factorizations are included in many software packages, such as MUMPS [35]. Let

$K - \sigma M = LU$ (5)

represent some convenient factorization of $K - \sigma M$, and $w = Av$ can be calculated as follows:

$\hat{v} = Mv, \quad L\hat{w} = \hat{v}, \quad Uw = \hat{w}.$ (6)

Based on the above consideration, we describe the IRA method for modal analysis in Algorithm 2.
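
The shift-and-invert operator of eqs. (4)–(6) can be illustrated with SciPy's sparse LU factorization. The sketch below is not the paper's implementation (which uses MUMPS through PANDA): the toy pair `K`, `M`, the shift value, and the helper name `shift_invert_operator` are assumptions made only for this example. It factorizes $K - \sigma M$ once, applies $A$ through two triangular solves, and maps the computed $\theta$ back to $\lambda = \sigma + 1/\theta$.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh
from scipy.sparse.linalg import LinearOperator, eigs, splu

def shift_invert_operator(K, M, sigma):
    """Return A = (K - sigma*M)^{-1} M as a LinearOperator (hypothetical helper)."""
    lu = splu((K - sigma * M).tocsc())            # eq. (5): K - sigma*M = L U
    def matvec(v):
        v = np.asarray(v, dtype=float).ravel()
        return lu.solve(M @ v)                    # eq. (6): v^ = M v, then L and U solves
    n = K.shape[0]
    return LinearOperator((n, n), matvec=matvec, dtype=float)

# Toy model (assumed): tridiagonal stiffness and a lumped, diagonal mass matrix
n, k, sigma = 100, 4, -1.0e-3
K = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csc")
M = sp.diags(np.linspace(1.0, 2.0, n), 0, format="csc")

A_op = shift_invert_operator(K, M, sigma)
theta = eigs(A_op, k=k, which="LM", return_eigenvectors=False)  # largest theta
lam = np.sort(sigma + 1.0 / theta.real)                         # back-transform

# Dense reference solution, feasible only because the toy problem is tiny
lam_ref = eigh(K.toarray(), M.toarray(), eigvals_only=True)[:k]
```

The factorization is computed once and reused for every matrix-vector product, which is why its cost and storage dominate for large models.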

Algorithm 2 IRA method for modal analysis

Input: the matrices $K$ and $M$, initial vector $v_1$ of norm 1, maximum dimension $m$ of the search subspace, number of desired eigenvalues $k$ ($k < m$), a user-selected shift $\sigma$, and the convergence tolerance $\varepsilon$.
Output: the $k$ minimum eigenpairs ($\lambda_i$, $x_i$) of $Kx = \lambda Mx$.
1: do an S-I transformation of $Kx = \lambda Mx$ according to eqs. (2) and (3).
2: do a sparse LU or a Cholesky factorization on $(K - \sigma M)$ (via MUMPS [35] or others).
3: compute an $m$-step Arnoldi factorization using Algorithm 1 and step 2: $AV_m - V_mH_m = f_m e_m^T$.
4: repeat until convergence.
5: compute all eigenvalues of $H_m$: ($\theta_i$: $i = 1, 2, \ldots, m$).
6: if $k$ or more eigenvalues satisfy the convergence tolerance $\varepsilon$, then
7: compute eigenvectors $y_i$ of $H_m$ corresponding to the converged eigenvalues.
8: $\lambda_i = 1/\theta_i + \sigma$, $x_i = V_m y_i$, break.
9: end if.
10: sort $\theta_i$ into $k$ wanted eigenvalues and $p$ ($p = m - k$) unwanted eigenvalues.
11: do a $p$-step shifted QR iteration with the $p$ unwanted eigenvalues (denoted as $\mu_1, \mu_2, \ldots, \mu_p$):
12: $Q = I_m$.
13: for $j = 1, 2, \ldots, p$
14: QR factorize $Q_jR_j = H_m - \mu_j I$.
15: $H_m := Q_j^* H_m Q_j$, $Q := QQ_j$.
16: end for.
17: deflate the $m$-step Arnoldi factorization to a $k$-step one:
18: right-multiply $AV_m - V_mH_m = f_m e_m^T$ of step 3 by $Q$ and take the first $k$ columns;
19: $V_k = V_mQ(:,1{:}k)$; $H_k = H_m(1{:}k,1{:}k)$; $\beta_k = H_m(k+1,k)$; $\sigma_k = Q(m,k)$; $f_k = v_{k+1}\beta_k + f_m\sigma_k$.
20: restart, beginning with the $k$-step Arnoldi factorization $AV_k - V_kH_k = f_k e_k^T$, apply $p$ additional steps of the Arnoldi procedure to obtain a new $m$-step factorization $AV_m - V_mH_m = f_m e_m^T$.
21: end repeat.
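
For small experiments, the workflow of Algorithm 2 can be tried through SciPy's `eigsh`, which wraps ARPACK in shift-invert mode. This is a sketch under assumptions, not the paper's code: the toy matrices are invented for the example, and for a symmetric pair `eigsh` uses ARPACK's implicitly restarted Lanczos driver in the $M$-inner product rather than the general Arnoldi form written in Algorithm 2. Roughly, `k` plays the role of $k$, `ncv` of $m$, `sigma` of $\sigma$ and `tol` of $\varepsilon$.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh
from scipy.sparse.linalg import eigsh

# Toy model (assumed): tridiagonal stiffness, lumped diagonal mass
n, k, sigma = 300, 6, 0.0
K = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csc")
M = sp.diags(np.linspace(1.0, 2.0, n), 0, format="csc")

# With sigma given, eigsh factorizes K - sigma*M internally and iterates on
# (K - sigma*M)^{-1} M; which="LM" then selects the eigenvalues nearest sigma.
lam, X = eigsh(K, k=k, M=M, sigma=sigma, which="LM", ncv=24)
order = np.argsort(lam)
lam, X = lam[order], X[:, order]

# Residual of the original problem, K x - lambda M x, for each computed pair
residual = np.linalg.norm(K @ X - (M @ X) * lam, axis=0)

# Dense reference, only possible for this small example
lam_ref = eigh(K.toarray(), M.toarray(), eigvals_only=True)[:k]
```

The eigenvalues returned by `eigsh` are already transformed back to $\lambda$, so the conversion in step 8 of Algorithm 2 is handled inside the library.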

The IRA algorithm is one of the most successful and flexible methods for finding a few eigenpairs of a large matrix. However, the method needs to preserve the structure of the Arnoldi decomposition, i.e., an upper Hessenberg matrix $H_m$ and a unit vector $e_m$, which restricts the range of transformations that can be performed on the decomposition. For this reason, Stewart [6] makes a minor improvement on the IRA, which expands the Arnoldi decomposition $AV_m - V_mH_m = f_m e_m^T$ to a general Krylov decomposition $AU_m - U_mB_m = u_{m+1}b_{m+1}^T$, where $B_m$ ($m \times m$) is a general matrix instead of the upper Hessenberg matrix $H_m$ of the Arnoldi decomposition, $b_{m+1}$ is a general vector instead of a unit one, and the columns of $(U_m, u_{m+1})$ are independent and called the basis of the Krylov decomposition. Using this Krylov decomposition, Stewart proposed the K-S method [6] via a Schur decomposition of $B_m$. For modal analysis, the K-S method is treated in the same way as the IRA, i.e., eqs. (2)–(5). Since the two methods are similar and mathematically equivalent, we easily get the K-S method for modal analysis as Algorithm 3.

Algorithm 3 The K-S method for modal analysis

Input: the matrices $K$ and $M$, initial vector $v_1$ of norm 1, maximum dimension $m$ of the search subspace, number of desired eigenvalues $k$ ($k < m$), a user-selected shift $\sigma$, and the convergence tolerance $\varepsilon$.
Output: the $k$ minimum eigenpairs ($\lambda_i$, $x_i$) of $Kx = \lambda Mx$.
1: do an S-I transformation of $Kx = \lambda Mx$ according to eqs. (2) and (3).
2: do a sparse LU or a Cholesky factorization on $(K - \sigma M)$ (via MUMPS [35] or others).
3: get an initial Krylov factorization $AU_m - U_mB_m = u_{m+1}b_{m+1}^T$ via an $m$-step Arnoldi factorization (here, $U_m = V_m$, $B_m = H_m$, $u_{m+1} = v_{m+1}$, $b_{m+1}^T = \|f_m\| e_m^T$, corresponding to Algorithm 1).
4: repeat until convergence.
5: apply orthogonal transformations (Schur decomposition) on $B_m$: $B_mQ_1 = Q_1T_m$, where $T_m$ is of (quasi-)triangular form.
6: get a K-S factorization $AU_m - U_mT_m = u_{m+1}b_{m+1}^T$ with updated variables $U_m \leftarrow U_mQ_1$, $b_{m+1}^T \leftarrow b_{m+1}^TQ_1$, via right-multiplying the Krylov factorization $AU_m - U_mB_m = u_{m+1}b_{m+1}^T$ by $Q_1$.
7: reorder the diagonal blocks of the K-S factorization with another orthogonal transformation $Q_2$ (see ref. [36] for details): $U_m \leftarrow U_mQ_2$, $T_m \leftarrow Q_2^TT_mQ_2$, $b_{m+1}^T \leftarrow b_{m+1}^TQ_2$.
8: compute all eigenpairs of $T_m$, $T_my_i = \theta_iy_i$.
9: if $k$ or more eigenvalues satisfy the convergence tolerance $\varepsilon$, then
10: $\lambda_i = 1/\theta_i + \sigma$, $x_i = U_my_i$, break.
11: end if
12: deflate to a K-S decomposition of order $p$ ($k < p < m$), where the already converged vectors are locked: $AU_p - U_pT_p = u_{p+1}b_{p+1}^T$ with $U_p = U_m(:,1{:}p)$, $T_p = T_m(1{:}p,1{:}p)$, $u_{p+1} = u_{m+1}$ and $b_{p+1} = b_{m+1}(1{:}p)$.
13: extend again to a Krylov decomposition of order $m$: $AU_m - U_mB_m = u_{m+1}b_{m+1}^T$, via $m - p$ additional steps of the Arnoldi procedure with initial $u_{p+1}$.
14: end repeat.
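
Steps 5–7 and the truncation in step 12 of Algorithm 3 rest on a sorted Schur decomposition. The following sketch shows that building block with `scipy.linalg.schur`. The test matrix, the cut-off value and the use of the complex Schur form (instead of the real quasi-triangular form named in step 5) are assumptions chosen to keep the example short; it is not the reordering routine of ref. [36].

```python
import numpy as np
from scipy.linalg import schur

# Hypothetical projected matrix B_m with known eigenvalues 1, 2, ..., 8
rng = np.random.default_rng(1)
m, cut = 8, 5.5
S = rng.standard_normal((m, m)) + m * np.eye(m)
B = S @ np.diag(np.arange(1.0, m + 1)) @ np.linalg.inv(S)

# Sorted Schur form B Q = Q T: eigenvalues for which the callback returns
# True (here the "wanted" largest ones) are moved to the leading block of T.
T, Q, p = schur(B, output="complex", sort=lambda ev: ev.real > cut)

# Truncation as in step 12: keep the leading p-by-p block and p Schur vectors.
T_p = T[:p, :p]
Q_p = Q[:, :p]
wanted = np.sort(np.diag(T_p).real)
```

Because $T$ is triangular, the leading $p$ Schur vectors span an invariant subspace of $B$, so the truncated relation `B @ Q_p = Q_p @ T_p` holds without any further correction. This is the property that makes the K-S restart simpler than the implicit QR sweep of Algorithm 2.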

Two aspects should be emphasized for Algorithms 2 and 3. Firstly, the matrix $A = (K - \sigma M)^{-1}M$ is not explicitly computed throughout the implementations. The expansion of the Krylov subspace vector $Av_j$ is accomplished by the solution of a linear system (eq. (4)). Secondly, although both $K$ and $M$ are symmetric for modal analysis, the matrix $A = (K - \sigma M)^{-1}M$ becomes a non-symmetric one; thus Algorithm 2 cannot be simply equivalent to the Implicitly Restarted Lanczos method [11], and Algorithm 3 cannot be equivalent to the thick-restart Lanczos method (also named Krylov-Spectral method) [12].

The major problem of Algorithms 2 and 3, which often becomes a bottleneck, is to find a convenient factorization for eq. (4) so that the associated linear systems of equations can be solved efficiently. The factorization for eq. (4) takes a large percentage of the cost in terms of computing time and storage in the two algorithms. Also, an effective shift selection is important, which depends on the user's preferences and on knowledge of the underlying generalized eigenvalue problem.

2.3 J-D method for modal analysis

The J-D method [2,5,23] belongs to the Davidson-based algorithms and is an effective complement to the Krylov subspace algorithms for solving large, sparse eigenvalue problems. The kernel of the J-D method is to solve a so-called correction equation, which becomes [2]

$(I - Mu_ku_k^*)(K - \theta_kM)(I - u_ku_k^*M)\,t = -r_k \quad \text{with } t \perp Mu_k$ (7)

for the generalized eigenvalue problem eq. (1). Here, $\theta_k$ and $u_k$ are the Ritz value and Ritz vector, respectively, $t$ is an orthogonal component vector to be solved, and $r_k$ is the residual vector

$r_k = (K - \theta_kM)u_k.$ (8)

In order to work with orthogonal operators in eq. (7), an $M$-orthogonal basis $v_1, v_2, \ldots, v_m$ for the search subspace is adopted, i.e., making $v_i^*Mv_j$ equal to one for $i = j$ and equal to zero for $i \ne j$, which results in

$V_m^*MV_m = I_m.$ (9)

For our modal analysis, several aspects should be considered. Firstly, a fast iterative solution to eq. (7) with proper preconditioning is needed for outer subspace iterations. Secondly, the dimensions of the search subspace should not be too large due to the time and storage cost. A restart and deflation strategy similar to the Krylov subspace methods should be taken into consideration. Thirdly, a spectral transformation technique is required to get a number of smallest eigenpairs for modal analysis since only the largest one is computed in the standard J-D method [5]. Based on the above three considerations, we modify the standard J-D method as follows.

The iterative solution to the correction equation. In computing a number of eigenpairs in the J-D method, it is necessary to remove the already converged ones from the iterative subspaces in time. For $k$ already converged eigenpairs, we construct a Schur form of the generalized eq. (1) via an $M$-orthogonal technique [1,2], that is

$KQ_k = Z_kD_k,$ (10)

where $D_k$ is a diagonal matrix with the $k$ computed eigenvalues on its diagonal, $Z_k = MQ_k$, and $Q_k$ is an $M$-orthogonal matrix consisting of the $k$ computed eigenvectors of the generalized eigenvalue problem (1). Thus, the correction eq. (7) is written as

$(I - \tilde{Z}\tilde{Q}^*)(K - \theta_kM)(I - \tilde{Q}\tilde{Z}^*)\,t = -r_k,$ (11)

where $\tilde{Q} = [Q_k, u_k]$ and $\tilde{Z} = M\tilde{Q}$, consistent with eq. (10) and with step 15 of Algorithm 4. To improve the convergence speed of the iterative solver, we can choose a preconditioner of the form

$\tilde{P} = (I - \tilde{Z}\tilde{Q}^*)\,P\,(I - \tilde{Q}\tilde{Z}^*)$ (12)

for eq. (11). $P$ is a preconditioner for $K - \theta_kM$, which approximates $K - \theta_kM$ and is cheap to "invert". Applying $\tilde{P}^{-1}$ to eq. (11), we get

$\tilde{P}^{-1}\tilde{A}\,t = -\tilde{P}^{-1}r_k,$ (13)

with $\tilde{A} = (I - \tilde{Z}\tilde{Q}^*)(K - \theta_kM)(I - \tilde{Q}\tilde{Z}^*)$. The solution of eq. (13) is transformed to some easily solvable linear equations [2,37], which are described as follows.

For the right-hand side of eq. (13), denoting $\tilde{r}_k = \tilde{P}^{-1}r_k$, we can obtain an equivalent form

$\tilde{P}\tilde{r}_k = r_k.$ (14)

Inserting eq. (12) into eq. (14), and using the orthogonality conditions $\tilde{Z}^*\tilde{r}_k = 0$, $\tilde{Z}^*r_k = 0$, we get

$\tilde{r}_k = P^{-1}r_k - P^{-1}\tilde{Z}\,(\tilde{Z}^*P^{-1}\tilde{Z})^{-1}\,\tilde{Z}^*P^{-1}r_k,$ (15)

from which we easily compute $\tilde{r}_k$ since $\tilde{Z}$ is a matrix of size $n \times (k+1)$ and $\tilde{Z}^*P^{-1}\tilde{Z}$ is a matrix of size $(k+1) \times (k+1)$, with $k \ll n$ for our modal analysis. When $\tilde{r}_k$ is obtained from eq. (15), eq. (13) becomes

$\tilde{P}^{-1}\tilde{A}\,t = -\tilde{r}_k,$ (16)

which can be solved by a Krylov subspace method (such as CG or MINRES [1,2]). The solving process of eq. (16) can be described by $\tilde{P}^{-1}\tilde{A}v_i = \hat{r}_i$ with vectors $v_i$ and $\hat{r}_i$ for the iteration process. In order to carry out the iterative process in the subspace orthogonal to $u_k$, we choose the start $t_0 = 0$ (i.e., $v_0 = -\tilde{r}_k - \tilde{P}^{-1}\tilde{A}t_0 = -\tilde{r}_k$) for $\tilde{P}^{-1}\tilde{A}v_i = \hat{r}_i$; then the iterative vectors $v_i$ and $\hat{r}_i$ will satisfy the orthogonality conditions $\tilde{Z}^*v_i = 0$, $\tilde{Z}^*\hat{r}_i = 0$. Using these two orthogonality relations, we insert eq. (12) and the expression of $\tilde{A}$ into $\tilde{P}^{-1}\tilde{A}v_i = \hat{r}_i$ and finally derive a formula with the same form as eq. (15), that is

$\hat{r}_i = P^{-1}z - P^{-1}\tilde{Z}\,(\tilde{Z}^*P^{-1}\tilde{Z})^{-1}\,\tilde{Z}^*P^{-1}z, \quad \text{with } z = (K - \theta_kM)v_i.$ (17)

Then, we can compute $\hat{r}_i$ from eq. (17) in the same way as eq. (15). The iteration stops when $\hat{r}_i$ approximates $-\tilde{r}_k$, and then $t = v_i$. Usually, the process of solving the correction eq. (13) is called "inner iteration", and the process of expanding the subspace is called "outer iteration". For each step of outer iteration, inner iterations are required. In practical implementations, there exists a balance between the inner iteration and the outer iteration. Generally, increasing the number of inner iterations causes a decrease of outer iterations, but the total computing time may increase due to the computational cost of the inner iterations. Therefore, the number of inner iterations is often restricted to a fixed value (such as 30 steps), with an "inexact" $t$ for the outer iteration.

Restart and deflation techniques. With the increasing dimension $m$ of the search subspace, the computing time and storage increase quickly. To overcome this problem, we consider a restart strategy similar to the Krylov subspace methods. A good strategy as shown in ref. [2] is to restart with the subspace spanned by the Ritz vectors of a small number of the Ritz values closest to a specified target value. We restart as soon as the dimension of the search space for the current eigenvector exceeds the maximum. Before restarting, some Ritz values may have been close enough to the wanted eigenvalues, and the remaining part of the current subspace still has rich components in nearby eigenpairs. We can use this information as the basis of a subspace for computing the next eigenvector. In order to avoid the already converged eigenvectors reentering the computational process, we adopt a deflation technique [2], which makes the new search vectors in the J-D algorithm explicitly orthogonal to the converged eigenvectors.

In this way, the standard J-D algorithm is modified by adding a maximum subspace dimension $m_{\max}$ for expanding subspaces and a minimum one $m_{\min}$ for restarting. Furthermore, the basis vectors for restarting contain much information for the wanted eigenpairs inherited from the previous subspace expansions.

Spectral transformation. Theoretically, the J-D algorithm will converge to the largest eigenvalues when the chosen target $\tau$ is larger than $\lambda_{\max}$, and to the smallest eigenvalues if the chosen $\tau$ is smaller than $\lambda_{\min}$. But the Rayleigh-Ritz procedure in the J-D method will always lead to the computed Ritz values converging to the largest part of the spectrum. A small target value $\tau$ and a large number of iterative corrections via eq. (11) may change the situation, but this is too costly, and it may even happen that no smallest eigenvalue is found.

Therefore, a spectral transformation is needed to change the smallest eigenvalue problems for modal analysis to the largest ones. Similarly, we do this by an S-I spectral transformation

$Kx = \lambda Mx \;\Rightarrow\; Mx = \dfrac{1}{\lambda - \sigma}\,(K - \sigma M)x,$ (18)

and substitute $K$ and $M$ in the J-D method with $M$ and $K - \sigma M$, respectively. In most cases where no rigid body modes exist (i.e., $\lambda > 0$), we let $\sigma = 0$ and the S-I transformation becomes the interchange between $M$ and $K$ in the implementation of the J-D method. Correspondingly, the eigenvalues computed in the J-D method become the reciprocals of the original ones. Thus, the target value $\tau$ can be chosen a little larger than the reciprocal of the smallest eigenvalue of eq. (1).

Algorithm design. Based on the above analysis and modifications, the J-D method for modal analysis is described as Algorithm 4.
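
The projected preconditioner solve of eq. (15) needs only one application of $P^{-1}$ to a vector, one to the thin matrix $\tilde{Z}$, and a small dense solve. The sketch below illustrates this under the assumption that $P$ is diagonal (a Jacobi-type preconditioner, chosen here only for simplicity; the paper does not prescribe it). The function name and the random test data are hypothetical.

```python
import numpy as np

def projected_precond_solve(p_diag, Z, r):
    """Eq. (15) for a diagonal P: P^-1 r - P^-1 Z (Z* P^-1 Z)^-1 Z* P^-1 r."""
    Pinv_r = r / p_diag                      # P^{-1} r
    Pinv_Z = Z / p_diag[:, None]             # P^{-1} Z~, an n x (k+1) matrix
    small = Z.conj().T @ Pinv_Z              # (k+1) x (k+1) matrix Z~* P^{-1} Z~
    alpha = np.linalg.solve(small, Z.conj().T @ Pinv_r)
    return Pinv_r - Pinv_Z @ alpha

# Hypothetical data: n = 200 unknowns, k + 1 = 4 columns in Z~
rng = np.random.default_rng(2)
n, cols = 200, 4
p_diag = 1.0 + rng.random(n)                 # positive diagonal of the assumed P
Z = rng.standard_normal((n, cols))
r = rng.standard_normal(n)
r_tilde = projected_precond_solve(p_diag, Z, r)
```

By construction the result is orthogonal to the columns of $\tilde{Z}$, and $P\tilde{r}_k - r_k$ lies in the range of $\tilde{Z}$. In an actual inner iteration the same routine would be reused with $z = (K - \theta_kM)v_i$ in place of $r_k$, as in eq. (17), and $P^{-1}\tilde{Z}$ would be computed once per outer step.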

Algorithm 4 The J-D method for modal analysis

Input: the matrices $K$ and $M$, initial vector $v_0$ of norm 1, maximum subspace dimension $m_{\max}$, minimum subspace dimension $m_{\min}$, number of desired eigenvalues $k_{\max}$, a user-selected target $\tau$, and the convergence tolerance $\varepsilon$.
Output: the $k_{\max}$ minimum eigenpairs ($\lambda_i$, $x_i$) of $Kx = \lambda Mx$.
1: do a spectral transformation: $Mx = \theta Kx$ with $\theta = 1/\lambda$.
2: initialization: $t = v_0$, $k = 0$, $m = 0$, $Q = [\,]$, $V = [\,]$, $V^M = [\,]$, $V^K = [\,]$, $Z = [\,]$ (where $k$ denotes the number of converged eigenpairs, $m$ denotes the dimension of the outer iteration subspace, $Q$ contains the converged eigenvectors, $V$ contains the basis vectors of the iterative subspace, $V^M = MV$, $V^K = KV$, and $Z = KQ$).
3: while $k < k_{\max}$.
4: $K$-orthogonalize $t$ with respect to the current search subspace:
for $i = 1, 2, \ldots, m$ { $t = t - (v_i^{K*}t)\,v_i$; } end for.
5: normalize $t$ with the $K$-inner product and $K$-norm: $m = m + 1$; $v_m = t/\sqrt{t^*Kt}$; $v_m^M = Mv_m$; $v_m^K = Kv_m$.
6: expand the search subspace and the projected matrix $H$: $V = [V, v_m]$; $V^M = [V^M, v_m^M]$; $V^K = [V^K, v_m^K]$; for $i = 1, 2, \ldots, m$ { $H_{i,m} = v_i^*v_m^M$; } end for.
7: compute all eigenpairs ($\theta_i$, $s_i$) of $H$ ($\|s_i\|_2 = 1$) in sorted order: $|\theta_i| \ge |\theta_{i+1}|$.
8: compute the Ritz vector and the residual vector closest to $\tau$: $u = Vs_1$; $u^M = V^Ms_1$; $u^K = V^Ks_1$; $r = Mu - \theta_1Ku = u^M - \theta_1u^K$.
9: while $\|r\|_2 \le \varepsilon$.
10: accept a converged eigenpair: $\lambda_{k+1} = 1/\theta_1$; $x_{k+1} = u$; $Q = [Q, u]$; $Z = [Z, u^K]$; $k = k + 1$.
11: if $k = k_{\max}$, exit.
12: continue the search for the next eigenpair with the remaining Ritz vectors as a basis for the initial search space:
$m = m - 1$; $H = 0$;
for $i = 1, 2, \ldots, m$
{ $v_i = Vs_{i+1}$; $v_i^M = V^Ms_{i+1}$; $v_i^K = V^Ks_{i+1}$; $H_{i,i} = \theta_{i+1}$; $s_i = e_i$; $\theta_i = \theta_{i+1}$; } end for.
$u = v_1$; $r = v_1^M - \theta_1v_1^K$;
13: end while.
14: if $m > m_{\max}$, restart: $H = 0$; $m = m_{\min}$; for $i = 1, 2, \ldots, m$ { $v_i = Vs_i$; $v_i^M = V^Ms_i$; $v_i^K = V^Ks_i$; $H_{i,i} = \theta_i$; } end for.
end if.
15: solve $t$ from the correction equation $(I - \tilde{Z}\tilde{Q}^*)(M - \theta K)(I - \tilde{Q}\tilde{Z}^*)\,t = -r$ with $\theta = \theta_1$, $\tilde{Q} = [Q, u]$, $\tilde{Z} = [Z, u^K]$.
16: end while.
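
A much reduced version of the outer loop of Algorithm 4 can be written for a single eigenpair. The sketch below is a simplification under explicit assumptions, not the parallel implementation discussed in the paper: it uses dense matrices, omits restart and deflation (steps 10–14), and solves the correction equation exactly through a bordered linear system instead of a preconditioned inner iteration. It also adds a safeguard that is not in Algorithm 4: while the relative residual is large, the fixed target $\tau$ replaces $\theta$ in the correction equation. The function name, the `track` threshold and the toy model are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def jd_smallest(K, M, v0, tau, tol=1e-8, track=1e-2):
    """Smallest eigenpair of K x = lambda M x via J-D on M x = theta K x."""
    n = K.shape[0]
    V = np.zeros((n, 0))
    t = np.array(v0, dtype=float)
    for _ in range(n):
        for _pass in range(2):                   # step 4, applied twice for stability
            t = t - V @ (V.T @ (K @ t))
        t = t / np.sqrt(t @ (K @ t))             # step 5: K-norm normalization
        V = np.column_stack([V, t])              # step 6: expand the subspace
        H = V.T @ (M @ V)                        # projected matrix
        theta_all, S = eigh(H)                   # step 7 (ascending order)
        theta, u = theta_all[-1], V @ S[:, -1]   # step 8: largest Ritz pair
        uK = K @ u
        r = M @ u - theta * uK
        if np.linalg.norm(r) <= tol:             # step 9: stopping test
            break
        # Assumed safeguard: use tau as shift while the Ritz pair is still poor
        far = np.linalg.norm(r) > track * abs(theta) * np.linalg.norm(uK)
        shift = tau if far else theta
        # Step 15 solved exactly: (M - shift*K) t + a*K u = -r with (K u)^T t = 0
        A = np.block([[M - shift * K, uK[:, None]],
                      [uK[None, :], np.zeros((1, 1))]])
        t = np.linalg.solve(A, np.concatenate([-r, [0.0]]))[:n]
    return 1.0 / theta, u

# Toy model (assumed): tridiagonal stiffness, lumped diagonal mass
n = 60
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.diag(np.linspace(1.0, 2.0, n))
# Target above the largest theta: theta_max <= max(M_ii) / lambda_min(K)
tau = 1.1 * M.max() / (2.0 - 2.0 * np.cos(np.pi / (n + 1)))
lam, x = jd_smallest(K, M, np.random.default_rng(3).standard_normal(n), tau)
```

The bordered system enforces $t \perp Ku$ directly, which for a single Ritz vector gives the same $t$ as the projected form in step 15. For large sparse problems this dense solve would be replaced by the preconditioned inner iteration described in this section.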

From Algorithms 2–4, we find some differences between the Krylov subspace methods and the J-D method. In Algorithms 2 and 3 for the IRA and K-S methods, the solution of eigenpairs is carried out by an outer subspace iteration and a direct matrix decomposition for the inner linear systems, and the wanted eigenpairs are extracted simultaneously from the corresponding projected matrix; while the solution of eigenpairs in the J-D method (Algorithm 4) is carried out by an inner-outer iteration, and the wanted eigenpairs are extracted one by one.

The stopping criterion for the IRA, K-S and J-D methods is to accept an eigenvector approximation as soon as the norm of the residual (for the normalized eigenvector approximation) is below a given $\varepsilon$. The residual vector (or residual error) is defined by

$r_i = Kx_i - \lambda_iMx_i,$ (19)

where ($\lambda_i$, $x_i$) is a computed eigenpair. A small residual error implies good accuracy in the computed $\lambda_i$ and $x_i$ [1,2]. Also, we adopt a mode error (or relative error) defined by Bathe [38] to give a further evaluation of the convergence, that is

$\varepsilon_{\mathrm{mod}} = \|Kx_i - \lambda_iMx_i\|_2 \,/\, \|\lambda_iMx_i\|_2.$ (20)

For more information about convergence properties of these methods, refer to refs. [1,2,22,38].

3 Implementation of parallel computing

The algorithms mentioned in sect. 2 are the kernel to realize the modal analysis of large engineering structures. Besides, many other auxiliary processes are necessary to transform the engineering problems into mathematical ones.

These processes include the finite element modeling of engineering structures, the discretization of the finite element models (FEM) to generate K and M, and the calling of the algorithms.

The finite element modeling of a large structure can be carried out with commercial pre-processing software such as MSC.Patran. At present, MSC.Patran is able to build a FEM with over ten million degrees of freedom on a common workstation. We realized the parallel modal analysis via the PANDA framework [39,40], developed by our parallel computing team at the China Academy of Engineering Physics. The PANDA framework is a finite element simulation code that runs on parallel platforms and includes interfaces to third-party libraries; parallel communication in PANDA is based on MPI.

In MSC.Patran, we build the FEM of a structure and generate element and node groups according to different material properties and boundary conditions. We then export the element and node information to a neutral file provided by MSC.Patran. Using the interface between PANDA and MSC.Patran, the neutral file is translated into a finite element grid file that PANDA can read. K and M for the FEM are generated in parallel in PANDA. Integrated with Metis [41], the PANDA framework provides a domain decomposition function, which partitions the FEM into a number of small subdomains. The information of each subdomain is delivered to a different processor for finite element discretization and numerical integration, until the matrices K and M are formed. At this point, the preparations for the parallel modal analysis are complete.

Two other important tools for parallel modal analysis are SLEPc [30] and ARPACK [31]. SLEPc provides a collection of eigensolvers on top of PETSc [42], including K-S, J-D, Arnoldi/Lanczos and some other subspace methods. The basic parallel operations in SLEPc are vector operations, the matrix-vector product and linear equation solvers, which are supplied by PETSc objects. ARPACK is a free software package developed specifically for the implementation of IRA methods.

Both SLEPc and ARPACK employ the MPI standard for message-passing communication and have been integrated into the PANDA framework. In addition, the MUMPS [35] software for direct sparse matrix factorization has also been integrated into the PANDA framework.

The parallel computing flow of modal analysis for large engineering structures is then as follows.
(1) Pre-processing modeling of the engineering structures. In this step, the engineering structures are approximated by a large number of finite elements. The elements are divided into groups according to their properties and exported as a neutral file.
(2) Model translating and assembling of eigenvalue equations. In this step, the PANDA framework and METIS are employed to form K and M in a parallel manner.
(3) Parallel computing of the generalized eigenvalue problems Kx = λMx. In this step, the three methods shown in Algorithms 2–4 are implemented via SLEPc (PETSc), ARPACK and some other external packages such as MUMPS. The PANDA framework is responsible for calling and managing these software packages.

4 Numerical experiments

In this section, numerical experiments on two representative examples in modal analysis are given to compare the parallel performance and efficiency of the three algorithms in sect. 3. The two examples are the FEMs of a typical aircraft and a vibration table. The structure of the aircraft is mainly composed of thin plates and shells, and the vibration table is composed of solid structures as well as shell ones. The experiments were run on two high performance computers with a Linux platform.

4.1 The aircraft example

The FEM of the aircraft example is shown in Figure 1. The model is meshed with 8-node hexahedral elements, and the node displacements at the bottom of the model are fixed as a boundary constraint. We test this example with a computing scale of n = 710526, where the corresponding stiffness matrix K has 52078554 non-zero entries. The rough non-zero distribution of K is shown in Figure 2. M is taken as a lumped mass matrix (i.e., a diagonal matrix).

Figure 1 (Color online) The FEM of an aircraft.

We carried out the parallel modal analysis on a Dawning 5000A cluster, a machine consisting of 16 blade computing nodes, each with eight Intel Xeon E5550 2.66 GHz processors and 16 GB of shared memory. The nodes are connected by a gigabit network. The parallel tests are executed from 4 processors, until
the maximum 128 processors. The 20 smallest eigenpairs are desired for the three methods. The maximum subspace dimension for the three methods is set to 40 (double the number of desired eigenpairs), and the convergence tolerance for the three methods is set to 1×10⁻⁴, with the iteration stopped as soon as the norm of the residual vector (eq. (19)) is below this tolerance. For the IRA and K-S methods, the SI was selected for the spectral transformation with a shift value of 0, and a Cholesky factorization was used for the solution of the linear equations in each step of subspace iteration. For the J-D method, the minimum subspace dimension for restarting is set to 8, and a Jacobi preconditioner and a qcg solver within PETSc are responsible for the solution of the correction equation, where the maximum number of inner iteration steps is limited to 30, and the target value is set to 1×10⁻⁶ (a little larger than the reciprocal of the minimum eigenvalue λ1).

Figure 2 (Color online) The non-zero distributions of K.

The computed results of the 20 modal frequencies are almost the same for the three methods, and the largest relative error among them is below 0.2%. For brevity, Table 1 lists only the ten lowest modal frequencies for comparison. These results reflect the accuracy of the implementation of the three methods to a certain extent.

To describe the precision of the three methods, the mode errors of the 20 lowest modal frequencies are computed via eq. (20) and given in Figure 3. For the same stopping criterion, the mode errors for the IRA and K-S methods are in the range of 1×10⁻¹²–1×10⁻¹⁰, while those for the J-D method are in the range of 1×10⁻⁵–1×10⁻⁴. In other words, the approximate solution of the correction equation in Algorithm 4 decreases the precision of the eigenpairs in contrast to the exact direct matrix factorization in IRA and K-S. On the whole, the precisions in Figure 3 are acceptable for modal analysis.

Figure 3 (Color online) The mode error for each mode.

The computing time and efficiency for different numbers of processors are shown in Table 2, from which we can see that the IRA and K-S methods have a clear advantage in computing time when a few processors are used, but this advantage disappears as the number of processors increases because of their relatively poor parallel efficiency. In contrast, the J-D method has excellent parallel scalability, which results in a shorter computing time than the IRA and K-S methods at 128 processors.

Table 1  Computing results of the ten lowest modal frequencies (Hz) for the three methods

Order of mode   1      2      3      4      5      6      7      8       9       10
IRA             188.5  188.5  691.5  691.5  918.7  968.4  968.4  1064.5  1064.5  1148.6
K-S             188.5  188.5  691.5  691.5  918.7  968.4  968.4  1064.5  1064.5  1148.6
J-D             188.5  188.7  691.5  691.5  918.5  968.2  968.2  1062.4  1064.9  1147.7
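The frequencies in Table 1 can be related to the eigenvalues of Kx = λMx and to the J-D target value with a short calculation. The sketch below assumes the standard undamped relation λ = (2πf)², with f in Hz and λ in rad²/s²; the unit system of the model is not stated in this section, so that is an assumption. The function names are illustrative, and the frequency lists are copied from Table 1.

```python
import math

def eigenvalue_from_frequency(f_hz):
    """Eigenvalue of K x = lam M x for a natural frequency in Hz (undamped case)."""
    return (2.0 * math.pi * f_hz) ** 2

def frequency_from_eigenvalue(lam):
    """Inverse relation: natural frequency in Hz from an eigenvalue."""
    return math.sqrt(lam) / (2.0 * math.pi)

def max_relative_difference(reference, other):
    """Largest relative deviation of `other` from `reference`, entry by entry."""
    return max(abs(a - b) / a for a, b in zip(reference, other))

# Ten lowest frequencies (Hz) from Table 1.
ira = [188.5, 188.5, 691.5, 691.5, 918.7, 968.4, 968.4, 1064.5, 1064.5, 1148.6]
jd = [188.5, 188.7, 691.5, 691.5, 918.5, 968.2, 968.2, 1062.4, 1064.9, 1147.7]

lam1 = eigenvalue_from_frequency(ira[0])
print(f"lambda_1   = {lam1:.4e}")
print(f"1/lambda_1 = {1.0 / lam1:.4e}")   # compare with the target value 1e-6
print(f"max relative difference J-D vs IRA = {max_relative_difference(ira, jd):.3%}")
```

Under these assumptions the reciprocal of the smallest eigenvalue is slightly below 1×10⁻⁶, and the largest J-D deviation among the ten listed modes stays under 0.2%, which is in line with the statements in the text.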

Table 2  Parallel performance of the modal analysis (n = 710526) on Dawning 5000A

Processors            4     8     16    32    64    96    128
K-S  time (s)         440   365   293   252   213   258   289
K-S  efficiency (%)   100   60.3  37.5  21.8  12.9  7.1   4.8
IRA  time (s)         433   351   278   267   229   262   297
IRA  efficiency (%)   100   61.7  38.9  20.3  11.8  6.9   4.6
J-D  time (s)         5622  2883  1436  796   400   290   249
J-D  efficiency (%)   100   97.5  97.8  88.3  87.8  80.7  70.5
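The efficiency rows of Table 2 appear to follow the usual definition of parallel efficiency relative to the smallest run, E(p) = (p0 · T(p0)) / (p · T(p)), with p0 = 4 here. The following is a minimal sketch under that assumption; the function name is illustrative, and the timings are copied from Table 2.

```python
def parallel_efficiency(procs, times):
    """Efficiency in percent relative to the first (baseline) run."""
    p0, t0 = procs[0], times[0]
    return [100.0 * (p0 * t0) / (p * t) for p, t in zip(procs, times)]

procs = [4, 8, 16, 32, 64, 96, 128]
ks_times = [440, 365, 293, 252, 213, 258, 289]       # K-S, seconds
jd_times = [5622, 2883, 1436, 796, 400, 290, 249]    # J-D, seconds

ks_eff = parallel_efficiency(procs, ks_times)
jd_eff = parallel_efficiency(procs, jd_times)

for p, e_ks, e_jd in zip(procs, ks_eff, jd_eff):
    print(f"{p:4d} processors: K-S {e_ks:5.1f} %   J-D {e_jd:5.1f} %")
```

With this definition the K-S values reproduce the tabulated efficiencies to one decimal place, and the J-D values agree to within roughly 0.1 percentage points, presumably because the published times are rounded to whole seconds.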
4.2 The vibration example

The FEM of the vibration table is shown in Figure 4; it is also meshed with 8-node hexahedral elements. Three different cases of the FEM were created according to different numbers of degrees of freedom (i.e., the dimensions of K or M). The three scales are Case 1, n = 6525024; Case 2, n = 10264662; and Case 3, n = 20753856. The rough non-zero distribution of K is shown in Figure 5. The parallel computations are carried out on the Dawning 5000A cluster mentioned above and on a YH supercomputer, respectively.

Figure 4 (Color online) The FEM of a vibration table.

Figure 5 (Color online) The non-zero distributions of K.

The experiments on the Dawning 5000A cluster  The configuration of the Dawning 5000A cluster is introduced in sect. 4.1. For the IRA and K-S methods, the computations for Cases 2 and 3 run out of memory on the Dawning 5000A cluster (up to 256 GB), so we focus on Case 1 and compare the three algorithms on this machine. In Case 1, the stiffness matrix K has 482348088 non-zero entries. The mass matrix M is taken as a lumped one, which has 6525024 non-zero entries.

The parallel tests are run from 16 processors up to the maximum of 128 processors. The 50 smallest eigenpairs are desired for the three methods and the maximum iterative subspace dimension is set to 100 (double the number of desired eigenpairs); the target value for the J-D method is set to 5×10⁻⁷, and the other parameters and configurations are the same as those for the aircraft example.

Monitoring the computing process on the machine, we found some phenomena similar to the aircraft example, i.e., the computing cost on each processor is uniform, with favorable load balancing; the mode error for each modal frequency computed by the IRA and K-S methods is still below 1×10⁻¹⁰ and that computed by the J-D method is in the range of 1×10⁻⁵–1×10⁻⁴. In addition, we found that the maximum memory requirement for the IRA and K-S methods is about 180 GB, and that for the J-D method is about 40 GB.

The parallel computing time and efficiency for different numbers of processors are shown in Table 3. Based on the testing results at 16 processors, we plot approximate speedup curves for the three methods in Figure 6. A convergence history for the J-D method is shown in Figure 7. Some interesting phenomena are found from Table 3 and Figures 6 and 7.

Figure 6 (Color online) Speedups for the three methods.

Figure 7 (Color online) Iterative history for the J-D method.
(1) The parallel performances of the IRA and K-S methods almost coincide, with relatively poor parallel scalability: the computing inflexion emerges at 64 processors. In contrast, the parallel performance of the J-D method is excellent; no computing inflexion emerges and the speedup is close to the ideal one.
(2) From another point of view, the IRA and K-S methods have an advantage arising from the direct factorization used for the linear equations, and their practical execution time is less than that of the J-D method. However, the matrix factorization in IRA and K-S takes more storage and bandwidth, which limits the parallel scalability. Although the computing time for the J-D method is over one order of magnitude more than that of IRA and K-S at 16 processors, it falls to the same order at 128 processors owing to the excellent parallel performance of the J-D method.
(3) Reviewing the subspace iteration process in Algorithm 4, we find that the residual error for the J-D method is computed in each step of the outer iteration, and the eigenpairs are computed one by one. This is illustrated by the outer iterative history for the J-D method shown in Figure 7, where the residual error falls below 1×10⁻⁴ after about 300 iterative steps, which means the first eigenpair has converged and been extracted. The algorithm then turns to the next eigenpair, with a corresponding residual error estimate in the following iterations. As the iterations proceed, the convergence of the J-D method becomes faster because of the restart and deflation techniques, with which the repeatedly constructed subspaces contain more information about the desired eigenpairs.

To compare the three methods more thoroughly, we moved to larger scales and more processors, as follows.

The experiments on a YH supercomputer  The experiments in this section focus on Case 2 (n = 10264662) and Case 3 (n = 20753856), which are probably the most challenging problems for modal analysis to date. The numbers of non-zero entries of K for the two cases are 757532056 and 1526838588, respectively.

The experiments were carried out on a YH supercomputer, which includes thousands of blade computing nodes, each with twelve Intel processors and 48 GB of shared memory. In addition, the communication bandwidth of the YH supercomputer reaches tens of gigabits per second, over one order of magnitude higher than that of the Dawning 5000A.

For Case 2, the three methods were compared on up to 512 processors. The computing conditions are the same as those in Case 1. The computing time and efficiency for different numbers of processors are shown in Table 4. The speedup curves are given in Figure 8. In this case, the maximum memory requirement for the IRA and K-S methods is about 350 GB, and that for the J-D method is about 60 GB.

Owing to the larger communication bandwidth and better processors of the YH supercomputer, we can see from Table 4 and Figure 8 that the parallel performance for Case 2 is better than that for Case 1 on the Dawning 5000A. The computing inflexion for the IRA and K-S methods emerges at 384 processors, whereas no inflexion emerges for the J-D method.

To find the computing inflexion of the J-D method, we carried out parallel tests with more processors for Case 2. Parallel tests for Case 3 were also carried out with the J-D method. Unfortunately, the parallel tests with the IRA and K-S methods for Case 3 were aborted because the matrix factorization in MUMPS failed at such a large scale. The number of parallel processors for Cases 2 and 3 is extended to 4800 with the J-D method. Table 5 gives part of the testing data and Figure 9 plots the speedup curves for the two cases, from which we can see that the computing inflexion emerges at 2048 processors for Case 2 and at 3584 processors for Case 3.

Again comparing the J-D method with the IRA and the

Table 3  Parallel performance of the modal analysis (n = 6525024) on Dawning 5000A

Processors            16     32     48     64     80     96     112    128
K-S  time (s)         2963   2538   2030   1762   2022   2193   2372   2278
K-S  efficiency (%)   100    58.4   48.7   42.0   29.3   22.5   17.8   16.3
IRA  time (s)         2924   2467   1862   1806   2014   2215   2424   2431
IRA  efficiency (%)   100    59.3   52.3   40.5   29.0   22.0   17.2   15.0
J-D  time (s)         47782  26459  17081  13325  11538  9912   9209   8275
J-D  efficiency (%)   100    90.3   93.2   89.6   82.8   80.3   74.1   72.2

Table 4  Parallel performance of the modal analysis (n = 10264662) on YH

Processors            16      32     64     128    256   320   384   448   512
K-S  time (s)         13413   7529   4446   1855   1622  1332  1133  1305  1509
K-S  efficiency (%)   100     89.1   75.4   90.4   51.7  50.3  49.3  36.7  27.8
IRA  time (s)         9445    5345   2526   1910   1517  1345  1108  1261  1478
IRA  efficiency (%)   100     88.4   93.5   61.8   38.9  35.1  35.5  26.8  20.0
J-D  time (s)         120073  63754  31351  15836  8145  7005  6389  5753  5152
J-D  efficiency (%)   100     94.2   95.7   94.8   92.1  85.7  78.3  74.5  72.8
Table 5  Parallel performance for Cases 2 and 3 with the J-D method

Processors               16      512    1024  1536  2048  2560  3072  3584  4096  4800
Case 2  time (s)         120073  5152   2226  1622  1226  1482  1532  1666  1536  2068
Case 2  efficiency (%)   100     72.8   84.3  77.1  76.5  50.6  40.8  32.2  30.5  19.4
Case 3  time (s)         292264  20722  8204  7372  4046  3316  3032  2148  2634  2708
Case 3  efficiency (%)   100     44.1   55.7  41.3  56.4  55.1  50.2  60.7  43.3  36.0
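The "computing inflexion" quoted for the J-D method can be read off Table 5 as the processor count at which the wall-clock time is smallest. The sketch below assumes that reading of the term, which is not defined formally in this section; the function names are illustrative, and the timings are copied from Table 5.

```python
def inflexion(procs, times):
    """Return (processor count, time in s) at the smallest wall-clock time."""
    best = min(range(len(times)), key=lambda i: times[i])
    return procs[best], times[best]

def speedup(procs, times):
    """Speedup scaled so that the baseline run counts as its own processor count."""
    p0, t0 = procs[0], times[0]
    return [p0 * t0 / t for t in times]

procs = [16, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096, 4800]
case2 = [120073, 5152, 2226, 1622, 1226, 1482, 1532, 1666, 1536, 2068]
case3 = [292264, 20722, 8204, 7372, 4046, 3316, 3032, 2148, 2634, 2708]

print("Case 2 inflexion:", inflexion(procs, case2))   # (2048, 1226)
print("Case 3 inflexion:", inflexion(procs, case3))   # (3584, 2148)

# Efficiency at the Case 2 inflexion, for comparison with the table row.
s = speedup(procs, case2)[procs.index(2048)]
print(f"Case 2 efficiency at 2048 processors: {100.0 * s / 2048:.1f} %")
```

Because the timings are not monotone (Case 2 drops again at 4096 processors, for example), taking the global minimum is a simpler and more robust choice than looking for the first increase in time.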

K-S methods for Case 2, we find that the practical minimum computing times of the three methods are very close, that is, 1133 s for K-S, 1108 s for IRA and 1226 s for J-D. In any case, we believe that solving a modal analysis problem with over ten million degrees of freedom in tens of minutes is very challenging.

Figure 8 (Color online) Speedups for the three methods.

Figure 9 (Color online) Speedups for the J-D method.

5 Conclusions

In this paper, we focus on the algorithms and parallel computations for large-scale modal analysis of complex engineering structures. After a brief review of subspace algorithms for eigenvalue problems, three predominant algorithms are chosen for large-scale modal analysis. According to the requirements of modal analysis, the three algorithms are modified by adding some auxiliary techniques. Moreover, an integrated parallel computing system for modal analysis is presented based on the PANDA framework and some software packages. Our numerical experiments on two typical engineering structures validate the feasibility and adaptability of the three algorithms for large-scale modal analysis. Detailed comparisons are also given to study the parallel scalability of the three algorithms.

We find that the three methods have their own merits as well as demerits. Taking advantage of exact matrix factorization, both IRA and K-S give a fast modal analysis with fewer subspace iteration steps on a small number of processors. However, the matrix factorization also results in relatively poor parallel scalability and a large memory demand. In contrast, using inner-outer iterations, the J-D method requires less memory and demonstrates excellent parallel scalability, but the approximate solution of the inner iterations results in more outer iterations, as well as a far longer computing time on a few processors. According to these merits and demerits, we prefer the IRA and K-S methods for modal analysis when fewer processors and sufficient memory are available, and prefer the J-D method when more processors or less memory are available.

Our implementations of modal analysis suit elastic structures from various engineering fields. The main progress we have made is that we have raised the computing scale of modal analysis to a new level with favorable parallel performance.

This work was supported by the National Defence Basic Fundamental Research Program of China (Grant No. C1520110002) and the Fundamental Development Foundation of China Academy of Engineering Physics (Grant No. 2012A0202008). We also thank the research teams of SLEPc, PETSc and some other teams for providing abundant codes on the internet.

1 Saad Y. Numerical Methods for Large Eigenvalue Problems. Manchester: Manchester University Press, 1992. 151–218
2 Bai Z, Demmel J, Dongarra J, et al. Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. Philadelphia: SIAM, 2000. 41–183
3 Lehoucq R B. Analysis and Implementation of an Implicitly Restarted Iteration. Dissertation for Doctoral Degree. Houston: Rice University, 1995
4 Watkins D S. The Matrix Eigenvalue Problem: GR and Krylov Subspace Methods. Philadelphia: SIAM, 2007. 349–420
5 Sleijpen G L G, Vorst H A. A Jacobi-Davidson iteration method for linear eigenvalue problems. SIAM J Matrix Anal Appl, 1996, 17: 401–425
6 Stewart G W. A Krylov-Schur algorithm for large eigenproblems. SIAM J Matrix Anal Appl, 2001, 23: 601–614
7 Grimes R G, Lewis J G, Simon H D. A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems. SIAM J Matrix Anal Appl, 1994, 15: 228–272
8 Guarracino M R, Perla F, Zanetti P. A parallel block Lanczos algorithm and its implementation for the evaluation of some eigenvalues of large sparse symmetric matrices on multicomputers. Int J Appl Math Comput Sci, 2006, 16: 241–249
9 Xue F. Numerical Solution of Eigenvalue Problems with Spectral Transformations. Dissertation for Doctoral Degree. Maryland: University of Maryland, 2009
10 Sorensen D C. Implicit application of polynomial filters in a k-step Arnoldi method. SIAM J Matrix Anal Appl, 1992, 13: 357–385
11 Calvetti D, Reichel L, Sorensen D C. An implicitly restarted Lanczos method for large symmetric eigenvalue problems. Electron Trans Numer Anal, 1994, 2: 1–21
12 Wu K, Simon H. Thick-restart Lanczos method for large symmetric eigenvalue problems. SIAM J Matrix Anal Appl, 2001, 22: 602–616
13 Arbenz P, Hetmaniuk U L, Lehoucq R B, et al. A comparison of eigensolvers for large-scale 3D modal analysis using AMG-preconditioned iterative methods. Int J Numer Methods Engrg, 2005, 64: 204–236
14 Parlett B N. The Symmetric Eigenvalue Problem. Philadelphia: SIAM, 1998. 261–367
15 Golub G H, van Loan C F. Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press, 1996. 391–507
16 Stewart G W. Matrix Algorithms Volume II: Eigensystems. Philadelphia: SIAM, 2001. 129–156
17 Lanczos C. An iterative method for the solution of the eigenvalue problem of linear differential and integral operators. J Res Nat Bur Stand, 1950, 45: 255–282
18 Arnoldi W E. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quart Appl Math, 1951, 9: 17–29
19 Davidson E R. The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices. J Comput Phys, 1975, 17: 87–94
20 Morgan R B, Scott D S. Generalizations of Davidson’s method for computing eigenvalues of sparse symmetric matrices. SIAM J Sci Stat Comput, 1986, 7: 817–825
21 Morgan R B. Generalizations of Davidson’s method for computing eigenvalues of large nonsymmetric matrices. J Comput Phys, 1992, 101: 287–291
22 Ovtchinnikov E. Convergence estimates for the generalized Davidson method for symmetric eigenvalue problems I: The preconditioning aspect. SIAM J Numer Anal, 2003, 41: 258–271
23 Voss H. A new justification of the Jacobi-Davidson method for large eigenproblems. Linear Algebra Appl, 2007, 424: 448–455
24 Genseberger M. Improving the parallel performance of a domain decomposition preconditioning technique in the Jacobi-Davidson method for large scale eigenvalue problems. Appl Numer Math, 2010, 60: 1083–1099
25 Betcke T, Voss H. A Jacobi-Davidson-type projection method for nonlinear eigenvalue problems. Future Generation Comput Syst, 2004, 20: 363–372
26 Sleijpen G L G, Booten G L, Fokkema D R, et al. Jacobi-Davidson type methods for generalized eigenproblems and polynomial eigenproblems. BIT Numer Math, 1996, 36: 595–633
27 Hochstenbach M E, Notay Y. The Jacobi-Davidson method. GAMM Mitt, 2006, 29: 368–382
28 Hernandez V, Roman J E, Tomas A, et al. A Survey of Software for Sparse Eigenvalue Problems. Technical Report, SLEPc STR-6, Universidad Politecnica de Valencia. 2006
29 Baker C G, Hetmaniuk U L, Lehoucq R B, et al. Anasazi software for the numerical solution of large-scale eigenvalue problems. ACM Trans Math Software, 2009, 36: 1–23
30 Hernandez V, Roman J E, Vidal V. SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems. ACM Trans Math Software, 2005, 31: 351–362
31 Lehoucq R B, Sorensen D C, Yang C. ARPACK Users’ Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. Philadelphia: SIAM, 1998
32 Meerbergen K, Spence A, Roose D. Shift-invert and Cayley transforms for detection of rightmost eigenvalues of nonsymmetric matrices. BIT Numer Math, 1994, 34: 409–423
33 Duff I S, Erisman A M, Reid J K. Direct Methods for Sparse Matrices. London: Oxford University Press, 1986. 10: 101
34 Amestoy P R, Guermouche A, L’Excellent J Y, et al. Hybrid scheduling for the parallel solution of linear systems. Parallel Comput, 2006, 32: 136–156
35 Amestoy P R, Duff I S, L’Excellent J Y, et al. MUMPS: A general purpose distributed memory sparse solver. Lecture Notes Comp Sci, 2001, 1947: 121–130
36 Bai Z, Demmel J W. On swapping diagonal blocks in real Schur form. Linear Algebra Appl, 1993, 186: 73–95
37 Sleijpen G L G, Vorst H A, Meijerink E. Efficient expansion of subspaces in the Jacobi-Davidson method for standard and generalized eigenproblems. Electron Trans Numer Anal, 1998, 7: 75–89
38 Bathe K J. Finite Element Procedures. New Jersey: Prentice-Hall, Inc., 1996
39 Shi G M, Wu R, Wang K Y, et al. Discussion about the design for mesh data structure within the parallel framework. In: World Congress on Computational Mechanics and 4th Asian Pacific Congress on Computational Mechanics, Sydney, 2010
40 Fan X H, Wu R, Chen P. Scalability study on large-scale parallel finite element computing in Panda frame. Appl Mech Mater, 2012, 117-119: 489–492
41 Karypis G, Kumar V. Metis-A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-reducing Orderings of Sparse Matrices, Version 4.0. Technical Report, Minneapolis: University of Minnesota. 1998
42 Balay S, Buschelman K, Eijkhout V, et al. PETSc Users Manual. Technical Report, Argonne National Laboratory. 2008
