Received June 11, 2012; accepted May 30, 2013; published online January 7, 2014
In this paper we study algorithms, and their parallel implementation, for solving large-scale generalized eigenvalue problems in modal analysis. Three predominant subspace algorithms, i.e., the Krylov-Schur method, the implicitly restarted Arnoldi method and the Jacobi-Davidson method, are modified with some complementary techniques to make them suitable for modal analysis. Detailed descriptions of the three algorithms are given. Based on these algorithms, a parallel solution procedure is established via the PANDA framework and its associated eigensolvers. Using the solution procedure on a machine equipped with up to 4800 processors, the parallel performance of the three methods is evaluated via numerical experiments with typical engineering structures, where the maximum testing scale reaches twenty million degrees of freedom. The speedup curves for different cases are obtained and compared. The results show that the three methods are suitable for modal analysis at the scale of ten million degrees of freedom, with favorable parallel scalability.
modal analysis, parallel computing, eigenvalue problems, Krylov-Schur method, implicitly restarted Arnoldi method,
Jacobi-Davidson method
PACS number(s): 46.15.-x, 46.40.-f, 43.40.At, 02.60.-x
Citation: Fan X H, Chen P, Wu R A, et al. Parallel computing study for the large-scale generalized eigenvalue problems in modal analysis. Sci China-Phys Mech Astron, 2014, 57: 477–489, doi: 10.1007/s11433-013-5203-5
© Science China Press and Springer-Verlag Berlin Heidelberg 2014 phys.scichina.com link.springer.com
478 Fan X H, et al. Sci China-Phys Mech Astron March (2014) Vol. 57 No. 3
ces onto these small subspaces, and then use these small subspaces to compute some of the eigenpairs of the original eigenvalue problems [1,2]. Of these algorithms, three predominant methods, i.e., the Implicitly Restarted Arnoldi (IRA) method [3,10], the Krylov-Schur (K-S) method [6] and the Jacobi-Davidson (J-D) method [5], have been regarded as the most successful and flexible ones for finding a few eigenpairs of a large, sparse matrix. IRA and K-S belong to Krylov subspace methods and J-D belongs to Davidson-based methods. For the standard Hermitian eigenvalue problems, IRA and K-S are also named the Implicitly Restarted Lanczos method [11] and the Krylov-Spectral method [6,12], respectively.

However, the applications or comparisons of these methods on large-scale modal analysis are seldom reported in the literature. We know that Peter Arbenz and his collaborators [13] did a parallel modal analysis of an aircraft carrier using several algorithms in 2005, the order n of K and M being 1892644 and the number of parallel CPU processors being 256. They believed that was the most challenging problem at that time.

The goal of our work is to propose a strategy of parallel modal analysis by modifying the existing, mature algorithms with some complementary techniques. These techniques include spectral transformation, solution strategy of linear systems for the three methods (IRA, K-S and J-D), as well as restarting and deflation techniques for the J-D method. The rest of the paper is organized as follows. Sect. 2 gives a brief review of the subspace algorithms, based on which three predominant algorithms (i.e., IRA, K-S and J-D) are chosen for modal analysis. Detailed algorithms with modifications for modal analysis are given. Sect. 3 discusses the parallel implementation, including pre-processing modeling, generation of K and M, parallel process management and solution with eigensolvers. Two representative examples with different scales in modal analysis are given to compare the performance of the three algorithms in sect. 4. Finally, the paper ends with a brief conclusion in sect. 5.

2 Algorithm design for modal analysis

For large-scale eigenvalue problems, subspace-based algorithms are initially presented for standard eigenvalue problems $Ax = \lambda x$, most of which come from the power method and Rayleigh quotient iteration [14–16], providing good approximations quickly to the largest eigenvalues. However, what we deal with in modal analysis is a generalized eigenvalue equation, and the eigenpairs most relevant to modal analysis are the lowest ones, which affect the dynamic behavior of engineering structures. Therefore, the key problem in algorithm design for modal analysis is to modify these algorithms so that they can solve the corresponding generalized eigenvalue equation and approximate the lowest eigenvalues efficiently.

2.1 A brief review of subspace-based methods

Subspace-based methods differ from each other in the ways the subspaces are generated. The dimension of subspaces is either fixed or variable. The classical algorithms working with fixed dimensions mainly include the power method [1,2,14], Rayleigh quotient iteration [1,2,15] and subspace iteration [1,16]. Starting from a subspace $V_k$, these methods generate the next subspace $V_{k+1}$ of the same dimension by applying a linear operator A on $V_k$. As k increases, $V_k$ contains better approximate eigenvectors corresponding to the eigenvalues of A with a larger magnitude.

A further class of subspace methods involves those whose dimensions increase as the iteration proceeds. Usually one starts with a subspace of dimension one and increases the dimension at each iteration step. These methods are in general more efficient than fixed-dimension ones and have become the mainstream for large-scale eigenvalue problems. The most popular subclass in this class is the Krylov subspace method [1,2,4], which can be traced back to the Lanczos method [17] for symmetric matrices and the Arnoldi method [18] for nonsymmetric matrices. There are many other updated developments for the two basic Krylov subspace methods. A significant improvement in this subclass is Sorensen's IRA/Lanczos method [10,11]. Later, Stewart [6] proposed the K-S method by expanding the Arnoldi decomposition to a general Krylov decomposition. The IRA/Lanczos method and the K-S method are mathematically equivalent, and are recognized as the most successful algorithms among Krylov subspace methods.

There exists another subclass with increasing subspace dimension, but without using Krylov subspaces. A Newton iteration step or an approximate Newton iteration step can be applied to expand the subspace. The typical representatives of this subclass are Davidson-based methods [1,2,5,19]. Based on the standard Davidson method [19], some generalized Davidson methods [1,2,20–22] were presented by using different preconditioning. In 1996, Sleijpen and Vorst [5] presented a J-D algorithm, which sped up the development of Davidson-based methods. So far, the J-D method has been extended to various eigenvalue problems [23–26]. For details of the J-D algorithm, refer to refs. [2,27].

With the above algorithms, a number of codes were developed for the numerical solution of large-scale eigenvalue problems. Hernandez et al. [28] made a survey of freely available software tools, including a list of libraries, programs or subroutines. Among these codes, ANASAZI [29] and SLEPc [30] are recognized as two predominant tools that come closest to a robust, efficient and general purpose code. So far, both ANASAZI and SLEPc are being actively developed.

2.2 Krylov subspace methods for modal analysis

Krylov subspace methods [1,14] start from a single vector space $K_1(A,v) = \mathrm{span}\{v\}$, and then expand the Krylov subspace $K_k(A,v) = \mathrm{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}$ in the k-th iteration to $K_{k+1}(A,v)$. We focus our study on IRA and K-S methods in this section. Since the implicitly restarted Lanczos method and the Krylov-Spectral method can be viewed as a specialized variant of IRA and K-S, we describe the IRA and K-S algorithms in a general sense for applications in other realms. Our description starts with the basic Arnoldi method [2,10] shown in Algorithm 1.
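To make the expansion of the Krylov subspace concrete, the following is a minimal sketch of a basic Arnoldi iteration in pure Python. It is not the paper's Algorithm 1; it assumes a small dense matrix stored as a list of rows, uses modified Gram-Schmidt for orthogonalization, and the helper names `matvec`, `dot` and `arnoldi` are illustrative only.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def arnoldi(A, v, m):
    """Run up to m Arnoldi steps starting from v.

    Returns the list V of orthonormal basis vectors of the Krylov subspace
    and the (m+1) x m upper Hessenberg matrix H, so that for every computed
    column j:  A V[j] = sum_i H[i][j] V[i]  with i = 0..j+1.
    """
    beta = math.sqrt(dot(v, v))
    V = [[x / beta for x in v]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, V[j])                  # expand the subspace with A v_j
        for i in range(j + 1):               # modified Gram-Schmidt sweep
            H[i][j] = dot(V[i], w)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        if H[j + 1][j] < 1e-12:              # invariant subspace: stop early
            break
        V.append([wk / H[j + 1][j] for wk in w])
    return V, H

# Small nonsymmetric test matrix (illustrative data, not from the paper).
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [1.0, 0.0, 1.0, 5.0]]
V, H = arnoldi(A, [1.0, 1.0, 1.0, 1.0], 3)
```

The eigenvalues of the leading m × m block of H (the Ritz values) approximate eigenvalues of A; restarting schemes such as IRA and K-S differ in how they compress this decomposition once m reaches its limit.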
The IRA algorithm is one of the most successful and flexible methods for finding a few eigenpairs of a large matrix. However, the method needs to preserve the structure of the Arnoldi decomposition, i.e., an upper Hessenberg matrix $H_m$ and a unit vector $e_m$, which restricts the range of transformations that can be performed on the decomposition. For this reason, Stewart [6] makes a minor improvement on the IRA, which expands the Arnoldi decomposition $AV_m - V_mH_m = f_m e_m^T$ to a general Krylov decomposition $AU_m - U_mB_m = u_{m+1} b_{m+1}^T$, where $B_m$ ($m \times m$) is a general matrix instead of the upper Hessenberg matrix $H_m$ of the Arnoldi decomposition, $b_{m+1}$ is a general vector instead of a unit one, and the columns of $(U_m, u_{m+1})$ are independent and called the basis of the Krylov decomposition. Using this Krylov decomposition, Stewart proposed a K-S method [6] via Schur decomposition on $B_m$. For modal analysis, the K-S method is treated in the same way as the IRA, i.e., eqs. (2)–(5). Since the two methods are similar and mathematically equivalent, we easily get the K-S method for modal analysis as Algorithm 3.
6: get a K-S factorization $AU_m - U_mT_m = u_{m+1} b_{m+1}^T$ with updated variables $U_m \leftarrow U_mQ_1$, $b_{m+1}^T \leftarrow b_{m+1}^TQ_1$, by right-multiplying the Krylov factorization $AU_m - U_mB_m = u_{m+1} b_{m+1}^T$ by $Q_1$.
7: reorder the diagonal blocks of the K-S factorization with another orthogonal transformation $Q_2$ (see ref. [36] for details): $U_m \leftarrow U_mQ_2$, $T_m \leftarrow Q_2^TT_mQ_2$, $b_{m+1}^T \leftarrow b_{m+1}^TQ_2$.
8: compute all eigenpairs of $T_m$, $T_my_i = \theta_iy_i$.
9: if k or more eigenvalues satisfy the convergence tolerance $\varepsilon$, then
10: $\lambda_i = 1/\theta_i + \sigma$, $x_i = U_my_i$, break.
11: end if
12: deflate to a K-S decomposition of order p ($k < p < m$), where the already converged vectors are locked: $AU_p - U_pT_p = u_{p+1} b_{p+1}^T$ with $U_p = U_m(:,1:p)$, $T_p = T_m(1:p,1:p)$, $u_{p+1} = u_{m+1}$ and $b_{p+1} = b_{m+1}(1:p)$.
13: extend again to a Krylov decomposition of order m: $AU_m - U_mB_m = u_{m+1} b_{m+1}^T$, via $m - p$ additional steps of the Arnoldi procedure with initial $u_{p+1}$.
14: end repetition.
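The back-transformation from an eigenvalue of the shift-invert operator to an eigenvalue of the original problem can be illustrated on a tiny example. The sketch below is an assumption-laden illustration, not the paper's algorithm: it writes σ for the shift and θ for an eigenvalue of the operator (K − σM)⁻¹M, uses a 2 × 2 stiffness matrix with a lumped (diagonal) mass matrix, replaces the restarted Krylov iteration by a plain power iteration, and replaces the sparse direct factorization by Cramer's rule. The names `solve2` and `smallest_eigenvalue` are hypothetical.

```python
def solve2(B, rhs):
    """Solve a 2x2 linear system by Cramer's rule (stand-in for a sparse direct solver)."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(rhs[0] * B[1][1] - B[0][1] * rhs[1]) / det,
            (B[0][0] * rhs[1] - rhs[0] * B[1][0]) / det]

def smallest_eigenvalue(K, m_diag, sigma=0.0, steps=60):
    """Power iteration on (K - sigma*M)^{-1} M without ever forming that operator.

    Each step solves (K - sigma*M) w = M v, which is how the operator is
    applied to a vector. The dominant eigenvalue theta of the operator is
    mapped back to the eigenvalue of K x = lambda M x closest to sigma
    through lambda = sigma + 1/theta.
    """
    shifted = [[K[i][j] - (sigma * m_diag[i] if i == j else 0.0)
                for j in range(2)] for i in range(2)]
    v = [1.0, 1.0]
    theta = 0.0
    for _ in range(steps):
        Mv = [m_diag[i] * v[i] for i in range(2)]
        w = solve2(shifted, Mv)                       # w = (K - sigma*M)^{-1} M v
        # Rayleigh quotient in the M-inner product: (v, w)_M / (v, v)_M
        theta = (sum(w[i] * Mv[i] for i in range(2))
                 / sum(v[i] * Mv[i] for i in range(2)))
        scale = max(abs(x) for x in w)
        v = [x / scale for x in w]
    return sigma + 1.0 / theta

# Illustrative data: det(K - lambda*M) = 2*lambda^2 - 4*lambda + 1,
# so the exact eigenvalues are 1 - sqrt(2)/2 and 1 + sqrt(2)/2.
K = [[2.0, -1.0], [-1.0, 1.0]]
m_diag = [2.0, 1.0]
lam_min = smallest_eigenvalue(K, m_diag)
```

With σ = 0 the iteration converges to the lowest eigenvalue, which is the mode of interest in modal analysis; in a large-scale setting the solve would reuse one factorization of K − σM across all iterations.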
Two aspects should be emphasized for Algorithms 2 and 3. Firstly, the matrix $A = (K - \sigma M)^{-1}M$ is not explicitly computed throughout the implementations. The expansion of the Krylov subspace vector $Av_j$ is accomplished by solution of a linear system (eq. (4)). Secondly, although both K and M are symmetric for modal analysis, the matrix $A = (K - \sigma M)^{-1}M$ becomes a non-symmetric one; thus Algorithm 2 cannot be simply equivalent to the Implicitly Restarted Lanczos method [11], and Algorithm 3 cannot be equivalent to the thick-restart Lanczos method (also named the Krylov-Spectral method) [12].

The major problem of Algorithms 2 and 3, which often becomes a bottleneck, is to find a convenient factorization of eq. (4) so that the associated linear systems of equations can be solved efficiently. The factorization of eq. (4) takes a large share of the cost in terms of computing time and storage in the two algorithms. Also, an effective shift selection is important, which depends on the user's preferences and on knowledge of the underlying generalized eigenvalue problem.

2.3 J-D method for modal analysis

The J-D method [2,5,23] belongs to the Davidson-based algorithms and is an effective complement to the Krylov subspace algorithms for solving large, sparse eigenvalue problems. The kernel of the J-D method is to solve a so-called correction equation, which becomes [2]

$(I - Mu_ku_k^*)(K - \theta_kM)(I - u_ku_k^*M)\,t = -r_k$ with $t \perp Mu_k$ (7)

M-orthogonal basis $v_1, v_2, \ldots, v_m$ for the search subspace is adopted, i.e., making $v_i^*Mv_j$ equal to one for i = j and equal to zero for i ≠ j, which results in

$V_m^*MV_m = I_m$. (9)

For our modal analysis, several aspects should be considered. Firstly, a fast iterative solution to eq. (7) with proper preconditioning is needed for outer subspace iterations. Secondly, the dimension of the searching subspace should not be too large due to the time and storage cost. A restart and deflation strategy similar to the Krylov subspace methods should be taken into consideration. Thirdly, a spectral transformation technique is required to get a number of smallest eigenpairs for modal analysis since only the largest one is computed in the standard J-D method [5]. Based on the above three considerations, we modify the standard J-D method as follows.

The iterative solution to the correction equation. In computing a number of eigenpairs in the J-D method, it is necessary to extract the already converged ones from the iterative subspaces in a timely manner. For k already converged eigenpairs, we construct a Schur form of the generalized eq. (1) via an M-orthogonal technique [1,2], that is

$KQ_k = Z_kD_k$, (10)

where $D_k$ is a diagonal matrix with the k computed eigenvalues on its diagonal, $Z_k = MQ_k$, and $Q_k$ is an M-orthogonal matrix consisting of k computed eigenvectors of the generalized eigenvalue problem (1). Thus, the correction eq. (7) is written as:
for eq. (11). P is a preconditioned matrix of $K - \theta_kM$, which approximates $K - \theta_kM$ and is cheap to "invert". Applying $\tilde P^{-1}$ on eq. (11), we can get

$\tilde P^{-1}\tilde A\,t = -\tilde P^{-1}r_k$, (13)

with $\tilde A = (I - ZQ^*)(K - \theta_kM)(I - QZ^*)$. The solution to eq. (13) is transformed to some easily solvable linear equations [2,37], which are described as follows.

For the right hand of eq. (13), denoting $\tilde r_k = -\tilde P^{-1}r_k$, we can obtain an equivalent form

$\tilde P\,\tilde r_k = -r_k$. (14)

Inserting eq. (12) into eq. (14), and using the orthogonal conditions $Z^*\tilde r_k = 0$, $Z^*r_k = 0$, we get

$\tilde r_k = -\left[P^{-1}r_k - P^{-1}Z\,(Z^*P^{-1}Z)^{-1}Z^*P^{-1}r_k\right]$, (15)

from which we easily compute $\tilde r_k$ since Z is a matrix of n×(k+1) and $Z^*P^{-1}Z$ is a matrix of (k+1)×(k+1), with $k \ll n$ for our modal analysis. When $\tilde r_k$ is obtained from eq. (15), eq. (13) becomes

$\tilde P^{-1}\tilde A\,t = \tilde r_k$, (16)

which can be solved by a Krylov subspace method (such as CG or MINRES [1,2]). The solving process of eq. (16) can be described by $\tilde P^{-1}\tilde A\,v_i = \hat r_i$ with vectors $v_i$ and $\hat r_i$ for the iteration process. In order to implement the iterative process in the orthogonal component subspaces of $u_k$, we choose the start $t_0 = 0$ (i.e., $v_0 = \tilde r_k - \tilde P^{-1}\tilde A\,t_0 = \tilde r_k$) for $\tilde P^{-1}\tilde A\,v_i = \hat r_i$; then the iterative vectors $v_i$ and $\hat r_i$ will satisfy the orthogonal conditions $Z^*v_i = 0$, $Z^*\hat r_i = 0$. Using these two orthogonal relations, we insert eq. (12) and the expression of $\tilde A$ into $\tilde P^{-1}\tilde A\,v_i = \hat r_i$ and finally derive a formula with the same form as eq. (15), that is

$\hat r_i = P^{-1}z - P^{-1}Z\,(Z^*P^{-1}Z)^{-1}Z^*P^{-1}z$, with $z = (K - \theta_kM)\,v_i$. (17)

Then, we can solve $\hat r_i$ from eq. (17) in the same way as eq. (15). The iteration stops when $\hat r_i$ approximates $\tilde r_k$, thus $t = v_i$. Usually, the process of solution to the correction eq. (13) is called "inner iteration", and the process of expanding the subspace is called "outer iteration". For each step of outer iteration, inner iterations are required. In practical implementations, there exists a balance between the inner iteration and the outer iteration. Generally, increasing the number of inner iterations causes a decrease of outer iterations, but the total computing time may increase due to the computational cost of the inner iterations. Therefore, the number of inner iterations is often restricted to a fixed value (such as 30 steps), with an "inexact" t for the outer iteration.

Restart and deflation techniques. With the increasing dimension m of the searching subspace, the computing time and storage increase quickly. To overcome this problem, we consider a restart strategy similar to the Krylov subspace methods. A good strategy as shown in ref. [2] is to restart with the subspace spanned by the Ritz vectors of a small number of the Ritz values closest to a specified target value.

We restart as soon as the dimension of the searching space for the current eigenvector exceeds the maximum. Before restarting, some Ritz values may have been close enough to the wanted eigenvalues, and the remaining part of the current subspace still has rich components in nearby eigenpairs. We can use this information as the basis of a subspace for computing the next eigenvector. In order to avoid the already converged eigenvectors reentering the computational process, we adopt a deflation technique [2], which makes the new searching vectors in the J-D algorithm explicitly orthogonal to the converged eigenvectors.

In this way, the standard J-D algorithm is modified by adding a maximum subspace dimension $m_{\max}$ for expanding subspaces and a minimum one $m_{\min}$ for restarting. Furthermore, the basis vectors for restarting contain much information for the wanted eigenpairs inherited from the previous subspace expansions.

Spectral transformation. Theoretically, the J-D algorithm will converge to the largest eigenvalues when the chosen target $\tau$ is larger than $\lambda_{\max}$, and converges to the smallest eigenvalues if the chosen $\tau$ is smaller than $\lambda_{\min}$. But the Rayleigh-Ritz procedure in the J-D method will always lead to the computed Ritz values converging to the largest parts. A small target value and a large number of iterative corrections via eq. (11) may change the situation, but this is too costly, or no smallest eigenvalue may be found at all.

Therefore, a spectral transformation is needed to change the smallest eigenvalue problems for modal analysis to the largest ones. Similarly, we do this by a S-I spectral transformation

$Kx = \lambda Mx \;\Rightarrow\; Mx = \dfrac{1}{\lambda - \sigma}\,(K - \sigma M)\,x$, (18)

and substitute K and M in the J-D method with M and $K - \sigma M$, respectively. In most cases where no rigid body modes exist (i.e., $\lambda > 0$), we let $\sigma = 0$ and the S-I transformation becomes the interchange between M and K in the implementation of the J-D method. Correspondingly, the eigenvalues computed in the J-D method become the reciprocals of the original ones. Thus, the target value $\tau$ can be chosen a little larger than the reciprocal of the smallest eigenvalue of eq. (1).

Algorithm design. Based on the above analysis and modifications, the J-D method for modal analysis is described as Algorithm 4.
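The M-orthogonality of the search basis and the deflation against converged eigenvectors can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 4: the mass matrix is taken as lumped (diagonal) and stored as a list `m_diag`, the already converged ("locked") vectors are assumed to be M-orthonormal, and the function names `m_dot` and `m_orthonormalize` are hypothetical.

```python
import math

def m_dot(u, v, m_diag):
    """Inner product u^T M v for a lumped (diagonal) mass matrix M."""
    return sum(a * m * b for a, m, b in zip(u, m_diag, v))

def m_orthonormalize(vectors, m_diag, locked=()):
    """Modified Gram-Schmidt in the M-inner product.

    Every returned vector is M-orthogonal to the locked (converged) vectors
    and to the other returned vectors, and has unit M-norm. Vectors that
    become numerically zero after projection are dropped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in list(locked) + basis:
            c = m_dot(q, w, m_diag)           # component of w along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(m_dot(w, w, m_diag))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    return basis

# Illustrative data (not from the paper).
m_diag = [2.0, 1.0, 3.0]
V = m_orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]], m_diag)

# Deflation: q plays the role of a converged eigenvector with unit M-norm.
q = [1.0 / math.sqrt(2.0), 0.0, 0.0]
W = m_orthonormalize([[1.0, 1.0, 0.0], [1.0, 1.0, 1.0]], m_diag, locked=[q])
```

Here `V` satisfies the M-orthonormality condition of the search basis, and every vector in `W` stays M-orthogonal to `q`, which is what keeps a converged eigenvector from reentering the search subspace after a restart.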
From Algorithms 2–4, we find some differences between the Krylov subspace methods and the J-D method. In Algorithms 2 and 3 for the IRA and K-S methods, the solution of eigenpairs is carried out by an outer subspace iteration and a direct matrix decomposition for the inner linear systems, and the wanted eigenpairs are extracted simultaneously from the corresponding projective matrix; while the solution of eigenpairs in the J-D method (Algorithm 4) is carried out by an inner-outer iteration, and the wanted eigenpairs are extracted one by one.

The stopping criterion for the IRA, K-S and J-D methods is to accept an eigenvector approximation as soon as the norm of the residual (for the normalized eigenvector approximation) is below a given tolerance $\varepsilon$. The residual vector (or residual error) is defined by

$r_i = Kx_i - \lambda_iMx_i$, (19)

where $(\lambda_i, x_i)$ is a computed eigenpair. It is conceivable that a small residual error implies good accuracy in the computed $\lambda_i$ and $x_i$ [1,2]. Also, we adopt a mode error (or relative error) defined by Bathe [38] to give further evaluation for the convergence, that is

$\varepsilon_{\mathrm{mod}} = \|Kx_i - \lambda_iMx_i\|_2 \,/\, \|\lambda_iMx_i\|_2$. (20)

For more information about convergence properties of these methods, refer to refs. [1,2,22,38].

3 Implementation of parallel computing

The algorithms mentioned in sect. 2 are the kernel to realize the modal analysis of large engineering structures. Besides, many other auxiliary processes are necessary to transform the engineering problems into mathematical ones. These processes include the finite element modeling of engineering structures, the discretization of the finite element models (FEM) to generate K and M, and the calling of the algorithms.

The finite element modeling of a large structure can be finished via pre-processing commercial software such as MSC.Patran. At present, MSC.Patran is able to build a FEM of over ten million degrees of freedom on a common workstation. We realized the parallel modal analysis via the PANDA framework [39,40], developed by our parallel computing team at the China Academy of Engineering Physics. The PANDA framework is a finite element simulation code, which is capable of execution on parallel platforms and includes interfaces to third-party libraries; the communication of parallel computing in PANDA is based on MPI.

In MSC.Patran, we build the FEM of a structure and generate element and node groups according to different material properties and boundary conditions. Then we export the element and node information to a neutral file provided in MSC.Patran. Using the interface between PANDA and MSC.Patran, the neutral file is translated to the finite element grid file that can be identified by PANDA.

The K and M for the FEM are generated in a parallel manner in PANDA. Integrating with Metis [41], the PANDA framework provides a domain decomposition function, which partitions the FEM into a number of small subdomains. The information of each subdomain is delivered to different processors for finite element discretization and numerical integration, until the matrices K and M are formed. At this step, the preparations for the parallel modal analysis are completed.

Another two important tools for parallel modal analysis are SLEPc [30] and ARPACK [31]. SLEPc provides a collection of eigensolvers on top of PETSc [42], including K-S, J-D, Arnoldi/Lanczos and some other subspace methods. The basic parallel implementations in SLEPc are vector operations, the matrix-vector product and linear equation solvers, which are supplied by PETSc objects. ARPACK is a free software package specially developed for the implementation of IRA methods.

Both SLEPc and ARPACK employ the MPI standard for message-passing communication and have been integrated into the PANDA framework. Besides, the MUMPS [35] software for direct, sparse matrix decomposition was also integrated into the PANDA framework.

In this way, the parallel computing flow of modal analysis for large engineering structures can be shown as follows.

(1) Pre-processing modeling of the engineering structures. In this step, the engineering structures are approximately substituted with a large number of finite element meshes. These meshes are divided into some groups according to different properties and exported as a neutral file.

(2) Model translating and assembling of eigenvalue equations. In this step, the PANDA framework and METIS are employed to form K and M in a parallel manner.

(3) Parallel computing of the generalized eigenvalue problems $Kx = \lambda Mx$. In this step, three different methods as shown in Algorithms 2–4 are implemented via SLEPc (PETSc), ARPACK and some other external packages such as MUMPS. The PANDA framework is responsible for the calling and managing of these software packages.

4 Numerical experiments

In this section, some numerical experiments of two representative examples in modal analysis are given to compare the parallel performance and efficiency of the three algorithms in sect. 3. The two examples are the FEMs of a typical aircraft and a vibration table. The structure of an aircraft is mainly composed of thin plates and shells, and the vibration table is composed of solid structures as well as shell ones. The experiments were run on two high performance computers with a Linux platform.

4.1 The aircraft example

The FEM of the aircraft example is shown in Figure 1. The model is meshed with 8-node hexahedron elements, and the node displacements at the bottom of the model are fixed as a boundary constraint. We test this example with a computing scale of n = 710526, where the corresponding stiffness matrix K has 52078554 non-zero entries. The rough non-zero distributions of the K matrix are shown in Figure 2. The M matrix is adopted as a lumped mass matrix (i.e., diagonal matrix).

We carried out the parallel modal analysis on a Dawning 5000A cluster, a machine consisting of 16 blade computing nodes, each of them with eight Intel Xeon E5550 2.66 GHz processors and 16 GB of shared memory. The nodes of the machine are connected by a gigabit network. The parallel tests are executed from 4 processors, until

Figure 1 (Color online) The FEM of an aircraft.
Table 1 Computing results of the ten lowest modal frequencies (Hz) for the three methods
Order of mode      1      2      3      4      5      6      7       8       9      10
IRA            188.5  188.5  691.5  691.5  918.7  968.4  968.4  1064.5  1064.5  1148.6
K-S            188.5  188.5  691.5  691.5  918.7  968.4  968.4  1064.5  1064.5  1148.6
J-D            188.5  188.7  691.5  691.5  918.5  968.2  968.2  1062.4  1064.9  1147.7

Processors               4      8     16     32     64     96    128
K-S  time (s)          440    365    293    252    213    258    289
K-S  efficiency (%)    100   60.3   37.5   21.8   12.9    7.1    4.8
IRA  time (s)          433    351    278    267    229    262    297
IRA  efficiency (%)    100   61.7   38.9   20.3   11.8    6.9    4.6
J-D  time (s)         5622   2883   1436    796    400    290    249
J-D  efficiency (%)    100   97.5   97.8   88.3   87.8   80.7   70.5
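The efficiency rows in the table above are consistent with the usual definition of parallel efficiency relative to the smallest run, $E(p) = \dfrac{T(p_0)\,p_0}{T(p)\,p}$ with $p_0 = 4$. The extracted text does not state the formula, so this definition is an assumption; the short sketch below applies it to the K-S timings and reproduces the tabulated K-S efficiencies to one decimal place. The function name `parallel_efficiency` is illustrative.

```python
def parallel_efficiency(procs, times):
    """Efficiency in percent relative to the first (smallest) run.

    E(p) = 100 * T(p0) * p0 / (T(p) * p), rounded to one decimal place.
    """
    p0, t0 = procs[0], times[0]
    return [round(100.0 * (t0 * p0) / (t * p), 1) for p, t in zip(procs, times)]

# K-S wall-clock times (s) from the table above.
procs = [4, 8, 16, 32, 64, 96, 128]
ks_times = [440, 365, 293, 252, 213, 258, 289]
ks_eff = parallel_efficiency(procs, ks_times)
print(ks_eff)  # [100.0, 60.3, 37.5, 21.8, 12.9, 7.1, 4.8]
```

Applied to the J-D timings, the same formula agrees with the tabulated values to within 0.1 percentage point, which is compatible with the times having been rounded to whole seconds.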
4.2 The vibration example

The FEM of the vibration table is shown in Figure 4, which is also meshed with 8-node hexahedron elements. Three different cases of the FEM were created according to different numbers of degrees of freedom (i.e., the dimensions of K or M). The three different scales are Case 1, n = 6525024; Case 2, n = 10264662 and Case 3, n = 20753856. The rough non-zero distributions of K are shown in Figure 5. The parallel computations are carried out on the Dawning 5000A cluster mentioned above and a YH supercomputer, respectively.

The experiments on the Dawning 5000A cluster. The configurations of the Dawning 5000A cluster are introduced in sect. 4.1. For the IRA and K-S methods, the computation for Cases 2 and 3 runs out of memory on the Dawning 5000A cluster (up to 256 GB), so we focus on Case 1 and compare the three algorithms on this machine. In Case 1, the stiffness matrix K has 482348088 non-zero entries. The mass matrix M was adopted as a lumped one which has 6525024 non-zero entries.

The parallel tests are run from 16 processors up to the maximum of 128 processors. The lowest 50 eigenpairs are desired for the three methods and the maximum iterative subspace dimension is set to 100 (double the number of desired eigenpairs), the target value $\tau$ for the J-D method is set to $5\times10^{-7}$, and other parameters or configurations are the same as those for the aircraft example.

Monitoring the computing process on the machine, we found some phenomena similar to the aircraft example, i.e., the computing cost in each processor is uniform with a favorable load balancing; the mode error for each modal frequency computed by the IRA and K-S methods is still below $1\times10^{-10}$ and that computed by the J-D method is in the range of $1\times10^{-5}$–$1\times10^{-4}$. Besides, we found that the maximum memory requirement for the IRA and K-S methods is about 180 GB, and that for the J-D method is about 40 GB.

The parallel computing time and efficiency for different numbers of processors are shown in Table 3. Based on the testing results of 16 processors, we plot approximate speedup curves for the three methods in Figure 6. A convergence history for the J-D method is shown in Figure 7. Some interesting phenomena are found from Table 3, Figures 6 and 7.

Figure 4 (Color online) The FEM of a vibration table.
Figure 5 (Color online) The non-zero distributions of K.
Figure 6 (Color online) Speedups for the three methods.
Figure 7 (Color online) Iterative history for the J-D method.
(1) The parallel performances for the IRA and K-S are almost coincident, with a relatively poor parallel scalability in that the computing inflexion emerges at 64 processors. Instead, the parallel performance for the J-D method is excellent: no computing inflexion emerges and the speedup is close to the ideal one.

(2) From another point of view, the IRA and K-S have their advantage arising from the direct factorization of linear equations, and the practical execution time is less than that of the J-D method. However, the matrix factorization in IRA and K-S takes more storage and bandwidth, which influences the parallel scalability. Although the computing time for the J-D method is over one order of magnitude more than that of IRA and K-S at 16 processors, this situation changes to the same order at 128 processors due to the excellent parallel performance of the J-D method.

(3) Reviewing the implementing processes of subspace iteration in Algorithm 4, we find that the residual error for the J-D method is computed in each step of outer iteration, and the eigenpairs are computed one by one. We explain this with the outer iterative history for the J-D method shown in Figure 7, where the residual error is below $1\times10^{-4}$ after about 300 iterative steps, which means the first eigenpair is converged and extracted. The algorithm turns to solving the next eigenpair with a corresponding residual error estimation in the following iterations. As the iterations proceed, the convergence rate of the J-D method increases due to the restart and deflation techniques, where the repeatedly constructed subspaces contain more information for the desired eigenpairs.

In order to compare the three methods more thoroughly, we challenged ourselves with a larger scale and more CPU processors in the following.

The experiments on a YH supercomputer. The experiments in this section focus on Case 2 (n = 10264662) and Case 3 (n = 20753856), which are probably the most challenging problems for modal analysis by far. The non-zero entries of K for the two cases reach 757532056 and 1526838588, respectively.

The experiments were carried out on a YH supercomputer, which includes thousands of blade computing nodes, each of them with twelve Intel processors and 48 GB of shared memory. Besides, the communication bandwidth of the YH supercomputer reaches tens of gigabits, over one order of magnitude higher than that of the Dawning 5000A.

For Case 2, the three methods were compared within 512 processors. The computing conditions are the same as those in Case 1. The computing time and efficiency for different processors are shown in Table 4. The speedup curves are given in Figure 8. In this case, the maximum memory requirement for the IRA and K-S methods is about 350 GB, and that for the J-D method is about 60 GB.

Due to the larger communication bandwidth and better CPU processors on the YH supercomputer, we can see from Table 4 and Figure 8 that the parallel performance for Case 2 is better than that for Case 1 on the Dawning 5000A. The computing inflexion for the IRA and the K-S emerges at 384 processors whereas no inflexion emerges for the J-D method.

In order to obtain the computing inflexion of the J-D method, we carried out the parallel tests with more processors for Case 2. Besides, parallel tests for Case 3 are also carried out with the J-D method. Unfortunately, the parallel tests with the IRA and the K-S methods for Case 3 are aborted due to the unsuccessful matrix factorization for such a large scale in MUMPS. The numbers of parallel processors for Cases 2 and 3 are extended to 4800 with the J-D method. Table 5 gives part of the testing data and Figure 9 plots the speedup curves for the two cases, from which we can see that the computing inflexion emerges at 2048 processors for Case 2 and 3584 processors for Case 3.

Again comparing the J-D method with the IRA and the

Table 5 Parallel performance for Cases 2 and 3 with the J-D method
Processors                  16    512   1024   1536   2048   2560   3072   3584   4096   4800
Case 2  time (s)        120073   5152   2226   1622   1226   1482   1532   1666   1536   2068
Case 2  efficiency (%)     100   72.8   84.3   77.1   76.5   50.6   40.8   32.2   30.5   19.4
Case 3  time (s)        292264  20722   8204   7372   4046   3316   3032   2148   2634   2708
Case 3  efficiency (%)     100   44.1   55.7   41.3   56.4   55.1   50.2   60.7   43.3   36.0
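One way to read the "computing inflexion" quoted for Table 5 is as the processor count at which the wall-clock time is smallest, beyond which adding processors no longer helps. That reading is an assumption, since the extracted text does not define the term, but it matches the values of 2048 and 3584 processors reported for Cases 2 and 3. The helper name `inflexion` is illustrative.

```python
def inflexion(procs, times):
    """Return the processor count with the smallest wall-clock time."""
    best = min(range(len(times)), key=times.__getitem__)
    return procs[best]

# J-D wall-clock times (s) from Table 5.
procs = [16, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096, 4800]
case2 = [120073, 5152, 2226, 1622, 1226, 1482, 1532, 1666, 1536, 2068]
case3 = [292264, 20722, 8204, 7372, 4046, 3316, 3032, 2148, 2634, 2708]

case2_inflexion = inflexion(procs, case2)  # 2048
case3_inflexion = inflexion(procs, case3)  # 3584
```

The larger problem keeps scaling to a higher processor count, which is the expected behavior when the work per processor stays large relative to the communication cost.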
5 Sleijpen G L G, Vorst H A. A Jacobi-Davidson iteration method for linear eigenvalue problems. SIAM J Matrix Anal Appl, 1996, 17: 401–425
6 Stewart G W. A Krylov-Schur algorithm for large eigenproblems. SIAM J Matrix Anal Appl, 2001, 23: 601–614
7 Grimes R G, Lewis J G, Simon H D. A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems. SIAM J Matrix Anal Appl, 1994, 15: 228–272
8 Guarracino M R, Perla F, Zanetti P. A parallel block Lanczos algorithm and its implementation for the evaluation of some eigenvalues of large sparse symmetric matrices on multicomputers. Int J Appl Math Comput Sci, 2006, 16: 241–249
9 Xue F. Numerical Solution of Eigenvalue Problems with Spectral Transformations. Dissertation for Doctoral Degree. Maryland: University of Maryland, 2009
10 Sorensen D C. Implicit application of polynomial filters in a k-step Arnoldi method. SIAM J Matrix Anal Appl, 1992, 13: 357–385
11 Calvetti D, Reichel L, Sorensen D C. An implicitly restarted Lanczos method for large symmetric eigenvalue problems. Electron Trans Numer Anal, 1994, 2: 1–21
12 Wu K, Simon H. Thick-restart Lanczos method for large symmetric
method for large scale eigenvalue problems. Appl Numer Math, 2010, 60: 1083–1099
25 Betcke T, Voss H. A Jacobi-Davidson-type projection method for nonlinear eigenvalue problems. Future Generation Comput Syst, 2004, 20: 363–372
26 Sleijpen G L G, Booten G L, Fokkema D R, et al. Jacobi-Davidson type methods for generalized eigenproblems and polynomial eigenproblems. BIT Numer Math, 1996, 36: 595–633
27 Hochstenbach M E, Notay Y. The Jacobi-Davidson method. GAMM Mitt, 2006, 29: 368–382
28 Hernandez V, Roman J E, Tomas A, et al. A Survey of Software for Sparse Eigenvalue Problems. Technical Report, SLEPc STR-6, Universidad Politecnica de Valencia. 2006
29 Baker C G, Hetmaniuk U L, Lehoucq R B, et al. Anasazi software for the numerical solution of large-scale eigenvalue problems. ACM Trans Math Software, 2009, 36: 1–23
30 Hernandez V, Roman J E, Vidal V. SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems. ACM Trans Math Software, 2005, 31: 351–362
31 Lehoucq R B, Sorensen D C, Yang C. ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted
eigenvalue problems. SIAM J Matrix Anal Appl, 2001, 22: 602–616 Arnoldi Methods. Philadelphia: SIAM, 1998
13 Arbenz P, Hetmaniuk U L, Lehoucq R B, et al. A comparison of ei- 32 Meerbergen K, Spence A, Roose D. Shift-invert and Cayley trans-
gensolvers for large-scale 3D modal analysis using AMG-precondi- forms for detection of rightmost eigenvalues of nonsymmetric matri-
tioned iterative methods. Int J Numer Methods Engrg, 2005, 64: 204– ces. BIT Numer Math, 1994, 34: 409–423
236 33 Duff I S, Erisman A M, Reid J K. Direct Methods for Sparse Matri-
14 Parlett B N. The Symmetric Eigenvalue Problem. Philadelphia: ces. London: Oxford University Press, 1986. 10: 101
SIAM, 1998. 261–367 34 Amestoy P R, Guermouche A, L’Excellent J Y, et al. Hybrid sched-
15 Golub G H, van Loan C F. Matrix Computations. 3rd ed. Baltimore: uling for the parallel solution of linear systems. Parallel Comput,
Johns Hopskins University Press, 1996. 391–507 2006, 32: 136–156
16 Stewart G W. Matrix Algorithms Volumn II: Eigensystems. Phila- 35 Amestoy P R, Duff I S, L’Excellent J Y, et al. MUMPS: A general
delphia: SIAM, 2001. 129–156 purpose distributed memory sparse solver. Lecture Notes Comp Sci,
17 Lanczos C. An iterative method for the solution of the eigenvalue 2001, 1947: 121–130
problem of linear differential and integral operators. J Res Nat Bur 36 Bai Z, Demmel J W. On swaping diagonal blocks in real Schur form.
Stand, 1950, 45: 255–282 Linear Algebra Appl, 1993, 186: 73–95
18 Arnoldi W E. The principle of minimized iterations in the solution of 37 Sleijpen G L G, Vorst H A, Meijerink E. Efficient expansion of sub-
the matrix eigenvalue problem. Quart Appl Math, 1951, 9: 17–29 spaces in the Jacobi-Davidson method for standard and generalized
19 Davidson E R. The iterative calculation of a few of the lowest eigen- eigenproblems. Electron Trans Numer Anal, 1998, 7: 75–89
values and corresponding eigenvectors of large real-symmetric ma- 38 Bathe K J. Finite Element Procedure. New Jersey: Prentice-Hall, Inc.,
trices. J Comput Phys, 1975, 17: 87–94 1996
20 Morgan R B, Scott D S. Generalizations of Davidson’s method for 39 Shi G M, Wu R, Wang K Y, et al. Discussion about the design for
computing eigenvalues of sparse symmetric matrices. SIAM J Sci mesh data structure within the parallel framework. In: World Con-
Stat Comput, 1986, 7: 817–825 gress on Computational Mechanics and 4th Asian Pacific Congress
21 Morgan R B. Generalizations of Davidson’s method for computing on Computational Mechanics, Sydney, 2010
eigenvalues of large nonsymmetric matrices. J Comput Phys, 1992, 40 Fan X H, Wu R, Chen P. Scalability study on large-scale parallel fi-
101: 287–291 nite element computing in Panda frame. Appl Mech Mater, 2012,
22 Ovtchinnikov E. Convergence estimates for the generalized Davidson 117-119: 489–492
method for symmetric eigenvalue problems I: The preconditioning 41 Karypis G, Kumar V. Metis-A Software Package for Partitioning Un-
aspect. SIAM J Numer Anal, 2003, 41: 258–271 structured Graphs, Partitioning Meshes, and Computing Fill-reducing
23 Voss H. A new justification of the Jacobi-Davidson method for large Orderings of Sparse Matrices, Version 4.0. Technical Report, Min-
eigenproblems. Linear Algebra Appl, 2007, 424: 448–455 neapolis: University of Minnesota. 1998
24 Genseberger M. Improving the parallel performance of a domain de- 42 Balay S, Buschelman K, Eijkhout V, et al. PETSc Users Manual.
composition preconditioning technique in the Jacobi-Davidson Technical Report, Argonne National Laboratory. 2008