Covariance Analysis
for
Seismic Signal Processing

Edited by

R. Lynn Kirlin
William J. Done

Series Editor

Stephen J. Hill

Geophysical Developments Series, No. 8


Society of Exploration Geophysicists

Covariance analysis for seismic signal processing / edited by R. Lynn Kirlin and William J. Done.
p. cm. (Geophysical developments series; v. 8)
Includes bibliographical references and index.
ISBN 1-56080-081-X (vol.). ISBN 0-931830-41-9 (series)
1. Seismic prospecting. 2. Signal Processing. 3. Analysis of covariance.
I. Kirlin, R. Lynn. II. Done, William J. III. Series.
TN269.8.C68 1998
622.1592 dc21
98-8792
CIP

ISBN 978-0-931830-41-9 (Series)
ISBN 978-1-56080-081-1 (Volume)

Society of Exploration Geophysicists
P.O. Box 702740
Tulsa, OK 74170-2740

© 1999 Society of Exploration Geophysicists
All rights reserved. This book or parts hereof may not be reproduced in any form without written permission from the publisher.

Published 1999
Reprinted 2009
Printed in the United States of America.

Contents
1   Introduction ........................................................ 1
    R. Lynn Kirlin

2   Data Vectors and Covariance Matrices ............................... 5
    R. Lynn Kirlin
    2.1   Analysis Regions ............................................. 6
    2.2   Data Windows ................................................. 7
    2.3   Data Vectors ................................................. 8
    2.4   Sample Data Covariance Matrix ................................ 9
    2.5   Rationale for Sample Covariance Analysis .................... 10
    2.6   Statistics of the Sample Covariance Matrix .................. 11
    2.7   Robust Estimation of Sample Covariance Matrices ............. 13
    2.8   References .................................................. 17

3   Eigenstructure, the Karhunen-Loeve Transform, and Singular-Value
    Decomposition ..................................................... 19
    R. Lynn Kirlin
    3.1   Eigenstructure and Least-Squares Fit of a Random Vector ..... 19
    3.2   The Eigenstructure Forms of the Covariance Matrix ........... 20
    3.3   Singular-Value Decomposition and the Karhunen-Loeve
          Transform ................................................... 21
          3.3.1   The Karhunen-Loeve Transform ........................ 23
          3.3.2   Null Space and the Minimum Norm Solution ............ 24
    3.4   A Seismic Example ........................................... 26
    3.5   A Second Example ............................................ 27
    3.6   Bias-Variance Tradeoff in a Seismic Profile ................. 29
    3.7   A Robust Eigenstructure Estimator ........................... 31
    3.8   References .................................................. 33

4   Vector Subspaces .................................................. 35
    R. Lynn Kirlin
    4.1   The Linear Statistical Model ................................ 35
          4.1.1   Comments on Whiteness and Stationarity .............. 37
    4.2   Covariance Matrix Structure ................................. 37
          4.2.1   Eigenstructure and Subspaces ........................ 38
          4.2.2   Statistics of Eigenstructure Estimates .............. 40
          4.2.3   Statistics of Subspace Component Estimates .......... 42
    4.3   Examples of Signal Subspaces ................................ 44
    4.4   Seismic Wavefronts in Noise ................................. 46
    4.5   Nonwhite Noise .............................................. 47
    4.6   References .................................................. 50

5   Temporal and Spatial Spectral Analysis ............................ 51
    R. Lynn Kirlin
    5.1   The Discrete Power Spectrum ................................. 52
          5.1.1   Relation of Sx(z) to Eigenstructure of Rx ........... 53
          5.1.2   All-Pole Model of Sx(z) ............................. 55
          5.1.3   Sx(z) as a Function of Rx ........................... 57
    5.2   High-Resolution Spectral Estimators ......................... 58
          5.2.1   Minimum Variance Distortionless Response (MVDR) ..... 60
          5.2.2   MUSIC ............................................... 61
          5.2.3   Eigenvalue .......................................... 62
          5.2.4   Enhanced Minimum Variance ........................... 62
          5.2.5   Maximum Entropy ..................................... 63
          5.2.6   Minimum Norm ........................................ 64
          5.2.7   Maximum Entropy Spectrum with Eigenstructure
                  Projection Constraints .............................. 65
                  5.2.7.1   Example 1 ................................. 68
                  5.2.7.2   Example 2 ................................. 69
                  5.2.7.3   Example 3 ................................. 69
          5.2.8   Complex New Maximum Entropy Estimator ............... 71
          5.2.9   Example Spectral Estimates .......................... 72
                  5.2.9.1   Comparison with Minimum Norm .............. 73
    5.3   Conclusions ................................................. 74
    5.4   References .................................................. 77

6   Root-Mean-Square Velocity Estimation .............................. 83
    R. Lynn Kirlin
    6.1   Introduction ................................................ 83
    6.2   Multiple Wavefront Model .................................... 83
    6.3   Frequency Focusing and Spatial Smoothing .................... 87
    6.4   Discussion .................................................. 92
    6.5   Comparison of MUSIC with Semblance .......................... 93
    6.6   Key's Algorithm ............................................. 95
    6.7   A Subspace Semblance Coefficient ............................ 98
    6.8   Multiple Sidelobe Canceler ................................. 101
    6.9   Summary of Coherence Detection and Velocity Estimation ..... 105
    6.10  References ................................................. 107

7   Subspace-Based Seismic Velocity Analysis ......................... 109
    Fu Li and Hui Liu
    7.1   Problem Formulation ........................................ 110
    7.2   Subspace Approach .......................................... 113
          7.2.1   Structure Extraction ............................... 113
          7.2.2   Estimation of Time Delays of a Seismic Wavefront ... 115
                  7.2.2.1   MUSIC .................................... 116
                  7.2.2.2   Minimum-Norm ............................. 116
                  7.2.2.3   ESPRIT ................................... 117
          7.2.3   Estimation of Velocity and Zero-Offset Time ........ 118
    7.3   Improved Subspace Approach ................................. 119
    7.4   Performance Analysis ....................................... 120
          7.4.1   Perturbation of the Signal and Orthogonal
                  Subspaces .......................................... 120
          7.4.2   Statistical Property of the Noise Matrix ........... 121
          7.4.3   Perturbation of the Time Delay ..................... 122
                  7.4.3.1   Extrema Searching: MUSIC and MN .......... 123
                  7.4.3.2   Polynomial-Rooting: MUSIC and MN ......... 124
                  7.4.3.3   ESPRIT Algorithm ......................... 127
                  7.4.3.4   Vector-Wise ESPRIT Algorithm ............. 128
                  7.4.3.5   Mean-Squared Error of Time-Delay
                            Estimation ............................... 129
          7.4.4   Relating Time-Delay Estimation to Parameter
                  Estimation ......................................... 130
    7.5   Simulations ................................................ 132
    7.6   Conclusion ................................................. 135
    7.7   References ................................................. 136
    7.8   Appendix A, Verification of Equations (7.60) ............... 140

8   Enhanced Covariance Estimation with Application to the Velocity
    Spectrum ......................................................... 141
    R. Lynn Kirlin
    8.1   Spatial Smoothing .......................................... 142
    8.2   Improvements Using Cross-Covariance Submatrices ............ 144
    8.3   Applications in Subarray Processing ........................ 147
          8.3.1   A Computationally Efficient Transformation ......... 149
          8.3.2   Adding Forward-Backward Smoothing .................. 150
          8.3.3   Simulations ........................................ 153
    8.4   Spatial Smoothing Applied to Hyperbolic Wavefronts for
          Velocity Estimation ........................................ 154
          8.4.1   Semblance Review ................................... 155
          8.4.2   Semblance and the Conventional Beamformer .......... 156
          8.4.3   The Optimal Velocity Estimator with Spatial
                  Smoothing .......................................... 158
                  8.4.3.1   Enhancement of the Estimates of
                            Covariance Matrices ...................... 159
                  8.4.3.2   The New Velocity Estimator ............... 160
          8.4.4   Comparison of Coherency Measure Threshold
                  Discrimination ..................................... 161
          8.4.5   Discussion ......................................... 164
    8.5   Toeplitz and Positive Definite Constraints for
          Covariance Enhancement ..................................... 164
    8.6   References ................................................. 167

9   Waveform Reconstruction and Elimination of Multiples and Other
    Interferences .................................................... 169
    R. Lynn Kirlin
    9.1   Signal-Plus-Interference Subspace .......................... 170
    9.2   Conventional Waveform Estimators ........................... 172
    9.3   Subspace Estimators ........................................ 172
    9.4   Interference Canceling ..................................... 173
    9.5   Hampson's Multiple Elimination Method ...................... 177
    9.6   Structural Comparison of the Subspace Methods to
          Hampson's Algorithm ........................................ 178
    9.7   Discussion on Hampson's versus Subspace .................... 181
    9.8   References ................................................. 183

10  Removal of Interference Patterns in Seismic Gathers .............. 185
    William J. Done
    10.1  An Interference Cancelation Approach ....................... 186
    10.2  The Eigendecomposition Interference Canceling Algorithm .... 187
    10.3  Suppression of Interference, Marine Acquisition Case ....... 192
    10.4  Suppression of Repeating Refraction, Marine
          Acquisition Case ........................................... 196
    10.5  Suppression of Repeating Refraction, Land
          Acquisition Case ........................................... 206
    10.6  References ................................................. 225

11  Principal Component Methods for Suppressing Noise and
    Detecting Subtle Reflection Character Variations ................. 227
    Brian N. Fuller
    11.1  Introduction ............................................... 227
    11.2  A Brief Mathematical Description ........................... 228
    11.3  Sensitivity of Principal Components to Lithologic
          Variations ................................................. 230
    11.4  Noise Reduction in the Synthetic Data ...................... 234
    11.5  A Real Data Example ........................................ 235
    11.6  Interpretation of the Real Data ............................ 236
    11.7  Noise Suppression in the Real Data ......................... 237
    11.8  Discussion ................................................. 239
    11.9  Conclusions ................................................ 239
    11.10 References ................................................. 239

12  Eigenimage Processing of Seismic Sections ........................ 241
    Tadeusz J. Ulrych, Mauricio D. Sacchi, and Sergio L. M. Freire
    12.1  Introduction ............................................... 241
    12.2  Theory ..................................................... 242
          12.2.1  Eigenimages and the KL Transformation .............. 245
          12.2.2  Eigenimages and the Fourier Transform .............. 250
          12.2.3  Computing the Filtered Image ....................... 251
    12.3  Applications ............................................... 252
          12.3.1  Signal to Noise Enhancement ........................ 252
          12.3.2  Wavefield Decomposition ............................ 256
                  12.3.2.1  Event Identification ..................... 257
                  12.3.2.2  Vertical Seismic Profiling ............... 263
          12.3.3  Residual Static Correction ......................... 265
    12.4  Discussion ................................................. 268
    12.5  References ................................................. 272

13  Single-Station Triaxial Data Analysis ............................ 275
    G. M. Jackson, I. M. Mason, and S. A. Greenhalgh
    13.1  Introduction ............................................... 275
    13.2  Time Windows in Polarization Analysis ...................... 275
    13.3  The Triaxial Covariance Matrix ............................. 276
    13.4  Principal Components Transforms by SVD ..................... 278
    13.5  Analysis of the Results of SVD ............................. 283
    13.6  Summary .................................................... 287
    13.7  References ................................................. 289

14  Correlation Using Triaxial Data from Multiple Stations
    in the Presence of Coherent Noise ................................ 291
    M. J. Rutty and S. A. Greenhalgh
    14.1  Introduction ............................................... 291
    14.2  Single-Station Polarization Analysis ....................... 292
          14.2.1  The Analysis Domain ................................ 293
          14.2.2  Interfering Events and Coherent Noise .............. 294
          14.2.3  The Significance of an Eigenvalue .................. 296
          14.2.4  Seismic Direction Finding .......................... 297
    14.3  Polarization Analysis Using Two Triaxial Stations .......... 297
          14.3.1  The Binocular 6 × 6 Covariance Matrix .............. 298
          14.3.2  The Multistation Vector Space ...................... 303
    14.4  Implementation of Multicomponent Binocular
          Correlation ................................................ 303
    14.5  Synthetic Data Results ..................................... 306
    14.6  A Physical Model Example ................................... 314
    14.7  Conclusions ................................................ 316
    14.8  References ................................................. 321

15  Parameterization of Narrowband Rayleigh and Love Waves
    Arriving at a Triaxial Array ..................................... 323
    R. Lynn Kirlin, John Nabelek, and Guibiao Lin
    15.1  Introduction ............................................... 323
    15.2  Background ................................................. 323
    15.3  Estimation of the Component Powers ......................... 325
    15.4  Results Using 0.1-0.2 Hz Geophysical Data at a
          Triaxial Array ............................................. 329
    15.5  Signal Model in the Case of One Rayleigh and One
          Love Wave .................................................. 330
    15.6  Application of the MUSIC Algorithm to the Array Data ....... 337
    15.7  Conclusions ................................................ 339
    15.8  References ................................................. 339

Covariance Analysis for Seismic Signal Processing
Editors and Authors:
R. Lynn Kirlin
Electrical and Computer Engineering Department
University of Victoria
Victoria, British Columbia, Canada

William J. Done
6204 S. 69th Place
Tulsa, Oklahoma 74133

Other Contributors:
Sergio L. M. Freire
Petrobras - DEXBA/DEPEX
Salvador, Bahia - Brazil

Hui Liu
Department of Electrical Engineering
Portland State University
Portland, Oregon

Brian N. Fuller
Paulsson Geophysical Services, Inc.
7035 S. Spruce Dr. E.
Englewood, Colorado

I. M. Mason
ARCO Geophysical Imaging Laboratory
Department of Engineering Science
Oxford University
Parks Road, Oxford, U. K.

S. A. Greenhalgh
School of Earth Sciences
Flinders University of South Australia
Bedford Park, Adelaide, Australia

John Nabelek
College of Oceanic and Atmospheric
Sciences
Oregon State University
Ocean Admin. Bldg. 104
Corvallis, OR 97331

G. M. Jackson
Elf Geoscience Research Centre
114A Cromwell Road
London, U. K.

M. J. Rutty
School of Earth Sciences
Flinders University of South Australia
Bedford Park, Adelaide, Australia

Fu Li
Department of Electrical Engineering
Portland State University
Portland, Oregon

Mauricio D. Sacchi
Department of Geophysics and Astronomy
University of British Columbia
Vancouver, Canada

Guibiao Lin
College of Oceanic and Atmospheric Sciences
Oregon State University
Ocean Admin. Bldg. 104
Corvallis, OR 97331

Tadeusz J. Ulrych
Department of Geophysics and Astronomy
University of British Columbia
Vancouver, Canada

Acknowledgments
The editors are indebted to the contributing authors for their efforts and
patience during the preparation of the manuscript.
Our appreciation is owed to John Claassen, Sandia National Laboratories, and Lonnie Ludeman, Dept. of Electrical & Computer Engineering, New Mexico State University, for their review of the manuscript.

Kurt Marfurt, University of Houston (formerly with Amoco Tulsa Technology Center), provided valuable suggestions for improvements to, and figures for, the first five chapters.

Maureen Denning, Dept. of Electrical and Computer Engineering, University of Victoria, prepared the first draft of several of Lynn Kirlin's chapters.

We also thank Julie Youngblood and Vicki Wilson, University of Houston (formerly with Amoco Tulsa Technology Center), for their efforts in producing the manuscript from the individual contributors' documents and its many revisions.


Chapter 1
Introduction
R. Lynn Kirlin
This reference is intended to give the geophysical signal analyst sufficient material to understand the usefulness of data covariance matrix analysis in the processing of geophysical signals. A background in basic linear algebra, statistics, and fundamental random signal analysis is assumed. This reference is unique in that the data vector covariance matrix is used throughout. Rather than dealing with only one seismic data processing problem and presenting several methods, we concentrate on one fundamental methodology, analysis of the sample covariance matrix, and present many seismic data problems to which the methodology applies.
This is very much like writing about seismic applications of spectral or Fourier analysis. With Fourier analysis, the data are represented in a domain other than the original, and each independent estimate of frequency content contains a measure of information about the source data. With covariance analysis, information from the data has been compressed into the elements of the covariance matrix, and the structure of the covariance matrix, if viewed properly, contains similar independent measures of information about the data. The major difference is that the Fourier transform gives a one-to-one mapping of the data and is therefore invertible. The covariance matrix is a many-to-one mapping that, when appropriately applied, compresses the voluminous original data into a much smaller amount, still sufficient to adequately estimate the desired unknown parameters within the data.
We will demonstrate the methodology of covariance matrix analysis and
relate the covariance matrix structure to the physical parameters of interest in
a number of seismic data analysis problems. In some cases, we will be able to

relate covariance methods analytically to more conventional methods, and even make quantitative comparisons. For example, the semblance coefficient is a well-known measure of coherence. It is normally displayed in the velocity spectrum: contours of coherence as a function of two-way traveltime and velocity estimate. We will show in Chapters 6 and 8 that, with the covariance matrix formulation of semblance, we can improve the signal-to-noise ratio present in the resulting velocity spectrum.
Introductory material in Chapters 2 through 5 presents aspects of estimating covariance matrices, including the statistics of sample covariance estimators, iterative estimators, robust estimators, and the incorporation of a priori information. The data vector covariance matrix is not an artificial structure used to analyze spatio-temporal data; rather, it results either from the solution to a least-squares problem or from the solution of an optimal estimation or detection problem in which the data are drawn from distributions of the exponential family, most commonly the Gaussian or normal.
Most applications of the sample covariance matrix will require decomposition of the covariance matrix into its eigenstructure (eigenvectors and eigenvalues) or factored form (Chapter 3). The eigenstructure components have a general statistical definition, but they also may have a physical definition when the data are temporal, spatial, or spatio-temporal. The physical parameter of interest often is not directly observable in the eigenstructure, but the eigenstructure may provide the best means of observing the parameter.
Because the sample data vector covariance matrix is a random matrix, any
estimates of its eigenstructure are also random variables. Thus methods that
assume exact knowledge of the true data covariance matrix will not give exact
results. The variability of such results (estimates of rms velocities, for example) is of considerable concern, particularly when the data record is short in either space or time.
When the data are composed of structured signal plus noise, the
covariance matrix reveals a signal subspace and its complementary orthogonal
(noise) subspace. Chapter 4 elaborates on this important topic and also
includes material on eigenstructure and subspace statistics.
The fundamentals of eigenstructure and the linear algebra necessary to
apply it are in Chapter 4. Understanding the material in Chapter 4 is basic to
the parameter estimation and signal-enhancement applications of most of the
following chapters.

Chapter 5 is an overview of many of the well-known covariance-matrix-based, high-resolution estimators of sinusoidal frequencies or, alternatively, of directions of arrival of plane waves, depending on whether the sample vector elements span time or space. The relation of these estimators to the semblance coefficient is clarified.
Chapters 6 through 9 expand on material from Chapter 5. These extensions involve enhancements and applications of modern spectral analysis methods to the estimation of seismic wavefront parameters, specifically velocity and waveform. A comparison of the covariance method to the conventional semblance coefficient is followed by an enhanced semblance coefficient in Chapter 6. Li and Liu address a new optimal estimation of velocity and two-way, zero-offset traveltime in Chapter 7; this includes a statistical performance analysis and is exemplary for that aspect of our general subject.
Because estimates of covariance matrices are not exact, waveforms are
often wideband, and sources for different directions of arrival are often
correlated (multipath signals), special methods have been developed to loosen
the usual narrowband and uncorrelated signal assumptions. Approaches called
spatial smoothing and frequency focusing are among others described in
Chapter 8. Significant enhancements to the velocity spectrum are
demonstrated.
Enumerating waves and estimating their directions of arrival is a first step
in processing data with multiple signal wavefronts. A second process is the
actual estimation of the waveforms. Chapter 9 discusses several analytic
approaches. Depending on the signal models and approaches, we may
reconstruct either whole velocity bands of data (Hampson-Thorson and
Radon transform methods) or distinct signals (eigenstructure approaches).
In a sense, Chapter 9 is a preliminary to the wavefront enhancements presented in Chapters 10, 11, and 12 by Done, Fuller, and Ulrych et al., respectively. These chapters are all variations on the theme of enhancing 2-D signals or features of interest from data that obscure these features because of interference and noise. In Chapter 10, Done shows how construction of a Karhunen-Loeve code, derived from the covariance analysis of interference signal patterns in one region of 2-D data, may be used in another region to remove data correlated to that interference. Fuller's work in Chapter 11 shows how subtle variations in thin-bed geology can be detected and displayed with eigenstructure methods. Results there are quite dramatic. In Chapter 12,

Ulrych et al. provide several demonstrations of the usefulness of singular-value decomposition for enhancing seismic components through the use of eigenimages, orthogonal images that make up the raw 2-D seismic data record. Their work is accompanied by thorough theoretical analyses.
Three chapters demonstrate the application of covariance subspace analysis to three-component data (triaxial geophones). In Chapter 13, Jackson et al. analyze three-component data at a single station, and, in Chapter 14, Rutty and Greenhalgh extend the work to multiple stations. From the covariance matrix eigenstructure, they produce signal-space-enhanced waveforms and test statistically for rectilinearity. Rayleigh and Love waves in the 0.1-0.2 Hz range coincidently arriving at triaxial arrays are analyzed by Kirlin et al. in Chapter 15. This work separates the two waves by estimating the joint covariance matrix of their components. Recent work from other authors regarding the number of waves and parameters that can be separated and estimated using vector-sensor arrays is also included in Chapter 15.
Thus covariance analysis of seismic data is seen to be of current interest to
many researchers and a method amenable to many distinct applications. We
are not attempting to provide an encyclopedia of these applications nor of the
theory and the literature that has developed to date. Instead, we wish to
provide a diverse sampling and a discussion of that work from a common
viewpoint.


Chapter 2
Data Vectors and Covariance Matrices
R. Lynn Kirlin
Seismic signals are sensed by geophones in land acquisition or by hydrophones in marine acquisition. Typically, the signals are excited in the earth
with some sort of energy source such as an explosion, vibrator, or air or water
gun. These signals travel through the subsurface structures and are reflected
from boundaries having distinct physical properties. Eventually they produce
multiple reflections observed at each recording phone. Often the geological
structure is not simple, and although simple models have sufficed for many
regions of exploration, the recorded reflections are interpreted well only if the
geophysicist has much experience and is familiar with other sources of information.
For much of the methodology, geological structure is assumed to be reasonably simple, such as horizontally layered strata. However, such simple
structures are not always required. Often it is only necessary that there be
knowledge that some specific temporal or spatial structure (coherence) is
present in the array of received signals in order to obtain some processing
advantage.
Now, here is some of the terminology inherent in the methods we will discuss. Analysis regions, data windows, data vectors, and covariance matrices are
shown in Figure 2.1 and are described in the following sections.


Figure 2.1. An analysis region spanning many traces and time samples; a moving analysis window within which spatio-temporal adaptive processing is done; and sample vector windows within the moving window are indicated.

2.1 Analysis Regions

A region of 2-D data from which information is to be obtained is called an analysis region, as shown in Figure 2.1. This region is usually rectangular, e.g., so many time points by so many traces. However, it may have another shape if warranted by some structure in the data. Referring to the source data shown in Figure 2.2a, nonrectangular analysis regions have been extracted and enlarged in Figures 2.2b-2.2e. These show, respectively, reflected energy, outgoing surface-wave noise, backscattering noise, and pump jack noise.


Figure 2.2. (a) Two NMO-corrected seismic gathers with windows that characterize (b) reflected energy, (c) outgoing surface-wave noise, (d) backscattering noise, and (e) pump jack noise.

2.2 Data Windows

Within the analysis region are many data points, and around each point the data may have features or parameters that are considered to have local stationarity. A window of data around this point may be analyzed separately to provide the desired local parameter estimate or localized information. Such a window may be called a running window or a sliding window. The running analysis window may be positioned at every point in the analysis region or only at selected points. The alternative windows range from maximally overlapping, where every point is a window center; to partially overlapping, where window centers are spaced somewhat closer than the window breadth or length; to nonoverlapping, where consecutive windows in either direction touch but do not overlap. Nontouching windows are also possible, but these omit some data from the analysis.
When choosing window size and spacing, it must be realized that smaller windows will allow more spatial or temporal variability in the output resulting from processing within the window, but fewer samples within the window will be available to estimate any parameters of interest. That is, small windows allow higher spatial or temporal frequency in the resulting parameter estimates, but provide less statistical stability (fewer degrees of freedom) for those estimators.

2.3 Data Vectors

Within each data window, divide the data into vectors as shown in Figure 2.1. The elements within these vectors are data points taken from vector windows, which are subwindows of the running window; these vector windows may have any shape within the constraints of the running window size. Commonly, the vector is from a vector window 1 × M in size, covering either M time points from the same trace ("down the trace"), allowing only temporal analysis; M points taken from M traces at the same time (time slice, snapshot, or "across traces"), allowing only spatial analysis; or along and parallel to a prescribed space-time curve, allowing constrained space-time analysis. Other vector windows are possible, such as every kth point, which would allow an M-length vector to span Mk points (subsampling). The vector window may also be two dimensional, such as 2 × M, resulting in a length-2M vector.

In any case, the vector window is moved over all possible positions within the data window, gathering a total of L sample vectors of data. Maximally overlapped vector windows are usually taken. The assignment of data points to the vector elements is arbitrary as long as it is known, but it is usually a logical ordering, such as from lesser time to greater time and from lesser offset to greater offset, and must be consistent from vector to vector. For example, a 2 × 5 vector window surrounding points of data x(i, j), i = 10 to 11 and j = -2 to 2, where i is the time index and j is the trace index, similar to that shown in Figure 2.1, would become the 2M = 10 by 1 vector:


$$\mathbf{x} = \big(x(10,-2),\ x(10,-1),\ \ldots,\ x(10,2),\ x(11,-2),\ \ldots,\ x(11,2)\big)^T = (x_1\ x_2\ x_3\ \ldots\ x_{10})^T,$$

where $[\,\cdot\,]^T$ indicates transpose. Subsequently, we will use $[\,\cdot\,]^H$ to indicate complex conjugate transpose. In Figure 2.1, the 2 × 5 vector window creates the 10 × 1 vector x.
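To make the windowing concrete, the following minimal NumPy sketch (an illustration of ours, with hypothetical names, not code from the original text) slides a 2 × 5 vector window over a data window and stacks the maximally overlapped sample vectors as the columns of a matrix X:

```python
import numpy as np

def sample_vectors(window, vshape=(2, 5)):
    """Slide a vshape vector window over a 2-D data window (time x trace)
    and return the maximally overlapped sample vectors as columns of X.
    Elements are ordered from lesser to greater time and from lesser to
    greater offset, consistently for every vector."""
    nt, nx = window.shape
    vt, vx = vshape
    cols = []
    for i in range(nt - vt + 1):          # every vertical position
        for j in range(nx - vx + 1):      # every horizontal position
            cols.append(window[i:i + vt, j:j + vx].ravel())
    return np.array(cols).T               # X is (vt*vx) x L

# A 2 x 5 vector window yields length-10 vectors, as in the example above.
rng = np.random.default_rng(0)
X = sample_vectors(rng.standard_normal((50, 24)))
print(X.shape)                            # (10, 980)
```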

2.4 Sample Data Covariance Matrix

Within the window, and if the mean of x is zero, the averaged sum of vector outer products $\mathbf{x}_i\mathbf{x}_i^T$, $i = 1, 2, \ldots, L$, gives the sample covariance matrix. It is the sample covariance matrix that we will be analyzing in most of the remainder of the book. However, the vectors will not always have come from a time-space window as discussed above. In such a case the distinction will be obvious.

In the foregoing, the vectors have been taken from 2-D time-trace data sets. Data vectors can come from anywhere. Another common source of seismic data vectors is two- or three-component geophones (see Chapters 13-15). In this situation, a vector x might contain just three elements, the three-component data samples at one point in time-space, as if there were a data vector window of size one by one in time/space. However, the vector may be extended to contain 3n elements, the samples from n geophones.
In any case, a collection of L such vectors from within a data window (there may be just one analysis window that spans the entire analysis region) may be averaged in the outer product to give the sample covariance matrix $C_x$:

$$C_x = \frac{1}{L}\sum_{i=1}^{L}\mathbf{x}_i\mathbf{x}_i^H. \qquad (2.1)$$

Again we have assumed that the data vectors are zero mean. When the vector mean is not zero, the mean first must be subtracted from x before forming
the outer products. Because, in practice, seismic data are zero mean, we generally have no need to estimate or remove any mean. However, some recording
systems, such as those for well logging, occasionally have trouble with dc bias.
Some processing systems include a debiasing routine.


In some applications, the vector elements are discrete Fourier transform values, one from each trace, all at the same frequency. L multiple frequencies or L mutually exclusive spatial regions might supply the needed sample vectors.
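As a brief illustration (our own sketch; the function name is hypothetical), equation (2.1) collapses to a single matrix product when the sample vectors are collected as the columns of a matrix X:

```python
import numpy as np

def sample_covariance(X):
    """Equation (2.1): C_x = (1/L) sum_i x_i x_i^H, where the sample
    vectors x_i are the columns of X. The vectors are assumed zero mean;
    subtract the sample mean first if they are not."""
    L = X.shape[1]
    return X @ X.conj().T / L             # conj() handles complex data

# Example: L = 500 complex sample vectors of length M = 10.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 500)) + 1j * rng.standard_normal((10, 500))
C = sample_covariance(X)
print(C.shape, np.allclose(C, C.conj().T))   # (10, 10) True; C_x is Hermitian
```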

2.5 Rationale for Sample Covariance Analysis

All the methodology in the remainder of this volume is based on the sample covariance matrix in equation (2.1). The sample covariance matrix arises
from many areas of science, engineering, and statistics: it is needed in multivariate data analysis, pattern recognition, least-squares problems, hypothesis
testing, parameter estimation, etc.
For example, if we draw one such length-M vector x from an N(m, R) distribution (Gaussian multivariate vectors with mean m and covariance R), and its a priori probability density function is (Eaton, 1983)

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{M/2}|R|^{1/2}}\,\exp\{-(\mathbf{x} - \mathbf{m})^T R^{-1}(\mathbf{x} - \mathbf{m})/2\}, \qquad (2.2)$$

then the expected value of x is m, and the covariance of the vector x is R. If we don't know the covariance matrix, it may be estimated with the sample covariance. The elements of the sample covariance are random variables distributed with a Wishart distribution (see Section 2.6).
The true covariance matrix is the expected value $E\{\mathbf{x}\mathbf{x}^T\}$, where we assume the known mean is removed from x. Similarly, the sample covariance matrix, or any other estimate of R, is denoted $\hat{R}$; that is, $C_x = \hat{R}$, where generally we denote approximation by $(\hat{\ })$. When the mean is not known, it is estimated by the sample mean and then removed. I explicitly formulate $C_x$ in the following section.
Suppose we have L independent samples of x, indexed with i, each with density $f(\mathbf{x}_i) = N(\mathbf{m}, R)$. The joint density of these samples is their product:

$$f(\mathbf{x}) = \frac{|R|^{-L/2}}{(2\pi)^{LM/2}}\,\exp\Big\{-\frac{1}{2}\sum_{i=1}^{L}(\mathbf{x}_i - \mathbf{m})^T R^{-1}(\mathbf{x}_i - \mathbf{m})\Big\}, \qquad (2.3)$$


where $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L)$. It may be shown that the maximum likelihood estimate of m is

$$\hat{\mathbf{m}} = \frac{1}{L}\sum_{i=1}^{L}\mathbf{x}_i,$$

the sample mean. Further, $\hat{\mathbf{m}}$ is distributed $N(\mathbf{m}, L^{-1}R)$, and $\frac{1}{L}R$ is the covariance of the errors in estimating m.

When m is known, the maximum likelihood estimate of R is $C_x$, the sample covariance matrix. This estimate of R and the above estimate of m are also appropriate when the vectors x have complex Gaussian elements and we define, for zero-mean x,

$$R = E\{\mathbf{x}\mathbf{x}^H\},$$

where $(\cdot)^H$ indicates complex conjugate transpose.
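A quick numerical illustration of these estimators (a hedged sketch of ours, not from the text): draw L vectors from N(m, R) and confirm that the sample mean and sample covariance approach m and R as L grows.

```python
import numpy as np

rng = np.random.default_rng(1)
M, L = 4, 100_000
m_true = np.arange(M, dtype=float)
A = rng.standard_normal((M, M))
R_true = A @ A.T + M * np.eye(M)       # some well-conditioned positive definite R
x = rng.multivariate_normal(m_true, R_true, size=L).T   # M x L sample matrix

m_hat = x.mean(axis=1)                 # maximum likelihood estimate of m
xc = x - m_hat[:, None]
C_x = xc @ xc.T / L                    # sample covariance with the mean removed
print(np.abs(m_hat - m_true).max())    # small, and shrinking as L grows
print(np.abs(C_x - R_true).max())      # small, and shrinking as L grows
```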

2.6 Statistics of the Sample Covariance Matrix

The density of a single sample zero-mean complex Gaussian vector x is

$$f(\mathbf{x}) = \frac{1}{\pi^M|R|}\,\exp\{-\mathbf{x}^H R^{-1}\mathbf{x}\}. \qquad (2.4)$$

Thus L independent samples of the complex vector x have joint density

$$f(X) = \frac{1}{\pi^{LM}|R|^L}\,\exp\Big\{-\sum_{i=1}^{L}\mathbf{x}_i^H R^{-1}\mathbf{x}_i\Big\}. \qquad (2.5)$$

If $C_x$ is a sample complex covariance matrix, then $LC_x$ has its distinct real and imaginary elements, $\mathrm{Re}\{A_{pq}\}$ and $\mathrm{Im}\{A_{pq}\}$, distributed with a complex Wishart density (Eaton, 1983). For

$$A = LC_x = \sum_{i=1}^{L}\mathbf{x}_i\mathbf{x}_i^H, \qquad A_{pq} = \mathrm{Re}\{A_{pq}\} + j\,\mathrm{Im}\{A_{pq}\},$$

this density is

$$f(A) = \frac{1}{h(R, L, M)}\,|A|^{L-M}\exp\{-\mathrm{Tr}(R^{-1}A)\}, \qquad (2.6)$$

where

$$h(R, L, M) = \pi^{M(M-1)/2}\,\Gamma(L)\cdots\Gamma(L - M + 1)\,|R|^L,$$

and Tr[·] indicates the trace, the sum of the diagonal elements.
When x is zero-mean with real elements, the distinct M(M+1)/2 elements of $S = LC_x$ jointly have the Wishart probability density (Eaton, 1983; Goodman, 1963; Anderson, 1958):

$$f(S) = \frac{1}{W(L, M)}\,|R|^{-L/2}\,|S|^{(L-M-1)/2}\exp\Big\{-\frac{1}{2}\mathrm{Tr}[R^{-1}S]\Big\}, \qquad (2.7)$$

where

$$W(L, M) = 2^{LM/2}\,\pi^{M(M-1)/4}\,\Gamma\Big(\frac{L}{2}\Big)\cdots\Gamma\Big(\frac{L - M + 1}{2}\Big).$$

The elements $S_{ik}$ and $S_{ki}$ are equal and therefore not distinct.
When the $\mathbf{x}_i$ have nonzero mean $\mu$, then $\mathbf{x}_i$ in equation (2.5) is replaced with $\mathbf{x}_i - \hat{\mu}$, and $\mathbf{x}_i$ in $C_x$ of equation (2.1) is replaced with $\mathbf{x}_i - \hat{\mu}$, where

$$\hat{\mu} = \frac{1}{L}\sum_{i=1}^{L}\mathbf{x}_i.$$

For this case, where we have had to estimate an unknown mean, the Wishart densities are rewritten similar to equations (2.6) and (2.7), except that L is replaced by L - 1, the degrees of freedom of $LC_x$.
An iterative estimator of R, given $C_x$ and the constraint that R must be Toeplitz (i.e., $R = R^H$ and all elements of any one diagonal or off-diagonal are equal), is given by Burg et al. (1982), who show that, for the real-L-vector normal density of equation (2.3), the R matrix that maximizes the likelihood (ML) also maximizes the function

$$g(C_x, R) = -\log|R| - \mathrm{Tr}(R^{-1}C_x). \qquad (2.8)$$


Defining the variation of R to be

$$\delta R = (\delta R_{ij}), \qquad i, j = 1, 2, \ldots, M,$$

shows that the variation of $g(C_x, R)$ for a Toeplitz-constrained (or any) variation in R must satisfy

$$\delta g(C_x, R) = \mathrm{Tr}[(R^{-1}C_x R^{-1} - R^{-1})\,\delta R] = 0. \qquad (2.9)$$

(Matrix A is Toeplitz if $A_{i,j} = A_{i+m,j+m}$.) Without any constraints, it is easy to see that the maximum likelihood (ML) solution is $R = C_x$. Several other simple cases are given in Burg et al. (1982).
The algorithm employing what is termed inverse iteration is simple to describe:

Step 1) Find a Toeplitz variation $\Delta_k$ satisfying $\delta g(C_x, R_k) = 0$ for $\delta R = \Delta_k$.
Step 2) Set $R_{k+1} = R_k + \Delta_k$.

Further details on implementing the algorithm are not simple; those details and some examples may be found in Burg et al. (1982). This is a powerful algorithm, however, since ML solutions are valuable; ML estimates of parameters that are functions of R can be found from the ML estimate of R.
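Burg's inverse iteration itself is too involved for a short sketch, but the objective in equation (2.8) is easy to evaluate. The following numerical check (our construction, not the authors' algorithm) confirms that, without constraints, g(C_x, R) peaks at R = C_x:

```python
import numpy as np
from numpy.linalg import slogdet, inv

def g(Cx, R):
    """Equation (2.8): g(C_x, R) = -log|R| - Tr(R^{-1} C_x)."""
    _, logdet = slogdet(R)
    return -logdet - np.trace(inv(R) @ Cx)

rng = np.random.default_rng(2)
Cx = np.cov(rng.standard_normal((4, 5000)))   # a well-conditioned sample covariance
trials = []
for _ in range(200):
    P = 0.05 * rng.standard_normal(Cx.shape)
    trials.append(g(Cx, Cx + (P + P.T) / 2))  # small symmetric perturbations of C_x
print(g(Cx, Cx) >= max(trials))               # True: R = C_x maximizes g
```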
When the $\mathbf{x}_i$ are not Gaussian, the sample covariance matrix still plays an important role. If we choose the elements of a coefficient matrix A in a linear model $\mathbf{x}_i = A\mathbf{y} + \mathbf{e}_i$ for the data $\mathbf{x}_i$ in such a way as to minimize the sum square of the errors, then the trace of the error covariance matrix

$$\sum_i (A\mathbf{y} - \mathbf{x}_i)(A\mathbf{y} - \mathbf{x}_i)^T$$

is to be minimized. Similarly, the modes of variance of the data $\mathbf{x}_i$ can be found from $C_x$; this is discussed in Chapter 3, where the eigenvectors of $C_x$ are seen to define the modes of variation in the data.

2.7 Robust Estimation of Sample Covariance Matrices

Often the data vector x contains not only signal and Gaussian noise, but also wild points. The wild points in seismic data arise from a number of sources, including dead geophones or hydrophones, noisy phones, poor phone placement, local noise sources that cause one trace to include significantly different data from its neighbors, faults in other acquisition hardware, and temporally transient noise sources such as lightning-induced noise, earth tremors, etc. Even large-amplitude noise events can be wild-point in nature, especially when the data are sorted into a different order, e.g., CMP. Also, in marine data, interference occurs when other seismic vessels are shooting nearby.

When the set of sample vectors is large, one or two wild points in time or space may not result in a serious difference between $C_x$ and R, but a dead or noisy trace is certainly going to result in a significant error in one row and column of $C_x$. Detectors of such errors and robust estimators are useful or necessary in such situations. The effects of such errors depend greatly on the application of the covariance analysis.
We have already mentioned maximum likelihood (ML) estimation of
structured covariance matrices. That method assumes normal data, which is
true of the algorithms to be presented in the rest of this text as well, including
those of Chapter 8, where methods of enhancing noisy covariance matrices are
presented.
The problems of dead, missing, and noisy traces have already been dealt
with by the industry, resulting in various interpolation or editing schemes.
However, it is worthwhile to note some literature that particularly addresses
the covariance estimation problem.
Robust methods of parameter estimation in general were dealt with in the fundamental work of Huber (1964). The results of that work are summarized, along with those of several others, by Andrews et al. (1971). Robust estimators of scalar covariances are presented by Mosteller and Tukey (1977). The fundamental ideas in these references have to do with trimming extreme points adaptively. Quite often the median is used as the location estimator, and the median absolute deviation (mad) from the median is used as a spread estimator. Knowledge of these two adaptively computed measures allows wild points to be defined as those in excess of k mad's, where k is selected ad hoc, often 5 to 9. Other methods of nonlinearly weighting data have been proposed (Andrews et al., 1971; Mosteller and Tukey, 1977; Devlin et al., 1981). These often lead to iterative procedures, because after a location and a spread parameter are computed and wild points trimmed, the remaining data can be reexamined for location and spread, etc.
Many such robust methods have been proposed for estimating covariance matrices. Nine of these were tested under various types of noise by Devlin et al. (1981). Of the nine methods, the raw covariance $C_x$ is best only with uncontaminated normal data. An adaptive multivariate trimming (MVT) and two robust weight-assignment methods (all three are iterative procedures) are the best choices for a variety of situations. It is noted that methods that tend to fit the whole covariance matrix simultaneously are better than methods that fit elements independently. Considering the typical a priori information that gives structure to covariance matrices from seismic data, we expect whole-matrix fitting methods to be best, because matrix elements are not independent.
More recent studies of eigenstructure variability are available, both with
and without wild point contamination. An empirical presentation on eigenstructure under contaminated Gaussian noise can be found in Moghaddamjoo
(1988).
Several measures of closeness of fit to either the covariance matrix or its
eigenstructure have been proposed. For example, the unweighted sum of
squared errors of all covariance elements is the Frobenius norm. An error
between a true and an estimated eigenvector might be measured with the
Euclidean norm or with the separating angle.
In the end, the error that counts is the error in estimating the desired
parameter, for example rms velocity or the probability of resolution. However,
because the covariance matrix estimate occurs first, its accurate estimation is of
prime importance. Chapter 8 will deal with this explicitly.
An iterative robust covariance matrix estimator ending in a positive definite matrix is presented by Campbell (1980), and because it is typical we
repeat it here, although some that are mentioned in Moghaddamjoo (1988)
also give positive definiteness.
The robust estimator of the mean is

$$\bar{\mathbf{x}} = \sum_{i=1}^{L} w_i\mathbf{x}_i \bigg/ \sum_{i=1}^{L} w_i, \qquad (2.10)$$

and the robust estimator of R is

$$\hat{R} = \sum_{i=1}^{L} w_i^2(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T \bigg/ \Big(\sum_{i=1}^{L} w_i^2 - 1\Big), \qquad (2.11)$$

where

$$w_i = w(d_i) = \omega(d_i)/d_i, \qquad (2.12)$$

$$d_i = \big((\mathbf{x}_i - \bar{\mathbf{x}})^T\hat{R}^{-1}(\mathbf{x}_i - \bar{\mathbf{x}})\big)^{1/2},$$

and

$$\omega(d) = \begin{cases} d, & d \le d_0 \\ d_0\exp\{-\tfrac{1}{2}(d - d_0)^2/b_2^2\}, & d > d_0, \end{cases} \qquad d_0 = \sqrt{v} + b_1/\sqrt{2}.$$

The constant v is the degrees of freedom (dof) of d (assumed $\chi_v^2$), and $b_1$ and $b_2$ are chosen as:

1) $b_1 = \infty$, $b_2$ irrelevant: conventional estimation;
2) $b_1 = 2$, $b_2 = \infty$: nondescending Huber form (Campbell, 1980); and
3) $b_1 = 2$, $b_2 = 1.25$: redescending Hampel form (Campbell, 1980).

The computations of $\bar{\mathbf{x}}$ and $\hat{R}$ are iterative, starting perhaps with the sample mean or median for $\bar{\mathbf{x}}$. Because $w_i^2 \le 1$, the degrees of freedom are at most $\sum_i w_i^2 - 1$.

Expression (2.12) simply weights the ith vector's outer product with unity for small deviations from the mean, but with less than unity for greater deviations. Note that if R is diagonal and has no zeros on the diagonal, d is a $\chi^2$ random variable with M - 1 degrees of freedom; M is the vector length, and one dof is removed for estimating the mean vector. This dof holds for general (nondiagonal) R as well.
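A compact sketch of this iteration follows (the redescending Hampel form by default; the loop control and starting values are our own choices, hedged against Campbell's (1980) exact scheme):

```python
import numpy as np

def robust_mean_cov(X, b1=2.0, b2=1.25, n_iter=20):
    """Iterative robust mean and covariance, equations (2.10)-(2.12).
    X is M x L, with the sample vectors as columns."""
    M, L = X.shape
    d0 = np.sqrt(M - 1) + b1 / np.sqrt(2.0)      # v = M - 1 dof for d
    xbar = np.median(X, axis=1)                  # robust starting location
    R = np.cov(X)
    for _ in range(n_iter):
        xc = X - xbar[:, None]
        d = np.sqrt(np.einsum('il,il->l', xc, np.linalg.solve(R, xc)))
        omega = np.where(d <= d0, d, d0 * np.exp(-0.5 * (d - d0) ** 2 / b2 ** 2))
        w = omega / np.maximum(d, 1e-12)         # equation (2.12)
        xbar = (X * w).sum(axis=1) / w.sum()     # equation (2.10)
        xc = X - xbar[:, None]
        R = (w ** 2 * xc) @ xc.T / (np.sum(w ** 2) - 1)   # equation (2.11)
    return xbar, R

# Clean Gaussian data with a few wild vectors mixed in.
rng = np.random.default_rng(3)
X = rng.standard_normal((5, 400))
X[:, :8] += 25.0                                 # eight wild points
xbar, R = robust_mean_cov(X)
print(np.abs(xbar).max() < 0.5)                  # True: wild points are downweighted
```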
Campbell (1980) also proposes a robust principal component analysis. The eigenstructure can either be calculated from the robust covariance matrix of the above procedure, or the weights can be determined through the means and variances of the principal components of the $\mathbf{x}_i$. This will be detailed in Section 3.7 of the next chapter.

2.8 References

Anderson, T. W., 1958, An introduction to multivariate statistical analysis: John Wiley & Sons, Inc.

Andrews, D. F., Bickel, P. J., Hampel, F. R., Rogers, W. H., and Tukey, J. W., 1971, Robust estimates of location: Princeton Univ. Press.

Burg, J. P., Luenberger, D. G., and Wenger, D. L., 1982, Estimation of structured covariance matrices: Proceedings of the IEEE, 70, 963-974.

Campbell, N. A., 1980, Robust procedures in multivariate analysis, I: Robust covariance estimation: Appl. Stat., 29, 231-237.

Devlin, S. J., Gnanadesikan, R., and Kettenring, J. R., 1981, Robust estimation of dispersion matrices and principal components: J. Amer. Stat. Assoc., 76, 354-362.

Eaton, M. L., 1983, Multivariate statistics: John Wiley & Sons, Inc.

Goodman, N. R., 1963, Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction): Ann. Math. Stat., 34, 152-177.

Huber, P. J., 1964, Robust estimation of a location parameter: Ann. Math. Stat., 35, 73-101.

Moghaddamjoo, A., 1988, Eigenstructure variability of the multiple-source, multiple-sensor covariance matrix with contaminated Gaussian data: IEEE Trans. Acoust., Speech, and Sig. Proc., 153-167.

Mosteller, F., and Tukey, J. W., 1977, Data analysis and regression: Addison-Wesley Publ. Co.


Chapter 3
Eigenstructure, the Karhunen-Loeve Transform, and Singular-Value Decomposition
R. Lynn Kirlin
An M × M covariance matrix R exhibits many special properties. For example, it is complex Hermitian, equal to its conjugate transpose, $R^H = R$; it is positive semidefinite, $\mathbf{x}^H R\mathbf{x} \ge 0$. Because of the latter, its eigenvalues are greater than or equal to zero as well. In many cases, it is also Toeplitz, $R_{i,j} = R_{i+m,j+m}$; that is, elements along any given diagonal are equal. In this chapter, I will review some of the more important properties of covariance matrices and their eigenstructure, and discuss some simple applications.

3.1 Eigenstructure and Least-Squares Fit of a Random Vector

To review eigenstructure and simultaneously demonstrate one of its uses, consider that we have samples from a distribution of zero-mean real vectors $\mathbf{x}_k$, $k = 0, 1, \ldots, L-1$, and that each vector is M by 1 (M × 1). Suppose we wish to find one vector v of unit length such that the projection onto v of any of the vectors x chosen at random will be closest to a scalar multiple of v in the mean-squared-error sense. That is, we need to find v such that $E\{\|\mathbf{x} - \hat{\mathbf{x}}\|^2\}$ is minimized, where $\hat{\mathbf{x}} = \alpha\mathbf{v}$ and $\alpha = \mathbf{v}^T\mathbf{x}$, while constraining $\mathbf{v}^T\mathbf{v} = 1$. The solution of this constrained minimization shows that v is the eigenvector associated with the largest eigenvalue $\lambda$ of $R_x = E\{\mathbf{x}\mathbf{x}^H\}$, approximated by $C_x = \frac{1}{L}XX^H$, the sample covariance of the $\mathbf{x}_k$ as in equation (2.1). That is, for

$$R_x\mathbf{v} = \lambda\mathbf{v}, \qquad (3.1)$$


then, for the largest value $\lambda$ and associated v satisfying equation (3.1),

$$\hat{\mathbf{x}} = (\mathbf{v}^T\mathbf{x})\,\mathbf{v} \qquad (3.2)$$

gives minimum $E\{\|\mathbf{x} - \hat{\mathbf{x}}\|^2\}$. Note that the scale factor on v is $\mathbf{v}^T\mathbf{x}$. When the L vectors $\mathbf{x}_k$ are drawn from an infinite set, the sample covariance $C_x$ replaces $R_x$.
There are M eigenvalues and M associated eigenvectors that satisfy equation (3.1). Throughout the rest of this text, we will assume that the eigenvectors are ordered such that

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M. \qquad (3.3)$$

When it is necessary, we will distinguish the eigenstructure of $C_x$ from that of R with the usual $(\hat{\ })$ notation, because the eigenstructure of $C_x$ only approximates that of R (see Section 4.2.2 for statistics of the eigenstructure estimates). Estimation of R is explored more thoroughly in Chapter 8. The rank of the sample covariance matrix $C_x$ is the same as the number of independent vectors x from which it was created, up to a maximum of M. The rank of a covariance matrix R is the same as the number of eigenvalues greater than zero.

Most scientific software packages contain algorithms for finding the eigenstructure of matrices. However, not all do, and a smaller number will find the eigenstructure of complex covariance matrices. Fewer still, if any, allow the user to find only the largest or smallest m eigenvalues and associated eigenvectors without calculating all of them. This capability is a significant computational advantage, particularly when the rank of the signal-derived part of R is small compared to its size. Often only the single largest or smallest $\lambda$ and its eigenvector are of interest.
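The constrained minimization of this section is easy to reproduce numerically. In the following sketch (our own illustration), the dominant eigenvector of C_x recovers the direction along which synthetic vectors were generated:

```python
import numpy as np

rng = np.random.default_rng(4)
M, L = 8, 2000
v_true = rng.standard_normal(M)
v_true /= np.linalg.norm(v_true)
# Zero-mean vectors: a random multiple of v_true plus white noise.
X = np.outer(v_true, rng.standard_normal(L)) + 0.1 * rng.standard_normal((M, L))

Cx = X @ X.T / L                       # sample covariance, equation (2.1)
lam, V = np.linalg.eigh(Cx)            # eigenvalues in ascending order
v = V[:, -1]                           # eigenvector of the largest eigenvalue
xhat = np.outer(v, v @ X)              # x_hat = (v^T x) v for every sample
print(np.abs(v @ v_true))              # near 1: v recovers v_true (up to sign)
print(np.mean(np.sum((X - xhat) ** 2, axis=0)))   # the minimized mean-squared error
```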

3.2 The Eigenstructure Forms of the Covariance Matrix

The covariance matrix or the sample covariance matrix may be expanded into its eigenstructure forms. These forms are very useful both to the understanding and to the implementation of a number of covariance applications.

The M eigenvalues $\lambda_i$ are first ordered, as in equation (3.3) above, and associated with their corresponding eigenvectors $\mathbf{v}_i$. It may be shown then that the covariance matrix $C_x$ can be written

$$C_x = \sum_{i=1}^{M}\lambda_i\mathbf{v}_i\mathbf{v}_i^H = V\Lambda V^H, \qquad (3.4)$$

where V is the matrix of eigenvectors, $V = (\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_M)$, and $\Lambda$ is a diagonal matrix of eigenvalues, i.e., $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_M)$.

3.3 Singular-Value Decomposition and the Karhunen-Loeve Transform

We have seen that the sample covariance matrix is factorable into its eigenstructure form; that is,

$$C_x = V\Lambda V^H, \qquad (3.5)$$

where the columns $\mathbf{v}_i$ of V and the elements $\lambda_i$ of the diagonal of $\Lambda$ are, respectively, the eigenvectors and the eigenvalues of the equation

$$C_x\mathbf{v}_i = \lambda_i\mathbf{v}_i. \qquad (3.6)$$

Assuming zero mean, $C_x$ is formed by averaging the outer products $\mathbf{x}_i\mathbf{x}_i^H$, or, if $X = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L)$,

$$C_x = \frac{1}{L}XX^H. \qquad (3.7)$$

Whereas eigenstructure factors only square matrices, the singular-value decomposition (SVD) allows the factorization of any rectangular matrix into orthogonal components. The ability to factor rectangular matrices provides insight into the solution of equation sets and allows rectangular 2-D data to be separated into major and minor energy portions (low-rank approximation). The following is close to that in Scharf (1991), where the relationship between the SVD and related eigenstructures is established.

Let the above X matrix be M × p, so that the vectors $\mathbf{x}_i$ have length M as before. Consider that p ≤ M; then $C_x$ will have rank r no greater than p, and M - r eigenvalues of $C_x$ will be zero. Now let $G = X^HX$, a positive semidefinite p × p matrix. Because G is square, it can be factored into its eigenstructure:
\[ G = (V_1\ V_2) \begin{pmatrix} \Sigma_1^2 & 0 \\ 0 & \Sigma_2^2 \end{pmatrix} \begin{pmatrix} V_1^H \\ V_2^H \end{pmatrix}, \tag{3.8} \]

where $\Sigma_2^2$ is the diagonal matrix of the $p - r$ zero-valued eigenvalues of $G$, that is, $\Sigma_2^2 = 0$, and $\Sigma_1^2 = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2)$. $V_1$ then is $p \times r$ and $V_2$ is $p \times (p - r)$. Equation (3.8) easily verifies that $V_1^H G V_1 = \Sigma_1^2$ and $V_2^H G V_2 = \Sigma_2^2$, or

\[ \begin{pmatrix} V_1^H \\ V_2^H \end{pmatrix} X^H \left[ X (V_1\ V_2) \right] = \begin{pmatrix} \Sigma_1^2 & 0 \\ 0 & \Sigma_2^2 \end{pmatrix}. \tag{3.9} \]

Equation (3.9) forces $X(V_1\ V_2)$ to be expressible as

\[ X (V_1\ V_2) = (U_1\ U_2) \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix}, \tag{3.10} \]

where

\[ \begin{pmatrix} U_1^H \\ U_2^H \end{pmatrix} (U_1\ U_2) = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}. \]

Let $V = (V_1\ V_2)$, $U = (U_1\ U_2)$, and $\Sigma = \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix}$; then, by postmultiplying equation (3.10) by $V^H$, $X$ is found to be

\[ X = (U_1\ U_2) \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_1^H \\ V_2^H \end{pmatrix} = U \Sigma V^H \tag{3.11a} \]
\[ = U_1 \Sigma_1 V_1^H + U_2 \Sigma_2 V_2^H \tag{3.11b} \]
\[ = \sum_{i=1}^{r} \sigma_i u_i v_i^H + \sum_{i=r+1}^{p} \sigma_i u_i v_i^H, \tag{3.11c} \]

where the second sum is zero because $\sigma_i = 0$, $i = r+1, \ldots, p$. Note that $X$ is $M \times p$, $U$ is $M \times M$, and $V$ is $p \times p$. $U_1$ and $V_1$ both have $r$ columns. Equations (3.11a)-(3.11c) are the SVD of $X$; the vectors of $U$ and $V$ are the singular vectors, and the $\sigma_i$ are the singular values.
The vectors in $U$ and $V$ may be found from the eigenstructures of $XX^H$ and $X^H X$, respectively. That is,

\[ XX^H = U_1 \Sigma_1^2 U_1^H = \sum_{i=1}^{r} u_i \sigma_i^2 u_i^H \tag{3.12} \]

and

\[ X^H X = V_1 \Sigma_1^2 V_1^H = \sum_{j=1}^{r} v_j \sigma_j^2 v_j^H. \tag{3.13} \]

The equations $U_2 \Sigma_2^2 U_2^H = 0$ and $V_2 \Sigma_2^2 V_2^H = 0$ could be added to equations (3.12)-(3.13).
An alternate solution for the singular vectors $U_1$ is to first solve for the $r$ eigenvectors $V_1$; then, from equation (3.11b),

\[ U_1 = X V_1 \Sigma_1^{-1}. \tag{3.14} \]
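The relationships in equations (3.11)-(3.14) are easily verified numerically; the sketch below (the test dimensions are arbitrary) checks that the eigenvalues of $G = X^H X$ are the squared singular values and that equation (3.14) recovers $U_1$:

```python
import numpy as np

rng = np.random.default_rng(1)
M, p = 8, 5                        # M x p data matrix, p < M
X = rng.standard_normal((M, p))

U, s, Vh = np.linalg.svd(X, full_matrices=True)   # X = U Sigma V^H

# Eigenvalues of G = X^H X (p x p) are the squared singular values.
g_eig = np.sort(np.linalg.eigvalsh(X.conj().T @ X))[::-1]
assert np.allclose(g_eig, s**2)

# Equation (3.14): U1 recovered from V1 and Sigma1.
r = len(s)
U1 = (X @ Vh.conj().T[:, :r]) / s   # divide column i by sigma_i
assert np.allclose(U1, U[:, :r])
```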

3.3.1 The Karhunen-Loeve Transform

One use of the SVD is that it allows any of the columns $x_i$ of $X$ to be written as a linear combination of the singular vectors $u_k$ of $U$. Thus,

\[ x_i = \sum_{k=1}^{p} \alpha_k u_k = \sum_{k=1}^{p} (u_k u_k^H) x_i = U U^H x_i, \qquad \alpha_k = u_k^H x_i. \tag{3.15} \]

The transformation $T = U^H$ on any $x_i$ constitutes the Karhunen-Loeve transform (KLT), and the vector $U^H x_i$ contains the principal components of $x_i$. For random vectors, $U$ is found from $E\{xx^H\} = U \Lambda U^H$.
Similarly, any linear combination (LC) of the $x_i$ is an LC of the $u_k$. If only the first $r < p$ singular vectors are used, this is a low-rank approximation of $x_i$. More on this will be disclosed in Section 3.6 and in several subsequent chapters.
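A minimal sketch of the KLT of equation (3.15) follows (an arbitrary real test matrix is used; the complex case replaces transposes with conjugate transposes):

```python
import numpy as np

rng = np.random.default_rng(2)
M, L = 6, 40
X = rng.standard_normal((M, L))

# U from the eigenstructure of the (zero-mean) sample covariance.
Cx = X @ X.T / L
lam, U = np.linalg.eigh(Cx)          # eigenvalues in ascending order
U = U[:, ::-1]                       # reorder so u1 has the largest eigenvalue

scores = U.T @ X                     # KLT: principal components of each column
assert np.allclose(U @ scores, X)    # keeping all components reconstructs X
```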
If the data matrix $X$ has rank $p$ and only the first $r$ singular vectors are used in its approximation, then

\[ X_r = \sum_{i=1}^{r} \sigma_i u_i v_i^H = U_r \Sigma_r V_r^H, \tag{3.16} \]

where $U_r$, $\Sigma_r$, and $V_r$ are composed of the appropriate parts of $U$, $\Sigma$, and $V$. The error matrix is $\epsilon_r = X - X_r$, and the squared Frobenius norm of the error matrix (the sum of the squares of all its elements) is

\[ \|\epsilon_r\|^2 = \mathrm{Tr}\left[\epsilon_r^H \epsilon_r\right] = \sum_{i=r+1}^{p} \sigma_i^2. \tag{3.17} \]
If $x$ is a random vector and $U_r$ is the set of $r$ eigenvectors associated with the largest eigenvalues $\sigma_i^2$ of $E\{xx^H\}$, then $\hat{x} = U_r U_r^H x$ is the minimum mean-squared-error rank-$r$ approximation, and the error is identical to equation (3.17).

The low-rank approximation $X_r$ is a least-squares approximation to the data matrix $X$. But $X_r$ may not give the least-square-error rank-$r$ approximation to the signal part of any one noisy vector $x_i$, as will be shown in Section 3.6.
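The rank-$r$ approximation and its error, equations (3.16)-(3.17), can be checked directly; a minimal sketch with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
M, p, r = 12, 8, 3
X = rng.standard_normal((M, p))

U, s, Vh = np.linalg.svd(X, full_matrices=False)
Xr = (U[:, :r] * s[:r]) @ Vh[:r, :]   # X_r = U_r Sigma_r V_r^H, eq. (3.16)

# Squared Frobenius error equals the sum of the discarded sigma_i^2.
err2 = np.linalg.norm(X - Xr, 'fro')**2
assert np.isclose(err2, np.sum(s[r:]**2))   # eq. (3.17)
```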

3.3.2 Null Space and the Minimum Norm Solution

Clearly, for $p < M$ there are vectors $y$ of length $M$ which are not LCs of the $p$ independent vectors $x_i$ in $X$. Such vectors lie in the null space of $X$; they are not LCs of either the $p$ vectors $x_i$ or of the $p$ vectors in $U_1$, the submatrix of $U = (U_1\ U_2)$ associated with the $p$ nonzero singular values of $X$. Rather, they are LCs of the $M - p$ vectors in $U_2$.
A general vector $y$ of length $M$ has components both in the null space of $X$ and in the range of $X$, the range being defined as all vectors that are LCs of the $p$ independent columns of $X$. Let the two components of $y$ be denoted $y_x$ and $y_\perp$, respectively, in the range and null space:

\[ y = y_x + y_\perp. \tag{3.18} \]

It can be shown easily that

\[ y_x = \left[ X (X^T X)^{-1} X^T \right] y = P^{\#} y, \tag{3.19} \]

\[ y_\perp = \left[ I - X (X^T X)^{-1} X^T \right] y = (I - P^{\#}) y, \tag{3.20} \]

\[ P^{\#} = U_1 U_1^H = X (X^T X)^{-1} X^T, \tag{3.21} \]

and

\[ I - P^{\#} = U_2 U_2^T, \tag{3.22} \]

where $U_i$, $i = 1, 2$, are as in equation (3.11a).


The factor $(X^T X)^{-1} X^T = X^{\#}$ is the pseudoinverse of $X$, since $X^{\#} X = I$. Further,

\[ X^{\#} = (X^T X)^{-1} X^T = V \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T. \tag{3.23} \]

These are convenient notations for determining the least-squares fit of columns of $X$ to a general vector $y$. That is, what $p$ coefficients $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)^T$ give a best least-squares fit $\hat{y}_x = X\alpha$ to $y$? The result is

\[ \alpha = X^{\#} y, \tag{3.24} \]

giving

\[ \hat{y} = X X^{\#} y = P^{\#} y. \tag{3.25} \]
Note too that the error $y_\perp = y - \hat{y} = (I - P^{\#}) y$ is orthogonal to the range space.

Obviously, a lower-rank best fit can be obtained by reducing the rank of $U_1$ in $P^{\#}$ [equation (3.21)]. In effect, the result $\hat{y}$ above has solved for the unknown $\alpha$ in the overdetermined equations

\[ X\alpha = y, \tag{3.26} \]

where $X$ is $M \times p$, $\alpha$ is $p \times 1$, $y$ contains $M$ noisy measurements, and $M > p$. If $p > M$, $\hat{\alpha}$ is known as the minimum-norm solution, since there are then many solutions $\alpha$, but only one that minimizes $\alpha^T \alpha$. Thus the solution is found with the constraint that $\alpha^T \alpha$ is minimized. Other constraints may be applied.
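A short sketch of the decomposition and least-squares fit of equations (3.19)-(3.25), using the library pseudoinverse on arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(4)
M, p = 10, 4
X = rng.standard_normal((M, p))   # p independent columns, M > p
y = rng.standard_normal(M)

Xpinv = np.linalg.pinv(X)         # X# = (X^T X)^{-1} X^T for full column rank
alpha = Xpinv @ y                 # eq. (3.24), least-squares coefficients
y_x = X @ alpha                   # eq. (3.25), component in the range of X
y_perp = y - y_x                  # component orthogonal to the range

# The error is orthogonal to the range space, as noted above.
assert np.allclose(X.T @ y_perp, 0)
```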

3.4 A Seismic Example

Suppose a region of data is ideally flattened to a prescribed velocity corresponding to the only reflection present, i.e., the exact delays have been removed from each trace. Then, ignoring wavelet stretch, each trace $x_i$ is now identically a vector $s$ except for additive noise and interference. That is, suppose

\[ x_i = s + n_i, \qquad i = 1, 2, \ldots, p, \tag{3.27} \]

where $s$ is a constant vector, and $n_i$ is a zero-mean, spatially and temporally white random vector, independent from element to element and from trace to trace. We will show that the first singular vector of $X$ approaches $s$ as $p$ increases.

We form $X$ and find the singular vectors of $XX^H = U \Lambda U^H$. However, $XX^H$ is composed of signal $S$ and noise $N = (n_1, n_2, \ldots, n_p)$ parts, giving

\[ XX^H = (S + N)(S + N)^H = SS^H + 2\,\mathrm{Re}\{SN^H\} + NN^H, \tag{3.28} \]

where $S = (s, s, \ldots, s)$, $M \times p$, and $N = (n_1, n_2, \ldots, n_p)$, $M \times p$. Note $S = s\,(1\ 1 \ldots 1)$, so that $SS^H = p\,ss^H$, and this is clearly a rank-one matrix. Now the statistical mean of the cross terms $2\,\mathrm{Re}\{SN^H\}$ is zero, while $SS^H = p\,ss^H$ and $E\{NN^H\} = p\sigma_n^2 I$, where $\sigma_n^2$ is the variance of the noise on each trace. Thus, as $p$ increases,

\[ XX^H \to p\,(ss^H + \sigma_n^2 I) = pR. \tag{3.29} \]

It is easy to see that

\[ Rs = ss^H s + \sigma_n^2 s = (E_s + \sigma_n^2)\,s, \tag{3.30} \]

so that $s$ is an eigenvector of $R$ (this confirms our earlier statement that $v_1$ is the least-squares fit to the set of $x_i$), and $E_s + \sigma_n^2$ is the associated, and largest, eigenvalue, where $E_s$ is the energy in the signal trace. All other eigenvalues equal $\sigma_n^2$. This may be seen by noting that any eigenvector $v_i$ other than $s$ must be orthogonal to $s$; such $v_i$ satisfies $Rv = \sigma_n^2 v$. Thus

\[ \lambda_1 = E_s + \sigma_n^2, \qquad \lambda_2 = \lambda_3 = \cdots = \lambda_M = \sigma_n^2. \]

It is important to know how close the first eigenvector of $XX^H$ is to $s$. This question is answered in Chapter 4.

This seismic example typifies a situation for which the major eigenvector or singular vector is equal (or proportional) to a signal trace. In general, a single signal trace is not repeated without delay at all offsets. When there are two reflections present, they both cannot be flattened simultaneously, because their moveouts or rms velocities are different. When multiple reflections are present in an analysis region, each gives rise to one or more major eigenvectors associated with the eigenvalues larger than $\sigma_n^2$. However, the major eigenvectors are then all LCs of the distinct, independent traces.
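This convergence is easy to simulate. In the sketch below, the wavelet, noise level, and dimensions are arbitrary choices; the printed correlation between the first eigenvector and the normalized signal trace approaches 1 as $p$ grows:

```python
import numpy as np

rng = np.random.default_rng(5)
M, p, sigma_n = 50, 200, 0.5
t = np.arange(M)
s = np.exp(-0.5 * ((t - 25) / 4.0)**2) * np.cos(0.6 * t)   # toy wavelet

X = s[:, None] + sigma_n * rng.standard_normal((M, p))     # x_i = s + n_i
lam, V = np.linalg.eigh(X @ X.T)
v1 = V[:, -1]                      # eigenvector of the largest eigenvalue

# |correlation| of v1 with the normalized signal trace -> 1 as p grows.
print(abs(v1 @ s) / np.linalg.norm(s))
```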

3.5 A Second Example

In the second example, vectors are taken across traces, and we assume there are $M$ traces of length $p$, so that the vectors are of length $M$. As before, we assume that the traces have been flattened to some true event, so that each vector is composed of a scaled constant plus a spatially and temporally white noise vector:

\[ x_i = s(t_i)\,\mathbf{1} + n_i, \qquad i = 1, 2, \ldots, p, \]

where $\mathbf{1} = (1\ 1 \ldots 1)^T$ of length $M$.

With all $x_i$ as columns of $X$, we find the eigenstructure of $XX^H$, an $M \times M$ matrix. Now write

\[ XX^H = (\mathbf{1}\ \mathbf{1} \cdots \mathbf{1})\, D_s D_s^H\, (\mathbf{1}\ \mathbf{1} \cdots \mathbf{1})^T + NN^H + \text{cross terms}, \tag{3.31} \]

where $D_s = \mathrm{diag}(s(t_1), s(t_2), \ldots, s(t_p))$, possibly complex valued, and the $i$th column of $N$ is $(n_1(t_i), n_2(t_i), \ldots, n_M(t_i))^T$, for $i = 1, 2, \ldots, p$. By arguments similar to those in Section 3.4, we see that with large $p$, $p^{-1} XX^H$ ($M \times M$) approaches $(E_s \mathbf{1}\mathbf{1}^T + \sigma_n^2 I) = R$, and the eigenvalues of $R$ are as before, except that there are only $M$ of them, i.e., $\lambda_1 = E_s + \sigma_n^2$, $\lambda_i = \sigma_n^2$, $i = 2, \ldots, M$. However, the major eigenvector $v_1 \propto \mathbf{1} = (1, 1, \ldots, 1)^T$ of length $M$, whereas with the choice of vectors $x_i = i$th trace, as in Section 3.4, $v_1 \propto s$.
Now note that the SVD would have found both eigenstructures. Define $X$ as in Section 3.4, but find the SVD

\[ X = U \Sigma V^H, \tag{3.32} \]

where $V$ contains the eigenvectors of $X^H X$ ($p \times p$), and $U$ contains the eigenvectors of $XX^H$ ($M \times M$). Then $u_1 \propto s$, $v_1 \propto \mathbf{1}$, and $\sigma_1 = (E_s + \sigma_n^2)^{1/2}$; these are the singular vector $u_1$ of $X$ for the first example, the eigenvector $v_1$ of $X^H X$ for the current example, and the first singular value $\sigma_1$ of the SVD of $X$. The eigenvalues of either $XX^H$ or $X^H X$ are $\lambda_i = E_s + \sigma_n^2, \sigma_n^2, \ldots, \sigma_n^2$, to either $p$ or $M$ values, respectively.

These two examples are basic to many of the algorithms presented elsewhere in this book. In general, and with no noise, the singular vectors are LCs of the signals down traces (first example), and the eigenvectors are LCs of wavefront vectors across traces (second example). In the narrowband case, the wavefront vectors equate to delay vectors whose elements are complex phasor rotations. More will be said on this in Chapter 4.

3.6 Bias-Variance Tradeoff in a Seismic Profile

Often the noise-free portion of $X^H X$ or $XX^H$ has eigenvalues that are all nonzero. In these situations, and when noise is present, it is still useful to use a rank-reduced version of either $X$ or $R$. So even though we are deleting some signal energy by not including all of its components, we are excluding more noise with each singular-vector dimension or eigenvector dimension that is not used. In effect, more bias in an estimate is being allowed in exchange for reduced variance.

Many times this exchange can be made interactively. Sometimes the approximate signal dimensionality is known. In many cases, it is possible either to know or to estimate what the quantitative trade is statistically. The following presentation is based on Scharf (1991, Chapter 9).

Suppose that the data matrix $X$, $M$ traces of length $L_T$, is the sum of a signal matrix $S$ plus an independent white zero-mean noise matrix $N$. Then the SVD representation is

\[ X = \hat{U} \hat{\Sigma} \hat{V}^H, \tag{3.33} \]

where the hats indicate estimates of $U \Sigma V^H = S$. We have seen in the previous sections that a flattened event, if it is the only event, will cause $\hat{u}_1 \approx s$ and $\hat{v}_1^H \approx (1\ 1 \cdots 1)$. Generally there are other events plus noise. Now think of ideally stacked but noisy traces migrated to compose a profile, so that a region of data contains traces whose signal components are LCs of $p$ independent signal vectors $s_1, s_2, \ldots, s_p$:

\[ S = (s_1\ s_2 \ldots s_p)\,\Theta = S_p \Theta, \tag{3.34} \]

where $\Theta$ is an unknown $p \times M$ coefficient matrix. Thus, the data appear to be a superposition of plane wavefronts. Figure 10.3 (Chapter 10) is a good example of this kind of data. As a practical example, one subset of noise-free traces may equal $s_1$, while the neighboring subset may correspond to the addition of a single reflection denoted $s_2$, so that the second set has noise-free traces equal to $s_1 + s_2$. Thus we let

\[ X = S + N, \tag{3.35} \]

giving

\[ XX^H = SS^H + NN^H + \text{cross terms}. \tag{3.36} \]

Now the rank of $SS^H$ is $p$, but $XX^H$ may have singular values that do not clearly indicate this fact, because of the influence of noise terms in $XX^H$ and similarities of signal traces. We now try to estimate $S$ with a reduced-rank $X$. That is, we want to use

\[ \hat{S}_r = \sum_{i=1}^{r} \hat{u}_i \hat{u}_i^H X = \hat{U}_r \hat{U}_r^H X \tag{3.37} \]

as a rank $r \leq p$ estimate of $S$. If we let $r = M$, $X$ is reproduced exactly, summing $S$ with $N$. Otherwise, there is a bias in $\hat{S}_r$, an estimate of which is

\[ b_r = \hat{S}_p - \hat{S}_r = \sum_{i=r+1}^{p} \hat{u}_i \hat{u}_i^H X, \tag{3.38} \]

where we recall $p$ is the number of unique traces.


There are two major distinctions between this estimator and that presented by Scharf. Scharf's presentation assumes that the set of $p$ vectors $s_1$ through $s_p$ are known and that they are to be used in producing a least-squares fit to another single vector $x_i$. Here all columns of $X$ must be fit, and we do not know exactly the $L_T \times p$ matrix $U_p \Sigma_p V_p^H = (s_1\ s_2 \cdots s_p)$ which composes the basis of $S$, nor do we even know $p$. Equation (3.37) indicates that all columns of $X$ are being fit simultaneously with the same rank $r$ and the same singular vectors $u_1, u_2, \ldots, u_r$. Following Scharf, we discern that this is not likely to yield a minimum mean-squared-error rank-$r$ fit to each trace in the data $X$. Instead, for each $x_i$, a unique optimal ordering $u_{(1)}, u_{(2)}, \ldots, u_{(r)} = U_r^{(i)}$ exists when we know $U_p$ exactly, which should be no surprise. Suppose that all traces in $S$ are either $s_1$ or $s_2$, and that $s_1$ and $s_2$ somehow were orthogonal, giving $u_1 = s_1$, $u_2 = s_2$. Then, even though $\sigma_1 > \sigma_2$, if we were to use a rank-1 approximation, it would be best to use either $u_1$ or $u_2$, whichever gives the best fit to $x_i$.

The preceding argument is justified in Scharf (1991). For each trace $x_i$, the singular vectors $\hat{u}_{(k)}$ should be ordered:
\[ |\hat{u}_{(1)}^H x_i|^2 \geq |\hat{u}_{(2)}^H x_i|^2 \geq \cdots \geq |\hat{u}_{(p)}^H x_i|^2, \tag{3.39} \]

where we assume the $\hat{u}_k$ are good approximations of the $u_k$. This assumption, and another that all noise is Gaussian, lead to an optimum $r$ for each $x_i$ (Scharf, 1991). The optimum $r^*$ for $x_i$ is the $r$ such that the estimated mse is minimized,

\[ \mathrm{mse} = \hat{b}_r^H \hat{b}_r + (2r - p)\,\sigma_n^2, \tag{3.40} \]

where, for each trace $x_i$,

\[ \hat{b}_r = \left( \hat{u}_{(r+1)} \hat{u}_{(r+1)}^H + \cdots + \hat{u}_{(p)} \hat{u}_{(p)}^H \right) x_i. \tag{3.41} \]

Note that $\hat{b}_r^H \hat{b}_r - (p - r)\sigma_n^2$ is an unbiased estimate of the bias-squared, and $r\sigma_n^2$ is the sum of the noise variance over $r$ dimensions.
As stated previously, we often do not know $p$, and the true $p$ may equal $M$. Thus the seismic interpreter's insight is, as usual, an important factor. Further, even though $U_p$ is only estimated, for a reasonable number of traces and a reasonable S/N we may assume that those singular vectors associated with significantly large singular values are quite accurate. The question of what is reasonable and what is significant unfortunately remains, and tests of the ability of the above procedure for seismic sections have not been performed. Section 4.2.2 deals with the statistics of eigenstructure estimates, and Chapter 12 gives examples of SVD applications to seismic image data.
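A sketch of the per-trace order selection implied by equations (3.39)-(3.41) follows; it assumes $\sigma_n^2$ is known and that the columns of the supplied matrix are the estimated singular vectors:

```python
import numpy as np

def best_rank(x, U, sigma_n2):
    """Per-trace rank choice: order the u_k by |u_k^H x|^2 (eq. 3.39), then
    pick r minimizing mse = b_r^H b_r + (2r - p) sigma_n^2 (eq. 3.40)."""
    p = U.shape[1]
    c = np.abs(U.conj().T @ x)**2            # energy of x on each u_k
    order = np.argsort(c)[::-1]              # descending, eq. (3.39)
    # tail[r] = energy on the discarded vectors u_(r+1)..u_(p), eq. (3.41)
    tail = np.cumsum(c[order][::-1])[::-1]
    mses = [(tail[r] if r < p else 0.0) + (2*r - p) * sigma_n2
            for r in range(p + 1)]
    r_star = int(np.argmin(mses))
    return r_star, order[:r_star]            # optimal r and chosen vectors
```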

3.7 A Robust Eigenstructure Estimator

At the end of Chapter 2 we alluded to a robust covariance matrix estimator that was used to estimate the eigenstructure: the Campbell method (Campbell, 1980).

Recall first that the normalized eigenvector $v_1$ of $C_x$, associated with the largest eigenvalue $\lambda_1$, is such that $y_m = v_1^H x_m$ has maximum sample variance. The eigenstructure may be taken from $C_x$ or from a robust version $\tilde{R}$ of $C_x$ obtained as in Section 2.7. There, however, the weights on the data vectors $x_i$ were functions of a Mahalanobis distance $d_i$ that used the iterated robust mean vector $\bar{x}$ and the iterated robust covariance estimate.

The iterative process can be modified to give weights on $x_i$ which are functions of $y_m = \hat{v}_i^H x_m$. Because the process is iterative, in each iteration the minimum of the current and previous weight measures is retained to ensure convergence. In the following, estimates of the eigenvectors $v_i$ are denoted $u_i$, and estimates of the matrix $V$ are denoted $U$.

The proposed procedure is as follows:
1) As an initial estimate of $u_1$, take the first eigenvector from an eigenanalysis of $V$.
2) Form the principal component scores $y_m = u_1^T x_m$.
3) Determine the M-estimators of mean and variance of $y_m$ and the associated weights $w_m$. The median and $[0.74\,(\text{interquartile range})]^2$ of the $y_m$ can be used to provide initial robust estimates. Here $0.74 = (2 \times 0.675)^{-1}$, and 0.675 is the 75% quartile for the $N(0,1)$ distribution. This initial choice ensures that the proportion of observations downweighted is kept reasonably small. After the first iteration, take the weights $w_m$ as the minimum of the weights for the current and previous iterations; this prevents oscillation of the solution.
4) Calculate $\bar{x}$ and $V$ as in steps 1 and 2, using the weights $w_m$ from step 3.
5) Determine the first eigenvalue and eigenvector $u_1$ of $V$.
6) Repeat steps 2 to 5 until successive estimates of the eigenvalue are sufficiently close. To determine successive directions $u_i$, $2 \leq i$, project the data onto the space orthogonal to that spanned by the previous eigenvectors $u_1, \ldots, u_{i-1}$, and repeat steps 2 to 5; as the initial estimate, take the second eigenvector from the last iteration for the previous eigenvector. The procedure for successive directions can be set out as follows.
7) Form $x_{im} = (I - U_{i-1} U_{i-1}^T)\, x_m$, where $U_{i-1} = (u_1, \ldots, u_{i-1})$.
8) Repeat steps 2 to 5 with $x_{im}$ replacing $x_m$, and determine the first eigenvector $u$.
9) The principal component scores are given by $u^T x_{im} = u^T (I - U_{i-1} U_{i-1}^T) x_m$, and hence $u_i = (I - U_{i-1} U_{i-1}^T)\, u$.

Repeat steps 7, 8, and 9 until all eigenvalues and eigenvectors $u_i$, together with the associated weights, are determined. Alternatively, the procedure may be terminated after some specified proportion of variation is explained. Finally, a robust estimate of the covariance or correlation matrix can be found from $U E U^T$ to provide an alternative robust estimate. Both this approach and that described in the previous section give a positive definite correlation/covariance matrix. Robust estimation of each entry separately does not always achieve this.
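The following is a compact sketch of steps 1-6 for the first direction only. The specific downweighting function and its tuning constant are illustrative assumptions, not Campbell's exact weights:

```python
import numpy as np

def robust_first_pc(X, n_iter=20):
    """X: M x L data. Iteratively reweighted estimate of the first
    principal direction, downweighting outlying scores y_m = u1^T x_m."""
    M, L = X.shape
    w = np.ones(L)
    for _ in range(n_iter):
        mu = (X * w) @ np.ones(L) / w.sum()        # weighted mean vector
        Xc = X - mu[:, None]
        C = (Xc * w) @ Xc.T / w.sum()              # weighted covariance
        lam, V = np.linalg.eigh(C)
        u1 = V[:, -1]
        y = u1 @ Xc                                # principal component scores
        med = np.median(y)
        s = 0.74 * (np.percentile(y, 75) - np.percentile(y, 25))
        dev = np.maximum(np.abs(y - med), 1e-12)
        w_new = np.where(dev <= 2*s, 1.0, (2*s / dev)**2)  # assumed weight rule
        w = np.minimum(w, w_new)        # keep min of current/previous weights
    return u1, lam[-1]
```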

3.8 References

Campbell, N. A., 1980, Robust procedures in multivariate analysis, I: Robust covariance estimation: Appl. Stat., 29, 231-237.

Scharf, L. L., 1991, Statistical signal processing: Addison-Wesley Publ. Co.

Chapter 4
Vector Subspaces
R. Lynn Kirlin

Over the past decade, much research has been devoted to the understanding and application of what has come to be known as signal-subspace and noise-subspace processing. This methodology is based on the linear statistical model for vector data. All data vectors are linear combinations of their signal and noise components. Given such vectors of length $M$, a vector space $C^M$ may be spanned by any $M$ independent, length-$M$ complex vectors. In many situations the spanning vectors may be partitioned or chosen such that $r$ vectors are adequate to span the set of all possible signal vectors, the signal subspace. The remaining $M - r$ vectors lie in the noise subspace. The two subspaces are orthogonal, meaning that any signal-subspace vector has zero inner product with any noise-subspace vector.

The data covariance matrix is used to estimate the two subspaces. When the estimation is good, for example when S/N is sufficiently high and the sample size sufficiently large, then $M - r$ dimensions of noise power can be removed effectively from the data, allowing processing to proceed with higher-S/N data. This results in better parameter estimations, decisions, or interpretations.

The ability to separate signal and noise subspaces rests not only on S/N and sample size, but also on a priori knowledge of the linear statistical model. In the following, I will define the linear statistical model, explain the mathematics of subspaces, and give some examples of interest.

4.1 The Linear Statistical Model

The linear statistical model assumes that the mean vector $m$ of data $x$ is a linear combination of $r$ vectors which comprise the columns of $H$. Thus

\[ x = H\theta + w, \tag{4.1} \]

where $x$ is the length-$M$ data vector, $H$ is $M \times r$, $\theta$ is $r \times 1$, and $w$ is $M \times 1$. The vector $\theta$ contains the coefficients that combine the columns (vectors) in $H$. The vector $w$ is an additive noise vector, whose statistics may be known or unknown. This model fits many problems for which $H$ and $\theta$ may be either fixed or time-varying, known or unknown.
Scharf (1991) shows that when $\theta$ is the vector of unknown parameters of the covariance matrix $R$ of $x$, then $H^H R^{-1} x$ is a sufficient, complete, and minimal statistic for $\theta$, meaning that it is the smallest number of parameters that carry all the necessary information for obtaining a unique estimate of $\theta$.

When the noise is zero-mean, independent, and Gaussian, the density of $L$ independent samples of the vector $x$, if $x$ is real, is

\[ f(x) = (2\pi)^{-ML/2}\, |R|^{-L/2} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{L} (x_i - m)^T R^{-1} (x_i - m) \right\}; \tag{4.2} \]

and if $x$ is complex,

\[ f(x) = \pi^{-ML}\, |R|^{-L} \exp\left\{ -\sum_{i=1}^{L} (x_i - m)^H R^{-1} (x_i - m) \right\}, \tag{4.3} \]

where $m = H\theta$. The above are duplicates of equations (2.3) and (2.5), and $R$ is the data covariance matrix.
The reader is referred to Scharf (1991) for specific techniques of either detection of $m \neq 0$, where $0$ is a vector with $M \times 1$ zero elements, or estimation of $m$, $H$, or $\theta$ under various assumptions, knowns, and unknowns.

Often the exact density of $x$ is not known; nevertheless, the sample covariance matrix $C_x$ of $x$, given the linear statistical model, carries a good deal of information. (See Chapter 2 for the statistics of $C_x$ when $x$ is Gaussian.) When the $L$ samples of $x$ are arranged into the columns of $X$, the sample covariance matrix can be written

\[ C_x = XX^H / L = \left( H\theta\theta^H H^H + H\theta w^H + w\theta^H H^H + ww^H \right) / L. \tag{4.4} \]

Because of the independence of $w$ with $H\theta$,

\[ R = E\{C_x\} = H\, E\{\theta\theta^H\}\, H^H + N, \tag{4.5} \]

where $E\{ww^H\} = N$.

4.1.1 Comments on Whiteness and Stationarity

In many of the situations of interest, all of the above assumptions hold; additionally, $N = \sigma_n^2 I$ indicates stationary white noise. When the $x_i$ are time-slice vectors of samples from the sensor array, $N = \sigma_n^2 I$ indicates that the noise is spatially white and stationary; otherwise $N$ would be diagonal with $\sigma_{n_1}^2, \sigma_{n_2}^2, \ldots, \sigma_{n_M}^2$ for spatially white but nonstationary noise, and it would have nonzero off-diagonals for spatially nonwhite noise. Similar statements could be made of temporal whiteness and stationarity if we observed the covariance matrix $X^H X$.

We note particularly that to determine both spatial and temporal whiteness and stationarity, we must observe the covariance matrix of the concatenated vectors $x_i$. Further, to make simultaneous use of both temporal and spatial correlations, some amount of temporal and spatial sampling must be incorporated into a sample vector $x$, as mentioned in Section 2.3.

4.2 Covariance Matrix Structure

Assume now that at each sample time a vector time slice or snapshot is taken across $M$ sensors. The linear statistical model becomes

\[ x(t) = A s(t) + n(t), \tag{4.6} \]

where $x$ is of length $M$, $A$ is $M \times r$, and $s(t)$ is the vector of signals or signal sources which are sensed through the measurement matrix $A$. It is clear that the signal component of $x$ has $r$ degrees of freedom if the signals in $s$ are independent. Thus the rank of $A P_s A^H$ is $r$, where $P_s = E\{ss^H\}$, the source covariance matrix. Further, the data covariance matrix

\[ R = A P_s A^H + N \tag{4.7} \]

is composed of two covariance matrices, $A P_s A^H$ and $N$, one a result of signals only and the other of noise only.

Without knowing the S/N, how might we go about estimating $A$ and $P_s$? The answers to this depend on a priori knowledge. For example, if we know $N$, then $A P_s A^H = R - N$. Further, if we know that $A P_s A^H$ has a given structure, we may subsequently deduce $A$ and $P_s$.

From the foregoing, however, we do know from the model assumption that the rank of $A P_s A^H$ is $r$, which is the rank of the signal space. Perhaps we also know that $N = \sigma_n^2 I$; or, if it is not, we can prewhiten the data if we know $N$. In this case

\[ R = A P_s A^H + \sigma_n^2 I. \tag{4.8} \]

To estimate whatever is unknown, we explore the eigenstructure approach next.

4.2.1 Eigenstructure and Subspaces

I subsequently will demonstrate the following properties regarding the eigenstructure of $R$, where I have designated its eigenvectors $v_i$ as the columns of $V$ and its associated eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_M$:

1) The largest $r$ eigenvalues are associated with the first $r$ eigenvectors, which span the same vector space as the columns of $A$, the signal subspace.
2) The smallest $M - r$ eigenvalues of $R$ are all equal to $\sigma_n^2$ (this fact leads to a determination of $r$).
3) The eigenvectors associated with the $M - r$ smallest eigenvalues all exist in a space called the noise subspace.
4) Because of the orthogonality of eigenvectors, all eigenvectors in the signal subspace are orthogonal to those in the noise subspace.

(Note that as used above, the term noise subspace is not strictly correct, because noise has equal power $\sigma_n^2$ in all dimensions, including the signal subspace. It is more appropriately termed the orthogonal subspace, meaning orthogonal to the signal subspace.)

Thus, $R$ may be rewritten

\[ R = \sum_{i=1}^{r} \lambda_i v_i v_i^H + \sigma_n^2 \sum_{i=r+1}^{M} v_i v_i^H \tag{4.9a} \]
\[ = V_s \Lambda_s V_s^H + V_n \Lambda_n V_n^H \tag{4.9b} \]
\[ = (V_s\ V_n) \begin{pmatrix} \Lambda_s & 0 \\ 0 & \Lambda_n \end{pmatrix} (V_s\ V_n)^H \tag{4.9c} \]
\[ = V \Lambda V^H, \tag{4.9d} \]

where the eigenvalue matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_M)$ has been partitioned to give $\Lambda_s$ and $\Lambda_n$, diagonal eigenvalue matrices of size $r$ and $M - r$, respectively, and the eigenvector matrix $V$ has been partitioned into signal-subspace eigenvectors $V_s$ and noise-subspace eigenvectors $V_n$.
The above four eigenstructure properties are explained as follows. First, we note that $As$ has $r$ degrees of freedom; therefore $A E\{ss^H\} A^H$ has rank $r$. Further, $A P_s A^H$ must have $r$ positive eigenvalues, the last $M - r$ equaling zero. Next, we observe that if an eigenvalue of $A P_s A^H$ is $\gamma$, then $\gamma + \sigma_n^2$ is an eigenvalue of $A P_s A^H + \sigma_n^2 I$; because if $v$ is the eigenvector associated with $\gamma$, then

\[ Rv = (A P_s A^H + \sigma_n^2 I)\,v = \gamma v + \sigma_n^2 v = (\gamma + \sigma_n^2)\,v. \]

Then, by definition, $\gamma + \sigma_n^2$ must be an eigenvalue of $R$.

Another explanation of the above is that $A P_s A^H$ has $r$ orthogonal dimensions, wherein $x$ has variance $\gamma_i$, $i = 1, 2, \ldots, r$. When white noise is added to the data, an additional $\sigma_n^2$ is added to each dimension's variance.
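These properties are easily confirmed numerically; in the sketch below, the measurement matrix, source powers, and noise variance are arbitrary choices:

```python
import numpy as np

M, r, sn2 = 8, 2, 0.1
# Arbitrary M x r measurement matrix A and diagonal source powers P_s.
A = np.exp(-1j * np.outer(np.arange(M), [0.7, 1.9]))
Ps = np.diag([2.0, 1.0])

R = A @ Ps @ A.conj().T + sn2 * np.eye(M)      # eq. (4.8)
lam = np.sort(np.linalg.eigvalsh(R))[::-1]

print(lam[:r])   # r eigenvalues greater than sigma_n^2 (signal subspace)
print(lam[r:])   # M - r eigenvalues all equal to sigma_n^2
```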

4.2.2 Statistics of Eigenstructure Estimates

Clearly, when the true covariance matrix is unknown, information must be extracted from the sample covariance matrix and any other a priori information available. Although the use of eigenstructure is generally a diversion on the path to some other end, it can be a useful diversion and can in itself provide information. For example, we have already seen in property 2 that if the signal-space rank is $r < M$, and the noise is white, then $M - r$ of the eigenvalues are equal to $\sigma_n^2$. Thus an estimate of $\sigma_n^2$ may be provided by estimates of the $M - r$ smallest eigenvalues. Similarly, for a rank-1 signal covariance matrix, an estimate of $\lambda_1$ can provide an estimate of $M$ times the signal power plus $\sigma_n^2$:

\[ \lambda_1 = M\sigma_s^2 + \sigma_n^2, \qquad r = 1, \]

under the conditions that $P_s = \sigma_s^2$ and $A^H A = a^H a = \mathrm{Tr}[A A^H] = M$. For $A P_s A^H$ of rank $r$,

\[ \sum_{i=1}^{r} \lambda_i = M \sum_{k=1}^{r} \sigma_{s_k}^2 + r\sigma_n^2, \]

again assuming that $P_s$ is diagonal with elements $\sigma_{s_k}^2$, and that the columns $a_i$ of $A$ have $a_i^H a_i = M$. Thus the sum of the first $r$ eigenvalues equals $M$ times the total power from the sources, plus $r\sigma_n^2$.
Further, a minimum mean-squared-error estimate of $P_s$ is given by

\[ \hat{P}_s = (A^H A)^{-1} A^H V_s \Lambda_s V_s^H A (A^H A)^{-1} \tag{4.10} \]

when estimates of $V_s$, $\Lambda_s$, and $A$ are found.


In narrowband direction finding with sensor arrays, or equivalently rms velocity estimation with seismic array data, the objective is to find the columns of $A$, because the elements in $A$ are the delay factors $\exp\{j w_k \tau_{mk}\}$ from source $k$ to sensor $m$, assuming signals are narrowband at $w = w_k$. The analogous temporal-spectral estimation problem assumes $r$ sinusoids linearly combined and sampled at $M$ points in time. In this case, the delays $\tau_{mk}$ refer to the phase shift of the $k$th sine wave at the $m$th time sample.

It is clear that to obtain good estimates of directions of arrival, velocities, spectral frequencies, etc., good estimates of the eigenstructure will be required if eigenstructure is the facilitating mechanism.

We draw upon Pillai (1989) for asymptotic results for the first- and second-order statistics of the $\hat{\lambda}_i$ and $\hat{v}_i$ from the eigenstructure of $C_x$. Pillai shows, for large $N$ and distinct eigenvalues, that

\[ E\{\hat{\lambda}_i\} = \lambda_i, \tag{4.11} \]

\[ E\{\hat{v}_i\} = v_i, \tag{4.12} \]

\[ \mathrm{cov}\{\hat{\lambda}_i, \hat{\lambda}_j\} = \frac{1}{N}\, \lambda_i \lambda_j\, \delta_{ij}, \tag{4.13} \]

and

\[ \mathrm{cov}\{\hat{v}_i, \hat{v}_j\} = \frac{1}{N} \sum_{\substack{k=1 \\ k \neq i}}^{M} \sum_{\substack{l=1 \\ l \neq j}}^{M} \frac{\lambda_k \lambda_l\, \delta_{kl}\, \delta_{ij}}{(\lambda_i - \lambda_k)(\lambda_j - \lambda_l)}\, v_k v_l^H. \tag{4.14} \]

(See Pillai, 1989, for the next terms of the expansions.)


From the above, we are pleased to find that the eigenstructure of the sample covariance matrix is an asymptotically unbiased estimate of the eigenstructure of the true covariance matrix. Also, the covariance of the eigenvectors decreases asymptotically with increasing sample size. Finally, the estimates of distinct eigenvalues or eigenvectors are mutually uncorrelated. The variance of $\hat{\lambda}_i$ is proportional to the square ($\lambda_i^2$) of the true spectral power $\lambda_i$, just as with FFT-derived power spectral estimates.
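A quick Monte Carlo check of equations (4.11) and (4.13) for a simple diagonal covariance follows; note that for real-valued data the eigenvalue variance carries an extra factor of 2 relative to the complex-data result:

```python
import numpy as np

rng = np.random.default_rng(10)
M, N, trials = 4, 500, 2000
lam_true = np.array([4.0, 2.0, 1.0, 0.5])

est = np.empty((trials, M))
for t in range(trials):
    X = rng.standard_normal((M, N)) * np.sqrt(lam_true)[:, None]
    est[t] = np.sort(np.linalg.eigvalsh(X @ X.T / N))[::-1]

print(est.mean(axis=0))                    # ~ lam_true: asymptotically unbiased
print(est.var(axis=0) * N / lam_true**2)   # ~ 2 for real data (var ~ lam_i^2/N)
```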
Although it is important to know the above statistics, and although the first performance analyses of eigenstructure subspace parameter-estimation algorithms, such as MUSIC and minimum norm, used them, more elegant approaches to performance analysis have since been found. We refer next to Clergeot et al. (1989), wherein perturbation of the subspace is the concern.

4.2.3 Statistics of Subspace Component Estimates

Clergeot et al. (1989) made a valuable departure from convention when they used a subspace approach to analyze the performance of high-resolution algorithms that depend on eigenvalue decomposition (EVD). As they point out, "The calculation is made tractable by the remark that, for EVD methods, we are interested in the perturbation of the signal subspace as a whole and not in the individual perturbation on each signal eigenvector."

They point out that any perturbation to a signal-subspace eigenvector $v_i$ that lies inside the signal subspace (S) introduces no error in the estimation of the signal subspace. Rather, only the component $\Delta v_i^{\perp}$ of the perturbation of $v_i$ that is orthogonal to (S) needs consideration.
Following the notation of Clergeot et al. (1989), let

\[ P_S = \sum_{i=1}^{r} v_i v_i^H \tag{4.15} \]

be the projection operator that projects a vector $x$ onto the signal subspace. Similarly, let

\[ P_B = \sum_{i=r+1}^{M} v_i v_i^H = I - P_S \tag{4.16} \]

be the orthogonal-subspace projection operator.

From the sample covariance matrix $C_x$, we obtain an eigenvector

\[ \hat{v}_i = v_i + \Delta v_i, \tag{4.17} \]

where $v_i$ is associated with a distinct eigenvalue $\lambda_i$ of $R$, the true covariance matrix. The perturbation $\Delta v_i$ of $v_i$ has two orthogonal components:

\[ \Delta v_i^{\|} = P_S\, \Delta v_i, \tag{4.18} \]

\[ \Delta v_i^{\perp} = P_B\, \Delta v_i. \tag{4.19} \]

It is shown in Clergeot et al. (1989) that

\[ \Delta v_i^{\perp} = P_B \left[ \frac{1}{KQ} \sum_{q=1}^{Q} \sum_{k=1}^{K} B_k(q)\, \big( Y_k(q) + B_k(q) \big)^H \right] \frac{v_i}{\lambda_i} \tag{4.20a} \]
\[ = P_B\, \hat{C}_{B,X}\, \frac{v_i}{\lambda_i}, \tag{4.20b} \]

where $q$ is the usual time index on the subarray snapshots $x_k(q)$ of length $m$, and $k$ is the index on subarrays used for spatial smoothing (see Chapter 8), giving $K = M - m + 1$. $B_k(q)$ is the noise component of $x_k(q)$, and $Y_k(q) = A_k s(q)$ as in equation (4.6), except that we have indexed time samples and subarrays. With $K = 1$, equation (4.20a) indicates no spatial smoothing, and $x(q)$ has length $M$. $A_k$ and $B_k$ contain appropriate transformation matrices to yield coherence of signals in subarray $k$ with signals at the reference subarray (see Chapter 8). Thus, equations (4.20a) and (4.20b) give the noise-subspace (B) component of $\Delta v_i$ as a function of the additive noise $B_k(q)$, the signal components in $x_k(q)$, and the true $i$th eigenvector and eigenvalue.

The formula is the result of projecting onto the noise subspace the finite average in time ($q$) and space ($k$) of all the noise vectors, each weighted by its associated data-vector's component in the direction of $v_i$, normalized by $\lambda_i$. The factor $v_i / \lambda_i$ at the end of (4.20a) and (4.20b) can be replaced by $R_y^{\|} v_i$, where

\[ R_y^{\|} = \sum_{j=1}^{r} \lambda_j^{-1} v_j v_j^H. \]

This particular subspace component $\Delta v_i^{\perp}$ is of importance in the MUSIC, minimum-norm, and related algorithms, because trial signal-space vectors $a$ are correlated with linear combinations of orthogonal-subspace (B) eigenvectors. When a null correlation $a^H P_B$ results, the trial vector is deemed a solution, because it must lie totally in (S) to be orthogonal to (B). In practice, only a minimum and not a null is found, because true signal-space solution vectors $a$ will yield nonzero correlations with the $\hat{v}_i$.

It is easy to see that any true solution vector, which of course lies in the signal subspace, is therefore a linear combination of the $v_i$. Thus, any true solution vector $a$ has noise-space components given by

\[ \Delta a^{\perp} = P_B\, \hat{C}_{B,X}\, R_y^{\|}\, a. \]

The covariance of this error component of $a$ is derived in Clergeot et al. (1989) from this expression, and error variances on the parameters of interest in $a$ (such as rms velocity or bearing or frequency) follow, but are dependent upon the specific algorithm, source correlation, S/N, and relative source locations. Similar analyses applied to velocity estimation are used by Li and Liu in Chapter 7.

4.3 Examples of Signal Subspaces

Referring back to equation (4.1), $x = H\theta + n$ for the general problem. We may collect multiple samples of $x$ and estimate the unknowns in the parameter vector $\theta$ from the resulting sample covariance matrix $C_x$. Unknown parameters may be direction of arrival, slowness, or wavefront amplitude or energy.

In the first example, let $\theta$ be the vector of two complex sinusoids,

\[ \theta = \begin{pmatrix} \theta_{s_1} \exp\{j w_1 t\} \\ \theta_{s_2} \exp\{j w_2 t\} \end{pmatrix}. \tag{4.21} \]

The frequencies $w_1$ and $w_2$ are to be estimated. We assume we have a length-nine tapped delay line with incremental delays of $\Delta$ seconds. If $x(i)$ is the vector of ten samples from the nine taps plus the input at time $i\Delta$, and the two sinusoids plus noise are added,

\[ x(i) = \begin{pmatrix} 1 & 1 \\ e^{-j w_1 \Delta} & e^{-j w_2 \Delta} \\ e^{-j w_1 2\Delta} & e^{-j w_2 2\Delta} \\ \vdots & \vdots \\ e^{-j w_1 9\Delta} & e^{-j w_2 9\Delta} \end{pmatrix} \begin{pmatrix} s_1 e^{j w_1 i\Delta} \\ s_2 e^{j w_2 i\Delta} \end{pmatrix} + n(i) = H s(i) + n(i). \tag{4.22} \]
Given independent, white, stationary noise with variance $\sigma_n^2$ and zero-mean signals, the data covariance matrix is

\[ E\{xx^H\} = R_x = H \begin{pmatrix} \sigma_{s_1}^2 & 0 \\ 0 & \sigma_{s_2}^2 \end{pmatrix} H^H + \sigma_n^2 I \tag{4.23a} \]
\[ = H P_s H^H + \sigma_n^2 I. \tag{4.23b} \]

This covariance matrix will have two eigenvalues greater than $\sigma_n^2$ and eight equal to $\sigma_n^2$. The two eigenvectors associated with the larger eigenvalues span the same signal subspace as do $h_1$ and $h_2$, the columns of $H$. Both columns of $H$ are orthogonal to the eigenvectors associated with $\lambda_3 = \cdots = \lambda_{10} = \sigma_n^2$. We note that in no case will either eigenvector in (S) equal either $h_1$ or $h_2$; each will always be some combination of both $h_1$ and $h_2$. However, with only one signal, the rank-1 case, $v_1 = h_1 / \sqrt{10}$ and $\lambda_1 = 10\sigma_{s_1}^2 + \sigma_n^2$.
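The orthogonality between the columns of $H$ and the noise-subspace eigenvectors can be checked directly; a minimal sketch with arbitrary frequencies and powers:

```python
import numpy as np

M, sn2 = 10, 0.05
w = np.array([0.9, 1.7])                     # two radian frequencies, Delta = 1
H = np.exp(-1j * np.outer(np.arange(M), w))  # columns h1, h2 as in eq. (4.22)
Ps = np.diag([1.0, 0.5])

R = H @ Ps @ H.conj().T + sn2 * np.eye(M)    # eq. (4.23)
lam, V = np.linalg.eigh(R)                   # ascending eigenvalues
Vn = V[:, :M-2]                              # noise-subspace eigenvectors

# Each column of H lies in the signal subspace: zero projection onto Vn.
print(np.linalg.norm(Vn.conj().T @ H, axis=0))   # ~0 for both columns
```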
In the second example, I intend to estimate the directions of two independent, narrowband, analytic sources at bearing angles $\theta_1$ and $\theta_2$ and at infinite distance. The equivalent problem is the estimation of two reflections' slownesses. The plane waves arrive at $M$ equispaced sensors. The relative delays of signal $k$ appear in the $H$-matrix as the elements $e^{j u_k m \tau_k}$, where $\tau_k = \Delta \sin\theta_k / c$; $u_1$ and $u_2$ are the radian frequencies of the two sources, $\Delta$ is the sensor spacing, $c$ is the wave velocity, and $m = 0, 1, 2, \ldots, M-1$ indexes the sensors. Thus the sample snapshot vector at time $i$ can be written

\[ x(i) = \begin{pmatrix} 1 & 1 \\ e^{j u_1 \tau_1} & e^{j u_2 \tau_2} \\ e^{j u_1 2\tau_1} & e^{j u_2 2\tau_2} \\ \vdots & \vdots \\ e^{j u_1 (M-1)\tau_1} & e^{j u_2 (M-1)\tau_2} \end{pmatrix} \begin{pmatrix} s_1(i) \\ s_2(i) \end{pmatrix} + n(i) = H s(i) + n(i). \tag{4.24} \]

$R_x$ in this case is identical to that for the first example wherein the noise is spatially white and stationary. The eigenstructure is identical to that of $R_x$ in the first example if both $P_s$ and $\sigma_n^2$ are unchanged and if $w_1 \Delta = u_1 \tau_1$ and $w_2 \Delta = u_2 \tau_2$. For this reason the normalized frequency $\Omega$ ($|\Omega| \leq \pi$, or $|f| \leq 0.5$) is often used for both problems ($\Omega = w\Delta$ or $u_k \tau_k$). The parameter of interest, $w_i$ or $\theta_i$, is extracted from the solution values of $\Omega$ or $f$.

4.4 Seismic Wavefronts in Noise

Seismic wavefronts reflected from idealized horizontal layered media arrive at an equispaced horizontal geophone array with the same example model structure given in equation (4.24) above. However, a major distinction is that the delay at the $m$th sensor is not generally an integer multiple of any single intersensor delay. A technical exception to this generalization is created when preflattening to a wavefront so that the wave strikes all sensors at the delay-corrected identical time, and $\tau = 0$ at all sensors. In fact, we use this special case as part of some estimation/detection schemes.

For seismic wavefronts that satisfy the hyperbolic two-way time versus offset model, the arrival time $T_m$ at the $m$th phone is given by

\[ T_m = \sqrt{T_0^2 + (m\Delta / V)^2}, \tag{4.25} \]

where $\Delta$ is the sensor spacing, $V$ is the wavefront's rms velocity, and $T_0$ is the zero-offset two-way traveltime. Thus, the relative delay at sensor $m$ in reference to sensor zero ($m = 0$) is

\[ \tau_m = \sqrt{T_0^2 + (m\Delta / V)^2} - T_0. \tag{4.26} \]

The parabolic approximation uses $(m\Delta / V)^2 \ll T_0^2$, for which

\[ \tau_m \approx (m\Delta)^2 / (2 T_0 V^2). \tag{4.27} \]

The relative delays $\tau_{mk}$ from the $k$th wavefront replace $m\tau_k$ in the elements of $H$ in equation (4.24), to the extent that the signals can be considered narrowband (Kirlin, 1991).
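A small sketch comparing the exact delays of equation (4.26) with the parabolic approximation of equation (4.27), for an arbitrary geometry:

```python
import numpy as np

T0, V, dx, M = 1.0, 2000.0, 25.0, 48   # s, m/s, sensor spacing (m), sensors
m = np.arange(M)

tau_hyp = np.sqrt(T0**2 + (m * dx / V)**2) - T0   # eq. (4.26)
tau_par = (m * dx)**2 / (2 * T0 * V**2)           # eq. (4.27)

# The approximation holds while (m dx / V)^2 << T0^2.
print(np.max(np.abs(tau_hyp - tau_par)))
```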
For more broadband signals, there will be more than one plane wave per source. The eigenstructure properties discussed for the examples above still hold under certain restrictions. Basically, the wavefronts to be analyzed must be fairly flat and mostly encompassed by the window of analysis. However, because an unflattened wavefront has nonzero relative delays and the seismic wavefronts are broadband, each frequency component will have its unique phase rotation at each sensor. Thus, straightforward application of the above will yield eigenstructures with an unclear demarcation of the signal subspace, because there will be more than one large eigenvalue per complex wavefront present.

If the reflections are band-pass filtered to create a more narrowband signal, the model is better matched, but signal energy has been lost. Methods of combining Fourier coefficients are used in broadband extensions. We will discuss these methods and the above problems of model unsuitability in Chapter 6.

4.5 Nonwhite Noise

In the preceding, and in most presentations of eigenstructure methods, the white-noise assumption is made. When the noise is not white and its covariance is known, the noise can be whitened with a transformation. There are several possibilities, but they all must transform the noise vector of length $M$ into $M$ orthogonal components. Clearly the eigenstructure of the noise covariance matrix will do this. Specifically, if the (full-rank) covariance matrix of noise vectors $w$ is

\[ N = V \Lambda V^H, \tag{4.28} \]

so that $V$ is the matrix with orthonormal eigenvectors $v_i$ and $\Lambda$ is the diagonal matrix of eigenvalues $\lambda_i$, then the transformation $T = V^H$ applied to the noise vector $w$ gives

\[ \eta = V^H w, \tag{4.29} \]

such that $\mathrm{cov}(\eta)$ is diagonal:

\[ E\{\eta\eta^H\} = V^H E\{ww^H\} V = V^H V \Lambda V^H V = \Lambda. \tag{4.30} \]

However, any diagonalizing transformation will do the same. The Cholesky decomposition is widely used and is simpler to implement (Scharf, 1991). Without deriving or explaining the Cholesky decomposition, suffice it to say that an upper triangular matrix $U$ is produced such that $NU = L$ is lower triangular. It is then verified that

\[ U^H N U = D \ \text{(diagonal)} \tag{4.31} \]

and

\[ N^{-1} = U D^{-1} U^H. \tag{4.32} \]

Further, the columns of $U^H$ are orthogonal because $U U^H = \mathrm{diag}(A_{ii}^2)$. Therefore, the columns of $U$ comprise an orthogonal basis set, and the transformation

\[ \eta = U^H n \]

orthogonalizes and whitens $n$. If the columns of $U$ are normalized, then the transformed noise is also stationary (equal power in all dimensions), and the elements of $D$ become identical.
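A minimal whitening sketch built on the eigenstructure of $N$, equations (4.28)-(4.30); here the rows of $T$ are also scaled so the transformed noise is stationary (a Cholesky-based transformation would serve equally well):

```python
import numpy as np

rng = np.random.default_rng(7)
M = 6
B = rng.standard_normal((M, M))
N = B @ B.T + 0.1 * np.eye(M)      # a full-rank noise covariance

lam, V = np.linalg.eigh(N)         # N = V Lambda V^T
T = np.diag(lam**-0.5) @ V.T       # whiten AND equalize power per dimension

# Transformed noise covariance T N T^T is the identity.
assert np.allclose(T @ N @ T.T, np.eye(M))
```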

Any such whitening transformation will be applied to the data vectors $x$; thus the signal components are also transformed:

\[ z = Tx = T(As + w) = TAs + Tw. \tag{4.33} \]

The covariance matrix of $z$ is

\[ R_z = T A P_s A^H T^H + \sigma_n^2 I. \tag{4.34} \]

As long as $T^H T = I$ (or orthonormal transformation vectors are employed), the eigenvalues of $R_z$ are identical to those of $R_x$. To demonstrate, observe that if $l$ and $u$ are an eigenvalue-eigenvector pair for $R_z$, then

\[ R_z u = T R_x T^H u = l u, \]
\[ T^H T R_x T^H u = l\, T^H u, \]
\[ R_x (T^H u) = l\, (T^H u). \tag{4.35} \]

This shows that $l$ is an eigenvalue of both $R_x$ and $R_z$. The eigenvector $v = T^H u$ of $R_x$ is simply a rotation of $u$, because the magnitude is the same:

\[ \| T^H u \|^2 = u^H T T^H u = u^H u = \| u \|^2. \]

The effect of $T$ on the delay vectors $a$ in matrix $A$ is seen in the fact that we must search for solutions $Ta$ rather than $a$. That is, the vectors $Ta$, and not $a$, lie in the signal space of $R_z$.

4.6 References

Clergeot, H., Tressens, S., and Ouamri, A., 1989, Performance of high-resolution frequencies estimation methods compared to the Cramer-Rao bounds: IEEE Trans. Acoust., Speech and Sig. Proc., 37, 1703-1720.

Kirlin, R. L., 1991, A note on the effects of narrowband and stationary signal model assumptions on the covariance matrix of sensor array data vectors: IEEE Trans. Signal Processing, 503-506.

Pillai, S. U., 1989, Array signal processing: Springer-Verlag Inc.

Scharf, L. L., 1991, Statistical signal processing: Addison-Wesley Publ. Co.

Chapter 5
Temporal and Spatial Spectral Analysis
R. Lynn Kirlin

Spectral analysis is a broad topic. Most scientists and engineers who deal with signals are quite comfortable with the concepts of time-frequency relationships. They are familiar with the Fourier transform and the common theorems such as Parseval's, convolution, delay, etc. The fast Fourier transform, or FFT, is widely used and understood. The FFT gives a value of the Fourier transform at all integer multiples of the reciprocal of the record length, $1/T$. This spacing in frequency is also the resolution. The Nyquist frequency, or folding frequency, is half the sampling frequency. FFT coefficients between the folding frequency $f_s/2$ and the sampling frequency $f_s$ are identical to those between $-f_s/2$ and 0, due to the periodicity of the coefficients in the frequency domain.

We assume the reader is also familiar with the z-transform, which is essentially the Fourier transform's equivalent with application to equispaced samples of either time or spatial signals. The classical reference by Oppenheim and Schafer (1975) discusses the z-transform, the discrete Fourier transform (DFT), and the fast Fourier transform (FFT).

Much information can be gained from the frequency-domain representation of signals. Just as the FFT can apply to temporal signal samples, it may equally apply to spatial signal samples. Quite commonly, two-dimensional (2-D) FFTs are applied to 2-D seismic data, where one dimension is temporal and the other is spatial. Some drawbacks of the FFT spectrum are that (1) it usually gives more data than is necessary, (2) it is not suitable for transient signals of short duration (few samples), (3) it does not directly give estimates of the few parameters which often determine precisely the statistics of the time sequence, (4) the FFT frequencies are almost never the same as those of real sinusoids that may be present in the data, (5) it does not result in a polynomial ratio form, and (6) the spectral resolution is limited to $T^{-1}$.
Although I have not given the deserved equal time to the virtues of Fourier analysis, the above gives sufficient reason for seeking alternative approaches to signal analysis. I will suggest alternatives to the FFT for temporal, spatial, or spatio-temporal signal analysis.

In this chapter, I will relate a discrete signal's power or energy spectrum, or simply spectrum, to both its discrete autocorrelation function and the autocovariance or covariance matrix of sample vectors from the discrete sequence. I will also explain the relationship between the eigenstructure of the sequence's covariance matrix and spectral values. However, the chief purpose of this chapter is to demonstrate a number of high-resolution algorithms and show their commonality.

5.1 The Discrete Power Spectrum

The relationship between a continuous autocorrelation function $r_x(\tau)$ of a stationary process and its Fourier transform is well known:

\[ r_x(\tau) = E\{x(t)\, x(t + \tau)\} = \int_{-\infty}^{\infty} S_x(f)\, e^{j2\pi f\tau}\, df, \tag{5.1} \]

and

\[ S_x(f) = \int_{-\infty}^{\infty} r_x(\tau)\, e^{-j2\pi f\tau}\, d\tau. \tag{5.2} \]



Using z-transform notation, the equivalent relationships for a stationary discrete-sample sequence $x(i)$ are

\[ r_x(k) = E\{x(i)\, x(i + k)\} = \frac{1}{2\pi j} \oint_c S_x(z)\, z^{k-1}\, dz, \tag{5.3} \]

and

\[ S_x(z) = \sum_{k=-\infty}^{\infty} r_x(k)\, z^{-k}, \tag{5.4} \]

where $\oint_c$ is a counterclockwise closed contour integral in the region of convergence of $S_x(z)$, encircling the z-plane origin.

5.1.1 Relation of $S_x(z)$ to the Eigenstructure of $R_x$

What is the relationship of a $(2M+1) \times (2M+1)$ covariance matrix $R_x$ to the power spectrum of the signal from which $R_x$ was created? Assume the signal to be zero-mean; thus the covariance matrix and the autocorrelation matrix are identical. From equation (3.1), with $v$ and $\lambda$ an eigenvector and associated eigenvalue, respectively,

\[ R_x v = \lambda v, \tag{5.5} \]

where $v = [v_1, v_2, \ldots, v_M]^T$. Now consider the $i$th element of $\lambda v$ on the right-hand side of equation (5.5). It is the inner product of the $i$th row of $R_x$ and $v$:

\[ \sum_{k=1}^{M} r(i,k)\, v_k = \lambda v_i, \tag{5.6} \]

where $r(i,k)$ is $R_x(i,k)$, the $(i,k)$th element of $R_x$. More precisely, for stationary processes $r(i,k) = r_x(k - i) = r_x(i - k)$. We may consider $v_j = v(j)$ to be a sequence in time, with $j = 0$ corresponding to the time origin. Clearly equation (5.6) indicates a convolution operation:

\[ \sum_{k=1}^{M} r(i - k)\, v(k) = \lambda v(i). \tag{5.7} \]

As the size of the matrix grows to infinity ($M \to \infty$), indicating knowledge of all values of $r_x(k)$, equation (5.7) becomes

\[ r(i) * v(i) = \lambda v(i), \tag{5.8} \]

where $*$ denotes convolution. Taking the z-transform of both sides of equation (5.8) gives

\[ S_x(z)\, V(z) = \lambda V(z). \tag{5.9} \]

From this it is clear that, in the limit as the covariance matrix incorporates values of $r_x(k)$ for all $k$ on the real line,

\[ \lambda = S_x(z), \qquad z = e^{jw\Delta}, \tag{5.10} \]

where $\Delta$ is the sampling interval. Thus, $\lambda$ is the spectral value of $S_x(z)$ at the frequency $w$. What then can we deduce with regard to the sequence $v(i)$? According to equations (5.8) and (5.10),

\[ \sum_{k=-\infty}^{\infty} r(i - k)\, v(k) = S_x(e^{jw})\, v(i). \tag{5.11} \]

Rewriting the convolution, we have

\[ \sum_{k=-\infty}^{\infty} r(k)\, v(i - k) = S_x(e^{jw})\, v(i). \tag{5.12} \]

This is a unique Fourier transform if $v(i) = e^{jwi}$; that is,

\[ \sum_{k=-\infty}^{\infty} r(k)\, v(i - k) = \sum_{k=-\infty}^{\infty} r(k)\, e^{-jwk}\, e^{jwi} = S_x(e^{jw})\, e^{jwi}. \tag{5.13} \]

Thus, in the limit as $M \to \infty$, $\lambda \to S_x(e^{jw})$; $v_i$, the $i$th element of the eigenvector $v$, approaches $e^{jwi}$; and $v$ approaches the complex sinusoid at radian frequency $w$.
Thus, we expect that eigenstructure carries information relevant to the spectrum of a process. We may also infer, without proof, that a finite number of eigenvalues may be adequate to estimate an entire spectrum if that spectrum is exactly a function of a finite number of parameters. This would clearly be the case if the process were composed of exactly $n$ sinusoids at different frequencies. Then Fourier analysis would yield exactly $2n$ real or $n$ complex numbers to uniquely describe the combination.

5.1.2 All-Pole Model of $S_x(z)$

It can be shown that any rational spectrum of a sampled signal can be adequately represented as an all-pole spectrum. A hint that this should be so is given by noting that a finite z-plane zero factor (the polynomial $1 - az^{-1}$ has a zero at $z = a$) can be expanded into an all-pole function:

\[ 1 - az^{-1} = \frac{1}{1 + az^{-1} + a^2 z^{-2} + \cdots}. \]

Continue then with the assumption that the process at hand may be considered to be that produced at the output of an all-pole filter $G(z)$, driven by white noise $w(k)$ with variance $\sigma_w^2$. Thus a finite-length sequence $x(k)$ from the output of this filter would have the z-transform

\[ X(z) = \frac{1}{1 - \sum_{i=1}^{M} p_i z^{-i}}\, W(z) = G(z)\, W(z). \tag{5.14} \]

For such a process it may easily be shown that an $M$th-order linear predictor will optimally predict $x(k)$ from $x(k - i)$, $i = 1, 2, \ldots, M$, with minimum mean-squared error (mmse). That is, the mmse prediction is

\[ \hat{x}(k) = \sum_{i=1}^{M} a_i\, x(k - i), \tag{5.15} \]

when the $a_i$ are the solutions to the Yule-Walker equations (Haykin, 1991, Chapter 2),

\[ R_x a = r, \tag{5.16} \]

where $R_x$ is $M \times M$, $a = [a_1, a_2, \ldots, a_M]^T$, and $r = [r_x(1), r_x(2), \ldots, r_x(M)]^T$.

The generating and predicting filters are shown in Figure 5.1. The error in the prediction is $\epsilon(k) = x(k) - \hat{x}(k)$. From the diagram of Figure 5.1, or from the above equation, we can easily deduce that the spectrum of $\epsilon(k)$ is

\[ S_\epsilon(z) = S_x(z)\, |1 - H(z)|^2, \tag{5.17} \]

where

\[ H(z) = \sum_{i=1}^{M} a_i z^{-i}. \tag{5.18} \]

Since the error sequence turns out to be white, so that $S_\epsilon(z) = \sigma_\epsilon^2$, we thus note that

\[ S_x(z) = \sigma_\epsilon^2 / |1 - H(z)|^2. \tag{5.19} \]

It is shown widely in the literature (for example, Haykin, 1991, Chapter 5) that, for Gaussian processes, the error sequence $\epsilon(k)$ is white, that $|1 - H(z)|^2 = |G(z)|^{-2}$, and that $\sigma_\epsilon^2 = \sigma_w^2$. Further,

\[ \sigma_w^2 = r_x(0) - \sum_{i=1}^{M} a_i\, r_x(i), \tag{5.20} \]

where $a_0 = 1$.

Figure 5.1. White noise w(k), all-pole filter model G(z), data x(k), linear predictor H(z), white error process ε(k).

This equation may be joined with those of equation (5.16) to give the augmented Yule-Walker equations:

\[ R_x^{+} \begin{pmatrix} 1 \\ -a \end{pmatrix} = \begin{pmatrix} \sigma_w^2 \\ 0 \end{pmatrix}, \tag{5.21} \]

where $R_x^{+}$ is the $(M+1) \times (M+1)$ covariance matrix.


The study of modern spectral analysis is based on the foregoing equations. Because the autocorrelation generally is not known, it must be estimated if the equations are to be solved for $a$ and the spectrum estimated with equation (5.19). Indirect methods involve the lattice filter, a unique one-to-one map of the transversal FIR (finite impulse response) filter $H(z)$ given in equation (5.18). In the lattice filter, $M$ reflection coefficients rather than $M$ tap gains $a_i$ must be found.

Solving $1 - H(z)$ for its roots gives the spectral poles of $S_x(z)$. For the $n$-sinusoid signal mentioned earlier, we would expect $2n$ poles ($n$ conjugate pairs) on the unit circle. Improving the accuracy and resolution of any such estimates taken from noisy, short records is the goal of modern spectral analysis research. In the case of spatial samplings, from equation (5.20), $\sigma_w^2 = b^T R_x^{+} 1_1$, and the unknown $a$ vector in $b$ is found from equation (5.16). Usually equation (5.21) is solved with fast methods utilizing the Toeplitz structure of $R_x^{+}$ (Haykin, 1991). There is an identical problem to be solved for the directions of arrival (DOA) of plane wavefronts arriving at a set of equispaced sensors.
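A minimal sketch of solving equations (5.16) and (5.20) from estimated autocorrelations; the AR(2) test process and record length are arbitrary choices:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(8)
# AR(2) test data: x(k) = 1.2 x(k-1) - 0.8 x(k-2) + w(k).
L, Mord = 20000, 2
x = np.zeros(L)
for k in range(2, L):
    x[k] = 1.2 * x[k-1] - 0.8 * x[k-2] + rng.standard_normal()

# Biased autocorrelation estimates r_x(0), ..., r_x(M).
r = np.array([x[:L-k] @ x[k:] / L for k in range(Mord + 1)])

a = np.linalg.solve(toeplitz(r[:Mord]), r[1:Mord+1])   # eq. (5.16)
sw2 = r[0] - a @ r[1:Mord+1]                           # eq. (5.20)
print(a, sw2)   # ~ [1.2, -0.8] and ~1
```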

5.1.3 $S_x(z)$ as a Function of $R_x$

In the remainder of this chapter, I will describe a number of the high-resolution estimators, each of which has much in common with the linear-prediction spectral estimate in equation (5.17). Before continuing, we write $S_x(w)$ as a function of $R_x$, the covariance matrix. First, normalize the sampling frequency to unity; then let

\[ e = [1\ \ e^{-jw}\ \ e^{-2jw} \ldots e^{-Mjw}]^T, \tag{5.22} \]

\[ b = [1\ \ {-a_1}\ \ {-a_2} \ldots {-a_M}]^T, \tag{5.23} \]

and

\[ 1_1 = [1\ 0\ 0 \ldots 0]^T. \tag{5.24} \]
Then, from equations (5.17) and (5.21), and using $\sigma_w^2 = \sigma_\epsilon^2$, we have $b = \sigma_w^2 (R_x^{+})^{-1} 1_1$, so that

\[ \sigma_w^2 / S_x(z) = |1 - H(z)|^2 = |e^H b|^2 = \sigma_w^4\, |e^H (R_x^{+})^{-1} 1_1|^2, \]

or

\[ S_x(z) = \sigma_w^{-2} \left[ e^H (R_x^{+})^{-1} 1_1\, 1_1^H (R_x^{+})^{-1} e \right]^{-1}. \tag{5.25} \]

We shall see in the following that equation (5.25) is one specific formulation of one of two more general forms of high-resolution spectral estimators. The distinctions among the specific estimators usually are due to the specific criterion, such as mmse prediction, as we have just seen, but are sometimes due to technique, either in solving for solution frequencies (or DOAs) or in estimating the covariance matrix. Many methods specifically address the problem of a finite number of sinusoids (or a finite number of plane-wave arrivals). Yet all make use of the covariance matrix, and most make use of its eigenstructure.
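Equation (5.25) translates directly into a few lines of code; the sketch below assumes real-valued autocorrelation estimates $r_x(0), \ldots, r_x(M)$ are available:

```python
import numpy as np
from scipy.linalg import toeplitz

def lp_spectrum(r, freqs):
    """Linear-prediction (all-pole) spectrum via eq. (5.25).
    r: real autocorrelations [r_x(0), ..., r_x(M)]; freqs: normalized."""
    Ri = np.linalg.inv(toeplitz(r))     # (R_x^+)^{-1}, (M+1) x (M+1)
    sw2 = 1.0 / Ri[0, 0]                # sigma_w^2 = 1 / (1_1^T (R^+)^{-1} 1_1)
    k = np.arange(len(r))
    # q(f) = e(f)^H (R^+)^{-1} 1_1, with e(f) from eq. (5.22)
    q = np.exp(2j * np.pi * np.outer(freqs, k)) @ Ri[:, 0]
    return 1.0 / (sw2 * np.abs(q)**2)   # S_x, eq. (5.25)

# Example usage: evaluate on a dense grid of normalized frequencies.
# S = lp_spectrum(r, np.linspace(0.0, 0.5, 256))
```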

5.2 High-Resolution Spectral Estimators

In the foregoing we saw that the linear predictor can produce a spectral estimator $S_x(\omega)$ by varying the frequency in $e = e(\omega)$ in equation (5.25), where $e$ is composed of elements of the form $\exp(-j\omega k)$, $k = 0, 1, \ldots, M$. In this section I will use $P(f)$ rather than $S_x(f)$ for high-resolution estimates, because most of them are not actually spectral estimates but are frequency or DOA estimators.

Literally hundreds, and probably several thousands, of papers and reports have dealt with spectral analysis. A text by Marple (1987) has thoroughly summarized not only the conventional (Fourier) methods but also several of the so-called modern methods, including maximum likelihood, maximum entropy (ME), and eigenstructure. In fact, Marple's work and Owsley (1985) constitute a rather complete review of the topic at those dates. Marple states that although much has been done, there still is much to do, particularly in the eigenstructure approach. The current literature verifies his statement. Further, more recent presentations are given in various places in Haykin (1991). The seismic analysis community also has begun to utilize the eigenstructure methods (Key et al., 1987; Mars et al., 1987; Shirley et al., 1987). A casual study of both Haykin (1991) and Owsley (1985) reveals that most of the modern spectral analysis methods have one of two common formulations. Next I describe this commonality and suggest an optimal derivation framework from which further variations within the common formulations can be constructed.
To begin, assume that N sample vectors $x$, each with M elements of samples taken from the complex, zero-mean stationary process x(t), are to be analyzed. These are used to estimate the true covariance matrix

$R_x = E\{x x^H\}$,  (5.26)

with the average vector outer product

$C_x = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^H$.  (5.27)

We write the covariance matrix $R_x$ in terms of its ordered eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M$ and associated normalized eigenvectors $v_i$, $1 \le i \le M$:

$R_x = V \Lambda V^H = \sum_{i=1}^{M} \lambda_i v_i v_i^H$,  (5.28)

where V is a matrix whose columns are the eigenvectors $v_i$, and $\Lambda$ is a diagonal matrix with elements $\lambda_i$.
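A minimal numerical sketch of equations (5.27) and (5.28) (NumPy; the random data are illustrative only):

```python
# Sample covariance from N snapshots and its eigendecomposition.
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 200
X = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))  # columns x_i

Cx = (X @ X.conj().T) / N                  # C_x = (1/N) sum_i x_i x_i^H
lam, V = np.linalg.eigh(Cx)                # Hermitian eigendecomposition
order = np.argsort(lam)[::-1]              # sort so lambda_1 >= ... >= lambda_M
lam, V = lam[order], V[:, order]

# Verify R_x = V Lambda V^H = sum_i lambda_i v_i v_i^H, equation (5.28).
assert np.allclose(Cx, (V * lam) @ V.conj().T)
```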

Assume that the delay vectors

$e(\omega) = [1\;\; e^{-j\omega\tau}\;\; e^{-j2\omega\tau}\; \ldots\; e^{-j\omega(M-1)\tau}]^T$  (5.29)

represent either waveform delays through a tapped delay line or narrowband signal wavefront delays through an equally spaced, linear spatial array of sensors. Then, in the time-sample case, $\tau$ is the time between samples, and in the spatial case, $\tau = (\delta \sin \theta)/c$, where $\delta$ is the spacing between sensors, $\theta$ is the bearing (DOA) of the wavefront arrival (uniform plane wave), and c is the wave velocity. In either case, normalizing the $\tau$ in equation (5.29), we may write elements of $e(f)$ as $e^{-j2\pi m f}$, where $f = \omega\tau/2\pi$ is referred to as the normalized frequency. We will drop the argument f except where necessary for clarification.

5.2.1 Minimum Variance Distortionless Response (MVDR)

Owsley (1985) gives the details of minimum variance distortionless response (MVDR). The criterion for optimization is minimum total output power from the weighted sensor data. That is, $J = E\{|w^H x|^2\}$ is to be minimized subject to the constraint that whatever power is arriving from DOA $\theta$ will pass with gain unity (distortionless), i.e., $|w^H e| = 1$. So, choosing the weight vector $w$ yields $P_{MV}(f)$ given below. This is also known as Capon's algorithm (Capon, 1969). Capon called this a maximum likelihood algorithm; however, it is not maximum likelihood, but constrained minimum mean square:

$P_{MV}(f) = (e^H R_x^{-1} e)^{-1} = \left[ e^H \left( \sum_{i=1}^{M} \lambda_i^{-1} v_i v_i^H \right) e \right]^{-1} = \left[ \sum_{i=1}^{M} \lambda_i^{-1} |e^H v_i|^2 \right]^{-1}$.  (5.30)
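A minimal sketch of equation (5.30) (the steering-vector and function names are mine, not the text's):

```python
# MVDR (Capon) scan: P_MV(f) = 1 / (e^H Rx^{-1} e), per equation (5.30).
import numpy as np

def steering(f, M):
    return np.exp(-2j * np.pi * f * np.arange(M))   # e(f), normalized frequency

def p_mv(Rx, freqs):
    Ri = np.linalg.inv(Rx)
    M = Rx.shape[0]
    return np.array([1.0 / np.real(steering(f, M).conj() @ Ri @ steering(f, M))
                     for f in freqs])
```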

5.2.2 MUSIC

The Multiple Signal Classification (MUSIC) algorithm (Wax et al., 1984) actually is not derived from an optimization, but rather is an implementation of the observation that the direction vector $e(f)$ must lie in the signal space. As such, the direction vector must be orthogonal to the $M - D$ eigenvectors that correspond to the $M - D$ smaller eigenvalues of $R_x$, $\lambda_{D+1} = \cdots = \lambda_M = \sigma_n^2$, assuming white noise with variance $\sigma_n^2$ is present at the sensors and exactly D independent wavefronts are arriving at the sensors. Thus, ideally,

$e^H(f)\, v_i = 0, \quad i = D+1, \ldots, M$,

when f is chosen correctly. Thus,

$P_{MU}(f) = \left[ \sum_{i=D+1}^{M} |e^H v_i|^2 \right]^{-1}$.  (5.31)

We note immediately that this is similar to MVDR in equation (5.30), but only eigenvectors $v_i$, $i \ge D+1$, are used, and since all eigenvalues $\lambda_i$, $i \ge D+1$, are equal, the scale factors $\lambda_i^{-1} = \sigma_n^{-2}$ have been discarded. For exact solutions and true $R_x$, $P_{MU}(f) \rightarrow \infty$. This contrasts to $P_{MV}(f)$, where for the same conditions and proper scaling, $P_{MV}(f)$ equals the true power in the arriving signal. $P_{MU}$ is not to be taken as a spectral estimator; as a plot versus f it is only mistakenly referred to as a spectrum. Rather, it is a frequency or DOA indicator, as are many of these algorithms.
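A minimal sketch of the MUSIC scan of equation (5.31) (names are illustrative only):

```python
# MUSIC indicator: sum over the M - D noise-space eigenvectors.
import numpy as np

def p_music(Rx, D, freqs):
    M = Rx.shape[0]
    lam, V = np.linalg.eigh(Rx)       # eigenvalues ascending
    Vn = V[:, :M - D]                 # eigenvectors of the M - D smallest
    e = np.exp(-2j * np.pi * np.outer(freqs, np.arange(M)))
    return 1.0 / np.sum(np.abs(e.conj() @ Vn) ** 2, axis=1)
```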
An alternate form of MUSIC has been developed to alleviate the computationally burdensome search over f for maxima in $P_{MU}(f)$ (Rao and Hari, 1989). (See also Chapter 7, Section 7.4.3.2.) Root-MUSIC uses $z^{-k}$ in place of $e^{-jk\omega}$ in the elements of $e$. This creates a polynomial in z:

$P_{MU}^{-1}(z) = e^H(z)\, V_N V_N^H\, e(z) = k \prod_{i=1}^{M-1} (1 - r_i z^{-1})(1 - r_i^* z)$,  (5.32)

where k is a scale factor, $V_N$ is the $M \times (M - D)$ matrix whose columns are noise-space eigenvectors, and $r_i$ and $1/r_i^*$ are the roots of $P_{MU}^{-1}(z)$. Thus, $V_N$ is analogous to $U_2$ in Section 3.3, except that here we assume the null-space eigenvectors are taken from the exact covariance matrix, rather than from $XX^H$.
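A minimal root-MUSIC sketch under these assumptions (the coefficient construction from diagonal sums of the noise-space projector is standard practice, not quoted from the text):

```python
# Root-MUSIC per equation (5.32): the coefficients of z^{M-1} e^H(z) VN VN^H e(z)
# are the diagonal sums of G = VN VN^H; roots nearest the unit circle give
# the frequency estimates.
import numpy as np

def root_music(Rx, D):
    M = Rx.shape[0]
    lam, V = np.linalg.eigh(Rx)
    G = V[:, :M - D] @ V[:, :M - D].conj().T        # noise-space projector
    coeffs = np.array([np.trace(G, offset=k) for k in range(-(M - 1), M)])
    roots = np.roots(coeffs)
    roots = roots[np.abs(roots) < 1.0]              # one of each reciprocal pair
    nearest = roots[np.argsort(np.abs(np.abs(roots) - 1.0))[:D]]
    return np.angle(nearest) / (2 * np.pi)          # normalized frequencies
```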

5.2.3 Eigenvalue

The eigenvalue method (Johnson, 1987) is somewhat of a merging of $P_{MV}(f)$ and $P_{MU}(f)$. This scheme recognizes that all noise eigenvalues may not be equal and gives the proper weight to each noise-space eigenvector in the formula for $P_{MU}(f)$. However, because eigenstructure must always be estimated, it is perhaps better to use $P_{MU}(f)$, which assumes white noise and exactly D signals, or $P_{MV}$, which makes no assumptions at all. This estimator is

$P_{EV}(f) = \left[ \sum_{i=D+1}^{M} \lambda_i^{-1} |e^H v_i|^2 \right]^{-1}$.  (5.33)

Enhanced Minimum Variance

Owsley (1985) introduced an enhanced minimum variance by allowing


the real, positive parameter  to indicate knowledge of the S/N. As  ,
this estimating approaches PMU(f ), thus becomes more of a detector than a
power spectrum. For   1, the estimator is equal to PMV(f ). Subsequently,
we shall see that another algorithm, new maximum entropy, can in essence
be used to set . The formula is
M

2
H
P EMV ( f )   q i e v i
i  1

 ( e VQV e )

1

1

(5.34a)

(5.34b)

where
Q  diag ( q i ) ,
1

--------------------2--, 1  i  D
 ( i   )
qi 
1

, D  1  i  M;
----2

62

(5.35)

Downloaded 06/26/14 to 134.153.184.170. Redistribution subject to SEG license or copyright; see Terms of Use at http://library.seg.org/

 is a real, positive parameter to be selected, and 2 is the assumed independent noise variance at each sensor.
The division of the summation over eigenstructure into 1  i  D and
D  1  i  M is due to assumed knowledge that either there are D narrowband signals present in the data x or that the data may be represented by its
reduced-rank covariance. The other dimensions are due mostly, if not completely, to noise.
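A minimal sketch of equations (5.34)-(5.35). Note that the signal-space weight written here, $q_i = [\gamma(\lambda_i - \sigma^2) + \sigma^2]^{-1}$, is my reconstruction from the stated limits ($\gamma = 1$ recovers $P_{MV}$; $\gamma \rightarrow \infty$ recovers $P_{MU}$):

```python
# Enhanced-minimum-variance scan; weights per the reconstruction noted above.
import numpy as np

def p_emv(Rx, D, gamma, freqs):
    M = Rx.shape[0]
    lam, V = np.linalg.eigh(Rx)
    lam, V = lam[::-1], V[:, ::-1]            # descending eigenvalues
    sigma2 = lam[D:].mean()                   # noise-variance estimate
    q = np.empty(M)
    q[:D] = 1.0 / (gamma * (lam[:D] - sigma2) + sigma2)   # signal space
    q[D:] = 1.0 / sigma2                                  # noise space
    e = np.exp(-2j * np.pi * np.outer(freqs, np.arange(M)))
    return 1.0 / (np.abs(e.conj() @ V) ** 2 @ q)
```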

5.2.5 Maximum Entropy

Maximum entropy (ME) estimates (Owsley, 1985; Burg, 1975) can be written similarly to the above, but the magnitude-square is taken outside, not inside, the summation:

$P_{ME}(f) = \left| 1_1^T \sum_{i=1}^{M} \lambda_i^{-1} v_i v_i^H\, e_r \right|^{-2}$,  (5.36)

where $1_1^T = (1\; 0 \ldots 0\; 0)$, and $e_r$ is $e$ with reversed elements.

This formulation of the ME spectrum is identical to the mmse linear predictor spectral solution given in equation (5.25), except that we have dropped a scale factor.
The ME spectrum was derived by Burg, who asked what spectrum P(f) has maximum entropy (see Section 5.2.7)

$H_x = \int_{-1/2}^{1/2} \ln P(f)\, df$  (5.37)

subject to the constraints that the density of P(f) must match the known autocorrelation lags $r_x(m)$ ($= R_x(m)$ for zero-mean x) for $0 \le m \le M - 1$. The solution leads to the augmented Yule-Walker equation (5.21), except that in equation (5.21) we had size $M + 1$ instead of M for the size of the covariance, having derived an order-M predictor.
In most of the above estimators, various scale factors in the numerator have been omitted. In equation (5.36), for example, the usual whitened signal power $(1_1^H R_x^{-1} 1_1)^{-1}$ is not shown in the numerator. Any constraint for this scalar might be used, such as the whitened MMSE just discussed; alternately, a total power equal to unity can be used. When simply detecting sinusoids or estimating their frequencies or target (emitter) bearing angles, a scale factor is unimportant and only the location or relative size of spectral peaks is of interest. When comparing estimators, their spectral maxima are often set to unity.
Reconsidering the maximum entropy spectrum, we note that it also might be written using $\alpha_i = \lambda_i^{-1} 1_1^T v_i$:

$P_{ME}(f) = \left| \sum_{i=1}^{M} \alpha_i v_i^H e_r \right|^{-2}$.  (5.38)

Expanding equation (5.38) and using the orthonormality of $v_i$ and $v_j$ gives

$P_{ME}(f) = \left[ \sum_{i=1}^{M} \sum_{k=1}^{M} q_{ik}\, e^H v_i v_k^H e \right]^{-1}$  (5.39a)

$= (e^H V Q V^H e)^{-1}$,  (5.39b)

where

$Q = \Lambda^{-1} V^H 1_1 1_1^T V \Lambda^{-1}$,

$q_{ik} = (\lambda_i \lambda_k)^{-1} v_{1i}^* v_{1k}$,  (5.40)

and $v_{1i}$ is the first element of $v_i$. For unspecified $q_{ik}$, the form of equation (5.39b) is more general than that of equation (5.34b) and incorporates all the other spectra. If all $q_{ik} = 0$ when $i \ne k$, equation (5.39b) degenerates to the general form of $P_{EMV}$ in equations (5.34a)-(5.34b) with unspecified diagonal $q_i$.

5.2.6 Minimum Norm

Yet another powerful spectral estimator was introduced by Kumaresan and Tufts (1983). By minimizing the norm of a linear combination of noise-space eigenvectors and constraining the first element of that linear combination to be unity (a polynomial-form constraint), the minimum norm (MN) estimator is constructed. A reformulation of that estimator is

$P_{MN}(f) = \left[ \sum_{i=D+1}^{M} \sum_{k=D+1}^{M} q_{ik}\, e^H v_i v_k^H e \right]^{-1}$  (5.41a)

$= (c^H c)^2 \left[\, e^H V_N V_N^H 1_1 1_1^T V_N V_N^H e \,\right]^{-1}$,  (5.41b)

where $V_N$ is the noise-space eigenvector matrix as in equation (5.32), and $c = [v_{1,D+1}, v_{1,D+2}, \ldots, v_{1M}]^T$ is the vector of the eigenvectors' first elements. Here the $q_{ik}$ are the corresponding i,k elements in $V_N^H 1_1 1_1^T V_N\, (c^H c)^{-2}$. The differences between this and $P_{ME}$ are that only noise eigenvectors are used and that the $\lambda_i$ are equated to unity under the assumption that they are equal.

Thus, it has been shown that all of the spectral estimators above, including ME and minimum norm, are of one general form [equations (5.39a) and (5.39b)], having a weight $q_{ik}$ associated with each $e^H v_i v_k^H e$ term in the series expansion. A large subset of the estimators uses $q_{ik} = 0$ for $i \ne k$, and this subset has the form of equations (5.34a) and (5.34b).
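All of these estimators can therefore be computed from one routine that accepts a weight matrix Q. A minimal sketch follows, with a minimum-norm Q built per the reconstruction above; the names are mine:

```python
# General form (5.39b)/(5.41b): P(f) = 1 / (e^H V Q V^H e).
import numpy as np

def p_general(Rx, Q, freqs):
    M = Rx.shape[0]
    lam, V = np.linalg.eigh(Rx)            # ascending; noise space first
    e = np.exp(-2j * np.pi * np.outer(freqs, np.arange(M)))
    A = e.conj() @ V                       # rows of a_i* = e^H v_i
    return 1.0 / np.real(np.einsum('fi,ij,fj->f', A, Q, A.conj()))

def q_min_norm(Rx, D):
    """Q with q_ik = v_1i* v_1k / (c^H c)^2 on the noise-space block."""
    M = Rx.shape[0]
    lam, V = np.linalg.eigh(Rx)
    c = V[0, :M - D]                       # first elements of noise eigenvectors
    Q = np.zeros((M, M), dtype=complex)
    Q[:M - D, :M - D] = np.outer(c.conj(), c) / np.real(c.conj() @ c) ** 2
    return Q
```

A diagonal Q with entries $1/\lambda_i$ reproduces $P_{MV}$, and a diagonal Q that is zero on the signal space and unity on the noise space reproduces $P_{MU}$, as the text's unified form suggests.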

5.2.7 Maximum Entropy Spectrum with Eigenstructure Projection Constraints

The original ME spectral estimate was introduced in 1975 by Burg. An excellent review is given by Papoulis (1981), who relates the ME method (MEM) to autoregressive Gaussian processes with both deterministic and non-deterministic autocorrelation functions.

The usual constraints are that the density of P(f) must match the known autocorrelation lags $r_x(m)$ for $0 \le m \le M - 1$. This gives the expression to maximize:

$\max_P \int_{-1/2}^{1/2} \ln P(f)\, df + \sum_{m=0}^{M-1} q_m \int_{-1/2}^{1/2} P(f)\, e^{-jm\omega}\, df$,  (5.42)

where $\omega = 2\pi f$, and the $q_m$ are Lagrangian multipliers.


In Kirlin (1992), I suggest that it is reasonable and maybe even preferable to force P(f) to have certain projections onto the eigenvectors. It may be preferable because, given an estimate of $R_x$ from a short data record, it is well known that estimates of $r_x(m)$ for larger values of m are quite poor. Also, values of $r_x(m)$ may be near or at zero-crossings of $r_x(\tau)$, in which case their normalized estimates have errors of large relative size. Eigenvalue estimates, in contrast, diminish uniformly with their index, have error variances inversely related to the number of vector samples, and are proportional to the square of their true value (see Chapter 3). However, as the index increases, eventually either computation or sample-size errors will cause smaller eigenvalues to have erroneous inverses and will give considerable MSE in spectral estimates based on $R_x^{-1}$. Lastly, positive projections insure a positive-definite estimate.
Thus consider maximizing either

$J = \int_{-1/2}^{1/2} \ln P(f)\, df + \sum_{i=1}^{M} q_i \int_{-1/2}^{1/2} P(f)\, |e^H v_i|^2\, df$  (5.43a)

or

$J = \int_{-1/2}^{1/2} \ln P(f)\, df + \sum_{i=1}^{M} q_i \int_{-1/2}^{1/2} \sqrt{P(f)}\; e^H v_i\, df$.  (5.43b)

Equation (5.43a) contains the constraints, via real Lagrange multipliers $q_i$, that the data at all frequencies f must have fixed expected-square projections onto the eigenvectors $v_i$. Equation (5.43b) contains the constraints, via complex $q_i$, that the data at all frequencies f must have fixed root-expected-square projections onto the $v_i$. Thus, in equation (5.43a) we constrain P(f) such that

$\int_{-1/2}^{1/2} P(f)\, |e^H v_i|^2\, df = C_i, \quad 1 \le i \le M$,  (5.44)

where, for example, $C_i$ may be the eigenvalue $\lambda_i$. Taking the variation of J in equation (5.43a) with respect to P(f) easily gives

$P(f) = \left[ \sum_{i=1}^{M} q_i |e^H v_i|^2 \right]^{-1}$  (5.45a)

$= (e^H V Q V^H e)^{-1}, \quad Q = \mathrm{diag}(q_i)$.  (5.45b)

That is a satisfying result in that it predicts the form of the large subset of the estimators given by equations (5.34a) and (5.34b). Simply by choosing the $C_i$ according to some criteria, we can produce more of such estimates.
In order to satisfy the constraints of equation (5.44), substitute P(f) from equations (5.45a) and (5.45b) to yield

$\int_{-1/2}^{1/2} \frac{|e^H v_i|^2}{\sum_{j=1}^{M} q_j |e^H v_j|^2}\, df = C_i, \quad 1 \le i \le M$.  (5.46)

The solution of these constraint equations for the unknown Lagrange multipliers is not as straightforward as desired.
An effective alternative is as follows. Multiplying both sides of equation (5.46) by $q_i$ and summing over i gives

$\int_{-1/2}^{1/2} \frac{\sum_{i=1}^{M} q_i |e^H v_i|^2}{\sum_{j=1}^{M} q_j |e^H v_j|^2}\, df = \sum_{i=1}^{M} q_i C_i = q^T C = 1$,  (5.47)

where $q = [q_1, q_2, \ldots, q_M]^T$ and $C = [C_1, C_2, \ldots, C_M]^T$.
Because only the sampled covariance is available, we are usually uncertain of the eigenstructure. Thus, we let the constraint values $C_i$ be random variables with $E\{C_i\} = \bar{C}_i$ and covariance matrix $K_c$. Because the $C_i$ are random variables, we choose the $q_i$ to minimize the variance of $q^T C$ while enforcing the expectation of the constraint in equation (5.47); that is,

$E\{q^T C\} = q^T \bar{C} = 1$.  (5.48)

This results in the general spectrum of equation (5.45b) becoming the new maximum entropy (NME) spectral estimator:

$P_{NME}(f) = \bar{C}^T K_c^{-1} \bar{C}\; (e^H V D_q V^H e)^{-1}$,  (5.49)

where

$D_q = \mathrm{diag}(q_1, q_2, \ldots, q_M)$  (5.50)

and

$q = K_c^{-1} \bar{C}\, \big/\, (\bar{C}^T K_c^{-1} \bar{C})$.  (5.51)

Because equation (5.49) has been derived in terms of $\bar{C}$ and its variance, it applies to general uncertain constraints. Thus, for example, according to convention, we might split the signal and noise space into eigenvector sets, estimating $\lambda_i = l_i + \sigma^2$, $1 \le i \le D$, for the signal eigenvalues, and $\lambda_i = \sigma^2$, $D+1 \le i \le M$, for the noise eigenvalues. Alternately, we might use $C_i = l_i$, $1 \le i \le D$, and $C_i = 0$, $D+1 \le i \le M$. Each of these uncertain $C_i$ has a respective variance that can be inserted into equation (5.49).
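A minimal sketch of equations (5.49)-(5.51), assuming a diagonal $K_c$ (uncorrelated constraint errors); the argument names are mine, and $\bar{C}$ and the error variances are assumed ordered to match descending eigenvalues:

```python
# NME: q = Kc^{-1} Cbar / (Cbar^T Kc^{-1} Cbar);
# P_NME(f) = (Cbar^T Kc^{-1} Cbar) / sum_i q_i |e^H v_i|^2.
import numpy as np

def p_nme(Rx, Cbar, kc_diag, freqs):
    M = Rx.shape[0]
    lam, V = np.linalg.eigh(Rx)
    lam, V = lam[::-1], V[:, ::-1]        # lambda_1 >= ... >= lambda_M
    w = Cbar / kc_diag                    # Kc^{-1} Cbar for diagonal Kc
    scale = Cbar @ w                      # Cbar^T Kc^{-1} Cbar
    q = w / scale                         # equation (5.51)
    e = np.exp(-2j * np.pi * np.outer(freqs, np.arange(M)))
    return scale / (np.abs(e.conj() @ V) ** 2 @ q)
```

With $\bar{C}_i = \hat{\lambda}_i$ and error variances $\hat{\lambda}_i^2/N$, this reduces to $M P_{MV}$, as derived in the examples that follow.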
5.2.7.1 Example 1

Consider then the case where a pure spectral analysis is being attempted, having no knowledge of sensor noise, etc. Letting $\bar{C}_i = \hat{\lambda}_i$, the eigenvalue estimates give the realization that the constraints are a bit uncertain. In fact, it is known (see Chapter 4) that for Gaussian data and distinct $\lambda_i$ the asymptotic mean and variance of $\hat{\lambda}_i$ are

$E\{\hat{\lambda}_i\} = \lambda_i$,  (5.52a)

$E\{(\hat{\lambda}_i - \lambda_i)(\hat{\lambda}_j - \lambda_j)\} = \frac{\lambda_i \lambda_j}{N} \delta_{ij}$,  (5.52b)

where N is the sample size in the covariance estimate. Then in equation (5.51), using $C_i = \hat{\lambda}_i = \lambda_i + \Delta\lambda_i$, where the $\Delta\lambda_i$ are zero-mean, uncorrelated errors with $E\{\Delta\lambda_i^2\} = \epsilon_i^2$, gives for each $q_k$

$q_k = (\lambda_k / \epsilon_k^2) \left[ \sum_{i=1}^{M} \lambda_i^2 / \epsilon_i^2 \right]^{-1}$.  (5.53)

These produce the spectral estimate

$P(f) = \left[ \sum_{i=1}^{M} \frac{\lambda_i^2}{\mathrm{var}\{\hat{\lambda}_i\}} \right] \left[ \sum_{k=1}^{M} \frac{\lambda_k}{\mathrm{var}\{\hat{\lambda}_k\}}\, |e^H v_k|^2 \right]^{-1}$,  (5.54)

where $e = e(f)$. This spectral estimate is approximately the same as $P_{MV}(f)$ except for a scale factor and a weighting of each estimated eigenvalue inversely with its estimated variance.
5.2.7.2 Example 2

In equation (5.54), we use our measurement $\hat{\lambda}_k$ to estimate the variance via equation (5.52b); thus,

$\mathrm{var}\{\hat{\lambda}_i\} = \hat{\lambda}_i^2 / N$,  (5.55)

and P(f) becomes, using equation (5.55) in equation (5.49),

$P(f) = M \left[ \sum_{k=1}^{M} \hat{\lambda}_k^{-1}\, |e^H(f)\, v_k|^2 \right]^{-1}$  (5.56a)

$= M P_{MV}(f)$.  (5.56b)

This method is not effective for estimating the variance of small eigenvalues. Thus the NME estimate with uncertain constraints $\bar{C}_i = \hat{\lambda}_i$, when the variance of $\hat{\lambda}_i$ is estimated with $\mathrm{var}\{\hat{\lambda}_i\} = \hat{\lambda}_i^2 / N$, is $P_{MV}$, the minimum variance spectral estimate, scaled by M.
5.2.7.3 Example 3

Suppose next that we wish to use the constraints

$C_i = \begin{cases} (\hat{\lambda}_i - \hat{\sigma}^2)/\gamma = \hat{l}_i/\gamma, & 1 \le i \le D \\ \hat{\sigma}^2, & D+1 \le i \le M, \end{cases}$  (5.57)
where $\hat{l}_i$ is an ith-mode signal power estimate, $\gamma$ is Owsley's (1985) signal-space enhancement factor, and

$\hat{\sigma}^2 = \frac{1}{M - D} \sum_{i=D+1}^{M} \hat{\lambda}_i$  (5.58a)

is a noise power estimate. When the true noise eigenvalues are not distinct, we need an estimate of the variance of $\hat{\sigma}^2$. When $M - D \ge 3$ and N is large,

$(\hat{\sigma}^2 - \sigma^2) \big/ (s / \sqrt{M - D})$

is approximately t-distributed with $M - D - 1$ degrees of freedom. In this case, $\hat{\sigma}^2$ has sample variance

$\mathrm{var}\{\hat{\sigma}^2\} = \frac{M - D - 1}{(M - D - 3)(M - D)}\, s^2$,  (5.58b)

where

$s^2 = (M - D - 1)^{-1} \sum_{i=D+1}^{M} (\hat{\lambda}_i - \hat{\sigma}^2)^2$.
However, assume that the smaller eigenvalues are, in fact, distinct; then the uncertainties of the $C_i$ are expressed by

$\mathrm{var}\{C_i\} = \begin{cases} \gamma^{-2} \left[ \mathrm{var}\{\hat{\lambda}_i\} + (M - D)^{-2} \sum_{j=D+1}^{M} \mathrm{var}\{\hat{\lambda}_j\} \right], & 1 \le i \le D \\ (M - D)^{-2} \sum_{j=D+1}^{M} \mathrm{var}\{\hat{\lambda}_j\}, & D+1 \le i \le M. \end{cases}$  (5.59)

Again replacing the value of $\lambda_i$ with $\hat{\lambda}_i$ in $\mathrm{var}\{\hat{\lambda}_i\}$ in equation (5.52b), and using $\bar{D}$ for $M - D$,

$\mathrm{var}\{C_i\} = \begin{cases} \gamma^{-2} \left[ \hat{\lambda}_i^2/N + (\bar{D}^2 N)^{-1} \sum_{j=D+1}^{M} \hat{\lambda}_j^2 \right], & 1 \le i \le D \\ (\bar{D}^2 N)^{-1} \sum_{j=D+1}^{M} \hat{\lambda}_j^2, & D+1 \le i \le M. \end{cases}$  (5.60)

With these variances and $C_i$ as in equation (5.57), the estimator of equation (5.54) becomes

$P_{NME}(f, \gamma) = \left[ \sum_{i=1}^{D} \frac{(\hat{l}_i/\gamma)^2}{\mathrm{var}\{C_i\}} + \sum_{i=D+1}^{M} \frac{\hat{\sigma}^4}{\mathrm{var}\{C_i\}} \right] \left[ \sum_{i=1}^{D} \frac{\hat{l}_i/\gamma}{\mathrm{var}\{C_i\}}\, |e^H v_i|^2 + \sum_{i=D+1}^{M} \frac{\hat{\sigma}^2}{\mathrm{var}\{C_i\}}\, |e^H v_i|^2 \right]^{-1}$,  (5.61)

with $\mathrm{var}\{C_i\}$ given by equation (5.60).

This expression for the signal-space-enhanced spectral estimate is, to within a scale factor, very similar to Owsley's (1985) [equations (5.34a) and (5.34b)], except that rather than a deterministic $C_i = \sigma^2$, it is a random variable with a variance to be estimated. Naturally, this will yield a spectral estimate with less resolution and assumed accuracy than one for which the $\lambda_i$ and D are known exactly. The resulting estimate, however, will not contain misleadingly sharp spectral peaks implying precise data when sinusoids are present, but will spread the peaks to reflect the uncertainty, via the signal-space summation in the denominator. Also, by utilizing all eigenvectors ($\gamma \ne \infty$), uncertainty with respect to the signal-space dimension is incorporated.

5.2.8 Complex New Maximum Entropy Estimator

If, instead, we use the projections given in equation (5.43b), the corresponding spectral estimator is the complex new maximum entropy estimator (CNME) (Kirlin, 1992):

$P_{CNME}(f) = (\bar{C}^H K_c^{-1} \bar{C})^2 \left[ \sum_{i=1}^{M} \sum_{j=1}^{M} q_i^* q_j\, e^H v_i v_j^H e \right]^{-1}$  (5.62a)

$= (\bar{C}^H K_c^{-1} \bar{C})^2\, (e^H V Q_c V^H e)^{-1}$,  (5.62b)

where the $q_i$ here are the elements of $K_c^{-1} \bar{C}$ and

$Q_c = K_c^{-1} \bar{C} \bar{C}^H K_c^{-1}$.  (5.63)

The first significant difference between this result and that for the power constraints is that the magnitude-squared operation is outside the sum in equation (5.62a) instead of inside as in equation (5.49). This form corresponds to that of $P_{ME}$ and $P_{MN}$, as is seen in the matrix $Q_c$.

In Kirlin (1992), some qualitative comparisons are shown between $P_{CNME}(f)$ and $P_{ME}$ and $P_{MN}$ under various assumptions regarding $\bar{C}$ and $K_c$. Although this general form is of interest, our personal preference is to avoid the form of $P_{ME}$, $P_{MN}$, and $P_{CNME}$ due to their tendency to give false peaks (instability).

5.2.9 Example Spectral Estimates

To demonstrate relative results, take an estimated signal spectrum generated by two sinusoids at normalized frequencies 0.225 and 0.250 in white Gaussian noise at S/N = 3 dB. The data record is 64 points long and the covariance matrix is 20 × 20. The following spectral estimators have been used: conventional ME, Figure 5.2; minimum variance ($P_{EMV}$ with $\gamma = 1$), Figure 5.3; enhanced minimum variance $P_{EMV}$ with enhancement factor $\gamma = 100$, Figure 5.4 (essentially $P_{MU}$); the new maximum entropy method $P_{NME}$ of equation (5.61) with $\gamma = 1$, Figure 5.5; and a modified forward-backward linear prediction (FBLP) method (Marple, 1987), Figure 5.6.

A number of expected effects can be seen among these results. Although $P_{ME}$ resolves the two frequencies, it is quite noisy. The basic $P_{MV}$ is much more stable, but its peaks are nearly unresolved. At the other extreme is $P_{FBLP}$, which has sharp, well-resolved peaks but a large number of sidelobes up to −8 dB from maximum.
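A minimal sketch reproducing these test conditions (Python/NumPy; the noise level, seed, and sliding-window covariance estimate are my choices, not specified in the text):

```python
# Two sinusoids at 0.225 and 0.250, 64 samples, 20 x 20 covariance,
# then MV and MUSIC scans for comparison.
import numpy as np

rng = np.random.default_rng(2)
n = np.arange(64)
x = (np.cos(2 * np.pi * 0.225 * n) + np.cos(2 * np.pi * 0.250 * n)
     + 0.7 * rng.standard_normal(64))          # roughly 3 dB S/N per sinusoid

M = 20
snaps = np.array([x[i:i + M] for i in range(len(x) - M + 1)])
Rx = snaps.T @ snaps / len(snaps)              # 20 x 20 sample covariance

D = 4                                          # 2 real sinusoids = 4 complex modes
lam, V = np.linalg.eigh(Rx)
Vn = V[:, :M - D]
Ri = np.linalg.inv(Rx)
freqs = np.linspace(0.15, 0.35, 400)
e = np.exp(-2j * np.pi * np.outer(freqs, np.arange(M)))
p_mv = 1.0 / np.real(np.einsum('fi,ij,fj->f', e.conj(), Ri, e))
p_mu = 1.0 / np.sum(np.abs(e.conj() @ Vn) ** 2, axis=1)
```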

Figure 5.2. Conventional maximum entropy method estimate. S/N = 3 dB, 64 samples, 20 × 20 covariance matrix. (© 1992 IEEE. Used with permission. R. L. Kirlin, New Maximum Entropy Spectrum Using Uncertain Eigenstructure Constraints, IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, January 1992.)

Of particular interest is the comparison between $P_{EMV}$ with $\gamma = 1$ ($P_{MV}$) and $P_{NME}$ in Figures 5.3 and 5.5, respectively. $P_{NME}$ with enhancement factor $\gamma = 1$ shows improvement over $P_{MV}$. However, the methods are identical for $\gamma \rightarrow \infty$, where they provide the best resolution and sharpest peaks, equivalent to $P_{MU}$.
5.2.9.1 Comparison with Minimum Norm

At various S/N and enhancement values $\gamma$ for the new ME (NME), $P_{NME}$ and $P_{MN}$ were compared by simulations as above, but using 128 points of data and multiple runs with independent noise records on each run. We have concentrated on $P_{MN}$ because of its accepted superiority over other methods with regard to threshold of resolution, as defined and documented in Kaveh and Barabell (1986). Ten example runs at S/N = 3 dB are shown for $P_{MN}$ in Figure 5.7 and for $P_{NME}$ with $\gamma = 1$ in Figure 5.8. The resolution advantage of $P_{MN}$ is evident, but so also are its sidelobes.
Figure 5.3. Minimum variance estimate. S/N = 3 dB, 64 samples, 20 × 20 covariance matrix. (© 1992 IEEE. Used with permission. R. L. Kirlin, New Maximum Entropy Spectrum Using Uncertain Eigenstructure Constraints, IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, January 1992.)

Two additional comparisons to MN have been made. In the first, the algorithms were each told that D = 1 when, in fact, D = 2. At S/N = 10 dB and $\gamma = 1$ for NME, 10 runs of each algorithm gave a single peak each time. However, when the two algorithms were each told that D = 3 (3 signals present) when in fact D = 2, 10 runs of each algorithm gave the plots in Figures 5.9 and 5.10. The greater tendency toward instability is clearly shown in the $P_{MN}$ plots. This kind of error (false alarm) is obviously minimized by incorporating the model uncertainty offered by $P_{NME}$. If nothing else, this example shows that more study should be done before designing specifications for miss and false-alarm probabilities.

5.3 Conclusions

It has been shown that all modern spectral estimators are of the form given in equations (5.62a) and (5.62b), a subset of which is given by equations (5.34a) and (5.34b). These estimators have been derived under various criteria such as mmse linear prediction, minimum variance distortionless response, constrained maximum entropy, maximum likelihood, idealized signal-subspace-noise-subspace partitioning, and minimum norm.

Figure 5.4. Enhanced minimum variance estimate, γ = 100. S/N = 3 dB, 64 samples, 20 × 20 covariance matrix; essentially equivalent to P_NME with γ = 100 and to P_MU. (© 1992 IEEE. Used with permission. R. L. Kirlin, New Maximum Entropy Spectrum Using Uncertain Eigenstructure Constraints, IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, January 1992.)
The general forms of equations (5.34a) and (5.34b) and equations (5.62a) and (5.62b) have been derived via ME, but with constraints requiring the spectrum to have given expected-square or root-expected-square projections onto the eigenvectors or data modes. It is conjectured that these constraints may be more appropriate in general than the autocorrelation-lag constraints of conventional ME. I suggest this may be true since the estimated eigenvalues have standard deviations which decrease with their rank and sample size and are proportional to the true eigenvalue, whereas estimated autocorrelation values have errors which increase with lag number and may be of large proportion when the true correlation is near zero. Further, by taking into account the uncertainty of sample eigenstructure, the resulting spectral estimates are not overly confident of the noise-subspace-signal-subspace partition.

Figure 5.5. New maximum entropy method, P_NME from equation (5.61) with γ = 1. S/N = 3 dB, 64 samples, 20 × 20 covariance matrix. (© 1992 IEEE. Used with permission. R. L. Kirlin, New Maximum Entropy Spectrum Using Uncertain Eigenstructure Constraints, IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, January 1992.)
Questions of the proper tradeoff between false-alarm and miss probabilities remain unanswered for estimators of this general type. However, the reader is referred to Li et al. (1993) for an approach to performance analysis in the case of small sample size.

It has been shown repeatedly in the literature (for example, Li et al., 1993) that $P_{MU}$ is minimum variance for estimating frequencies or DOAs (equivalent to velocities of plane seismic wavefronts). However, $P_{MN}$ yields better resolution and has a lower threshold. The drawback of MN is its instability, or false-alarm rate. For general spectral estimation, $P_{MV}$ is a reasonable choice because it gives a true power estimate and makes no assumptions about sinusoids or a finite number of plane-wave sources.

Figure 5.6. Modified FBLP method. S/N = 3 dB, 64 samples, 20 × 20 covariance matrix. (© 1992 IEEE. Used with permission. R. L. Kirlin, New Maximum Entropy Spectrum Using Uncertain Eigenstructure Constraints, IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, January 1992.)

One other popular spectral estimator is known as ESPRIT. It will be discussed and referenced in Chapter 7, which derives applications of MUSIC, MN, and ESPRIT to single-reflection rms seismic velocity and two-way zero-offset traveltime; performance analyses are included.

5.4 References

Burg, J. P., 1975, Maximum entropy spectral analysis: Ph.D. dissertation, Stanford University.

Capon, J., 1969, High-resolution frequency-wavenumber spectrum analysis: Proc. IEEE, 57, 1408-1418.

Haykin, S., 1991, Adaptive filter theory: Prentice-Hall, Inc.

Johnson, D. H., 1987, The application of spectral estimation to bearing estimation problems: Proc. IEEE, 70, 1018-1028.

Figure 5.7. Ten runs using minimum norm. S/N = 3 dB, M = 20, n = 64, f1 = 0.225, f2 = 0.25. (© 1992 IEEE. Used with permission. R. L. Kirlin, New Maximum Entropy Spectrum Using Uncertain Eigenstructure Constraints, IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, January 1992.)

Kaveh, M., and Barabell, A. J., 1986, The statistical performance of the MUSIC and minimum-norm algorithms in resolving plane waves in noise: IEEE Trans. Acoust., Speech and Sig. Proc., 34, 331-341.

Key, S. C., Kirlin, R. L., and Smithson, S. B., 1987, Seismic velocity analysis using maximum likelihood weighted eigenvalue ratios: 57th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 461-464.

Kirlin, R. L., 1992, New maximum entropy spectrum using uncertain eigenstructure constraints: IEEE Trans. Aerosp. Elect. Systems, 28, 2-14.

Kumaresan, R., and Tufts, D. W., 1983, Estimating the angles of arrival of multiple plane waves: IEEE Trans. Aerosp. Elect. Systems, 19, 134-139.

Li, F., Liu, H., and Vaccaro, R. J., 1993, Performance analysis for DOA estimation algorithms: further unification, simplification and observations: IEEE Trans. Aerosp. and Elect. Systems, 29.

Figure 5.8. Ten runs using new ME. S/N = 3 dB, M = 20, n = 64, f1 = 0.225, f2 = 0.25, γ = 1. (© 1992 IEEE. Used with permission. R. L. Kirlin, New Maximum Entropy Spectrum Using Uncertain Eigenstructure Constraints, IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, January 1992.)

Marple, S. L., Jr., 1987, Digital spectral analysis with applications: Prentice-Hall, Inc.

Mars, J., Glangeaud, F., Lacoume, J. L., Fourmann, J. M., and Spitz, S., 1987, Separation of seismic waves: 57th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 489-492.

Oppenheim, A. V., and Schafer, R. W., 1975, Digital signal processing: Prentice-Hall, Inc.

Owsley, N. L., 1985, Sonar array processing, in Haykin, S., Ed., Array signal processing: Prentice-Hall, Inc.

Papoulis, A., 1981, Maximum entropy and spectral estimation: A review: IEEE Trans. Acoust., Speech, and Sig. Proc., 29, 1176-1186.

Figure 5.9. Ten runs for minimum norm. S/N = 10 dB, M = 20, n = 64, f1 = 0.225, f2 = 0.25. (© 1992 IEEE. Used with permission. R. L. Kirlin, New Maximum Entropy Spectrum Using Uncertain Eigenstructure Constraints, IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, January 1992.)

Rao, B. D., and Hari, K. V. S., 1989, Performance analysis of root-MUSIC: IEEE Trans. Acoust., Speech and Sig. Proc., 37, 1789-1794.

Shirley, T. E., Laster, S. J., and Meek, R. A., 1987, Assessment of modern spectral analysis methods to improve wavenumber resolution of f-k spectra: 57th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 607-609.

Wax, M., Shan, T., and Kailath, T., 1984, Spatio-temporal spectral analysis by eigenstructure methods: IEEE Trans. Acoust., Speech, and Sig. Proc., 32, 817-827.

Figure 5.10. Ten runs for new ME. S/N = 10 dB, M = 20, n = 64, f1 = 0.225, f2 = 0.25, γ = 1. (© 1992 IEEE. Used with permission. R. L. Kirlin, New Maximum Entropy Spectrum Using Uncertain Eigenstructure Constraints, IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, January 1992.)

Chapter 6

Root-Mean-Square Velocity Estimation

R. Lynn Kirlin

6.1 Introduction

The covariance analysis approach to directions of plane-wave arrival or multiple-sinusoid frequency estimation has already been detailed in Chapter 5. In Chapter 7, Fu Li and Hui Liu derive optimal subspace estimators of two-way vertical traveltime and rms velocity for single wavefronts. Because hyperbolic wavefronts by definition are not planar and wavelets are not temporally narrowband, direct applications of high-resolution subspace algorithms often do not work well. In this chapter I will give some of the details and trade-offs in the estimation of rms velocity when multiple broadband wavefronts are present in the analysis window. I will also present two enhancements of conventional semblance. The first is a low-rank or signal-subspace version of semblance, and the second, the multiple sidelobe canceller, is an adaptive interference-cancelling method that presents to the semblance algorithm a best estimate of interference-free wavefronts. These enhancements can substantially improve resolution and coherence estimation.

6.2 Multiple Wavefront Model

In this section, I will present aspects of curved wavefronts as they relate to subspace analysis of covariance matrices and provide the background necessary for understanding the methods that can deal with multiple, curved, broadband wavefronts.

In Section 4.3, I presented a model of multiple narrowband plane-wave signals arriving at a linear, equally spaced geophone array. The time-slice data vector is

$x(i) = H s(i) + n(i)$,  (6.1)

where H has K delay-vector columns

$h_k = (1\;\; e^{-j\omega_k \tau_k}\;\; e^{-2j\omega_k \tau_k}\; \ldots\; e^{-j(M-1)\omega_k \tau_k})^T, \quad k = 1, 2, \ldots, K$,

indexing the distinct wavefronts, and n(i) is the independent additive noise vector, white in both space and time. Section 4.4 shows that the delays $\tau_k$ associated with any hyperbolic wavefront must also be indexed with m, the offset (phone) index, because the delay of the kth wavefront is not constant between equispaced sensors. Instead, the approximation to delay versus offset is

$\tau_{mk} = \sqrt{T_{0k}^2 + (m\delta/V_k)^2} - T_{0k}$,  (6.2)

wherein $T_{0k}$ is the vertical two-way traveltime, $\delta$ is the phone spacing, and $V_k$ is the rms velocity for the kth wavefront.
the rms velocity for the kth wavefront.
In order to utilize the narrowband, time-stationary model of equation (6.1), we might use the parabolic approximation to $\tau_{mk}$ so that time slices of a single wavefront can still be modeled as $h_k s_k(i)$, implying that $h_k$ gives the same relative delays at any time index i. Thus in the analysis window we may often approximate

$\tau_{mk} \approx (m\delta)^2 / (2 T_{0k} V_k^2) = p\, (m\delta)^2$,  (6.3)

where $T_{0k}$ is taken to be the temporal coordinate at the center of the analysis window. We assume that these relative delays hold throughout the time gate of the analysis region. The variable p describes a parabolic delay curve.

Unfortunately, the delay between sensors is still not spatially stationary, except in the case where we have exactly preflattened the wavefront so that $\tau_{mk} = 0$ for all m.
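A minimal sketch of the delay models of equations (6.2) and (6.3) (the geometry values are illustrative only):

```python
# Exact hyperbolic moveout versus the parabolic approximation.
import numpy as np

T0 = 1.9            # vertical two-way traveltime, s
V = 2700.0          # rms velocity, m/s
delta = 60.0        # phone spacing, m
m = np.arange(32)   # offset index

tau_hyp = np.sqrt(T0**2 + (m * delta / V) ** 2) - T0   # equation (6.2)
p = 1.0 / (2 * T0 * V**2)
tau_par = p * (m * delta) ** 2                         # equation (6.3)
# The two agree at small offsets and diverge as (m * delta)/(T0 * V) grows.
```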
In addition, the narrowband assumption does not do well either, unless the phase in the exponential delay factor $\exp(-j\omega \tau_{mk})$ is approximately constant for all frequencies in the band of the kth wavefront. Noting that for $\omega = \omega_k + \Delta\omega$, $\exp(-j\omega\tau_{mk}) = \cos \omega\tau_{mk} - j \sin \omega\tau_{mk}$, we find that the rate of change of this factor with respect to $\omega$ is $d/d\omega\; \exp(-j\omega\tau_{mk}) = -\tau_{mk}(\sin \omega\tau_{mk} + j \cos \omega\tau_{mk}) \approx -j\tau_{mk}$ for small $\omega\tau_{mk}$. Thus, if we have a small enough time delay $\tau_{mk}$, there is little error in the narrowband assumptions.

The concept of narrowbandedness in array processing is generally defined in terms of delay. If the delay of the wavefront from one end of the array to the other is small with respect to the temporal correlation time, then the wavefront is deemed narrowband. Clearly, the wavefronts in Figure 6.1 cannot be considered narrowband in this context. Therefore, only flat or nearly flattened wavefronts can utilize the narrowband model.

Figure 6.1. Two wavefronts at 1900 ms, 10 dB S/N, v1 = 9000 ft/s (2700 m/s), v2 = 12 000 ft/s (3600 m/s). Sensors spaced at 200 ft (60 m).

We have shown (Kirlin, 1991) that wavefronts not meeting the narrowband and stationarity assumptions lose energy into dimensions of the space other than the ideal rank-one dimension of the plane wave. This energy appears as colored noise, adding magnitude to the near-diagonal elements of the covariance matrix.

The next assumption that must be overcome is that of independent signals $s_k(i)$. If, for example, there are two independent signals, then the covariance matrix

$P_s = E\{s s^H\} = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$  (6.4)

has rank two, and so does

$E\{x x^H\} = N + H P_s H^H$,  (6.5)

where $N = \sigma_n^2 I$ is the noise covariance matrix. However, if $s_1$ and $s_2$ are fully correlated, say $s_2 = \alpha s_1$, then $P_s(1,2) = \alpha^* \sigma_1^2 = P_s^*(2,1)$, $P_s(2,2) = |\alpha|^2 \sigma_1^2$, and $P_s$ has rank 1, not 2. Thus, $H P_s H^H$ will be rank 1. In general, if any two wavefronts have perfect correlation, $H P_s H^H$ will have rank less than the number of wavefronts. With finite time gates on the analysis window, it is almost impossible for two wavefronts, even carrying identical wavelets, to be perfectly correlated (Kirlin, 1991). However, even high correlation, combined with other imperfections, is still detrimental.
Assuming f = 30 Hz, the direct application of multiple signal classification (MUSIC) to the data in Figure 6.1 results in the MUSIC spectrum shown in Figure 6.2. The multiple peaks are due to the broadbandedness of the signals, as if there were many narrowband wavefronts. Another interpretation is that the rank of $H P_s H^H$ is greater than two. In fact there are five eigenvalues of $C_x = E\{x x^H\}$ that are within 10% of the largest, and others drop off gradually in magnitude.

Improvement is effected if we preflatten the data to an approximate velocity $V_a$ = 10 000 ft/s (3000 m/s). Then wavefronts with velocities V will exhibit delays equal to

$\tau_x(V) = \sqrt{T_0^2 + (x/V)^2} - \sqrt{T_0^2 + (x/V_a)^2}$.

The direct application of MUSIC using f = 30 Hz to the preflattened data gives a much improved (but biased) result in Figure 6.3. For two wavefronts with velocities 9500 ft/s (2900 m/s) and 10 500 ft/s (3200 m/s), the spectral result after flattening is shown in Figure 6.4. I point out that there are two large eigenvalues of the covariance matrix of these flattened data, 7.8 and 1.5; the others are smaller than 0.6. Preflattening has clearly modified the covariance matrix of this broadband data to better fit the model of equation (6.1).

Figure 6.2. Direct MUSIC analysis of data in Figure 6.1, using the Ricker spectral peak as frequency. Peaks should be at 9000 ft/s (2700 m/s) and 12 000 ft/s (3600 m/s).

We have found that linearly constrained minimum variance (LCMV or MVDR; see Chapter 5) performs similarly, but takes a little longer than MUSIC, 48 s versus 30 s.

Clearly, direct application of narrowband high-resolution algorithms is not adequate without preflattening. Otherwise we are led to broadband methods or alternatives.

6.3 Frequency Focusing and Spatial Smoothing

A number of methods have been developed to deal with broadband data and multiple wavefronts with strong temporal correlation; consider first the broadband problem.

Figure 6.3. MUSIC analysis of data in Figure 6.1 after preflattening using v = 10 000 ft/s (3000 m/s). Peaks should be at 9000 ft/s (2700 m/s) and 12 000 ft/s (3600 m/s).

Clearly, each Fourier frequency coefficient of a signal represents a narrowband component. This leads to consideration of frequency shifting each component to a reference or central frequency, coherently combining those, and then applying the narrowband algorithm. This procedure, called frequency focusing, yields some success when bandwidths are not too great and when the rms velocities are approximately the same for all wavefronts in the window.

Figure 6.4. MUSIC applied to wavefronts as in Figure 6.1, but with velocities 9500 ft/s (2900 m/s) and 10 500 ft/s (3200 m/s).

To begin this method, the nth Fourier coefficients $F_{ni}$ are found for each (ith) trace; i.e.,

$x_i(t) = \sum_n F_{ni}\, e^{j\omega_n t}$.  (6.6)

Across L traces, a vector of the nth coefficients is written

$y_n = (F_{n1}, F_{n2}, \ldots, F_{nL})^T$.  (6.7)

A covariance estimate of the nth coefficients in traces i and j is given by

$C_n = K^{-1} \sum_{k=1}^{K} y_{nk} y_{nk}^H$,  (6.8)

assuming K samples $y_{nk}$, $k = 1, 2, \ldots, K$, are available. This procedure is to be used when data are temporally stationary. However, in seismic processing, the window length is temporally short, and only one sample, K = 1, of $y_n$ can be obtained if L = M (i.e., if the length of $y_n$ is the same as the number of traces).
In order to get some statistical stability in $C_n$, L might be chosen smaller than M, and, assuming spatial stationarity, a window sliding across traces will give $K = M - L + 1$ independent samples of $y_n$. However, the i,jth covariance element for $y_n$ at shift position $w = w_1$ (the length-L window shifted to the $w_1$th trace) is not the same as the i,jth at shift position $w = w_2$. That is, interphone delays are not spatially stationary. Let $y_n(w)$ denote the nth coefficient vector with reference phase at trace w. In order for the vector outer product $y_n(w_1) y_n^H(w_1)$ to have the same elements as the vector outer product $y_n(w_2) y_n^H(w_2)$, we will need a transformation $U_{w_1 w_2}$ such that

$E\{U_{w_1 w_2}\, y_n(w_2) y_n^H(w_2)\, U_{w_1 w_2}^H\} = E\{y_n(w_1) y_n^H(w_1)\}$.

Basically, this transformation will remove the delay factors on $y_n(w_2)$ and replace them with the approximate delay factors of $y_n(w_1)$. To estimate the delays, we might use the parabolic approximation

$\tau_w = (w\delta)^2\, p$,  (6.9)

where

$p = 1 / (2 T_0 V^2)$.  (6.10)

V is the trial velocity, and $T_0$ is the time center of the analysis window. We might also use the hyperbolic expression

$\tau_w = (T_0^2 + (w\delta/V)^2)^{1/2}$.  (6.11)

Use of p rather than equation (6.11) allows precalculation of $\tau_w$ for all $T_0$; if a wavefront has $p = p_{T_0}$ in a window positioned at $T_0$ s, then the corresponding V is extracted by rearranging equation (6.10).
In either case we need to choose a reference sensor for the analysis region. Suppose we let the first trace in the most-offset window (of length L) be the time-delay reference. Then the index of this trace is $M - L + 1$. The window whose first element is located on trace w would then yield transformation element

$U_{M-L+1,\, w}(i) = \exp[-j 2\pi f_n (\tau_{w+i-1} - \tau_{M-L+i})]$  (6.12)

for the ith diagonal element $U_w(i, i) = U_{M-L+1,\, w}(i)$. The elements of $U_w$ are spatially dependent. If we have not already preflattened the data, we can choose $V_a$ or $p_a$ to be a central approximation of all possible parameters. Then the phase shifts in equation (6.12) are approximately correct for $f_n$, and the sample vectors, transformed to the spatial reference, will give a good coherent estimate of the vector $y_n(M - L + 1)$ at the reference location.
On the other hand, if we have already preflattened to $V_a$ in the time domain, we have already effected this approximate phase shift at all frequencies and do not need to use $U_w$ at all. Note, however, that in consideration of this fact and by observation of the two wavefronts in Figure 6.1, it should be clear that neither flattening nor spatial smoothing with a single approximate V will yield the desired approximation $y_n(w) \approx y_n(M - L + 1)$ when there are two distinct wavefronts present. Spatial smoothing of the sum of any two or more wavefronts with distinct velocities is not effective because of nonspatial stationarity. (Spatial smoothing is still appropriate for any wavefront which has been exactly flattened, however, and subsequently I will do this.) We are left with only frequency focusing (Wang and Kaveh, 1985) to achieve multiple samples of $y_n$.
We would like to use $y_n$ as an estimate of $y_{n_0}$, the vector of Fourier coefficients corresponding to the reference frequency $f = f_0 = n_0/T$, where T is the temporal length of the analysis window. The transformation that will effect this is

$T_n = \mathrm{diag}[\exp(-2\pi j (n - n_0)\, \hat{\tau}_w / T)]$,  (6.13)

where $\hat{\tau}_w$ is the estimated relative delay at the wth sensor. The smoothed covariance matrix is

$\hat{C}_{n_0} = \sum_n T_n\, y_n y_n^H\, T_n^H$,  (6.14)

and the summation indexes over all frequencies of sufficiently high S/N. Weights of each $T_n y_n$ according to S/N are also proposed. The true $\tau_w$ can only be approximated, using an approximate velocity representative of the range to be searched.
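A minimal sketch of the focusing operation of equations (6.13)-(6.14) (the input layout and names are mine; a real implementation would also apply the S/N weighting mentioned above):

```python
# Shift each Fourier-coefficient vector y_n to the reference frequency n0
# using estimated delays, then accumulate a single smoothed covariance.
import numpy as np

def focused_covariance(Y, n_idx, n0, tau_hat, T):
    """Y: (num_freqs, L) array with y_n^T as rows; n_idx: frequency indices;
    tau_hat: (L,) estimated relative delays; T: window length in seconds."""
    L = Y.shape[1]
    C = np.zeros((L, L), dtype=complex)
    for y, n in zip(Y, n_idx):
        Tn = np.exp(-2j * np.pi * (n - n0) * tau_hat / T)   # diagonal of T_n
        z = Tn * y                                          # T_n y_n
        C += np.outer(z, z.conj())                          # T_n y_n y_n^H T_n^H
    return C
```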
Obviously, this method will not work well over both a broad range of frequencies and a broad range of velocities, because the addition of the covariance at each frequency is intended to be coherent.

We have experimented with both spatial and frequency smoothing. Spatial smoothing is clearly not appropriate for single-covariance-matrix analysis of multiple curved wavefronts. Frequency smoothing has been moderately effective, but the short durations of seismic signals make time-domain methods more attractive.

Nevertheless, I want to draw attention to a publication which addresses the broadband problem very well, even though the method is directed to plane-wave analysis. Allam and Moghaddamjoo (1994) have introduced a frequency-domain remapping method, projecting (proportioning) spatial frequencies used at each temporal frequency out to those spatial frequencies at the reference temporal frequency. The projection is based on the linear relationship between the spatial and temporal frequency content of a plane wave. Results are impressive for plane waves, and the methodology might possibly be extended to hyperbolic wavefronts. Each temporal frequency provides a vector of spatial frequency coefficients which can then contribute coherently to a sample covariance matrix.

6.4 Discussion

One potential advantage of eigenstructure analysis is the ability to analyze just one covariance matrix to resolve multiple velocities. Many seismic analysis situations fit this method. The velocities need to be narrow in range so that flattening, time gating, and then either time- or frequency-domain analysis can be applied.

When velocities in an analysis region are not similar, other methods should first be used to remove those wavefronts for which high-resolution analysis is not needed. With semblance and the algorithms I will describe next, preflattening is done for each trial velocity. If a wavefront has been flattened exactly, a number of beamforming and interference-canceling plane-wave detection and parameter estimation algorithms apply to that wavefront. Preflattening also allows spatial smoothing for improved covariance matrix estimation. Other multiple wavefronts in the analysis window will not be planar, but appear as high- (or multi-) dimensional coherent interferences. Some degradation from the ideal is seen.

6.5 Comparison of MUSIC with Semblance

Now we compare the structure of the conventional semblance coefficient to that of the high-resolution estimators. Both approaches utilize the data covariance matrix. Examples will demonstrate applications in high and low S/N.

Figure 6.5 shows that the semblance coefficient may be written

$S_c = \frac{\sum_{j=k-N/2}^{k+N/2} \left[ \sum_{i=1}^{M} x(j, i) \right]^2}{M \sum_{j=k-N/2}^{k+N/2} \sum_{i=1}^{M} x(j, i)^2}$.  (6.15)

However, in terms of the covariance matrix $C_k$ of the time-slice vectors around the kth time sample,

$S_c = 1^T C_k 1\, \big/\, M\, \mathrm{Tr}[C_k]$,  (6.16)

where $1 = (1\; 1\; \ldots\; 1)^T$, and $\mathrm{Tr}[\;]$ indicates the matrix trace.
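A minimal sketch of equation (6.16) (data layout and names are mine):

```python
# Semblance from the time-slice covariance: S_c = 1^T C_k 1 / (M Tr[C_k]).
import numpy as np

def semblance(window):
    """window: (num_time_samples, M) array of flattened traces in the gate."""
    Ck = window.T @ window / window.shape[0]   # M x M covariance estimate
    one = np.ones(Ck.shape[0])
    return (one @ Ck @ one) / (Ck.shape[0] * np.trace(Ck))
```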


If we factor $C_k$ into its eigenstructure $V \Lambda V^H$, where the columns of V are eigenvectors and $\Lambda = \mathrm{diag}(\Lambda_s, \Lambda_n)$ is the diagonal matrix of ordered eigenvalues, then partitioning V ($M \times M$) into signal-subspace eigenvectors $V_s$ ($M \times p$) and orthogonal-subspace eigenvectors $V_n$ ($M \times (M - p)$) results in

$S_c = \frac{1^T (V_s \Lambda_s V_s^H) 1 + 1^T (V_n \Lambda_n V_n^H) 1}{M \sum_{m=1}^{M} \lambda_m}$,  (6.17)

where $\Lambda_s$ is the diagonal matrix of the p signal-subspace eigenvalues, $\Lambda_n = \sigma_n^2 I_{M-p}$, and we have used $\mathrm{Tr}[\Lambda] = \mathrm{Tr}[C_k] = \sum \lambda_m$.

Figure 6.5. Assumed data sampling scheme, centered on trial wavefront curvature (trial rms velocity) intersecting time index j = k and offset index i = 0.

Because semblance ranges between 0 and 1, it is not directly comparable to MUSIC, for example, but the second term in the numerator is exactly the denominator function of MUSIC (Section 5.2.2) when the MUSIC trial delay vector is $e = 1$. If there is only one wavefront in the window, $V_s$ has only one column. When the correct velocity is chosen, $C_k$ gives $V_s = 1/\sqrt{M}$, and $S_c = (M\lambda_1 + 0)/(M \sum \lambda_m)$. This shows that $S_c$ is ideally less than unity by the factor

$1 \Big/ \left[ 1 + \sum_{m=2}^{M} \lambda_m / \lambda_1 \right]$.

It can also be shown that, in white noise, $\lambda_1 = M\sigma_s^2 + \sigma_n^2$ if the signal wavefront is temporally stationary with variance $\sigma_s^2$, so with $\lambda_m = \sigma_n^2$, $m \ge 2$,

$S_c = \left[ 1 + \frac{M - 1}{M\, \mathrm{S/N} + 1} \right]^{-1}$.

For low S/N, this factor approaches 1/M.


To compare semblance with MUSIC, which approaches infinity at high S/N when the correct velocity is selected, we can examine

$S_c' = (1 - S_c)^{-1}$,  (6.18)

which approaches infinity as $S_c$ approaches one. We have used $S_c'$ for the data of Figure 6.6, where S/N = 5 dB and two wavefronts have velocities 9500 ft/s (2900 m/s) and 10 500 ft/s (3200 m/s). The semblance analysis of Figure 6.7 should be compared to the MUSIC analysis of Figure 6.8 for data flattened to the nominal velocity of 10 000 ft/s (3000 m/s). Both spectra have been normalized by their peak values. MUSIC uses f = 30 Hz.

We can see that both methods have resolved the two wavefronts, MUSIC somewhat better than semblance. Computation time using MATLAB on a Sparcstation SLC is 200 s for semblance and 30 s for MUSIC, including the single flattening process. However, semblance is more accurate. Both algorithms used increments of 50 ft/s (15 m/s) in searching velocity, or 201 trials. It is clear that even though MUSIC requires eigenstructure analysis, doing this once is much more efficient than forming 201 covariance matrices, which in effect is required by semblance.
The potential utility of broadband MUSIC is shown in Figure 6.9, by a second application of MUSIC to the data of Figure 6.6, but with the center frequency input as 60 Hz. Note that the bias has essentially disappeared but resolution has decreased. The potential for application of wideband MUSIC with frequency focusing is evident; however, we have not found it to be useful for our test cases.

6.6 Key's Algorithm

If all the information with regard to the wavefronts and noise is in the covariance matrix, perhaps the eigenvalues themselves give adequate information for some purposes. Key (Key and Smithson, 1990) developed an algorithm which windows the data per trial velocities and steps along in time like semblance. The covariance matrix for each trial velocity is analyzed for its eigenvalues.

Figure 6.6. Two wavefronts at 9500 ft/s (2900 m/s) and 10 500 ft/s (3200 m/s), respectively. S/N = 5 dB.

Assuming only one wavefront exists in the analysis window, then, ideally, $\lambda_1 = M\sigma_s^2 + \sigma_n^2$ and $\lambda_m = \sigma_n^2$ for $m \ne 1$. Thus, a coherence measure is

$J_K = \nu (\lambda_1 - \bar{\lambda}) / \bar{\lambda} = \nu M \sigma_s^2 / \sigma_n^2$,  (6.19)

where

$\bar{\lambda} = (\mathrm{Tr}(C) - \lambda_1) / (M - 1)$,  (6.20)

$\nu$ is a proportionality factor, and $\bar{\lambda}$ is the average of the eigenvalues other than the largest. Thus, $J_K$ is proportional to S/N. (Key sets $\nu$ equal to the likelihood ratio for testing $\lambda_2 = \cdots = \lambda_M = \sigma_n^2 < \lambda_1$.)

Figure 6.7. Semblance analysis of the data in Figure 6.6. MATLAB time 200 s; (1 − semblance coefficient)⁻¹ at T₀ = 1.850 s.

For comparison, the semblance coefficient upon exact flattening gives the result

$S_c = \left[ 1 + \frac{M - 1}{M\, \mathrm{S/N} + 1} \right]^{-1}$.  (6.21)

Because both $\hat{\lambda}_1$ and $\bar{\lambda}$ are estimates of the true values, their difference in the numerator of $J_K$ gives rise to more variability than that for $\hat{\lambda}_1$ alone. Further, when both $\hat{\lambda}_1$ and $\bar{\lambda}$ are inappropriately small, the ratio can be inappropriately large. Thus, the bias of $\sigma_n^2$ in the numerator of $S_c$ ($\lambda_1 = M\sigma_s^2 + \sigma_n^2$) is traded for the variance in the numerator of $J_K$ ($\lambda_1 - \bar{\lambda} \approx M\sigma_s^2$). The denominators of the two measures are similar, except that $S_c$ contains signal power as well as noise power to normalize the coherence measure.

Figure 6.8. MUSIC analysis at f = 30 Hz of the data in Figure 6.6, preflattened at v₀ = 10 000 ft/s (3000 m/s). MATLAB time 30 s. Note the bias resulting in a shift of the estimate away from the true value; see Figure 6.9.
Results of Key's algorithm for the data in Figure 6.6 are shown in Figure 6.10. These computations required 724 s.
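A minimal sketch of Key's coherence measure, equations (6.19)-(6.20) (the proportionality factor is left as a parameter here, rather than Key's likelihood-ratio value):

```python
# J_K = nu * (lambda_1 - lbar) / lbar, with lbar the mean of the
# eigenvalues other than the largest.
import numpy as np

def key_measure(window, nu=1.0):
    Ck = window.T @ window / window.shape[0]
    lam = np.linalg.eigvalsh(Ck)               # ascending eigenvalues
    lam1 = lam[-1]
    lbar = (np.trace(Ck) - lam1) / (Ck.shape[0] - 1)
    return nu * (lam1 - lbar) / lbar
```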

6.7 A Subspace Semblance Coefficient

Because semblance is the conventional coherency measure, we have constructed an equivalent based on the estimate of the signal subspace (Kirlin, 1992). This is somewhat of a compromise between semblance, which can never go to zero when noise is present, and Key's algorithm, which uses subspace ideas but is quite unstable and is not normalized.

We derive that the coherence measure

$S_K = 1 - 1^T V_n V_n^H 1 / (M - 1) = (|1^T v_1|^2 - 1) / (M - 1)$,  (6.22)

or its MUSIC-like version

$S_K' = 1 / (1 - S_K)$,  (6.23)

is an estimate of how well the signal subspace has been flattened. If there is
only one signal present v1  1/ M , and with no noise SK  1. However,
with no signal at all, v1 is a randomly oriented vector; the average power of the

99

Downloaded 06/26/14 to 134.153.184.170. Redistribution subject to SEG license or copyright; see Terms of Use at http://library.seg.org/

Figure 6.10.

Keys algorithm for the data of Figure 6.6. T0  1.850 s.


2

vector 1 in the direction of v1 equals 1/Mth of 1 , or 1. Thus the average nosignal value of SK is zero, and Sk may go negative.
A plot of the spectrum of $S_K'$ for the data of Figure 6.6 at t = 1850 ms is shown in Figure 6.11. Comparing Key's algorithm with semblance $S_c$ in Figure 6.7, note that the accuracy of semblance is preserved while the background level is lowered and resolution is apparently enhanced.

This algorithm seems to work quite well. Run time is comparable to semblance, and it is time efficient if a special routine is used that finds only the first eigenvalue. However, solving for all 32 eigenvectors in this problem used 703 s. (Compare to 210 s for semblance and 30 s for preflattened MUSIC.)

The satisfactory outcome of subspace semblance has led to the use of a similar algorithm, which I describe in the following section; a sketch of the subspace semblance computation itself follows below. Subspace semblance calculates coherence assuming only one wavefront is present. The next algorithm optimally and adaptively cancels any wavefronts which may be present before computing a coherence measure.
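A minimal sketch (ours; Python/NumPy, hypothetical names) of the subspace semblance of equations (6.22)-(6.23), needing only the first eigenvector:

```python
import numpy as np

# Hedged sketch of subspace semblance, eqs. (6.22)-(6.23); not the book's code.
def subspace_semblance(X):
    """X: N x M window of flattened traces; returns (S_K, 1/(1 - S_K))."""
    N, M = X.shape
    C = X.conj().T @ X / N                 # sample covariance
    w, V = np.linalg.eigh(C)               # ascending eigenvalues/eigenvectors
    v1 = V[:, -1]                          # eigenvector of the largest eigenvalue
    ones = np.ones(M)
    sk = (np.abs(ones @ v1) ** 2 - 1.0) / (M - 1)   # eq. (6.22)
    return sk, 1.0 / (1.0 - sk)                     # eq. (6.23)
```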

Figure 6.11. Subspace semblance coherency measure of equation (6.22) for the data of Figure 6.6.

6.8

Multiple Sidelobe Canceler

The multiple sidelobe canceler (MSC) is extracted from the beamforming literature. Conventional semblance is like conventional beamforming; one must apply a delay at each phone so as to align the wavefront to make it appear as if it were a broadside plane wave. This allows coherent stacking, or addition, of the M time-shifted traces, thereby reducing random noise power by M. However, when another wavefront is present, its signal would interfere with the ideal stacking process.

For example, in the previous section two wavefronts were present. This causes the first eigenvector to be different from $\mathbf{1}/\sqrt{M}$, no matter what the trial velocity, since the first eigenvector is that linear combination of the two direction (delay) vectors which gives the greatest temporal variance of the sum. With window processing, we are relying somewhat on the time gate to reduce the energy of the nonflattened wave and its correlation with the flattened one.

The multiple sidelobe canceler seeks to subtract from the data any wavefronts which do not have the trial velocity used for flattening. If this is possible, a more accurate stack is effected. The first step is to apply the flattening delays per the trial velocity. If $\mathbf{x}$ is an unflattened data vector from the analysis window, then we let $D\mathbf{x}$ represent the flattened data. The usual stack is $y_m = \mathbf{1}^TD\mathbf{x}/M$. The second step is to remove the data at the flattening velocity. Our approximation of the residual, the auxiliary or interference reference for noise canceling, is

$$\mathbf{x}_a = (I - \mathbf{1}\mathbf{1}^T/M)D\mathbf{x}.$$   (6.24)

However, over any finite time, for any sources not in nulls of the $(I - \mathbf{1}\mathbf{1}^T/M)D$ beam (that is, with any finite-length array), $y_m$ and $\mathbf{x}_a$ are correlated. Therefore, we use the minimum-mean-squared-error criterion to subtract the optimal linear combination of the elements of $\mathbf{x}_a$ from $y_m$. That is, we find $\mathbf{w}_a$ such that

$$E\{|y_m - \mathbf{w}_a^T\mathbf{x}_a|^2\}$$

is minimized. The solution for $\mathbf{w}_a$ is

$$\mathbf{w}_a = R_a^{\#}\mathbf{r}_{ma},$$   (6.25)

where $\mathbf{r}_{ma} = E\{y_m\mathbf{x}_a^*\}$ and $R_a^{\#}$ is the pseudo-inverse of $R_a = E\{\mathbf{x}_a\mathbf{x}_a^H\}$.


We must use pseudo-inverse because Ra has rank at most M  1, since we
have removed data from the dimension in the direction of 1. If Ra  UUH
represents the eigenstructure form, then the diagonal matrix L has at least one
zero. Associating U1 and 1 with the p nonzero eigenvalues gives
#

1

Ra  U1 1 U1 ,
where 1 is p  p and U1 is M  p.
The interference-canceled main beam or waveform estimator is

102

(6.26)

$$y_m' = y_m - (R_a^{\#}\mathbf{r}_{ma})^T\mathbf{x}_a = [\mathbf{1}^T/M - (R_a^{\#}\mathbf{r}_{ma})^T(I - \mathbf{1}\mathbf{1}^T/M)]D\mathbf{x}.$$   (6.27)

The expectations for $R_a$ and $\mathbf{r}_{ma}$ must be estimated with the data in the analysis window. Let the window be $N \times M$. Writing $\mathbf{x}_n$ for the $n$th time slice of the flattened data, the $n$th $y_m$ is

$$y_n = \mathbf{1}^T\mathbf{x}_n/M,$$   (6.28)

the sample $\mathbf{r}_{ma}$ is

$$\hat{\mathbf{r}}_{ma} = N^{-1}\sum_{n=1}^{N}(I - \mathbf{1}\mathbf{1}^T/M)\,\mathbf{x}_ny_n,$$   (6.29)

and the sample $R_a$ is

$$\hat{R}_a = N^{-1}(I - \mathbf{1}\mathbf{1}^T/M)\sum_{n=1}^{N}\mathbf{x}_n\mathbf{x}_n^H\,(I - \mathbf{1}\mathbf{1}^T/M).$$   (6.30)

Expression (6.27) is an estimate of the interference-free flattened waveform. A comparison to semblance is possible using the expression

$$S_{MSC} = \frac{\displaystyle\sum_{n=1}^{N}(y_n')^2}{\displaystyle\sum_{n=1}^{N}\left\|\mathbf{x}_n - \mathrm{diag}(\hat{R}_a^{\#}\hat{\mathbf{r}}_{ma})\,\mathbf{x}_{an}\right\|^2},$$   (6.31)

where $y_n'$ is $y_m'$ at time $n$, and $\mathbf{x}_{an}$ is $\mathbf{x}_a$ at time $n$. A comparison to MUSIC is effected by plotting $(1 - S_{MSC})^{-1}$.
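The sample-estimate steps of equations (6.28)-(6.30) translate directly into code; the following sketch (ours, Python/NumPy, hypothetical names, real-valued data assumed) forms the canceled stack of equation (6.27) from one flattened window:

```python
import numpy as np

# Hedged sketch of the multiple sidelobe canceler, eqs. (6.24)-(6.30);
# X holds flattened traces, so the flattening operator D is already applied.
def msc_stack(X):
    """X: N x M flattened window of real traces; returns the canceled stack."""
    N, M = X.shape
    P = np.eye(M) - np.ones((M, M)) / M    # I - 11^T/M
    y = X.mean(axis=1)                     # y_n = 1^T x_n / M, eq. (6.28)
    Xa = X @ P                             # auxiliary data x_a per time slice
    r_ma = Xa.T @ y / N                    # sample r_ma, eq. (6.29)
    Ra = Xa.T @ Xa / N                     # sample R_a, eq. (6.30)
    wa = np.linalg.pinv(Ra) @ r_ma         # eq. (6.25); pinv handles rank M-1
    return y - Xa @ wa                     # canceled stack, eq. (6.27)
```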
An example of the MSC is given in Figure 6.12, an application to the data of Figure 6.6. Good resolution has been achieved, although for this coherence measure, equation (6.30), I utilized spatial smoothing, averaging 13 diagonal submatrices of size 20 from the otherwise 32 x 32 covariance matrix. The time of computation was 2090 s. The MATLAB pseudo-inverse utilizes singular-value decomposition. A low-rank pseudo-inverse might be used for $R_a^{\#}$ in equation (6.26); this can reduce the variability in $y_m'$ but may add bias.

Figure 6.12. $(1 - S_{MSC})^{-1}$ for the data of Figure 6.6. Spatial smoothing was used to enhance a 20 x 20 covariance matrix. Time gate is 100 ms.

For this example, as for the other algorithms, a single window length of approximately 100 ms around 1850 ms has been applied. Shorter windows can also be used; these may result in higher-S/N estimates of coherence at the times when wavefront peaks are present, but more variable estimates are expected at other times. A 20-ms window has been used for the result in Figure 6.13.

Figure 6.13. Same data and coherence algorithm as in Figure 6.12, but a 20-ms time gate is used; $1/(1 - S_{MSC})$ for a 20-ms processing window. Note the greater variability. Velocity axis in ft/s.

6.9

Summary of Coherence Detection and Velocity Estimation

The planar narrowband wave model, so prevalent in the sonar and radar literature, has been shown to have application, but also limitations, in the estimation of seismic wavefronts' rms velocities. In particular, the MUSIC algorithm has been shown to be quite fast and to yield high resolution for two wavefronts with close velocities; however, it is necessary that the data be preflattened to the approximate rms velocity, and estimates are likely to be biased. Iterated flattening, as is practiced with conventional semblance, is recommended in all coherence detection and estimation methods. Preflattening can be said variously to make the wavefronts planar and narrowband for array-processing purposes, to reduce rank in the covariance matrix, or to increase the stability of estimators.
MUSIC with preflattening, MVDR with preflattening, low-rank semblance, or the multiple sidelobe canceler all achieve good results, improving on conventional semblance in one aspect or another. Certainly there are other variations that we have not discussed, although we have suggested, for example, that a low-rank pseudo-inverse might be used in the MSC.

Our MATLAB computation times for the algorithms of this chapter are shown in Table 6.1. Note that substantial time costs could be greatly reduced in several cases with more efficient programming, sometimes as simply as solving for only the first eigenvector and eigenvalue rather than all (32 in these examples).
Table 6.1. Comparative computation time

    Algorithm             Seconds
    MUSIC                 30
    MVDR                  48
    Semblance             210
    Key's                 724+
    Subspace semblance    703*
    MSC                   2090

    +Key's could be made much more efficient by solving for only the first eigenvalue and eigenvector.
    *Subspace semblance utilizes an inefficient pseudo-inverse.

Another variation of interest was suggested by Haimovich and Bar-Ness (1991). In their interference-nulling scheme, the eigenstructure of the covariance matrix $C_a$ for $\mathbf{x}_a = (I - \mathbf{1}\mathbf{1}^T/M)\mathbf{x}$ is partitioned into signal ($V_1$) and orthogonal ($V_2$) subspaces, the signal actually being the interference. Interference-correlated components of $\mathbf{x}$ are totally removed, giving

$$\mathbf{x}_s = (I - V_1V_1^H)\mathbf{x}$$

as an approximation to the interference-free signal-wavefront estimator. Of course, $\mathbf{x}_s$ can then be stacked. Haimovich and Bar-Ness (1991) offer an efficient transformation from $\mathbf{x}$ to $\mathbf{x}_s$. Their work is recommended for further insights. The algorithm is optimum in the sense of interference nulling, but it is not minimum mean-squared error (mmse) in estimation of the signal, nor is it maximum S/N (they discuss these as well), because $V_1V_1^H\mathbf{x}$ also includes true signal components.
Again, comparison of algorithms must be done carefully. Many of the subspace-based algorithms commonly display reciprocals of functions; these reciprocals can approach infinity at the solution velocity. Others give estimates of the signal power (MSC, MVDR, or LCMV). Yet others, like semblance, are true coherence measures that can attain unity at their maximum. Our displays are of functions that have the potential to reach infinity with an accurate estimate and noise-free data. In order to compare these, we have converted semblance coefficients to allow them to potentially approach infinity, via $(1 - S_c)^{-1}$, and have additionally normalized each spectrum by its peak value.

6.10 References

Allam, M., and Moghaddamjoo, A., 1994, Two-dimensional DFT projection for wideband direction-of-arrival estimation: IEEE Signal Processing Letters, 1, 35-37.
Haimovich, A. M., and Bar-Ness, Y., 1991, An eigenanalysis interference canceller: IEEE Trans. Signal Processing, 39, 76-84.
Key, S. C., and Smithson, S. D., 1990, New approach to seismic-reflection event detection and velocity determination: Geophysics, 55, 1057-1069.
Kirlin, R. L., 1991, A note on the effects of narrowband and stationary signal model assumptions on the covariance matrix of sensor array data vectors: IEEE Trans. Signal Processing, 39, 503-505.
Kirlin, R. L., 1992, The relationship between semblance and eigenstructure velocity estimators: Geophysics, 57, 1027-1033.
Wang, H., and Kaveh, M., 1985, Coherent subspace processing for the detection and estimation of angles of arrival of multiple wideband sources: IEEE Trans. Acoust., Speech and Sig. Proc., 33, 823-831.


Chapter 7
Subspace-Based Seismic Velocity Analysis
Fu Li and Hui Liu
In this chapter, we present a new approach to simultaneously estimate the
stacking velocity and zero-offset time of seismic wave propagation. This
approach includes the following steps: preprocessing to extract structure,
application of several subspace methods (ESPRIT, MUSIC, and Minimum
Norm) to estimate the time delay at each sensor, and postprocessing to estimate the stacking velocity and zero-offset time. The advantages of this proposed approach are high resolution and less computation.
Wave-propagation velocity often reflects the properties of the media through which a wave propagates. Therefore, estimating the stacking velocity of a seismic wavefront is an important signal-processing task in exploration seismology. However, because of the special hyperbolic trajectory that often occurs with seismic wave propagation, the stacking velocity must be estimated together with the two-way normal-incidence time (zero-offset time).

Conventionally, people estimate the stacking velocity and zero-offset time by varying the seismic data window to seek the maxima of some coherency measure function, for instance, the semblance coefficient (Neidell and Taner, 1971) or Key's method (Key and Smithson, 1990). However, this approach must either estimate velocity and zero-offset time iteratively or plot a two-dimensional semblance spectrum over a range of the velocity variable and the zero-offset time variable. The computational expense associated with varying the data window is high.
Goldstein and Archuleta (1987) first applied the MUltiple SIgnal Classification (MUSIC) algorithm (Schmidt, 1979) and the spatial-smoothing technique to estimate the directions of seismic arrivals, and later applied MUSIC, spatial smoothing, and seismogram alignment to estimate several seismic
parameters (Goldstein and Archuleta, 1991). Their work is based on the assumption that the signals are narrow band, such that the time delay at each sensor can be approximated by a phase delay. In fact, seismic signals are generally wide band (Key and Smithson, 1990).

Kirlin (1992) introduced a formulation of the semblance coefficient in terms of the data covariance eigenstructure, and made comparisons to several eigenstructure formulations, such as the MUSIC and Minimum Norm (MN) (Kumaresan and Tufts, 1983) algorithms. The signal subspace is defined by a direction vector in which each entry has a different delay corresponding to the wave-propagation delay at each sensor (trace). He also used wide-band signals in his simulations. His method has a significant advantage, due to the computational efficiency of eigenstructure algorithms, when the data window is fixed while a spectral function of the varying explicit velocity parameters is minimized. However, if data vectors are taken down trace, the data covariance matrix generally does not have the low-rank structure, and the eigenvector associated with the largest eigenvalue is not identical to the direction vector, even assuming only one wavefront exists.
We propose a new approach to simultaneously estimate the stacking velocity and zero-offset time of seismic wave propagation, based on a different subspace structure. This approach includes four major steps:

1. preprocessing the seismic data, including transforming the data into the frequency domain and unweighting the signal response;
2. forming subspaces [on a different dimension from Kirlin (1992)];
3. estimating delays using subspace-based algorithms such as ESPRIT (Paulraj et al., 1985), MUSIC, and MN; and
4. estimating the stacking velocity and zero-offset time simultaneously from the delay estimates.

Computer simulations indicate promising performance. In addition to the high resolution of subspace processing, this new approach also requires far less computation, since no eigenvalue decomposition or singular-value decomposition is used.

7.1

Problem Formulation

We assume a uniform line array of M + 1 sensors (geophones for land acquisition or hydrophones for marine acquisition) placed at offsets $x_i$ ($i = 0, 1, \ldots, M$), as in Figure 7.1. The seismic wavefront propagation in a flat-layer-cake medium loosely approximates a hyperbolic trajectory (Neidell and Taner, 1971):

Figure 7.1. Hyperbolic window to increase S/N.

$$t^2 = T_0^2 + \frac{x^2}{v^2},$$   (7.1)

where t is the seismic arrival time at offset x, v is the stacking velocity of seismic propagation, and $T_0$ is the two-way normal-incidence time at zero offset. The signals received at time t are

$$\mathbf{y}(t) = (y_0(t), \ldots, y_M(t)).$$   (7.2)

The signal received at the $i$th sensor is a delayed version of the signal at the $(i-1)$th sensor: $y_i(t) = y_{i-1}(t - \Delta\tau_i)$. If the delays at all the sensors are chosen with respect to a common reference, then equation (7.2) can be rewritten as

$$\mathbf{y}(t) = (y(t - \tau_0), \ldots, y(t - \tau_M)).$$   (7.3)

Further, if the signals in the analysis window (see Figure 7.2) are sampled at $t_j$ for $j = 0, 1, \ldots, K$, we can form a data matrix (or analysis window, as it is known in other literature) of dimension $(K+1) \times (M+1)$:

$$Y(\tau) = \begin{bmatrix} y(t_0 - \tau_0) & \cdots & y(t_0 - \tau_M) \\ \vdots & & \vdots \\ y(t_K - \tau_0) & \cdots & y(t_K - \tau_M) \end{bmatrix},$$   (7.4)

Figure 7.2. Simplified model for seismic propagation.

where $Y(\tau)$ is the time-domain data matrix. When $y(t)$ is a narrow-band signal (at center frequency $\omega_c$), a time delay can generally be approximated by a phase delay, as implied in Goldstein and Archuleta (1991) and other DOA-estimation literature:

$$y(t - \tau) = y(t)e^{-j\omega_c\tau}.$$   (7.5)

Then a direction vector can be formed as

$$\mathbf{a} = (e^{-j\omega_c\tau_0}, \ldots, e^{-j\omega_c\tau_M})^T,$$   (7.6)

which is used to define the span of the signal subspace. Also notice that the narrow-band data matrix has low rank, as in most of the DOA-estimation literature. But when $y(t)$ is a wide-band signal, $y(t - \tau)$ cannot be expressed as $y(t)e^{-j\omega\tau}$, so applying a subspace approach directly to time-domain data does not utilize the subspace structure appropriately. Therefore, we perform subspace processing in the frequency domain, because time delays in the time domain correspond to phase delays in the frequency domain.

7.2

Subspace Approach

Taking the discrete Fourier transform (DFT) of all the columns in $Y(\tau)$ of equation (7.4), we obtain

$$Y(\omega) = \begin{bmatrix} y(\omega_0)e^{-j\omega_0\tau_0} & \cdots & y(\omega_0)e^{-j\omega_0\tau_M} \\ \vdots & & \vdots \\ y(\omega_K)e^{-j\omega_K\tau_0} & \cdots & y(\omega_K)e^{-j\omega_K\tau_M} \end{bmatrix},$$   (7.7)

where $\omega_k = \frac{2\pi k}{K+1}$ (for $k = 0, 1, \ldots, K$) and $y(\omega_k) = |y(\omega_k)|e^{j\phi_k}$. $Y(\omega)$ is the frequency-domain data matrix, and $\omega$ is the vector of discrete frequencies. An element $y(\omega_k)e^{-j\omega_k\tau_m}$ is the DFT at $\omega_k$ for the $m$th trace.

7.2.1

Structure Extraction

We choose the first sensor, which is at the source location (zero offset), as the reference sensor. Then the time delay of the seismic wavefront at this first sensor with respect to itself as the reference is zero, i.e., $\tau_0 = 0$. The time delay of the seismic wavefront at this sensor with respect to the source is therefore $T_0$, which will be estimated later. All the $\tau_i$'s are now with respect to the phase of the first sensor, for $i = 1, \ldots, M$. With this convention, equation (7.7) can be rewritten as

$$Y(\omega) = \begin{bmatrix} y(\omega_0) & y(\omega_0)e^{-j\omega_0\tau_1} & \cdots & y(\omega_0)e^{-j\omega_0\tau_M} \\ \vdots & \vdots & & \vdots \\ y(\omega_K) & y(\omega_K)e^{-j\omega_K\tau_1} & \cdots & y(\omega_K)e^{-j\omega_K\tau_M} \end{bmatrix}.$$   (7.8)
Now define $T(\omega)$ as

$$T(\omega) = \mathrm{diag}\!\left(\frac{e^{-j\phi_0}}{|y(\omega_0)|}, \ldots, \frac{e^{-j\phi_K}}{|y(\omega_K)|}\right)$$   (7.9)

and multiply $T(\omega)$ by $Y(\omega)$ to obtain

$$T(\omega)Y(\omega) = \begin{bmatrix} 1 & e^{-j\omega_0\tau_1} & \cdots & e^{-j\omega_0\tau_M} \\ \vdots & \vdots & & \vdots \\ 1 & e^{-j\omega_K\tau_1} & \cdots & e^{-j\omega_K\tau_M} \end{bmatrix}.$$   (7.10)

The first column in $T(\omega)Y(\omega)$ does not contain any useful information, so we define a matrix $\bar{A}$ which has all the columns of $T(\omega)Y(\omega)$ except the first:

$$\bar{A} = \begin{bmatrix} e^{-j\omega_0\tau_1} & \cdots & e^{-j\omega_0\tau_M} \\ \vdots & & \vdots \\ e^{-j\omega_K\tau_1} & \cdots & e^{-j\omega_K\tau_M} \end{bmatrix}.$$   (7.11)

Generally, most of the signal energy in the frequency domain is distributed within a certain bandwidth and possesses conjugate symmetry with respect to zero frequency. Assume the range of this energy band is $(P, Q)$, with $0 \le P < Q \le K$, and denote

$$\omega_d = \frac{2\pi}{K+1}.$$

Then we can write $A$, a reduced version of $\bar{A}$, as

$$A = \begin{bmatrix} e^{-jP\omega_d\tau_1} & \cdots & e^{-jP\omega_d\tau_M} \\ e^{-j(P+1)\omega_d\tau_1} & \cdots & e^{-j(P+1)\omega_d\tau_M} \\ \vdots & & \vdots \\ e^{-jQ\omega_d\tau_1} & \cdots & e^{-jQ\omega_d\tau_M} \end{bmatrix}.$$   (7.12)

It is obvious now that $A$ has a Vandermonde structure, which provides great advantages in applying structural techniques. With this Vandermonde matrix $A$, we can now estimate the desired parameters.

7.2.2

Estimation of Time Delays of a Seismic Wavefront

In almost all subspace-based processing algorithms for narrow-band signals, the signal and orthogonal subspaces are determined by either a singular-value decomposition of the data matrix or an eigenvalue decomposition of the data covariance matrix. This is because the data matrix, and thus the covariance matrix, are rank deficient in the noise-free case. However, in the seismic data representation, the data matrix, and thus the covariance matrix, if used, are of full rank because of the wide bandwidth. That is, the $(Q-P+1) \times M$ (assuming $Q-P+1 \ge M$) matrix $A(\tau)$ has rank $M$. The advantage of this fact is that we do not have to perform those computationally expensive decompositions. Instead, we determine the subspaces by

$$\Pi_s = A(A^HA)^{-1}A^H \quad\text{and}\quad \Pi_o = I - \Pi_s,$$   (7.13)

where $\Pi_s$ and $\Pi_o$, both with dimension $(Q-P+1) \times (Q-P+1)$, are projection matrices onto the signal and orthogonal subspaces, respectively; $\Pi_s$ has rank $M$, and $\Pi_o$ has rank $Q-P-M+1$. The important properties associated with the signal and orthogonal subspaces are

$$\Pi_sA = A \quad\text{and}\quad \Pi_oA = 0.$$   (7.14)
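A minimal sketch (ours; Python/NumPy, hypothetical names) of the structure extraction of equations (7.7)-(7.12) and the projections of equation (7.13):

```python
import numpy as np

# Hedged sketch of structure extraction and eq. (7.13); not the authors' code.
# Assumes the reference-trace spectrum is nonzero over the retained band.
def signal_orthogonal_projections(Y, P, Q):
    """Y: (K+1) x (M+1) time-domain window; column 0 is the reference trace.
    Returns Pi_s and Pi_o over the retained DFT bins P..Q."""
    Yw = np.fft.fft(Y, axis=0)                  # column-wise DFT, eq. (7.7)
    Yw = Yw / Yw[:, [0]]                        # T(w)Y(w): unweight by trace 0, eq. (7.9)
    A = Yw[P:Q + 1, 1:]                         # drop all-ones column, keep band, eq. (7.12)
    Pi_s = A @ np.linalg.solve(A.conj().T @ A, A.conj().T)   # eq. (7.13)
    return Pi_s, np.eye(A.shape[0]) - Pi_s
```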

With $\Pi_s$ and $\Pi_o$ obtained in equation (7.13), we can perform all of the high-resolution subspace algorithms:

7.2.2.1 MUSIC

We form a delay-spectrum function

$$P(\tau) = \mathbf{a}(\tau)^H\Pi_o\mathbf{a}(\tau),$$   (7.15)

where $\mathbf{a}(\tau)^H = (e^{jP\omega_d\tau}, \ldots, e^{jQ\omega_d\tau})$ is a steering vector. Then we search over $\tau$ to find

$$P(\tau_i) = 0 \quad\text{for}\quad i = 1, 2, \ldots, M$$   (7.16)

in the noise-free case. In the presence of noise, we will get $Q-P+1$ minima instead of $M$ zeros; we choose the $\tau$'s corresponding to the $M$ smallest minima. Root-MUSIC can also be implemented by forming a delay-spectrum polynomial

$$P(z) = \mathbf{a}(z)^H\Pi_o\mathbf{a}(z),$$   (7.17)

where $z = e^{-j\omega_d\tau}$. Factoring the polynomial $P(z)$ into

$$P(z) = C\prod_{i=1}^{Q-P}(1 - r_iz^{-1})(1 - r_i^*z),$$   (7.18)

we can get $2(Q-P)$ roots. We choose the $M$ double roots on the unit circle (for the noise-free case), or the $M$ roots closest to but inside the unit circle (for the noisy case), as the signal roots. We then calculate the $\tau$'s from these signal roots.
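For instance, the delay search of equations (7.15)-(7.16) can be sketched as follows (our illustration; `wd` is $\omega_d$, `tau_grid` is a NumPy array of trial delays, and the grid resolution is arbitrary):

```python
import numpy as np

# Hedged sketch of the MUSIC delay search, eq. (7.15); a crude grid search.
def music_delays(Pi_o, P, Q, wd, tau_grid, M):
    k = np.arange(P, Q + 1)
    spec = np.empty(len(tau_grid))
    for i, t in enumerate(tau_grid):
        a = np.exp(-1j * k * wd * t)            # steering vector a(tau)
        spec[i] = (a.conj() @ Pi_o @ a).real    # null spectrum P(tau), eq. (7.15)
    picks = np.argsort(spec)[:M]                # crude: the M smallest grid values
    return np.sort(tau_grid[picks]), spec
```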
7.2.2.2 Minimum-Norm

The linear prediction-error vector is defined as

$$\mathbf{d} = \frac{\Pi_o\mathbf{e}_1}{\|\Pi_o\mathbf{e}_1\|^2},$$   (7.19)

where $\mathbf{e}_1^T = (1, 0, \ldots, 0)$. Then we form a polynomial and factor it into

$$D(z) = \mathbf{a}(z)^H\mathbf{d} = \prod_{i=1}^{Q-P}(1 - r_iz^{-1}).$$   (7.20)

The $M$ roots on (for the noise-free case) or closest to (for the noisy case) the unit circle are chosen as signal roots. We can also apply our searching algorithm (Li and Vaccaro, 1989); that is, we search over $\tau$ for the $M$ zeros or smallest minima of $D(\tau) = \mathbf{a}(\tau)^H\mathbf{d}$.

7.2.2.3 ESPRIT

Shift-invariance exists in the rows of $A$, as shown in the following expression:

$$\begin{bmatrix} e^{-j(P+1)\omega_d\tau_1} & \cdots & e^{-j(P+1)\omega_d\tau_M} \\ \vdots & & \vdots \\ e^{-jQ\omega_d\tau_1} & \cdots & e^{-jQ\omega_d\tau_M} \end{bmatrix} = \begin{bmatrix} e^{-jP\omega_d\tau_1} & \cdots & e^{-jP\omega_d\tau_M} \\ \vdots & & \vdots \\ e^{-j(Q-1)\omega_d\tau_1} & \cdots & e^{-j(Q-1)\omega_d\tau_M} \end{bmatrix}\begin{bmatrix} e^{-j\omega_d\tau_1} & & 0 \\ & \ddots & \\ 0 & & e^{-j\omega_d\tau_M} \end{bmatrix}.$$   (7.21)

In equation (7.21), the matrix on the left side is the matrix $A$ (or $\Pi_sA$) excluding the first row, so we denote it as $\bar{A}$; the first matrix on the right side is the matrix $A$ (or $\Pi_sA$) excluding the last row, so we denote it as $\underline{A}$. If we further denote

$$\Phi = \mathrm{diag}(e^{-j\omega_d\tau_1}, \ldots, e^{-j\omega_d\tau_M}),$$   (7.22)

then we have

$$\bar{A} = \underline{A}\,\Phi,$$   (7.23)

so that

$$\Phi = \underline{A}^{\dagger}\bar{A},$$   (7.24)

where the superscript $\dagger$ stands for the left pseudo-inverse of a matrix $[A^{\dagger} = (A^HA)^{-1}A^H]$. Generally, the matrix $\Phi$ calculated from equation (7.24) may not be diagonal; we need to calculate the eigenvalues of $\Phi$ to get the $\phi_i$'s. Note that since $A$ is available from the data (unlike in DOA estimation, where the array manifold $A$ is only a function of undetermined angles), estimation over the data $A$ or the signal-subspace projection $\Pi_sA$ makes no difference.

The estimation of the time delay $\tau_i$ can also be implemented on each column vector of $A$, i.e., on each trace in the frequency domain. Let $\mathbf{a}_i$ be the $i$th column of $A$; then $\phi_i$ can be obtained from

$$\phi_i = \underline{\mathbf{a}}_i^{\dagger}\bar{\mathbf{a}}_i,$$   (7.25)

where $\underline{\mathbf{a}}_i$ and $\bar{\mathbf{a}}_i$ are $\mathbf{a}_i$ excluding its last and first elements, respectively, and $\phi_i = e^{-j\omega_d\tau_i}$, for $i = 1, \ldots, M$. Our simulations show that this vector approach provides the same estimates as the matrix approach.
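A minimal sketch (ours) of the vector-wise estimate of equation (7.25); delays are recovered modulo $2\pi/\omega_d$:

```python
import numpy as np

# Hedged sketch of vector-wise ESPRIT, eq. (7.25); not the authors' code.
def esprit_delays(A, wd):
    """A: (Q-P+1) x M matrix of eq. (7.12) built from data; wd = omega_d."""
    taus = np.empty(A.shape[1])
    for i in range(A.shape[1]):
        a_lo, a_hi = A[:-1, i], A[1:, i]       # a_i without last / first element
        phi = (a_lo.conj() @ a_hi) / (a_lo.conj() @ a_lo)  # left pseudo-inverse of a vector
        taus[i] = -np.angle(phi) / wd           # phi_i = exp(-j wd tau_i)
    return taus
```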

7.2.3

Estimation of Velocity and Zero-Offset Time

The hyperbolic trajectory in seismic wave propagation is given by equation (7.1). At the sensor located at offset $x_i$, the arrival time of the seismic signal is $t_i$. Setting $t_i = T_o + \tau_i$, equation (7.1) can be rewritten as

$$x_i^2\,\frac{1}{v^2} - 2\tau_iT_o = \tau_i^2.$$   (7.26)

For all the offsets $x_i$ and delay estimates $\hat{\tau}_i$, we have

$$\begin{bmatrix} x_1^2 & -2\hat{\tau}_1 \\ \vdots & \vdots \\ x_M^2 & -2\hat{\tau}_M \end{bmatrix}\begin{bmatrix} 1/v^2 \\ T_o \end{bmatrix} = \begin{bmatrix} \hat{\tau}_1^2 \\ \vdots \\ \hat{\tau}_M^2 \end{bmatrix},$$   (7.27)

such that we can obtain

$$\begin{bmatrix} 1/v^2 \\ T_o \end{bmatrix} = \begin{bmatrix} x_1^2 & -2\hat{\tau}_1 \\ \vdots & \vdots \\ x_M^2 & -2\hat{\tau}_M \end{bmatrix}^{\dagger}\begin{bmatrix} \hat{\tau}_1^2 \\ \vdots \\ \hat{\tau}_M^2 \end{bmatrix}.$$   (7.28)

Thus we obtain the estimates of the stacking velocity $v$ and zero-offset time $T_0$ simultaneously. Notice that because equation (7.28) is the solution to an overdetermined estimation problem, it is equivalent to identifying a hyperbola with the parameters $v$ and $T_0$ that best fits all the estimated arrival delays $\hat{\tau}_i$.
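The least-squares fit of equation (7.28) is a two-parameter linear problem; a sketch (ours, with hypothetical names):

```python
import numpy as np

# Hedged sketch of eqs. (7.27)-(7.28): least-squares hyperbola fit.
def fit_velocity_t0(x, tau):
    """x: offsets x_1..x_M; tau: delay estimates tau_1..tau_M (1-D arrays)."""
    G = np.column_stack([x**2, -2.0 * tau])                    # matrix of eq. (7.27)
    (inv_v2, T0), *_ = np.linalg.lstsq(G, tau**2, rcond=None)  # left pseudo-inverse, eq. (7.28)
    return 1.0 / np.sqrt(inv_v2), T0                           # stacking velocity v and T_o
```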

7.3

Improved Subspace Approach

Because seismic signals are transient, they occur in only a small part of the total data; the rest of the data contain only noise. To improve the performance of subspace processing for velocity estimation, we should use only the part of the data that contains the signals. This is the reason for using a data-analysis window. However, because the seismic reflection has a hyperbolic trajectory, applying a rectangular data-analysis window directly to seismic data will still include data that contain only noise. In order to achieve the highest possible S/N, we apply a hyperbolic window to select the signals. This hyperbolic-window concept is similar to seismogram alignment.

A scheme that improves the proposed subspace-processing approach is the following (a sketch of the windowing step follows the list):

1. Estimate the seismic arrival delays $\hat{\tau}_i$ using the various subspace-processing algorithms described in Section 7.2. Obtain the velocity and zero-offset estimates $\hat{v}$ and $\hat{T}_0$ using equation (7.28).
2. Construct a set of delays $\check{\tau}_i$ from equation (7.1) using the estimated parameters $\hat{v}$ and $\hat{T}_o$. (The hyperbolic trajectory passes through all the $\check{\tau}_i$, while the trajectory is only a best fit for all the $\hat{\tau}_i$.)
3. Apply a data-analysis window (ones inside the window and zeros outside) with time range $(\check{\tau}_i - \Delta, \check{\tau}_i + \Delta)$ to the seismic data (see Figure 7.1).
4. Repeat step 1 for more accurate estimates from the windowed data.
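A hedged sketch (ours) of the hyperbolic windowing of steps 2-3; `dt` and `half_width` (our $\Delta$) are assumed parameters:

```python
import numpy as np

# Hedged sketch of the hyperbolic 0/1 window of steps 2-3 (our names).
def hyperbolic_window(x, v_hat, T0_hat, n_samples, dt, half_width):
    """Return an (n_samples x len(x)) 0/1 mask centered on the fitted hyperbola."""
    tau = np.sqrt(T0_hat**2 + (x / v_hat)**2) - T0_hat   # delays from eq. (7.1)
    t = np.arange(n_samples) * dt                        # trace time axis
    mask = np.zeros((n_samples, len(x)))
    for i, ti in enumerate(tau):
        mask[np.abs(t - ti) <= half_width, i] = 1.0      # ones inside the window
    return mask
```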

7.4

Performance Analysis

In practice, ideal conditions never prevail. Among the many sources of error are observation noise, sensor-error effects, and unknown correlation of these two types of noise or error. These are the major factors which perturb the subspaces, thus degrading the performance of subspace-processing algorithms.

Observation noise is always present, resulting in perturbation of the estimated subspaces when only a finite number of measurements are available or when the noise field is spatially coherent with an unknown structure. We assume that the observation noises at all sensors are uncorrelated, with equal variances (white noise), and uncorrelated with the signals. If the observation noise is not white but has a known correlation structure, then we assume that the data have been prewhitened prior to processing. If the noise has an unknown correlation structure, then the DOA estimates will be perturbed in a way that can be calculated using the techniques given in this chapter, as has been shown in Li and Vaccaro (1991a). Here, we assume that the covariance matrix of the (possibly prewhitened) noise can be written as $\sigma_n^2I$.

In the following subsections, we analyze all the subspace-based parameter-estimation algorithms in a unified fashion. The basis for the performance analysis of subspace methods is discussed in Li (1990), Li et al. (1990), Li and Vaccaro (1991c), and Li et al. (1993).

7.4.1

Perturbation of the Signal and Orthogonal Subspaces

In the noisy case, the $A$ matrix will be perturbed, so the parameters estimated will also be perturbed. The noise-perturbed $A$ matrix can be expressed as

$$\tilde{A} = A + N,$$   (7.29)

where $N$ is the noise matrix.

We now derive the first-order approximation of the signal and orthogonal subspaces defined in Section 7.2.2:

$$\tilde{\Pi}_s = \tilde{A}(\tilde{A}^H\tilde{A})^{-1}\tilde{A}^H.$$

Substituting equation (7.29) for $\tilde{A}$, we get

$$\tilde{\Pi}_s = (A+N)[(A+N)^H(A+N)]^{-1}(A+N)^H = (A+N)[A^HA + A^HN + N^HA + N^HN]^{-1}(A+N)^H.$$

Neglecting the second-order terms and using the matrix-inversion lemma, we get

$$\begin{aligned}\tilde{\Pi}_s &= (A+N)[A^HA(I + (A^HA)^{-1}(A^HN + N^HA))]^{-1}(A+N)^H\\ &\approx (A+N)(I - (A^HA)^{-1}(A^HN + N^HA))(A^HA)^{-1}(A+N)^H\\ &\approx A(A^HA)^{-1}A^H + A(A^HA)^{-1}N^H + N(A^HA)^{-1}A^H\\ &\quad - A(A^HA)^{-1}(A^HN + N^HA)(A^HA)^{-1}A^H.\end{aligned}$$   (7.30)

Using the notation for subspace projection, we have

$$\Delta\Pi_s \overset{\mathrm{def}}{=} \tilde{\Pi}_s - \Pi_s = \Pi_oNP^H + PN^H\Pi_o,$$   (7.31)

where we have defined $P = A(A^HA)^{-1}$ and note that $A^HP = I$. Equivalently, the perturbation in the orthogonal subspace is

$$\tilde{\Pi}_o = I - \tilde{\Pi}_s = \Pi_o - \Delta\Pi_s \overset{\mathrm{def}}{=} \Pi_o + \Delta\Pi_o,$$   (7.32)

so

$$\Delta\Pi_o = -(\Pi_oNP^H + PN^H\Pi_o).$$   (7.33)

7.4.2

Statistical Property of the Noise Matrix

The noise matrix $N$ used here is a transformation of the noise matrix from the spatial domain. Denote the spatial-domain noise matrix as $\tilde{N}$. For the convenience of the analysis, we assume it to be white Gaussian, so that its covariance matrix is

$$E(\tilde{N}\tilde{N}^H) = \sigma_n^2I,$$

where $E[\cdot]$ is the expectation operator and $\sigma_n^2$ is the variance of the noise.

Because of the data manipulation in equations (7.9) and (7.13), the frequency-domain noise matrix $N$ that perturbs $A$ is related to the original noise matrix by

$$N = DF\tilde{N},$$

where $F$ is the DFT matrix and $D$ is defined, similarly to $T(\omega)$, as

$$D = \mathrm{diag}\!\left(\frac{e^{-j\phi_P}}{|y(\omega_P)|}, \ldots, \frac{e^{-j\phi_Q}}{|y(\omega_Q)|}\right).$$   (7.34)

The interval $(P, Q)$ is the range of the energy band. The covariance matrix of $N$ becomes

$$E(NN^H) = DFE(\tilde{N}\tilde{N}^H)F^HD^H = \sigma_n^2DFF^HD^H = \sigma_n^2DD^H,$$

where we use the property that the DFT matrix $F$ is unitary.

When the improved subspace approach is used, the noise matrix in the time domain is weighted by a window matrix $W_i$ at each trace, so that

$$N = DF(W_1\mathbf{n}_1, \ldots, W_M\mathbf{n}_M),$$

where $\mathbf{n}_i$ is the noise vector at trace $i$.

7.4.3

Perturbation of the Time Delay $\tau$

Basically, there are two ways to implement MUSIC and MN for parameter estimation: extrema searching and polynomial rooting. It has been proved that these two methods have the same performance in the sense of the mean-squared error of the parameters estimated (Li, 1990; Li et al., 1990; Li and Vaccaro, 1991c; Li et al., 1993). We first analyze the perturbation of time-delay estimation using the MUSIC and MN algorithms (Li et al., 1993). Readers are referred to Li et al. (1993) for details of the development.
7.4.3.1 Extrema Searching: MUSIC and MN

The null-spectrum function associated with the MUSIC and MN searching algorithms can be written as

$$P(\tau) = \mathbf{a}(\tau)^H\Pi_oW\Pi_o\mathbf{a}(\tau),$$

where the weighting matrix $W$ equals $I$ for MUSIC and

$$W = \frac{\mathbf{e}_1\mathbf{e}_1^H}{\|\Pi_o\mathbf{e}_1\|^4}$$

for MN. Define $\mathbf{e}_i$ as the vector of all zeros except a 1 in the $i$th position. The time delays $\tau$ can be estimated by searching for the minima of the null spectrum.

The perturbation of the time-delay estimates can be obtained via a first-order expansion of $\partial P(\tau_i, \tilde{\Pi}_o)/\partial\tau$:

$$\Delta\tau_i = -\frac{\partial\Delta P(\tau_i, \Pi_o)/\partial\tau}{\partial^2P(\tau_i, \Pi_o)/\partial\tau^2}, \quad\text{for}\quad i = 1, \ldots, M.$$   (7.35)

Taking the first and second partial derivatives with respect to $\tau_i$, we can easily obtain

$$\frac{\partial^2P(\tau_i, \Pi_o)}{\partial\tau^2} = 2\,\mathbf{a}^{(1)}(\tau_i)^H\Pi_oW\Pi_o\mathbf{a}^{(1)}(\tau_i)$$

and

$$\frac{\partial P(\tau_i, \tilde{\Pi}_o)}{\partial\tau} = \mathbf{a}^{(1)}(\tau_i)^H(\Pi_o + \Delta\Pi_o)(W + \Delta W)(\Pi_o + \Delta\Pi_o)\mathbf{a}(\tau_i) + \mathbf{a}(\tau_i)^H(\Pi_o + \Delta\Pi_o)(W + \Delta W)(\Pi_o + \Delta\Pi_o)\mathbf{a}^{(1)}(\tau_i).$$

Taking the first-order approximation, we have

$$\frac{\partial P(\tau_i, \tilde{\Pi}_o)}{\partial\tau} = \frac{\partial P(\tau_i, \Pi_o)}{\partial\tau} + \frac{\partial\Delta P(\tau_i, \Pi_o)}{\partial\tau},$$   (7.36)

where $\partial P(\tau_i, \Pi_o)/\partial\tau = 0$ and (using $\Re[\cdot]$ for the real part)

$$\frac{\partial\Delta P(\tau_i, \Pi_o)}{\partial\tau} = 2\,\Re[\mathbf{a}(\tau_i)^H\Delta\Pi_oW\Pi_o\mathbf{a}^{(1)}(\tau_i)].$$   (7.37)

Substituting the result in equation (7.33) into equation (7.37), we have

$$\frac{\partial\Delta P(\tau_i, \Pi_o)}{\partial\tau} = -2\,\Re[\mathbf{e}_i^HA^H(\Pi_oNP^H + PN^H\Pi_o)W\Pi_o\mathbf{a}^{(1)}(\tau_i)] = -2\,\Re[\mathbf{e}_i^HN^H\Pi_oW\Pi_o\mathbf{a}^{(1)}(\tau_i)].$$   (7.38)

Here we use the properties

$$A^HP = I \quad\text{and}\quad A^H\Pi_o = 0.$$

So we can obtain the perturbation of the time-delay estimates as

$$\Delta\tau_i = \frac{\Re[\mathbf{e}_i^HN^H\boldsymbol{\beta}_i]}{\alpha_i},$$   (7.39)

with $\boldsymbol{\beta}_i$ and $\alpha_i$ defined as

$$\boldsymbol{\beta}_i = \Pi_oW\Pi_o\mathbf{a}^{(1)}(\tau_i) \quad\text{and}\quad \alpha_i = \mathbf{a}^{(1)}(\tau_i)^H\Pi_oW\Pi_o\mathbf{a}^{(1)}(\tau_i).$$
7.4.3.2 Polynomial-Rooting: MUSIC and MN

A common spectral polynomial for MN and Root-MUSIC can be written as

$$P(z) = \mathbf{a}(z^{-1})^T\Pi_oW\Pi_o^H\mathbf{a}(z) = A\prod_{i=1}^{L-1}(1 - r_iz^{-1})(1 - r_i^*z),$$   (7.40)

where the weighting matrix $W$ is the same as in the extrema-search algorithm and $A$ is a scaling factor. $A$ equals 1 for MN and $(L-K)/\|\mathbf{h}\|^2$ for MUSIC, where $\mathbf{h}$ is the coefficient vector (with a first element of unity) of the polynomial $H(z)$. $H(z)$ is the causal spectral factor of $P(z)$; namely, $P(z) = H(z)H(z^{-1})$. Note that polynomial-rooting algorithms can be applied only to uniformly sampled frequency-domain data. The relationship between the roots of the spectral polynomial and the time delays is

$$r_i = e^{-j\omega_d\tau_i}.$$

The common model in equation (7.40) is a spectral polynomial whose signal roots occur on the unit circle with multiplicity two. Because the signal roots occur with multiplicity two, a direct perturbation expansion of equation (7.40) would yield second-order perturbations of the signal roots; the first-order perturbations would be multiplied by zero. In order to reduce the multiplicity of the signal roots, we work with the derivative of the spectral polynomial, whose signal roots on the unit circle have multiplicity one. Thus, to find the first-order perturbations of the time-delay estimates, we must calculate the perturbations of the roots of the derivative of $P(z)$. We are interested only in the roots $z = r_i = e^{-j\omega_d\tau_i}$ corresponding to the noise-free time-delay estimates.
1. Take the derivative of $P(z, \mathbf{r})$ with respect to $z$, substitute $\mathbf{r} = \mathbf{r} + \Delta\mathbf{r}$ into $\partial P(z, \mathbf{r})/\partial z$, and evaluate it at $z = r_i$. The first-order terms yield

$$\left.\frac{\partial\Delta P(z, \mathbf{r})}{\partial z}\right|_{z = r_i} = 2jAr_i^*\,\mathrm{Im}(\Delta r_ir_i^*)\,G(r_i),$$   (7.41)

where

$$G(r_i) = \prod_{\substack{j = 1 \\ j \neq i}}^{L-1}(1 - r_jr_i^{-1})(1 - r_j^*r_i).$$

2. Take the derivative of $P(z, \tilde{\Pi}_o)$ with respect to $z$, substitute $\tilde{\Pi}_o = \Pi_o + \Delta\Pi_o$ and $\tilde{W} = W + \Delta W$ into $\partial P(z, \tilde{\Pi}_o)/\partial z$, and evaluate it at $z = r_i$. The first-order terms yield

$$\left.\frac{\partial\Delta P(z, \Pi_o)}{\partial z}\right|_{z = r_i} = -2jr_i^*\,\mathrm{Im}[r_i^*\mathbf{a}(r_i)^H\Delta\Pi_oW\Pi_o^H\mathbf{a}^{(1)}(r_i)].$$   (7.42)

3. Equating (7.41) and (7.42), we obtain

$$2jAr_i^*\,\mathrm{Im}\!\left(\frac{\Delta r_i}{r_i}\right)G(r_i) = -2jr_i^*\,\mathrm{Im}[r_i^*\mathbf{a}(r_i)^H\Delta\Pi_oW\Pi_o^H\mathbf{a}^{(1)}(r_i)].$$   (7.43)

Using $r_i^* = r_i^{-1}$ and the angle-root relation given in Tufts et al. (1989),

$$\Delta\tau_i = C_i\,\mathrm{Im}\!\left(\frac{\Delta r_i}{r_i}\right) = -C_i\,\frac{\mathrm{Im}[r_i^{-1}\mathbf{a}(r_i)^H\Delta\Pi_oW\Pi_o^H\mathbf{a}^{(1)}(r_i)]}{AG(r_i)},$$   (7.44)

where $C_i = 1/\omega_d$. Now, substitute equation (7.33) into equation (7.44) to obtain, similarly as with equation (7.38),

$$\Delta\tau_i = \frac{\mathrm{Im}[C_ir_i^{-1}\mathbf{e}_i^HN^H\Pi_oW\Pi_o^H\mathbf{a}^{(1)}(r_i)]}{AG(r_i)}.$$   (7.45)

If we define

$$\boldsymbol{\beta}_i = \Pi_oW\Pi_o^H\mathbf{a}^{(1)}(r_i)\,C_ir_i^{-1},$$

then (7.45) can be simplified as

$$\Delta\tau_i = \frac{\mathrm{Im}[\mathbf{e}_i^HN^H\boldsymbol{\beta}_i]}{AG(r_i)}.$$   (7.46)

This result can be shown to be equivalent to the solution obtained from equation (7.40) directly (see Li, 1990).
7.4.3.3 ESPRIT Algorithm

In the noisy case, the data structure is perturbed, so we can solve for a first-order perturbation. We now have

$$(\underline{A} + \underline{N})(\Phi + \Delta\Phi) = \bar{A} + \bar{N},$$

where $\bar{N}$ and $\underline{N}$ are $N$ excluding its first and last rows, respectively, so that $\bar{N} = \bar{I}N$ and $\underline{N} = \underline{I}N$, with $\bar{I}$ and $\underline{I}$ the corresponding row-selection matrices. Canceling $\underline{A}\Phi$ against $\bar{A}$ [equation (7.23)] and neglecting the second-order term, we obtain

$$\Delta\Phi = \underline{A}^{\dagger}(\bar{N} - \underline{N}\Phi).$$

Since $\Phi$ is a diagonal matrix, its eigenvectors are the $\mathbf{e}_i$. The first-order perturbation of its eigenvalues due to $\Delta\Phi$ is

$$\Delta\phi_i = \mathbf{e}_i^H\Delta\Phi\,\mathbf{e}_i = \mathbf{e}_i^H\underline{A}^{\dagger}(\bar{N} - \underline{N}\Phi)\mathbf{e}_i = \mathbf{e}_i^H\underline{A}^{\dagger}(\bar{I} - \phi_i\underline{I})N\mathbf{e}_i.$$   (7.47)

Here we use the properties

$$\bar{N}\mathbf{e}_i = \bar{I}N\mathbf{e}_i \quad\text{and}\quad \underline{N}\Phi\mathbf{e}_i = \phi_i\underline{N}\mathbf{e}_i = \phi_i\underline{I}N\mathbf{e}_i.$$

We further relate the perturbation of $\phi_i$ to that of $\tau_i$ (Li et al., 1990):

$$\Delta\tau_i = C_i\,\mathrm{Im}\!\left(\frac{\Delta\phi_i}{\phi_i}\right) = \frac{\Re[\mathbf{e}_i^HN^H\boldsymbol{\beta}_i]}{\gamma_i},$$   (7.48)

where $C_i = 1/\omega_d$,

$$\boldsymbol{\beta}_i^H = -jC_i\phi_i^{-1}\mathbf{e}_i^H\underline{A}^{\dagger}(\bar{I} - \phi_i\underline{I}),$$

and $\gamma_i = 1$.
7.4.3.4 Vector-Wise ESPRIT Algorithm

In the noisy case,

$$\tilde{\phi}_i = \phi_i + \Delta\phi_i = \frac{(\underline{\mathbf{a}}(\tau_i) + \underline{\mathbf{n}}_i)^H(\bar{\mathbf{a}}(\tau_i) + \bar{\mathbf{n}}_i)}{K + 1}.$$   (7.49)

Using the first-order approximation,

$$\Delta\phi_i = \frac{\underline{\mathbf{a}}^H(\tau_i)\bar{\mathbf{n}}_i + \underline{\mathbf{n}}_i^H\bar{\mathbf{a}}(\tau_i)}{K + 1} - \phi_i\,\frac{\underline{\mathbf{a}}^H(\tau_i)\underline{\mathbf{n}}_i + \underline{\mathbf{n}}_i^H\underline{\mathbf{a}}(\tau_i)}{K + 1}.$$   (7.50)

We further relate the perturbation of $\phi_i$ to that of $\tau_i$ (Li et al., 1990):

$$\Delta\tau_i = C_i\,\mathrm{Im}\!\left(\frac{\Delta\phi_i}{\phi_i}\right) = C_i\,\mathrm{Im}\!\left[\frac{\bar{\mathbf{a}}^H(\tau_i)\bar{\mathbf{n}}_i - \underline{\mathbf{a}}^H(\tau_i)\underline{\mathbf{n}}_i}{K + 1}\right],$$   (7.51)

where $C_i = 1/\omega_d$. By using

$$\frac{\mathbf{a}^H(\tau_i)}{K + 1} = \mathbf{e}_i^HA^{\dagger}$$

(and its shifted versions) together with $N\mathbf{e}_i = \mathbf{n}_i$, we can rewrite equation (7.51) as

$$\Delta\tau_i = \frac{\Re[\mathbf{e}_i^HN^H\boldsymbol{\beta}_i]}{\gamma_i},$$   (7.52)

where

$$\boldsymbol{\beta}_i^H = -jC_i\,\mathbf{e}_i^H(\bar{A}^{\dagger}\bar{I} - \underline{A}^{\dagger}\underline{I})$$

and $\gamma_i = 1$. This is the same as that of the matrix ESPRIT algorithm.
7.4.3.5 Mean-Squared Error of Time-Delay Estimation

The elements of the covariance matrix $C_\tau$ of the time-delay estimates are given by

$$C_\tau(i, j) = E(\Delta\tau_i\Delta\tau_j) = \frac{E\{\Re[\mathbf{e}_i^HN^H\boldsymbol{\beta}_i]\,\Re[\mathbf{e}_j^HN^H\boldsymbol{\beta}_j]\}}{\gamma_i\gamma_j}.$$   (7.53)

Under the assumption that the noise elements are uncorrelated circular random variables with equal variances $\sigma_n^2/2$ (for the real and imaginary parts, respectively), then (see Li and Vaccaro, 1991c)

$$C_\tau(i, j) = \frac{(\boldsymbol{\beta}_i^HDD^H\boldsymbol{\beta}_j)\,\sigma_n^2}{2\gamma_i\gamma_j}\,\delta(i, j).$$   (7.54)

When the improved subspace algorithms are used,

$$C_\tau(i, j) = \frac{(\boldsymbol{\beta}_i^HDW_iW_j^HD^H\boldsymbol{\beta}_j)\,\sigma_n^2}{2\gamma_i\gamma_j}\,\delta(i, j),$$   (7.55)

and we can see that the covariance matrix $C_\tau$ is a diagonal matrix.

7.4.4

Relating Time-Delay Estimation to Parameter Estimation

We can relate the perturbations of $[T_o\ v]$ to the perturbation of $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_M)^T$ by using equation (7.27):

$$\begin{bmatrix} -2(\tau_1 + \Delta\tau_1) & x_1^2 \\ \vdots & \vdots \\ -2(\tau_M + \Delta\tau_M) & x_M^2 \end{bmatrix}\begin{bmatrix} T_o + \Delta T_o \\ (v + \Delta v)^{-2} \end{bmatrix} = \begin{bmatrix} (\tau_1 + \Delta\tau_1)^2 \\ \vdots \\ (\tau_M + \Delta\tau_M)^2 \end{bmatrix}.$$   (7.56)

Let us define

$$\mathbf{x} = (x_1^2, x_2^2, \ldots, x_M^2)^T.$$

Forming the normal equations of (7.56) and taking the first-order approximation gives

$$\begin{bmatrix} T_o + \Delta T_o \\ (v + \Delta v)^{-2} \end{bmatrix} = \left(\begin{bmatrix} 4\boldsymbol{\tau}^H\boldsymbol{\tau} & -2\boldsymbol{\tau}^H\mathbf{x} \\ -2\mathbf{x}^H\boldsymbol{\tau} & \mathbf{x}^H\mathbf{x} \end{bmatrix} + \begin{bmatrix} 8\boldsymbol{\tau}^H\Delta\boldsymbol{\tau} & -2\Delta\boldsymbol{\tau}^H\mathbf{x} \\ -2\mathbf{x}^H\Delta\boldsymbol{\tau} & 0 \end{bmatrix}\right)^{-1}\left(\begin{bmatrix} -2\boldsymbol{\tau}^H(\boldsymbol{\tau}.\boldsymbol{\tau}) \\ \mathbf{x}^H(\boldsymbol{\tau}.\boldsymbol{\tau}) \end{bmatrix} + \begin{bmatrix} -6(\boldsymbol{\tau}.\boldsymbol{\tau})^H \\ 2(\mathbf{x}.\boldsymbol{\tau})^H \end{bmatrix}\Delta\boldsymbol{\tau}\right),$$

where "." stands for element-wise multiplication. Now, referring to the first through fourth terms on the right as $B_1$, $B_2$, $\mathbf{b}_1$, and $\mathbf{b}_2$, respectively, and neglecting the higher-order terms, we have
$$\begin{bmatrix} T_o + \Delta T_o \\ (v + \Delta v)^{-2} \end{bmatrix} = [B_1(I + B_1^{-1}B_2)]^{-1}(\mathbf{b}_1 + \mathbf{b}_2) \approx (I - B_1^{-1}B_2)B_1^{-1}(\mathbf{b}_1 + \mathbf{b}_2) \approx B_1^{-1}\mathbf{b}_1 - B_1^{-1}B_2B_1^{-1}\mathbf{b}_1 + B_1^{-1}\mathbf{b}_2,$$   (7.57)

with the first-order approximation

$$\begin{bmatrix} T_o + \Delta T_o \\ (v + \Delta v)^{-2} \end{bmatrix} \approx \begin{bmatrix} T_o \\ v^{-2} \end{bmatrix} + \begin{bmatrix} \Delta T_o \\ -2\Delta v/v^3 \end{bmatrix}.$$   (7.58)

Since

$$\begin{bmatrix} T_o \\ v^{-2} \end{bmatrix} = B_1^{-1}\mathbf{b}_1,$$

this becomes

$$\begin{bmatrix} \Delta T_o \\ \Delta v \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -0.5v^3 \end{bmatrix}(-B_1^{-1}B_2B_1^{-1}\mathbf{b}_1 + B_1^{-1}\mathbf{b}_2).$$   (7.59)

Through a lengthy but straightforward derivation (see Appendix A), we can show that

$$\begin{bmatrix} \Delta T_o \\ \Delta v \end{bmatrix} = \begin{bmatrix} \mathbf{d}_1 \\ \mathbf{d}_2 \end{bmatrix}\Delta\boldsymbol{\tau},$$   (7.60)

where the row vectors $\mathbf{d}_1$ and $\mathbf{d}_2$ are given in the appendix. Now we have the mean-squared errors of the estimated parameters as

$$E(\Delta T_o)^2 = \mathbf{d}_1C_\tau\mathbf{d}_1^H$$   (7.61)

and

$$E(\Delta v)^2 = \mathbf{d}_2C_\tau\mathbf{d}_2^H.$$   (7.62)

7.5

Simulations

The general configuration of the simulation experiment is a uniform line array of six sensors (M = 5) with a spacing of 200 ft (60 m) between sensors. A Ricker wavelet with a 40-Hz dominant frequency is used to simulate a wide-band seismic wavefront. The Ricker wavelet has the frequency distribution

$$w(f) = f^2e^{-(2\pi f)^2/\sigma^2}.$$   (7.63)

It arrives at the first sensor at $T_o = 0.1$ s with a velocity of 2000 ft/s (600 m/s); 501 data points are sampled uniformly from each trace, and the signal-to-noise ratio is 20 dB.

Twenty-one frequency-domain data points (Q - P = 20) are used from each trace within the main lobe of the wavelet (30 Hz to 50 Hz) with $\Delta f$ = 1 Hz. Figures 7.3-7.5 show the velocity and zero-offset time estimates from twenty trials using the MUSIC, MN, and ESPRIT algorithms, respectively. Figures 7.6-7.8 show the velocity and zero-offset time estimates from twenty trials using the improved MUSIC, MN, and ESPRIT algorithms, respectively. The window width is 0.3 s.

Table 7.1 shows the root-mean-squared errors (RMSE) of the velocity and zero-offset time estimates averaged over 100 trials using the MUSIC, MN, and ESPRIT algorithms. Table 7.2 shows the RMSE using the improved MUSIC, MN, and ESPRIT algorithms. Table 7.3 gives the RMSE for the same sample using the semblance and Key's algorithms. The same window used for the improved subspace approach is used for both the semblance and Key's algorithms. The search steps in both algorithms are 0.8 ft/s (20 cm/s) for the velocity variable and 0.0004 s for the time variable.
Table 7.1. RMSE for MUSIC, Minimum Norm, and ESPRIT

    RMSE                   MUSIC     MN        ESPRIT
    Velocity (ft/s)        10.8298   25.3919   19.2899
    Zero-offset time (s)   0.00299   0.00713   0.00856

Figure 7.3. MUSIC estimates of velocity and zero-offset time (20 trials).

Table 7.2. RMSE for improved MUSIC, Minimum Norm, and ESPRIT

    RMSE                   MUSIC     MN        ESPRIT
    Velocity (ft/s)        10.7464   14.2129   9.6694
    Zero-offset time (s)   0.00278   0.00411   0.00267

Table 7.3. RMSE for semblance and Key's methods

    RMSE                   Semblance   Key's method
    Velocity (ft/s)        11.426      15.899
    Zero-offset time (s)   0.00856     0.0122

Figure 7.4. Minimum-norm estimates of velocity and zero-offset time (20 trials).

From these results, we can see that the improved subspace-processing algorithms outperform the semblance and Key's algorithms. For the original subspace algorithms, MUSIC has the smallest RMSE, followed by ESPRIT; but for the improved subspace algorithms, ESPRIT has the smallest RMSE, followed by MUSIC.

Figures 7.9 and 7.10 give the mean-squared errors for the velocity and zero-offset time estimates, in which the lines are theoretical predictions and the discrete symbols are simulation measurements. They show that the experimentally simulated results and those predicted theoretically agree very well.

Figure 7.5. ESPRIT estimates of velocity and zero-offset time (20 trials).

7.6

Conclusion

The approach presented estimates the stacking velocity and zero-offset time simultaneously, in contrast with the semblance approach, which requires estimation through iterative search. The presented approach extracts structural information from general wide-band signals, in contrast with previous efforts, which assume a narrow bandwidth or a special data window (parallel to the trajectory). The advantages of the presented approach are higher resolution and less computation.

Figure 7.6. Improved MUSIC estimates of velocity and zero-offset time (20 trials).

This approach allows us to estimate the stacking velocity and zero-offset time for only one seismic wavefront. In exploration seismology and earthquake seismology, we are often interested in one primary wavefront and consider others as noise. However, an estimation scheme for the parameters of multiple wavefronts would be useful and worth pursuing in the future.

Figure 7.7. Improved minimum-norm estimates of velocity and zero-offset time (20 trials).

Figure 7.8. Improved ESPRIT estimates of velocity and zero-offset time (20 trials).

Figure 7.9. Mean-squared error versus S/N for velocity estimation.

Figure 7.10. Mean-squared error versus S/N for zero-offset time estimation.

7.7

References

Goldstein, P., and Archuleta, R. J., 1987, Array analysis of seismic signals: Geophys. Res. Lett., 14, 13-16.
Goldstein, P., and Archuleta, R. J., 1991, Deterministic frequency-wavenumber methods and direct measurements of rupture propagation during earthquakes using a dense array: Theory and methods: J. Geophys. Res., 96, 6173-6185.
Key, S. C., and Smithson, S. D., 1990, New approach to seismic-reflection event detection and velocity determination: Geophysics, 55, 1057-1069.
Kirlin, R. L., 1992, The relationship between semblance and eigenstructure velocity estimators: Geophysics, 57, 1027-1033.
Kumaresan, R., and Tufts, D. W., 1983, Estimating the angles of arrival of multiple plane waves: IEEE Trans. Aerospace and Electronic Systems, 19, 134-139.
Li, F., 1990, A unified performance analysis of subspace-based DOA estimation algorithms: Ph.D. thesis, Univ. of Rhode Island.
Li, F., Liu, H., and Vaccaro, R. J., 1993, Performance analysis for DOA estimation algorithms: Further unification, simplification, and observations: IEEE Trans. Aerospace and Electronic Systems, 29, 1170-1184.
Li, F., and Vaccaro, R. J., 1989, Min-Norm linear prediction of arbitrary sensor array: Proc. IEEE Internat. Conf. on Acoust., Speech and Sig. Proc., 2613-2616.
Li, F., and Vaccaro, R. J., 1991a, Performance degradation of DOA estimation due to unknown noise fields: Proc. IEEE Internat. Conf. on Acoust., Speech and Sig. Proc., 1413-1416.
Li, F., and Vaccaro, R. J., 1991b, On frequency-wavenumber estimation by state-space realization: IEEE Trans. on Circuits and Systems, 38, 800-804.
Li, F., and Vaccaro, R. J., 1991c, Unified analysis for DOA estimation algorithms in array signal processing: Signal Processing, 22, 147-169.
Li, F., Vaccaro, R. J., and Tufts, D. W., 1990, Unified performance analysis of subspace-based estimation algorithms: Proc. IEEE Internat. Conf. on Acoust., Speech and Sig. Proc., 2575-2578.
Neidell, N. S., and Taner, M. T., 1971, Semblance and other coherency measures for multichannel data: Geophysics, 36, 482-497.
Paulraj, A., Roy, R., and Kailath, T., 1985, Estimation of signal parameters via rotational invariance techniques - ESPRIT: Proc. 19th Asilomar Conf. on Signals, Systems and Computers, 83-89.
Schmidt, R. O., 1979, Multiple emitter location and signal parameter estimation: Proc. RADC Spectral Estimation Workshop, 243-258.

7.8

Appendix A, Verification of Equation (7.60)

We first simplify $B_2$ and $\mathbf{b}_2$ in equations (7.57) and (7.59):

$$B_2 = 2\begin{bmatrix} 4\boldsymbol{\tau}^H\Delta\boldsymbol{\tau} & -\Delta\boldsymbol{\tau}^H\mathbf{x} \\ -\mathbf{x}^H\Delta\boldsymbol{\tau} & 0 \end{bmatrix}$$

and

$$\mathbf{b}_2 = \begin{bmatrix} -6(\boldsymbol{\tau}.\boldsymbol{\tau})^H \\ 2(\mathbf{x}.\boldsymbol{\tau})^H \end{bmatrix}\Delta\boldsymbol{\tau}.$$

Let

$$\begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -0.5v^3 \end{bmatrix}B_1^{-1}, \qquad \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = B_1^{-1}\mathbf{b}_1,$$

and

$$\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -0.5v^3 \end{bmatrix}B_1^{-1}\begin{bmatrix} -6(\boldsymbol{\tau}.\boldsymbol{\tau})^H \\ 2(\mathbf{x}.\boldsymbol{\tau})^H \end{bmatrix}.$$   (7.64)

So we have

$$\begin{bmatrix} \Delta T_o \\ \Delta v \end{bmatrix} = -\begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \end{bmatrix}B_2\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}\Delta\boldsymbol{\tau} = \begin{bmatrix} -2(4a_1b_1\boldsymbol{\tau}^H - a_2b_1\mathbf{x}^H - a_1b_2\mathbf{x}^H) + c_1 \\ -2(4a_1b_3\boldsymbol{\tau}^H - a_2b_3\mathbf{x}^H - a_1b_4\mathbf{x}^H) + c_2 \end{bmatrix}\Delta\boldsymbol{\tau} \overset{\mathrm{def}}{=} \begin{bmatrix} \mathbf{d}_1 \\ \mathbf{d}_2 \end{bmatrix}\Delta\boldsymbol{\tau}.$$   (7.65)


Chapter 8
Enhanced Covariance Estimation with
Application to the Velocity Spectrum
R. Lynn Kirlin
Seismic reflections from an interface are relatively short in time; they are transient. Because of this fact, estimation of the covariance matrix is accomplished by the use of either time-slice vectors (across traces) or Fourier-transform value vectors. Although we might have a dozen or so time-slice vectors, this is generally not adequate (a few hundred would be nice and appropriate for temporally stationary signals). In the frequency domain, the short reflection affords only one Fourier transform. Thus, the multiplicity of vector samples must be found another way, if possible.

As always with a finite data set, we may exchange spatial resolution for statistical stability. The exchange is effected in two ways: break the array into spatially offset subarrays or, in the frequency domain, utilize the correlated variations of transform values among distinct (usually neighboring) frequencies. Both of these schemes are usually implemented with a sliding window, either in space or in frequency, as appropriate. In this chapter, I examine these possibilities for enhancing covariance estimation and determine their applicability to seismic array processing.
In practice, the data covariance matrix,

$$R_x = E\{\mathbf{x}\mathbf{x}^H\} - E\{\mathbf{x}\}E\{\mathbf{x}\}^H,$$   (8.1)

is not known, and usually it must be estimated with the finite data available. Generally, it is known that $E\{\mathbf{x}\} = 0$; thus, the estimator of $R_x$ is as in equation (2.1):

$$C_x = \hat{R}_x = \frac{1}{N-1}\sum_{i=1}^{N}\mathbf{x}_i\mathbf{x}_i^H.$$   (8.2)

For real Gaussian data, the elements of $C_x$ are distributed Wishart, and for complex data they are complex Wishart, as discussed in Chapter 2. Also in Chapter 2, I described estimation in the maximum-likelihood sense and robust estimation for contaminated Gaussian data.

Different forms of a priori knowledge help improve covariance estimation. The primary case is when the data are known to be the sum of signals each having plane wavefronts and the array is uniformly spaced and linear. However, I will describe a method for dealing with hyperbolic-curvature wavefronts in Section 8.4. Equivalently, we could consider vectors from a time sequence which is composed of sinusoids. In either case, all noise samples are considered independent, white, stationary, zero-mean Gaussian random variables with variance $\sigma_n^2$. Note that not all of the following analyses will apply directly to hyperbolically curved wavefronts.

8.1

Spatial Smoothing

Under the conditions assumed above, it is easy to imagine that the covariance matrices of data from different subarrays of m adjacent sensors should be identical. See Figure 8.1 for the general concept of subarray covariance matrices related to the whole-array covariance matrix. However, in general this is not true, because temporal coherence among the signal sources causes spatial nonstationarity. Yet, by properly combining data from different subarrays coherently, a more stable estimate of the m x m covariance matrix results. For any single plane-wave narrowband signal arrival, the delay between any two sensors at positions $i_1 = i_0$ and $i_2 = i_0 + q$ is indicated by the same factor $\exp\{-j2\pi fq\Delta\}$ (where $\Delta$ is the delay between adjacent sensors), regardless of $i_0$.

Thus, we may coherently combine narrowband analytic signals from sensors $i_1$ and $i_2$ if we shift the phase of the signal at $i_2$ by multiplying the analytic signal by the factor $\exp\{j2\pi fq\Delta\}$. This is easily effected for each sensor in one subarray spaced q positions from each sensor in another subarray.
Thus, let $\mathbf{x}_1(i)$ be a time-slice vector taken from a reference subarray at time $i$, and let $\mathbf{x}_q(i)$ be the corresponding time-slice vector taken from a subarray $q$ sensor spacings removed. If the covariance matrix associated with $\mathbf{x}_1(i)$ is $R_1$ and that with $\mathbf{x}_q(i)$ is $R_q$, then, for a multiple-source signal vector $\mathbf{s}$ and noise covariance matrix $\sigma_n^2I$,

$$R_1 = AE\{\mathbf{s}\mathbf{s}^H\}A^H + \sigma_n^2I = E\{\mathbf{x}_1(i)\mathbf{x}_1(i)^H\}$$   (8.3)

and

$$R_q = AD_qE\{\mathbf{s}\mathbf{s}^H\}D_q^HA^H + \sigma_n^2I = E\{\mathbf{x}_q(i)\mathbf{x}_q(i)^H\},$$   (8.4)

where $A$ is the matrix of direction vectors [see $H$ in equation (4.24)] and $D_q$ is a diagonal matrix with $k$th element $\exp\{-j2\pi fq\Delta_k\}$ corresponding to the intersensor delay $\Delta_k$ of the $k$th signal source.
Because of $D_q$, a straightforward average of $\hat{R}_1$ and $\hat{R}_q$ is not appropriate in the multiple-source case. Further, an averaging of the noise power on the diagonal elements would give biased results.

In order to effect a true coherent averaging of $\hat{R}_1$ and $\hat{R}_q$, $A$ would have to be known. However, if we are going to use an algorithm like MUSIC, which searches over arrival angles $\theta$, we may align to $\mathbf{x}_1(i)$ that part of $\mathbf{x}_q(i)$ which is coherent with the trial direction vector. That is, we may try

$$\hat{\mathbf{x}}_1 = \mathrm{diag}(\mathbf{a}_q^*(\theta))\,\mathbf{x}_q(i) = T_q\mathbf{x}_q(i),$$   (8.5)

where $T_q$ is a diagonal matrix with elements $\exp\{j2\pi fq\Delta\}$ and $\Delta$ is the intersensor delay for the trial direction in the search vector of the direction-finding algorithm; i.e.,

$$\Delta = \delta\sin\theta/v,$$   (8.6)

where $\delta$ is the intersensor spacing, $\theta$ is the trial direction of arrival, and $v$ is the wave velocity. These elements exactly remove the delay of $\mathbf{s}_k$ at the subarray when $\theta$ is chosen correctly. Using the transformed $\mathbf{x}_q(i)$ gives

$$\hat{R}_q = \frac{1}{N}\,T_q\sum_{i=1}^{N}\mathbf{x}_q(i)\mathbf{x}_q^H(i)\,T_q^H = T_qC_qT_q^H.$$   (8.7)

Suppose that we have an M-sensor array; then Equation (8.7) implies that each subarray of m elements can be used to estimate the covariance matrix $R_1$. There are $M - m + 1$ of these; therefore the spatially smoothed covariance estimate is

$$\hat{C}_o = \hat{R}_1 = \frac{1}{M-m+1}\sum_{q=0}^{M-m} T_q\hat{C}_q T_q^H. \qquad (8.8)$$
When employing Equation (8.8) we must ask:

1) What is the best number m to use?
2) How well does the transformation $T_q$ of Equation (8.5) work?
3) Are there better transformations?

The first question has been answered fairly well by Tufts et al. (1991). They recommend $M/3 \le m \le 2M/3$ for short arrays of M sensors. Although their analysis presumed time-series data with a sum of sinusoids for signals, there is no mathematical distinction from our spatial smoothing problem.
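To make the procedure concrete, here is a minimal NumPy sketch of forward spatial smoothing as in Equation (8.8). It assumes the data have already been steered so that each $T_q$ reduces to the identity (as for a flattened or broadside trial direction); the half-wavelength element spacing, snapshot counts, and variable names in the example are illustrative choices, not from the text.

```python
import numpy as np

def spatially_smoothed_covariance(X, m):
    """Forward spatial smoothing, Equation (8.8) with T_q = I.

    X : (M, N) complex snapshot matrix for an M-sensor array.
    m : subarray size; there are M - m + 1 overlapping subarrays.
    """
    M, N = X.shape
    C = np.zeros((m, m), dtype=complex)
    for q in range(M - m + 1):
        Xq = X[q:q + m, :]                 # time-slice vectors of subarray q
        C += Xq @ Xq.conj().T / N          # sample covariance C_hat_q
    return C / (M - m + 1)

# Example: two fully coherent plane waves plus noise on a 16-sensor array
rng = np.random.default_rng(0)
M, N = 16, 64
angles = np.deg2rad([10.0, 14.0])
A = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(angles)))
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = A @ np.vstack([s, s])                  # coherent sources: rank-1 signal part
X += 0.7 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
R_smooth = spatially_smoothed_covariance(X, m=8)   # smoothing restores rank 2
```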
We now look at the question of a better transformation.

8.2 Improvements Using Cross-Covariance Submatrices

Kirlin and Du (1991) introduced an optimum method of determining the transformation on one subarray's time-slice vector $x_2$ to estimate with minimum mean-squared error the vector $x_1$. The least-squares linear prediction (LSLP) of one random vector from another random vector is developed as follows (Eaton, 1983).

Let $x$ be the concatenation of the random vectors $x_1 \in \mathbb{C}^M$ and $x_2 \in \mathbb{C}^N$. Then $x = (x_1^T\ x_2^T)^T$ has the covariance matrix $R \in \mathbb{C}^{(M+N)\times(M+N)}$,

$$R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}, \qquad (8.9)$$

where $R_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)^H]$, $i, j = 1, 2$, and $\mu_i = E[x_i]$ is the mean vector of the process $x_i$, $i = 1, 2$. Now we consider the LSLP of one random

vector by another random vector; that is, we want to find a linear transformation f, given by
$$\hat{x}_1 = f(x_2) = Ax_2 + x_0, \qquad (8.10)$$
such that the expectation of the 2-norm of the error $\tilde{x}_1 = x_1 - \hat{x}_1$, or

$$J = E\{\|\tilde{x}_1\|_2^2\} = E\{\|x_1 - \hat{x}_1\|_2^2\} = E\{\|x_1 - (Ax_2 + x_0)\|_2^2\}, \qquad (8.11)$$
is minimized, where $x_0 \in \mathbb{C}^M$ and $A \in \mathbb{C}^{M\times N}$ is an $M \times N$ transformation matrix. The optimal linear transformation to minimize J is easily shown to give

$$A = R_{12}R_{22}^{-1} \qquad (8.12)$$

and

$$x_0 = \mu_1 - R_{12}R_{22}^{-1}\mu_2. \qquad (8.13)$$

Let $\mathrm{Cov}(\tilde{x}_1)$ be the covariance matrix of the prediction-error vector $\tilde{x}_1$ with the least 2-norm; then

$$\mathrm{Cov}(\tilde{x}_1) = R_{11} - \mathrm{Cov}(\hat{x}_1) = R_{11} - R_{12}R_{22}^{-1}R_{21}, \qquad (8.14)$$

where

$$\mathrm{Cov}(\hat{x}_1) = R_{12}R_{22}^{-1}R_{21} \qquad (8.15)$$

is the covariance matrix of the optimal prediction of $x_1$ by $x_2$. The proof is given in Eaton (1983).
No effort is made to use the prediction directly, since the prediction vector $\hat{x}_1$ is not needed; instead, the estimation of the autocovariance matrix of $x_1$ is the concern. Note that the above proposition not only provides the optimal prediction of the random vector $x_1$ from the random vector $x_2$, but also implies that the autocovariance matrix $R_{11}$ of $x_1$ can be predicted from $x_2$, since

$$\hat{R}_{p11} = \mathrm{Cov}(\hat{x}_1) = R_{12}R_{22}^{-1}R_{21}, \qquad (8.16)$$

where $\hat{R}_{p11}$ is in fact the covariance matrix of the optimal prediction of $x_1$ by $x_2$. Although this estimate may not be a direct optimization for $\hat{R}_{11}$, it is Equation (8.16) that shows how the crosscorrelations between $x_1$ and $x_2$ can be exploited to predict the autocovariance matrices. Clearly, if an exact prediction has been effected, then $\hat{R}_{p11} = R_{11}$ and $\mathrm{Cov}(\tilde{x}_1) = 0$. When the two random vector processes are highly correlated and their observations have similar signal-to-noise ratios (S/N) (and we expect this to be true in our array), it is possible to improve the estimate of the autocovariance matrix of $x_1$ by incorporating its prediction $\hat{R}_{p11}$. There are different ways to do this; one ad hoc approach is to average the original estimate with its prediction directly:

$$\bar{R}_{11} = \frac{1}{2}(\hat{R}_{11} + \hat{R}_{p11}). \qquad (8.17)$$

The argument behind the averaging operation in the example of array processing, where $x_1$ and $x_2$ are from different subarrays and the received wavefronts are spatially stationary, is that both $\hat{R}_{11}$ and $\hat{R}_{p11}$ are noisy estimates of $R_{11}$. These can be written

$$\hat{R}_{11} = R_{11} + N_1 \qquad (8.18)$$

and

$$\hat{R}_{p11} = R_{11} + N_2, \qquad (8.19)$$

where $N_1$ and $N_2$ are the perturbations due to finite averaging. Equation (8.17) can be rewritten as

$$\bar{R}_{11} = R_{11} + \frac{1}{2}(N_1 + N_2). \qquad (8.20)$$

Generally, the averaging will suppress the perturbations, which usually are uncorrelated and zero mean. When N estimates of $\hat{R}_{p11}$ are averaged into $\bar{R}_{11}$, $\bar{R}_{11}$ obviously converges as expected to the signal-produced structure in $R_{11}$. In this case, high correlations may not be necessary. In some other applications, however, the improvement can be achieved only if high coherence exists between the two random processes and $x_2$ does not have an S/N much lower than that of $x_1$; otherwise, the expected improvement cannot be guaranteed. In fact, the rationale of the coherent signal-subspace method (CSS or CSM) (Wang and Kaveh, 1985) is the same as the one used here: if the S/N values of the two random processes were known, we could use properly weighted averaging to prevent the predicted autocovariance matrix from deteriorating the estimate of $R_{11}$,

$$\bar{R}_{w11} = \frac{1}{2}(\hat{R}_{11} + w_{12}\hat{R}_{p11}), \qquad (8.21)$$

where $w_{12}$ is the weighting factor.

For the purpose of choosing the weighting factor $w_{12}$, a reasonable choice is some power q of the ratio of estimated signal power to signal-plus-noise power,

$$w_{12} = \left[\frac{\mathrm{Tr}\{R_{12}R_{22}^{-1}R_{21}\}}{\mathrm{Tr}\{R_{11}\}}\right]^q, \qquad (8.22)$$

where q is any positive number. When the two traces are equal, $w_{12}$ has a value of one; and when the two random vectors are uncorrelated, with $R_{12} = R_{21} = 0$, $w_{12}$ has a value of zero.
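As an illustration, here is a minimal NumPy sketch of the prediction-and-averaging scheme of equations (8.16), (8.21), and (8.22). The covariance blocks are formed as sample estimates from snapshot matrices `X1` and `X2` of two subarrays; the variable names and the choice q = 1 are illustrative assumptions, not from the text.

```python
import numpy as np

def enhanced_autocovariance(X1, X2, q=1.0):
    """Improve the estimate of R11 using crosscorrelations with x2.

    X1, X2 : (m, N) snapshot matrices from two subarrays.
    Implements the prediction R_p11 = R12 R22^{-1} R21 (8.16),
    the weight w12 (8.22), and the weighted average (8.21).
    """
    N = X1.shape[1]
    R11 = X1 @ X1.conj().T / N
    R22 = X2 @ X2.conj().T / N
    R12 = X1 @ X2.conj().T / N
    R21 = R12.conj().T
    Rp11 = R12 @ np.linalg.solve(R22, R21)     # prediction of R11 from x2
    w12 = (np.trace(Rp11).real / np.trace(R11).real) ** q
    return 0.5 * (R11 + w12 * Rp11)
```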

8.3 Applications in Subarray Processing

Consider two general subarrays, each composed of M sensors, whose position vectors are specified, respectively, by the sets $\{z_1^1, z_2^1, \ldots, z_M^1\}$ and $\{z_1^2, z_2^2, \ldots, z_M^2\}$ of real $3 \times 1$ vectors. Assume the two subarrays have the same geometrical structure; then there always exists an orthogonal matrix $T \in \mathbb{R}^{3\times 3}$ and a real vector $z_0 \in \mathbb{R}^3$ such that (Cadzow, 1988b)

$$z_i^2 = Tz_i^1 + z_0, \qquad i = 1, 2, \ldots, M. \qquad (8.23)$$
Let there be d incident plane waves arriving with $3 \times 1$ direction-cosine vectors $\theta_1, \theta_2, \ldots, \theta_d$ from associated sources of narrowband signals $s_1(t), s_2(t), \ldots, s_d(t)$, respectively. Then the kth subarray signal vector is specified by the linear combination of steering vectors

$$x_k(t) = \sum_{i=1}^{d} s_i(t)\,e^{j\phi_i}\,a_k(\theta_i) + n(t), \qquad k = 1, 2, \qquad (8.24)$$

where $\phi_i$ is the initial phase of the ith signal, $n(t) \in \mathbb{C}^M$ is the additive noise vector, and $a_k(\theta_i)$ is the steering vector of the kth subarray for the ith signal, which is given by

$$a_k(\theta_i) = \left[e^{j\omega_0\theta_i^T z_1^k/c},\ e^{j\omega_0\theta_i^T z_2^k/c},\ \ldots,\ e^{j\omega_0\theta_i^T z_M^k/c}\right]^T, \qquad (8.25)$$

with c denoting the propagation velocity of the signals, $\omega_0$ the signal-band center frequency, and T the transpose operation. Using matrix-vector notation, the kth subarray output vector can be presented as follows:

$$x_k(t) = A_k s(t) + n(t), \qquad k = 1, 2, \qquad (8.26)$$

where $A_k$ and $s(t)$ are given by

$$s(t) = [s_1(t)e^{j\phi_1},\ s_2(t)e^{j\phi_2},\ \ldots,\ s_d(t)e^{j\phi_d}]^T$$

and

$$A_k = [a_k(\theta_1),\ a_k(\theta_2),\ \ldots,\ a_k(\theta_d)].$$
The additive noise is assumed to be a stationary zero-mean random process that is temporally and spatially white and uncorrelated with the signals. With this assumption we get the cross-covariance matrix of the subarray outputs

$$R_{ij} = E[x_i x_j^H] = A_i S A_j^H + \sigma^2 I\,\delta_{ij} \qquad \text{for } i, j = 1, 2, \qquad (8.27)$$

where $S = E[s(t)s(t)^H]$ and $\sigma^2 I = E[n(t)n(t)^H]$. Using Equation (8.26), we rewrite Equation (8.15) as

$$\hat{R}_{p11} = A_1 S A_2^H R_{22}^{-1} A_2 S A_1^H = A_1\tilde{S}A_1^H, \qquad (8.28)$$

where $\tilde{S} = S A_2^H R_{22}^{-1} A_2 S$ is the estimate of the signal covariance matrix.

8.3.1 A Computationally Efficient Transformation

In the application to DOA estimation, the structure of the signal covariance matrix is not of concern, except that it must be of full rank for the eigen-type algorithms to apply. The spatial smoothing techniques will generally change the structure of the signal covariance matrix and increase its rank. This is in fact necessary when sources are fully coherent, because then the rank is less than the source count. From matrix theory, we know that the rank of the product of two matrices is always less than or equal to the lesser rank of the two matrices, or

$$\mathrm{rank}(AB) \le \min[\mathrm{rank}(A), \mathrm{rank}(B)]. \qquad (8.29)$$

Obviously, retaining $R_{22}^{-1}$ in Equation (8.28) generally will not help to increase the rank of the signal covariance matrix for coherent signals. Excluding $R_{22}^{-1}$ simply effects a different $\tilde{S}$ matrix. So we have justification for setting $R_{22}^{-1} = I$ in the expression for $\hat{R}_{p11}$. Then the predicted autocovariance matrix is redefined for subarray processing as follows:

$$\hat{R}_{p11} = w_{12}R_{12}R_{21}, \qquad (8.30)$$

where $w_{12}$ is the weighting factor. The function of $w_{12}$ is twofold: it normalizes $\hat{R}_{p11}$ to cancel the scaling effects arising from neglecting $R_{22}^{-1}$, and it reduces noise propagation from the second subarray when its outputs have very low S/N. Similar to the definition of $w_{12}$ in Equation (8.22), we can define $w_{12}$ by

$$w_{12} = \left[\frac{\mathrm{Tr}\{R_{12}R_{21}\}}{\mathrm{Tr}\{R_{11}R_{22}\}}\right]^q. \qquad (8.31)$$

As another example, consider the special case of $x_1 = x_2$. According to Equation (8.30), this special case, which is approximately true when $x_1$ and $x_2$ are highly correlated, gives

$$\hat{R}_{p11} = R_{11}R_{11}.$$
The above result implies that $R_{12}R_{21}$ can be approximately regarded as the prediction of $R_{11}R_{11}$, the square of the covariance matrix $R_{11}$, which contains the same DOA information as $R_{11}$. Omitting $R_{22}^{-1}$ in the expression of $\hat{R}_{p11}$ not only reduces the computational requirements but also avoids the numerical errors introduced by the matrix-inversion operation. Note again that in the above discussion we assume the true crosscorrelations are known, for convenience of formulation. If only finite observations are available, the improved estimate of $R_{11}$ is given by

$$\bar{R}_{11} = \frac{1}{2}(\hat{R}_{11} + \hat{R}_{p11}), \qquad (8.32)$$

where $\hat{R}_{p11} = w_{12}\hat{R}_{12}\hat{R}_{21}$ and $\hat{R}_{ij}$ is the estimated cross-covariance matrix.
Similarly, the improved estimate of $R_{22}$ is given by

$$\bar{R}_{22} = \frac{1}{2}(\hat{R}_{22} + \hat{R}_{p22}). \qquad (8.33)$$

Although we have only considered the case of two subarrays, the above results apply to more than two subarrays. For example, if there are L subarrays, the predicted covariance matrix of the ith subarray from the crosscorrelations will be

$$\hat{R}_{pii} = \frac{1}{L-1}\sum_{j=1,\,j\ne i}^{L} w_{ij}\hat{R}_{ij}\hat{R}_{ji}, \qquad (8.34)$$

and $\bar{R}_{ii}$ will be

$$\bar{R}_{ii} = \frac{1}{2}(\hat{R}_{ii} + \hat{R}_{pii}). \qquad (8.35)$$
8.3.2 Adding Forward-Backward Smoothing

Consider spatial smoothing techniques using a linear array with equally spaced elements. In Du and Kirlin (1991), additional spatial smoothing methods have been proposed based on the approximations presented above. Assume that the total array is sectioned into L (probably overlapping) subarrays. Then the estimates of the signal-space eigenstructure of the squared covariance matrices of the L subarrays can first be improved, as in equations (8.34) and (8.35), by using the improved estimate of the squared covariance matrix for the ith subarray given by

$$\bar{R}_{ii} = \frac{1}{L}\sum_{j=1}^{L} w_{ij}\hat{R}_{ij}\hat{R}_{ji}, \qquad (8.36)$$

where $w_{ii} = 1$. Improvement using Equation (8.36) is then followed by averaging the L improved estimates of the squared subarray covariance matrices.
Thus the recommended forward and forward-backward spatially smoothed covariance matrices are given by

$$\bar{R}_f = \frac{1}{L^2}\sum_{i=1}^{L}\sum_{j=1}^{L} w_{ij}\hat{R}_{ij}\hat{R}_{ji} \qquad (8.37)$$

and

$$\bar{R}_{fb} = \frac{1}{2L^2}\sum_{i=1}^{L}\sum_{j=1}^{L} w_{ij}\left(\hat{R}_{ij}\hat{R}_{ji} + \check{R}_{ij}\check{R}_{ji}\right), \qquad (8.38)$$

respectively, where $\hat{R}_{ij}$ is the sample forward cross-covariance matrix between the ith and the jth subarrays, $i, j = 1, 2, \ldots, L$, and $\check{R}_{ij}$ is the associated backward cross-covariance matrix.
Forward-backward smoothing is so named because of the forward and backward linear-prediction process for time series. Because of the assumed temporal stationarity, vectors can be taken with elements ordered either forward or backward in time. Utilizing both improves the estimate variance. We can apply the same concept to equally spaced arrays. The forward-predictor covariance matrix is, as we have been discussing, $\hat{R}_{ii}$ for the ith subarray. The backward-predictor covariance matrix $\check{R}_{ii}$ arises from the subarray displaced one sensor farther in offset from the subarray of $\hat{R}_{ii}$. Due to spatial stationarity, the exact location is not a concern. $\hat{R}_{ii}$ and $\check{R}_{ii}$ have identical expectation if there is an interchange of elements in $\check{R}_{ii}$, mirrored across both diagonals. That is,

$$E\{\hat{R}_{ii}\} = E\{T\check{R}_{ii}T\},$$

where $T$ is the exchange matrix, with ones on the antidiagonal and zeros elsewhere. Thus it is reasonable to estimate $R_{ii}$ with $(\hat{R}_{ii} + T\check{R}_{ii}T)/2$.
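The following NumPy sketch shows the forward-backward idea in its standard form: a covariance estimate is averaged with its exchange-transformed counterpart. For complex data the conventional persymmetric average also conjugates the matrix, i.e., $(\hat{R} + T\hat{R}^{*}T)/2$; that conjugation is an assumption here, since the text writes the transform without it.

```python
import numpy as np

def forward_backward_average(R):
    """Average a covariance estimate with its backward counterpart.

    R : (m, m) sample covariance matrix.
    T is the exchange matrix (ones on the antidiagonal); for complex
    data the backward matrix is conventionally T R* T.
    """
    m = R.shape[0]
    T = np.fliplr(np.eye(m))           # exchange matrix
    return 0.5 * (R + T @ R.conj() @ T)
```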
Generally speaking, the outputs of all subarrays have similar S/N values. Therefore, I am not concerned about the noise-propagation problem introduced by using the crosscorrelations with some random signals of very low S/N values. For this reason, I usually set $w_{ij} = 1$.

There are some redundancies in equations (8.37) and (8.38), since the crosscorrelations have been exploited more than once. An essential consideration in proposing the above equations is to make the new algorithms capable of increasing the rank of the signal covariance matrix after spatial smoothing. It has been proved in Du and Kirlin (1991) that when $w_{ij} = 1$, the numbers of subarrays required by algorithms (8.37) and (8.38) are the same as those required by the conventional algorithms in Shan et al. (1985), Williams et al. (1988), and Pillai and Kwon (1989).
To illustrate how the conventional methods ignore the crosscorrelation matrices, consider a method for finding the forward, spatially smoothed covariance matrix using the covariance matrix of the overall array $R \in \mathbb{C}^{M\times M}$, with M denoting the number of sensors in the overall array. First, we form a band matrix in $R$ from the elements $r_{ij}$ with $i = 1, \ldots, M$ and $|i - j| < m$, where m denotes the number of sensors in each subarray. Then, along the main diagonal of $R$, an $m \times m$ square window slides down to sample this band matrix, yielding L $m \times m$ submatrices, which correspond to the L autocovariance matrices of the $L = M - m + 1$ subarrays. The average of these L autocovariance matrices is the forward, spatially smoothed covariance matrix using the conventional method. This shows that the conventional spatial smoothing methods ignore the information in $R$ which lies outside the band matrix (see Figure 8.1).

Figure 8.1. Conventional spatial smoothing method using autocorrelations of subarrays in the band matrix only. Correlations outside the band matrix, which correspond to the cross-subarray correlations, are used in the suggested smoothing.

8.3.3 Simulations

In the simulation, we use a uniform linear array of 16 sensors; the wavenumber is defined in terms of $\sin\theta$. Three coherent signals with equal powers arrive from bearing angles 10, 14, and 80 degrees, respectively. The S/N, defined as the signal-to-noise power ratio, is 0 dB; the number of snapshots is 64; the size of the subarray is 8; and the number of subarrays is 9. For both the conventional methods and the new methods, we use the forward-backward spatial smoothing scheme to obtain the smoothed covariance matrix and apply the MUSIC algorithm to find the power spectra. Five independent runs using the conventional forward-backward spatial smoothing method are plotted in Figure 8.2. This figure shows that the two signals at the bearing angles of 10 and 14 degrees are not resolved. Using the same sets of data, the results for the improved forward-backward spatial smoothing method are shown in Figure 8.3, where resolution of the two signals at close bearing angles is achieved. This example illustrates that, by using the proposed method, a more stable estimate of the covariance matrix can be obtained, and, based on this estimate, the recently developed high-resolution algorithms will achieve better performance.

8.4 Spatial Smoothing Applied to Hyperbolic Wavefronts for Velocity Estimation

To some extent it is safe to say that velocity analysis is equivalent to DOA estimation and spectral estimation. However, high-resolution spectral estimators have had limited success in this area, since seismic reflections do not satisfy some important assumptions on which these estimators are based. Rather, seismic reflections have nonplanar (hyperbolic) wavefronts and are temporally transient. Therefore, they are neither temporally nor spatially stationary. These features of seismic signals deserve special consideration when applying modern array-processing techniques.

Figure 8.2. Five runs using conventional forward-backward spatial smoothing. Three coherent signals from 10, 14, and 80 degrees; 16 sensors; subarray size = 8; S/N = 0 dB; 64 snapshots.

Figure 8.3. Five runs using the new forward-backward spatial smoothing method. Three coherent signals from 10, 14, and 80 degrees; 16 sensors; subarray size = 8; S/N = 0 dB; 64 snapshots.

The hyperbolic model of reflection time versus offset provides a means for
establishing the necessary velocity relationships. Based on the hyperbolic
model, several methods have been developed for this purpose (Robinson and
Treitel, 1980; Neidell and Taner, 1971). Semblance is perhaps the most
widely used velocity estimation method, as is outlined in the following subsection.

8.4.1 Semblance Review

Further to the discussion of semblance in Section 6.5, I give the following review.
Semblance consists of performing a stack across the common-midpoint (CMP) gather along various hyperbolic trajectories and calculating the reflection coherency. Assume that the velocity analysis is to be carried out with respect to two-way traveltime $t_0$ and that the range of velocity to be covered by the analysis is $v_{min}$ to $v_{max}$. Then velocity analysis by the semblance method is carried out as follows for each $t_0$ in the range considered:

1) Assume an initial stacking velocity $v_1 = v_{min}$ and find the lag corresponding to $t_0$ and $v_1$; then take a time gate of $N + 1$ samples from each trace, symmetrically disposed about the aforementioned hyperbolic trajectory (Figure 8.4).
2) Determine the in-gate sample covariance matrix $R$ using the $N + 1$ sample vectors obtained across the traces.
3) Measure the degree of match (coherency) between the traces at this alignment by the semblance coefficient

$$S_c = \frac{u^T R u}{M\,\mathrm{tr}(R)}, \qquad (8.39)$$

where M is the number of sensors and $u$ is a column vector consisting of all ones.
4) Increment the trial velocity by an appropriate step and calculate the semblance coefficient corresponding to the new hyperbolic trajectory and time window.
5) Repeat step 4 until $v_{max}$ is reached for the current $t_0$.
6) Increment the zero-offset time $t_0$ and repeat steps 1 through 5.
7) Repeat the above process until the appropriate range of time down the data record has been covered.

The derived result is usually a contour plot of $S_c$ on t versus v (traveltime versus velocity) axes. Examples of such plots are given in Figures 8.6–8.8.
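Here is a minimal NumPy sketch of steps 2 and 3: forming the in-gate sample covariance matrix and evaluating equation (8.39). The `gate` array (M traces by N + 1 in-gate samples, already gathered along the trial hyperbola) is an illustrative input; the trajectory extraction itself is omitted.

```python
import numpy as np

def semblance_coefficient(gate):
    """Semblance of an in-gate data window, equation (8.39).

    gate : (M, K) array; row i holds the K in-gate samples of trace i,
           already time-shifted to the trial hyperbolic trajectory.
    """
    M, K = gate.shape
    R = gate @ gate.T / K              # in-gate sample covariance matrix
    u = np.ones(M)                     # stacking (all-ones) vector
    return (u @ R @ u) / (M * np.trace(R))
```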

8.4.2 Semblance and the Conventional Beamformer

In Kirlin (1992), and in Chapter 6 of this work, I relate semblance to high-resolution algorithms such as MUSIC. I will now show that the semblance algorithm is essentially a conventional beamformer with steering implemented in the time domain but with a short time window applied around the trial hyperbolic trajectory (Figure 8.4). First, a brief review of the conventional beamformer.
Figure 8.4. Forming a time gate for semblance analysis.

The conventional beamforming method employs a procedure of delay-and-sum processing to steer a beam in a particular direction. For the given field of view, the beamformer scans all possible angles where sources may be present and calculates the array output power. Thus, for narrowband signals,

$$P_{BF}(\theta) = \frac{a^H(\theta)\,R\,a(\theta)}{a^H(\theta)\,a(\theta)}, \qquad (8.40)$$

where $a(\theta) = [1,\ e^{j\omega_0\Delta\sin\theta/c},\ \ldots,\ e^{j(M-1)\omega_0\Delta\sin\theta/c}]^T$ for a uniform linear array, and $\Delta$, $\theta$, $\omega_0$, and c denote, respectively, the sensor spacing, the steering direction, the center frequency of the signal, and the propagation speed of the wave in the medium. The locations of the peaks of the spectrum are interpreted as the estimates of DOA. We can rewrite the above equation as

$$P_{BF}(\theta) = \frac{u^H R(\theta)u}{M}, \qquad (8.41)$$

where $R(\theta)$ is the steered covariance matrix, defined as in Krolik and Swingler (1989):

$$R(\theta) = T(\theta)\,R\,T^H(\theta), \qquad T(\theta) = \mathrm{diag}\{1,\ e^{j\omega_0\Delta\sin\theta/c},\ \ldots,\ e^{j(M-1)\omega_0\Delta\sin\theta/c}\}.$$
The covariance-matrix steering operation can also be implemented in the time domain by adding appropriate delays to different traces. The time-domain method is particularly useful if the signal is temporally wideband. As a matter of fact, the in-gate covariance matrix used by the semblance method is essentially the covariance matrix steered according to a hyperbolic trajectory and estimated using the data in the time gate. We then find that, at each time step, the semblance velocity algorithm is actually a normalized (by the trace of $R$) conventional beamformer, so that the maximum output is unity. However, it differs from the standard beamformer in several ways. First, the wavefront of seismic events has a hyperbolic curvature. Only those wavefronts that match the trial velocity possess the spatial shift invariance we need for spatial smoothing. This is actually beneficial. Secondly, each reflection lasts only a very short time, and thus seismic events are transient and wideband (see Kirlin, 1991, for implications of violating these usual beamforming assumptions). The time gate formed in the semblance algorithm accommodates the nonstationarity of seismic events such that unrelated data samples are excluded.

8.4.3 The Optimal Velocity Estimator with Spatial Smoothing

Recently, several statistically optimal beamformers have been developed (VanVeen and Buckley, 1988). Applying these optimal beamformers to the velocity-estimation problem will yield velocity-estimation algorithms that are superior to semblance in the sense of high discrimination power. However, we note some special considerations.

The duration of the steering time gate is usually very short. A typical time gate used in velocity analysis is about 48 ms. Thus only about 25 snapshots are available to estimate the steered, or in-gate, sample covariance matrix if the sampling period is 2 ms. The number of sensors, however, is usually much larger than this number. In order to have a relatively stable estimate of a covariance matrix of size M, the number of snapshots should be at least five to ten times larger than the matrix size. We cannot obtain more time samples, but we can utilize spatial smoothing techniques to solve this problem. Although spatial smoothing originally was used to overcome the multipath problem in DOA estimation, it can also be used to improve the stability of the estimate of the covariance matrix by trading off (decreasing) the spatial aperture.
Two conditions accompany spatial smoothing techniques: first, the signals must have plane wavefronts, and second, the subarrays must have the same geometry. These two conditions ensure that the signal subspace for each subarray is identical. If the seismic reflection is perfectly matched by a hyperbolic trajectory, it does have a plane wavefront in this short time gate. Furthermore, if we partition a linear array using the method in Shan et al. (1985), the ideal signal subspace associated with these subarrays is identical for a reflection which perfectly matches the trial hyperbolic curvature. On the other hand, if the seismic reflection is not matched, both the signal and noise components of the subarray covariance matrices are combined incoherently; the resulting covariance matrix is more or less like a noise-only covariance matrix, which results in low array output power.

The above analysis demonstrates the feasibility of applying the spatial smoothing technique to the velocity-estimation problem. Next we will see the positive effects of smoothing.
8.4.3.1 Enhancement of the Estimates of Covariance Matrices

Assume the time gate is formed at zero-offset time $t_0$ and steering velocity $v_s$. Let $I_{m,M}(i)$ be defined by

$$I_{m,M}(i) = [\,0_{m\times i},\ I_m,\ 0_{m\times(M-m-i)}\,].$$

The spatially smoothed $m \times m$ covariance matrix can be written

$$R(t_0, v_s) = \frac{1}{M-m+1}\sum_{i=0}^{M-m} I_{m,M}(i)\,R_M(t_0, v_s)\,I_{m,M}^T(i), \qquad (8.42)$$

where $R_M(t_0, v_s)$ is the $M \times M$ in-gate covariance matrix, with M denoting the number of sensors, and m is the size of the subarray and of $R(t_0, v_s)$.

The enhanced covariance matrix is given by rewriting Equation (8.36):

$$\hat{R}^2(t_0, v_s) = \frac{1}{M-m+1}\sum_{i=0}^{M-m} I_{m,M}(i)\,R_M^2(t_0, v_s)\,I_{m,M}^T(i). \qquad (8.43)$$

The enhanced covariance matrix exploits crosscorrelations between subarrays. The spatially smoothed covariance matrix or the enhanced, spatially smoothed covariance matrix can be directly utilized in the optimal beamformers.


8.4.3.2 The New Velocity Estimator

The linearly constrained minimum-variance (LCMV) beamformer is a very general approach, applicable in many situations. We will apply the LCMV method to the velocity-estimation problem using $R(t_0, v_s)$ in Equation (8.42) or $\hat{R}^2(t_0, v_s)$ in Equation (8.43). For the semblance method, the original covariance matrix $R_M(t_0, v_s)$ is used.

The idea behind the velocity-estimation procedure using the LCMV method is essentially a hypothesis-test process; i.e., we assume that the desired signal has the same velocity as each trial velocity $v_s$ at the given two-way traveltime $t_0$. In other words, the desired signal is assumed to be accurately flattened in the time gate. The constraint that allows the desired signal to pass without attenuation can be translated into

$$u^H w = 1, \qquad (8.44)$$

where $w$ is the weight vector of the beamformer. If this is the only constraint used, we obtain the optimal weights

$$w = \frac{R^{-1}(t_0, v_s)\,u}{u^H R^{-1}(t_0, v_s)\,u}, \qquad (8.45)$$

and the array output power is then

$$P(t_0, v_s) = \frac{1}{u^H R^{-1}(t_0, v_s)\,u}. \qquad (8.46)$$

$R(t_0, v_s)$ in the above two equations can be replaced with $\hat{R}^2(t_0, v_s)$. We use the array output power to indicate the degree of match between the data and the pre-assumed signal, or the confidence level of the hypothesis test. If information regarding interferences is available (e.g., through interactive seismic-processing software, strong interferences are visible and can be approximately measured), more constraints can readily be added into the constraint matrix such that the new velocity estimator has zero response to these known interferences. This feature makes the new estimator especially suitable for interactive seismic-processing software packages.
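As a sketch, the following NumPy fragment evaluates the LCMV output power of equation (8.46) from a spatially smoothed (or enhanced) in-gate covariance matrix, falling back to the pseudo-inverse when the matrix is singular, as noted in Section 8.4.4; the function and argument names are illustrative.

```python
import numpy as np

def lcmv_velocity_power(R):
    """LCMV array output power, equation (8.46).

    R : (m, m) spatially smoothed (or enhanced, R @ R) in-gate
        covariance matrix for one trial (t0, vs) pair.
    """
    m = R.shape[0]
    u = np.ones(m)
    try:
        Rinv_u = np.linalg.solve(R, u)
    except np.linalg.LinAlgError:      # singular R: use the pseudo-inverse
        Rinv_u = np.linalg.pinv(R) @ u
    return 1.0 / (u @ Rinv_u).real
```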


8.4.4 Comparison of Coherency-Measure Threshold Discrimination

As Neidell and Taner (1971) argued, noise present on the data channels affects coherency measures primarily through the apparent amplitude and shape diversity it creates. The precise character of the effects depends on the noise statistics and the signal-noise interactions. Therefore, it is reasonable to perform an experiment on noise-free data to establish, in some sense, the discrimination or resolution of semblance and the optimal velocity estimator. Figure 8.5 depicts a synthetic CMP gather. These data are specially designed to test the resolving power and discrimination threshold of candidate coherency measures. Each region indicated in the data contains a doublet; the pulses are separated by 20 ms in two-way time, and the rms velocities differ by 200 ft/s (60 m/s). All events are Ricker wavelets with dominant frequency of 40 Hz. Since the objective of this computation is a resolution test rather than the modeling of physical reality, the time separation and velocity increments for the doublets have been chosen so that the trajectories tend to cross. The sensor array used is a uniform linear array of 32 sensors with sensor spacing of 200 ft (60 m). The data are recorded from 0.8 s to 2.8 s, and the sampling period is 2 ms. All calculations use a 10-ms time step and a 48-ms time gate, but different coherency measures. Contoured results will be presented for final comparisons. Table 8.1 summarizes the data characteristics. Figures 8.6 and 8.7 show the velocity spectra computed by the semblance method and by the new method with conventional spatial smoothing. The final size of the covariance matrix for the new method is 20. Since this is a noise-free case, the improvement of the spatial smoothing is not necessary. In each contour plot the coherency measure has been normalized to have unity peaks. Ideally the contour centers should be located at the correct events, parameterized by the two-way traveltime $t_0$ and the velocity v. The conventional semblance method does not have clear contours at the correct locations, but the new method does. Thus we conclude that the new method has better resolution and parameter-identification properties than semblance.
Table 8.1. Parameters for data in Figure 8.5.

Event no.    t0 (10^2 ms)    vs (10^3 ft/s)
    1            9.0              8.0
    2            9.2              8.2
    3           15.0              9.0
    4           15.2              9.2
    5           22.0             10.0
    6           22.2             10.2

Figure 8.5. Noise-free synthetic CMP gather for discrimination-threshold test.

Figure 8.6. Contour plot of the velocity spectrum using conventional semblance (clean data).

Figure 8.7. Contour plot of the velocity spectrum using the new velocity estimator with conventional spatial smoothing (clean data).

It has been noted that semblance is a normalized coherency measure, while the new estimator can produce a coherence measure of positive infinity. In order to dispel any doubt that the performance improvement of the new method is merely obtained by changing the range of the coherence-measure values, we modify the semblance coefficient by

$$S_c^m = \frac{1}{1 - S_c}, \qquad (8.47)$$

such that the modified semblance $S_c^m$ has the same range as the new estimator. The contour plot for the modified semblance is given in Figure 8.8. This figure shows that either the resolution of the three doublets is not achieved or the estimates are very much biased. This example shows that the performance of the new method is credible.
Figure 8.8. Contour plot of the velocity spectrum using the modified semblance (clean data) [equation (8.47)].

Since the new velocity estimator involves a matrix-inversion operation, it requires more computation than semblance. When the covariance matrix is singular or nearly singular, the pseudo-inverse is used in place of the inverse operation.

8.4.5 Discussion

We have applied spatial smoothing to rms-velocity estimation. Conventional semblance was found to be a scaled conventional beamformer. We proposed an optimal velocity estimator based on LCMV beamformers, using spatial smoothing and enhanced spatial smoothing to improve the estimate of the steered covariance matrix. Comparing conventional semblance and the optimal velocity estimator shows that the latter performs much better than conventional semblance in discriminating between close events.

8.5 Toeplitz and Positive Definite Constraints for Covariance Enhancement

Cadzow (1988a) showed that certain mappings of properties of covariance matrices are closed and can be used to enhance features of interest. Of particular interest is an alternating approximation to a rank-p and positive definite matrix, beginning with the sample covariance matrix $C_x$ of interest.
First, to satisfy the rank-p requirement, use the mapping

$$C_x^{(p)} = \sum_{k=1}^{p} \sigma_k u_k v_k^H = F_{(p)}\,C_x, \qquad (8.48)$$

where the $\sigma_k$ are the p largest (assumed distinct) singular values of $C_x$, and $u_k$ and $v_k$ are the corresponding left and right singular vectors (see Chapter 3). This mapping yields the rank-p matrix closest to $C_x$ in the Frobenius-norm sense [the sum of squares of differences of $C_x(i, k)$ and $C_x^{(p)}(i, k)$ for all i, k].
Similarly, the $n \times n$ Hermitian matrix $C_x^{+}$ which is positive definite and closest to the $n \times n$ Hermitian matrix $C_x$ is given by

$$C_x^{+} = \sum_{k=1}^{p} \lambda_k u_k u_k^H = F_{+}\,C_x, \qquad (8.49)$$

where p is here the number of positive eigenvalues of $C_x$ (it is assumed that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$).
Next, suppose we wish to find the Hermitian-Toeplitz matrix closest to $C_x$. Then we find the matrix $A$ which would transform the necessary unique number of elements into a vectorized (complex) Toeplitz form. For example,

$$C_x^{(T)} = \begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 \\ \alpha_2^* & \alpha_1 & \alpha_2 \\ \alpha_3^* & \alpha_2^* & \alpha_1 \end{bmatrix}, \qquad x^{(T)} = \mathrm{vec}\{C_x^{(T)}\} = A\,[\alpha_1,\ \alpha_{2R},\ \alpha_{2I},\ \alpha_{3R},\ \alpha_{3I}]^T, \qquad (8.50)$$

where $\alpha_{kR}$ and $\alpha_{kI}$ denote the real and imaginary parts of $\alpha_k$, and, for row-by-row vectorization,

$$A = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & j & 0 & 0 \\
0 & 0 & 0 & 1 & j \\
0 & 1 & -j & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & j & 0 & 0 \\
0 & 0 & 0 & 1 & -j \\
0 & 1 & -j & 0 & 0 \\
1 & 0 & 0 & 0 & 0
\end{bmatrix}. \qquad (8.51)$$

Thus, the vectorized $\hat{C}_x^{(T)}$ is

$$\hat{x}^{(T)} = A(A^H A)^{-1}A^H x = F_{(T)}\,C_x. \qquad (8.52)$$

Lastly, we often wish to find the matrix $C_x(p)$ which is closest to $C_x$, has rank p, and has its $n - p$ smallest eigenvalues equal. This was shown in Section 4.2.1 to be given by

$$C_x(p) = \sum_{k=1}^{p} \lambda_k u_k u_k^H + \sigma_n^2\sum_{k=p+1}^{n} u_k u_k^H = F_{(p)}\,C_x, \qquad (8.53)$$

where

$$\sigma_n^2 = \frac{1}{n-p}\sum_{k=p+1}^{n} \lambda_k, \qquad (8.54)$$

and $\lambda_k$ and $u_k$ denote the eigenstructure of $C_x$.


The concluding requirement in the application of any of the above transformations is that they be used in sequence and the result iterated. Thus the kth iterated approximation to a rank-p, Toeplitz-Hermitian, positive definite matrix is given by

$$\hat{C}_x(k) = F_{(T)}F_{(p)}\,\hat{C}_x(k-1), \qquad (8.55)$$

where we assume in $F_{(p)}$ that only p positive eigenvalues are used, and in $F_{(T)}$ that Hermitian-Toeplitz form is produced.

Some spectral-estimation and bearing-estimation results of the iterative method are given in Cadzow (1988a). It is not claimed that the iteration converges to a maximum-likelihood solution; in fact, it generally would not. Another difficulty is in the selection of the rank p. If p is chosen too large, the method will faithfully give a rank-p solution when perhaps only p - 1 solutions are true. As in most seismic-processing applications, good-sense interpretation is a requirement. However, we note that when p is known or correctly chosen, just a few iterations give enhanced results.
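The following NumPy sketch iterates the two mappings of Equation (8.55): a rank-p truncation keeping only positive eigenvalues, followed by projection onto Hermitian-Toeplitz form. Averaging along each diagonal gives the closest Toeplitz matrix in the Frobenius norm and plays the role of $F_{(T)}$ here; that diagonal-averaging shortcut (equivalent to the least-squares fit with $A$) is my implementation choice, not a construction from the text.

```python
import numpy as np

def rank_p_psd(C, p):
    """F_(p): keep the p largest positive eigenvalues of Hermitian C."""
    lam, U = np.linalg.eigh(C)              # ascending eigenvalues
    lam = lam[::-1][:p].clip(min=0.0)       # p largest, forced nonnegative
    U = U[:, ::-1][:, :p]
    return (U * lam) @ U.conj().T

def toeplitz_hermitian(C):
    """F_(T): closest Hermitian-Toeplitz matrix (average each diagonal)."""
    n = C.shape[0]
    T = np.zeros_like(C)
    for k in range(n):                      # lag-k diagonal
        a = np.mean(np.diagonal(C, offset=k))
        T += np.diag(np.full(n - k, a), k)
        if k:
            T += np.diag(np.full(n - k, np.conj(a)), -k)
    return T

def composite_mapping(C, p, iterations=5):
    """Iterate Equation (8.55) starting from a sample covariance C."""
    for _ in range(iterations):
        C = toeplitz_hermitian(rank_p_psd(C, p))
    return C
```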

8.6 References

Cadzow, J. A., 1988a, Signal enhancement–A composite property mapping algorithm: IEEE Trans. Acous., Speech and Sig. Proc., 36, 49–62.

Cadzow, J. A., 1988b, A high resolution direction-of-arrival algorithm for narrowband coherent and incoherent sources: IEEE Trans. Acous., Speech and Sig. Proc., 36, 965–979.

Du, W., 1992, High resolution algorithms for spectral analysis and array processing: Ph.D. dissertation, University of Victoria.

Du, W., and Kirlin, R. L., 1991, Improved spatial smoothing techniques for DOA estimation of coherent signals: IEEE Trans. Acous., Speech and Sig. Proc., 39, 1208–1210.

Eaton, M. L., 1983, Multivariate statistics: John Wiley & Sons, Inc.

Kirlin, R. L., 1991, A note on the effects of narrowband and stationary signal model assumptions on the covariance matrix of sensor array data vectors: IEEE Trans. Signal Processing, 39, 503–506.

Kirlin, R. L., 1992, The relationship between semblance and eigenstructure velocity estimators: Geophysics, 57, 1027–1033.

Kirlin, R. L., and Du, W., 1991, Improvements on the estimate of covariance matrices by incorporating cross-correlations: Proc. IEE-F, Radar and Signal Proc., 138, 479–482.

Krolik, J., and Swingler, D., 1989, Multiple broad-band source location using steered covariance matrices: IEEE Trans. Acous., Speech and Sig. Proc., 37, 1481–1494.

Neidell, N. S., and Taner, M. T., 1971, Semblance and other coherence measures for multichannel data: Geophysics, 36, 482–497.

Pillai, S. U., and Kwon, B. H., 1989, Forward-backward spatial smoothing techniques for coherent signal identification: IEEE Trans. Acous., Speech and Sig. Proc., 37, 8–15.

Robinson, E. A., and Treitel, S., 1980, Geophysical signal analysis: Prentice-Hall, Inc.

Shan, T. J., Wax, M., and Kailath, T., 1985, On spatial smoothing for direction-of-arrival estimation of coherent signals: IEEE Trans. Acous., Speech and Sig. Proc., 33, 806–811.

Tufts, D. W., Parthasarathy, S., and Kumaresan, R., 1991, Effect of predictor order on the accuracy of frequency estimates, in Haykin, S., Ed., Advances in spectrum analysis and array processing, 1: Prentice-Hall, Inc., 114–140.

VanVeen, B. D., and Buckley, K. M., 1988, Beamforming: A versatile approach to spatial filtering: IEEE Acous., Speech and Sig. Proc. Magazine, 4–24.

Wang, H., and Kaveh, M., 1985, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources: IEEE Trans. Acous., Speech and Sig. Proc., 33, 823–831.

Williams, R. T., Prasad, S., Mahalanabis, A. K., and Sibul, L. H., 1988, An improved spatial smoothing technique for bearing estimation in a multipath environment: IEEE Trans. Acous., Speech and Sig. Proc., 36, 425–432.


Chapter 9
Waveform Reconstruction and Elimination of Multiples and Other Interferences
R. Lynn Kirlin
Most chapters in this book deal with methods that enhance either the estimation of wavefront phase velocities (directions of arrival) or the covariance matrices from which those and other parameters are inferred. Occasionally the exact waveform from a single source or reflector is the desired result. This brings a few problems into play:

1) Multiple removal, where a reflection from a single boundary appears to have more than one arrival because it has been caught in a waveguide layer and transmits part of its energy to the surface with each cycle of reflection within that layer;
2) Secondary-interference removal, where uncontrolled sources outside of the central seismic experiment cause wavefronts to be superimposed on the desired reflections;
3) Thin-bed resolution, where features of distinct reflections from the top and bottom of the bed are to be individually analyzed but the waveforms overlap considerably in the profile; and
4) Separation of up- and downgoing waves in vertical seismic profiling.

Many techniques have been developed to deal with each of these problems. Generally the idea is to filter out the unwanted waveforms. Often this is done following transformation of the data within the processing window into another domain where wavefronts at different velocities may be more easily distinguished. Those deemed to be interference are nulled out with the appropriate two-dimensional filter, and the remaining data are inverse transformed. In almost all such schemes, whole bands of f-k or $\tau$-p data are passed or stopped.


The high-resolution methods described in this book are based on the assumption that wavefronts in the processing window are finite in number. Each is considered to have its own instantaneous power and direction of arrival or velocity. Typically, only one of the wavefronts is considered to be the signal, and all others are deemed interference. However, it is possible to have more than one assumed signal, e.g., with reflections from thin beds, when separating up- and downgoing waves from all others, or when leaving all others while removing a multiple. The signal-subspace techniques presented in earlier chapters of this book allow each signal wavefront to be selected and recreated individually, whether or not its velocities fall in a continuous passband.

Next, I will show how such signals can be reconstructed and then qualitatively compare the computations and results with those of currently accepted methods, including f-k filtering, Hampson's method, and the Radon transform.

9.1 Signal-Plus-Interference Subspace

We begin with the data model of equation (4.24), using time index i,

$$x(i) = As(i) + n(i), \qquad (9.1)$$

where $A$ is an $M \times r$ matrix, the columns of which are the delay vectors, one for each of the total of r source wavefronts. Vector $s(i)$ contains the r source signals (waveforms), some of which we would like to recover. Lastly, $n(i)$ is a vector of additive, independent, white Gaussian noise with variance $\sigma_n^2$ at each sensor. The r sources may include multipaths; in fact, all seismic reflections from the same source are quite coherent, so the elements of the source-plus-interference (SPI) covariance matrix $P_{SI}$ indicate such coherence. We assume zero-mean signals and interferences, so

$$P_{SI} = \mathrm{cov}(s) = E\{ss^H\}, \qquad (9.2)$$

and the data covariance matrix is

$$R_x = AP_{SI}A^H + \sigma_n^2 I \qquad (9.3a)$$
$$\;\;\;= V_{SI}\Lambda_{SI}V_{SI}^H. \qquad (9.3b)$$

Following the concepts of Section 4.2, we may assume that the eigenstructure of $R_x$ allows separation into signal and noise subspaces, giving for the signal-plus-interference eigenvectors the r columns of $V_{SI}$ and for the associated eigenvalues the r largest diagonal elements of $\Lambda_{SI}$. When the sources are highly coherent, not only in frequency but also in space and time, we presume that either spatial smoothing, frequency focusing, or some other process has been applied to cause the transformed SPI-subspace covariance matrix to have full rank (r). See, for example, Shan and Kailath (1985). (Note that for curved wavefronts and partially nonoverlapping reflections, total coherence is not a concern; this is the case in common-midpoint gathers, other prestack data, and actually most typical situations.) For further details on dealing with coherent sources, see Chapter 6.
Suppose then that we have identified the SPI subspace and that, by some high-resolution means, such as those in Chapter 5, we have found the r solution estimates for the velocities of the r wavefronts; that is, we have an estimate $\hat{A}$ of $A$ in equation (9.1). From equation (4.10) we have that the estimate of the SPI covariance matrix is

$$\hat{P}_{SI} = (\hat{A}^H\hat{A})^{-1}\hat{A}^H\,V_{SI}\Lambda_{SI}V_{SI}^H\,\hat{A}(\hat{A}^H\hat{A})^{-1}. \qquad (9.4)$$

However, we are actually interested in reproducing (within the processing window) the waveforms of the signals, the number of which we will say is $r_S < r$. We need only designate which $r_S$ of the r SPI solution velocities are the signals. We assume that the r solutions are now ordered so that the first $r_S$ in the order are signal delay vectors. Thus we partition the matrix of delay vectors:

$$A = (A_S : A_I), \qquad (9.5)$$

where $A_S$ is $M \times r_S$ and $A_I$ is $M \times (r - r_S)$.


It is evident that the vectors which are the columns of $A_S$ span the signal subspace, and the columns of $A_I$ span the interference subspace. These two subspaces almost certainly overlap (are not orthogonal). The remaining $M - r$ dimensions define the noise subspace, wherein all vectors are orthogonal to the SPI subspace.

9.2 Conventional Waveform Estimators

There are a number of algorithms that can reproduce estimates of any of the waveforms, both those assumed to be signal and those assumed to be interference (Scharf, 1991, Ch. 9; Haimovich and Bar-Ness, 1991). Often both signal and interference are lumped together and solved for simultaneously. For example, an unconstrained least-squares solution of equation (9.1) gives

$$\hat{s}_{ULS}(i) = (A^H A)^{-1}A^H x(i), \qquad (9.6)$$

where the r elements of $\hat{s}_{ULS}(i)$ are the estimates of the r waveforms at time sample i.
The next most common solution may be the Wiener solution, where we utilize knowledge of exact or approximated data and signal covariance matrices. This estimate is the minimum mean-squared-error solution

$$\hat{s}_W(i) = P_{SI}A^H R_x^{-1}x(i). \qquad (9.7)$$

The effect of utilizing information regarding the independent noise and the covariance of the signals (all assumed stationary) is evident with the appearance of $P_{SI}$ and $R_x^{-1}$ in the solution, as opposed to equation (9.6). For a scalar measurement and a single signal this gives the familiar $\sigma_s^2/(\sigma_s^2 + \sigma_n^2)$ factor.
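For concreteness, here is a NumPy sketch of the two estimators, equations (9.6) and (9.7), applied to a block of snapshots; the delay matrix `A`, signal covariance `P_SI`, and noise variance are treated as known, illustrative inputs.

```python
import numpy as np

def uls_waveforms(A, X):
    """Unconstrained least squares, equation (9.6), for all snapshots.

    A : (M, r) delay-vector matrix;  X : (M, N) data snapshots.
    """
    return np.linalg.solve(A.conj().T @ A, A.conj().T @ X)

def wiener_waveforms(A, X, P_SI, sigma_n2):
    """Wiener (minimum mean-squared-error) solution, equation (9.7)."""
    M = A.shape[0]
    Rx = A @ P_SI @ A.conj().T + sigma_n2 * np.eye(M)   # equation (9.3a)
    return P_SI @ A.conj().T @ np.linalg.solve(Rx, X)
```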

9.3 Subspace Estimators

Even the Wiener solution does not utilize all possible a priori information; further, there are other optimality criteria of interest. Because we know that a wavefront is planar (or hyperbolic), estimation of the appropriate parameters allows us to make a better estimate of the signal covariance matrix needed in equation (9.7). In fact, in blind processing we have no a priori estimate of $P_{SI}$. $R_x$ can of course be estimated with the data in the processing window. Further, without an examination of eigenvalues or singular values of the data, we have no good guess as to the number of wavefronts present. However, it is sometimes assumed, even realistically, that there is only one strong reflection in the window.
It has been shown, for example (Ottersten et al., 1989), that when the sample size is not large, use of the low-rank (r) $\hat{P}_{SI}$ from equation (9.4) in

$$\hat{R}_x = \hat{A}\hat{P}_{SI}\hat{A}^H + \hat{\sigma}_n^2 I \qquad (9.8)$$

gives much lower mean-squared error in $\hat{R}_x$ than does the raw, unconstrained, or unstructured covariance estimate of $R_x$.
The mean-squared errors (mse) for both the raw $\hat{R}_x$ and the subspace estimate approach the same asymptotic limit,

$$\mathrm{mse}_{SI} = \sigma_n^2\,\mathrm{Tr}\{P_{SI}^{1/2}A^{+}R_x^{-1}A^{+H}P_{SI}^{1/2}\}, \qquad (9.9)$$

where $A^{+} = (A^H A)^{-1}A^H$ is the pseudo-inverse. Further, the mse without any estimate of the signal covariance [equation (9.6)] is

$$\mathrm{mse}_{ULS} = \sigma_n^2\,\mathrm{Tr}\{(P_{SI}A^H A)^{-1}\}. \qquad (9.10)$$

The asymptotic improvement using the low-rank $\hat{P}_{SI}$ is shown to be about an order of magnitude for a two-source uncorrelated problem when S/N = 0 dB. When the S/N is higher (10 dB or above), the asymptotic mse's are identical, but either the subspace or the ULS estimator converges much faster than using equation (9.7) without a structured $R_x$.

9.4 Interference Canceling

In the preceding, we assumed that all sources, both signals and interferences, were to be estimated jointly. However, the purpose of this chapter is to optimally enhance just one of a number of signals in the presence of the others. In seismic processing, only one wavefront for any one two-way traveltime can be considered signal, although multipath energies ideally might be combined for maximum S/N recovery. In the following, I will consider that only one of the r sources is desired and the others are interferences; thus $r_S = 1$ and $r_I = r - 1$.
In many cases, we not only select which wavefront is considered signal but also resample the data, interpolating after time-shifting it so that the desired (or trial) wavefront appears to arrive simultaneously at all sensors. I have related several advantages of time-shifting in Chapter 6, and in Section 6.9 I mentioned the eigenstructure-based interference canceler of Haimovich and Bar-Ness (1991). Now, here are a few more details from that presentation.

After signal-blocking the preflattened data $x$ with the $(M-1) \times M$ matrix

$$C = \begin{bmatrix}
1 & -1 & 0 & \cdots & 0 \\
0 & 1 & -1 & \cdots & 0 \\
 & & \ddots & \ddots & \\
0 & \cdots & & 1 & -1
\end{bmatrix}, \qquad (9.11)$$

the auxiliary data vector of length $M - 1$,

$$x_a = Cx = CAs + Cn, \qquad (9.12)$$

contains no signal. It does, however, contain the $r - 1$ interfering wavefronts and transformed (now nonwhite) Gaussian noise $Cn$, assuming the original noise vector is $n$.
The $(M-1) \times (M-1)$ correlation matrix of the auxiliary data is

$$R_a = CA_I P_I A_I^H C^H + \sigma_n^2 CC^H, \qquad (9.13)$$

where $P_I$ is the covariance matrix of the interferences, which we assume are less than perfectly coherent.
We wish to design a weight vector $w$ that will multiply $x_a$ so that the removal of $w^H x_a(i)$ from the stack, $1^T x(i)$, leaves an optimal estimate of the scalar signal s(i). That is, suppressing the time index i,

$$\hat{s} = 1^T x - w^H x_a = s_{ref} - w^H x_a, \qquad (9.14)$$

where $s_{ref} = 1^T x$ is the raw stack.

Depending on which optimization is chosen, the weight vector will change. Three optimizations are defined.

1) Minimum mean-squared error:

$$\epsilon = E\{|s_{ref} - w^H x_a - s|^2\}. \qquad (9.15)$$

2) Maximum output S/N:

$$S/N = \frac{w^H 1\,E\{ss^*\}\,1^T w}{w^H R_a w} = \frac{\sigma_s^2\,|w^H 1|^2}{w^H R_a w}. \qquad (9.16)$$

3) Minimum output interference-plus-noise (IPN), constrained for unity array gain and assuming a flattened signal:

$$\min_{w}\ w^H R_a w \quad \text{subject to} \quad w^H 1 = 1. \qquad (9.17)$$
In all three cases it is required that

$$w^H CC^H E_I = 0, \qquad (9.18)$$

which says that the weight vector is orthogonal to the transformed (by $C$) interference-subspace eigenvectors, the columns of $E_I$, where $E_I$ is defined by the generalized eigenstructure of the matrix $R_a$,

$$R_a e = \lambda CC^H e. \qquad (9.19)$$

In equation (9.19), $\lambda$ is a generalized eigenvalue and $e$ is the associated generalized eigenvector (g-eigenvalues and g-eigenvectors). Haimovich and Bar-Ness (1991) state that $R_a$ [which is $(M-1) \times (M-1)$] will have $r_I$ g-eigenvalues larger than $\sigma_n^2$ and $M - r_I - 1$ g-eigenvalues equal to $\sigma_n^2$. The $M - r_I - 1$ associated noise-subspace eigenvectors, the columns of $E_n$, are orthogonal to the interference subspace, which is spanned by the transformed g-eigenvectors $CC^H E_I$. Thus a generalized eigendecomposition is required for the three optimizations, equations (9.15)-(9.17).

Next we perform an SVD on $CC^H E_I$ and denote by $Q_I$ the matrix whose columns are the $r_I$ singular vectors associated with the $r_I$ nonzero singular values, and by $Q_n$ the matrix whose columns are the $M - r_I - 1$ singular vectors associated with the singular values which equal zero. That is,

$$CC^H E_I = Q\Sigma V^H = (Q_I\ Q_n)\begin{bmatrix}\Sigma_I & O \\ O & O\end{bmatrix}V^H. \qquad (9.20)$$

Having obtained the analyses of equations (9.19) and (9.20), the solutions for the three optimized weight vectors are:

1) Minimum mean-squared error,

$$w_{mmse} = \frac{\sigma_s^2\,Q_n Q_n^H 1}{\sigma_s^2\,\|Q_n^H 1\|^2 + \sigma_n^2}; \qquad (9.21)$$

2) Maximum S/N,

$$w_{SNR} = \frac{Q_n Q_n^H 1}{\|Q_n^H 1\|}; \qquad (9.22)$$

3) Constrained minimum variance,

$$w_{MV} = \frac{Q_n Q_n^H 1}{\|Q_n^H 1\|^2}. \qquad (9.23)$$

Note that all three solutions differ only by a scale factor.


Discussion in the referenced literature mentions that although ideal (total or maximum) cancelation of interferences ($w_{MV}$) may be desirable in some cases, output S/N is actually decreased because sidelobe peaks are higher. Our preference is to use either the maximum-S/N or the mmse solution.
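Here is a sketch of the weight computation following equations (9.19)-(9.22): a generalized eigendecomposition of $(R_a, CC^H)$, an SVD of the transformed interference eigenvectors, and the maximum-S/N weight up to its scale factor. The use of `scipy.linalg.eigh` for the generalized problem and the assumption that the interference count `r_I` is known are my choices for this example.

```python
import numpy as np
from scipy.linalg import eigh

def max_snr_canceler_weights(Ra, C, r_I):
    """Interference-canceler weights, equations (9.19)-(9.22), up to scale.

    Ra  : (M-1, M-1) auxiliary-data correlation matrix.
    C   : (M-1, M) signal-blocking matrix of equation (9.11).
    r_I : number of interfering wavefronts.
    """
    B = C @ C.conj().T
    lam, E = eigh(Ra, B)               # generalized eigenpairs, ascending
    E_I = E[:, -r_I:]                  # g-eigenvectors of the r_I largest
    Q, s, _ = np.linalg.svd(B @ E_I)   # equation (9.20)
    Qn = Q[:, r_I:]                    # singular vectors of zero values
    one = np.ones(Ra.shape[0])
    w = Qn @ (Qn.conj().T @ one)       # equation (9.22), up to scale
    return w / np.linalg.norm(Qn.conj().T @ one)
```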

As with the subspace methods of Section 9.3, the exploited a priori knowledge of the structure of the signal and interference wavefronts gives a much faster approach to the asymptotic optimum results than for the conventional methods that use only raw estimates of covariance matrices and no subspace analyses. For the short time records of a seismic reflection this is a significant advantage.

9.5 Hampson's Multiple Elimination Method

In Hampson (1986), the author builds on earlier papers by Thorson (1984) and Thorson and Claerbout (1985), wherein the seismic profile data are modeled as a linear combination of simple hyperbolic events of constant amplitude. Hampson approximates the events, following flattening with NMO, as parabolas, thereby allowing a Fourier transform that reduces the computations to feasibility. The equation given by Thorson,

$$d(h, t) = \int\!\!\int dp\,d\tau\,U(p, \tau)\,\delta\!\left(\tau - \sqrt{t^2 - p^2 h^2}\right) + n(h, t), \qquad (9.24)$$

becomes, after the parabolic approximation,

$$d(h, t) = \int\!\!\int dp\,d\tau\,U(p, \tau)\,\delta\!\left(\tau - (t - ph^2)\right) + n(h, t), \qquad (9.25)$$

wherein

d(h, t) is the measured seismogram at offset h and two-way time t,
U(p, $\tau$) is the hyperbolic-transform coefficient at slowness p and zero-offset time $\tau$, and
n(h, t) is the measurement noise at offset h and two-way time t.
The $\delta$-function defines a hyperbolic or parabolic edge in the ($\tau$, p) plane. After performing a discrete Fourier transform (DFT) on U(p, $\tau$) from the $\tau$-domain to $\omega$, and discretizing p to $N_p$ values, equation (9.25) transforms to

$$d(h, \omega) = \sum_{p} U(p, \omega)\,e^{-j\omega p h^2} + n(h, \omega), \qquad (9.26)$$

where

$$U(p, \omega) = \sum_{\tau} U(p, \tau)\,e^{-j\omega\tau}. \qquad (9.27)$$

The inverse DFT for discretized $\omega$ is

$$d(h, t) = \sum_{\omega} d(h, \omega)\,e^{j\omega t}. \qquad (9.28)$$

By also discretizing the offset h to $N_h$ values, equation (9.26) can be vectorized as

$$d = LU + n, \qquad (9.29)$$

where $L$ is $N_h \times N_p$ with elements

$$L(i, k) = e^{-j\omega p_k h_i^2}, \qquad 1 \le k \le N_p,\ 1 \le i \le N_h, \qquad (9.30)$$

$d$ is an $N_h$-element column vector with elements $d(h_i, \omega)$, and $U$ is an $N_p$-element column vector with elements $U_k = U(p_k, \omega)$. The unconstrained least-squares solution for the unknown $U$ is

$$\hat{U} = (L^H L)^{-1}L^H d. \qquad (9.31)$$

By finding $\hat{U}$ from equation (9.31), the estimate $\hat{d}(h, \omega)$ can be determined by equation (9.26) for each $\omega$, and then $\hat{d}(h, t)$ found using equation (9.28).

The 2-D plot of U(p, $\tau$) as determined by equation (9.28) can be used to filter slowness by selecting acceptable values. Hampson uses high-passed p-values (recall that the data were originally flattened) to estimate multiples and then subtracts their reconstruction of $d$ from the original to get estimates of primaries, which when flattened had nearly zero moveout.
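The following NumPy sketch implements the frequency-domain least-squares step of equations (9.29)-(9.31) for one temporal frequency. The small damping term added to $L^H L$ is my own stabilizer for poorly conditioned cases (Hampson's formulation is the plain least squares), and the input names are illustrative.

```python
import numpy as np

def parabolic_radon_lstsq(d_w, offsets, slownesses, omega, damping=1e-3):
    """Least-squares parabolic Radon coefficients at one frequency.

    d_w        : (Nh,) complex data samples d(h_i, omega).
    offsets    : (Nh,) offsets h_i.
    slownesses : (Np,) trial parabolic slownesses p_k.
    Builds L(i, k) = exp(-j omega p_k h_i^2), equation (9.30), and
    solves damped normal equations for U, equation (9.31).
    """
    L = np.exp(-1j * omega * np.outer(offsets**2, slownesses))
    G = L.conj().T @ L
    G += damping * np.trace(G).real / G.shape[0] * np.eye(G.shape[0])
    U = np.linalg.solve(G, L.conj().T @ d_w)
    return U, L @ U                    # coefficients and reconstruction
```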

9.6 Structural Comparison of the Subspace Methods to Hampson's Algorithm

Similarities of Hampson's (1986) method to the eigenstructure methods can be seen. In the eigenstructure methods, a structurally identical L matrix is
employed, but a finite number of wavefronts are assumed to compose d(h_i, t). In the narrowband model (appropriate particularly after flattening) a wavefront s_k(t) (the kth element of U) with slowness p_k arrives at the ith sensor as

$$ s_k(t)\, e^{-j\omega p_k h_i^2}. $$

In the wideband model the DFT of s_k(t) gives at the ith sensor the Fourier transform value

$$ S_k(\omega_0)\, e^{-j\omega_0 p_k h_i^2} $$

at each ω_0.

The most significant difference between Hampson's method and the subspace methods is that with eigenstructure the model assumes an unknown but fixed number K of wavefronts to be found, and therefore the L matrix has exactly K columns (wherein p = p_k, k = 1, 2, ..., K) rather than one for each trial p_k, and U has exactly K elements, one for each wavefront. Otherwise equation (9.29) is identical to MUSIC's wideband model at a single given frequency.
Other major differences between Hampson's and the subspace methods are the subspace assumptions of (1) stationary processes and (2) the treatment of the elements of U as random variables rather than deterministic constants. Thus, for the eigenstructure methods we have the K-column matrix L with kth column L_k indicating the delays at the offsets h_i as given in equation (9.30), and the U vector containing exactly K complex source signals s_k(t). Thus, we have in the narrowband model
$$ \mathbf{U} = \begin{bmatrix} s_1(t) \\ s_2(t) \\ \vdots \\ s_K(t) \end{bmatrix}, \qquad (9.32a) $$


or for the wideband model at each selected frequency ω_0, the elements of U are K source Fourier transform values:

$$ \mathbf{U} = \begin{bmatrix} S_1(\omega_0) \\ S_2(\omega_0) \\ \vdots \\ S_K(\omega_0) \end{bmatrix}. \qquad (9.32b) $$

A means of coherently combining information from multiple frequencies has been discussed in Section 6.3. In the wideband model, all significant frequencies are focused coherently on the single frequency ω_0, and the U vector is as in equation (9.32b). It is also possible to find the K solutions for p_k independently for each ω and noncoherently produce an overall estimate, but this approach is nonoptimal. For nearly preflattened data, methods of the preceding sections that assume narrowband wavefronts may be appropriate.
At this point we proceed with U as in equation (9.32b) to make the comparisons we seek. Rather than solving equation (9.29) for the least-squares solution, equation (9.31), the subspace methods first find the covariance of the elements of d:

$$ R_d = E\{\mathbf{d}\mathbf{d}^{H}\} = L\,E\{\mathbf{U}\mathbf{U}^{H}\}L^{H} + E\{\mathbf{n}\mathbf{n}^{H}\} = L R_u L^{H} + \sigma^2 I, \qquad (9.33) $$
wherein the meaning of R_u and σ² is implicit. Finding the eigenvalues λ_i and corresponding eigenvectors v_i of R_d and using the properties given in Chapter 4, we have, ideally,

$$ R_d = \sum_{k=1}^{K} (\lambda_k - \sigma^2)\, \mathbf{v}_k \mathbf{v}_k^{H} + \sigma^2 I, \qquad (9.34) $$

where

$$ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_K > \lambda_{K+1} = \cdots = \lambda_{N_h} = \sigma^2. \qquad (9.35) $$


Thus L R_u L^H, the signal covariance matrix, is given by the eigenstructure related to the first K eigenvalues. Further, the solution vectors L_k, the columns of L, and their number K are found, for example, with MUSIC by searching over possible p values, or with root MUSIC by solving an N_h th-order polynomial; K of the p values (ideally) give L_k orthogonal to the noise-space eigenvectors v_i, i = K + 1, ..., N_h:

$$ \sum_{i=K+1}^{N_h} \left| L_k^{H} \hat{\mathbf{v}}_i \right|^2 \approx 0, \qquad (9.36) $$

where the approximation appears because the \(\hat{\mathbf{v}}_i\) are estimates of v_i from a noisy covariance matrix estimate of R_d. We note that, as in Chapter 4, root MUSIC is faster. We note again that rather than solving U for all values of p, the subspace methods reduce these equations to only K.
Having found the K solution vectors L_k, one or more of these is considered signal and the others are considered interference. Reconstruction of signal alone is accomplished via one of the subspace methods given in Section 9.3 or Section 9.4.
The accuracy of using only one value of p to describe one reflection is a function of how well the parabolic model fits. Further, if wavefronts are not well flattened, more than one p per waveform is more appropriate. This is the case for multiples, and some experimentation may be required.
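As a sketch of the orthogonality test in equation (9.36), the following assumes K is known and that a covariance estimate R_d at a single frequency ω_0 is available; the pseudospectrum it returns peaks at slownesses p for which the trial column L(p) is nearly orthogonal to the noise-space eigenvectors.

```python
import numpy as np

def music_slowness_scan(Rd, h, w0, K, p_grid):
    """MUSIC pseudospectrum over trial slownesses, based on equation (9.36)."""
    lam, V = np.linalg.eigh(Rd)                 # eigenvalues in ascending order
    Vn = V[:, : Rd.shape[0] - K]                # noise-space eigenvectors
    spectrum = np.empty(len(p_grid))
    for i, p in enumerate(p_grid):
        Lp = np.exp(-1j * w0 * p * h**2)        # trial column of L, equation (9.30)
        spectrum[i] = 1.0 / np.sum(np.abs(Lp.conj() @ Vn)**2)
    return spectrum                             # peaks mark the K slownesses
```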

9.7 Discussion on Hampson's versus Subspace

Even though both Hampson's and the subspace methods are based on the same equations, considerable differences in application result. Both require a DFT of each trace unless a narrowband subspace method is used, both give estimates of slowness or velocity, both allow separation of signal and interference, and both are optimum in different senses for dealing with noise.
Computationally, Hampson requires the solution of N_p equations for each ω; MUSIC requires finding the eigenstructure at one or more frequencies of an N_h by N_h covariance matrix and then solving for K solution values of p, where K is usually much less than N_p. Some subspace methods do not require eigenstructure; for example, QR decomposition to find the subspaces is much faster. In many methods solving for the K values of p is much faster than solving


Hampson's equation (9.31). The covariance matrix must also be estimated, but this is not computationally demanding.
Three major differences appear:
1) Hampson's method produces an estimate of U(p, τ) after the inverse DFT, and this is convenient for reproducing parts or all of the data. In contrast, MUSIC will easily produce |U(p, ω)|², and the zero-offset signal amplitudes must be determined by a separate reconstruction process such as presented in Sections 9.4 and 9.5, but the size of the vectors and matrices involved is smaller than in equation (9.31).
2) Subspace methods undoubtedly will give improved resolution for velocity picking, thereby allowing separation of more similar reflections. However, the accuracy is a function of the parabolic fit.
3) Broadband subspace methods require covariance estimation with samples of d(h, ω). The eigenstructure literature suggests each trace be segmented in time; each segment yields a sample of d(h, ω) to be used in the covariance estimate (a sketch of this estimate follows the list). However, for nonstationary processes such as primary seismic reflections, this is not appropriate except for multiples, which recur down the trace. Typically a processing window is stepped down through time and solutions found at each step. Solutions for U(p_k, τ_k) are assumed to be for the center of the time span of the window, although Li (Chapter 7) gives a method of explicitly solving for both p_k and τ_k for single wavefronts.
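A minimal sketch of that segmented-trace covariance estimate, under the stationarity assumption discussed in item 3; the segment length and DFT bin are illustrative choices.

```python
import numpy as np

def covariance_from_segments(gather, seg_len, w_bin):
    """Estimate the Nh x Nh covariance of d(h, w) at one DFT bin by averaging
    outer products over time segments of the traces (item 3 above)."""
    Nh, Nt = gather.shape
    n_seg = Nt // seg_len
    R = np.zeros((Nh, Nh), dtype=complex)
    for s in range(n_seg):
        seg = gather[:, s * seg_len:(s + 1) * seg_len]
        d_w = np.fft.fft(seg, axis=1)[:, w_bin]     # one sample of d(h, w)
        R += np.outer(d_w, d_w.conj())
    return R / n_seg
```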
In summary, Hampson's method generally is applied to DFTs of whole gathers, having first been processed to NMO. It solves for all values of zero-offset reflections versus time and can be used to band-pass or band-stop on the parameter p before reconstruction into the time domain.
Subspace methods utilize a model having the same equations, but assume that only K wavefronts are present in the windowed data. By one of several means (Chapter 4, Chapter 6), some of which are quite fast, the number K and the corresponding p_k are found. These algorithms operate on the data covariance matrix, easily obtained from either time-domain (narrowband or preflattened) or frequency-domain (broadband) data. Typically, subspace processing data windows span only a short time interval and the window must be moved to process for all two-way times of interest. For parameterization of multiples, the whole record can be processed in one window, but subsegments must give sample DFTs for the covariance matrix estimation. An alternative to segmenting


for samples is to use multiple frequencies for samples. We have not experimented with estimation or suppression of multiples.
The use of a priori knowledge, giving the covariance matrix structure, is an advantage that was emphasized in Section 9.3, where faster convergence to the mmse was noted. This is important because of the finite duration of a primary reflection.
More complex methods of signal reconstruction (weighted stacking), such as presented in Section 9.4, require more computation yet allow optimal estimation of each waveform, considering all others as interference. This can be of significance when fine features of the waveform are to be interpreted.
The above comparisons also apply to the Radon transform (Beylkin, 1987), which converts two-dimensional data much the same as Hampson's procedure does. Typically applied to a whole gather (perhaps NMO corrected), the Radon transform also allows band-pass and band-stop filtering in p before inversion back into the space-time domain. F-K filtering uses 2-D FFTs and wedge-shaped stop- or pass-band filtering, and is most appropriate for plane wavefronts. Again the advantages of subspace processing over F-K filtering lie in the structured estimate of the signal covariance matrix and the exclusion of noise-space energy from the estimate through limiting the signal-plus-interference dimensionality to K ≪ N_p.

9.8 References

Beylkin, G., 1987, Discrete Radon transform: IEEE Trans. Acoust., Speech and Sig. Proc., 35, 162-172.
Haimovich, A. M., and Bar-Ness, Y., 1991, An eigenanalysis interference canceller: IEEE Trans. Acoust., Speech and Sig. Proc., 39, no. 1, 76-84.
Hampson, D., 1986, Inverse velocity stacking for multiple elimination: J. Canadian S.E.G., 22, 44-55.
Ottersten, B., Roy, R., and Kailath, T., 1989, Signal waveform estimation in sensor array processing: Proc. Asilomar Conference, WA5-6, 1-5.
Scharf, L. L., 1991, Statistical signal processing: Addison-Wesley Publ. Co.
Shan, T.-J., and Kailath, T., 1985, Adaptive beamforming for coherent signals and interference: IEEE Trans. Acoust., Speech and Sig. Proc., 33, 527-536.
Thorson, J. R., 1984, Velocity-stack and slant-stack inversion methods: Ph.D. thesis, Stanford Univ.

Thorson, J. R., and Claerbout, J. F., 1985, Velocity-stack and slant-stack stochastic inversion: Geophysics, 50, 2727-2741.


Chapter 10
Removal of Interference Patterns in Seismic Gathers
William J. Done
The Karhunen-Loeve Transform (KLT) technique is frequently applied to
a variety of seismic data processing problems. Initial seismic applications of
the KLT (Hemon and Mace, 1978; Jones, 1985) were based on principal components analysis, usually on stacked data. A subset of the principal components obtained from the KLT of a seismic data set is used to reconstruct the
data. Using the dominant principal components in the reconstruction emphasizes the lateral coherence which characterizes poststack seismic data. Using
subdominant principal components during reconstruction can emphasize
detail in the result by eliminating the strong lateral coherency carried by the
high-order principal components. The usual approach to principal components analysis is also used to suppress random noise in the final reconstruction
by always eliminating the low order principal components from any reconstructions. These low-order principal components contribute to the randomness in the data.
The three applications described in this chapter, however, use the eigendecomposition methods of the KLT in a manner more closely associated with
interference canceling (Widrow and Stearns, 1985). The reader is also referred
to Haimovich and Bar-Ness (1991). In all three examples, the goal is the suppression of coherent noise in the seismic data. The coherent noise forms an
interference pattern in the seismic data that is correlated in the spatial and
temporal domains. Suppression of this interference is accomplished by first
estimating the coherent noise and then subtracting that estimate from the
original data, the difference being an estimate of the desired seismic signal
components.

10.1 An Interference Cancelation Approach


The procedure used for the KLT-based interference suppression scheme demonstrated here makes use of a model similar to the adaptive interference canceling model in its use of subtraction to suppress noise. Figure 10.1 depicts the model upon which adaptive interference canceling is based. The left-hand portion of the figure shows the data production model assumed for interference canceling applications. The desired signal is s_k, which has been corrupted by additive noise n_k, producing the corrupted signal x_k. While s_k is not observable, it is assumed that the signal y_k is also available for recording. Commonly called the reference signal, y_k is correlated with n_k. In this model the unknown systems g_k and h_k account for the differences between the noise sequences n_k and y_k.
Figure 10.1. Adaptive interference canceling model. (Diagram: in the data production model, the desired signal s_k is summed with additive noise n_k to form the primary input x_k; the noise source, passed through the unknown systems H(z) and G(z), produces the correlated sequences n_k and the reference input y_k. In the data analysis model, the adaptive filter w_k converts y_k into a noise estimate that is subtracted from x_k to give the error signal e_k, an estimate of s_k.)

On the right side of Figure 10.1 is the data analysis portion of the adaptive interference canceling model. The two recorded signals x_k and y_k are the inputs to the interference canceler. Reference signal y_k is filtered by the adaptive filter w_k. The output \(\hat n_k\) of w_k is subtracted from the data signal x_k, the difference signal e_k being an estimate of the desired signal s_k. Adaptivity comes about because, initially, the characteristics of w_k, and thus e_k, are not known. By assuming that the noise process n_k is not correlated with the desired signal s_k, various schemes based on the minimization of the energy in the error signal e_k

can be devised to gradually adjust the coefficients of w_k over time, represented by k in the preceding.
The important points from the adaptive interference canceling model, as applied to the eigenstructure seismic interference canceling technique, are: (1) the assumption that an estimate of the noise to be suppressed is available or can be derived; and (2) recovery of the desired seismic signal is accomplished by subtracting the noise estimate from the seismic data.

10.2 The Eigendecomposition Interference Canceling Algorithm


Following the approach in Kramer and Mathews (1956), assume that there exists a set of N real vectors y_k, k = 0, ..., N − 1, where each vector is M × 1. In the seismic applications described in Hemon and Mace (1978) and Jones (1985), the vector y_k contains the seismic samples from M traces at time k. M can be quite large for typical applications. For example, if the method described in Jones (1985) is used on stacked data, M may be on the order of several hundred. The size of M significantly impacts the computational requirements. The method as applied for the coherent interference suppression algorithm described here results in small values of M, typically between 4 and 25.
We further assume that M × 1 transform vectors a_i exist such that

$$ p_{ki} = [\mathbf{y}_k - \bar{\mathbf{y}}]^{T} \mathbf{a}_i, \qquad (10.1) $$

where T denotes the matrix transpose operation and \(\bar{\mathbf{y}}\) is the vector mean of the N vectors y_k. It is assumed that we can generate the vector y_k from the p_ki using a set of M × 1 vectors b_j according to
$$ \mathbf{y}_k = \bar{\mathbf{y}} + \sum_{j=1}^{M} p_{kj}\, \mathbf{b}_j. \qquad (10.2) $$

Equations (10.1) and (10.2) are the forward and inverse transform relations of the KLT. An approximation to y_k is made by using only the first m < M terms in the sum in equation (10.2):


$$ \hat{\mathbf{y}}_k = \bar{\mathbf{y}} + \sum_{j=1}^{m} p_{kj}\, \mathbf{b}_j. \qquad (10.3) $$

The error introduced by this approximation is

$$ \mathbf{e}_k = \mathbf{y}_k - \hat{\mathbf{y}}_k \qquad (10.4) $$

$$ \;\;\;\; = \sum_{j=m+1}^{M} p_{kj}\, \mathbf{b}_j. \qquad (10.5) $$

With e_k being a function of b_j and a_i by equations (10.1) and (10.5), the proper selection of the a_i and b_j vectors can minimize the error e_k in equation (10.4). If this error is minimized in the quadratic sense by selecting the a_i and b_j vectors to minimize

$$ J(m) = \frac{1}{N-1} \sum_{k=0}^{N-1} \mathbf{e}_k^{T} \mathbf{e}_k, \qquad (10.6) $$

the result is

$$ \mathbf{a}_i = \mathbf{b}_i = \mathbf{u}_i, \quad i = 1, \ldots, M. \qquad (10.7) $$

The M × 1 vectors u_i are determined from the covariance matrix of the y_k. This matrix is estimated by the sample covariance

$$ R = \frac{1}{N-1} \sum_{k=0}^{N-1} [\mathbf{y}_k - \bar{\mathbf{y}}][\mathbf{y}_k - \bar{\mathbf{y}}]^{T} \qquad (10.8) $$

$$ \;\;\; = U \Lambda U^{T}, \qquad (10.9) $$

and is real and symmetric.


The vectors u_i are the normalized eigenvectors of R, with u_i being the ith column of the M × M matrix U. Λ is an M × M diagonal matrix:

$$ \Lambda = \mathrm{diag}[\lambda_1\; \lambda_2\; \cdots\; \lambda_M]. \qquad (10.10) $$

Proper construction of U assures that the eigenvalues of R obey the relationship

$$ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M. \qquad (10.11) $$
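Equations (10.8)-(10.11) translate directly into a few lines of NumPy. The following is a minimal sketch, assuming the data vectors y_k are stored as the columns of an array Y; the function name is illustrative.

```python
import numpy as np

def klt_basis(Y):
    """Sample covariance and ordered eigenstructure, equations (10.8)-(10.11).
    Y: (M, N) array whose N columns are the M x 1 data vectors y_k."""
    ybar = Y.mean(axis=1, keepdims=True)            # vector mean of the y_k
    Yc = Y - ybar
    R = (Yc @ Yc.T) / (Y.shape[1] - 1)              # equation (10.8)
    lam, U = np.linalg.eigh(R)                      # ascending eigenvalues
    order = np.argsort(lam)[::-1]                   # largest first, equation (10.11)
    return lam[order], U[:, order], ybar
```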

Replacing a_i with u_i in equation (10.1) and b_j with u_j in equation (10.2) gives the transform relationship for the KLT discussed in Chapter 3. The basis functions of the transform, the a_i and b_j vectors, are dependent on the data y_k.
In equation (10.1) p_ki is the projection of data vector y_k onto eigenvector u_i. Collecting all of the p_ki, k = 0, ..., N − 1, into vector p_i, we have the ith principal component, the projection of all of the data vectors on the ith eigenvector. The first principal component p_1 is the projection onto u_1, the dominant eigenvector, since λ_1 ≥ λ_i, i = 2, ..., M. It represents that portion of the y_k having the greatest energy.
Equation (10.2) can be used to reconstruct the data y_k from the principal components and eigenvectors. The reduced-order reconstruction in equation (10.3) reconstructs the y_k with error e_k by projecting the principal components on the m dominant eigenvectors. Now consider that the interference in seismic data, indicated as n_k in Figure 10.2, is correlated with an observable reference input, the y_k data vectors. Also assume the desired seismic signal s_k is sufficiently different from n_k and y_k that its dominant eigenvectors are largely orthogonal to the m eigenvectors used for the reduced-order reconstruction of the y_k. This is equivalent to saying that the vectors u_i, i = 1, ..., m, define the noise subspace N while u_i, i = m + 1, ..., M, define the signal subspace S.
The y_k data vectors are used to estimate the m dominant eigenvectors of the noise subspace N. The primary input data vectors x_k, given by

$$ \mathbf{x}_k = \mathbf{s}_k + \mathbf{n}_k, \qquad (10.12) $$

are then projected onto the noise subspace. This reduced-order representation (see Scharf and Tufts, 1987) is

Figure 10.2. Eigendecomposition interference canceling model. (Diagram: as in Figure 10.1, the data production model forms the primary input x_k = s_k + n_k and the reference input y_k; in the data analysis model, an eigenanalysis of the training data y_k defines the noise subspace, the target data are reconstructed on that subspace to form the noise estimate, and subtraction from x_k yields the signal estimate.)

$$ \hat{\mathbf{x}}_k = \hat{\mathbf{s}}_k + \hat{\mathbf{n}}_k. \qquad (10.13) $$

If the signal and noise subspaces are orthogonal, then

$$ \hat{\mathbf{s}}_k \approx \mathbf{0}, \qquad (10.14) $$

and the projection of the x_k on the noise subspace becomes an estimate of the interference n_k obscuring the desired seismic signal s_k. The estimate of s_k is obtained from

$$ \tilde{\mathbf{s}}_k = \mathbf{x}_k - \hat{\mathbf{n}}_k. \qquad (10.15) $$
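A minimal sketch of equations (10.13)-(10.15), assuming the m dominant training eigenvectors are the columns of an array Um and that the training-region mean is reused as in equations (10.1) and (10.3); all three arguments are length-consistent NumPy arrays.

```python
import numpy as np

def cancel_interference(x, Um, ybar):
    """Noise-subspace projection and subtraction, equations (10.13)-(10.15).
    x, ybar: length-M vectors; Um: (M, m) dominant training eigenvectors."""
    p = Um.T @ (x - ybar)        # projection coefficients, equation (10.1)
    n_hat = ybar + Um @ p        # reduced-order reconstruction, equation (10.3)
    return x - n_hat             # signal estimate, equation (10.15)
```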

As depicted in Figure 10.2, the eigendecomposition interference canceling


(EIC) model requires two inputs: the contaminated seismic data xk and a
noise vector yk representative of the interference nk. In practice, a record of
seismic data, e.g., shot records in the following examples, can provide both
inputs. Interferences in seismic records can frequently be isolated. A training
region is selected from the seismic record, chosen and possibly manipulated to
emphasize the interfering noise. Examples of manipulation are the application

of gain control or flattening of noise events. Even though this procedure looks
at 2-D data blocks, it is likely that data with a strong horizontal correlation
will exhibit a greater spread in eigenvalues than data in which that same correlation occurs diagonally through the data block. This emphasis in the spread
of the eigenvalues is desirable because the goal is to train on the noise to be
removed, associating that noise with the dominant structure. This will be
illustrated in the examples to follow.
The target region is that portion of the seismic record in which it is
desired to suppress the interference. Typically this is the entire seismic record,
including the training region.
Once the training and target regions have been selected, each is divided into data vectors, the y_k and x_k. In Jones (1985), the data vectors were large and one-dimensional in nature. Typically, a vector comprised all of the samples at a constant time in a stack. Within a vector, there was variation only in one independent axis, usually the spatial axis. The vectors are large because seismic stacks usually consist of several hundred traces. This makes M the same as the number of traces. In the EIC application, the data vectors are formed by taking a rectangle of samples from the data, with variation in both the spatial and temporal axes. As in Chapter 2, Figure 2.1, and Section 2.3, the samples are arranged in an M × 1 vector, where M is no larger than 25 in the following examples. The order in which the samples are loaded into the vectors is not important, but it must be identical for all vectors.
In the procedure originally specified in Done et al. (1991), the residual data exhibit an artifact caused by dividing the target region into small, adjacent blocks. This artifact causes a strong visual correlation between output traces common to a column of data blocks. An abrupt change between the boundaries of adjacent blocks, especially in the trace direction, causes a degradation in the appearance of the output data. This phenomenon is similar to the edge effect which occurs when using the KLT to encode portions of images. By overlapping adjacent data blocks, both in the trace and time directions, this artifact is reduced. A data value in the reconstructed target region is the average of all corresponding elements taken from the reconstructed target-region vectors in which that data value is found. The averaging process smooths out the boundaries between data blocks. The examples illustrate this effect.


To summarize, the eigendecomposition interference canceling procedure for suppressing coherent interference in seismic data consists of the following steps:
1) Manipulate the data to emphasize the horizontal coherence of the interference.
2) Specify a training region which is predominantly interference.
3) Break the training region into data blocks covering K traces by L time samples, each block forming a (KL) × 1 data vector. Typical values for the product of K and L are 6 to 25. Overlapping the data blocks is recommended.
4) Using the data vectors defined in step 3, compute the covariance matrix R.
5) Perform an eigenvalue/eigenvector decomposition of R and select the subset of eigenvectors corresponding to the most dominant eigenvalues, typically the first or the first and second eigenvectors.
6) Reconstruct the training region using the selected eigenvectors and check the result for proper representation of the coherent interference. This is a reduced-rank reconstruction of the training data.
7) Designate the desired target region for suppressing coherent interference. Usually this is an entire seismic record, for example, a shot record. Divide the target region into K × L data blocks, forming a data vector from each.
8) Use the dominant training-region eigenvectors selected in steps 5 and 6 to reconstruct the data blocks. This step projects the seismic data onto the subspace determined by the reduced-rank model for the coherent interference.
9) Obtain the residual signal by subtracting the reconstructed target region obtained in step 8 from the original target region data.
The residual obtained from step 9 is the desired output. This residual is the part of the original target data lying out of the subspace of the reduced-rank interference model. The purpose of the training region can now be seen: to force the coherent interference to be strongly associated with the dominant axes of the coordinate space determined by the eigenvectors of R.
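The nine steps reduce to a short program. The following is a minimal NumPy sketch, assuming float-valued 2-D arrays (traces by time samples) and illustrative block and overlap parameters; samples not covered by any block pass through to the residual unchanged.

```python
import numpy as np

def extract_blocks(data, kt, lt, step_tr, step_t):
    """Steps 3 and 7: gather (KL) x 1 vectors from K-trace by L-sample blocks;
    steps smaller than the block size give overlapping blocks."""
    blocks, origins = [], []
    for i in range(0, data.shape[0] - kt + 1, step_tr):
        for j in range(0, data.shape[1] - lt + 1, step_t):
            blocks.append(data[i:i + kt, j:j + lt].ravel())
            origins.append((i, j))
    return np.array(blocks).T, origins            # (KL, n_blocks)

def eic(train, target, kt=3, lt=2, step_tr=2, step_t=1, n_eig=2):
    """Eigendecomposition interference canceling, steps 2 through 9."""
    Y, _ = extract_blocks(train, kt, lt, step_tr, step_t)
    ybar = Y.mean(axis=1, keepdims=True)
    Yc = Y - ybar
    R = (Yc @ Yc.T) / (Y.shape[1] - 1)            # step 4: covariance matrix
    lam, U = np.linalg.eigh(R)
    Um = U[:, np.argsort(lam)[::-1][:n_eig]]      # step 5: dominant eigenvectors
    X, origins = extract_blocks(target, kt, lt, step_tr, step_t)
    N_hat = ybar + Um @ (Um.T @ (X - ybar))       # step 8: project onto noise model
    noise = np.zeros(target.shape)                # rebuild 2-D noise estimate,
    hits = np.zeros(target.shape)                 # averaging overlapped samples
    for b, (i, j) in enumerate(origins):
        noise[i:i + kt, j:j + lt] += N_hat[:, b].reshape(kt, lt)
        hits[i:i + kt, j:j + lt] += 1.0
    noise /= np.maximum(hits, 1.0)
    return target - noise                         # step 9: residual = signal estimate
```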

10.3 Suppression of Interference, Marine Acquisition Case


Figure 10.3 shows one shot record of marine seismic data. These data have
had an AGC process applied prior to analysis by the eigendecomposition

interference canceling procedure. Beginning at about 4.0 s, the seismic data


become obscured by some noise that might be backscattered source energy.
Arriving at the receivers in the recording cable from two directions, this coherent interference forms a chevron-like pattern on the data. The data selected for
the training region are shown in Figure 10.4a. The target region in this example consists of all data below 3 s.
The training region is divided into data blocks covering 3 traces and 2
time samples. In test 1, the data blocks do not overlap, resulting in 700 data
blocks being defined in the training region. Test 2 allows the data blocks to
overlap adjacent blocks by one element in the trace and time directions. With
overlapping, 2000 data blocks are extracted from the training region. The
eigenvalues obtained from the covariance matrix R for these two tests are
listed in Table 10.1. In either case, the first three eigenvalues are an order of
magnitude or more larger than the remaining eigenvalues. Because the coherent noise arrives from two directions, the reconstruction of the training data
should be attempted using the two or three most dominant eigenvectors.
Table 10.1. Eigenvalue comparison for tests using overlapping and nonoverlapping data blocks.

Order   No Overlap   Overlap
1       419123.      403433.
2       298751.      307242.
3       155691.      157946.
4        10186.       10405.
5         9116.        8953.
6         7864.        8023.

Figure 10.4a shows an isolated view of the training region from the input
data in Figure 10.3. The training region, as reconstructed using the two most
dominant eigenvectors from tests 1 and 2, is shown in parts b and c of
Figure 10.4, respectively. Either reconstruction of the training interference is


Figure 10.3. One shot record of marine seismic data, DAVC applied, exhibiting coherent noise. (© 1991 IEEE. Used with permission, W. J. Done, R. L. Kirlin, A. Moghaddamjoo, Two-Dimensional Coherent Noise Suppression in Seismic Data Using Eigendecomposition, IEEE Trans. on Geoscience and Remote Sensing, vol. 29, no. 3, May 1991.)


Figure 10.4. Training zone and reconstructions of training zone: (a) training zone, (b) reconstruction using nonoverlapping data blocks, (c) reconstruction using overlapping data blocks. (© 1991 IEEE. Used with permission, W. J. Done, R. L. Kirlin, A. Moghaddamjoo, Two-Dimensional Coherent Noise Suppression in Seismic Data Using Eigendecomposition, IEEE Trans. on Geoscience and Remote Sensing, vol. 29, no. 3, May 1991.)


visually accurate, with the overlapped data blocks in test 2 producing a slightly smoother appearing reconstruction.
The reconstructed target and resulting residual for test 1 are shown in Figure 10.5, parts a and b, respectively. Figure 10.6 contains the results for the overlapping data blocks used in test 2. Comparing Figures 10.5b and 10.6b to the input data in Figure 10.3, it can be seen that the coherent interference below 3 s (the target zone) has been suppressed. The test 2 result with overlapping data blocks tends to have smoother transitions between the seismic traces. With the suppression of the coherent interference, the hyperbolic seismic arrivals are more visible below 3 s.
Interpretation of these data should be done with caution, as with any technique which causes lateral mixing of seismic traces. The danger is that the lateral mixing can create false events or smooth over fine features. But with this method, the smearing is limited to a known number of traces determined by the data block size and amount of overlap. Notice that three dead traces present in this record of data have been reconstructed, though the signal levels in these three traces are somewhat lower than in the adjacent traces.

10.4 Suppression of Repeating Refraction, Marine Acquisition Case
Figure 10.7 shows one shot record from another marine survey. The data have had AGC applied prior to plotting. The processing goal is to use eigendecomposition interference canceling to remove the repeated or reverberating refraction present in the data. Intuitively, it seems advisable to flatten the interference prior to interference canceling. Thus, the primary correlation within the training region will be in the horizontal direction, corresponding to the refraction energy. This has the added advantage of making the technique more applicable in split-spread shooting configurations. For split spreads, where the source is in the center of the array, the refraction energy would slope away from the shot in two directions, and use of this method directly on the data would probably require two passes, one for each side of the spread. Modifying the data so that the refraction event is approximately horizontal would allow the split spread to be processed in one pass.
Analysis of the refraction event indicates that a velocity of 8720 ft/s (2660 m/s) will approximately flatten the refraction. The resulting flattened data are shown in Figure 10.8a and the selected training zone on that data is shown in Figure 10.8b. Interference canceling is applied to these data with the following

Figure 10.5a. Test 1 with nonoverlapping data blocks. Reconstructed target region (3.0 to 6.0 s). (© 1991 IEEE. Used with permission, W. J. Done, R. L. Kirlin, A. Moghaddamjoo, Two-Dimensional Coherent Noise Suppression in Seismic Data Using Eigendecomposition, IEEE Trans. on Geoscience and Remote Sensing, vol. 29, no. 3, May 1991.)


Figure 10.5b. Test 1 with nonoverlapping data blocks. Residual target region (3.0 to 6.0 s). (© 1991 IEEE. Used with permission, W. J. Done, R. L. Kirlin, A. Moghaddamjoo, Two-Dimensional Coherent Noise Suppression in Seismic Data Using Eigendecomposition, IEEE Trans. on Geoscience and Remote Sensing, vol. 29, no. 3, May 1991.)


Figure 10.6a. Test 2 with overlapping data blocks. Reconstructed target region (3.0 to 6.0 s). (© 1991 IEEE. Used with permission, W. J. Done, R. L. Kirlin, A. Moghaddamjoo, Two-Dimensional Coherent Noise Suppression in Seismic Data Using Eigendecomposition, IEEE Trans. on Geoscience and Remote Sensing, vol. 29, no. 3, May 1991.)


Figure 10.6b. Test 2 with overlapping data blocks. Residual target region (3.0 to 6.0 s). (© 1991 IEEE. Used with permission, W. J. Done, R. L. Kirlin, A. Moghaddamjoo, Two-Dimensional Coherent Noise Suppression in Seismic Data Using Eigendecomposition, IEEE Trans. on Geoscience and Remote Sensing, vol. 29, no. 3, May 1991.)


Figure 10.7. One shot record of South Florida Basin data, AGC applied, exhibiting repeating reverberations.


data block parameters. Each data block covers a 12-trace by 2-time-sample pattern, and adjacent data blocks overlap by three samples in the trace direction and one sample in the time direction. This results in 1840 data blocks in the training region. Computing the sample covariance matrix and performing an eigendecomposition produces the eigenvalues listed in Table 10.2 under the columns labeled Flattened Data. Only the largest 10 eigenvalues are listed. This is for a training region defined by four (trace, sample number) pairs: (1, 50), (1, 450), (76, 50), and (76, 150) on the flattened data. Also listed in the table is the normalized running sum of those eigenvalues.
Table 10.2. Eigenvalue comparison showing effects of preflattening data.

        Flattened Data                    Unflattened Data
Order   Eigenvalue    Cumulative Sum      Eigenvalue    Cumulative Sum
                      (Normalized)                      (Normalized)
1       3 779 219.    0.7377              1 829 858.    0.3264
2         782 533.    0.8905              1 670 745.    0.6245
3         186 719.    0.9269                633 756.    0.7375
4         110 805.    0.9485                595 499.    0.8437
5          68 174.    0.9619                280 942.    0.8939
6          43 636.    0.9704                293 065.    0.9301
7          34 036.    0.9770                 96 328.    0.9473
8          21 295.    0.9812                 93 658.    0.9640
9          17 580.    0.9846                 61 414.    0.9749
10         14 005.    0.9873                 46 034.    0.9831

The flattened data training region vertices can be moved to their corresponding positions in the original, unflattened data domain. Doing this, the
four point zone defined previously becomes the region bounded by vertices (1,
402), (1, 802), (76, 191), and (76, 291). These vertices define a training
region on the unflattened data that covers the same data area as the region
defined above for the flattened data. Figure 10.9 shows the unflattened test


Figure 10.8a. The data after being flattened using a velocity of 8720 ft/s (2660 m/s). The entire data set.

training zone. Using the same data block configuration, there are 1649 data
blocks in the unflattened training region. The eigenvalues found from the
sample covariance matrix and the normalized running sum of those eigenvalues are listed in Table 10.2 under the columns labeled Unflattened Data.
The differences in the number of data blocks found in the flattened and
unflattened cases arise from the definition of the data block and the manner in
which data blocks are extracted from a training zone. With the current version
of the algorithm, no data block can extend beyond the boundaries determined

Figure 10.8b. The data used as the training zone for the flattened data tests.

by the four vertices specifying the training zone. As a result, a quantization effect occurs in the assignment of data blocks.
Comparing the eigenvalues in these two cases, it is apparent that the flattening of the data prior to analysis has concentrated more of the interference coherency into the first eigenvalue/eigenvector pair. Eigenvalue 1 in the flattened data case accounts for almost 74% of the sum of all 24 eigenvalues. For the unflattened data case, the largest eigenvalue accounts for only 33% of the total sum, and the first three eigenvalues are required to match the 74% total

Figure 10.9. Training zone for the unflattened data tests.

of the first eigenvalue in the flattened case. This finding correlates with statements on flattening for rank reduction in Sections 6.2 and 6.9.
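The normalized running sums tabulated in Table 10.2 follow directly from the eigenvalue list. A minimal sketch (the table's normalization uses all 24 eigenvalues, of which only the largest 10 are shown above):

```python
import numpy as np

def cumulative_energy(eigenvalues):
    """Normalized running sum of eigenvalues, as tabulated in Table 10.2."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # largest first
    return np.cumsum(lam) / lam.sum()
```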
The next step is to use selected eigenvectors from the training region to
reconstruct the data in the target region and compute the residual between the
original data and reconstruction. The eigenvalues in Table 10.2 suggest several
tests. Using the flattened data, (a) reconstruct in one case with eigenvector 1
and (b) in the second case, reconstruct with eigenvectors 1 and 2. Three
reconstructions are done with the unflattened data: reconstruct the target

using (a) eigenvector 1, (b) eigenvectors 1 and 2, and (c) eigenvectors 1 through 3. In each of these tests the target region is specified as the entire shot record. Figures 10.10a-e contain the reconstructed targets for all five tests. The five residuals from these tests are shown in Figures 10.11a-e. All results are shown in the unflattened domain, that is, after the flattening step has been reversed.
Several observations can be made from these results. Using the flattened data produces better results than those obtained using the unflattened data. In the flattened data case, reconstruction of the interference using the two dominant eigenvectors is somewhat better than that using only the dominant eigenvector. Primary seismic reflection energy down to 1.0 s is revealed in both flattened data tests. Interference canceling has succeeded in suppressing the coherent noise. The results for tests on the unflattened data are not as positive. Not only is there significantly more coherent interference left after the subtraction step, but the reconstructed targets in Figures 10.10c-e indicate that the desired reflections are being reconstructed in the estimation of the coherent interference, especially for the two- and three-eigenvector cases. This is undesirable because the presence of reflections in the noise estimate will cause those reflections to be suppressed in the subtraction step which produces the estimate of the desired seismic reflection energy.
The most important observation from this set of tests is that in situations where the coherent interference can be flattened, the eigendecomposition interference canceling procedure is likely to produce better results. The flattening increases the 2-D coherency of the interference. In the case illustrated, it also tends to decrease the 2-D coherency of the desired signal. These two tendencies cause the dominant eigenvalues/eigenvectors to be more strongly representative of the coherent interference to be suppressed. The reconstruction of the data from the dominant eigenvectors is thus more like the coherent interference than other signals present. Subtracting the reconstruction from the original data then results in better coherent interference suppression.

10.5 Suppression of Repeating Refraction, Land Acquisition Case
Figure 10.12 shows one shot record of data acquired in Kansas, with AGC
applied prior to plotting. It suffers from a repeated refraction arrival. The flatten, interference canceling, unflatten processing sequence used on the data in


Figure 10.10a. Flattened data test, reconstructed target zone. Reconstruction with
eigenvector 1.


Figure 10.10b. Flattened data test, reconstructed target zone. Reconstruction with eigenvectors 1 and 2.


Figure 10.10c. Unflattened data test, reconstructed target zone. Reconstructed with
eigenvector 1.


Figure 10.10d. Unflattened data test, reconstructed target zone. Reconstruction with
eigenvectors 1 and 2.


Figure 10.10e. Unflattened data test, reconstructed target zone. Reconstruction with
eigenvectors 1, 2, and 3.


Figure 10.11a. Residual from flattened data test. Reconstruction with eigenvector 1.


Figure 10.11b. Residual from flattened data test. Reconstruction with eigenvectors
1 and 2.


Figure 10.11c. Residual from unflattened data test. Reconstruction with eigenvector 1.


Figure 10.11d. Residual from unflattened data test. Reconstruction with eigenvectors
1 and 2.


Figure 10.11e. Residual from unflattened data test. Reconstruction with eigenvectors
1, 2, and 3.


the preceding section is used to suppress the reverberating refraction. A velocity of 4434 ft/s (1351 m/s) is used to flatten the interference event. Figure 10.13 shows the training region used, defined by (trace, time sample) vertices (100, 150), (121, 150), (100, 300), and (121, 300) in the flattened data domain. All plots are shown after restoration to the unflattened domain.
The data block configuration used for this analysis is an 8-trace by 3-time-sample pattern, with adjacent blocks overlapping by one element in both trace and time directions. There are 225 data blocks in the training zone. The first three eigenvalues determined from the training zone covariance structure account for approximately 50%, 21%, and 13%, respectively, of the total eigenvalue sum.
The reconstructions of the entire record using the dominant first and second eigenvectors are shown in Figures 10.14a and b. Figure 10.14a shows the result obtained when only eigenvector 1 is used in the reconstruction. If eigenvectors 1 and 2 are used, the reconstruction is as shown in Figure 10.14b. The residuals obtained by subtracting these reconstructions from the original data are plotted in Figure 10.15. Results for the two cases are similar, but the one-eigenvector reconstruction may be somewhat better, because the desired reflection data appear stronger.
An alternative to this interference canceling procedure is velocity filtering. Velocity filtering can produce a smoothed-over, wormy appearance that is indicative of strong trace-to-trace mixing. Figure 10.16 shows the results of applying velocity filtering to the original shot record. The velocity filter was designed to reject velocities from 0 to 6125 ft/s (1867 m/s) and from 0 to −6125 ft/s (−1867 m/s). In some situations, such as range-dependent attribute analysis, the trace mixing characteristic of velocity filtering may be undesirable. Trace mixing effects in the eigendecomposition interference canceling method are limited to the trace width of the data blocks. Velocity filtering can also be a problem when bad traces are present, as is the case for the near-range traces in this data set. High amplitude values in these traces reproduce the impulse response of the 2-D velocity filter, introducing artifacts into the output. This can also happen at the edges of mute zones or around dead traces. On the other hand, a properly designed velocity filter can remove interference over a range of velocities, while the eigendecomposition approach may have to be applied repeatedly to suppress interference arriving with different velocities. Comparing Figure 10.15a or b to Figure 10.16 shows that the eigendecomposition


Figure 10.12. One shot of Kansas data, DAVC applied, exhibiting repeating reverberations.


Figure 10.13. Training zone.


Figure 10.14a. Reconstructed target zone. Reconstructed with eigenvector 1.


Figure 10.14b. Reconstructed target zone. Reconstructed with eigenvectors 1 and 2.


Figure 10.15a. Residual. Reconstructed with eigenvector 1.


Figure 10.15b. Residual. Reconstructed with eigenvectors 1 and 2.


Figure 10.16. Results from the application of a velocity filter to the input data.


results still contain some interference events which have a different velocity
than the events that were dominant in the selected training zone.

10.6 References
Done, W. J., Kirlin, R. L., and Moghaddamjoo, A., 1991, Two-dimensional coherent noise suppression in seismic data using eigendecomposition: IEEE Trans. on Geosci. and Remote Sensing, 29, 379-384.
Haimovich, A. M., and Bar-Ness, Y., 1991, An eigenanalysis interference canceller: IEEE Trans. Acoust., Speech and Sig. Proc., 39, 76-84.
Hemon, C., and Mace, D., 1978, The use of the Karhunen-Loeve transformation in seismic data processing: Geophys. Prosp., 26, 600-626.
Jones, I. F., 1985, Applications of the Karhunen-Loeve transform in reflection seismology: Ph.D. dissertation, Univ. of British Columbia.
Kramer, H. P., and Mathews, M. V., 1956, A linear coding for transmitting a set of correlated signals: IRE Trans. on Information Theory, 2, 41-46.
Scharf, L. L., and Tufts, D. W., 1987, Rank reduction for modeling stationary signals: IEEE Trans. on Acoust., Speech, and Signal Proc., 35, 350-355.
Widrow, B., and Stearns, S. D., 1985, Adaptive signal processing: Prentice-Hall, Inc.


Chapter 11
Principal Component Methods for Suppressing Noise and Detecting Subtle Reflection Character Variations
Brian N. Fuller
11.1 Introduction
Seismic interpreters are sometimes presented with the challenge of identifying small lateral reflection character changes that can indicate significant
lithologic variations. Examples of this occur in stratigraphic trap exploration
and in estimating the lateral extent of a reservoir. Standard seismic trace plots,
even in color, sometimes do not have sufficient dynamic range to show a small
waveform variation against a background of traces that are very similar to one
another. Additionally, noise can obscure subtle waveform variations.
It is the purpose of this study to present methods and examples which
show how eigenvector methods can be used to aid an interpreter in detecting
small trace-to-trace variations in seismic waveforms when the lateral reflection
variation is very small and/or when the variations are obscured by noise. This
work extends that of D.C. Hagen (1982), who presented methods for using
the correlation coefficients of data traces with principal components to identify porous sand zones on a CMP stacked section. This work also draws from
other papers that have discussed the uses of principal component reconstruction as a noise suppression technique (Huang and Narendra, 1975; Andrews
and Patterson, 1976). Some of those papers have dealt specifically with seismic
reflection data (Jones and Levy, 1987; Done et al., 1991).
The approach here is to first calculate the normal-incidence seismic
response of a simulated reservoir sand that varies in thickness between 0.3 and
4.5 m. Eigenvector methods are then applied to the model data with and

without added noise to see if it is possible to detect the small waveform


changes that occur with the variations in the sand thickness. With an affirmative result from the model data, I apply the same techniques to real seismic
data recorded over Hartzog Draw oil field in the Powder River Basin, Wyoming, to see if the seismic data can be used to detect lithologic changes within
the reservoir and to detect the lateral edges of the reservoir. The edges have
been well defined by drilling.
The results of the experiment on the synthetic data are that seismic waveform changes can be correlated with stratigraphic changes on the scale of less
than a meter, even in the presence of significant random noise. The signal-tonoise ratio in the synthetic data is increased from 4.0 to 136.0 by processing
just the first few principal components, thus making waveform variations easier to see. The result of the experiment with the real data is that important
stratigraphic variations correlate well with the variations in the principal components.

11.2 A Brief Mathematical Description


Here is a brief mathematical description of the eigenstructure methods I used, along with a discussion of their use in the problem. Derivations of these and similar equations can be found in many places (Hotelling, 1933; Devijver and Kittler, 1982) and so will not be presented here.
The use of the techniques presented begins with finding the eigenstructure of a covariance matrix from stacked seismic data within a window that is M data traces by N data samples in dimension. Each data trace in the window is represented as a column vector x_k, with the ith sample of the kth data trace labeled as x_ik, as in
$$ \mathbf{x}_k = [x_{1k}, x_{2k}, \ldots, x_{Nk}]^{T}, \qquad (11.1) $$

where the superscript T indicates the transpose of the matrix. An N × N sample covariance matrix C is formed from the data in the window, and each element C_ij of the matrix is calculated by equation (11.2):

$$ C_{ij} = \sum_{k=1}^{M} x_{ik}\, x_{jk}. \qquad (11.2) $$


An orthonormal vector basis for the dataset is then given by the normalized eigenvectors of the matrix C. The variance in the data accounted for in the dimension of each eigenvector is given by the associated eigenvalues. The data component in the dimension of the eigenvector associated with the largest eigenvalue is known as the first principal component. The second principal component is the data component in the dimension of the eigenvector associated with the second largest eigenvalue, and the third, fourth, etc., principal components are named in a like manner. The jth normalized eigenvector is annotated as v_j. The eigenvalues are ordered largest first, smallest last.
An estimate \(\hat{\mathbf{x}}_i\) of a data vector (a portion of a seismic data trace) may be constructed by using (see Section 3.6)

$$ \hat{\mathbf{x}}_i = \sum_{j=1}^{\tilde N} \alpha_{ij}\, \mathbf{v}_j, \quad \tilde N \le N. \qquad (11.3) $$

In equation (11.3), α_ij is a coefficient relating the vector x_i to the eigenvector v_j. The value of α_ij, which shall for the remainder of this paper be referred to as a projection coefficient, is found by the dot product operation,

$$ \alpha_{ij} = \mathbf{x}_i^{T} \mathbf{v}_j. \qquad (11.4) $$

As noted above, the first, second, etc., eigenvectors of a data set's covariance matrix are ranked in accordance with the proportion of the variance of the data along that eigenvector's dimension. If all of the data traces in a data window were identical, only one principal component would be necessary to account for all of the variance in the data window. The first eigenvector v_1 would be identical, except for a scaling factor, to each of the data traces. The second and subsequent eigenvalues would equal zero because there would be no other energy in the data that is not accounted for by the first eigenvector. In the seismic data case that I am describing here, however, the assumption is that the data traces in the window are merely similar to one another. Variability from one trace to the next is due to lithologic variations and/or noise. In this case, the data traces are still highly correlated with (meaning they are very similar to) the first eigenvector. The second and higher eigenvectors' dimensions account for the trace-to-trace variability in the data. The values of α_ij would vary with the variability in the data. If the variations are


due solely to random noise, then the values of α_ij will vary randomly. However, if the data vary systematically, then the values of α_ij vary systematically, and we expect this in the first few eigenvectors.
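The quantities in equations (11.1)-(11.4) can be computed in a few lines. The following is a minimal NumPy sketch, assuming the data window is stored as an N-sample by M-trace array; plotting the returned coefficients trace by trace produces curves like those in Figures 11.1c and 11.1d.

```python
import numpy as np

def projection_coefficients(window, j):
    """Coefficients alpha_ij of equation (11.4) for every trace in the window.
    window: (N, M) array whose M columns are the N-sample traces x_k."""
    X = np.asarray(window, dtype=float)
    C = X @ X.T                                  # C_ij = sum_k x_ik x_jk, eq. (11.2)
    lam, V = np.linalg.eigh(C)                   # eigenvalues in ascending order
    v_j = V[:, np.argsort(lam)[::-1][j - 1]]     # jth eigenvector, largest first
    return X.T @ v_j                             # alpha_ij = x_i^T v_j, eq. (11.4)
```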

11.3 Sensitivity of Principal Components to Lithologic Variations


As stated in the Introduction, one of the primary purposes of this study is
to gauge the sensitivity of eigenstructure methods to variations in lithology.
Put another way, how small a variation in lithology can be detected by these
methods? If the method cannot detect reflection variations any better than a
human looking at a seismic plot, then there is not a strong reason to continue
the research. Answering this question begins with building the geological
model shown in Figure 11.1a. The figure shows a sand body that varies in
thickness from 0.3 m on the left to 4.5 m on the right. The sand body is
encased in layered marine shales and has a velocity that is 5% higher than the
surrounding shales. The distribution of layer thickness and velocity changes in
the layered shale is based on a group of well logs recorded in the thick upper
Cretaceous shale section in the Powder River Basin of Wyoming near Hartzog
Draw oil field. A typical shale velocity at this location is 3600 m/s.
A normal-incidence seismogram was created from the geological model in
Figure 11.1a. The sampling interval of the synthetic data is 2.0 ms and the
wavelet used is a sin(x)/x function band-pass filtered between 8 and 50 Hz. A
98-ms time window of the normal-incidence seismic section is shown in
Figure 11.1b. The time window is centered about the average arrival time of
the reflections from the sand body. The horizontal (spatial) axis of the geologic
cross-section is the same as the horizontal axis of the seismic data so that a seismic trace is the normal-incidence seismic response of the geologic model
directly above the trace. Careful inspection of the seismic profile in
Figure 11.1b shows that there is a lateral variation in amplitude and arrival
time in the reflections, but the changes are subtle. As we shall see later in this
paper, those variations are quite difficult to see in the presence of noise.
The covariance matrix was formed from the data in the 98-ms window
and its eigenvectors and eigenvalues were found. Approximately 98% of the
variance of the data was accounted for by the first eigenvalue and 1.9% by the
second. The remaining 0.1% was distributed among the remaining 48 components and was negligible in any one of them.
Figure 11.1. (a) A geologic cross-section of a sand body in shale. (b) A normal-incidence seismic section calculated using the cross-section in part (a); the eigenvectors of this dataset were calculated. (c) Projection coefficients of each of the data traces in part (b) onto the first eigenvector. The values of the projection coefficients are not shown because they are somewhat arbitrary, depending on the scaling of the eigenvectors. (d) Projection coefficients of each of the data traces in part (b) onto the second eigenvector.
Figures 11.1c and 11.1d show plots of the projection coefficients [see equation (11.4)] of each data trace on the first and second principal components, respectively. As above, there is a one-to-one correspondence between the horizontal axes of Figures 11.1c and 11.1d and the horizontal axes of
Figures 11.1a and 11.1b. Note that the projection coefficients change with the
changes in the sand layer thickness. The slope of the projection coefficient
curves is steepest in places where the sand thickness changes most quickly.
This experiment establishes that, at least in noise-free data, the principal component method can detect subtle reflection character changes for changes in
sand thickness of much less than a meter.
The experiment above is now repeated under more realistic, but still controlled, conditions. Figure 11.2a is a plot of the data in Figure 11.1b with noise added.
The noise is in the same frequency band as that of the reflections (8 Hz to
50 Hz) and the ratio of signal power to noise power on each trace is 4.0. The covariance matrix was formed from the noisy data and the principal components
were calculated. In this case, 98.5% of the variance in the data was accounted
for by the first 10 eigenvalues with 81.0% accounted for by the first. The
information is more dispersed among the eigenvalues than in the noise-free
case because the added energy is random and uncorrelated with either the signal or the noise on the other traces.
Figure 11.2b shows projection coefficients for the noisy data onto the second eigenvector (jagged curve) along with the projection coefficient curve for
the noise-free case that was shown in Figure 11.1. The projection coefficient
curve for the noisy data is similar in shape to that of the noise-free curve, but
the added noise causes random variations about the noise-free curve. A running average smoothing operator was applied to the noisy projection coefficient curve to reduce the random variations and to obtain a better estimate of
the noise-free projection coefficients. The smoothing operator calculates the
mean of 2K + 1 projection coefficients and outputs a value at the ith trace that
is the mean of the ith projection coefficient and the K projection coefficients
on either side of the ith trace. The smoothed curve is shown as the data marks
with no lines connecting them. This curve was calculated using a value of 9
for K, or averaging 19 projection coefficients.
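As a sketch (not the authors' code), the running-average smoother can be written in a few lines of NumPy; `alpha_j`, a length-M series of projection coefficients for one eigenvector, is a hypothetical input name.

```python
import numpy as np

def smooth_coefficients(alpha_j, K=9):
    """Mean of the 2K + 1 coefficients centered on each trace;
    near the edges only the available neighbors are averaged."""
    kernel = np.ones(2 * K + 1)
    weights = np.convolve(np.ones(len(alpha_j)), kernel, mode="same")
    return np.convolve(alpha_j, kernel, mode="same") / weights
```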
The primary result of the smoothing exercise is that the smoothed projection coefficient curve exhibits an easily observed and systematic variation that
is correlated with changes in the sand layer thickness. The sensitivity of the
projection coefficient plot to changes in sand thickness is not as great in the
Figure 11.2. (a) Noise added to the data in Figure 11.1b. (b) The projection coefficients of each trace onto the second eigenvector, plotted as the jagged, dashed curve. The jagged curve was smoothed with a 19-point smoothing operator and is plotted as distinct data points. The projection-coefficient curve for the noise-free second eigenvector (Figure 11.1d) is plotted as a solid curve. The values of the projection coefficients are not shown because they are somewhat arbitrary, depending on the scaling of the eigenvectors. (c) The traces in part (a) reconstructed from the first three principal components; the projection coefficients for the first principal component were smoothed. The reconstructed data have a signal-to-noise ratio of 32.5.
noisy data as it is in the noise-free data, but there is a clear difference in the
smoothed projection coefficients between where the sand thickness is 2 m and
where it is 4.5 m. The observable difference in sand thickness is about 2% of
the wavelength of the data. Other smoothing methods may give a better estimate of the noise-free projection coefficients and provide better resolution of
variations in sand thickness.
This part of the experiment shows that under controlled conditions, principal components methods can detect small lithologic variations and do so
better than visual interpretation.

11.4 Noise Reduction in the Synthetic Data


Principal components methods have been shown to possess some very
powerful noise reduction capabilities under the right conditions (Huang and
Narendra, 1975; Andrews and Patterson, 1976; Jones and Levy, 1987; Done
et al., 1991). This noise reduction capability stems from the property that
eigenstructure can be used to separate data into orthogonal subspaces. In data
that are highly redundant, such as in the examples shown in Figures 11.1 and 11.2, a large part of the signal maps into a subspace separate from that of the noise. The data can be reconstructed using only information in the subspace that contains the desired part of the signal. Equation (11.3) shows how a
data trace can be reconstructed from the principal components.
In the following experiment, the noisy data traces shown in Figure 11.2a
are reconstructed to obtain a numerical assessment of signal-to-noise ratio
improvement when data are highly redundant from one trace to the next.
Recall that the purpose of this study is to show methods that aid the interpreter in seeing small character variations; reducing the noise in a dataset can
improve the clarity of the seismic section and the chances of being able to see
those variations.
As a measure of the success of the noise suppression method, we calculate
the ratio of signal power and noise power in the reconstructed data. The noise
power of the reconstructed data is measured by summing the squared differences between the noise-free samples (Figure 11.1b) and the reconstructed
samples. The signal power is simply the power of the noise-free data in the
window.
The noisy data traces in Figure 11.2a were reconstructed using the first
three principal components. Rather than immediately applying equation (11.3) to reconstruct the data, however, a smoothing function is applied to the projection coefficients of the first principal component before reconstruction. Justification for reconstructing the traces using smoothed projection coefficients
comes from the experiment above from which we learned that a better estimate of noise-free projection coefficients can be obtained by smoothing the
projection coefficients. Use of a less noisy projection coefficient should result in a better estimate of the noise-free trace after reconstruction. The projection
coefficients for the first eigenvector were smoothed with a 19-point equal
weight averaging operator while the projection coefficients for the second and
third dimensions were left unsmoothed. The reconstructed data are shown in
Figure 11.2c, which has a signal-to-noise ratio of 32.5. This is substantially
better than the signal-to-noise ratio of 4.0 in the original data.
Other reconstruction experiments resulted in output signal-to-noise ratios
of 15.8 using the first two principal components with no smoothing and
138.6 using the first two principal components and a 19-point smoothing
operator on both components. In the last case, the reconstructed data are visually indistinguishable from the original noise-free data.
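The reconstruction step itself is compact; the following sketch (illustrative names, not the original code) keeps the first few principal components and optionally smooths the first component's coefficients before applying equation (11.3). `V` and `alpha` are the quantities returned by the earlier `projection_coefficients` sketch.

```python
import numpy as np

def reconstruct(V, alpha, keep=3, smooth=None):
    """V: (N, N) eigenvectors; alpha: (M, N) projection coefficients.
    `smooth` is an optional function applied to the first component's
    length-M coefficient series (e.g., the 19-point running mean)."""
    alpha_kept = alpha[:, :keep].copy()
    if smooth is not None:
        alpha_kept[:, 0] = smooth(alpha_kept[:, 0])
    return alpha_kept @ V[:, :keep].T      # sum over j of alpha_ij * v_j
```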
The idea of treating data in separate subspaces before reconstruction can
be applied as a general concept. The advantage of operating on data in separate signal subspaces before reconstruction is that the appropriate noise suppression measure (band-pass filter, smoothing, etc.) for each kind of noise can be applied in its subspace without affecting the data in the other subspaces. In the data shown in Figure 11.2c, for example, some noise was suppressed by smoothing the projection coefficients for the first principal component, but the lateral resolution provided by the second and third principal
components was preserved. Additionally, the projection coefficients can be
weighted more or less heavily before reconstruction, thus providing seismic
data in which character variations are easier to see than if the projection coefficients provided by the original eigendecomposition had been used.

11.5 A Real Data Example


With the results of the above synthetic data example, we can move with
some confidence to the use of the principal components method on real seismic data. The purpose of the following exercise is to see whether significant stratigraphic changes in a known reservoir can be detected by using the principal components methods. The seismic data studied were recorded over Hartzog Draw oil field in the Powder River Basin (Wyoming, USA). The field's reservoir rock is at a depth of about 2700 m and is composed of a sand sequence
encased in marine shale. The reservoir sand ranges in thickness from 1 m at the edge of the field to nearly 10 m at the field's center in the area studied. The
seismic data presented here have a maximum frequency of 60 Hz, which is not
sufficient to resolve the reservoir sand. Instead, any effect of the reservoir sand
on the reflected wavefield will be small (see Widess, 1973, for a discussion of
how thin beds can produce a reflected wave).
Figure 11.3a shows a geologic cross-section through the eastern portion of
Hartzog Draw reservoir sands at the location of the seismic line. The cross-section is based on correlations between well-log responses and lithology presented by Tillman and Martinsen (1987). The three general rock types in the
cross-section are marine shale, porous sand, and tight sand. The wells were
projected an average of 200 m, with a maximum of 500 m, onto the cross-section,
so there is some uncertainty as to the exact location of facies terminations.
There are two important lateral lithologic boundaries represented in this
cross-section. The first is the eastern termination of the porous sand unit
between Well 2 and Well 3. This is important from an economic standpoint
because most of the oil produced from this field comes from the porous sand
facies. The second important lithologic boundary is the eastern termination of
the tight sand facies. This boundary is the eastern edge of the reservoir. Wells
drilled to the east of Well 3 would probably be dry, based on maps in Tillman and Martinsen (1987) and on a well drilled subsequent to Tillman and Martinsen's paper.
Figure 11.3b shows seismic data recorded over the part of the field represented in the cross-section and in the time period that is affected by the presence of the reservoir sand. The data traces were processed to final stack (Fuller, 1988), band-pass filtered between 8 Hz and 59 Hz, and time shifted so that
the uppermost positive lobe of the reflection signal was aligned at a constant
time. Alignment of the reflections at a constant time simplifies the computer
algorithms that are used to calculate covariance matrices. As in the synthetic
example above, each data trace in the seismic section can be tied to the location in the cross-section just above the data trace.

11.6 Interpretation of the Real Data


Principal components were calculated for the data in Figure 11.3b. The
first principal component accounted for 85% of the variance; the second,
6.3%; and the third, 2.3%. Most of the information from the fourth and
higher principal components was determined to be noise by reconstructing
the data traces, excluding the first through the third principal components,
and visually verifying that there was little coherency among the reconstructed
traces.
The projection coefficients for the second and third eigenvectors are plotted in Figure 11.3c. These projection coefficients have been smoothed with a
19-point smoothing operator. Once again smoothing sacrifices some lateral
resolution in exchange for a better view of the trends indicated by the projection coefficients. The trend in the projection coefficients for principal component 2 changes abruptly near CMP 1375. This change correlates with the
eastern termination of the porous sand. Another change in the trend for projection coefficient 2 occurs at about CMP 1400, which correlates well with
the eastern edge of the reservoir sand. The third principal component
decreases at a nearly constant rate from the left side of the seismic section to
about CMP 1400, the eastern termination of the sand body, where it reverses
slope and increases to the right side of the seismic section. Unlike the second principal component, the third shows no significant variation at the eastern edge of the porous sand facies. This observation has not been explained.
The most notable changes in the projection coefficients occur at locations
where the change in sand thickness is about 5 m. This gives us a gauge of the
sensitivity of the principal component method in terms of the seismic wavelength. The sensitivity of the method in terms of wavelength is estimated in
the following way. The frequency band of the data is between 8 Hz and
59 Hz. If we assume a center frequency of 33.5 Hz and a seismic velocity (from well logs) at the depth of the reservoir of 4200 m/s, then 5 m is about 4% of the dominant wavelength of the data. Perhaps
smaller lithologic changes could be detected, but this would require better
well control than was available for this study.

11.7 Noise Suppression in the Real Data


For the sake of completeness, the real data are reconstructed using the
first three principal components and unsmoothed projection coefficients in
Figure 11.3d. Since the true noise-free signal is unknown, the improvement in
S/N cannot be estimated in the same way that it was measured in the synthetic
data example above. The data are clearly much less noisy than the original data
in Figure 11.3b, however, and the variations in reflection character are easier
to see.
Figure 11.3. (a) A geological cross-section across Hartzog Draw oil field showing the reservoir sand facies. (b) A seismic section recorded across the cross-section shown in part (a), showing the time interval affected by the presence of the reservoir sand. The data traces are spatially coincident with the part of the cross-section directly above them. Eigenvectors were calculated for the data in the window. (c) Projection coefficients of each trace onto the second and third eigenvectors. Changes in the projection-coefficient trends are coincident with changes in the reservoir facies. The values of the projection coefficients are not shown because they are somewhat arbitrary, depending on the scaling of the eigenvectors. (d) Data traces in part (b) reconstructed using the first three principal components. The noise was reduced by this operation.
11.8 Discussion
The methods presented in this chapter have the weakness that there is no
established connection between reflection coefficients, geology, and the information conveyed by the projection coefficients. Until we can say directly what
the value of a projection coefficient means relative to reflection coefficients,
these methods will remain an indirect, albeit powerful, indicator of stratigraphic variations.

11.9 Conclusions
In tests on synthetic data, principal component methods indicate where changes in lithologic thickness occur that are on the order of 2% of the dominant wavelength of the data. The signal-to-noise ratio in synthetic data was
increased from 4.0 to as high as 136.0 by reconstructing the data using a subset of the principal components. Improvements in S/N can also be made in
the data by operating on separate parts of the signal subspace before reconstructing data traces using selected principal components. In the real data case
presented, principal component methods provide an indicator of vertical
lithologic changes that are about 4% of the dominant wavelength of the data.

11.10 References
Andrews, H. C., and Patterson, C. L., 1976, Singular value decompositions
and digital image processing: IEEE Trans. Acous., Speech, and Sig.
Proc., 24, 26-53.
Devijver, P. A., and Kittler, J., 1982, Statistical pattern recognition: Prentice-Hall International.
Done, W. J., Kirlin, R. L., and Moghaddamjoo, A., 1991, Two-dimensional
coherent noise suppression in seismic data using eigendecomposition:
IEEE Trans. on Geosci. and Remote Sensing, 29, 379-384.
Fuller, B. N., 1988, Seismic detection of Upper Cretaceous stratigraphic oil traps in the Powder River Basin, Wyoming: Ph.D. thesis, Univ. of Wyoming.
Hagen, D. C., 1982, The application of principal components to seismic data
sets: Geoexploration, 20, 93-111.
Huang, T. S., and Narendra, P. M., 1975, Image restoration by singular value
decomposition: Applied Optics, 14, 2213-2216.
Jones, I. F., and Levy, S., 1987, Signal-to-noise ratio enhancement in multichannel seismic data via the Karhunen-Loève transform: Geophys. Prosp., 35, 12-32.
Ranganathan, V., and Tye, R. S., 1986, Petrography, diagenesis and facies controls on porosity in Shannon sandstone, Hartzog Draw Field, Wyoming: AAPG Bull., 70, 56-69.
Tillman, R. W., and Martinsen, R. S., 1987, Sedimentologic model and production characteristics of Hartzog Draw field, Wyoming, a Shannon shelf-ridge sandstone, in Tillman, R. W., and Weber, K. J., Eds., Reservoir characterization: SEPM Special Publication 40, 15-112.
Widess, M. B., 1973, How thin is a thin bed?: Geophysics, 38, 1176-1180.
Chapter 12
Eigenimage Processing of Seismic Sections
Tadeusz J. Ulrych, Mauricio D. Sacchi, and
Sergio L. M. Freire
12.1 Introduction
This chapter briefly reviews the important theoretical aspects of eigenimage processing and demonstrates the unique properties of this approach using various examples, such as the separation of up- and downgoing waves, multiple attenuation, and residual static correction. In particular, we will compare the eigenimage technique to the well-known frequency-wavenumber (f-k) method (Treitel et al., 1967) and discuss important differences which arise, especially with respect to spatial aliasing and the separation of signal and noise.
In order to fully understand the similarities and differences of this
approach versus approaches described in the previous chapters, it is important
to begin with a little history. The first publication which introduced and applied aspects of eigenimage processing to seismic data was the paper by Hemon and Mace (1978). These authors investigated the application of a particular linear transformation known as the Karhunen-Loève (KL) transformation. The KL transformation is also known as the principal component transformation, the eigenvector transformation, or the Hotelling transformation (Anderson, 1967; Loève, 1955). It has been used by various authors for one- and two-dimensional data compression and to select features for pattern recognition. Of particular relevance to the ensuing discussion is the excellent paper by Ready and Wintz (1973), which deals with information extraction and S/N improvement in multispectral imagery. In 1983, the work of Hemon
and Mace was extended by a group of researchers at the University of British
Columbia (Levy et al., 1983; Ulrych et al., 1983) which culminated in the
work of Jones (1985) and Jones and Levy (1987).
In 1988, Freire and Ulrych applied the KL transformation in a somewhat different manner to the processing of vertical seismic profiling data. The actual approach adopted in that work was singular-value decomposition (SVD), which is another way of viewing the KL transformation (the relationship between the KL and SVD transformations is discussed in this chapter; see also Chapter 3). In later works, Ulrych et al. (1988) and Freire and Ulrych (1990) applied the SVD approach to various other problems, including the attenuation of multiple reflections, and adopted the nomenclature of eigenimage decomposition for this method of data processing. Eigenimages were first introduced into the literature by Andrews and Hunt (1977) in the context of image processing and, in our opinion, this description is the most succinct for the purpose at hand.
A seismic section that consists of M traces with N points per trace may be
viewed as a data matrix X where each element xij represents the jth point of the
ith trace. A singular-value decomposition (Lanczos, 1961) transforms X into a weighted sum of orthogonal rank-one matrices which have been designated by
Andrews and Hunt (1977) as eigenimages of X. A particularly useful aspect of
the eigenimage decomposition is its application in the complex form. In this
instance, if each trace is transformed into the analytic form, then the eigenimage processing of the complex data matrix allows both time and phase shifts to
be considered. This is particularly important in the case of the correction of
residual statics.

12.2 Theory
We consider the data matrix X to be composed of M traces with N data
points per trace, the M traces forming the rows of X. The SVD of X is given
by (Lanczos, 1961)

$$X = \sum_{i=1}^{r} \sigma_i u_i v_i^T , \qquad (12.1)$$

where (·)^T indicates transpose, r is the rank of X, u_i is the ith eigenvector of XX^T, v_i is the ith eigenvector of X^T X, and σ_i is the ith singular value of X. The singular values σ_i can be shown to be the positive square roots of the eigenvalues of the matrices XX^T and X^T X.

Figure 12.1. Eigenimage decomposition of the data matrix X into the sum of weighted eigenimages.

These eigenvalues are always positive
because of the positive definite nature of the matrices XXT and XT X. In
matrix form, equation (12.1) is written as
$$X = U \Sigma V^T , \qquad (12.2)$$

where the definitions of the matrices are clear from equation (12.1).
Andrews and Hunt (1977) designate the outer product u_i v_i^T as the ith
eigenimage of the matrix X. Owing to the orthonormality of the eigenvectors,
the eigenimages form an orthonormal basis which may be used to reconstruct
X according to equation (12.1). This concept is illustrated diagrammatically
in Figure 12.1. It is clear from equation (12.1) that the contribution of a particular eigenimage in the reconstruction of X is proportional to the magnitude
of the associated singular value. Since in the SVD the singular values are
always ordered in decreasing magnitude, it is possible, depending of course on
the data, to reconstruct the matrix X using only the first few eigenimages.
Suppose, for example, that X represents a seismic section and that all M
traces are linearly independent. In this case X is of full rank M, all the σ_i are
different from zero, and a perfect reconstruction of X requires all eigenimages.
On the other hand, in the case where all M traces are equal to within a scale
factor, all traces are linearly dependent, X is of rank one and may be perfectly
reconstructed by the first eigenimage σ_1 u_1 v_1^T. In the general case, depending on the linear dependence which exists among the traces, X may be reconstructed from only the first few eigenimages. In this case, we consider the data to be composed of traces that show a high degree of trace-to-trace correlation. Indeed, XX^T is a weighted estimate of the zero-lag covariance matrix of the data X, and the structure of this covariance matrix, particularly the distribution of the magnitudes of the corresponding eigenvalues, indicates the parsimony or otherwise of the eigenimage decomposition. If only p, p < r, eigenimages are used to approximate X, a reconstruction error ε is given by

$$\epsilon = \sum_{k=p+1}^{r} \sigma_k .$$

Freire and Ulrych (1988) defined band-pass X_BP, low-pass X_LP, and high-pass X_HP eigenimages in terms of the ranges of singular values used. The band-pass image is reconstructed by rejecting highly correlated as well as highly uncorrelated traces and is given by

$$X_{BP} = \sum_{i=p}^{q} \sigma_i u_i v_i^T , \qquad 1 < p \le q < r . \qquad (12.3)$$

The summation for X_LP is from i = 1 to p − 1, and for X_HP from i = q + 1 to r. It may be simply shown that the percentage of the energy contained in a reconstructed image X_BP is given by E, where

$$E = \frac{\sum_{i=p}^{q} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2} . \qquad (12.4)$$

The choice of p and q depends on the relative magnitudes of the singular values, which are a function of the input data. These parameters may, in general, be estimated from a plot of the eigenvalues λ_i = σ_i² as a function of the index i. This is reasonable given the form of equation (12.4). In certain cases, an abrupt change in the eigenvalues is easily recognized. In other cases, the change in eigenvalue magnitude is more gradual and care must be exercised in
the choice of the appropriate index values. Figure 12.2 illustrates the above
discussion in a simple fashion. Figure 12.2a represents a synthetic seismic section showing three reflectors, one of which is faulted. The section has been
corrupted with additive pseudo-white noise with a standard deviation of 20%
of the maximum amplitude. Figure 12.2b shows the variation of the relative
magnitudes of the eigenvalues. In this particular case Figure 12.2b shows that
the signal portion of X is contained in only the first two eigenimages. Indeed,
the first two eigenimages and the sum of these eigenimages, which are shown
in Figure 12.2c, d, and e, respectively, illustrate this point very clearly. We
note in particular that the second eigenimage bears the signature of the faulted
reflector and the highly correlated horizontal information appears in the first
eigenimage.
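As an illustration of how candidate values of p and q might be screened numerically, the following sketch computes the energy fraction of equation (12.4) for a band of singular values (indices are 1-based, as in the text); this is an assumption-laden toy helper of my own naming, not code from the chapter.

```python
import numpy as np

def bandpass_energy_fraction(X, p, q):
    """Fraction of total energy captured by eigenimages p..q, eq. (12.4)."""
    s = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    return (s[p - 1:q] ** 2).sum() / (s ** 2).sum()
```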

12.2.1 Eigenimages and the KL Transformation

As we have seen, decomposition of an image X into eigenimages is performed by means of the SVD of X. Many authors also refer to this decomposition as the Karhunen-Loève or KL transformation. We believe, however, that the SVD and KL approaches are not theoretically equivalent for image processing and, in order to avoid confusion, we suggest the adoption of the term eigenimage processing. Some clarification is in order.
A wide-sense stationary process x(t) allows the expansion

$$\hat{x}(t) = \sum_{n=1}^{\infty} c_n \phi_n(t) , \qquad 0 \le t \le T , \qquad (12.5)$$

where φ_n(t) is a set of orthonormal functions in the interval (0, T) and the coefficients c_n are random variables. The Fourier series is a special case of the expansion given by equation (12.5), and it can be shown that, in this case, x̂(t) = x(t) for every t and the coefficients c_n are uncorrelated only when x(t) is mean-squared periodic. Otherwise, x̂(t) = x(t) only for 0 ≤ t ≤ T, and the coefficients c_n are no longer uncorrelated. In order to guarantee that the c_n are uncorrelated and that x̂(t) = x(t) for every t without the requirement of mean-squared periodicity, it turns out that the φ_n(t) must be determined from the solution of the integral equation

$$\int_0^T R(t_1, t_2)\, \phi(t_2)\, dt_2 = \lambda\, \phi(t_1) , \qquad 0 < t_1 < T , \qquad (12.6)$$
Figure 12.2. An example showing an abrupt change in eigenvalue magnitude: (a) synthetic seismic section with noise, (b) magnitudes of the resulting eigenvalues, (c) first eigenimage, (d) second eigenimage, and (e) sum of the first two eigenimages.
where R(t_1, t_2) is the autocovariance of the process x(t). Substituting the eigenfunctions that are the solutions of equation (12.6) into equation (12.5) gives the KL expansion of x(t). An infinite number of basis functions is required to form a complete set. However, for an N × 1 random vector x the dimensionality N is finite, and we may write equation (12.5) in terms of a linear combination of N orthonormal basis vectors w_i = (w_{i1}, w_{i2}, …, w_{iN})^T as

$$x_k = \sum_{i=1}^{N} y_i w_{ik} , \qquad k = 1, 2, \ldots, N , \qquad (12.7)$$

which is equivalent to
$$x = W y , \qquad (12.8)$$

where W = (w_1, w_2, …, w_N). Now only N basis vectors are required for completeness. The KL transformation or, as it is also often called, the transformation to principal components, is obtained as

$$y = W^T x , \qquad (12.9)$$

where W is determined from the covariance matrix C_x of the process,

$$C_x = W \Lambda W^T , \qquad (12.10)$$

and Λ is the diagonal matrix of ordered eigenvalues.


A particularly interesting property of the transformation described by
equation (12.9) is its relationship to the entropy of a stochastic process. First
of all, define the entropy of a process x, assuming jointly Gaussian variables
with zero mean. Defining f(x) to be the Gaussian density function, it follows
that the entropy H(x) is given by
$$H(x) = -\int f(x) \ln f(x)\, dx = \int f(x) \left[ \frac{N}{2} \ln(2\pi) + \frac{1}{2} \ln |C_x| + \frac{1}{2} x^T C_x^{-1} x \right] dx ,$$
where |C_x| is the determinant of C_x. Since the integral is identically the expectation of the bracketed terms,

$$E[x^T C_x^{-1} x] = E[\operatorname{tr}(C_x^{-1} x x^T)] = \operatorname{tr} I = N ,$$

where E[·] is the expectation operator, and 1 = ln(e), we obtain the final expression

$$H(x) = \frac{1}{2} \ln |C_x| + \frac{N}{2} \ln(2\pi e) .$$

Since the determinant of a matrix is equivalent to the product of its eigenvalues, the above equation may be written as

$$H(x) = \frac{1}{2} \sum_{i=1}^{N} \ln(\lambda_i) + \frac{N}{2} \ln(2\pi e) ,$$
which defines the entropy of the Gaussian process x.
Using the above definition of the entropy, Young and Calvert (1974) show that, given a positive definite matrix W, maximizing the entropy of y in equation (12.9), subject to w_i^T w_i = 1, i = 1, 2, …, N, results in W = U, where U is obtained from the eigendecomposition XX^T = U Σ² U^T. In other words, the principal component transformation constrains y to have maximum entropy.
Let us now turn our attention to the problem of the KL transformation for multivariate statistical analysis. In this case, we consider M row vectors x_i, i = 1, …, M, arranged as rows in an M × N data matrix X. The M rows of the data matrix are viewed as M realizations of the stochastic process x and, consequently, the assumption is that all rows have the same row covariance matrix C_r. The KL transform applied to X now gives

$$(y_1, y_2, \ldots, y_N) = Y = W^T X . \qquad (12.11)$$

An unbiased estimate of the row covariance matrix is given by

$$\hat{C}_r = \frac{1}{M-1} \sum_{i=1}^{M} x_i x_i^T = \frac{1}{M-1} XX^T = \frac{1}{M-1} U \Sigma^2 U^T , \qquad (12.12)$$
assuming a zero-mean process for convenience. Since the factor M − 1 does not influence the orthonormal eigenvectors, we can see from equation (12.12) and the definition of U that W = U. Consequently, we can rewrite equation (12.11) as

$$Y = U^T X . \qquad (12.13)$$

Substituting equation (12.2) into equation (12.13), we obtain

$$Y = U^T U \Sigma V^T = \Sigma V^T . \qquad (12.14)$$

The principal components contained in the matrix Y may be viewed as the inner product of the eigenvectors of XX^T with the data, or as the weighted eigenvectors of X^T X.
Since X may be reconstructed from the principal component matrix Y by the inverse KL transformation

$$X = U Y , \qquad (12.15)$$

we may combine equations (12.14) and (12.15) to obtain

$$X = U \Sigma V^T . \qquad (12.16)$$

Equation (12.16) is identical to equation (12.2), showing that, provided we are considering a multivariate stochastic process, the SVD and the KL transformation are computationally equivalent.
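A quick numerical sanity check of this equivalence is easy to write; the snippet below (illustrative, not from the chapter) verifies that Y = U^T X of equation (12.13) equals ΣV^T of equation (12.14) when U, Σ, and V come from the SVD of X.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 50))      # M traces, N samples per trace

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Y_kl = U.T @ X                        # KL principal components, eq. (12.13)
Y_svd = np.diag(s) @ Vt               # Sigma V^T, eq. (12.14)

assert np.allclose(Y_kl, Y_svd)       # the two agree exactly
```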
We turn now to the problem at hand, which is image processing. In this
instance the situation encountered is essentially different. From a stochastic
point of view, we now have one realization, X, of a two-dimensional random
process. The KL transformation and the SVD of X will be the same only if we
assume the separability of the covariance matrix of the process into a product
of the covariance matrices of the rows and columns of the image, and if these
covariance estimates are computed from the one realization that is our image
(Gerbrands, 1981). It is clear that, in this case, the use of stochastic terminology, like the KL transformation, is not appropriate. On the other hand, eigenimage decomposition, which depends on a deterministic decomposition using


the SVD, is perfectly descriptive. It should also be pointed out that in the case
when the SVD and KL decompositions are equivalent, as in the situation
described above, the KL transformation is generally referred to as the zero-lag
KL transformation since it is computed from only the zero-lag covariance
matrix. Applications of the full KL transformation to geophysical image
enhancement have been discussed by Marchisio et al. (1988).

12.2.2 Eigenimages and the Fourier Transform

An interesting aspect concerning eigenimages, which is also relevant in the comparison of eigenimage and velocity filtering in both the t-x and the f-k domains, is the behavior of eigenimages under Fourier transformation. To this end, we take the Fourier transform (FT) of equation (12.1) to obtain

$$F_2[X] = \sum_{i=1}^{r} \sigma_i F_2(u_i v_i^T) , \qquad (12.17)$$

where F_2[·] represents the 2-D FT. It is clear from equation (12.17) that the f-k representation of X may also be viewed in terms of eigenimages in that domain. Further, since the M rows of the ith eigenimage (u_{ki} v_i^T, k = 1, …, M) are equal to within a scale factor, we may write

$$F_2[u_i v_i^T] = F_1[u_i]\, F_1[v_i]^T ,$$

where F_1[·] represents the 1-D FT. Consequently we may write equation (12.17) as

$$F_2[X] = \sum_{i=1}^{r} \sigma_i F_1[u_i]\, F_1[v_i]^T . \qquad (12.18)$$

This expression demonstrates that the 2-D FT of any matrix X may be


obtained as the weighted sum of 1-D FTs of the eigenvectors associated with
XXT and XTX. In the case when X may be sufficiently well reconstructed from
only the first few eigenimages, its 2-D FT may be efficiently computed using
only a small number of 1-D FTs.
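The following sketch (a hypothetical helper, assuming NumPy FFT conventions) checks equation (12.18) directly: the 2-D FT of X is assembled from 1-D FTs of the singular vectors, so a low-rank X needs only a few 1-D FTs.

```python
import numpy as np

def f2_from_eigenimages(X, rank=None):
    """2-D FT of X as a weighted sum of 1-D FTs of u_i and v_i, eq. (12.18)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = len(s) if rank is None else rank
    Fu = np.fft.fft(U[:, :r], axis=0)   # 1-D FTs of the column vectors u_i
    Fv = np.fft.fft(Vt[:r, :], axis=1)  # 1-D FTs of the row vectors v_i^T
    return (Fu * s[:r]) @ Fv            # sum_i sigma_i F1[u_i] F1[v_i]^T

X = np.random.default_rng(1).standard_normal((8, 64))
assert np.allclose(f2_from_eigenimages(X), np.fft.fft2(X))  # full rank: exact
```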
12.2.3 Computing the Filtered Image

Finally, we turn to the problem of the actual computation of the eigenimage-filtered section. Even with the availability of efficient algorithms, the computation of the full decomposition may be a time-consuming task. This is particularly true in the seismic case, where the dimension N may be large. Fortunately, in our case the dimension M is often considerably less than N, and we are also concerned with the reconstruction of X from only a few eigenimages. Consequently, we can reconstruct the filtered section, X_LP say, rapidly by computing only those eigenvectors of the (M × M) matrix XX^T which enter into the summation in equation (12.1). In order to make the derivation quite general, we will concern ourselves with the construction of a general X_BP, a band-pass SVD data matrix, using the singular values σ_p, σ_{p+1}, …, σ_q, where p ≥ 1, q ≤ r, and r is the rank of the matrix. We wish to compute X_BP without the necessity of computing the complete SVD of the data matrix X. Using equation (12.2), the band-pass matrix X_BP is given by

$$X_{BP} = U_{BP} \Sigma_{BP} V_{BP}^T , \qquad (12.19)$$

where U_BP, V_BP, and Σ_BP are equal to U, V, and Σ, respectively, with the exception that the first p − 1 and the last r − q columns of each matrix are zeroed. Without loss of generality we consider the case where M, the number of traces, is less than N, the number of time samples per trace, and we compute the covariance matrix XX^T of smaller dimension, which allows the decomposition

$$XX^T = U \Sigma^2 U^T .$$

With the availability of U we compute U_BP^T X, which, following equation (12.2), may be written as

$$U_{BP}^T X = U_{BP}^T U \Sigma V^T . \qquad (12.20)$$

Owing to the orthonormality of the eigenvectors it follows that

$$U_{BP}^T U \Sigma = \Sigma_{BP} , \qquad (12.21)$$

and substituting equation (12.21) into equation (12.20), we obtain

$$U_{BP}^T X = \Sigma_{BP} V^T . \qquad (12.22)$$

Because of the zero values along the diagonal of Σ_BP, it is clear that

$$\Sigma_{BP} V^T = \Sigma_{BP} V_{BP}^T .$$

Using the above expression in equation (12.22),

$$U_{BP}^T X = \Sigma_{BP} V_{BP}^T . \qquad (12.23)$$

Finally, substituting equation (12.23) into equation (12.2), we obtain

$$X_{BP} = U_{BP} U_{BP}^T X . \qquad (12.24)$$

It is interesting to note from equation (12.24) that in the case when p = 1 and q = r = M,

$$X_{LP} = U_{LP} U_{LP}^T X = X .$$

It should be realized, however, that the product U_LP U_LP^T is not, in general, equal to the identity matrix. The extension of the above discussion to the complex case is straightforward, requiring only that the transpose of a vector or matrix be replaced by the complex conjugate transpose.
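Equation (12.24) translates directly into a few lines of NumPy; the sketch below (illustrative names; p and q are 1-based as in the text) filters a section using only the eigenvectors of the small M × M matrix XX^T, never forming the full SVD.

```python
import numpy as np

def eigenimage_bandpass(X, p, q):
    """Band-pass eigenimage filter X_BP = U_BP U_BP^T X, eq. (12.24)."""
    lam, U = np.linalg.eigh(X @ X.T)      # eigenpairs of the M x M matrix
    U = U[:, np.argsort(lam)[::-1]]       # reorder eigenvectors, descending
    U_bp = U[:, p - 1:q]                  # keep columns p..q (1-based)
    return U_bp @ (U_bp.T @ X)

# Low-pass example: keep the first two eigenimages of a section X.
# X_lp = eigenimage_bandpass(X, 1, 2)
```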

12.3 Applications
In this section we will illustrate, with synthetic and real data examples, some of the applications of eigenimage processing to seismic sections and, where applicable, we will compare the results with those obtained using f-k filtering. In particular, we are interested in addressing such issues as signal-to-noise enhancement, wavefield decomposition, and residual static correction.

12.3.1 Signal-to-Noise Enhancement

We consider a very simple example which serves to illustrate very clearly the fundamental difference between the eigenimage and f-k approaches. The example considered, which is shown in Figure 12.3a, represents a synthetic stacked
section of 30 traces constructed from a 1-D model containing three horizontal
layers. Only primary reflections are considered, and the signal has been corrupted with additive white noise with a standard deviation equal to 20% of
the maximum signal amplitude. Figure 12.3b shows the 2-D amplitude spectrum of the noiseless input and is indicative of the horizontal character of
the input. To separate the signal from the background noise, we applied an f-k filter with a cutoff of −1 ms/trace to +1 ms/trace. The 2-D amplitude spectrum of the output is shown in Figure 12.3c. Figure 12.3d illustrates the 2-D amplitude spectrum of the output of the eigenimage filter with q = 1, i.e., we
have used only the first eigenimage. Simple visual inspection shows immediately the close similarity between the actual noiseless signal spectrum and that
obtained using eigenimage decomposition. The eigenimage filter has recovered almost the exact f-k signature of the input, including the symmetry with
respect to the f-axis. The f-k filter has certainly increased the S/N, but at the
expense of signal distortion. Another quite striking view of the two different
filtering schemes is illustrated in Figures 12.3e–g, which show the 2-D amplitude spectra of the actual noise and the noise rejected by the f-k and the eigenimage filters, respectively. As can be seen, the spectrum of the noise output
from the eigenimage filter is almost exactly that of the input, whereas the spectrum of the noise rejected by the f-k filter shows clearly that this filter cannot
separate noise which occupies the same f-k band as the signal. The difference is
that whereas the f-k filter rejects f-k components, the eigenimage filter rejects
uncorrelated components.
To gain deeper insight into the eigenimage filter, it is interesting to consider this example in a little more detail. Let S and N represent the noiseless
data matrix, which is composed of M identical traces, and the noise matrix
which is uncorrelated with the data, respectively. The input matrix is X = S + N, as illustrated in Figure 12.3a. Since S is of rank one, it may be reconstructed from the first eigenimage. Using the reconstruction

$$X_{LP} = U_{LP} U_{LP}^T X ,$$

we may write

$$S = u_1 u_1^T S ,$$

where u_1 is the eigenvector associated with the largest eigenvalue and, in this particular case, the matrix u_1 u_1^T is composed of elements each equal to 1/M.
Figure 12.3. Amplitude spectra of a synthetic stacked seismic section: (a) synthetic seismic section, (b) amplitude spectrum of the noiseless section, (c) amplitude spectrum of (a) after f-k filtering, and (d) amplitude spectrum after eigenimage filtering using the first eigenimage.
Figure 12.3. Amplitude spectra of a synthetic stacked seismic section (continued): (e) amplitude spectrum of the noise comprising the seismic section, (f) amplitude spectrum of the noise rejected by the f-k filter used for (c), and (g) amplitude spectrum of the noise rejected by the eigenimage filter for (d).

Since N is composed of traces of uncorrelated random noise, the data covariance matrix C_x is a sum of the signal and noise covariance matrices, i.e.,

$$C_x = C_s + C_n = C_s + \sigma_N^2 I ,$$

where I is the M × M identity matrix and σ_N² is the noise variance. Consequently,
$$C_x u = \lambda u , \qquad (C_s + C_n) u = \lambda u , \qquad (12.25)$$

and

$$C_s u = (\lambda - \sigma_N^2) u . \qquad (12.26)$$

From equations (12.25) and (12.26) it is evident that the eigenvectors of XX^T are insensitive to white noise, while the eigenvalues are increased by σ_N². Consequently, the first-eigenimage reconstruction of X is u_1 u_1^T X and

$$u_1 u_1^T X = u_1 u_1^T S + u_1 u_1^T N = S + u_1 u_1^T N . \qquad (12.27)$$

Equation (12.27) shows the behavior of the SVD filter very clearly. The
signal is recovered fully and the noise is suppressed in a manner equivalent to
an average stack. In the more general case, when the signal may vary from
trace to trace, the rank of S is greater than one and the noise will be suppressed
by an optimally weighted average. This average is optimum in the sense that the weight vectors u_i are obtained as a result of a maximum-variance criterion. We have computed the S/N for this example using the following expression:

$$S/N = \operatorname{Tr}\{ S S^T \} \, / \, \operatorname{Tr}\{ (Y - S)(Y - S)^T \} ,$$
where Y is the filtered matrix and Tr{·} represents the trace of the matrix. The
actual S/N of the input was 0.6. The values computed for the two filtering
schemes were 3.20 for the f-k filter and 4.24 for the eigenimage filter, a 5-dB
improvement.
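For synthetics, where the noiseless section S is available, this measure is essentially one line of NumPy; the helper below is an illustrative sketch, not the authors' code.

```python
import numpy as np

def snr(S, Y):
    """S/N = Tr{S S^T} / Tr{(Y - S)(Y - S)^T} for a filtered section Y."""
    resid = Y - S
    return np.trace(S @ S.T) / np.trace(resid @ resid.T)
```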

12.3.2 Wavefield Decomposition

Application of eigenimages to the problem of wavefield decomposition entails the formulation of a model of the input data. The model we consider is
equivalent to the model used by many authors in the application of spectral


matrix techniques to the same problem.
12.3.2.1 Event Identification
Specifically, we consider a seismic section which is composed of L events, L ≤ M, corrupted with additive random noise. Each of the L events has constant dip across the section, and s_ik(t), the ith event on the kth trace, may be represented as

$$s_{ik}(t) = a_{ik}\, w_i\!\left( t - \tau_i - (k - 1)\,\Delta X / V_i \right) , \qquad (12.28)$$

where a_ik is the amplitude of the ith event on the kth trace, w_i is the wavelet associated with the ith event, and τ_i is the time of occurrence of the ith event on the first trace. The spatial interval is ΔX, and V_i is the apparent velocity of the ith event.
Each trace in the section is now represented by
$$x_k(t) = \sum_{i=1}^{L} s_{ik}(t) + n_k(t) , \qquad (12.29)$$

where n_k(t) represents the random noise on the kth trace.


Although, as we will see later, this linear moveout model is not essential
for eigenimage processing, it is a very convenient one for the discussion which
follows.
As discussed in the theoretical section and demonstrated in Figure 12.2,
the parsimonious distribution of the singular values associated with X depends
on the vertical coherency associated with this matrix. In consequence, the
application of eigenimage decomposition for the purpose of event separation
always requires a preprocessing of data in the form of a time shifting operation
such that the required signal is placed in an approximately vertical position
(viewing the rows of X as constituting the M traces of the section). Assuming
the model given by equations (12.28) and (12.29), for each event we must obtain a priori information about τ_i and V_i and then perform the required
time shift. The a priori information may be obtained either visually or by
means of some type of coherency measure. Since eigenimage processing
involves the computation of the eigenvalue structure of the row covariance
matrix of the data, we will consider a coherency measure based on these eigenvalues.
It is evident from the discussion concerning equations (12.25), (12.26), and (12.27) that, for an input section consisting of one event only [L = 1 in equation (12.29)], which has been aligned to simulate zero moveout, the ideal eigenvalue distribution of the row covariance matrix consists of the major eigenvalue λ_1 = σ_S² + σ_N², where σ_S² is the variance of the signal, together with M − 1 eigenvalues λ_i = σ_N² for 2 ≤ i ≤ M. Various measures which indicate the presence of signal immediately suggest themselves. For example, we can define measures K_1, K_2, and K_3 given by

$$K_1 = \frac{\lambda_1}{\sum_{i=1}^{M} \lambda_i} , \qquad K_2 = \frac{\lambda_1}{\sum_{i=2}^{M} \lambda_i} , \qquad K_3 = \frac{\lambda_1}{\lambda_2} .$$
These measures were applied by Ulrych et al. (1983) and Jones (1985) in velocity analysis of CMP sections. As pointed out by Key and Smithson (1990), however, although important, these measures fail to take into account the presence of the noise variance in the energy estimate. In an attempt to overcome this shortcoming, Key and Smithson (1990) present a measure which appears to have high sensitivity and resolution. We briefly outline this approach here and compare the various measures from the point of view of event identification and separation.
First, since C_r = X^T X is only an estimate of the true covariance matrix of the data, an estimate of the noise variance is determined from the eigenvalues of C_r as

$$\hat{\sigma}_N^2 = \frac{1}{M-1} \sum_{i=2}^{M} \lambda_i .$$

Using this estimate, an estimate of the signal variance is computed as

$$\hat{\sigma}_S^2 = \lambda_1 - \hat{\sigma}_N^2 ,$$
giving an estimated S/N

$$S/N = \frac{\hat{\sigma}_S^2}{\hat{\sigma}_N^2} = \frac{(M-1)\,\lambda_1 - \sum_{i=2}^{M} \lambda_i}{\sum_{i=2}^{M} \lambda_i} . \qquad (12.30)$$
Due to the sensitivity of the eigenvalues to the presence of signal, Key and Smithson (1990) formulate a weighting function W_ML, a log-generalized likelihood ratio which tests the hypothesis that no signal is present:

$$W_{ML} = MN \ln \frac{ \frac{1}{M} \sum_{i=1}^{M} \lambda_i }{ \left( \prod_{i=1}^{M} \lambda_i \right)^{1/M} } . \qquad (12.31)$$

The point behind W_ML is that, given precise knowledge of C_r, in the presence of noise only, λ_i = σ_N² for i = 1 to M, and hence W_ML = 0. In the presence of signal only, λ_1 = σ_S², λ_j = 0 for j ≥ 2, and W_ML → ∞. W_ML thus provides a strong discriminant in the presence of signal. Key and Smithson (1990) combine equations (12.30) and (12.31) to obtain a new coherency measure K_ML given by

$$K_{ML} = W_{ML} \cdot S/N . \qquad (12.32)$$

If the events are indeed linear, M and N in equations (12.30) and (12.31) may
be taken to represent the full seismic section. Since, in practice, events may
often show nonlinear moveout, KML is computed using suitable windows
which then define the indices M and N.
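All of these measures are simple functions of the windowed eigenvalues; the sketch below (illustrative, with names of my choosing) computes K_1, K_2, K_3, the S/N of equation (12.30), and K_ML for one window, assuming noisy data so that all eigenvalues are positive.

```python
import numpy as np

def coherency_measures(window):
    """window: (M, N) array of aligned traces in the analysis window."""
    M, N = window.shape
    lam = np.linalg.eigvalsh(window @ window.T)[::-1]   # descending eigenvalues
    K1 = lam[0] / lam.sum()
    K2 = lam[0] / lam[1:].sum()
    K3 = lam[0] / lam[1]
    snr = ((M - 1) * lam[0] - lam[1:].sum()) / lam[1:].sum()        # eq. (12.30)
    w_ml = M * N * np.log(lam.mean() / np.exp(np.log(lam).mean()))  # eq. (12.31)
    return K1, K2, K3, w_ml * snr                                   # last is K_ML
```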
An example of the use of the measures we have discussed for event identification is illustrated in Figure 12.4. Figure 12.4a shows an input section composed of two equal amplitude dipping events of opposite polarity with
additive random noise. This example is close to the one used by Rutty and
Jackson (1992) in wavefield decomposition using spectral matrix techniques.
The dips are twice the sample interval and the S/N is 20. Figure 12.4b shows
the eigenvalue variation as a function of dip and illustrates very clearly the
philosophy behind the various measures which we compare. Specifically,
Figures 12.4c–e show the measures K_1, K_2, and K_3, respectively. The various components of the Key and Smithson (1990) measure, K_ML, are shown in Figures 12.4f–h.
Although K_ML certainly exhibits far superior resolution to the other three
measures, the problem is that this measure assumes that only one event exists
in the window of interest. The weighting function given by equation (12.31)
completely dominates the measure, and the information in the S/N, computed using equation (12.30), is swamped. In this particular instance, the best
Figure 12.4. An example of seismic event detection: (a) two dipping, crossing seismic events with noise added, (b) eigenvalue variation versus dip.
Figure 12.4. An example of seismic event detection (continued): (i) dipping event 1 recovered by low-pass eigenimage filtering, (j) dipping event 2 recovered by low-pass eigenimage filtering, (k) sum of the dipping events in (i) and (j).
measure is K_3, but much work in this regard needs to be done. One path we
are at present pursuing is the definition of norms based on the concept of
entropy.
The actual separation of the dipping events using low-pass eigenimage filtering is shown in Figures 12.4i and j. We note the low amplitudes in the middle of the reconstructed section which correspond to destructive interference
of the events as seen in the input data. The improvement in the S/N is shown
in Figure 12.4k which is the sum of the two low-passed reconstructed events.
12.3.2.2 Vertical Seismic Profiling
A classic example of the use of eigenimage decomposition is in application
to vertical seismic profiling (VSP) for the purpose of separation of up and
downgoing waves (Freire and Ulrych, 1988). The processing sequence is illustrated schematically in Figure 12.5. Figure 12.5a is the input section showing the up- and downgoing events, and Figure 12.5b is the same section following time shifting of each trace by appropriate amounts. The time shifting may be performed either in the time domain, which will in general require some type of interpolation, or, as is our preference, in the frequency domain by operating on the phase of each trace. This latter approach requires two FFTs per trace, but obviates interpolation. Figures 12.5c and d show the shifted up- and downgoing waves, respectively, which are the outputs of band-pass and low-pass eigenimage filtering. The values of p and q required for this purpose [equation (12.3)] are determined from an examination of the relative magnitudes of the eigenvalues of the matrix XX^T, as previously discussed. The final steps in the processing are illustrated in Figures 12.5e–g and consist of time shifting the recovered up- and downgoing waves to their original positions, transferring the upgoing components into two-way traveltime using computed first breaks, and stacking to produce a final trace.
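A phase-ramp time shift of this kind is a one-liner per trace; the sketch below (illustrative names, NumPy FFT conventions assumed) delays a trace by an arbitrary, possibly non-integer, amount without time-domain interpolation.

```python
import numpy as np

def shift_trace(trace, shift_s, dt):
    """Delay `trace` by shift_s seconds via a linear phase ramp:
    x(t - t0) <-> exp(-i 2 pi f t0) X(f)."""
    f = np.fft.fftfreq(len(trace), d=dt)        # frequency axis in Hz
    spectrum = np.fft.fft(trace)
    return np.fft.ifft(spectrum * np.exp(-2j * np.pi * f * shift_s)).real
```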
An example from the state of Bahia in Brazil is shown in Figure 12.6. Figure 12.6a shows the input VSP section, and Figure 12.6b illustrates the time-shifted section reconstructed from the first four eigenimages. This section is in fact the X_LP reconstruction with q = 4 and represents the downgoing waves. Figure 12.6c is X_BP with p = 5, q = 41, shifted to the two-way traveltime and low-pass reconstructed with p = 6. This figure represents the separated and signal-to-noise-enhanced upgoing wave section and compares with the f-k fan-filtered upgoing wave section shown in Figure 12.6d. Striking
Figure 12.5. A schematic illustration of the process used to separate up- and downgoing VSP events: (a) input VSP section, (b) time-shifted version of (a), flattening the downgoing events, (c) time-shifted upgoing events after band-pass eigenimage filtering, (d) time-shifted downgoing events after low-pass eigenimage filtering, (e) downgoing events after reversing the time shift, (f) upgoing events after reversing the time shift.
and important differences may be seen, particularly in the first 2.2 s of the sections. Of special interest is the reflection at 1.9 s which has been very well
recovered in the eigenimage section whereas f-k fan-filtering was unable to do
so.
In comparing eigenimage and f-k filtering, it is important to point out that whereas, due to the periodic nature of the FT, spatial aliasing occurs in f-k filtering whenever the maximum spatial frequency exceeds 1/(2Δx), where Δx is the spatial sampling interval, eigenimage reconstruction does not entail a periodicity assumption and, consequently, spatial aliasing does not arise.

12.3.3 Residual Static Correction

A very interesting application of eigenimage decomposition occurs when the data matrix is mapped into the complex domain by forming the analytic data matrix. In this application, we are concerned not only with time shifts in the data but also with phase shifts that may occur in practice for various reasons, as discussed by Levy and Oldenburg (1987). The philosophy behind this approach is the assumption that, under certain conditions, frequency-independent phase shifts in the seismic pulse may be viewed as being approximated by equivalent time shifts.
approach is the assumption that, under certain conditions, frequency independent phase shifts in the seismic pulse may be as being viewed approximated by equivalent time shifts.
Let $X(f) = \mathcal{F}[x(t)]$ represent the FT of $x(t)$. Then,

$$\mathcal{F}[x(t - t_o)] = e^{-i 2\pi f t_o}\, X(f).$$

We now assume that in the narrow seismic bandwidth

$$2\pi f t_o \approx 2\pi f_o t_o = \phi.$$
This approximation has been used by Levy and Oldenburg (1987) very
successfully in modeling residual wavelet phase variations. We can now write
$$\mathcal{F}[x(t - t_o)] \approx \begin{cases} e^{-i\phi}\, X(f), & f \ge 0 \\ e^{+i\phi}\, X(f), & f < 0 \end{cases}$$

and hence

$$\mathcal{F}[x(t - t_o)] \approx (\cos\phi - i\,\mathrm{sgn}(f)\sin\phi)\, X(f), \quad \text{all } f,$$
transforming back to time,

$$x(t - t_o) \approx \Re\left[\tilde{x}(t)\, e^{-i\phi}\right], \qquad (12.33)$$

where $\tilde{x}(t) = x(t) + iH[x(t)]$ is the analytic signal and $H[\,\cdot\,]$ represents the Hilbert transform.

Figure 12.6. An example of separating up- and down-going VSP events: (a) input VSP section, (b) time-shifted section reconstructed from the first four eigenimages, the estimate of the down-going events, (c) estimate of up-going waves from band-pass eigenimage filtering, (d) up-going waves estimated by f-k filtering.

Equation (12.33) shows that the frequency-independent phase shift φ may,
under the assumption of a narrow bandwidth, be approximated by the time
shift t_o. From the point of view of residual phase shifts in the recorded seismic
wavelet w(t), equation (12.33) may be written as

$$w(t) = \Re\left[\tilde{w}_o(t)\, e^{i\phi}\right],$$

where $\tilde{w}_o(t)$ is the analytic true wavelet.
We now consider an example of the correction of residual statics when the
seismic wavelet contains a degree of frequency independent phase variation.
The method we have developed is a two-stage approach. In the first stage, we
remove the time shift associated with each individual trace that is determined
by crosscorrelation with the first complex principal component of the section.
This component is determined from the complex form of equation (12.9) as
y = u_1^H X, where u1 is the major eigenvector of XX^H, and the superscript H designates
the complex conjugate transpose. In the second stage, any frequency-independent phase
shifts are removed by rotating each trace with the phase determined from u1,
which contains the information about φ, and the final section is produced as a
low-pass eigenimage reconstruction.
The results of the residual static correction method are shown in
Figure 12.7. Figure 12.7a shows a synthetic section which is composed of one
real seismic trace repeated 24 times. Figure 12.7b shows the synthetic input
section which consists of the traces of Figure 12.7a subjected to random time
and phase shifts and corrupted with different realizations of white noise with
S/N of 20. Figure 12.7c illustrates the residually corrected section using the
standard approach of cross-correlating against a stacked trace. Figure 12.7d
shows the residually corrected section of Figure 12.7c phase corrected using
the phase determined from the eigenvector associated with the major eigenvalue. Figures 12.7e and f show the results of the eigenimage approach.
Figure 12.7e illustrates the low-pass reconstructed section assuming only time
shifts and Figure 12.7f shows the final time and phase corrected and low-pass


reconstructed section. This example clearly demonstrates the importance of


the phase correction if, indeed, such phase shifts occur in the data.
An application of this technique to real data is shown in Figure 12.8.
Figure 12.8a shows the input data section which was obtained in Bahia, Brazil. The result of conventional residual static correction is illustrated in
Figure 12.8b. Eigenimage residual corrections are illustrated in the next two
figures. Figure 12.8c shows the time correction only, using crosscorrelation
with the first principal component, and Figure 12.8d shows the result of both
time and phase correction. Comparison of these figures clearly shows that the
time plus phase corrected eigenimage section exhibits better horizontal continuity and better definition of the fault located in the middle of the section.

12.4 Discussion
Eigenimage reconstruction is a nonlinear and data-dependent filtering
method which may be applied either in the real or complex domain to achieve
a number of important processing objectives. Since seismic sections are, in
general, highly correlated trace-to-trace, eigenimage reconstruction is a parsimonious representation in the sense that the data may be reconstructed from
only a few images. A natural consequence of such a reconstruction is, as we
have shown, an improvement of the S/N. Eigenimage reconstruction has a
capacity similar to f-k filtering to remove events that show an apparently different phase velocity on the section. The actual process by which this is
accomplished is quite different from that entailed in f-k filtering. In the former
approach, events are removed which do not correlate well from trace to trace.
In the latter, events are rejected which possess different f-k signatures. One of
the consequences of this difference is that eigenimage filtering is not subject to
spatial aliasing in the sense of f-k filtering. However, eigenimage reconstruction encounters difficulties similar to those of the f-k approach in separating
events with similar dips.
As we have mentioned, the linear moveout model is not essential in eigenimage processing. In fact, Ulrych et al. (1983) and Key and Smithson (1990)
employed a hyperbolic moveout model to estimate stacking velocities in shot
gathers. A very fine example of S/N enhancement using eigenimages on
stacked seismic sections which contain highly curved events has been recently
presented by Al-Yahya (1991) who applied the filter for selected dips and then
combined the results for all dips to produce the composite enhanced section.


Figure 12.7. A synthetic example of residual statics correction using eigenimage decomposition: (a) synthetic section formed by repeating one trace 24 times, (b) input section formed by subjecting the section in (a) to random time and phase shifts and adding white noise, (c) processed section with residuals estimated using the standard approach of correlating against a stacked trace, (d) section in (c) phase corrected using the phase estimate from the eigenvector associated with the major eigenvalue, (e) processed section using low-pass eigenimage reconstruction of section (b), assuming only time shifts, (f) phase correction determined from the major eigenvector, applied to section (b).


Figure 12.8. An example of residual statics correction using seismic data: (a) input data section, (b) application of conventional residual static correction.


Figure 12.8. An example of residual statics correction using seismic data (continued): (c) eigenimage time correction for residual statics, (d) eigenimage time and phase correction for residual statics.


Much of the subject matter in this book involves the use of the spectral
matrix. It is appropriate, therefore, that we comment on the correspondence
between eigenimage and spectral matrix techniques. The spectral matrix is
formed in the frequency domain as the complex covariance matrix at each frequency. For linear events in time, each frequency contains a sum of L harmonics associated with the L events on the section. Separation of the events is then
achieved by means of averaging in frequency and/or space using window functions which depend on the input data. This is possible because of the redundancy of information which exists in the frequency-space domain. Unlike the
eigenimage technique, a priori information concerning the dips of the events
is not required. However, as shown by the recent work of Rutty and Jackson
(1992), this entails certain penalties which manifest themselves as edge effects
and the necessity that events to be separated have different energies. These
points are well illustrated by a comparison of Figures 12.4i and 12.4j in this
chapter with Figures 15.3a and 15.3b in Rutty and Jackson (1992). These latter figures show the edge effects in the separated events, even when the events
have different energies, at the same time illustrating that, unlike the eigenimage approach, amplitudes of the events are recovered. At the time of this writing, we are engaged in research that involves combining the spectral matrix
and eigenvector methods to develop an approach that reflects the best of both
techniques.

12.5 References
Al-Yahya, K. M., 1991, Application of the partial Karhunen-Loève transform to suppress random noise in seismic sections: Geophys. Prosp., 39, 77-93.
Anderson, T. W., 1971, An introduction to multivariate statistical analysis: John Wiley & Sons, Inc.
Andrews, H. C., and Hunt, B. R., 1977, Digital image restoration: Prentice-Hall, Inc., Signal Processing Series.
Freire, S. L. M., and Ulrych, T. J., 1988, Application of singular value decomposition to vertical seismic profiling: Geophysics, 53, 778-785.
Freire, S. L., and Ulrych, T. J., 1990, An eigenimage approach to the attenuation of multiple reflections: Butsuri-Tansa, 43, 1-13.
Gerbrands, J. J., 1981, On the relationship between SVD, KLT, and PCA: Pattern Recognition, 14, 375-381.
Hemon, C. H., and Mace, D., 1978, Essai d'une application de la transformation de Karhunen-Loève au traitement sismique: Geophys. Prosp., 26, 600-626.
Jones, I. F., 1985, Applications of the Karhunen-Loève transform in reflection seismic processing: Ph.D. thesis, Univ. of British Columbia.
Jones, I. F., and Levy, S., 1987, Signal-to-noise ratio enhancement in multichannel seismic data via the Karhunen-Loève transform: Geophys. Prosp., 35, 12-32.
Key, S. C., and Smithson, S. B., 1990, New approach to seismic-reflection event detection and velocity determination: Geophysics, 55, 1057-1069.
Lanczos, C., 1961, Linear differential operators: D. Van Nostrand Co.
Levy, S., Jones, I. F., Ulrych, T. J., and Oldenburg, D. W., 1983, Applications of common signal analysis in exploration seismology: 53rd Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 325-328.
Levy, S., and Oldenburg, D. W., 1987, The deconvolution of phase shifted wavelets: Geophysics, 47, 1285-1294.
Loève, M., 1951, Probability theory: D. Van Nostrand Co.
Marchisio, G., Pendrel, J. V., and Mattocks, B. W., 1988, Applications of full and partial Karhunen-Loève transformation in geophysical image enhancement: 58th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 1266-1269.
Ready, R. J., and Wintz, P. A., 1973, Information extraction, S/N improvement and data compression in multispectral imagery: IEEE Trans. Communications, 21, 1123-1130.
Rutty, M. J., and Jackson, G. M., 1992, Wavefield decomposition using spectral matrix techniques: Exploration Geophysics, 23, 293-298.
Treitel, S., Shanks, J. L., and Frazier, C. M., 1967, Some aspects of fan filtering: Geophysics, 32, 789-800.
Ulrych, T. J., Freire, S. L., and Siston, P., 1988, Eigenimage processing of seismic sections: 58th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 1261-1265.
Ulrych, T. J., Levy, S., Oldenburg, D. W., and Jones, I. F., 1983, Applications of the Karhunen-Loève transformation in reflection seismology: 53rd Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 323-325.
Young, T. Y., and Calvert, T. W., 1974, Classification, estimation and pattern recognition: American Elsevier Publishing Co., Inc.
Chapter 13
Single-Station Triaxial Data Analysis
G. M. Jackson, I. M. Mason, S. A. Greenhalgh
13.1 Introduction
In a high signal-to-noise environment, a single triaxial geophone can provide estimates of the polarization state of a seismic arrival. Knowledge of the
mode of the event (e.g., P-wave) and elastic properties of the host rock can be
used to infer the direction of propagation. Alternatively, knowledge of the
direction of propagation can be combined with the polarization state to identify the various wave modes and characterize the elastic properties of the host
rock. Polarization analysis might, for example, allow body waves (rectilinear
polarization) to be distinguished from surface waves (elliptical polarization).
Polarization measurements are therefore important in many areas of seismology.
Polarization analysis can be achieved efficiently by treating a time window
of a single station triaxial recording as a matrix and doing a singular-value
decomposition (SVD) of this seismic data matrix. SVD is a standard matrix
algebra technique which produces both an eigenanalysis of the data covariance
(cross-energy) matrix and a rotation of the data onto the directions given by
the eigenanalysis (Karhunen-Loève transform). Before proceeding with the
SVD approach, however, it is necessary to discuss the selection of the data
entering the polarization analysis.

13.2 Time Windows in Polarization Analysis


An instantaneous triaxial signal does not provide polarization information.
It is necessary to observe the signal for some period of time. Consequently, all
polarization analysis is performed over a selected time window. A time window is implicit even in the extraction of "instantaneous polarization" using
the analytic signal, since the imaginary part of the analytic signal is the Hilbert
transform of a window of the real trace. The selection of the window around
the arrival of interest is critical to the success of polarization analysis.
Three requirements constrain the choice of analysis window: first, the
window should contain only one arrival; second, the window should be such
that the ratio of signal energy to noise energy is maximized; and third, the
window should be as long as possible so as to allow the discrimination of noise
from signal.
The linear superposition of many different seismic arrivals at a single triaxial geophone cannot, in general, be reversed, so unambiguous identification of
the interfering arrivals can only be achieved if there is a predictable relationship between the interfering waves (see, for example, Dankbaar, 1985). In
general, interpretation of the results of a polarization analysis requires some
confidence that only a single arrival was responsible for the data in the analysis
window. The observation of hodograms and correlation with any neighboring
stations are commonly used to detect interference.
Polarization estimates also lose their reliability with increasing noise levels.
Estimates of the minor axis for an elliptically polarized arrival are particularly
vulnerable to noise. As the ratio of minor-to-major axes decreases, this problem worsens. Since the instantaneous signal-to-noise power ratio varies with
time, the relative magnitude of signal energy and noise energy depends on the
window selected for analysis.
One usually has no prior knowledge of what is noise and what is signal.
Under these conditions it is necessary to define noise as the component of
variation that is uncorrelated between triaxial channels. Observation over a certain time interval is necessary to identify mutually orthogonal variations as such.
In practice, the windows are chosen to be the smaller of the arrival duration
and arrival separation. In the examples presented below, the window is deliberately extended beyond the duration of the synthetic arrivals for the purposes
of illustrating SVD.

13.3 The Triaxial Covariance Matrix


A triaxial station yields three traces, x_x, x_y, and x_z, that form the columns of the data matrix X (see Table 13.1 for all matrix definitions). Polarization
analysis is usually done by eigenanalysis of the cross-energy matrix M, which is
the inner product of the data matrix X with itself. See Flinn (1965), Montalbetti and Kanasewich (1970), Esmersoy (1984), and Jurkevics (1988) for
more detail. We have
$$M = X^T X = \sum_{i=1}^{n} \begin{bmatrix} x_x^2(i) & x_x(i)\,x_y(i) & x_x(i)\,x_z(i) \\ x_y(i)\,x_x(i) & x_y^2(i) & x_y(i)\,x_z(i) \\ x_z(i)\,x_x(i) & x_z(i)\,x_y(i) & x_z^2(i) \end{bmatrix}. \qquad (13.1)$$

Table 13.1. Definitions of matrices occurring in the SVD analysis of a triaxial recording.

X : The data matrix. The columns are the traces of the triaxial recording. The matrix is of dimension n × 3, where n is the number of samples in the window.

M : The cross-energy matrix (3 × 3) for the triaxial traces in the window. M is the matrix X^T X. Eigenanalysis of this matrix produces eigenvalues λ1, λ2, and λ3 corresponding to eigenvectors v1, v2, v3.

K : The Karhunen-Loève transform of the data matrix X. K (n × 3) is X rotated into the Cartesian frame v1, v2, v3.

U : An n × 3 matrix produced by SVD of matrix X. The columns of U are the eigenvectors of XX^T. Although XX^T is an n × n symmetric matrix, it is only of rank 3 and consequently has only 3 eigenvectors with nonzero eigenvalues.

W : A diagonal matrix (3 × 3) produced by SVD of X. The diagonal elements σ1, σ2, σ3 are the singular values. Each singular value is the square root of the corresponding eigenvalue of X^T X or XX^T (they share eigenvalues). Since X^T X = M, σ1, σ2, σ3 are the square roots of the eigenvalues λ1, λ2, λ3.

V : A 3 × 3 matrix produced by SVD of X. The columns of V are the eigenvectors of X^T X. Since X^T X = M, the columns of V are the eigenvectors v1, v2, v3 of the cross-energy matrix M.

The superscript T denotes matrix transposition.


Eigenanalysis of the cross-energy matrix M provides a principal-components analysis of the energy in the time window. The energy expectation in a triaxial trace can always be decomposed into three components corresponding to orthogonal (in time) variations along three mutually perpendicular axes. The magnitudes of these energy components are given by the eigenvalues (λ1, λ2, λ3) of the cross-energy matrix M, and the directions of the components are given by the corresponding eigenvectors (v1, v2, v3). The eigenvalues provide estimates of signal and noise energy and of the degree of rectilinearity of polarization (Esmersoy, 1984).
Polarization analysis is usually completed by rotating the data X into the
eigenvector frame (v1, v2, v3). The rotated data K are obtained by the matrix
multiplication:
$$K = XV. \qquad (13.2)$$

Matrix K is the Karhunen-Loève (KL) transform, or principal-component transform, of the data X. The rotated signal components k1, k2, k3 are mutually orthogonal over the window, i.e.,

$$k_1 \cdot k_2 = k_1 \cdot k_3 = k_2 \cdot k_3 = 0. \qquad (13.3)$$

Here singular-value decomposition of the triaxial data matrix X is used to


achieve this principal component transformation. Previous applications of
SVD and KL transforms to matrices composed of seismic data include the first
trace of the KL transform as an alternative to the stack trace (Hemon and
Mace, 1978), wavelet estimation and velocity analysis (Ulrych et al., 1983;
Levy et al., 1983), noise reduction on stack sections and multiple attenuation
(Jones and Levy, 1987), and upgoing/downgoing wavefield separation in VSPs
(Freire and Ulrych, 1988). Here the SVD of a triaxial recording is illustrated
using two synthetic examples: a linearly polarized arrival with noise in
Figure 13.1 and an elliptically polarized arrival (noise-free and with noise) in
Figures 13.2 and 13.3.

13.4 Principal Components Transforms by SVD


Singular-value decomposition allows the data matrix X to be expressed as the product of three matrices:

$$X = UWV^T. \qquad (13.4)$$
Figure 13.1. SVD analysis of a synthetic example of a linearly polarized arrival with added noise. The polarization simulates a P arrival with an elevation of 24 degrees and an azimuth of 116.5 degrees.


Figure 13.2. SVD analysis of a synthetic example of a noise-free elliptically polarized arrival.


Figure 13.3. SVD analysis of a synthetic example of an elliptically polarized arrival with noise added.


The columns of matrix V are v1, v2, v3: the eigenvectors of the cross-energy matrix M. W is a 3 × 3 diagonal matrix with the singular values σ1, σ2, σ3 as the diagonal elements. Each singular value σi is the positive square root of the corresponding eigenvalue λi of the cross-energy matrix M. Each column of matrix U is the same column of the rotated data matrix K divided by the corresponding singular value. This follows from

$$K = XV = UWV^T V = UW, \qquad (13.5)$$

using the SVD of X and the orthonormality of V. U is the signal matrix X after both projection into the Cartesian frame of the eigenvectors of the cross-energy matrix M and normalization by division with the singular values. Since the eigenvalues of M are the energies of the principal components in the window, the singular values are proportional to the amplitudes of the data principal components. Division of the columns of K by the singular values produces the n-dimensional unit vectors u1, u2, and u3 describing the time behavior of the data principal components.
The SVD of data matrix X can also be written as the sum of three eigenimages or principal components:

$$X = UWV^T = \sum_{i=1}^{3} \sigma_i\, u_i v_i^T = E_1 + E_2 + E_3. \qquad (13.6)$$

Each eigenimage is the outer product of a column of U with a column of V, weighted by the corresponding singular value. Each eigenimage is a principal component of the data (a trace of the KL transform) expressed in the Cartesian frame of the recording. The eigenimages are mutually orthogonal in that no one eigenimage can be reconstructed from a combination of the other two. Superposition or stacking of the eigenimages reconstructs the data X.


13.5 Analysis of the Results of SVD


In generating Figure 13.1, spikes of appropriate amplitude were passed
through a zero phase Gaussian band-pass filter to simulate a single arrival.
The portable random number generator "GASDEV" of Press et al. (1986, p. 203) was
used to generate noise uncorrelated between channels. The normally distributed random numbers were then passed through the same band-pass filter and
scaled to generate noise with an energy in the window equal to a quarter that
of the signal.
The two forms of SVD (X = UWV^T and X = E1 + E2 + E3) are presented graphically in all synthetic examples. The signal in Figure 13.1 is concentrated on the first trace of the KL transform and the corresponding first
eigenimage. The least-squares best estimate of the polarization direction is
recovered from the first column of the matrix V (top row of VT). Note that
the direction estimate in the presence of noise (-0.414, 0.834, 0.364) is not
exactly that used in the construction of the signal (-0.408, 0.817, 0.408). The
first trace of U multiplied by the first singular value (diagonal element of W) is
the first trace of K, which is the projection of the data along v1. The time variation σ1u1 along the direction v1 is the first principal component of the data.
The first eigenimage E1 is made up of the projections of that first principal
component along the Cartesian axes of the recording frame.
If a window contains one and only one seismic arrival, the most complex
signal that can arise is an elliptically polarized signal; i.e., the signal is contained in a plane defined by the axes of the polarization ellipse. The vectors v1
and v2 provide least-squares best estimates of the axes of the signal polarization
ellipse. The projection of the data onto v3 is regarded as pure noise, while the
projections of the data onto v1 and v2 consist of signal principal components added to noise of the same energy (over the analysis window) as the data projection along v3. The energies in the window of the data principal components along v1, v2, and v3 are σ1², σ2², and σ3². Hence the total noise energy is 3σ3², the energy of the signal along v1 is σ1² − σ3², the energy of the signal along v2 is σ2² − σ3², the total signal energy is σ1² + σ2² − 2σ3², and the signal-to-noise ratio (energy) is

$$\mathrm{Signal/Noise} = \frac{\sigma_1^2 + \sigma_2^2 - 2\sigma_3^2}{3\sigma_3^2}. \qquad (13.7)$$
If one has the a priori knowledge that the arrival is rectilinearly polarized,
the estimate of the noise energy along v2 becomes σ2², and the estimate of noise energy along v1 can be taken as (σ2² + σ3²)/2. The signal-to-noise ratio (energy) is then

$$\mathrm{Signal/Noise} = \frac{\sigma_1^2 - (\sigma_2^2 + \sigma_3^2)/2}{\sigma_3^2 + \sigma_2^2 + (\sigma_2^2 + \sigma_3^2)/2}. \qquad (13.8)$$

Since the singular values give estimates of the noise energy and of the
energy of the signal principal components, the singular values can be used to
calculate a probability as to (a) whether there is signal present (not just random noise) and (b) whether the signal is rectilinearly polarized (a body wave),
as opposed to elliptically polarized (a surface wave). F-tests are used to calculate the significance of the differences between energies.
First, we evaluate the null hypothesis that the data energy in the window is
the same as the noise energy. The ratio of the data energy (σ1² + σ2² + σ3²) to the noise energy for the more general elliptically polarized signal model (3σ3²) gives an F-ratio that is evaluated in terms of the number of independent samples N that contributed to the energies. Adapting the F-test of Press et al. (1986, p. 468), we have

$$F = \frac{\sigma_1^2 + \sigma_2^2 + \sigma_3^2}{3\sigma_3^2} \qquad (13.9)$$

and

$$Q(F \mid N) = I_{N/(N + NF)}(N/2,\ N/2), \qquad (13.10)$$

where Q(F | N) is the significance level of the null hypothesis that the energies are equal, given the ratio of the energies F and the number of independent samples N contributing to the energies. I_x(a, b) is the incomplete beta function (see
Press et al., 1986, p. 166). The number of independent samples is best
obtained by doing a discrete Fourier transform of the real data window and
counting the number of frequency samples with significant amplitude. Only if
the data are sampled at the Nyquist rate will the number of independent samples (N) be equal to the total number of samples in the analysis window (n).
Typically N is less than n.
If the null hypothesis has low significance in this first F-test, one can
assume that signal is present. It then becomes meaningful to ask whether the
residual energies (noise energy) of the rectilinear and elliptically polarized signal models are different. If the null hypothesis (that the energies are the same)
is significant, the data are rectilinearly polarized. The residual energy is
$$\sigma_3^2 + \sigma_2^2 + (\sigma_2^2 + \sigma_3^2)/2$$

for the rectilinearly polarized signal model and 3σ3² for the elliptical signal model, giving

$$F = \frac{\sigma_3^2 + \sigma_2^2}{2\sigma_3^2}.$$
Again, the F-ratio is evaluated in terms of the number of independent samples N that contributed to the energies.
Referring to Figure 13.1, the singular values in the presence of added noise
were 3.600, 1.145, and 0.868. This gives the total energy in the data window
as 15.024 (sum of the squares of the singular values) with a noise energy of
3.097 for the linearly polarized model and 2.260 for the elliptically polarized
signal model. The F-test gives a significance of only 0.0025% to the hypothesis that there is no signal present (an F-ratio of 0.15 on 24 independent samples). Given the near certainty of the presence of signal in the window, one
can then test the null hypothesis that the noise variance of the linearly polarized signal model is no greater than that of the elliptically polarized signal
model. The F-ratio of 1.370 on 24 independent samples gives a 46% significance level to the null hypothesis that the rectilinear and elliptically polarized
noise energies are the same. One can therefore say that the signal is rectilinear
with a confidence level of only 46%. Noise has degraded the certainty with
which one can say the signal is rectilinearly polarized.
The signal is unknown so noise energy has to be defined as that which is
uncorrelated between channels. Consequently, the energy identified as noise
on specific examples will show a stochastic variation representing the interaction of the true signal (unknown) with the true added noise (also unknown).
Significance levels must therefore be interpreted in the knowledge that, first, we cannot expect to be 100% certain from observation of a finite window and, second, that the values for specific examples will show stochastic variation.

Figure 13.4. Construction of the elliptically polarized arrival from two orthogonal signals.
The noise-free, elliptically polarized case was constructed by the superposition of two differentially phase-rotated, rectilinearly polarized arrivals
(Figure 13.4). The linear superposition that creates the recordings of multiple
arrivals is not reversible. It is not possible to say unambiguously which particular interfering events created a given recording. The pair that is produced as
the first two eigenimages by SVD analysis is the pair that are orthogonal (in
space and time). SVD of a noise-free, elliptically polarized arrival is shown in
Figure 13.2. Two nonzero principal components are produced. The orthogonal rectilinearly polarized arrivals used to construct the input are recovered as
the eigenimages only because they were orthogonal and of different energy.
Note that the polarization directions obtained from the columns of V are not
exactly those used to construct the input, even in the noise-free case. This is
produced by a lack of sensitivity of the projected power to direction in the
principal components analysis. In the extreme case of circular polarization,
the projected power is totally independent of the direction (within the plane
perpendicular to v3).


Figure 13.3 shows the addition of noise to the elliptically polarized example. The noise is the same as that used in the example of Figure 13.1. The first
principal component of the signal is still recognizable but the second component of the elliptically polarized signal is buried in the noise.
A comparison of the direction v1 in Figures 13.2 and 13.3 shows a 180-degree flip, which illustrates a fundamental ambiguity in single-station triaxial
recordings. It is not possible to distinguish the true arrival from an arrival of
opposite polarity arriving from the opposite direction. This ambiguity can
often be eliminated by other constraints: for example, an explosion must produce a compressional (not rarefactional) first motion, and the direction of a
wave recorded on the surface cannot be downgoing. In the absence of a priori
information, components of opposite polarity along antiparallel directions are
equivalent. The 180-degree flip must be removed before looking at the deviation due to the added noise. Note that the polarity of the first trace of U is
reversed accordingly.
The singular values for the noisy elliptically polarized example
(Figure 13.3) were 3.470, 0.990, and 0.803. Performing the F-test for the
presence of signal, we get a significance of only 0.0015% for the null hypothesis, strongly suggesting the presence of signal. The second F-test to distinguish rectilinear and elliptical polarization is, however, inconclusive. We get a
significance level for the null hypothesis of 58% from an F-ratio of 1.260 and
24 independent samples.
A tighter window around the arrival would have reduced the noise energy
in the window, while leaving the signal energy unchanged. The consequent
increase in signal-to-noise ratio would have reduced σ3 with respect to σ2, giving rise to a higher F-ratio and a decreased significance for the null hypothesis
(rectilinear polarization). One could then state with a higher level of confidence that the arrival was elliptically polarized. Note, however, that since
noise is defined as that which is uncorrelated between channels and since correlation over too short a window finds spurious correlation between independent time series, too short a window will give an estimate of noise that might
be less than the "true" noise. Noise in the short window is indistinguishable
from signal, so the signal estimate is corrupted.

13.6 Summary
Polarization analysis can therefore be efficiently achieved by treating a
time window of a single-station triaxial recording as a matrix and doing a singular-value decomposition (SVD) of this seismic data matrix. SVD of the triaxial data matrix produces an eigenanalysis of the data covariance (cross-energy) matrix and a rotation of the data onto the directions given by the
eigenanalysis (Karhunen-Loève transform), all in one step.
Singular-value decomposition offers a computationally efficient method
of analyzing seismic arrivals at a triaxial station. An eigenanalysis of the crossenergy matrix is produced along with a rotation of the data onto the principal
component directions given by the eigenanalysis:
1) The signal is contained in the plane perpendicular to the column v3 of V
corresponding to the smallest singular value (σ3).
2) The first and second columns v1, v2 of V provide least-squares best estimates of the axes of the signal polarization ellipse. These directions are
mutually perpendicular.
3) The squares of the singular values (found along the diagonal of matrix W)
give the energies of the three data principal components. The noise and
signal principal component energies can be inferred, allowing an F-test for
the significance of the hypothesis of rectilinear polarization, as well as the
calculation of signal-to-noise ratios.
4) The projection of the data along the principal axes of polarization is
obtained as the columns of matrix U multiplied by the corresponding singular values. This is the Karhunen-Love transform K of the data X.
5) The eigenimages produced by SVD are the projections of the data principal components along the Cartesian axes of the recording frame.
Thus SVD provides a complete principal-components analysis of the data
in the analysis time window. Selection of this time window is crucial to the
success of the analysis and is governed by three considerations: the window
should contain only one arrival, the window should be such that the signal-to-noise ratio is maximized, and the window should be long enough to allow the
discrimination of random noise from signal.
The SVD analysis provides estimates of signal, signal polarization directions, and noise. F-tests based on the singular values can be used to give confidence levels for hypotheses like the absence of signal, or rectilinear (versus
elliptical) polarization of the signal.


13.7 References
Dankbaar, J. W. M., 1985, Separation of P and S waves: Geophys. Prosp., 33, 970-986.
Esmersoy, C., 1984, Polarization analysis, orientation and velocity estimation in three component VSP, in Toksöz, M. N., and Stewart, R. R., Eds., Vertical seismic profiling, Part B: Advanced concepts: Geophysical Press.
Flinn, E. A., 1965, Signal analysis using rectilinearity and direction of particle motion: Proc. IEEE, 53, 1874-1876.
Freire, S. L. M., and Ulrych, T. J., 1988, Application of SVD to vertical seismic profiling: Geophysics, 53, 778-785.
Hemon, C., and Mace, D., 1978, The use of the Karhunen-Loève transformation in seismic data processing: Geophys. Prosp., 26, 600-626.
Jones, I. F., and Levy, S., 1987, Signal-to-noise ratio enhancement in multichannel seismic data via the Karhunen-Loève transform: Geophys. Prosp., 35, 12-32.
Jurkevics, A., 1988, Polarization analysis of three-component array data: Bull. Seis. Soc. Am., 78, 1725-1743.
Levy, S., Ulrych, T. J., Jones, I. F., and Oldenburg, D. W., 1983, Applications of complex common signal analysis in exploration seismology: 53rd Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 325-328.
Montalbetti, J. F., and Kanasewich, E. R., 1970, Enhancement of teleseismic body phases with a polarization filter: Geophys. J. Roy. Astr. Soc., 21, 119-129.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T., 1986, Numerical recipes: The art of scientific computing (Fortran and Pascal): Cambridge Univ. Press.
Ulrych, T. J., Levy, S., Oldenburg, D. W., and Jones, I. F., 1983, Applications of the Karhunen-Loève transformation in reflection seismology: 53rd Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 323-325.


Chapter 14
Correlation Using Triaxial Data from Multiple
Stations in the Presence of Coherent Noise
M. J. Rutty and S. A. Greenhalgh
14.1 Introduction
Polarization analysis of single station multicomponent seismic data has
been performed with success by several researchers (Montalbetti and
Kanasewich, 1970; Vidale, 1986; Flinn, 1965; Esmersoy, 1984; Magotra et
al., 1987). The technique has two major objectives: (1) to devise filters to distinguish between events with different modes of vibration (e.g., P- and S-waves versus Rayleigh waves) and (2) to provide a means of estimating the
direction of particle motion for use in seismic direction finding.
The most common applications of triaxial seismic recording to date are in
vertical seismic profiling (VSP) and earthquake seismology. Various filtering
techniques have been applied to individual three-component records, but very
little work has been carried out using the polarization information from more
than a single station at a time. One exception is Jurkevics (1988) who advocates averaging the covariance matrices formed at different stations. This simple procedure reduces the estimation variance by a factor 1/M, where M is the
number of stations used, but it cannot cope with coherent noise between the
stations. Bataille and Chiu (1991) use a similar averaging procedure to reduce
the effects of incoherent arrivals. They also consider the error introduced
when an interfering event is present within the time window of the single station polarization analysis.
The major problem with the conventional single station approach to
polarization analysis is that it places some severe restrictions on the types of
data set that can be processed: generally, the data must have high signal-to-noise ratio events and no coherent noise. This means that all events under examination must be well separated in time. These restrictions have led to the current
rather limited use of the technique.
Since multicomponent information is often available from more than one
position in space, techniques which use this information should be developed.
These would then supplement the single-station processing techniques when
the data fail to meet their criteria. If a coherent event could be singled out
from any coherent noise prior to applying polarization processing procedures,
a significant improvement in the results of polarization analysis would follow.
The technique described below is an attempt in this direction. Correlating
events are picked from a multicomponent seismic section in the presence of
both random and coherent noise. This represents the first stage of a processing procedure, currently under development, to deal with multicomponent data acquired over a sparse spatial array.
The theory behind monochromatic electromagnetic polarization (with its
vibration restricted to a plane) is well documented (e.g., Born and Wolf,
1965), but seismic polarization analysis for polychromatic transients, in which
the motion is three-dimensional, is poorly developed. After a brief discussion
of single station polarization theory and its limitations, an analysis of a
two-station approach is presented. An interpretation of the physical significance of the multidimensional vector space is proposed and a correlation procedure developed. This is then applied to synthetic and physical scale model
data with varying levels of both coherent and random noise.

14.2 Single-Station Polarization Analysis


The covariance matrix for a particular window of a data set is formed by
taking the sum of the outer products of the data vector u = xi + yj + zk with
itself within the window:
$$C[t_0] = \sum_{i = t_0 - W/2}^{t_0 + W/2} u(i)\, u^T(i). \qquad (14.1)$$
Here W is the window length, T represents the transpose operator and t0 is the
center of the window. Any rectilinear motion within the time window will
constructively add in the covariance matrix, improving the ratio of signal to

random noise. If the ratio of signal and random noise energy is sufficient for
the window chosen, the direction of the signal will dominate within the covariance matrix. A coordinate rotation to the principal axes of the covariance matrix using similarity transforms (rotations) will reveal this direction
and the proportion of energy along it. This transformation is achieved by an
eigendecomposition of the covariance matrix, with its eigenvectors being the
principal axes and its eigenvalues proportional to the energy in each of these
directions. Expressed in matrix notation:
$$\Lambda = R^T C R, \qquad (14.2)$$

where C is the original covariance matrix, Λ is the diagonalized matrix, and R is the matrix required to rotate the coordinate axes. The eigenvalues of C are given by the elements λi = Λii and the eigenvectors are the columns of R.
Any window of the original three-component seismogram can therefore
be rotated into its principal axes of motion given by the eigenvectors above. It
can also be decomposed into its orthogonal eigenimages (Freire and Ulrych,
1988; Jackson et al., 1991) with the number of significant eigenvalues representing the dimension (or rank) of the particle motion. To obtain information
about the whole trace, the window can be placed at different positions along
the data.

14.2.1 The Analysis Domain

The expected particle polarization for perfect single mode elastic arrivals
will be either one-dimensional (e.g., rectilinear P- or S-waves) or two-dimensional (e.g., plane, polarized Rayleigh waves). A rectilinear arrival is easily dealt
with in the time domain using the covariance analysis since each sample in the
time window constructively sums along its arrival direction. This produces a
unique dominant direction in the eigenanalysis of its covariance matrix. However, if the arrival being examined is a nonrectilinearly polarized event, such as
a Rayleigh wave, the signal energy is observed in a plane. The covariance
matrix analysis cannot then indicate this as a coherent event. This type of
event must be examined in a different data domain. Vidale (1986) uses the
spectral coherency matrix, which is a complex analogue of the covariance matrix. Analytic signals are generated by taking the Hilbert transform of the data as the
imaginary part. These are then used to calculate the coherency matrix C as


$$C[t_0] = \sum_{i = t_0 - T/2}^{t_0 + T/2} u(i)\, u^H(i), \qquad (14.3)$$

where T is the time window, u is the analytic signal and H represents the complex conjugate transpose operator. A complex eigendecomposition performed
on this matrix reveals any clean coherent signals (including nonrectilinearly
polarized ones) in a unique complex direction.
Another equally valid approach to deal with nonrectilinearly polarized
events is to work in the frequency domain. This uses the (complex) cross-spectral matrix averaged over a frequency window. Plane-polarized events which
are well separated (in frequency) from one another again sum constructively in
a single complex principal direction, which may be obtained by eigendecomposition.
When using a covariance matrix style of processing, it is important to be
sure of the type of signal that is under examination. If it is a plane polarized
event, then a complex representation must be used (i.e., the analytic signal in
the time domain or the Fourier transform in the frequency domain). However, if the signal is rectilinearly polarized, computing time may be saved by a
factor of about four by using the real signal.

14.2.2 Interfering Events and Coherent Noise

If two overlapping events interfere within the analysis window used, a


composite planar particle motion is produced. In general, none of the principal directions obtained from an eigenanalysis will then point along the individual event directions. All that is then known is that the
original direction vectors of the two signals both lie in the plane spanned by
the first two principal directions of the covariance matrix. Coherent noise is
therefore a major problem in a standard, single-station covariance analysis.
In this discussion, the primary concern is with the correlation of rectilinear body-wave events, both when they are well separated in time and when they interfere. A time-domain analysis using the real signal is therefore appropriate.
However, when describing signals mathematically, the more general (complex)
coherency matrix approach will be used. The extraction of the covariance analysis from the coherency analysis is straightforward.

The recorded data at a single station are regarded as a three-component


analytic vector. Consider the case when a transient rectilinear arrival is being
examined. The data recorded can then be represented in analytic form by:
$$u = s_1(t)\, d + n(t), \qquad (14.4)$$

where d = {d_x d_y d_z}^T is the vector of direction cosines of the rectilinear


event, s1(t) defines its analytic wavelet, and n is the analytic noise vector. Taking a window of length T centered on t0, the coherency matrix formed from
these data is:

$$\begin{aligned} C[t_0] ={} & \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} s_1(t)\,s_1^*(t)\; d\,d^H\; dt + \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} s_1(t)\; d\; n^H(t)\; dt \\ & + \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} n(t)\,s_1^*(t)\; d^H\; dt + \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} n(t)\,n^H(t)\; dt, \end{aligned} \qquad (14.5)$$
where H represents the conjugate transpose operator.


If the signal-to-noise ratio is large, the first term on the right-hand side of
equation (14.5) dominates. The eigendecomposition of the coherency matrix
then reveals the signal direction in the eigenvector corresponding to the dominant eigenvalue. This eigenvalue includes phase information related to the
frequency differences between the different components and the time over
which the signal is averaged. The amplitudes of the coherency matrix elements
are therefore dependent on the positioning and length of the time window, as
one would expect for a transient event. Polarization analysis is therefore most
successful with high signal-to-noise data and transient signals.
As the signal-to-noise ratio decreases towards unity, the other terms in
equation (14.5) become significant. However, if the noise is purely random,
then the second and third terms tend to zero. This indicates one of the major
strengths of single station polarization analysis: Purely random noise cancels
itself over the time window used for the covariance matrix, leaving the coherent signal in the dominant eigenimage. Most single station polarization processing therefore assumes that any noise within the time window of analysis is
random. If this criterion is satisfied, the technique is highly successful. However, it is important to realize that this places major restrictions on the types of
data set that may be legitimately processed.
If the noise is spatially coherent or polarized, or the signal-to-noise ratio
falls below unity, the noise terms in equation (14.5) dominate the covariance
matrix and a successful polarization analysis is not possible. Single station
polarization data must therefore have a good signal-to-noise ratio and must
not contain more than one event in each time window used. In practice, the
window for the analysis should optimally be positioned to contain all significant samples of the signal (maximizing the signal-to-noise energy) without interference from any previous or subsequent events. It should also be as long as
possible within these constraints so that signal may be enhanced above the
random noise (Jackson et al., 1991).

14.2.3 The Significance of an Eigenvalue

When considering realistic data, it is very important to appreciate the significance of the eigenvalues of the covariance matrix. If there were no noise,
then any nonzero eigenvalue would be significant. When noise is present, the
significance of certain eigenvalues depends on the types of signal and noise
present. Let us denote the eigenvalues by λi and arrange them in descending order. If there is a rectilinear event in purely random noise (which implies only one event in our time window), the noise energy within the window may be estimated as 3λ3 (Kanasewich, 1981). In this case λ1 is significant if λ1 ≫ λ2 ≈ λ3. However, if several phase-shifted overlapping events with different arrival directions were present within our time window, then, even with no random noise, λ3 may be comparable with λ1 and so provides no indication of the significance of λ1.
Significance of eigenvalues may usually be tested by comparison with the
smallest eigenvalue of the covariance matrix. However, this is only true if the
signal and any coherent noise within the window are known to span a vector
space of smaller dimension than our covariance matrix. The smallest eigenvalue is then purely due to random noise. Given a specific window over the
data, the dimension of the particle motion within it can be estimated. A filter
function can then be calculated to enhance rectilinear signals using the relative
size of the eigenvalues. This function is then applied to the midpoint of the


sliding window (Means, 1972; Benhama et al., 1988; Jurkevics, 1988; Kanasewich, 1981) and is the basis of polarization filtering.
If the single station does not satisfy the criteria above (i.e., its smallest
eigenvalue is not purely due to random noise) and multiple station data are
available, it is worthwhile expanding the dimension of the covariance matrix
to use more stations. This provides an additional increase in signal-to-noise
ratio if the signal is coherent across the spatial array (as does conventional
stacking of single component data). Chapter 15 elaborates on this concept.

14.2.4 Seismic Direction Finding

If single station data processing identifies a rectilinear event, then the


eigenvector corresponding to the dominant eigenvalue gives the principal
direction of particle motion. Triaxial data acquisition therefore lends itself naturally to a three-dimensional geometry (as opposed to conventional 2-D seismic surveys which are subject to sideswipe, i.e., reflections from structures to
the sides of the survey). Triaxial processing may be used to estimate the azimuth and elevation of the wave arrival if the type of rectilinear event is
known.
A certain amount of care is needed if the receiver station is located near
the Earth's surface. Here, the incident wavefield is reflected and mode converted, giving rise to three overlapping waves. This makes the apparent angle of
emergence differ from the true arrival direction. The reflected wave also may
sometimes be phase-shifted, destroying the rectilinearity of the recorded
motion. This so-called free-surface effect is well documented (Aki and Richards, 1980), and ways to account for it are discussed by Kennett (1991).

14.3 Polarization Analysis Using Two Triaxial Stations


A six-dimensional covariance matrix may be formed from two stations of
three-component data and then decomposed as in the single-station case.
Again this gives the principal directions of motion and the proportions of
energy in these directions. However, the directions now obtained are
six-dimensional eigenvectors and this six-dimensional vector space is yet to be
interpreted. The eigenvalues are again proportional to the amount of signal
energy in each of the mutually orthogonal (in 6-D) directions. Thus, any linear dependency between the six components of signal will be indicated by an
insignificant eigenvalue. The number of significant eigenvalues therefore represents the dimensionality of the six-component signal (or the rank of its signal space).

Figure 14.1. A VSP source-receiver geometry. Triaxial receivers down a borehole receive signals generated by a source near the surface. Coherent noise may be caused by local scatterers and reflected events.

14.3.1 The Binocular 6 × 6 Covariance Matrix

Consider two three-component stations with a correlating signal due to


some source s (as in a VSP experiment, Figure 14.1). For the moment, we
assume noise-free rectilinear signals with receivers well below the free surface.
The analysis presented below considers a correlating event on the two stations
in the presence of some spatially coherent noise produced by a local scatterer
near station 2.
Let the signals received at each station be represented by:


$$u_1 = s_1(t)\, d_1 \qquad (14.6)$$

and

$$u_2 = s_1(t - t_0)\, d_2 + s_3(t)\, d_3; \qquad (14.7)$$

s1 represents the correlating wavelet, delayed by t0 on station 2 relative to station 1, and s3 the wavelet produced by the local scatterer. The direction cosines of the arrivals at each station are given by d_i = {d_xi d_yi d_zi}^T.
Setting u^T = {u_1^T u_2^T}, the binocular 6 × 6 coherency matrix formed from
u is:
$$C[\tau] = \begin{bmatrix} A\, d_1 d_1^H & B\, d_1 d_2^H \\ B^H\, d_2 d_1^H & E\, d_2 d_2^H \end{bmatrix} + \begin{bmatrix} 0 & D\, d_1 d_3^H \\ D^H\, d_3 d_1^H & F\left[\, d_2 d_3^H + d_3 d_2^H \,\right] + G\, d_3 d_3^H \end{bmatrix}, \qquad (14.8)$$

where

$$A(t_0) = \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} s_1(t)\, s_1^*(t)\, dt, \qquad B(\tau) = \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} s_1(t)\, s_1^*(t - \tau)\, dt,$$

$$D(\tau) = \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} s_1(t)\, s_3^*(t - \tau)\, dt, \qquad E(\tau) = \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} s_1(t - t_0)\, s_1^*(t - \tau)\, dt,$$

$$F(\tau) = \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} s_3(t)\, s_1^*(t - \tau)\, dt, \qquad G(\tau) = \frac{1}{T}\int_{t_0 - T/2}^{t_0 + T/2} s_3(t)\, s_3^*(t)\, dt.$$

T is the length of the time window on the data and H represents the conjugate
transpose operator.
The coherency matrix has been split into the sum of two separate matrices in equation (14.8). The first of these represents the coherence between the correlating arrivals of s1. Its characteristic equation is

$$\lambda^6 - (A + E)\lambda^5 + \left(AE - BB^H\right)\lambda^4 = 0, \tag{14.9}$$

implying that at least four of its six eigenvalues must be zero. This is because it represents only one rectilinear event on each station. Each event contributes to the signal-space rank, which therefore has a maximum value of 2 in this example.


As the time difference t0 tends to zero, i.e., when the event on both stations is perfectly correlated, A, B, and E all tend to the same constant. This implies that AE − BB^H → 0, which in turn implies that only one eigenvalue will be nonzero (λ1 = A + E). Thus, if the signal correlating on both stations is not correctly aligned, its phase shift introduces some linear independence. When it is time-aligned, the data on the two stations are linear combinations of each other. Since the eigenanalysis separates the traces into independent orthogonal signals, the dimension of the signal space must then decrease. This result holds irrespective of the directions of arrival (in real space) at the two stations.
If there were no scattered event s3, the right-hand matrix in equation (14.8) would be zero, and the rank of the coherency matrix would fall from two to one when the event on each station correlates perfectly. When the overlapping event s3 is present, we might expect a drop in rank from three to two when the correlating event s1 is correctly aligned. This procedure may therefore be able to identify correlating events in coherent noise.
A simple synthetic experiment was performed to test this possibility. Signal s1 on station 1 is correlated with a signal at station 2 created as the superposition of signal s1 and another signal s3 (Figure 14.2). Signal s1 has arrival direction ratios of 2:1:1 at station 1 and 1:1:2 in the merged signal at station 2. The superposed three-component signal on station 2 thus contains signal s1, but in differing proportions between separate traces on the station.
The time window used in the test was the complete trace. A cyclic time shift was applied to the traces of the second station, and the covariance matrix and its eigenvalues were calculated at each shift. The eigendimension of the motion is indicated in the lower part of Figure 14.2 by the plot of the ratio of the 3rd/1st eigenvalues against the time shift τ in the covariance matrix. The rank of the covariance matrix clearly drops at τ = t0, the propagation delay of the correlating signal s1. When the merged event s1 on station 2 is not aligned with event s1 on station 1, there are three nonzero eigenvalues (one from station 1 and two from station 2). However, when event s1 is aligned, there is a linear dependence inherent in the covariance matrix and so there are only two nonzero eigenvalues (one from signal s1 on both stations and one from signal s3 on station 2). This dependence occurs despite the somewhat arbitrary choice of arrival directions (particle motion directions) of the correlating signal.
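A compact sketch of this experiment follows (our own illustration; a real-valued sample covariance stands in for the coherency matrix of equation (14.8), and the array shapes and names are assumptions):

    import numpy as np

    def eig_ratio_vs_shift(sta1, sta2):
        # sta1, sta2: arrays of shape (3, T), the three components of each station.
        # Returns lambda_3 / lambda_1 of the 6 x 6 two-station covariance at
        # every cyclic shift of station 2; the minimum marks tau = t0.
        T = sta1.shape[1]
        ratios = np.empty(T)
        for shift in range(T):
            shifted = np.roll(sta2, -shift, axis=1)   # cyclic time shift of station 2
            x = np.vstack([sta1, shifted])            # 6 x T data matrix
            c = x @ x.T / T                           # 6 x 6 sample covariance
            lam = np.linalg.eigvalsh(c)[::-1]         # eigenvalues, descending
            ratios[shift] = lam[2] / lam[0]           # 3rd / 1st eigenvalue
        return ratios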


Figure 14.2. Synthetic traces for two triaxial geophones. A single rectilinear event arrives on stations 1 and 2 with direction ratios 1:2:3 and 2:1:1, respectively. Another rectilinear signal is superposed on station 2, constituting coherent noise. The ratio of 3rd/1st eigenvalues of the binocular covariance matrix is plotted for different time shifts of one station relative to the other. Perfect correlation is observed when the 3rd eigenvalue becomes zero at the correct time shift, τ = t0.


14.3.2 The Multistation Vector Space

Since the binocular station approach used above uses a six-dimensional vector space, it is logical to enquire as to the physical significance of this space. Previously, when using the single station, the three dimensions represented the real space of particle motion recorded at the receiver. In the two-station case, the six-dimensional vector space appears to represent two simultaneous 3-D spaces, one for each station. If there were N stations, each measuring three-component data, then our vector-space dimension would be 3N. This would then correspond to N simultaneous 3-D spaces (one for each of the N stations). In an ideal, noise-free case with no dispersion, we should therefore hope for a minimum dimension of 1 (out of a possible 3N) to represent a single correlating rectilinear wave. Obviously, with noise on the recording and slight phase changes in the signal as it propagates, we are unlikely to observe this. In fact, for superposed elastic arrivals and additive noise, the particle motion will no longer be planar (linear or elliptic), but will span a volume which varies with time. We could therefore form some function of the eigenvalues of the 3N-dimensional matrix for several different windows to be minimized, and hence pick a correlating signal over the array. Practical difficulties would include estimating how many arrivals are within the time window selected. More importantly, the computations required for large numbers of eigendecompositions of 3N traces over multiple time shifts would be immense. We therefore restrict ourselves to the binocular problem at present.

14.4 Implementation of Multicomponent Binocular Correlation
The choice of the best correlation time for the simple example shown in
Figure 14.2 was not difficult. There was no random noise on the section, a
single event on station 1 to compare with station 2, and the time window
always contained the same data. It spanned the whole trace with the second
station adjusted using a cyclic time shift. The single station rank was therefore
constant, and a reduction of rank in the 6 x 6 covariance matrix genuinely
implied some linear dependence between the stations due to a correlating
event. When real data are considered, a more sophisticated test is required, as
we will describe below.
A certain rectilinear event is selected as a reference wavelet and isolated
from the rest of the trace by means of a time window. The reference wavelet
should have as little random and coherent noise as possible within the window


and should be of sufficient signal-to-noise ratio such that a single-station covariance analysis is capable of detecting it. Once the reference wavelet has been
selected and windowed, a similar window is placed over the beginning of the
traces of the station we wish to examine (the correlation station). The eigenvalues of both the correlation station and the combined station covariance
matrices are calculated and stored before the window on the correlation station is time shifted and the calculation repeated.
As the window is shifted across the data frame at the correlation station,
several events may be encountered. These events may or may not be rectilinear
with varying amounts of random and spatially coherent noise. The eigenstructure of both the binocular and correlation station covariance matrices will be
duly modified. If a correlating event is correctly time aligned, the rank of the
two-station covariance matrix should decrease compared to the sum of the
ranks of the two single stations. The signal space ranks for the combined and
single station data are related by the inequality
$$\operatorname{Rank}\big(C(\tau)\big) \;\le\; \operatorname{Rank}\big(C_1(\tau)\big) + \operatorname{Rank}\big(C_2(\tau)\big), \tag{14.10}$$

where C is the combined binocular covariance matrix, C1 and C2 are the matrices corresponding to the reference and correlation stations, respectively, and τ is the time shift of the correlation window. Strict equality occurs only when the signals on stations 1 and 2 are independent.
Because the eigenvalues of the covariance matrices are proportional to the energy of the signal within the time window, the sum of the eigenvalues of the combined matrix must equal the sum of the eigenvalues of both single-station matrices. Hence

$$\sum_{i=1}^{6} \lambda_{\text{comb}_i} \;=\; \sum_{i=1}^{3} \lambda_{\text{ref}_i} + \sum_{i=1}^{3} \lambda_{\text{corr}_i}, \tag{14.11}$$

where $\lambda_{\text{comb}_i}$, $\lambda_{\text{ref}_i}$, and $\lambda_{\text{corr}_i}$ represent the eigenvalues of the combined, reference, and correlation station covariance matrices, respectively. The distribution of these eigenvalues changes as different events appear in the correlation window, in accordance with the rank inequality [equation (14.10)]. Correlation functions R1 and R2 have therefore been defined below. The functions maximize when a rectilinear reference event is compared with a similar


rectilinear event in both purely random noise (R1) and in random plus coherent noise (R2):

$$R_1(\tau) = \frac{\lambda_{\text{comb}_1}(\tau)}{\lambda_{\text{comb}_2}(\tau)} \cdot \frac{\lambda_{\text{sing}_2}(\tau)}{\lambda_{\text{sing}_1}(\tau)} \tag{14.12}$$

$$R_2(\tau) = \frac{\lambda_{\text{comb}_1}(\tau) + \lambda_{\text{comb}_2}(\tau)}{\lambda_{\text{comb}_3}(\tau)} \cdot \frac{\lambda_{\text{sing}_3}(\tau)}{\lambda_{\text{sing}_1}(\tau) + \lambda_{\text{sing}_2}(\tau)}. \tag{14.13}$$

Here $\lambda_{\text{comb}_i}$ is the ith eigenvalue of the combined covariance matrix and $\lambda_{\text{sing}_i}$ is the ith ordered eigenvalue in the set of all eigenvalues from both single-station covariance matrices. These eigenvalues vary with τ according to equation (14.8).
The function R1 has a maximum when the second eigenvalue of the binocular covariance matrix is a minimum and the principal eigenvalues of the individual covariance matrices are maximum. This represents a collapse of rank of the combined covariance matrix from dimension 2 to 1, implying a correlating rectilinear signal without significant coherent noise. Similarly, R2 seeks a collapse of the 6 x 6 covariance matrix from rank 3 to rank 2. It maximizes when a correlating rectilinear signal and another rectilinear event are superposed on station 2. Clearly, other functions Rj (j > 2) could be formed to pick for correlating plane-polarized events.
Each function is formed using a ratio of the eigenvalues of the combined covariance matrix and a weight comprising the inverse ratio of the corresponding single-station eigenvalues. If no event is present in a range of the correlation station's traces, R1 and R2 will be flat functions over that range, with amplitudes around unity. Spikes are formed when there is correlation at the corresponding time delay. Their amplitudes do not depend on the amplitude of either reference or correlation events but on the linear dependence between their wavelets.
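A sketch of the computation for one window position (ours; eigenvalue ordering follows the definitions above):

    import numpy as np

    def picking_functions(x_ref, x_corr):
        # x_ref, x_corr: arrays of shape (3, T), windowed three-component
        # data at the reference and correlation stations.
        T = x_ref.shape[1]
        x = np.vstack([x_ref, x_corr])
        lam_comb = np.linalg.eigvalsh(x @ x.T / T)[::-1]        # six combined eigenvalues
        lam_ref = np.linalg.eigvalsh(x_ref @ x_ref.T / T)[::-1]
        lam_cor = np.linalg.eigvalsh(x_corr @ x_corr.T / T)[::-1]
        lam_sing = np.sort(np.concatenate([lam_ref, lam_cor]))[::-1]  # pooled, ordered
        r1 = (lam_comb[0] / lam_comb[1]) * (lam_sing[1] / lam_sing[0])   # equation (14.12)
        r2 = ((lam_comb[0] + lam_comb[1]) / lam_comb[2]) * \
             (lam_sing[2] / (lam_sing[0] + lam_sing[1]))                 # equation (14.13)
        return r1, r2

Sliding the correlation window down the traces and evaluating this pair at every shift produces the functions plotted in Figures 14.4-14.6.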
The statistical distribution describing the accuracy of the correlation procedure is unknown. Standard statistical theory uses normalized nondimensional standard distributions to test for the significance of specific events. R1
and R2 are nondimensional functions whose amplitudes indicate some measure of the confidence of the corresponding type of event. They may therefore

be used as a simple statistical measure of the likelihood of the type of event tested. The larger the value (above 1), the greater the confidence that an event has been picked. The time at which the spike occurs yields the time estimate. Its variance is related to the confidence index R.
The functions defined in equations (14.12) and (14.13) may, in a loose sense, be compared to a statistical F-test, which compares the ratios of the variances of two distributions to see if they differ significantly. An F-value of 2.0 for 20 or more degrees of freedom is significant at the 95% confidence level. R1 and R2 are not strictly ratios of variances of two distributions. However, the relative sizes of the eigenvalues used in the functions do represent the spread of energy from the desired rectilinear signals. As a rule of thumb, we suggest that a picked value above 2.0 in R1 or R2 is significant using a window size of more than 20 samples. This is a subjective choice, and may well be inappropriate in certain situations, but, for the purposes of a practical automated application, some significance value must be chosen.
Once the correlation functions R1 and R2 for the first window have been
calculated, the window is progressively moved down the data of the correlation station and the calculation repeated. The correlation functions may then
be examined to pick correlating events. A flow chart of the correlation procedure is given in Figure 14.3.

14.5 Synthetic Data Results


Numerical experiments have been performed to examine the performance of the above procedure in picking separated and overlapping events in varying levels of random noise. Figures 14.4-14.6 show a single rectilinear reference event, picked in the time window marked on the reference station, compared with both a separated correlating event and two overlapping correlating events on station 2. The example serves purely to illustrate the possibilities and limitations of the technique and does not relate to any particular geological scenario. R1 and R2 are plotted below the traces. Figure 14.7 shows several filter functions calculated using the single-station analysis operating on the cleanest of the data sets.
The wavelet used for the examples is a zero-phase, 0-60 Hz band-pass filtered spike. It has direction ratios 1:2:3 on the reference station and is centered at 254 ms. The same wavelet appears three times on the correlation station with equal amplitudes. A single event is centered at 126 ms with direction ratios 2:1:2, and two overlapping events are centered at 362 ms and 382 ms.

Figure 14.3. Flow chart of the binocular multicomponent correlation procedure.


Figure 14.4. Synthetic seismogram with a rectilinear, zero-phase, band-pass filtered (0-60 Hz) spike reference wavelet on station 1 correlated against both separated and interfering versions of itself on station 2 in the presence of 10 percent noise. The reference window length is 72 ms and is shown by the rectangle on the reference station traces.


Figure 14.5. Synthetic seismogram with a rectilinear, zero-phase, band-pass filtered (0-60 Hz) spike reference wavelet on station 1 correlated against both separated and interfering versions of itself on station 2 in the presence of 50 percent noise. The reference window length is 72 ms and is shown by the rectangle on the reference station traces.


Figure 14.6. Synthetic seismogram with a rectilinear, zero-phase, band-pass filtered (0-60 Hz) spike reference wavelet on station 1 correlated against both separated and interfering versions of itself on station 2 in the presence of 100 percent noise. The reference window length is 72 ms and is shown by the rectangle on the reference station traces.


Figure 14.7. Four rectilinearity measures commonly used in single-station covariance analysis operating on the same input data as the second station in Figure 14.4 (10% noise). The separated rectilinear event is correctly picked, although the resolution is poor. The two interfering events produce a composite planar motion and thus cannot be picked correctly. The window length used is 72 ms.


The two overlapping events have been superposed with direction ratios 3:2:1 and 2:1:2, respectively. Uniformly distributed random noise was applied to the data such that its amplitude limits were the stated percentage of the maximum peak-to-trough value of the reference signal.
The general appearance of the functions R1 and R2 is that of several spikes within a flat unit background. The amplitude of the spikes decreases as the noise increases (Figure 14.6), indicating the lowering confidence level of picking the respective events. Figure 14.4 shows the reference and correlation stations with 10% noise. The window length of the analysis is 72 ms, and the reference window is marked by the rectangular box. All three correlating events are correctly picked by spikes on R1 and R2, which clearly dominate the picking functions. Note how the event free of coherent noise is picked by R1, while the separate picks for both of the interfering events are successfully identified on R2.
As the noise level applied to the synthetic data is increased, the confidence of our picking functions is eroded. Figure 14.5 shows the same data set with 50% noise on both reference and correlation stations. Again, the expected events are indicated by spikes on the relevant picking functions. However, there are some unexpected maxima present, especially on R1. Sidelobes around correct picks originate from the wavelet shape and our choice of window length. As with one-component correlation, a wavelet with an oscillatory nature will exhibit "ringing" in its autocorrelation. However, when this effect is observed on R1, the sidelobes must all lie within a time interval equal to the sum of the window length chosen for the correlation and the time over which the transient event is significant. Thus, if we choose the window length to be the observable period of the signal, we expect any sidelobes to be within two window lengths on R1. More significantly, if separate events are present but separated by a time less than the chosen window length, then they should show as interfering events on R2 and not on R1. The reason for this is that the covariance matrices cannot contain information from only one of the signals. Any secondary maxima occurring within a window length of an event picked on R1 must therefore be due to ringing.
There are also maxima on R1 when an overlapping event is detected as
shown around 380 ms (Figure 14.5). These are due to the way the signal vector space is affected by the overlapping events. However, if R2 picks any significant overlapping events, then there should not be any valid picks on R1


present within the time window of each such pick. Thus, any maxima showing on R1 within such a time window are not significant.
Figure 14.6 shows the same data with 100% noise superposed. The corresponding events can still be picked correctly, though our confidence levels are
much lower with values of R1 and R2 lying between 2.0 and 3.0. Spurious
maxima can be detected as outlined above. This result is very encouraging and
provides some measure of the significance of picking correct times in high
noise.
For comparison, a single-station analysis has been performed on the correlation-station data with 10% noise using the same window length. Four rectilinearity (or filter gain) functions commonly used in single-station polarization analysis have been plotted in Figure 14.7. All are formed from the normalized eigenvalues of the covariance matrix. The function F1 is the rectilinearity filter function defined by Esmersoy [1984, equation (13)]:

$$F_1 = \frac{1}{2}\left(\frac{3\lambda_1}{\lambda_1 + \lambda_2 + \lambda_3} - 1\right). \tag{14.14}$$

The function F2 is the Butterworth gain function given by

$$F_2 = \left[1 + \left(\frac{\lambda_2 + \lambda_3}{\epsilon\,\lambda_1}\right)^{2N}\right]^{-1/2}. \tag{14.15}$$

We have used parameter values of ε = 0.1 and N = 4. Kanasewich (1981) introduced the function F3 in the form

$$F_3 = 1 - \left(\frac{\lambda_2}{\lambda_1}\right)^N \tag{14.16}$$

and suggested values of N = 0.3 to 1.0 (we have used N = 1.0). The polarization measure F4 is referred to by Esmersoy:

$$F_4 = \frac{\lambda_1}{\lambda_2 + \lambda_3}. \tag{14.17}$$
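For reference, the four measures can be computed from the ordered eigenvalues as in the sketch below (ours; parameter defaults follow the text, and the exact placement of ε in F2 reflects our reading of the formula above):

    import numpy as np

    def rectilinearity_measures(c, eps=0.1, n_bw=4, n_k=1.0):
        # c: 3 x 3 single-station covariance matrix.
        l1, l2, l3 = np.sort(np.linalg.eigvalsh(c))[::-1]   # ordered eigenvalues
        f1 = 0.5 * (3.0 * l1 / (l1 + l2 + l3) - 1.0)        # equation (14.14)
        f2 = (1.0 + ((l2 + l3) / (eps * l1)) ** (2 * n_bw)) ** -0.5   # equation (14.15)
        f3 = 1.0 - (l2 / l1) ** n_k                         # equation (14.16)
        f4 = l1 / (l2 + l3)                                 # equation (14.17)
        return f1, f2, f3, f4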


All these functions show local maxima for the single event at 126 ms. However, the resolution of the picks is poor. The maxima have a width nearly twice the window length in all cases except F4. The interfering events at 362 ms and 382 ms have not been detected and would be suppressed by such rectilinearity measures. Of even greater concern, two erroneous maxima occur at times corresponding to where the processing window contains the beginning of the first and the end of the last interfering wavelets. This is because each signal has no appreciable interference from the other wavelet at these positions. The failure of single-station processing on such a high-quality data set indicates the problems posed by coherent noise. When processing such a data set, a multiple-station approach has a clear advantage.

14.6 A Physical Model Example


A series of tests has been run on physical scale model seismic data generated in the laboratory. The model used was a uniform sheet of aluminum as
shown in Figure 14.8. A single biaxial receiver was placed on the surface and a
moving source profile (MSP) was generated as the source descended the borehole (Pant, 1989). The source position started at the top of the borehole (trace
1) and was moved in model scale intervals of 3 cm to the bottom (trace 41).
This was similar to performing an offset VSP experiment with a single surface
source and an array of biaxial receivers (although the free-surface will affect
results because the receivers are positioned on the surface). The correlation
process could therefore be applied to physical experimental data with the
advantage of knowing the model geology.
Various scaling factors can be used in relating the laboratory measurements to equivalent field situations. The dimensions of the aluminum plate were scaled by a factor of 1000 to represent a medium of uniform velocity with thickness 1.2 km. The distance between consecutive source positions in the borehole then translated from 3 cm (laboratory scale) to 30 m (field scale), while the time sampling interval of 1 μs (laboratory scale) became 1 ms (field scale). This scaling procedure facilitates direct comparison with exploration data, and the times displayed in Figures 14.9-14.12 represent those of the field scale.
The expected dominant arrivals in this simple experiment are direct P- and S-waves, reflected P (PP), reflected S (SS), and mode-converted reflections (PS and SP) (Pant and Greenhalgh, 1990). These events overlap on certain stations and so will act as coherent noise when examining one particular event.


Figure 14.8. Moving source profile geometry for recording reflected signals from the bottom of an aluminum plate in the laboratory. Wave paths and types are shown schematically. A biaxial receiver records the wavefield produced by a source which is repeated at several locations down the borehole. The source is moved in intervals of 3 cm from the top (trace 1) to the bottom (trace 41) of the plate.


Figure 14.9 identifies all the major events using the correlation procedure. The two-component model data have been rotated into horizontal and vertical directions, and zero traces were used in a dummy transverse direction. The reference event used was formed by placing a 16-ms window around the direct P arrival on station 22 (shown by the small rectangle). The correlation functions R1 and R2 are plotted for station 31, highlighted by the larger rectangular box in the figure. All expected events are labelled. These correspond to direct P (P, 210 ms), reflected P (PP, 310 ms), direct S (S, 350 ms), P-to-S conversion (PS, 360 ms), S-to-P conversion (SP, 480 ms), and reflected S (SS, 540 ms). Note how the overlapping events (S and PS) appear on R2 while those free of interference occur on R1.
The correlation of a small subset of the traces is examined in detail in Figures 14.10-14.12. These illustrations are concerned with the interference of the reflected P event (PP) and the direct S event (S) at stations 27 to 29 and indicate the practicality of the technique. All figures have the same reference event as Figure 14.9 (window length 16 ms) and use a significance cut-off confidence value of Ri = 2.0. Figure 14.10 successfully picks the overlapping PP (324 ms) and direct S (320 ms) events on station 27. The pick on R1 corresponds to that of the direct S, but it is overridden by the same pick on R2 since it is within a window length of a valid pick of an interfering arrival. However, it serves to reinforce the interpretation of the presence of a coherent event. It is particularly encouraging to see that the function R2 can distinguish the two events when they are so closely overlapping and when one has much more energy content than the other.
Figures 14.11 and 14.12 show the results of the procedure applied to stations 28 and 29 to examine the same two events. In Figure 14.11, the interfering PP and S events are clearly picked on station 28 at 320 ms and 328 ms,
respectively, using function R2. The spike on R1 is at the correct time for S but
again is overridden by the pick on R2. In Figure 14.12, the events have now
separated by more than a window length (16 ms) and are therefore clearly
picked by the function R1.


Figure 14.9. Raw data from the experiment shown in Figure 14.8. The data have been rotated into horizontal and vertical components and the times scaled by a factor of 1000, corresponding to increasing the dimensions of the experiment by 1000. Correlation functions are shown for a reference event on station 22 and correlation data on station 31, indicated by the rectangles on the seismograms. The window length used is 16 ms, and the major separated and overlapping events are labelled on the functions R1 and R2.


Figure 14.10. Magnified view of a section of the physical model data shown in Figure 14.9 and its picking functions. The reference event is the direct P arrival on trace 22 with window length 16 ms. Station 27 is used for the correlation station, with functions R1 and R2 calculated over the time indicated by the rectangular box. The direct S arrival is picked at 320 ms and the reflected PP arrival at 325 ms.


Figure 14.11. Magnified view of a section of the physical model data shown in Figure 14.9 and its picking functions. The reference event is the direct P arrival on trace 22 with window length 16 ms. Station 28 is used for the correlation station, with functions R1 and R2 calculated over the time indicated by the rectangular box. The direct S arrival is picked at 328 ms and the reflected PP arrival at 321 ms.


Figure 14.12. Magnified view of a section of the physical model data shown in Figure 14.9 and its picking functions. The reference event is the direct P arrival on trace 22 with window length 16 ms. Station 29 is used for the correlation station, with functions R1 and R2 calculated over the time indicated by the rectangular box. The direct S arrival is picked at 336 ms and the reflected PP arrival at 316 ms.


14.7 Conclusions

Three-component seismic data acquisition and processing is still a long way from becoming a standard processing technique. This is because single-station polarization processing requires data with high signal-to-noise-ratio events and no coherent noise. There are also practical problems associated with acquiring the data, as the stations must be carefully calibrated to be able to gain any directional information. On the other hand, the advantages of triaxial data are numerous in certain environments. Whenever there is restricted access, conventional seismic techniques are not possible. A triaxial geophone station records the complete wavefield with no assumption of a two-dimensional earth model. This then allows the polarization techniques to be applied in areas with complicated geology.
If realistic data sets are to be examined using polarization analysis, a multiple-station processing approach is more likely to succeed than a single-station one. This will enable the problem of coherent noise to be tackled and also provide an increase in signal-to-noise ratios. A multicomponent binocular correlation technique has therefore been developed with the potential to distinguish overlapping arrivals. This technique will supplement conventional single-station event detection algorithms, which cannot perform in the presence of coherent noise. A confidence estimate of the event pick is also directly related to the size of the correlation picking function. Performing the two-station covariance eigenanalysis, as opposed to that of the single station, increases the computational time by a factor of around four.
The algorithm has been tested successfully using synthetic data with signal-to-noise ratios varying down to at least unity and with physical scale-model data obtained in the laboratory. The authors are currently in the process of acquiring three-component field data in a mine and hope to apply the technique to this data set in the near future.
This technique would be of special interest in areas where conventional
seismic surveys cannot be carried out due to access difficulties. The procedure
also has direct applications in the areas of statics and wavetype identification
as well as earthquake seismology where multicomponent data are commonly
acquired using sparse sensor arrays.

14.8 References
Aki, K., and Richards, P. G., 1980, Quantitative seismology: W. H. Freeman & Co., 2 vols.
Bataille, K., and Chiu, J. M., 1991, Polarization analysis of high-frequency, three-component seismic data: Bull. Seis. Soc. Am., 81, 622-642.
Benhama, A., Cliet, C., and Dubesset, M., 1988, Study and applications of spatial directional filtering in three-component recordings: Geophys. Prosp., 36, 591-613.
Born, M., and Wolf, E., 1965, Principles of optics, 3rd ed.: Pergamon Press, Inc.
Esmersoy, C., 1984, Polarization analysis, rotation and velocity estimation in three-component VSP, in Toksoz, M. N., and Stewart, R. R., Eds., Vertical seismic profiling, Part B: Advanced concepts: Geophysical Press, 236-255.
Flinn, E. A., 1965, Signal analysis using rectilinearity and direction of particle motion: Proc. IEEE, 53, 1874-1876.
Freire, S. L. M., and Ulrych, T. J., 1988, Application of singular value decomposition to vertical seismic profiling: Geophysics, 53, 778-785.
Jackson, G. J., Mason, I. M., and Greenhalgh, S. A., 1991, Principal component transforms of triaxial recordings by singular value decomposition: Geophysics, 56, 528-533.
Jurkevics, A., 1988, Polarization analysis of three-component array data: Bull. Seis. Soc. Am., 78, 1725-1743.
Kanasewich, E. R., 1981, Time sequence analysis in geophysics: Univ. of Alberta Press.
Kennett, B. L. N., 1991, The removal of free surface interactions from three-component seismograms: Geophys. J. Internat., 104, 153-163.
Magotra, N., Ahmed, N., and Chael, E., 1987, Seismic event detection and source location using single-station (three-component) data: Bull. Seis. Soc. Am., 77, 958-971.
Means, J. D., 1972, The use of the three-dimensional covariance matrix in analyzing the polarization properties of plane waves: J. Geophys. Res., 77, 5551-5559.
Montalbetti, J. F., and Kanasewich, E. R., 1970, Enhancement of teleseismic body phases with a polarization filter: Geophys. J. Roy. Astr. Soc., 21, 119-129.
Pant, D. R., 1989, Physical seismic modelling: Studies in ground roll suppression, reflector resolution and multi-component wavefield separation: Ph.D. thesis, The Flinders University of South Australia.
Pant, D. R., and Greenhalgh, S. A., 1990, A multi-component offset VSP scale model investigation: Geoexploration, 26, 191-212.
Vidale, J. E., 1986, Complex polarization analysis of particle motion: Bull. Seis. Soc. Am., 76, 1393-1405.


Chapter 15
Parameterization of Narrowband Rayleigh
and Love Waves Arriving at a Triaxial Array
R. Lynn Kirlin, John Nabelek, and Guibiao Lin
15.1 Introduction
It is desirable to separate and parameterize Rayleigh and Love wavefronts arriving at an array of three-component seismometers. We show that methods of modern array signal processing and parameter estimation will accomplish this task. An overall covariance matrix of vectors having elements representing the traces from each component of all seismometers yields the necessary information. We assume that the only vertical-axis response is due to the Rayleigh component rv, and that the Love wave components LE and LN appear only on the east and north axes, respectively. The relative powers of the three Rayleigh components and the two Love components are constant across the array, although the phase relationships between seismometers will, of course, vary. That is to say, the 5 x 5 covariance matrix of the five components (rv, rE, rN, LE, LN) at a single seismometer is not a function of seismometer location, but the 5 x 5 cross-covariance matrix of the five-component vector of one seismometer with the five-component vector of another seismometer has complex values whose phases vary according to the relative locations of the two seismometers. After formulating a method of estimating the powers of each component, we show that we can use all seismometers' contributions coherently to determine the possibly different azimuths and horizontal velocities of the Rayleigh and Love waves.

15.2 Background
The MUSIC (multiple signal classification) algorithm is a maximum-likelihood estimate of directions of arrival of simultaneous narrowband plane waves. However, it requires knowledge of the relative amplitudes of the arrivals at each sensor; often equal amplitudes are assumed. Because this assumption does not hold on all axes of any seismometer for either the Rayleigh or Love wave, we must estimate the amplitude relationships before we can estimate directions of arrival. We also assume that all signal-source components and all data are Gaussian and that sensor spacing is always less than one-half wavelength. With these assumptions and an overall covariance matrix of all sensors' data, we can estimate all parameters of interest.
We first discuss the component response method of Jepson and Kennett (1990). They show that the pressure (P), shear-vertical (Sv), and shear-horizontal (Sh) components of a source signal arriving at a surface-level, three-component seismometer from azimuth ψk appear as Z, N, and E (vertical, north, and east) components according to

$$
\begin{bmatrix} Z \\ N \\ E \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 \\
0 & -\cos\psi_k & -\sin\psi_k \\
0 & -\sin\psi_k & \cos\psi_k
\end{bmatrix}
\begin{bmatrix}
w_{zp} & w_{zs} & 0 \\
w_{np} & w_{ns} & 0 \\
0 & 0 & w_{eh}
\end{bmatrix}
\begin{bmatrix} P \\ S_v \\ S_h \end{bmatrix}. \tag{15.1}
$$

The coefficients w transform P, Sv, and Sh to Z, R, and T [vertical, radial (in the N, E plane), and transverse (to radial)]. They are as follows:

$$w_{zp} = -v_{p0}\, q_{p0}\, C_1, \quad
w_{zs} = v_{s0}\, p\, C_2, \quad
w_{np} = v_{p0}\, p\, C_2, \quad
w_{ns} = v_{s0}\, q_{s0}\, C_1, \quad
w_{eh} = 2,$$

where vp0 and vs0 are the pressure and shear surface velocities, respectively, qp0 and qs0 are the pressure and shear vertical slownesses, p is the horizontal slowness,

$$C_1 = \frac{2 v_{s0}^{-2}\left(v_{s0}^{-2} - 2p^2\right)}{\left(v_{s0}^{-2} - 2p^2\right)^2 + 4p^2\, q_{p0}\, q_{s0}},$$

and

$$C_2 = \frac{4 v_{s0}^{-2}\, q_{p0}\, q_{s0}}{\left(v_{s0}^{-2} - 2p^2\right)^2 + 4p^2\, q_{p0}\, q_{s0}}.$$
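The coefficients are simple to evaluate numerically; the sketch below (ours, written under our reconstruction of the garbled formulas above) assumes a subcritical arrival so that both vertical slownesses are real:

    import numpy as np

    def free_surface_weights(vp0, vs0, p):
        # vp0, vs0: P and S surface velocities; p: horizontal slowness (p < 1/vp0).
        qp0 = np.sqrt(vp0 ** -2 - p ** 2)   # P vertical slowness
        qs0 = np.sqrt(vs0 ** -2 - p ** 2)   # S vertical slowness
        denom = (vs0 ** -2 - 2 * p ** 2) ** 2 + 4 * p ** 2 * qp0 * qs0
        c1 = 2 * vs0 ** -2 * (vs0 ** -2 - 2 * p ** 2) / denom
        c2 = 4 * vs0 ** -2 * qp0 * qs0 / denom
        return {"zp": -vp0 * qp0 * c1,   # w_zp
                "zs": vs0 * p * c2,      # w_zs
                "np": vp0 * p * c2,      # w_np
                "ns": vs0 * qs0 * c1,    # w_ns
                "eh": 2.0}               # w_eh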

15.3 Estimation of the Component Powers


Assume that we have the following relationship between the Rayleigh and Love components and the axial data. Assume also that only one Rayleigh and one Love wave are present, so that there are three components of signal,

$$s = (L_E \;\; L_N \;\; r_v)^T.$$

The diagonal elements of Rs are those we wish to estimate. The covariance of s is

$$
R_s = E\{ss^H\} =
\begin{bmatrix}
\sigma_{L_E}^2 & \sigma_{L_E L_N} & 0 \\
\sigma_{L_E L_N}^* & \sigma_{L_N}^2 & 0 \\
0 & 0 & \sigma_{r_v}^2
\end{bmatrix}. \tag{15.2}
$$

We allow that the two Love components may be less than fully correlated, but that all three Rayleigh components will be complex scalar multiples of rv. Later we revert to a further simplification in which the Love components are simply related by a scalar multiplier.
Now let there be M sensors, each having three axial components. These are all placed in a data vector x = (x1 x2 ... x3M)T, where

$$
\begin{aligned}
x_i &= \alpha\, r_{v_i} + L_{E_i} + n_i, & i &= 1, 4, 7, \ldots, 3M-2; \\
x_i &= \beta\, r_{v_i} + L_{N_i} + n_i, & i &= 2, 5, 8, \ldots, 3M-1; \\
x_i &= r_{v_i} + n_i, & i &= 3, 6, 9, \ldots, 3M,
\end{aligned} \tag{15.3}
$$


and n = (n1 n2 ... n3M)T is a vector of white Gaussian noise samples having covariance matrix $\sigma_n^2 I$. Then the data covariance matrix

$$R_x = E\{xx^H\} \tag{15.4}$$

contains submatrices whose covariance elements we now identify and relate to Rs. We use superscript H to denote conjugate transpose. All Rayleigh components rvi are simply phase rotations of rv at the reference sensor. Both horizontal Rayleigh components are also complex scalar multiples of rv. The model for the signal-only (noise-free) data at the ith sensor is
$$
x_{is} = \begin{bmatrix} x_{i1} \\ x_{i2} \\ x_{i3} \end{bmatrix}
= \begin{bmatrix} \alpha r_v + L_E \\ \beta r_v + L_N \\ r_v \end{bmatrix}
= K s, \tag{15.5}
$$

where

$$K = \begin{bmatrix} 1 & 0 & \alpha \\ 0 & 1 & \beta \\ 0 & 0 & 1 \end{bmatrix},
\qquad s = (L_E \;\; L_N \;\; r_v)^T.$$

The sensor's actual data vector is

$$x_i = x_{is} + n_i. \tag{15.6}$$

The covariance matrix of this vector is

$$
R_{x_i} = E\{(x_{is} + n_i)(x_{is} + n_i)^H\}
= K R_s K^H + N
= R_{x_{is}} + N, \tag{15.7}
$$

where the noise covariance matrix N is the same at all sensors due to the assumption of spatial stationarity, and the covariance of the signal part of xi is


$$
R_{x_{is}} = K R_s K^H =
\begin{bmatrix}
\sigma_{L_E}^2 + |\alpha|^2 \sigma_{r_v}^2 &
\sigma_{L_E L_N} + \alpha\beta^*\, \sigma_{r_v}^2 &
\alpha\, \sigma_{r_v}^2 \\
\sigma_{L_E L_N}^* + \alpha^*\beta\, \sigma_{r_v}^2 &
\sigma_{L_N}^2 + |\beta|^2 \sigma_{r_v}^2 &
\beta\, \sigma_{r_v}^2 \\
\alpha^*\, \sigma_{r_v}^2 &
\beta^*\, \sigma_{r_v}^2 &
\sigma_{r_v}^2
\end{bmatrix}. \tag{15.8}
$$

Note that if we can estimate this matrix, we can easily identify all of its separate parameters: $\sigma_{r_v}^2$, α, β, $\sigma_{L_E}^2$, $\sigma_{L_N}^2$, and $\sigma_{L_E L_N}$. Having accomplished this, we can construct the covariance matrix of any subset of the components (rv, rE, rN, LE, LN). Of course, we assume that the Rayleigh components are uncorrelated with the Love components.
Using a derivation from estimation theory, the conditional estimate of the reference sensor's components x1 = (x11, x21, x31)T, given any other sensor's components xi = (x1i, x2i, x3i)T, is the conditional mean (Scharf, 1991, chapter 7)

$$\hat{x}_{1|x_i} = E\{x_1 \mid x_i\} = R_{1i}\, R_{ii}^{-1}\, x_i, \tag{15.9}$$
and the covariance of $\hat{x}_{1|x_i}$ is

$$R_{\hat{x}_1|x_i} = R_{1i}\, R_{ii}^{-1}\, R_{1i}^H. \tag{15.10}$$

We note that the covariance $R_{\hat{x}_1|x_i}$ is an estimate of the signal-part covariance $R_{x_{is}}$. This is because the noise at sensor i does not correlate with either the signal or the noise at sensor 1. In fact, the covariance R1i is given by

$$R_{1i} = E\{x_1 x_i^H\} = K R_s G_i^H, \tag{15.11}$$

where Gi is the transfer matrix from the signal vector to the signal component of xi, or

$$x_i = G_i s + n_i. \tag{15.12}$$


Gi is the ith sensor's equivalent of K at the reference sensor. Inserting equations (15.10) and (15.11) into equation (15.9) shows that equation (15.9) can also be written

$$
R_{\hat{x}_1|x_i} = K R_s G_i^H R_{ii}^{-1} G_i R_s K^H
= K R_s G_i^H \left( G_i R_s G_i^H + \sigma_n^2 I \right)^{-1} G_i R_s K^H
\approx K R_s K^H = R_{x_s}, \tag{15.13}
$$

where the approximation is good for large S/N. This is an estimate of $R_{x_{is}} = K R_s K^H$ in equation (15.7); thus an average of the sample covariances $R_{\hat{x}_1|x_i}$ over all i gives an improved estimate of the elements of $R_{x_{is}}$. We should not directly average the covariance matrices $R_{x_i}$, because their diagonal elements all contain a positive noise variance which would not decrease with averaging. Rather, we should average in a way that eliminates this bias, so that the expectation is $R_{x_{is}}$, as in equation (15.7).

The sample matrices $R_{\hat{x}_1|x_i} = R_{1i} R_{ii}^{-1} R_{1i}^H$ are averaged over all i. In fact, because of spatial stationarity, we can repeat the averaging for each jth sensor in turn, not just the first, treating each as the reference sensor. Averaging all of these gives

$$\hat{R}_{x_{is}} = \frac{1}{M(M-1)} \sum_{j=1}^{M} \sum_{\substack{i=1 \\ i \neq j}}^{M} R_{ij}\, R_{jj}^{-1}\, R_{ij}^H. \tag{15.14}$$

Since it is known that the horizontal components of the Rayleigh wave are 90 degrees out of phase with the vertical, we use the estimates

$$\hat{\alpha} = \operatorname{Im}\!\big[ \hat{R}_{x_{is}}(3,1) / \hat{R}_{x_{is}}(3,3) \big], \qquad
\hat{\beta} = \operatorname{Im}\!\big[ \hat{R}_{x_{is}}(3,2) / \hat{R}_{x_{is}}(3,3) \big]. \tag{15.15}$$

Having α and β now allows us to estimate the signal covariance matrix

$$R_s = E(ss^H) = (K^H K)^{-1} K^H R_{x_s} K (K^H K)^{-1}, \tag{15.16a}$$

using the minimum-norm or unconstrained least-squares-error estimate of s given x [see section 9.2 or Scharf (1991, chapter 9)]. Alternatively, since K is nonsingular, we may simply write

$$R_s = E(ss^H) = K^{-1} R_{x_s} \left(K^H\right)^{-1}. \tag{15.16b}$$
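The whole estimation chain of equations (15.14)-(15.16b) fits in a few lines; the sketch below is ours, and the factor i in K encodes the assumed 90-degree phase of the horizontal Rayleigh components:

    import numpy as np

    def estimate_signal_covariance(X, M):
        # X: complex array of shape (3M, L), one Fourier-coefficient
        # snapshot per column for all M triaxial sensors.
        L = X.shape[1]
        R = X @ X.conj().T / L                     # full 3M x 3M sample covariance
        acc = np.zeros((3, 3), dtype=complex)
        for j in range(M):                         # each sensor as reference, eq. (15.14)
            Rjj_inv = np.linalg.inv(R[3*j:3*j+3, 3*j:3*j+3])
            for i in range(M):
                if i == j:
                    continue
                Rij = R[3*i:3*i+3, 3*j:3*j+3]      # cross-covariance block
                acc += Rij @ Rjj_inv @ Rij.conj().T
        Rxs = acc / (M * (M - 1))                  # estimate of K Rs K^H
        alpha = np.imag(Rxs[2, 0] / Rxs[2, 2])     # equation (15.15)
        beta = np.imag(Rxs[2, 1] / Rxs[2, 2])
        K = np.array([[1, 0, 1j*alpha], [0, 1, 1j*beta], [0, 0, 1]])
        Kinv = np.linalg.inv(K)
        Rs = Kinv @ Rxs @ Kinv.conj().T            # equation (15.16b)
        return Rxs, alpha, beta, Rs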

If s includes more than one Rayleigh and one Love wave and we want to estimate their directions of arrival, we need to enlarge the reference sensor vector to include more than one sensor. In that case, only sensor subsets with similar geometric distribution can be averaged coherently. When all sensors are spaced on a grid or equally spaced in line, the condition for geometric similarity of subarrays is satisfied. Although we address only one simultaneous Rayleigh wave and one Love wave, more general treatments of the limitations of this sort of problem are currently being researched.
Anderson and Nehorai (1994) have found the limiting number of wavefronts that can be found using data from one and two three-component geophones. Burgess and VanVeen (1994) have analyzed the generalized likelihood test for detecting signals using subspace data from an arbitrary array of vector sensors (multicomponent sensors). Also, Wax et al. (1996) have determined the number of waveform parameters that can be estimated with an arbitrary M-sensor array for signals in colored noise. They show that more than the usual M − 1 wavefronts can be localized with their method. These recent works represent the state of the art in this field at the time of our writing.
In the remainder of this chapter we assume that only one Rayleigh and
one Love wave are present.

15.4 Results Using 0.1-0.2 Hz Geophysical Data at a Triaxial Array
We have processed data taken October through December 1990 from 12 sensors of the triaxial seismic array deployed in the coastal plain of northeastern North Carolina and southeastern Virginia. A plot of the seismometer locations with respect to the array center is shown in Figure 15.1. The data include energy arising from microseismic signals in the regions of Hudson Bay and the Great Lakes. The sensor locations are given by coordinates in meters relative to the reference sensor, y = north, x = east. To generate 80 samples of Fourier transform values, 80 length-1024 FFTs of 50%-overlapping, Hanning-windowed segments from each trace are used.

At any FFT frequency between 0.12 Hz and 0.18 Hz, a vector (one vector element from each trace) is produced, as in equation (15.3); but the samples from the traces are the Fourier transform values at a single frequency rather than time samples. Given the sample rate fs = 1 sample/s and the Hanning window, the effective filter bandwidth for each transform value is a bit wider than the 2/1024 Hz that would be in effect without windowing. The Hanning window suppresses leakage from neighboring frequencies (Bendat and Piersol, 1986, chapter 11).
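The segment generation can be sketched as follows (our illustration; the names are ours):

    import numpy as np

    def fourier_samples(trace, nfft=1024, overlap=0.5):
        # Hanning-windowed, 50%-overlapping FFT segments of one trace.
        # Returns one row per segment and one column per FFT bin.
        step = int(nfft * (1 - overlap))
        win = np.hanning(nfft)
        starts = range(0, len(trace) - nfft + 1, step)
        return np.array([np.fft.fft(win * trace[s:s + nfft]) for s in starts])

At a 1 sample/s rate, bin k corresponds to k/1024 Hz; collecting bin k from every segment of every trace yields the snapshot vectors of equation (15.3), one complex sample per trace per segment.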
Operating on the data covariance matrix as suggested above in equations (15.13), (15.14), and (15.15), we find α and β versus frequency as shown in Figure 15.2. Figure 15.3 gives the angles of the complex α and β before the imaginary parts are taken. Note that the angles are very nearly ±90 degrees, except that the angle of β changes dramatically above 0.155 Hz. Figure 15.4 shows the Rayleigh vertical, east, and north components versus frequency. In Figure 15.5 are plotted the total vertical, Rayleigh vertical, total east, Love east, total north, and Love north components. By the model, there is no difference between total vertical (signal power) and Rayleigh vertical. The Love components are approximately 20% to 50% of the respective total east and north powers.

15.5 Signal Model in the Case of One Rayleigh and One Love Wave
We now can write the response at multiple seismometers to one Rayleigh and one Love wave. Let the elements of the signal response vector

$$x_s = (E_1\;\; N_1\;\; Z_1\;\; E_2\;\; N_2\;\; Z_2\;\; \cdots\;\; E_M\;\; N_M\;\; Z_M)^T \tag{15.17}$$

be the Fourier transform values of the east, north, and vertical (Z) components at all M sensors. Here we redefine the source vector

$$s = (L \;\; r_v)^T \tag{15.18}$$

to have elements that are the transforms of the sources waveforms. This representation of the sources is different than in equation (15.5). (We could have
used this definition before, letting LN   LE, in which case  rather than LE
will be estimated.) Now we can write


Figure 15.1. Locations of seismometers with respect to the array center. Dimensions are in meters.

$$x = A s + n, \tag{15.19}$$

where A is the transfer coefficient matrix from the three components to all traces and n is a vector of zero-mean, spatially and temporally white Gaussian noise samples. A must have two columns, one for each signal arrival. We write the first column for the Rayleigh wave and the second for the Love wave, giving


Figure 15.2. Alpha and beta versus frequency, using equation (15.15). Alpha and beta are the Rayleigh east and Rayleigh north components, respectively [see equation (15.5)].


Figure 15.3. Angles of the complex alpha and beta (Rayleigh east and Rayleigh north, respectively) estimates before imaginary parts are taken.


Figure 15.4. Estimates of the Rayleigh components Rv, RE, and RN versus frequency [equation (15.5)].

Figure 15.5. Estimates of the total (Rayleigh plus Love) powers of the east and north components (upper curves), and of the Rayleigh vertical (same as total vertical), Love east, and Love north components.


$$
a_1 = \begin{bmatrix}
\alpha \\ \beta \\ 1 \\
\alpha\, e^{i\omega\tau_{2r}} \\ \beta\, e^{i\omega\tau_{2r}} \\ e^{i\omega\tau_{2r}} \\
\vdots \\
\alpha\, e^{i\omega\tau_{Mr}} \\ \beta\, e^{i\omega\tau_{Mr}} \\ e^{i\omega\tau_{Mr}}
\end{bmatrix},
\qquad
a_2 = \begin{bmatrix}
1 \\ \gamma \\ 0 \\
e^{i\omega\tau_{2L}} \\ \gamma\, e^{i\omega\tau_{2L}} \\ 0 \\
\vdots \\
e^{i\omega\tau_{ML}} \\ \gamma\, e^{i\omega\tau_{ML}} \\ 0
\end{bmatrix}, \tag{15.20}
$$

where τir and τiL are the delays of the Rayleigh and Love waves, respectively, at sensor i.

These columns represent the delays and attenuations of the Rayleigh and Love waves at the sensor and axis represented by each element of the vector xs. The delay τi at sensor i for one of the waves is a function of the azimuth of arrival θ, the slowness p of that wave, and the relative x and y offsets of the sensor from the reference sensor. For a sensor at (x, y) we have

$$\tau_i = \big(x \sin\theta + y \cos\theta\big)\, p. \tag{15.21}$$

We have assumed that the first sensor is the reference sensor, so its components' relative delays are zero (unity exponential factor). Delay elements repeat in triplicate because all three components of both the Rayleigh and Love waves have equal relative delays, although the Love wave has zero amplitude on the vertical axis.
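A sketch of the corresponding steering (delay) vector construction (ours; amps carries the per-axis amplitudes, (α, β, 1) for Rayleigh or (1, γ, 0) for Love):

    import numpy as np

    def steering_vector(theta, p, xy, freq, amps):
        # xy: array of shape (M, 2) of sensor offsets in meters (x = east, y = north);
        # theta: azimuth in radians; p: slowness in s/m; freq: analysis frequency in Hz.
        tau = (xy[:, 0] * np.sin(theta) + xy[:, 1] * np.cos(theta)) * p   # equation (15.21)
        phase = np.exp(1j * 2 * np.pi * freq * tau)   # per-sensor delay factor
        return np.kron(phase, np.asarray(amps))       # triplicated per sensor (E, N, Z)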
Because the delay factors in any delay vector a are a function of frequency, the vectors x(ωi), sample vectors of the Fourier coefficients at different frequencies ωi, will have different phases. The covariance method of analyzing direction of arrival requires that many samples of the vector x(ωi) have the same average relative phases. For purely stationary sources, many sequential Fourier transforms can produce these samples, and this is the approach we have taken.

15.6 Application of the MUSIC Algorithm to the Array Data


The MUSIC algorithm, discussed in Chapter 5, requires analysis of a covariance matrix. The MUSIC algorithm does not require time shifting (nor interpolation) for each trial direction before creating the covariance matrix. From the sample covariance matrix, multiple DOAs can be calculated if the arrivals' amplitudes at each sensor are constant or their relative proportions are known. We have found the proportions above only for a single Rayleigh wave plus a single Love wave.
It can be shown that the eigenvectors of Rx in equation (15.4) which correspond to its two largest eigenvalues span the same vector space as the two coefficient vectors in A, one for each source. This space is called the signal space, and the corresponding eigenvectors are a basis of the signal space. The space of dimension 3M − 2 which is orthogonal to the signal space is called the orthogonal space or, sometimes, the noise space, although there are components of noise in the signal space also. The assumption is that there is an equal amount of noise power in all dimensions:

$$R_x = A R_s A^H + \sigma_n^2 I. \tag{15.22}$$

Note that there are 3M elements in the data vector and 3M is the size of the covariance matrix.
We can find the vectors in A, modeled in equation (15.22), which are functions of the directions of arrival and slownesses, by searching for each through all its possible values of the parameters p and θ, to as fine an increment as is useful. This finds those parameter pairs which yield coefficient vectors a ideally orthogonal to the noise-space eigenvectors (Haykin, 1991). This is the MUSIC method of Chapter 5. If Vn is the matrix whose columns comprise the set of 3M − 2 noise-space eigenvectors and Vs is the matrix whose columns are the two signal-space eigenvectors, then, ideally, a trial delay vector a is a solution if

$$J = a^H V_n V_n^H a = a^H \left(I - V_s V_s^H\right) a = 0. \tag{15.23}$$

Usually the inverse J⁻¹ is searched for peaks, because true zeros of J are not realized in practice. We have used both MUSIC and the minimum-variance distortionless response (see Chapter 5) and get similar results; however, MUSIC can give more resolution. Because the actual number of arrivals is not known

exactly, we tried ranks up through five to define the signal subspace. The results at 0.13 Hz are shown in Figure 15.6. It appears that there are only two strong point sources, but there may be other, less strong sources. This process agreed reasonably with findings produced by other means.

Figure 15.6. MUSIC algorithm applied to the 12-seismometer triaxial-array data.
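A sketch of the grid search (ours; it reuses steering_vector from the previous sketch, and the signal-subspace dimension d is left as an input because the number of arrivals is uncertain):

    import numpy as np

    def music_spectrum(R, xy, freq, amps, thetas, slownesses, d=2):
        # R: 3M x 3M sample covariance; thetas, slownesses: search grids.
        vals, vecs = np.linalg.eigh(R)
        Vn = vecs[:, :-d]                          # noise-space eigenvectors
        spec = np.empty((len(thetas), len(slownesses)))
        for i, th in enumerate(thetas):
            for k, p in enumerate(slownesses):
                a = steering_vector(th, p, xy, freq, amps)
                a = a / np.linalg.norm(a)
                proj = Vn.conj().T @ a
                J = np.real(proj.conj() @ proj)    # equation (15.23)
                spec[i, k] = 1.0 / max(J, 1e-12)   # peaks mark arrivals
        return spec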
Having found the p and θ parameters of the two waves, we might proceed to find the waveforms themselves, waveform parameters, and quality measures. Waveform estimation is discussed in Chapter 9. Quality measures can be estimated through the eigenvalues of Rx. The two largest have a sum equal to M times the total power in the signals, and the average of the 3M − 2 smaller eigenvalues is an estimate of the noise on each sensor component. For example, if only one signal were present, the first eigenvalue λ1 would represent the total variation of the data having the delays implied by the elements of the associated eigenvector. This variation also includes noise in that dimension, however. Thus λ1 = Mσs² + σn², where σs² and σn² are the average powers of signal and noise, respectively, over the time window analyzed. Of course, σs² includes the P, SV, and SH component powers of the wave. When more than one


wave is present, the eigenvalues do not correspond one to one with wavefront
powers.
We have already found the powers at the solution coefficient vectors a through the elements of the matrix $R_{x_{is}}$ of equation (15.8), which we estimated via equation (15.14). A measure of coherence for these estimates is given by the ratio of the signal powers,

$$\sigma_{r_v}^2 + \operatorname{Im}^2(\alpha)\,\sigma_{r_v}^2 + \operatorname{Im}^2(\beta)\,\sigma_{r_v}^2
\quad\text{and}\quad
\sigma_{L_E}^2 + \sigma_{L_N}^2,$$

to the estimate of the noise power,

$$\hat{\sigma}_n^2 = \frac{1}{3M-2} \sum_{i=3}^{3M} \lambda_i. \tag{15.24}$$

Note that we assumed the Rayleigh vertical and horizontal components are related by a 90-degree phase rotation as well as an amplitude scaling; hence the extraction of the imaginary parts of α and β estimated from equations (15.14) and (15.8).

15.7 Conclusions
We have formulated the received data such that elements of the signal covariance matrix reveal the unknown powers of both the Rayleigh and Love waves arriving at the sensor array. The signal covariance matrix is estimated by a unique spatial averaging that tends to nullify the random noise on the diagonal. We have demonstrated our method with real data, and the results concur with theories on the source of the data. We find clear distinctions between the two wavetypes and a 90-degree phase difference in the Rayleigh components, as there should be, over most of the frequency analysis band. Source directions were established in a separate report and concur with other available knowledge.

15.8 References
Anderson, S., and Nehorai, A., 1994, Analysis of a polarized seismic wave model: 8th IEEE Workshop on Statistical Signal and Array Processing, 281-284.
Bendat, J. A., and Piersol, A. G., 1986, Random data: Analysis and measurement procedures, 2nd ed.: John Wiley & Sons, Inc.
Burgess, K., and VanVeen, B., 1994, Vector-sensor detection using a subspace GLRT: 8th IEEE Workshop on Statistical Signal and Array Processing, 109-112.
Haykin, S., 1991, Adaptive filter theory: Prentice-Hall, Inc.
Jepson, D. C., and Kennett, B. L. N., 1990, Three-component analysis of regional seismograms: Bull. Seis. Soc. Am., 80, 2032-2052.
Scharf, L. L., 1991, Statistical signal processing: Addison-Wesley Publ. Co.
Wax, M., Sheinvald, J., and Weiss, A., 1996, Detection and localization of correlated and uncorrelated signals in colored noise via generalized least squares: IEEE Trans. Sig. Proc., 44, 1734-1743.

INDEX
A
analysis region 5, 6

B
beamformer 156
bias 9
binocular 298
broadband 47, 83, 86, 87, 92, 95

C
CMP 14, 155, 161, 162, 227, 237, 258
coherence 2, 5, 43, 83, 96, 105, 185
    detection 105
coherency 98, 101
constrained minimum variance 87, 160, 176
constraints 60, 63, 65, 160, 164, 287, 296
correlation 37, 43
    matrix 33, 174
    multicomponent 303, 307, 321
covariance matrix 10, 11
    estimation 15, 58, 141, 144, 154, 158, 181
    robust estimation 15, 31
    sample 9, 10, 11, 13, 20, 21, 36, 40, 156, 158, 165, 202, 228, 337
    squared 150, 151
    triaxial 276

D
data vector window 9
data window 5, 7, 8, 9, 109, 110, 182, 229, 284
direction finding 40, 291, 297

E
eigenimages 4, 242, 243, 244, 245, 246, 250, 251, 256, 263, 266, 268
eigenstructure 15, 19, 20, 21, 23, 28, 38, 39, 41, 46, 47
    relation to power spectrum 53
    statistics of estimates 20, 31, 40
eigenvalues 19, 20, 21, 22, 24, 38, 40, 53, 95
    ordered 20
eigenvectors 13, 15, 19, 20, 21, 23, 24, 31, 38, 53, 54
    ordered 20
elliptical 285
ESPRIT 77, 109, 127, 138, 139
event identification 258, 259

F
filter 55, 178, 186
    finite impulse response (FIR) 57
    lattice 57
    prediction 56
focussing
    frequency 95
Frobenius norm 15, 24, 165
F-test(s) 284, 285, 287, 288

G
Gaussian 10, 11, 13, 15, 31, 36
geophone 5, 9, 13, 110, 329
    array 46, 83
    triaxial 4, 275, 276, 302, 321

H
Hampson's algorithm
    comparison to subspace method 178, 179, 181
Hampson's multiple elimination method 170, 182
Hermitian 19, 165, 166, 167
high-resolution spectral estimators 58, 154

I
interference 14, 26, 103, 160, 170, 171, 172, 174, 175, 176, 177, 185, 189, 190, 191, 192, 204, 263, 276, 296
    canceled 102
    canceling 83, 93, 173, 185, 186, 187, 190, 192, 196, 206, 217
    eigendecomposition 190, 196, 206
    patterns 3, 185
    suppression 186, 187, 206

K
Karhunen Loeve Transform (KLT) 19, 21, 23

L
linear statistical model 35, 36, 37
Love 4, 323, 324, 325, 327, 329, 330, 331, 335, 336, 337, 339

M
maximum entropy (ME) 248
maximum likelihood (ML) 11, 13, 14, 59, 75, 78, 142, 323
mean 9, 10, 11, 12, 15, 16
    sample 10, 11, 16
minimum variance 62, 69, 72, 74, 75, 76, 176
    distortionless response (mvdr) 75
    linearly constrained (lcmv) 87, 160
minimum variance distortionless response (MVDR) 60, 75, 337
minimum-mean-square error 176
minimum-norm (MN) 26, 78, 80, 116, 134, 137
multicomponent 291, 292, 303, 307, 321, 329
multiple sidelobe canceler (MSC) 101, 102, 106
    comparative computation time 106
multistation 303
MUSIC 61, 86, 87, 88, 89, 93, 94, 95, 98, 109, 110, 116, 122, 123, 124, 125, 132, 133, 134, 143, 153, 156, 179, 181, 323, 337

N
narrowband 28, 40, 45, 47, 60, 63, 83, 88, 142, 167, 179, 181, 323
noise 6, 13, 35, 36, 37, 38, 39, 46, 47
    colored 85, 329, 340
    nonwhite 37, 47, 120
    subspace algorithm 35, 38, 39, 43
    suppression 237, 256
    white 28, 37, 39, 40, 45, 55, 61, 72, 95, 120, 245, 253, 256, 267, 269
normal density (see Gaussian) 12
null space 24, 25, 62

O
overdetermined 26, 119

P
parabolic approximation 84, 90, 177
polarization
    analysis 275, 276, 278, 291, 292, 295
    elliptical 275, 287, 288
positive definite 15, 21, 33, 66, 165, 166, 243, 248
positive definite, constraints 164
predictor 55, 56, 58, 63, 151, 168
principal component(s) 16, 23, 185, 189, 228, 229, 241, 247
    sensitivity to lithologic variations 230
pseudo-inverse 102, 104, 106, 118, 164, 173

R
range space 26
rank 20, 21, 24, 26, 27, 30, 38
    low rank approximation 21, 24, 83, 104, 106
    reduced rank 26, 29, 30, 63, 105
Rayleigh 291, 293, 323, 324

S
seismic wavefront(s) 3, 46, 47, 76, 105, 109, 110, 113, 115, 132, 135
seismometer 323, 324, 329, 330, 331
    array 338
    triaxial 338
semblance 83, 93, 94, 95, 98, 100, 110, 132, 155
    comparison to MUSIC 93, 95, 156
    review 155
    subspace algorithm 83, 98, 100, 101, 156
signal-to-noise (S/N) 31, 38, 44, 62, 119, 173, 175, 228, 237, 253, 275, 276, 328
    enhancement 252, 268
    maximum 107, 173, 176
singular vector 23, 24, 26, 27, 165, 176
singular-value decomposition (SVD) 19, 21, 104, 110, 115, 242, 278, 322
smoothing 232, 233, 234, 235, 237
    forward-backward 150, 151
    spatial 43, 87, 91, 109, 149, 150
spectral
    analysis 1, 3, 51, 57, 59, 68, 77, 79, 80, 167
    estimation 40, 41, 57, 140, 154, 167
    estimators 58, 61, 64, 65, 67, 71, 72, 74, 77, 154
spectrum 51, 53, 57, 253, 254, 255
    discrete power 52
static correction 241, 252, 265
stationarity 37, 142, 326
stationary 37
subarray 43, 141, 142, 329
subarrays 146
subspace 35, 39, 42, 83, 98, 113, 170, 173, 192, 234, 329
    and eigenstructure 38, 41
    estimators 83, 172
    examples, signal 44
    noise 35, 38, 43, 171, 189
    perturbation 41
    signal 35, 38, 39, 42, 44, 83, 93, 94, 98, 110, 113, 147, 158, 159, 168, 170, 189, 235
    statistics of components 42
    vector 35

T
time 125
time delay 84, 91, 109, 123, 305
    estimation 115, 118, 125
    perturbation 122
time gate 84, 101, 156
time window 156, 230, 275, 277, 292
Toeplitz 12, 13, 19, 57, 165
    constraints 13, 164
triaxial
    array 323, 329
    covariance matrix 276
    data 275, 278, 288, 291, 297, 321

V
velocity estimation 40, 44, 105, 119, 138, 154, 155, 158, 159, 160, 164, 289, 322
    root-mean-square 83
VSP 263, 264, 266, 278, 289, 291, 298, 314, 322

W
wavefield decomposition 252, 256, 259, 273
waveform 60, 103, 169, 227, 329
    estimation 183
    estimator 102
wavefront 28, 29, 44, 46, 57, 83, 115, 323
    multiple model 83, 87, 92, 93, 136
Wishart distribution 10
