The Numerical Solution of Systems of Polynomials
Arising in Engineering and Science
Andrew J. Sommese
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy
is not required from the publisher.
ISBN 981-256-184-6
Printed in Singapore.
To Rebecca, Rachel, and Ruth
To Vani, Megan, and Anne
Preface
This book started with the goal of explaining, to engineers and scientists, the ad-
vances made in the numerical computation of the isolated solutions of systems of
nonlinear multivariate complex polynomials since the book of A. Morgan (Morgan,
1987). The writing of this book was delayed because of a number of surprising devel-
opments, which made possible numerically describing not just the isolated solutions,
but also positive-dimensional solution sets of polynomial systems. The most recent
advances allow one to work with individual solution components, which opens up
new ways of solving a large system of polynomials by intersecting the solution sets
of subsets of the equations. This collection of ideas, methods, and problems makes
up the new area of Numerical Algebraic Geometry.
The heavy dependence of the new developments since (Morgan, 1987) on alge-
braic geometric ideas poses a serious challenge for an exposition aimed at engineers,
scientists, and numerical analysts — most of whom have had little or no exposure to
algebraic geometry. Furthermore most of the introductory books on algebraic geom-
etry are oriented towards computational algebra, and give short shrift at best to the
geometric results which underlie the numerical analysis of polynomial systems. Even
worse, from the standpoint of an engineer or scientist, such books typically aim to
resolve algebraic questions and so do not directly address the numerical/geometric
questions coming from applications.
Our approach throughout this book is to assume that we are trying to explain
each topic to an engineer or scientist. We want to be accurate: we do not cut
corners on giving precise definitions and statements. We give illustrative examples
exhibiting all the phenomena involved, but we only give proofs to the extent that
they further understanding.
The set of common zeros of a system of polynomials is not a manifold, but it is
close to being one in the sense that exceptional points are rare. This vague statement
can be made mathematically precise, and indeed, the theoretical underpinnings
of our methods imply that we avoid such trouble spots "with probability one."
The usual algebraic approaches to the subject do not show how familiar geometric
notions from calculus relate to these solution sets. The geometric approach is harder,
since to link concepts like prime ideals to algebraic sets with certain very nice
geometric properties, you must use not only algebra, but topology, several complex
variables, and partial differential equations. Doing this with full proofs would rule
the book out for all but a very small audience. Yet the theory basically says that,
in any number of dimensions, solution sets are as nice as a few well chosen and
simple examples would naively lead an engineer or scientist to expect.
There remains a tension that we see no way to completely resolve. Dealing with
polynomials and algebraic subsets of Euclidean space is basic, but this is not general
enough to cover the applications common in engineering and science. For example,
the use of products of projective spaces and multihomogeneous polynomials which
live on them is extraordinarily useful, but these polynomials are not "functions" on
the products of projective spaces. Working in an appropriate generality to cover
everything needed would cast a pall over the whole book. Moreover, the early
parts of the book need only advanced calculus and a few concepts from algebraic
geometry. For this reason, we often restate results in different levels of generality
in different parts of the book. We have also included an appendix with detailed
statements of useful, more technical results from algebraic geometry.
Part One of the book is introductory.
Chapter 1 gives examples of polynomial systems as they arise in practice and
gives an introduction to homotopy continuation, the numerical solution tool under-
lying our work.
Chapter 2 gives a more detailed discussion of homotopy continuation and what
it means to be a complex or real solution of a system of polynomials.
Chapter 3 introduces some algebraic geometry and shows some of the ways it
naturally presents itself, e.g., dealing with solutions at infinity and continuation
paths going to infinity.
Chapter 4 gives a first discussion of generic points and probability-one algo-
rithms. The powerful ability to choose "generic points" in Euclidean space increases
the efficiency and stability of numerical algorithms and eliminates some problems
that are endemic in exact symbolic procedures.
In Chapter 5, there is some detailed discussion of polynomials in just one vari-
able. For example, we discuss the fundamental limitations that the number of digits
available to us impose on our recognizing a zero of a polynomial.
Chapter 6 gives a brief discussion, with some pointers to the literature, of other
approaches to solving systems of polynomials.
Part Two is devoted to the theory and practice of finding isolated solutions of
polynomial systems. Here we consider the many special features of a polynomial
system that make it amenable to efficient solution.
Chapter 7 explains the coefficient-parameter framework for systems arising in
engineering and science. It is a compelling fact that almost all systems that arise
in practice depend on parameters, and need to be solved many times for different
values of the parameters. Thus it becomes worthwhile to spend extra computation
solving such a system if that extra time, amortized over all the times we solve the
system, leads to a more efficient and quicker average solution time. We include a
case study of this approach applied to Stewart-Gough platform robots.
Polynomial systems arising in engineering and science tend to be sparse and
highly structured. In Chapter 8, we give an extended discussion of such special
structures. These features cause systems to have fewer solutions than would be
naively expected. Taking advantage of this structure leads to more efficient homo-
topies and much faster solution times.
Chapter 9 gives case studies for systems arising from a number of different
engineering and scientific applications. We have found that these systems present
challenging problems and excellent trial grounds for improving our algorithms.
Chapter 10 covers endgame methods. These methods exploit continuation to
improve the numerical accuracy of singular solutions, such as double or triple roots.
Chapter 11 deals with how to recognize and deal with problems that may occur.
The probability-one methods we use are based on choosing generic points. If only
we had computers with infinite precision, these methods would eliminate all manner
of unpleasant difficulties, e.g., path crossing. Since real computers have only finite
precision, the probability of "probability zero" events is very small, but positive.
This chapter discusses how to detect the occurrence of such events, in the large
problems occurring in engineering and science, and how to deal with them.
Part Three of the book shows how the ability to compute isolated solutions by
homotopy continuation can be exploited to manipulate higher-dimensional solution
sets of polynomial systems. To do so, we introduce "witness sets" to represent
curves, surfaces and other algebraic-geometric sets as numerical objects. Witness
sets and the underlying theory should be looked at as a new subject Numerical
Algebraic Geometry whose relation to Algebraic Geometry is similar to the relation
of Numerical Linear Algebra to Linear Algebra.
Chapter 12 introduces some needed material from algebraic geometry, such as
the Zariski topology, its relation to the complex topology, the irreducible decompo-
sition, constructible algebraic sets, and multiplicity.
Chapter 13 introduces the basic concepts of numerical algebraic geometry. Pri-
mary among these are witness points, which is the natural numerical data structure
to encode irreducible algebraic sets. We also give an extensive discussion of the
reduction to systems with the same number of equations as unknowns. Based on
(Sommese & Wampler, 1996), the article where the Numerical Algebraic Geometry
started, this chapter explains the numerical irreducible decomposition and how to
compute "witness point supersets," a first approximation to the witness point sets
occurring in the numerical irreducible decomposition.
Chapter 14 presents an alternative procedure to compute the "witness point
supersets" of Chapter 13. We follow (Sommese & Verschelde, 2000), with some
of the later improvements from (Sommese, Verschelde, & Wampler, 2004b). One
novelty is the complete removal of slack variables.
Chapter 15 explains the algorithms to compute the numerical irreducible decomposition.
Watanabe. We give special thanks to Daniel Bates for his many helpful suggestions
and remarks.
Most of all, we thank our families for their strong encouragement and patience
during the writing of this book.
Preface vii
Conventions xxi
I Background 1
1. Polynomial Systems 3
1.1 Polynomials in One Variable 3
1.2 Multivariate Polynomial Systems 5
1.3 Trigonometric Equations as Polynomials 7
1.4 Solution Sets 8
1.5 Solution by Continuation 9
1.6 Overview 10
1.7 Exercises 11
2. Homotopy Continuation 15
2.1 Continuation for Polynomials in One Variable 15
2.2 Complex Versus Real Solutions 18
2.3 Path Tracking 20
2.4 Exercises 24
3. Projective Spaces 27
3.1 Motivation: Quadratic Equations 27
3.2 Definition of Projective Space 29
3.3 The Projective Line P^1 30
3.4 The Projective Plane P^2 32
3.5 Projective Algebraic Sets 34
3.6 Multiprojective Space 35
6. Other Methods 67
6.1 Exclusion Methods 68
6.2 Elimination Methods 72
6.2.1 Resultants 73
6.2.1.1 Hidden Variable Resultants 73
6.2.1.2 u-Resultants 76
6.2.2 Numerically Confirmed Eliminants 76
6.2.3 Dixon Determinants 77
6.2.4 Heuristic Eliminants 79
6.3 Gröbner Methods 81
6.3.1 Definitions 81
6.3.2 From Gröbner Bases to Eigenvalues 83
6.4 More Methods 84
6.5 Floating Point vs. Exact Arithmetic 84
6.6 Discussion 85
6.7 Exercises 86
II Isolated Solutions 89
7. Coefficient-Parameter Homotopy 91
7.1 Coefficient-Parameter Theory 92
7.2 Parameter Homotopy in Application 98
Appendices 297
Appendix A Algebraic Geometry 299
A.1 Holomorphic Functions and Complex Analytic Spaces 300
A.2 Some Further Results on Holomorphic Functions 302
A.2.1 Manifold Points and Singular Points 306
A.2.2 Normal Spaces 308
A.3 Germs of Complex Analytic Sets 308
A.4 Useful Results About Algebraic and Complex Analytic Sets 310
A.4.1 Generic Factorization 316
A.5 Rational Mappings 317
A.6 The Rank and the Projective Rank of an Algebraic System 318
A.7 Universal Functions and Systems 320
A.7.1 One Variable Polynomials 321
A.7.2 Polynomials of Several Variables 322
A.7.3 A More General Case 322
A.7.4 Universal Systems 323
A.8 Linear Projections 324
A.8.1 Grassmannians 325
A.8.2 Linear Projections on P^N 327
A.8.3 Further Results on System Ranks 329
A.8.4 Some Genericity Properties 330
A.9 Bertini's Theorem and Some Consequences 331
A.10 Some Useful Embeddings 334
Bibliography 379
Index 397
Conventions
• For a system f = (f_1, ..., f_N) of functions of the variables z_1, ..., z_m, df denotes
  the N x m Jacobian matrix

      [ ∂f_1/∂z_1  ...  ∂f_1/∂z_m ]
      [     :               :     ]
      [ ∂f_N/∂z_1  ...  ∂f_N/∂z_m ]

  This is an abuse of notation in the case N = 1 or m = 1. Rather than
  avoiding the abuse and obscuring things we usually leave the reader to fill
  in the special cases, e.g., in the example just given with N = m = 1 we
  mean df/dz and not the 1 x 1 matrix [df/dz].
• When clear from context, we let 0 denote the origin of a vector space.
• When we have a map f : X → Z between sets, and Y ⊂ X, we usually
  denote the restriction of f to Y by f|_Y. Similarly, for a point z ∈ Z, we
  denote the fiber f^{-1}(z) by X_z.
• We often use := when we are making a definition, e.g., the disk of radius r
  in the complex plane C around a point x is defined

      Δ_r(x) := { z ∈ C | |z - x| < r }.

  In pseudocode statements of algorithms, we use the same symbol for copying
  the right-hand result to the left, e.g., k := k + 1 increments k by one.
• We use multidegree notation. For example, if z_1, ..., z_N are indeterminates,
  and I = (i_1, ..., i_N) is an N-tuple of nonnegative integers, then z^I denotes
  z_1^{i_1} · · · z_N^{i_N}. We let |I| := i_1 + · · · + i_N.
      p(z_1, ..., z_N) := Σ_{|I| ≤ d} c_I z^I ∈ C[z_1, ..., z_N]

  is said to be of total degree d (or of degree d for short) if there is at least
  one coefficient with c_I ≠ 0 and |I| = d. For one variable polynomials we
  often follow the reverse convention on the ordering, i.e., we write p(z) :=
  a_0 z^d + · · · + a_d ∈ C[z], with a_0 ≠ 0.
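As a concrete illustration of this notation (a sketch of our own, not from the text), a sparse polynomial can be stored as a map from exponent tuples I to coefficients c_I, with the total degree read off as the largest |I| over the nonzero coefficients; the helper names below are hypothetical:

```python
def monomial(z, I):
    """Evaluate z^I = z_1^{i_1} * ... * z_N^{i_N} at the point z."""
    out = 1.0
    for z_i, i in zip(z, I):
        out *= z_i ** i
    return out

def total_degree(p):
    """Largest |I| = i_1 + ... + i_N over the nonzero coefficients of p."""
    return max(sum(I) for I, c in p.items() if c != 0)

# p = 3*z1^2*z2 - 2*z1 + 1, stored as {exponent tuple: coefficient}
p = {(2, 1): 3.0, (1, 0): -2.0, (0, 0): 1.0}
assert monomial((2.0, 5.0), (2, 1)) == 20.0   # z1^2 * z2 = 4 * 5
assert total_degree(p) == 3                   # |(2, 1)| = 3
```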
• The symbol \ is the "setminus" operator, that is
      A \ B := { x ∈ A | x ∉ B }.
• For set A, #A is the cardinality of A.
• For a subset A of a topological space, A denotes the closure of A.
• We use C^* := C \ 0, the complex line minus the origin.
• We use p^(j)(x) := d^j p(x)/dx^j, the j-th derivative of p with respect to x.
• The N-dimensional complex Euclidean space is

      C^N := C × · · · × C  (N times)  = { (x_1, ..., x_N) | x_i ∈ C }.
• For real numbers a, b, we denote open and closed intervals of the real line
as
Chapter 1
Polynomial Systems
The goal of this book is to describe numerical methods for computing the solutions
of systems of polynomial equations. It is appropriate, therefore, to begin by defining
"polynomials," discussing how they may arise in science and engineering, describing
in nontechnical terms what the "solutions of polynomial systems" look like and
how we might represent these numerically. The last section of this chapter gives
an overview of the rest of the book, to help the reader understand it in a larger
perspective.
As will be our habit throughout the book, we start with simple scenarios before
proceeding to more general ones. A polynomial of degree d in one variable, say x,
is a function of the form
      f(x) = a_0 x^d + a_1 x^{d-1} + · · · + a_{d-1} x + a_d,        (1.1.1)

where a_0, ..., a_d are the coefficients and the integer powers of x, namely 1, x,
x^2, ..., x^d, are monomials. In science and engineering, such functions usually
have coefficients that are real numbers although sometimes they may be complex.
Accordingly, we will consider f(x) as a function that maps complex numbers to
complex numbers, f : C → C. The notation C[x] is often used to denote the set of
all polynomials over the complex numbers in the variable x, so that we may write
f(x) ∈ C[x]. When we say that f(x) in Equation 1.1.1 is degree d, this implies that
a_0 ≠ 0; otherwise, we say that f is at most degree d.
The "solution set" of the equation f(x) = 0 is the set of all values of x ∈ C such
that f(x) evaluates to zero. We may write this as

      f^{-1}(0) = { x ∈ C | f(x) = 0 }.

One of the great advantages of working over complex numbers is that, by the fun-
damental theorem of algebra (see Theorem 5.1.1), we know that as long as a_0 ≠ 0,
f^{-1}(0) will consist of exactly d points, counting multiplicities. Thus, a data struc-
ture convenient for representing the solution of the equation is just a list of d complex
numbers, say x_1^*, ..., x_d^*, not all necessarily distinct. These are also called the roots
of the polynomial. If some of the roots are repeated, then the reduced solution set
is just the list of distinct roots. We know the solution set is complete and correct if

      a_0 (x - x_1^*) · · · (x - x_d^*) = a_0 x^d + a_1 x^{d-1} + · · · + a_{d-1} x + a_d.

That is, we expand the left-hand side and check that all the coefficients match.
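This completeness check is easy to carry out numerically. The sketch below (our own Python, with hypothetical names; the book's exercises use Matlab's poly for the same purpose) expands a_0(x - x_1^*) · · · (x - x_d^*) one factor at a time and compares coefficients:

```python
def poly_from_roots(roots, a0=1):
    """Coefficients [a0, a1, ..., ad] (descending powers) of a0 * prod (x - r)."""
    coeffs = [a0]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = ([coeffs[0]]
                  + [coeffs[i] - r * coeffs[i - 1] for i in range(1, len(coeffs))]
                  + [-r * coeffs[-1]])
    return coeffs

# The root list {1, 1, 1, 2, 3, 4} is complete and correct for
# x^6 - 12x^5 + 56x^4 - 130x^3 + 159x^2 - 98x + 24 (cf. Exercise 1.1(5)):
assert poly_from_roots([1, 1, 1, 2, 3, 4]) == [1, -12, 56, -130, 159, -98, 24]
```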
It is possible to study polynomials over other rings, for example: the reals, R[x];
rational numbers, Q[x]; the integers, Z[x]; any finite field^1, F[x]; or sometimes, in
statements of theory, an unspecified field, usually denoted K[x]. In one sense, there
is no loss of generality in restricting our attention to C[x], for if we find all complex
solutions of f(x) = 0, all real solutions will be contained therein, and similarly
rational and integer solutions. However, the situation may be turned on its head
if we ask other questions. As an example, suppose we seek the conditions for a
sixth degree polynomial to be factorable over a field other than C. Since the funda-
mental theorem of algebra tells us that all polynomials of degree greater than one
in one variable factor over the complexes, we would have to consider the specific
field in question to get an answer. Computer algebra systems deal extensively with
polynomials over the rational numbers or over finite fields, since these permit exact
calculation. And in the area of encryption, essential to secure digital communica-
tions, polynomials on finite fields are crucial. However, in engineering and science,
real or complex numbers are of greatest concern, and it is in this arena that we
focus our effort.
At this point, it is worth noting that our approach will be numerical, so in fact,
all of the coefficients and the solutions we compute will be represented in floating
point arithmetic. Typically, both will be only approximate, so that in reality we
compute approximate solutions to a polynomial that is already an approximation of
the original problem. This is the nature of almost all scientific computation. What
is critical is that we have some estimate of the sensitivity of the problem so that we
have assurance that the solutions are near the correct ones, or, as some would have
it, that the problem that our solutions satisfy is near the one we want to solve.
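A toy illustration of this sensitivity (our own example, not the book's): perturbing the constant term of (x - 1)^2 by eps moves the double root by about sqrt(eps), so eight accurate digits in the coefficients can leave only about four accurate digits in the root:

```python
import math

eps = 1e-8
# Perturbed polynomial: x^2 - 2x + (1 - eps); its exact roots are 1 +/- sqrt(eps).
r = 1.0 + math.sqrt(eps)
assert abs(r * r - 2.0 * r + (1.0 - eps)) < 1e-12   # r is (numerically) a root
assert abs(r - 1.0) > 0.99e-4                       # the root moved ~sqrt(eps) = 1e-4
```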
There is an extensive literature on the numerical solution of polynomials in one
variable. We will not delve into it here, as our focus is on multivariate cases. For
low-degree polynomials in one variable, one approach is to reformulate the problem
as finding the eigenvalues of the companion matrix,
      A = [ 0           1             ...   0        ]
          [ :           :                   :        ]        (1.1.2)
          [ 0           0             ...   1        ]
          [ -a_d/a_0    -a_{d-1}/a_0  ...   -a_1/a_0 ]

^1 Most commonly, the integers modulo a prime number.
Polynomial Systems 5
having ones on the superdiagonal and the coefficients of f in the last row. Since
the characteristic polynomial of A is det(xI - A) = f(x)/a_0, its eigenvalues are the
roots of f. This formulation is convenient due to the wide availability of high-
quality software for solving eigenvalue problems, and as documented in (Goedecker,
1994), it is a highly effective numerical approach. For polynomials with high degree,
divide-and-conquer techniques may be better (Pan, 1997).
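In practice one simply calls an eigenvalue routine (Matlab's eig, or roots, which builds this matrix internally). As a pure-Python sanity check of the construction (our own sketch, hypothetical names), one can verify that each root r of f is an eigenvalue of A with Vandermonde-style eigenvector (1, r, ..., r^{d-1}):

```python
def companion(coeffs):
    """Companion matrix (1.1.2) of f(x) = a_0 x^d + ... + a_d, as a list of rows.

    Ones on the superdiagonal; last row holds -a_d/a_0, ..., -a_1/a_0.
    """
    a0, rest = coeffs[0], coeffs[1:]
    d = len(rest)
    A = [[1.0 if j == i + 1 else 0.0 for j in range(d)] for i in range(d - 1)]
    A.append([-rest[d - 1 - j] / a0 for j in range(d)])
    return A

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = companion([1.0, -5.0, 6.0])          # f(x) = x^2 - 5x + 6 = (x - 2)(x - 3)
for r in (2.0, 3.0):
    v = [r ** k for k in range(2)]       # eigenvector candidate (1, r)
    assert matvec(A, v) == [r * x for x in v]
```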
We may generalize the single-variable case in two ways: we may seek the simulta-
neous solution of several polynomials, and each of these may involve more than one
variable. The formal definition of a polynomial, which includes the single variable
case, is as follows.
• -f ∈ C[x],
• f + g ∈ C[x],
• f - g ∈ C[x],
• fg ∈ C[x], and
• f^k ∈ C[x], for any nonnegative integer k.
Proof. Apply the distributive law to expand each expression into a sum of terms. □
Note that constants are polynomials too, so we may add, subtract or multiply by
them as well.
Notice that the operation of division is missing. Although with suitable care,
many of the techniques in this book can be extended to algebraic functions, which
allow division, we will for the most part concentrate on polynomials. Mainly, this
just means that before commencing to solve algebraic equations, we must clear
denominators.^2
The facts listed in Proposition 1.2.2 allow us to consider systems of polynomial
functions given in "straight-line" form, convenient for both the analyst and for
evaluation by computer.
which reduces the operation count to 18. This is still far from the most efficient
straight-line form, in which we first evaluate the quantity inside the parentheses
and then cube the result. Compiled into a sequence of elementary operations, the
evaluation proceeds as follows, using two temporary variables a and b and only five
operations.
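To make the idea concrete with a small example of our own (not the polynomial discussed in the text), compare evaluating (x + y)^3 from its full expansion against a straight-line form that first computes the inner sum and then cubes it:

```python
def p_expanded(x, y):
    # Fully expanded form of (x + y)^3: many more operations.
    return x**3 + 3*x**2*y + 3*x*y**2 + y**3

def p_straightline(x, y):
    a = x + y        # temporary: the quantity inside the parentheses
    b = a * a        # a^2
    return b * a     # a^3 -- three operations in all

assert p_expanded(2.0, 3.0) == p_straightline(2.0, 3.0) == 125.0
```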
Problems in geometry and kinematics are often formulated using trigonometric func-
tions. Very often these can be converted to polynomials. For example, equations
involving sin θ and cos θ can be treated by replacing these with new indeterminates,
say s_θ and c_θ, respectively, and then adding the polynomial relation s_θ^2 + c_θ^2 = 1.
Once solution values for s_θ and c_θ have been found, the value of θ is easily deter-
mined.^3 Sine or cosine of a multiple angle can always be reduced to a polynomial
in sine and cosine of the angle, e.g., sin 2θ = 2 sin θ cos θ, and the sine or cosine of
sums and differences of angles can also be expanded into polynomials in the sines
and cosines of the angles.
There are limits, of course: Not all trigonometric expressions can be converted
to polynomials. Examples include x + sin x and sin x + sin xy.
The reason that trigonometric expressions arising in practice are so often con-
vertible to polynomials is that they usually have to do with angular rotations,
^3 A different maneuver is to use a new variable t and the substitutions sin θ = 2t/(1 + t^2) and
cos θ = (1 - t^2)/(1 + t^2). This avoids introducing a new equation at the cost of making the
substitution quadratic.
whose main property is the preservation of length. Length relations are inherently
polynomial, due to the Pythagorean Theorem.
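For instance (a hypothetical equation of our own), sin θ + 2 cos θ = 1 becomes the polynomial system { s + 2c = 1, s^2 + c^2 = 1 }; eliminating s gives 5c^2 - 4c = 0, so c = 0 or c = 4/5, and θ is then recovered from (s, c):

```python
import math

# Polynomial system for sin(theta) + 2*cos(theta) = 1, with s = sin(theta), c = cos(theta):
#   s + 2c = 1,  s^2 + c^2 = 1  =>  5c^2 - 4c = 0  =>  c = 0 or c = 4/5.
for c in (0.0, 0.8):
    s = 1.0 - 2.0 * c
    assert abs(s * s + c * c - 1.0) < 1e-12        # (s, c) lies on the unit circle
    theta = math.atan2(s, c)                       # recover the angle
    assert abs(math.sin(theta) + 2.0 * math.cos(theta) - 1.0) < 1e-12
```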
We have already described above the nature of the solution set of a single polynomial
in one variable, which can be represented numerically by a list of approximate
solution points. As summarized in Table 1.1, the situation is more complicated for
multivariate systems of polynomials. Such a system may have solution sets of several
different dimensions: that is, a system could have isolated solution points (dimension
0), curves (dimension 1), surfaces (dimension 2), etc., all simultaneously. Moreover,
just as a univariate polynomial may have repeated roots, a multivariate system may
have solution sets that appear with multiplicity greater than one. Corresponding
to the factorization of a univariate polynomial into linear factors, the solution set of
a multivariate system can be broken down into its irreducible components. Isolated
points are always irreducible, but higher dimensional sets may factor. For example,
the quadratic x^2 + y^2 factors into two lines (x + iy)(x - iy), whereas the quadratic
x^2 + y^2 - 1 is an irreducible circle. The computation of a numerical representation
of the irreducible decomposition of a multivariate polynomial system is the major
topic in Part III of this book. This requires witness sets, a special numerical data
structure. We postpone any further discussion of this until that point.
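The factorization of x^2 + y^2 into two lines is easy to confirm numerically (a sketch of our own, checking the identity at a few sample points with complex arithmetic):

```python
# x^2 + y^2 = (x + iy)(x - iy): verify the identity at sample points.
for x, y in [(1.0, 2.0), (-3.0, 0.5), (0.0, 4.0)]:
    assert (x + 1j * y) * (x - 1j * y) == complex(x * x + y * y)
```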
If f : C^N → C^n is a system of multivariate polynomials, we use the notations
f^{-1}(0) and V(f) interchangeably to mean the solution set of f(x) = 0, i.e.,

      f^{-1}(0) = V(f) := { x ∈ C^N | f(x) = 0 }.
sets retain the property that the complex solution sets must contain the real solution
sets. However, the containment can now be looser, because the real solution set
may be of lower dimension than the complex component that contains it. For
example, the complex line x + iy = 0 in C^2 only contains one real point (x, y) =
(0, 0). Also, an irreducible complex component can contain more than one real
component, as for example, the solution of y^2 - x(x^2 - 1) = 0 is one complex curve
that has two disconnected real components, one in the range x ≥ 1 and one in
-1 ≤ x ≤ 0. Regrettably, the extraction of real components from complex ones
is not developed enough for treatment in this book. We refer the reader to (Lu,
Sommese, & Wampler, 2005).
This caveat notwithstanding, the complex solutions often give all the information
that an analyst desires. In fact, although systems can, and often do, have solution
sets at several dimensions, a scientist or engineer may often only care about isolated
solution points. When circumstances dictate this, higher dimensional solutions may
be justifiably labeled "degenerate" or "mathematical figments of the formulation."
Consequently, methods that are guaranteed to find the isolated solutions, without
systematically finding the higher dimensional solution sets, are of significant value,
and we will spend a large portion of this book discussing how to do this efficiently.
Moreover, the numerical treatment of higher dimensional solutions will rest upon
the ability to reformulate the problem so that at each dimension we are seeking a
set of isolated solution points.
The earliest forms of continuation tracked just one root as parameters of a problem
were moved from a solved problem to a new problem. A notable example is the
"bootstrap method" of (Roth, 1962; Freudenstein & Roth, 1963), which happened
to be applied to problems involving polynomials but made no essential use of their
properties. Beginning in the 1970's, an approach to solving multivariate polynomial
systems, called "polynomial continuation," was developed. To just list a few of
the early articles, there are (Drexler, 1977, 1978; Chow, Mallet-Paret, & Yorke,
1979; Garcia & Zangwill, 1979, 1980; Keller, 1981; Li, 1983; Morgan, 1983). A
more detailed history of the first period of the subject may be found in (Morgan,
1987). That period had relatively sparse use of algebraic geometry and centered on
numerically computing all isolated solutions by means of total degree homotopies.
A more recent survey of developments in finding all isolated solutions, taking into
account which monomials appear in the equations, may be found in (Li, 2003).
Methods for finding higher-dimensional solution sets are new; for these, we refer
you to Part III of this book. In (Allgower & Georg, 2003, 1993, 1997), a broader
perspective on continuation, including non-polynomial systems, is available.
By using algebraic geometry and specializing "homotopy continuation" to take
advantage of the properties of polynomials, the algorithms can be designed to be
theoretically complete and practically very robust. Besides being general, polyno-
mial continuation has the advantage that very little symbolic information needs
to be extracted from a polynomial system to proceed. It often suffices, for exam-
ple, just to know the degree of each polynomial, which is easily obtained without
a full expansion into terms. For small systems, other approaches may be faster,
and we will mention some of these. But these alternatives are quickly overwhelmed
by systems of even moderate size, whereas continuation pushes out the boundary
to include a much larger set of practical applications. For this reason, we highly
recommend continuation and we devote nearly all of this book to that approach.
1.6 Overview
The main text of this book is divided into three main parts:
Part I an introduction to polynomial systems and continuation, along with mate-
rial to familiarize the reader with one-variable polynomials and a chapter
summarizing alternatives to continuation,
Part II a detailed study of continuation methods for finding the isolated solutions
of multivariate polynomial systems, and
Part III in which continuation methods dealing with higher dimensional solution
sets are presented.
As such, Part I is a combination of classical material and warm-ups for a serious
look at the continuation method. Although we give brief looks at some alterna-
tive solution methods, beyond Part I, we concentrate exclusively on polynomial
continuation. Part II is our attempt to put a common perspective on the major
developments in that method from the 1980's and 1990's. Part III brings the reader
to the cutting edge of developments.
The book also contains two substantial appendices. The first, Appendix A, pro-
vides extra material on some of the results we use from algebraic geometry. The
style of the main text is intended to be understood without these extra details, but
some readers will wish to dig deeper. Unfortunately, most of the existing mathemat-
ical texts take a more abstract point of view, necessitated by the mathematicians'
drive to be general by encompassing polynomials over number fields other than the
complexes. By collecting the basics of algebraic geometry over complex numbers,
we hope to make this theory more accessible. Even mathematicians from outside
the specialty of algebraic geometry might find the material useful in developing a
better intuition for the field.
Appendix C is important for the serious student who wishes to work the exercises
in the book. We give a user's guide to HomLab, a collection of Matlab routines for
polynomial continuation. In addition to the basic HomLab distribution, there is a
collection of routines associated with individual examples and exercises. These are
documented in the exercises themselves.
1.7 Exercises
As the focus of this book is on numerical work, most of the exercises will involve
the use of a computer and a software package with numerical facilities, such as
Matlab. A free package called SciLab is also available. While most exercises require
a modicum of programming in the way of writing scripts or at least interactive
sessions with the packages, there are a few that require extensive programming.
Unless stated otherwise, statements such as >>x=eig(A) refer to Matlab com-
mands, where ">>" is the Matlab prompt. Similar commands are available in the
other packages mentioned above.
Exercise 1.1 (Companion Matrices) See Equation 1.1.2 for the definition of a
companion matrix. In the following, poly() is a function that returns the coeffi-
cients of a polynomial given its roots, whereas roots () returns the roots given the
coefficients.
and find its roots using an eigenvalue solver (in Matlab: eig).
(2) Repeat the example using >> f = poly([1, 1.5, -.4+.6i, -.4-.6i, -.2]) to
form the polynomial and >> roots(f) to find its roots. (Note that in Matlab,
roots() works by forming the companion matrix and finding its eigenvalues.)
(3) Wilkinson polynomials. Use >> roots(poly(1:n)) to solve the Wilkinson
polynomial (Wilkinson, 1984) of order n,
      ∏_{i=1}^{n} (x - i).
Explore how the accuracy behaves as n increases from 1 to 20. Why does it
degrade? (Examine the coefficients of the polynomials.)
(4) Roots of unity. Use roots() to solve x^n - 1 = 0 for n = 1, ..., 20. Compare
answers to the roots of unity, e^{2πik/n}, where i = √-1.
(5) Repeated roots. Solve x^6 - 12x^5 + 56x^4 - 130x^3 + 159x^2 - 98x + 24 using
>> roots(poly([1, 1, 1, 2, 3, 4])). What is the accuracy of the triple
root? What is the centroid (average) of the roots clustered around x = 1?
      p_2(x_1, x_2, x_3, x_4) = | x_1  x_2 | = x_1 x_4 - x_2 x_3.
                                | x_3  x_4 |
(2) How many terms are there in the fully expanded form of p_n? Using the sequence
of operations implied by the fully expanded expression, how many arithmetic
operations are required to numerically evaluate p_n, given numeric values of
(3) Using expansion by minors, how many operations are required to numerically
evaluate p_n?
(4) What method does Matlab use to efficiently evaluate the determinant of an
n x n matrix? How many operations are required?
(1) Given the degrees of f and g, what can you say about the degree of the result for each of the operations listed in Proposition 1.2.2?
(2) Suppose each step of a straight-line program is given as an operator followed by
a list of the addresses of one or two operands (as appropriate) and an address
for the result. Design an algorithm to compute an upper bound on the degree
of a straight-line polynomial. The complexity of the algorithm should be linear
in the number of steps in the straight-line program.
(3) Implement your algorithm in a language of your choice.
(4) Can you think of a polynomial for which your algorithm computes a degree that
is too high?
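One possible solution sketch in Python (ours; the three-address encoding of a straight-line program is a hypothetical convention, not from the text): propagate degree bounds one step at a time using deg(f ± g) ≤ max(deg f, deg g) and deg(f · g) = deg f + deg g, which costs constant work per step and is therefore linear in the program length.

```python
# Degree-bound propagation for a straight-line program.
# Each step is (op, a, b, out): op in {'+', '-', '*'}, where a, b, out
# are register names. Registers holding an input variable start with
# degree 1; registers holding constants start with degree 0.
def degree_bound(steps, init_degrees):
    deg = dict(init_degrees)
    for op, a, b, out in steps:            # one pass: linear in #steps
        if op in ('+', '-'):
            deg[out] = max(deg[a], deg[b])
        elif op == '*':
            deg[out] = deg[a] + deg[b]
        else:
            raise ValueError("unknown op " + op)
    return deg

# Example: p = (x*x + x) * x has degree 3.
steps = [('*', 'x', 'x', 't1'),    # t1 = x^2        (bound 2)
         ('+', 't1', 'x', 't2'),   # t2 = x^2 + x    (bound 2)
         ('*', 't2', 'x', 't3')]   # t3 = x^3 + x^2  (bound 3)
print(degree_bound(steps, {'x': 1})['t3'])

# The bound can overshoot, answering item (4): x*x - x*x is identically
# zero, yet the algorithm reports degree 2, since it cannot detect
# cancellation.
cancel = [('*', 'x', 'x', 't1'),
          ('*', 'x', 'x', 't2'),
          ('-', 't1', 't2', 't3')]
print(degree_bound(cancel, {'x': 1})['t3'])
```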
Fig. 1.1 A planar two-link robot arm. The triangle with hash marks indicates a grounded link,
meaning that it cannot move. Open circles indicate hinge joints that allow relative rotation of the
adjacent links.
Exercise 1.5 (Solution Sets) Create a system of three polynomials in three variables such that the solution set includes a surface, a curve, and several isolated points. (Hint: it is easier to do if the equations are written as products of factors, some of which appear in more than one equation.)
Chapter 2
Homotopy Continuation
In this chapter we present the basic theory underlying the homotopy continuation
method. This flexible method works well in many situations where there is no other
numerical method. The underlying approach of homotopy continuation is to
(1) put the problem we are solving into a family of problems depending on parameters;
(2) solve the problem for some appropriate point in the parameter space; and
(3) track the solutions of the problem as the point representing it in the parameter space passes from the point where we have the solutions to the point representing the original problem that we wish to solve.
This approach is useful on a wide variety of problems, not necessarily polynomial,
which exhibit a continuous dependence of the solutions on the parameters. Of
course, in this generality, many things can go wrong, even to the extent that the
approach completely fails. The major theme of this book is that for polynomial
problems arising in applications, this approach works wonderfully well. An added
advantage of homotopy continuation is that it may easily be parallelized: if the
starting problem has several solutions, the corresponding solution paths may be
tracked on different processors.
In this chapter we start with simple examples and gradually build up to more
general ones. For the first examples, there are other methods, but even for these
examples the continuation method's many robust properties recommend its use.
Let us consider how to find the roots of the polynomial p(z) := z^d + a_1 z^{d−1} + ··· + a_d, where d is a positive integer and the a_i are constants. In Chapter 1, we saw that finding the eigenvalues of the companion matrix is an effective approach.
Let's see how continuation might be used to solve this same problem. We know how to solve z^d − 1 = 0: the roots are
    z_k^* = e^{2πik/d}  for k = 1, ..., d.
To connect the two problems, we form the homotopy
    H(z, t) = t(z^d − 1) + (1 − t) p(z) = 0.    (2.1.1)
When t = 1 we have the system H(z, 1) = z^d − 1, with known roots, and when t = 0, we have the system H(z, 0) = p(z), which we want to solve. We propose to track the solution paths as t goes from 1 to 0.
For example, applying Equation 2.1.1 to the very simple case p(z) = z^2 − 5 = 0, we have
    H(z, t) = t(z^2 − 1) + (1 − t)(z^2 − 5) = z^2 + 4t − 5.
Thus for t ∈ [0, 1] we have two solutions of H(z, t) = 0, namely z_1(t) = √(5 − 4t) and z_2(t) = −√(5 − 4t). As t goes from 1 to 0, the roots go from ±1 to ±√5, the roots of the equation z^2 − 5 = 0. Pretending that we don't know formulae for the solution paths, our continuation method consists of numerically tracking the solutions of H(z, t) = 0 as t goes from 1 to 0. Of course, no one would bother to solve this trivial case in such a complicated way, but the point is that, with a few tweaks, the same approach works for any polynomial.
So how can we numerically follow the solution paths? One approach is to observe that the solution paths z^*(t) satisfy the Davidenko differential equation; see, e.g., (Davidenko, 1953a, 1953b; Allgower & Georg, 2003). This equation is obtained by
noting that H(z^*(t), t) = 0 for all t. Consequently, letting H_z(z, t) and H_t(z, t) denote the partial derivatives of H(z, t) with respect to z and t respectively, we have
    H_z(z^*(t), t) (dz^*/dt) + H_t(z^*(t), t) = 0,  that is,  dz^*/dt = −H_z(z^*(t), t)^{−1} H_t(z^*(t), t).
At this point we could numerically solve the two independent initial value problems,
    dz_1/dt = −2/z_1 with z_1(1) = 1,   and   dz_2/dt = −2/z_2 with z_2(1) = −1.
This does work, though it opens us up to all the issues and numerical errors facing
the use of the numerical theory of ordinary differential equations.
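For instance (our sketch, not the book's; a hand-rolled classical fourth-order Runge–Kutta step stands in for a library o.d.e. solver), integrating dz/dt = −2/z from t = 1 down to t = 0 starting at z(1) = 1 lands near √5:

```python
import math

# Davidenko o.d.e. for H(z,t) = t(z^2-1) + (1-t)(z^2-5) = z^2 + 4t - 5:
# H_z = 2z and H_t = 4, so dz/dt = -H_t/H_z = -2/z.
def f(t, z):
    return -2.0 / z

def rk4(f, t0, t1, z0, steps=200):
    # Classical 4th-order Runge-Kutta, integrating t from t0 down to t1.
    h = (t1 - t0) / steps
    t, z = t0, z0
    for _ in range(steps):
        k1 = f(t, z)
        k2 = f(t + h/2, z + h*k1/2)
        k3 = f(t + h/2, z + h*k2/2)
        k4 = f(t + h, z + h*k3)
        z += h * (k1 + 2*k2 + 2*k3 + k4) / 6
        t += h
    return z

z_end = rk4(f, 1.0, 0.0, 1.0)
print(abs(z_end - math.sqrt(5.0)))   # small, but the error goes uncorrected
```

The final accuracy depends entirely on the integrator's step size; nothing pulls the iterate back onto the exact path, which is the weakness the next section addresses.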
A more numerically stable approach takes full advantage of the fact that the solution paths satisfy the equation H(z, t) = 0 for each t. Thus we might use the following algorithm to track the paths starting at z_1(1) = 1 and z_2(1) = −1.
Simple Path Tracker
Begin
(1) Set up a grid t_0, ..., t_M with M some large number, h = 1/M, and t_j = (M − j)h;
(2) For each i from 1 to 2, do
(a) set w_0 = z_i(1);
(b) for each j from 0 to M − 1 do
i. use one step of Euler's method from (w_j, t_j) to define the prediction w;
ii. find the solution w_{j+1} of H(z, t_{j+1}) = 0 using Newton's method¹ with start value w.
End
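A minimal Python sketch of this tracker (ours, not the book's code), specialized to the example H(z, t) = t(z^2 − 1) + (1 − t)(z^2 − 5), for which H_z = 2z and H_t = 4:

```python
import math

def H(z, t):  return t*(z*z - 1.0) + (1.0 - t)*(z*z - 5.0)  # = z^2 + 4t - 5
def Hz(z, t): return 2.0*z
def Ht(z, t): return 4.0

def track(z_start, M=100, newton_iters=3):
    """Fixed-step Euler-predict / Newton-correct tracking from t=1 to t=0."""
    h = 1.0 / M
    w = z_start                          # w_0 = z_i(1)
    for j in range(M):
        tj, tnext = (M - j)*h, (M - j - 1)*h
        # Euler predictor: dz/dt = -Ht/Hz, taking a step of -h in t.
        w = w + h * Ht(w, tj) / Hz(w, tj)
        # Newton corrector: solve H(z, t_{j+1}) = 0 starting from w.
        for _ in range(newton_iters):
            w = w - H(w, tnext) / Hz(w, tnext)
    return w

print(track(1.0), track(-1.0))   # approx +sqrt(5) and -sqrt(5)
```

Because Newton's method re-converges to the path at every grid point, the final answer is far more accurate than the Euler prediction alone would suggest; this is the observation behind answer A2 below.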
The reader probably has many worries about this simple algorithm. Some ob-
vious ones are:
Q1. How should one choose M?;
Q2. Euler's method is pretty terrible;
Q3. Newton's method could fail; and
Q4. If you had a multiple root, e.g., your original system was z^2 = 0, Newton's method does not work so well.
To these we also add the following observation:
Q5. If one wants to solve the equivalent equation p(z) = 5 − z^2 = 0, the homotopy, Equation 2.1.1, becomes H(z, t) = t(z^2 − 1) + (1 − t)(5 − z^2) = 0. This gives trouble at t = 5/6 (because H(z, 5/6) = (2/3)z^2 has a double root) and at t = 1/2 (because H(z, 1/2) = 2 has no solution).
Some quick responses to these concerns are:
A1. In fact, we do not pick an M but choose the t_j by an adaptive procedure. Of course this raises more questions, e.g., "How do we control the step size?" Section 2.3 below addresses the main points.
A2. Because we use Newton's method to correct solutions as we move along, Euler's method gives the same accuracy as using a more sophisticated solver for ordinary differential equations. Higher-order predictors can be used in place of Euler's method to increase efficiency.
¹ The method known as "Newton-Raphson's method" in engineering circles is commonly called just "Newton's method" in the numerical analysis community. We adopt the briefer appellation.
A3. The adaptive procedure is designed to keep the application of Newton's method
within its zone of convergence. There are good ways of dealing with special
situations where Newton's method still fails (see next item).
A4. Yes, singular solutions pose particular difficulties, but there are a number of effective "endgame" procedures to refine such singular solutions. See Chapter 10.
A5. Certain simple procedures guarantee that bad situations such as these happen
with "probability zero." In the next paragraph, we apply a quick fix of wide
applicability to the particular example above.
All of these answers and answers to other questions, e.g., how to construct good
homotopies H(z, t) when we have many equations with special structure, will be
dealt with in this book. For now, let us satisfy ourselves that we can eliminate the
troubles arising in the example in item Q5, above, using the following "quick fix."
This is a special case of the gamma trick first introduced in (Morgan & Sommese, 1987a, page 108). Let's introduce a random angle θ ∈ [−π, π] and modify the homotopy of Q5 to
    H(z, t) = e^{iθ} t(z^2 − 1) + (1 − t)(5 − z^2) = 0,
where i = √−1 is the imaginary element. Note that at t = 1, we have the same start points z = ±1 as before. But now, due to the complex factor e^{iθ}, the paths are well-behaved for all t ∈ [0, 1]; the coefficient of z^2 does not vanish nor does the constant. Figure 2.1 shows the solution paths for several values of θ in (0, π]. For values of θ in [−π, 0), the paths are the reflection through the real line of those shown in the figure. We see that trouble is brewing for θ near zero. For θ = 0.1 the paths are mildly behaved, but the trend of what will happen for small θ is apparent: as θ → 0, the paths start at ±1, meet at a double point at the origin, then follow the positive and negative branches of the imaginary axis to infinity, then re-enter the scene along the real axis, coming in from infinity to arrive at the final roots ±√5. Numerically, we can stand a very small value of θ, although the length of the path becomes longer and longer. Thankfully, if we were to pick θ at random, there would be a very small chance of picking θ close enough to zero to cause any
trouble. This kind of random complexification of a homotopy is a very useful tool
for avoiding singularities. We will justify the gamma trick in a more general context
in Chapter 7.
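A quick numerical check (our sketch, not from the book) of why the factor e^{iθ} helps: in H(z, t) = e^{iθ} t(z^2 − 1) + (1 − t)(5 − z^2), the coefficient of z^2 is e^{iθ}t − (1 − t), which can vanish for some t ∈ [0, 1] only when θ = 0, since e^{iθ} must equal the positive real number (1 − t)/t.

```python
import cmath

def min_coef(theta, samples=10001):
    # Smallest magnitude over t in [0,1] of the z^2 coefficient
    # e^{i theta} t - (1 - t) in the gamma-trick homotopy.
    g = cmath.exp(1j * theta)
    return min(abs(g*t - (1.0 - t))
               for t in (k / (samples - 1) for k in range(samples)))

print(min_coef(0.0))   # ~0: the homotopy degenerates at t = 1/2
print(min_coef(0.5))   # bounded away from 0: the paths stay well-behaved
```

Scanning a grid of t values shows the degeneration at θ = 0 and its disappearance for a generic angle, mirroring the behavior of the paths in Figure 2.1.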
In applications, it is quite common that only real solutions have physical meaning,
yet we find all solutions, including the complex ones. Isn't this a waste of computing
time? Why bother? One might think, "Surely it must be simpler to just find the
real solutions."
The answer has different aspects. First, there is currently no good general
method for finding all real roots directly. A good choice in low dimensions is to
use exclusion methods, also known as interval or box-bisection methods, to fence in
isolated roots, but in high dimensions, the rate of convergence tends to be slow. We
summarize these methods in more detail in § 6.1. These methods have a place, faring
best in comparison to continuation if dimensions are low, degrees are high (where
there is the possibility of large numbers of complex roots), and if one only desires real
roots in a limited region. These methods often perform poorly if the problem has
any nonisolated solution components, as they bog down computing a large number
of boxes covering the solution curve, surface, etc. Research in methods for real
roots is an active area, so one shouldn't count them out. Meanwhile, continuation
offers the option of finding all roots, real and complex, and then casting out the
complex ones.
The second answer is that there is useful information to be gained from the
whole solution list. One example is a complex root with small imaginary parts,
an "almost real" solution. Such roots suggest that a small perturbation of the
problem might introduce a new real root. Indeed, a mechanical system modeled as
a collection of rigid bodies always has a bit of elasticity, so "almost real" solutions
of the mathematical model might indicate an extra assembly configuration for the
actual device.
An even more compelling reason to find all roots is that it can reveal structural
information about other problems in the same family as the one at hand. The total
number of nondegenerate isolated roots for a general problem from the family is an
upper bound on the number of such roots for any other problem in the family. The
number of real roots does not respect such a relationship. The complete set of roots
of the general problem can be used as start points for a homotopy to solve other
problems in the family. This sometimes can make a large difference in the amount
of computation used for those subsequent problems. Chapter 7 deals with this in
some detail.
One might hope to use continuation to follow just the real roots from the start
system to the target system. As a general approach, this is doomed to fail, because
the number of real roots is usually not constant. Even if the number of real roots
is the same for the start and target systems, surprising things can happen. Figure 2.2 shows two examples where real solutions become nonreal while nonreal ones become real.
Example 2.2.1 Suppose we set up a homotopy between the polynomials
    f(y) = y^4 − 2y^2 − y + 1   and   g(y) = y^4 − 2y^2 + y + 1,
which both have two real and two nonreal solutions. The linear homotopy
    h(y, t) = y^4 − 2y^2 − ty + 1 = 0
has two real roots for all t ∈ [−1, 1], except at t = 0, where there are two real double roots. But the two positive real roots for 1 > t > 0 do not connect to the two negative real roots for −1 < t < 0.
Example 2.2.2 Consider the homotopy
    h(y, t) = y^4 − t + 0.25 = 0,
which at t = 1 has two real and two imaginary roots. Let t travel around the unit circle in the complex plane; that is, let t = e^{iθ} as θ goes from 0 to 2π. At the end of the circuit, we end up with the same polynomial and hence the same roots as at the start. But the paths starting at the two real roots lead to the imaginary ones, and vice versa.
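Example 2.2.2 can be verified with a tiny continuation loop (our sketch): track the root starting at y = 0.75^{1/4} while t = e^{iθ} traverses the unit circle, correcting with Newton's method at each small step of θ.

```python
import cmath

# h(y, t) = y^4 - t + 0.25; track one root as t = e^{i theta} circles once.
steps = 4000
y = 0.75 ** 0.25                   # real positive root at t = 1 (theta = 0)
for k in range(1, steps + 1):
    t = cmath.exp(2j * cmath.pi * k / steps)
    for _ in range(3):             # Newton correction: y -= h / (dh/dy)
        y = y - (y**4 - t + 0.25) / (4.0 * y**3)

# After a full circuit t returns to 1, but the path from the real start
# root has arrived at the imaginary root i * 0.75^{1/4}.
print(y)
```

The point t − 0.25 winds once around the origin, so the continuously tracked fourth root picks up a phase of 2π/4 = π/2: real roots and imaginary roots trade places, exactly as the example claims.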
As a final word on the utility of finding nonreal solution points, we note that in
Part III of this book, we give algorithms for finding all solutions to a polynomial
system, including positive-dimensional solution components. These algorithms rely
heavily on the ability to reliably find all isolated solutions, both real and complex,
to certain polynomial systems related to the initial problem.
Fig. 2.2 Interchange of real and imaginary roots for two homotopies.
some deficiencies, especially regarding selection of the step size. Much has been written about path tracking in general (Allgower & Georg, 2003) and path trackers for polynomial continuation (Morgan, 1987) in particular, so we only sketch the bare necessities here. Surprisingly, perhaps, the basic algorithm presented below is sufficient for most of our needs without further improvements. The main improvement over our earlier simple algorithm is the use of an adaptive step size.
For solving algebraic problems, we often place a higher priority on finding all
solutions reliably than on finding one or a few solutions quickly. Therefore, when
faced with a choice between speed and reliability, we choose the more cautious route.
This has the added benefit that the cautious choice is usually simpler as well.
General path trackers must deal with all sorts of difficult issues, for example, a path that bifurcates into several paths, or a path that reverses direction. Fortunately, with proper care in forming a homotopy, one can assure that the paths for solving polynomial systems have none of these troubles: they advance steadily
as the homotopy parameter t advances and never intersect except possibly at the
end target. (More precisely, the probability of a singularity occurring on a path
is zero. This is an issue that will be discussed at greater length when we discuss
homotopies.) The numerical treatment of singularities at the end of the homotopy
is addressed in Chapter 10 on endgames.
The nonsingular path-tracking task may be summarized as follows. Here, as throughout this book, we arrange the homotopy to begin at t = 1 and end at t = 0.
Given the following:
• a homotopy H(z, t) with a solution path z(t) satisfying H(z(t), t) = 0, such that the Jacobian matrix ∂H/∂z (z(t), t) is nonsingular for all t ∈ (0, 1].
Again, the existence of the nonsingular homotopy path, z(t), is one of the primary
topics of Part II; for the moment, we just assume that it exists.
Our goal is:
• to move along the path, from t = 1 to as close as possible to t = 0, in order
to produce a close approximation to the endpoint z(0) = limt_,o z(t) or else, in
the case of a diverging path, to conclude that the limit does not exist.
Section 3.7 outlines an improved treatment for the case of diverging endpoints.
In the context of the introductory example of this chapter, we already touched on using Davidenko's equation to turn the path-tracking problem into an initial-value problem for an ordinary differential equation. We also saw that we may use
a predictor/corrector method based on having an explicit homotopy H(z,t). Such
a predictor/corrector method is highly preferred, because the corrector step avoids
the build-up of error which often accumulates in a numerical o.d.e. solver.
Fig. 2.3 Schematic of path tracking, showing prediction (Euler) and correction (Newton) steps.
In practice, the step size would not be so big.
Basic prediction and correction, schematically illustrated in Figure 2.3, are both
accomplished by considering a local model of the homotopy function via its Taylor
series:
H(z + Δz, t + Δt) = H(z, t) + H_z(z, t)Δz + H_t(z, t)Δt + Higher-Order Terms,  (2.3.4)
where H_z = ∂H/∂z is the n × n Jacobian matrix and H_t = ∂H/∂t is size n × 1. If we have a point (z_1, t_1) near the path, that is, H(z_1, t_1) ≈ 0, one may predict to a new approximate solution at t_1 + Δt by setting H(z + Δz, t_1 + Δt) = 0 and solving the first-order terms to get
    Δz = −H_z^{−1}(z_1, t_1) H_t(z_1, t_1) Δt.   (2.3.5)
On the other hand, when H(z_1, t_1) is not as small as one would like, one may hold t constant by setting Δt = 0 and solving the equation to get
    Δz = −H_z^{−1}(z_1, t_1) H(z_1, t_1).   (2.3.6)
These are precisely Euler prediction and Newton correction. The main concern of
a numerical path-tracking algorithm is deciding which of these to do next and how
big a step At to use in the predictor.
A generic path-tracking algorithm proceeds as follows, adapted from (Allgower & Georg, 1997) (see also (Allgower & Georg, 2003; Morgan, 1987)). In our homotopies, we may assume that the path parameter, s, is strictly monotonic, that is, the path has no turning points. This is a consequence of the assumption above that the Jacobian matrix is nonsingular along the path.
There are many possible choices for the implementation of each step. Some
useful choices are as follows.
Predictor The simplest predictor is just u = v_i, but it is much better to use a linear prediction along the path. Higher-order predictions can also be used, such as matching a quadratic to two points and a tangent. There are two sensible linear predictors:
Secant Predictor Use the last two points on the path to linearly extrapolate to the next, that is, u = v_i + a(v_i − v_{i−1}).
Tangent Predictor Step along the tangent direction of the path at the current point. That is,
    u = v_i − a (∂g/∂v)^{−1} (∂g/∂s),
where a is calculated to give the desired step length. This is Euler's method.
Step Length The step length can be measured by any preferred norm of (u − v_i, s′ − s_{i−1}). A simple choice is ‖(u − v_i, s′ − s_{i−1})‖ := |s′ − s_{i−1}|.
Corrector A common corrector strategy is to hold s constant, that is, s″ = s′, and compute w by Newton's method, allowing a fixed number of iterations. The correction is deemed successful if Newton's method converges within a pre-specified path-tracking tolerance within the allowed number of iterations.
Step Length Adjustment A good strategy is to cut the step length in half on
failure of the corrector and to double it if m successive corrections at the current
step size have been successful. A choice of m in the range two to five works well.
Final Step Near the end of the path-tracking interval, the step length is adjusted
to land exactly on s = 0.
Terminate Eventually, we must either arrive at s = 0 or else |s′ − s_{i−1}| must become progressively smaller. It is useful to set a minimum threshold for progress in s, below which we declare that the path is either diverging or approaching a singularity. One can also terminate if the magnitude of the solution grows excessively large.
Refine Newton's method will work well for nonsingular endpoints.
By keeping the number of iterations in the corrector small (no larger than three, conservatively just one) and the path-tracking tolerance tight, all intermediate points are kept close to the exact path, minimizing any chance that a solution will jump tracks. However, to save computation time, the path-tracking tolerance is generally looser at the beginning (for say, 1 > s > 0.1), then made tighter near the end (0.1 > s > 0), and finally set very tight for the final refinement at s = 0.
This path-tracking algorithm incorporates an adaptive step size. One can also
employ adaptivity at a higher level. Specifically, if a path-tracking failure occurs,
the whole path can be recomputed with more conservative choices in the control
parameters. Most useful is adjustment of the tracking tolerance.
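The strategy above can be sketched in a few dozen lines of Python (ours, not HomLab's tracker): a tangent (Euler) predictor with a short Newton correction, halving the step length on corrector failure and doubling it after m = 3 consecutive successes, applied to the introductory homotopy.

```python
import math

# Adaptive-step tracker for H(z, s) = s(z^2 - 1) + (1 - s)(z^2 - 5),
# tracked from s = 1 down to s = 0.
def H(z, s):  return s*(z*z - 1.0) + (1.0 - s)*(z*z - 5.0)
def Hz(z, s): return 2.0*z
def Hs(z, s): return 4.0

def track_adaptive(z, tol=1e-10, h0=0.1, h_min=1e-12, m=3):
    s, h, successes = 1.0, h0, 0
    while s > 0.0:
        h = min(h, s)                        # final step lands exactly on s = 0
        s_new = s - h
        w = z + h * Hs(z, s) / Hz(z, s)      # tangent (Euler) predictor
        for _ in range(3):                   # Newton corrector, few iterations
            w = w - H(w, s_new) / Hz(w, s_new)
        if abs(H(w, s_new)) < tol:           # success: accept the point
            z, s = w, s_new
            successes += 1
            if successes >= m:               # m successes in a row: double h
                h, successes = 2.0*h, 0
        else:                                # failure: halve h and retry
            h, successes = 0.5*h, 0
            if h < h_min:
                raise RuntimeError("step size underflow: path trouble")
    return z

print(track_adaptive(1.0))   # approx sqrt(5)
```

The cautious choices discussed in the text appear directly: the corrector gets only a few iterations, failure is cheap (the step is simply retried at half the length), and the tracker refuses to continue once progress in s stalls.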
2.4 Exercises
This set of exercises begins with the simple o.d.e. and fixed-step path trackers
applied to single variable homotopies and carries through to the application of
HomLab's variable-step path tracker to multivariate systems.
and check that for γ = σ = 1 you get Equation 2.1.2. In HomLab\exercise, there is a short m-file, davidenko.m, that defines this function in a form suitable for solution by an o.d.e. solver. To use it, you must declare global variables
>> global gamma sigma
and assign them values. The solution path beginning at z(1) = a may be obtained with the command >> [t,z]=ode45(@davidenko,[1 0],a); Use this approach to do the following.
(1) Verify that for γ = 1, σ = 1, the solution paths starting at z(1) = ±1 terminate near ±√5. What accuracy is achieved?
(2) Try the same for γ = 1, σ = −1. (Tip: <Ctrl>C interrupts a nonterminating process.)
(3) Reproduce Figure 2.1 by setting σ = −1 and using γ = e^{iθ} for θ = {0.1, 0.3, 1, 2}.
(4) Compute the path from z(1) = 1 for σ = −1 and γ = e^{iθ} for θ = 10^{−k}, k = 0, 2, 4, 6, 8, 10. Monitor the number of time steps, the computational time used, and the final accuracy |z(0) − √5| versus k.
(5) Using the results of the previous item, examine the history of t returned by ode45. What values of t require small time steps? (Hint: plot(t,'.') may be insightful.) Save the array of t values for use in the next exercise.
where g(z) and f(z) are any polynomials in one variable. The calling sequence is
>> [z] = crudetrack2(z0,t,gamma,g,f);
where g and f are given as coefficient arrays in the usual Matlab convention. For g=[1 0 -1], f=[-1 0 5], this is exactly the same as crudetrack with σ = −1.
(1) Compare the speed of crudetrack and crudetrack2. Can you explain the
difference? What does this say about the importance of efficient function eval-
uation?
(2) Use crudetrack2 with g and f as in Example 2.2.1. Choose γ complex in the vicinity of 1 to avoid trouble with double roots. Try other values of γ around the unit circle. Do the start points always end up at the same endpoints?
(3) Try crudetrack2 to solve a polynomial f(z) of degree 7 having random real coefficients chosen in the range [−2, 2]. Use the start system g(z) = z^7 − 1. Compare the success rate using γ = 1 versus using γ = e^{iθ} for a random θ ∈ [0, 2π].
Exercise 2.4 (Multivariate Davidenko O.D.E.) The Davidenko differential
equation generalizes for multivariate homotopies.
(1) Derive the Davidenko equation for a homotopy H(z, t) = 0, where H(z, t) : C^n × C → C^n. (Hint: see Equations (2.3.4) and (2.3.5).)
(2) Use this approach and Matlab's ode45 to solve the system
Projective Spaces
Projective spaces are a very useful tool in both algebraic geometry and the practical implementation of polynomial continuation. They simplify theorems by sewing up infinity, compactifying Euclidean space so that points at infinity become just like ordinary points. This allows us to more conveniently make accurate statements about the number of roots of polynomial systems. Furthermore, in the numerical context, this has the benefit of allowing "solutions at infinity" to be computed as easily as finite ones. The concept of solutions at infinity and why one would wish to compute them will also be covered in this chapter.
To motivate the introduction of projective space, let's begin with the very familiar
quadratic equation in one variable, x,
ax2 + bx + c = 0, (3.1.1)
which has two solutions given by the quadratic formula:
    x = (−b ± √(b^2 − 4ac)) / (2a).   (3.1.2)
Of course, this is not quite the whole story, for if we wish to be precise, we must
add the caveats:
• if b^2 − 4ac = 0, then there is just one (double) root, x = −b/(2a);
• if a = 0, b ≠ 0, there is just one root, x = −c/b;
• if a = b = 0, c ≠ 0, there is no solution; and
• if a = b = c = 0, the solution is all x ∈ C.
There are two ways to simplify the situation. One way is to exclude all but one
of the special cases by observing that if a = 0, we don't really have a quadratic
equation, but something of lower degree. Accordingly, we may say that a quadratic
equation with nonzero coefficient on x2 has two roots, possibly equal, given by
Equation 3.1.2. This is correct and simple, but it merely sidesteps the exceptions.
Moreover, we will have a completely separate statement for linear equations that
says nothing about the connection between linear and quadratic equations, even
though it is clear that the set of quadratic equations parameterized by the coeffi-
cients (a, b, c) includes linear equations.
There is an associated concern in numerical work: what should we do if a is
very small? Careful analysis of the quadratic formula shows that as a —> 0, one root
approaches — c/b while the other root diverges to infinity. Is there a well-behaved
numerical representation of the large root?
A second way to simplify the situation, formulating the solution of the quadratic equation in terms of projective space, addresses these concerns. We replace x by the ratio u/v and clear denominators to obtain the homogeneous polynomial equation
    au^2 + buv + cv^2 = 0.   (3.1.3)
Because of the homogeneity, if (u, v) satisfies Equation 3.1.3, then so does (λu, λv) for any λ ∈ C, and as long as v ≠ 0, these give the same value of x = u/v. We use the notation [u, v] ≠ [0, 0] to denote all pairs (u′, v′) ≠ (0, 0) such that (u′, v′) = (λu, λv) for some nonzero λ ∈ C. We call the space of all nonzero [u, v] the one-dimensional complex projective space, denoted P^1, and we call [u, v] the homogeneous coordinates of P^1. Points [u, v] with v ≠ 0 are said to be "finite," whereas the point with v = 0 is said to be "at infinity." (There is only one point, [u, v] = [1, 0], at infinity in P^1.)
With this notation, we see that for a = 0, b ≠ 0, Equation 3.1.3 factors as (bu + cv)v = 0, so there are two roots: [u, v] = {[−c, b], [1, 0]}. The first gives the same x = −c/b as we had before, while the second is a root "at infinity." Similarly, for a = b = 0, c ≠ 0, we have cv^2 = 0, which implies a double root at infinity, [u, v] = [1, 0]. Note that b^2 − 4ac = 0 for this case. Accordingly, we may eliminate two of our former caveats to say that in projective space, the homogeneous quadratic equation, au^2 + buv + cv^2 = 0, has two roots for general a, b, c, one double root for b^2 − 4ac = 0, and all [u, v] ∈ P^1 when a = b = c = 0. This is certainly more succinct than our first statement in the opening paragraph of this section, while still covering all the cases. This is because roots at infinity have become just like any other roots.
In homogeneous coordinates, the quadratic formula can be written in many equivalent ways, since only the ratio of u to v matters. The following formulae agree everywhere that they are well defined:
    [u, v] = [−b ± √(b^2 − 4ac), 2a] = [2c, −b ∓ √(b^2 − 4ac)].
For every (a, b, c) ≠ (0, 0, 0), at least one of these formulae is well defined. These are also useful for accurately computing numerical values of roots in the neighborhood of infinity.
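A numerical sketch (ours, not from the book) of this idea: returning homogeneous pairs [u, v], and choosing the sign of the square root to avoid cancellation, yields a quadratic solver that represents the large root gracefully as a → 0 instead of overflowing or losing precision.

```python
import cmath

def quad_roots_homog(a, b, c):
    """Roots of a*x^2 + b*x + c = 0 as homogeneous pairs [u, v], x = u/v.
    A root with v = 0 lies 'at infinity'."""
    s = cmath.sqrt(b*b - 4*a*c)
    if abs(-b + s) < abs(-b - s):   # pick the sign that avoids cancellation
        s = -s
    q = (-b + s) / 2                # |q| as large as possible
    return [(q, a), (c, q)]        # [u1, v1] = [q, a], [u2, v2] = [c, q]

# Nearly linear equation: one root near -c/b = 2, one near infinity.
r1, r2 = quad_roots_homog(1e-300, 1.0, -2.0)
print(r2[0] / r2[1])    # finite root, approx 2
print(abs(r1[1]))       # v ~ 1e-300: this root is far out toward infinity
```

For a = 0 exactly, the first pair degenerates to [q, 0], the root at infinity, while the second still returns x = −c/b, matching the projective bookkeeping above.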
Of even greater importance for our larger goal of treating general polynomial
systems is the fact that homogeneous coordinates allow the continuation method to
track solution paths to infinity without any numerical difficulty. We will return to
this in § 3.7 after first discussing projective spaces more thoroughly.
The definition makes sense, because a line through the origin in C^{N+1} is a set of the form
    { (λz_0, ..., λz_N) ∈ C^{N+1} | λ ∈ C }
with not all the z_i zero. The z_i occurring within the brackets [z_0, ..., z_N] are called homogeneous coordinates, even though they are not coordinates on P^N, but rather coordinates on C^{N+1}.
To put the structure of a complex manifold, and hence also the structure of a topological space, on P^N, we specify coordinate charts. We define the sets
    U_i := { [z_0, ..., z_N] ∈ P^N | z_i ≠ 0 }.
On U_i the ratios z_j/z_i of the homogeneous coordinates z_i, z_j are well-defined functions that can be used to identify U_i with C^N. Indeed, we identify C^N with U_0 by the map (z_{0,1}, ..., z_{0,N}) → [1, z_{0,1}, ..., z_{0,N}], and for other i, we identify C^N with U_i by the map (z_{i,0}, ..., z_{i,i−1}, z_{i,i+1}, ..., z_{i,N}) → [z_{i,0}, ..., z_{i,i−1}, 1, z_{i,i+1}, ..., z_{i,N}], where we make the obvious modification for i = N. The transition functions between U_i \ {z_{i,j} = 0} and U_j \ {z_{j,i} = 0} for i ≠ j are given by z_{j,k} = z_{i,k}/z_{i,j} for all k. Here we follow the convention z_{i,i} = 1 for any i.
One way to think of projective space P^N is as C^N with a copy of P^{N−1} filling in the slot at infinity. In other words, we have the following.
• P^0 consists of a single point, C^0 = [1].
• P^1 has the chart U_0 given by (w) → [1, w] and the chart U_1 given by the map (z) → [z, 1], with the transition function z = 1/w. U_0 is thus identified with C^1 and covers all of P^1 except the point [0, 1]. So we have that P^1 = C^1 ∪ C^0 = U_0 ∪ (U_1 \ U_0) = {w ∈ C} ∪ {z = 0}.
First, let's clarify a bit of terminology. Historically, C^1 has often been called the "complex plane," because we may identify each point x ∈ C^1 with a point in the real plane R^2 by sending x to (Re(x), Im(x)). To avoid confusion, we prefer the terms "Argand plane" or "Argand diagram" for this construction. We say that C^1 is a line, having one complex dimension, in analogy to the real line R^1, which has one real dimension. We will always have to keep in mind that the n-dimensional complex Euclidean space C^n is isomorphic to R^{2n}, the 2n-dimensional real Euclidean space.
We have seen above that P^1 is the union of the complex line C^1 with a single point, H_∞ := [0, 1], which we call the point at infinity. We may visualize the real part of P^1, as in Figure 3.1, by plotting real [u, v] as the line through the origin in R^2 through the point (u, v). Several such lines are shown, with [1, 0] as the horizontal line and [0, 1] as the vertical line.
Figure 3.1 also shows the chart U_0 represented by the dashed vertical line (1, w) as w varies. We see that there is one point of intersection of this line with each line through the origin except for the line [0, 1], which is the point at infinity that we must add to complete P^1. It is just as valid, however, to consider P^1 as the union of U_1, shown here as the horizontal line (z, 1) as z varies, with the single point [1, 0]. In fact, any inhomogeneous line αu + βv = 1 cuts each line through the origin exactly once, except for the line αu + βv = 0. That is, we may view P^1 as the line αu + βv = 1 union the single point [β, −α]. Such a line, labeled U′, is shown in the figure. Each such line is a "Euclidean patch" that covers all of P^1 except one point. When performing a calculation in P^1, we are free to choose any patch that is convenient, as long as the answer we seek is not the point missing from that patch. For example, the line L, which happens to represent the point [cos π/8, sin π/8], intersects each of the patches U_0, U_1, and U′ in one point, whose coordinates will be the numerical representation of this projective point on that patch. This will turn out to be very useful in § 3.7 below.
The Riemann sphere uses stereographic projection to visualize P^1 more fully, showing all of P^1, not just its real part. It allows us to draw P^1, which is a manifold having two real dimensions, as a surface in real three-dimensional space. This
Fig. 3.1 Real part of P 1 , showing three patches and the real part of the Riemann sphere.
origin. That is, the points of the sphere and the points of P^1 are in one-to-one correspondence, and in fact P^1 is topologically equivalent to the real two-sphere S^2.
The projective plane, P^2, is a compactification of the complex plane C^2. There are several natural ways of adding points at infinity to compactify C^2, but P^2 stands out as the "simplest" and arguably the most useful in general. One sees that C^2 is equivalent to R^4 just by taking real and imaginary parts. In contrast, P^2 is a manifold of four real dimensions for which there is no easy visualization. This limitation of our three-dimensional minds notwithstanding, by the definitions we have already given, it is easy to see that P^2 is C^2 with a projective line, P^1, added at infinity.
Just to clarify, let's restate the general construction for the specific case of P^2.
It is defined as the set of all triples [z0, z1, z2] of complex numbers except [0,0,0],
subject to the equivalence of [z0, z1, z2] and [z0', z1', z2'] if there is a nonzero complex
number λ with zi' = λ zi for i = 0, 1, 2.
We can identify P^2 \ U_0 with P^1 via the association [0, z1, z2] ↔ [z1, z2]. We call
H_∞ := P^2 \ U_0 the line at infinity.
The importance of P^2 and its homogeneous coordinates [z0, z1, z2] lies in the
fact that if p(z0, z1, z2) is a homogeneous polynomial, i.e., a polynomial where each
monomial term has the same total degree, then the zero set of p(z0, z1, z2) is well
defined on P^2. For example, if p(z0, z1, z2) = z0^2 + z1 z2, then p(λz0, λz1, λz2) =
λ^2 p(z0, z1, z2). Moreover, given the zero set C ⊂ C^2 =: U_0 ⊂ P^2 of a polynomial
p(x1, x2) of degree d, the closure of C in P^2 is the zero set of the homogenized
polynomial

  z0^d p(z1/z0, z2/z0),

which by abuse of notation we typically write p(z0, z1, z2). When we say a polynomial
has degree d, we assume that there is at least one term of the polynomial with
degree d.
Homogeneous coordinates are very well adapted for computations, as we shall
see in the next section. As a simple illustration, let's consider the intersection of
two parallel lines, given as

  p(x, y) = ( a x + b y + c ; a x + b y + d ) = 0.    (3.4.7)

Two general lines in C^2 intersect in a single point, but these parallel lines either
coincide, if c = d, or they do not meet, if c ≠ d. Homogenizing with x = z1/z0,
y = z2/z0, one has

  p(z0, z1, z2) = ( a z1 + b z2 + c z0 ; a z1 + b z2 + d z0 ) = 0.    (3.4.8)

Assuming that at least one of a, b is nonzero, we now find the solution [0, −b, a];
that is, the lines meet at infinity. The line at infinity, H_∞, meets each finite line
in a point, and any two lines passing through a given point on H_∞ are all parallel.
Accordingly, H_∞ is the set of slopes for the finite lines, and in the breakup we gave
34 Numerical Solution of Systems of Polynomials Arising in Engineering and Science
above for H_∞ = C^1 ∪ C^0, the piece C^1 represents the slopes of lines that have a finite slope
while the final point, C^0 = [0,0,1], represents the slope of a vertical line. We have
the nice result that every pair of noncoincident lines in P^2 meets in a single point.
For any pair of lines, parallel or not, we may write the homogenized equations in the
form Az = 0, where A is a 2 × 3 matrix. Then, the solution can be computed using
any of the standard methods from linear algebra, such as Gaussian elimination with
column pivoting, or more robustly, the singular value decomposition.
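The SVD approach can be sketched in a few lines. This is our own illustration (the book's exercises use MATLAB), assuming numpy, with the specific coefficients a = 1, b = 2, c = 3, d = 5 chosen for concreteness:

```python
import numpy as np

# Parallel lines x + 2y + 3 = 0 and x + 2y + 5 = 0; homogenize with
# x = z1/z0, y = z2/z0 to get A z = 0 with z = (z0, z1, z2).
a, b, c, d = 1.0, 2.0, 3.0, 5.0
A = np.array([[c, a, b],
              [d, a, b]])

# The intersection point is the null space of A: the right singular
# vector belonging to the smallest singular value.
_, _, Vh = np.linalg.svd(A)
z = Vh[-1]          # unit-norm solution of A z = 0 (up to scale)

print(z)            # proportional to [0, -b, a] = [0, -2, 1]
```

The computed null vector has z0 = 0, confirming that the two parallel lines meet at the point [0, −b, a] on the line at infinity.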
  f(z) := [ f1(z1, ..., z_{N+1}) ; ... ; fn(z1, ..., z_{N+1}) ],    (3.5.9)
system and, as we did in this chapter for the quadratic equation, Equation 3.1.1,
and for lines in the plane, Equation 3.4.7, we homogenize the equations in order to
facilitate computing their solutions. Similar to what we did in § 3.1 for equations
on C^1 and in § 3.4 for equations on C^2, we homogenize a polynomial p(x1, ..., xn)
of degree d on C^n as

  p(z0, z1, ..., zn) = z0^d p(z1/z0, ..., zn/z0).    (3.5.10)
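Equation 3.5.10 is easy to carry out monomial by monomial: each monomial of degree e is padded with the factor z0^{d−e}. A sketch of this (our own illustration, assuming numpy; polynomials are stored as maps from exponent tuples to coefficients, a representation we choose for clarity):

```python
import numpy as np

def homogenize(poly, d):
    """Homogenize a degree-d polynomial in x1..xn, per Equation 3.5.10.

    `poly` maps exponent tuples (e1,...,en) to coefficients; the result
    maps (e0,e1,...,en), where e0 = d - e1 - ... - en is the power of z0.
    """
    out = {}
    for exps, coeff in poly.items():
        e0 = d - sum(exps)
        assert e0 >= 0, "monomial degree exceeds d"
        out[(e0,) + tuple(exps)] = coeff
    return out

def peval(poly, z):
    """Evaluate a polynomial stored as {exponent tuple: coefficient}."""
    return sum(c * np.prod(np.array(z, dtype=complex) ** np.array(e))
               for e, c in poly.items())

# p(x1, x2) = x1^2 + x1*x2 - 2, of degree d = 2.
p = {(2, 0): 1.0, (1, 1): 1.0, (0, 0): -2.0}
P = homogenize(p, 2)

# Setting z0 = 1 recovers p, and P is homogeneous of degree 2.
x = (0.7, -1.3)
lam = 1.5 + 0.5j
print(peval(P, (1,) + x) - peval(p, x))                    # ~0
print(peval(P, tuple(lam * zi for zi in (1,) + x))
      - lam**2 * peval(P, (1,) + x))                       # ~0
```

The two printed residuals check the defining properties of the homogenization: dehomogenizing at z0 = 1 recovers p, and scaling all coordinates by λ scales the value by λ^d.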
  (C^{n1+1} \ 0) × ··· × (C^{nm+1} \ 0).

A function f on this product is multihomogeneous of multidegree (d1, ..., dm) if

  f(λ1 z1, ..., λm zm) = λ1^{d1} ··· λm^{dm} f(z1, ..., zm)

for all

  ((λ1, ..., λm), z1, ..., zm) ∈ C* × ··· × C* × (C^{n1+1} \ 0) × ··· × (C^{nm+1} \ 0).

We may also say that such a function is m-homogeneous, and the 1-homogeneous
case is understood to be included. We say that a multihomogeneous polynomial f
is compatible with multiprojective space X if the dimensions n1, ..., nm match.
  (λA + μB)v = 0,

in which A and B, each an n × n square matrix, are known, and (λ, μ, v =
(v1, ..., vn)) ∈ C^2 × C^n are to be found. This is a set of n homogeneous quadratics.
The equations are homogeneous of bidegree (1,1), linear in (λ, μ) and in v separately, so
the solution sets have a natural interpretation as sets in P^1 × P^{n−1}. Much more
could be said about eigenvalue problems, but for now we just show this as an example,
one common enough that most packages for linear algebra include a solution
method for it. To avoid confusion later, we point out that unlike in this example,
in a more general case of a multihomogeneous polynomial system, the individual
equations can have different multihomogeneous degrees.
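For instance, when A is invertible we can dehomogenize on the patch μ = 1 of P^1, where (λA + μB)v = 0 becomes the ordinary eigenproblem (−A⁻¹B)v = λv. A sketch of this (our own illustration, assuming numpy; production codes would use the QZ algorithm rather than forming A⁻¹B):

```python
import numpy as np

A = np.eye(2)
B = np.diag([1.0, 2.0])

# On the patch mu = 1, (lam*A + B) v = 0  <=>  (-A^{-1} B) v = lam * v.
lams, V = np.linalg.eig(-np.linalg.solve(A, B))

# Each solution is the point [lam, 1] in P^1 paired with [v] in P^{n-1}.
for lam, v in zip(lams, V.T):
    residual = (lam * A + 1.0 * B) @ v
    print([lam, 1.0], v, np.linalg.norm(residual))   # residual ~ 0
```

For this diagonal example the homogeneous eigenvalues are [−1, 1] and [−2, 1]; an eigenvalue at infinity would appear as the point [1, 0], which this patch cannot represent, and that is exactly the case the QZ algorithm handles gracefully.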
Let us return now to the subject of Chapter 2, that is, solving polynomial systems
by tracking the solution paths of a suitably defined homotopy h(x, t) = 0, where
h(x, 1) is a starting polynomial system whose solutions are known and h(x, 0) is
the target polynomial system we wish to solve. Often, this will be a linear interpolation
between a start system g(x) and a target system f(x), both consisting of n
polynomials in n variables, as

  g(x) = γ [ x1^{d1} − 1 ; ... ; xn^{dn} − 1 ],    (3.7.12)

where γ is a randomly chosen complex number and di is the degree of the ith
polynomial in f(x). The art of choosing a good homotopy is studied extensively in
Part II of this book, so let's just take it on faith for now that, with probability one,
the homotopy paths starting at the d1 ··· dn solutions to g(x) = 0 are nonsingular
for t ∈ (0, 1] and the endpoints of the paths as t → 0 include all the nonsingular
solution points of f(x) = 0.
The matching of the degrees in the polynomials of g(x) to those of f(x) is
an attempt to match the number of roots of the two systems, so that there are no
wasted paths. This works some of the time, but not always, and when the difference
is too great, we will make use of more sophisticated homotopies. But despite our
best efforts to match the homotopy to the problem at hand, it is very common
for the start system to have more solutions than the target system. In such cases,
the extra solutions must diverge. This causes two problems for the path tracker.
First, a diverging path has infinite arclength, which can cause the path tracker to
spend an inordinate amount of time on a futile quest. Second, as the magnitude of
the solution grows, the polynomials can no longer be accurately evaluated and all
numerical accuracy is lost.
One simple remedy, mentioned in § 2.3, is to truncate any path whose
solution components grow too large in magnitude. This introduces an uncertainty
about setting the limit, because one never knows whether the path may be heading to a
large, but finite, solution, or even whether the path might reverse course and converge to
a point of small magnitude. Indeed, in the example from Q5 in § 2.1, we encountered a path
that approached infinity at t = 1/2 and then returned to the finite realm.
A robust way to eliminate the trouble is to homogenize the polynomials, as in
Equation 3.5.10, and track the paths in P^n, replacing each equation of the homotopy
by its homogenization. Along any path, at any value of t, we can rescale [z0, ..., zn] to keep the magnitudes
of the homogeneous coordinates in range.
In numerical work, we want to restrict the representation of a root to just n
variables at any particular moment. One way is to pick one of the variables and
set it to one; more generally, we may choose a patch a0 z0 + ··· + an zn = 1 and
rescale the homogeneous coordinates of a path point by the factor

  λ = 1/(a0 z0 + ··· + an zn).

Single-variable examples have the advantage that the solution path of
the homotopy can be visualized by plotting it in an Argand diagram. Examples of
multivariate homotopies with diverging paths are given in the exercises that follow.
Example 3.7.1 Choose a start system g(x) = x^3 − 1 and a target system f(x) =
x + 1.5. Form the homotopy

  h(x, t) = t g(x) + (1 − t) f(x) = 0,

which homogenizes with x = z1/z0 to

  t (z1^3 − z0^3) + (1 − t) z0^2 (z1 + 1.5 z0) = 0.    (3.7.16)
On the patch z0 = 1, the solution paths for z1 are the same as the paths for x
in the inhomogeneous homotopy. In contrast, on the patch z1 = 1, we get the
picture at the right in Figure 3.3. The roots that diverge on the left patch now are
seen to approach the origin. In addition to being represented numerically by finite
numbers, the paths to infinity (i.e., to z0 = 0) also have finite arclength, so one can
successfully track the entire path. Neither patch is suitable for all the roots, as the
real root on the patch z1 = 1 now goes to infinity at t = 2/3. Accordingly, let's
pick a "random" complex patch:

(In practice, we would use a random number generator for the coefficients of this
equation, but for illustrative purposes, we keep the numbers simple here.) In this
patch, the paths of both z0 and z1 stay finite on all of t ∈ [0, 1], as shown in
Figure 3.4. At t = 2/3, z1 passes through zero on the path labeled "1," and at
t = 0, z0 reaches zero for paths labeled "2,3."
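A bare-bones tracker for Example 3.7.1 on a fixed complex patch can be sketched as follows. This is our own illustration rather than the book's code (the book's scripts are in MATLAB); the patch coefficients, step count, and Newton iteration count are arbitrary choices, and a serious tracker would use adaptive steps as in Chapter 2. The homotopy of the example is homogenized to degree 3 with x = z1/z0.

```python
import numpy as np

# Homogenization of h = t*(x^3 - 1) + (1 - t)*(x + 1.5) with x = z1/z0.
def H(z, t):
    z0, z1 = z
    return t * (z1**3 - z0**3) + (1 - t) * z0**2 * (z1 + 1.5 * z0)

def dH(z, t):
    z0, z1 = z
    return np.array([-3 * t * z0**2 + (1 - t) * (2 * z0 * z1 + 4.5 * z0**2),
                     3 * t * z1**2 + (1 - t) * z0**2])

a = np.array([1.0, 0.25 + 0.5j])      # a fixed "random" patch a . z = 1

def newton(z, t, iters=6):
    """Correct z onto the solution set of [H(.,t); a.z - 1] by Newton's method."""
    for _ in range(iters):
        F = np.array([H(z, t), a @ z - 1.0])
        J = np.vstack([dH(z, t), a])
        z = z - np.linalg.solve(J, F)
    return z

# Start points at t = 1: the cube roots of unity, rescaled onto the patch.
omega = np.exp(2j * np.pi / 3)
paths = []
for k in range(3):
    z = np.array([1.0, omega**k], dtype=complex)
    z = z / (a @ z)
    for t in np.linspace(1.0, 0.01, 1000)[1:]:   # march t from 1 toward 0
        z = newton(z, t)
    paths.append(z)

# One path ends near the finite root x = z1/z0 = -1.5 of f; the other
# two are heading to the point at infinity, z0 -> 0, yet their patch
# coordinates stay finite the whole way.
finite = [z for z in paths if abs(z[1] / z[0] + 1.5) < 0.2]
print(len(finite))
```

Because the patch is complex, none of the three paths hits the point missing from the patch, which is the whole point of choosing it at random.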
Fig. 3.3 Solution paths of Equation 3.7.16 as t goes from 1 to 0 (real), shown on two different
patches.

Fig. 3.4 Solution paths of Equation 3.7.16 as t goes from 1 to 0 (real), using a general projective
transformation.

3.8 Exercises
this into the homotopy Equation 3.7.16 to get a homotopy in z1 alone. What
are the start points? Adapt goodtrackInfty.m to solve this homotopy. How
do you recover the value of [z0, z1] for the endpoints?
• A one-variable system
  h(x, t) = γ t (x + 1)(x − 1)(x − 2) + (1 − t)(x + 2) = 0.

Plot the three solution paths of x in an Argand diagram. Try several values of γ.
How do the paths to infinity respond? Do they seem to be going to the same or
different endpoints? Now, plot the paths of z0 that resulted from the projective
transformation (presuming you used x = z1/z0 in the homogenization step).
Does this help you explain the paths of x?
• A two-variable system
Chapter 4

Genericity and Probability One

This chapter explores how one of the fundamental concepts of algebraic geometry,
genericity, is also the foundation of polynomial continuation. In an idealized model
where paths are tracked exactly and where random numbers can be generated to
infinite precision, our homotopies can be proven to succeed "with probability one." In
the non-ideal world of floating point arithmetic and pseudo-random number generators,
probability one cannot be achieved, but experience shows that high reliability
is obtained when reasonable precautions are taken. Moreover, that reliability can
be raised asymptotically close to one by increasing the precision of the calculations
and taking other steps to bring the actual numerical behavior closer to the ideal.
It is impossible to talk about generic points without introducing a few notions
from algebraic geometry.
We have various types of sets, which it is natural to refer to as algebraic sets.

Affine algebraic sets An affine algebraic set on C^N (see § 12.1 for more details)
is a set defined by the vanishing of a finite number, say n, of polynomials
p1, ..., pn ∈ C[x1, ..., xN]. That is, a set X ⊂ C^N defined by

  X = { (x1, ..., xN) ∈ C^N | pi(x1, ..., xN) = 0, i = 1, ..., n }.
Projective algebraic sets Recall from § 3.2 that the set of lines through the origin
in C^{N+1} is equivalent to the projective space P^N and
that the zero set of any homogeneous polynomial f(x0, x1, ..., xN) is a subset
of P^N with homogeneous coordinates [x0, x1, ..., xN], see § 3.5. Accordingly, a
projective algebraic set on P^N (see Chapter 3, § 3.5, and § 12.3 for more details)
is a set defined by the vanishing of a finite set of homogeneous polynomials, say

  p1(x0, x1, ..., xN), ..., pn(x0, x1, ..., xN).

That is, a set X ⊂ P^N defined by

  X = { [x0, x1, ..., xN] ∈ P^N | pi(x0, x1, ..., xN) = 0, i = 1, ..., n }.
Quasiprojective algebraic sets Sets of the form X \ (X ∩ Y), where X ⊂ P^N and
Y ⊂ P^N are both projective algebraic sets, are called quasiprojective algebraic
sets. These sets include both affine algebraic sets and projective algebraic sets
(see § 12.4 for more details).
For this book, quasiprojective algebraic set and algebraic set are synonyms.
In differential geometry and topology, there is the basic notion of a manifold.
This is defined precisely in § A.2.1, but for now we can use the loose definition that
an n-dimensional complex manifold is a space that is locally like C^n. Not every
algebraic set is a manifold, e.g., V(xy) is locally like C except at the point (0,0).
A point of a quasiprojective algebraic set X with a neighborhood like C^n for some n
is called a smooth point or a manifold point of X. Here the word "like" must be
made precise: this will be done in § A.2.1. For now the important point to note is
that the subset of manifold points X_reg of a quasiprojective algebraic set X is dense
and open, and the set of singular points Sing(X) := X \ X_reg is a quasiprojective
subset of X.
The most basic building block of any of the above three types of algebraic sets
is an irreducible algebraic set. We say that a quasiprojective (or affine algebraic
or projective algebraic) set Z is irreducible if Z_reg, the set of manifold points of
Z, is connected. The dimension of an irreducible quasiprojective set Z is defined
to be dim Z_reg as a complex manifold, which is half the dimension of Z_reg as a
real manifold. Note that in all three cases Z_reg is quasiprojective, but if Z is projective
(respectively affine) then Z_reg is not necessarily projective (respectively affine). Indeed,
if Z is projective and has singularities, then Z_reg is noncompact and thus not
projective. Moreover if Z is affine, then Z_reg is affine if and only if the singularity
set Z_sing of Z contains no manifold point x with the dimension of Z_sing at x less
than dim Z − 2. These sorts of algebraic sets, the singular subset of a quasiprojective
set, irreducibility, the natural breakup of a quasiprojective set into irreducible
quasiprojective sets, and dimension are discussed in detail in Chapter 12.
Throughout this book, we often apply the adjective "generic" to various geometric
objects, as in a "generic point" or a "generic line." The precise meaning of the
adjective always depends on the context, which we illustrate here by considering in
detail the meaning of the following statement:
The degree of a homogeneous polynomial p(x0, x1, x2) is the same as the
degree of the homogeneous polynomial obtained by restricting to a generic
line in P^2.
In the notation of the previous section, this statement without the word generic is
our property P.
In saying a line is generic, we are implicitly referring to all the lines in P^2 and
assuming they have some sort of algebraic structure. Then, a "generic line" is any
line that is not a special exception to the statement at hand, or said another way,
the statement is true for all lines except those in a proper algebraic subset of the
set of all lines. In the notation of the previous section, we need to show that:
• there is an irreducible algebraic set X, each point of which represents a line in
P2,
• the failure of proposition P is described by a set of algebraic equations, and
• there exists a line for which the proposition holds.
In the next few paragraphs, we show this in some detail.
Typically we can represent objects in different ways. The simplest way of
representing lines in P^2 is as the solution set of a linear equation b0 x0 + b1 x1 + b2 x2 = 0.
Lines correspond to three-tuples (b0, b1, b2) ∈ C^3 with not all three coordinates
0. Since (b0, b1, b2) and (b0', b1', b2') give the same line if and only if there is a
λ ∈ C* := C \ {0} with bi' = λ bi for i = 0, 1, 2, we see that lines in P^2 are
parameterized by points [b0, b1, b2] ∈ P^2.
Since the proposition concerns the degree of the restriction of p(x0, x1, x2), it
is more convenient to parameterize the line by its solution points, rather than
representing it by the coefficients of its equation. Suppose that two distinct points
[a10, a11, a12] and [a20, a21, a22] are on the line. Then, the entire line in P^2 is given
in parametric form as

  [z0, z1] → [x0, x1, x2] = [z0, z1] · A,

Genericity and Probability One 47

where

  A = ( a10  a11  a12 )
      ( a20  a21  a22 ).

The two rows give distinct lines through the origin in C^3 exactly when A has
rank two, so the degenerate choices form the set D ⊂ C^6 where all the 2 × 2
minors of A vanish, i.e., the set of common solutions of the three polynomials
a10 a21 − a11 a20, a10 a22 − a12 a20, and a11 a22 − a12 a21. The set D is a typical
example of an affine algebraic set, i.e., the set of solutions of a finite set of
polynomials on complex Euclidean space (see § 12.1). As such, it follows from
D ≠ C^6 that D is "thin" in a precise sense, e.g., its complement U := C^6 \ D is
dense and open, and D is of measure zero in C^6. Moreover, D is of complex
dimension at most five, which is equal to real dimension at most ten. Since U is
open and dense, we have that generically a point of C^6 is in U. In practice this
means that a six-tuple generated by a random number generator will lie in U.
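That a randomly generated six-tuple lies in U can be confirmed directly by evaluating the minors. A quick illustration of ours (assuming numpy; the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 2x3 matrix A, i.e., a random six-tuple in C^6. The degenerate
# set D is where all three 2x2 minors vanish, so A lies in U as soon as
# any minor is nonzero.
A = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))

minors = [np.linalg.det(A[:, [i, j]]) for i, j in [(0, 1), (0, 2), (1, 2)]]
print(max(abs(m) for m in minors) > 1e-8)   # True: the rows span distinct lines
```

With continuous random coefficients, landing in D would require an exact algebraic coincidence, which has probability zero.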
But this space has six dimensions, and we have already identified the space of lines
as P^2. Why are the dimensions different? Notice that given any B ∈ GL(2, C), i.e.,
any invertible 2 × 2 matrix B, the matrices A and B · A give maps with the same line
as image in P^2. This accounts for the four extra dimensions. For genericity questions
it suffices to work on U, and indeed, more often than not, we will work on larger
spaces that map onto the true parameter spaces.
Now, the restriction of p(x0, x1, x2) to the line parameterized by A is just

  q(z0, z1) := p([z0, z1] · A),

a homogeneous polynomial of degree d in z0 and z1 whose coefficients are polynomials
in the entries of A. Let B ⊂ C[A] be the set of coefficient polynomials. The only thing that remains
to check is that not all of the polynomials in B are identically zero, that is, V(B) ≠ U.
It suffices to check that there is at least one line on which p has degree d. To do this
The concept of generic points leads quite naturally to the notion of "probability-
one" algorithms. Before making a general definition, let's motivate it by considering
the question of whether a polynomial p(z) on C^N is zero or not.
Of course, for any p(z) of reasonable complexity, we could expand it into terms
and check whether any of the coefficients is nonzero. In this sense, the question may appear
to be a toy problem, but it has many aspects of serious questions we face about
whether a given polynomial system has some property or another. For example,
given a polynomial f(z) on C^N, how can we check whether it is identically zero
on an affine algebraic set X ⊂ C^N? But even the question posed on C^N is not so
trivial as it may seem at first, for p(z) might be defined in straight-line fashion in
a form not so easily expanded into terms; for example, it could be the determinant
of a matrix whose elements are all polynomials.
To settle whether p(z) is zero, we propose choosing a random point z* ∈ C^N
and checking whether p(z*) = 0 or not. We wish to conclude that if p(z*) = 0, then
p is the zero polynomial, and if p(z*) ≠ 0, then p is not the zero polynomial. The
important observation is that if p(z) is not the zero polynomial, the set V(p) is an
affine algebraic subset of C^N of codimension at least one, and in particular of real

False Positive |p(z*)| < ε even though p(z) is not the zero polynomial; and

False Negative |p(z*)| > ε even though p(z) is the zero polynomial.
False negatives are the result of numerical error only, because if p(z) is identically
zero, the random pick of z* cannot land on a mathematical exception. This is not
true for false positives, where by chance we might pick a z* close to a solution of
the equation p(z) = 0 even though p(z) is not identically zero.

The chance of a false positive can be reduced by testing more than one random
point. Suppose that for a given ε, there is a false positive rate, neglecting numerical
error, of r. This rate depends only on the set {z ∈ C^N | |p(z)| < ε} and the
distribution from which we draw the random test point z*. Suppose we test twice with
independent random test points, and declare p to be zero only if both tests indicate
so. Then, the false positive rate neglecting numerical error declines to r^2.
Consider a polynomial given as the determinant of a matrix with polynomial
entries, say, p(z) = det M(z), z ∈ C^N. Instead of expanding the determinant, the
probabilistic null test is to simply evaluate the elements of M at a random point
z* ∈ C^N and check whether M(z*) is a singular matrix. It is well known that instead
of simply evaluating det M(z*), a safer test for singularity is to use the singular value
decomposition. Suppose that M(z) does represent a singular matrix for all z, so
p(z) is the zero polynomial. Typically, neither numerical evaluation of det M(z*)
nor numerical determination of the smallest singular value of M(z*) will return an
exact value of zero: instead we will get a value which is at best a small multiple
of machine precision. We must make a judgement of how small the result must be
before we declare that M(z) is singular. This gets to the heart of the matter: we
cannot know with certainty using floating point arithmetic that p(z*) = 0, but by
raising the number of digits used in the computation, we can make the uncertainty
in the conclusion arbitrarily small.
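The determinant version of the probabilistic null test takes only a few lines. A sketch of ours (assuming numpy; the two matrix families and the tolerance are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def M_singular(z):
    # M(z) is singular for every z: its second row is z times the first.
    return np.array([[1.0, z], [z, z**2]], dtype=complex)

def M_generic(z):
    # det M(z) = (z^2 + 1) - z^2 = 1, a nonzero constant polynomial.
    return np.array([[1.0, z], [z, z**2 + 1]], dtype=complex)

def null_test(M, eps=1e-8):
    """Return True if det M(z) appears to be the zero polynomial."""
    z_star = rng.normal() + 1j * rng.normal()   # random test point
    sigma_min = np.linalg.svd(M(z_star), compute_uv=False)[-1]
    return sigma_min < eps

print(null_test(M_singular), null_test(M_generic))   # True False
```

Using the smallest singular value rather than the raw determinant is the "safer test for singularity" mentioned above: it measures the distance to the nearest singular matrix instead of a product of eigenvalues that can badly over- or under-state it.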
In short, under the assumption of exact arithmetic and a random number generator
of infinite precision, the probabilistic null test will give correct answers with
probability one. In floating point arithmetic, this ideal is not achieved, but the
probability of false answers can be made arbitrarily close to zero by increasing
precision.
For example, fix a positive integer d and positive real numbers M and R, and
consider the monic polynomials p(z) of degree d on the region {z ∈ C | |z| < R}
with all coefficients bounded in absolute value by M. Then the probability of
false positives and false negatives in the probabilistic null test goes to zero as the
number of digits increases. This follows by combining the fact that we can choose ε
smaller and smaller in such a way that the absolute error of evaluating p(z) is less
than ε with the bound on the area of {z ∈ C | |p(z)| < ε} given in
Lemma 5.3.2.
Analogously on C^N, the probability of false positives and false negatives in the
probabilistic null test goes to zero as the number of digits increases. As in the case
of one variable we need to have some limits on our data for this to hold, e.g., it
suffices to fix a positive integer d and positive numbers M, R and restrict to

• z = (z1, ..., zN) ∈ C^N with max{|z1|, ..., |zN|} < R;
• those polynomials p(z) of degree d on C^N having all coefficients bounded in
absolute value by M and at least one term of the form zi^d for some i.
From the discussion of the probabilistic null test, one sees that the idea of "generic"
translates directly to randomized algorithms that succeed "with probability one."
While this is exactly true in a mathematical sense, in floating point arithmetic,
probability one is an ideal that is only attained in the limit as the arithmetic is
extended to an infinite number of digits, consuming infinite computer time and
memory! The success of such an approach in practice depends on careful
consideration of numerical processes and benefits greatly if the mathematical functions
under consideration are mildly behaved. In this respect, algebraic questions have
properties not generally enjoyed in other mathematical domains. For this reason,
we declare the following equivalence.
Even though we often drop the adjective "algebraic," the distinction is meaningful.
Consider a proposition P that holds for irrational real numbers, but fails
for rational ones. It is known that although the rational numbers are dense in the
real line (there is a rational number between any two given real numbers), they
are also countable and hence of measure zero. In this sense, a random number drawn
uniformly from the real interval [0, 1] has a zero probability of being rational. One
could then imagine a test for the truth of P based on testing it at a random point.
But this becomes utter nonsense in floating point computations, where every number
represented on the computer is rational! We can only draw test points from
the rational numbers, so we cannot test P on any irrational number, let alone a
random one.
We are in a much stronger position when treating algebraic systems, as illustrated
in the following simple theorem.
Proof. This follows from the definition of generic and the fact that a polynomial
in one variable has a finite number of roots. □
In the probabilistic null test for a polynomial p(z), two sources of uncertainty come
into play: the random selection of a test point z* and numerical error in evaluating
p(z*). Intuitively, if p(z*) is far from zero, we feel very secure in concluding that
p(z) is not identically zero. It is only when p(z*) is small that doubts enter in. But
how small is small? That is, if our test is "Is |p(z*)| < ε?," how do we pick ε? And
can we ever have certainty in our conclusion?
One can attain certainty in many instances. If we can establish bounds on the
round-off errors in the calculations and find a z* such that |p(z*)| is bounded away
from zero, then we know with certainty that p(z) is not identically zero. It would be
onerous to derive bounds for every situation that arises, but fortunately, methods
exist for automating the process. In particular, interval arithmetic can be used for
this purpose. The idea is that each number in a sequence of arithmetic operations is
replaced by an interval guaranteed to contain the exact result. To ensure this, each
arithmetic operation rounds down the lower limit and rounds up the upper limit
according to strict rules. In a complex version of this, numbers become rectangular
regions in the complex plane (i.e., a cross product of a real interval and an imaginary
interval). If the region computed for p(z*) does not include 0, then one knows with
certainty that p(z) is not identically zero.
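A minimal sketch of the idea, in Python (our own illustration; a real implementation would also direct the rounding of each endpoint, which we omit here, so this version conveys the bookkeeping but not the full rigor):

```python
# Real intervals are pairs (lo, hi); complex "intervals" are rectangles,
# stored as a pair of real intervals (real part, imaginary part).
def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def imul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

def ineg(a):
    return (-a[1], -a[0])

def cadd(x, y):
    return (iadd(x[0], y[0]), iadd(x[1], y[1]))

def cmul(x, y):
    # (xr + i*xi)(yr + i*yi) = (xr*yr - xi*yi) + i(xr*yi + xi*yr)
    re = iadd(imul(x[0], y[0]), imul(ineg(x[1]), y[1]))
    im = iadd(imul(x[0], y[1]), imul(x[1], y[0]))
    return (re, im)

def contains_zero(x):
    return x[0][0] <= 0 <= x[0][1] and x[1][0] <= 0 <= x[1][1]

def interval_point(z, r=1e-12):
    return ((z.real - r, z.real + r), (z.imag - r, z.imag + r))

# Evaluate p(z) = z^2 - 2 at z* = 1 + 1j with a small input rectangle.
z = interval_point(1 + 1j)
minus_two = (ineg(interval_point(2 + 0j)[0]), ineg(interval_point(2 + 0j)[1]))
p = cadd(cmul(z, z), minus_two)

# p(z*) = -2 + 2i exactly; the computed rectangle excludes 0, so p is
# certainly nonzero at z*, and hence not the zero polynomial.
print(contains_zero(p))   # False
```

The test "does the rectangle contain 0?" replaces the choice of a tolerance ε, exactly as described in the text that follows.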
This eliminates the question of deciding a value for ε, by changing the question
to "Does the interval value of p(z*) include zero?" If it does not, we have a certain

but the theory is basically the same. We do not ever need this generality.
We refer to (Sommese & Wampler, 1996), where generic points were introduced
numerically and where some different approaches are contrasted in more detail.
Though our experience with solving systems of polynomials using probabilistic
algorithms has been very good, more research needs to be done on quantifying how
secure we are in using probability-one algorithms. In such an endeavor, more
quantitative measures of the size of numerically bad sets are needed. The remarks in
§ 5.3 discuss some of the numerical issues involved in deciding whether a point
z ∈ C^N is a zero of p(z). We know that the model we are using is good for a range
of degrees and dimensions dependent on the number of digits we use. As use of
these algorithms spreads and applications are made well outside of the ranges so
far considered, it will be useful to have more than rules of thumb for the behavior
of this dependence.
4.8 Exercises
randn(n,m) + 1i*randn(n,m).
Exercise 4.1 (Generic Circles) Interpret the statement: two generic circles in
the plane meet in exactly two distinct finite points. Prove it.
complex plane. What effect, if any, do these have on the probability of false
positives?
Exercise 4.3 (Singular Matrices) The expression det(A A^T), where A has more
rows than columns, is an identically zero polynomial in the elements of A. The
following experiments explore the effectiveness of the probabilistic null test on such
expressions.

(1) Form a singular 2 × 2 matrix M by generating a random 2 × 1 vector a and
setting M = a a^T. Let the elements of a be complex random normal. Plot a
histogram of log10(1e-20 + abs(det(M))). What is the largest observed value?
How does this relate to false negatives in the probabilistic null test?
(2) Perform a similar experiment for n × n matrices M = A A^T, where A is n × (n − 1)
and complex random normal.
(3) Compare these results to those of Exercise 4.2.2. Does there exist an ε so that
the null test "|det M| < ε?" gives a correct answer in all your tests? Does the
size of the matrix matter? Why?
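One way to run experiment (1) is sketched below, in Python rather than the MATLAB the exercises assume (our illustration; matplotlib would supply the histogram itself, which we omit):

```python
import numpy as np

rng = np.random.default_rng(2)

# det(a a^T) is identically zero as a polynomial: a a^T has rank one.
vals = []
for _ in range(1000):
    a = rng.normal(size=(2, 1)) + 1j * rng.normal(size=(2, 1))
    M = a @ a.T                  # note: plain transpose, not conjugate
    vals.append(np.log10(1e-20 + abs(np.linalg.det(M))))

vals = np.array(vals)
print(vals.max())   # far below 0: every trial gives a correct "zero" verdict
```

The observed values sit many orders of magnitude below zero, so no reasonable tolerance produces a false negative here; the interesting question, posed in part (3), is how this margin behaves as the matrices grow.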
Exercise 4.4 (Null Tests on Random Polynomials) Experiment with the
probabilistic null test on randomly generated polynomials of degree d, d = 1, 2, 3, 4.
Pick d roots r_i, i = 1, ..., d, and a test point x, all complex random normal, and let
p = ∏_{i=1}^d (x − r_i). Notice that considering p as a polynomial in x, it is never the
zero polynomial, because it has leading term x^d.

(1) For d = 1, show that Prob(|p| < a) = 1 − e^{−a²/4}. (Hint: the difference of two
normal random variables is normal, and the sum of two squared unit normals is a
chi-squared distribution.)
(2) Estimate Prob(\p\ < a) for d = 1,2,3,4 by numerical experiment.
(3) Plot the experimental data and overlay the theoretical result for d = 1 for
comparison.
(4) What is the behavior of Prob(\p\ < a) for small a? How does this relate to the
probability of false positives in the probabilistic null test?
Chapter 5
This chapter presents three interrelated but distinctly different perspectives on
polynomials in one variable: their algebraic properties, the analytic behavior of their
roots, and their numerical behavior when evaluated in floating point arithmetic.
The algebraic picture is important as a precursor to more general results for
multivariate systems. Each algebraic result for one variable polynomials may be viewed
as a special case of the more complicated set of possibilities that arise in the
multivariate situation. The analytic and numerical pictures do not generalize quite so
readily, although, as demonstrated in the short discussion of growth estimates, one
may sometimes gain insight into the multivariate case by considering a multivariate
polynomial as a polynomial in one variable with coefficients that are polynomials
in the remaining variables. Let us begin with the algebraic point of view.
By the Fundamental Theorem of Algebra, a polynomial p(z) of degree d factors
completely over the complex numbers as

  p(z) = a0 (z − x1)^{d1} ··· (z − xk)^{dk},

where the xi are distinct complex numbers and the di are positive integers satisfying
d = d1 + ··· + dk.
In this case, the set V(p) defined by {z ∈ C | p(z) = 0} consists of the k points
x1, ..., xk, which are the k irreducible components of V(p). This set is the simplest
example of an affine algebraic set, i.e., a closed algebraic subset of complex Euclidean
space (see § 12.1 for a precise definition). The description V(p) = {x1} ∪ ··· ∪ {xk}
is a special case of the irreducible decomposition, see § 12.2. The multiplicity of a
root xi of p(z) is the integer di > 0 occurring in the factorization of p(z). It is easy
to check the following theorem, which we state without proof. We use the notation
p^(j)(z) to mean the jth derivative of p(z), i.e., p^(j)(z) = (d^j/dz^j) p(z).
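The criterion in question — x is a root of multiplicity m exactly when p and its first m − 1 derivatives vanish at x while the mth derivative does not — is easy to check numerically. A sketch of ours (assuming numpy; the example polynomial and tolerance are arbitrary choices):

```python
import numpy as np

# p(z) = (z - 1)^2 (z + 2): the root 1 has multiplicity 2 and the root -2
# has multiplicity 1, so the multiplicities sum to the degree d = 3.
p = np.polynomial.Polynomial.fromroots([1, 1, -2])

def multiplicity(p, x, tol=1e-8):
    """Count how many successive derivatives of p vanish at x."""
    m = 0
    while abs(p(x)) < tol:
        m += 1
        p = p.deriv()
    return m

print(multiplicity(p, 1.0), multiplicity(p, -2.0), multiplicity(p, 5.0))  # 2 1 0
```

In floating point this derivative count is only reliable for well-separated roots evaluated exactly at the root; near-multiple roots need the more careful numerical treatment discussed later in the chapter.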
Considering the common zeros of more than one polynomial leads to no new
sets, since it is an easy consequence of the Fundamental Theorem of Algebra that
V(p1, ..., pn), the set of common zeros of n polynomials, equals V(p) for p the
greatest common divisor of the pi. Another way of approaching this is to take the set of
zeros of one of the pi and keep only those for which all the remaining pi are zero.
Let p1(z) = a0 z^{d1} + ··· + a_{d1} and p2(z) = b0 z^{d2} + ··· + b_{d2} denote polynomials of
degrees d1 and d2. The polynomials p1(z) and p2(z) have a root in common if and
only if the Sylvester determinant (defined below in Equation 5.1.1), a polynomial
of degree d1 + d2 in the coefficients of p1 and p2, is zero. A quick proof of this,
in sufficient generality to be used as a tool to symbolically investigate multivariate
polynomials, may be found in (Walker, 1962). A more extensive development of
resultants may be found in (Cox, Little, & O'Shea, 1997). In the case of polynomials
of one complex variable, the proof in (Walker, 1962) comes down to simple linear
algebra. Since we will have occasion to contrast the numerical methods we use
with purely algebraic methods, we prove the underlying lemma about the Sylvester
determinant in this case.
Lemma 5.1.3 Let p\{z) — aozdl H \- adl and p2(z) = b0zd2-i \-bd2 denote
polynomials of degrees d\ and d2- If there is an x E C such that pi(x) = 0 and
p2(x) = 0, then there exist polynomials f(z),g(z) e C[z] with p2{z) f'(z) = Pi{z)g(z),
deg/(z) < degpi(z), and degg(z) < degp2(z)-
Proof. Since x is a root of both p_1(z) and p_2(z), we may factor out (z − x) to write
p_1(z) = (z − x) f(z) and p_2(z) = (z − x) g(z). Accordingly,

    p_2(z) f(z) = (z − x) g(z) f(z) = p_1(z) g(z),

and the degree conditions hold since deg f(z) = d_1 − 1 and deg g(z) = d_2 − 1. •

Theorem 5.1.4 The polynomials p_1(z) and p_2(z) have a common root if and only
if the resultant Res(p_1, p_2) = 0, where Res(p_1, p_2) := det(Syl(p_1, p_2)) and
Syl(p_1, p_2) is the Sylvester matrix

    [ a_0 a_1 ... a_{d_1}   0     ...    0    ]
    [  0  a_0 a_1 ...    a_{d_1}  ...    0    ]   (d_2 rows of a's)
    [            ...                          ]
    [ ---------------------------------------- ]        (5.1.1)
    [ b_0 b_1 ... b_{d_2}   0     ...    0    ]
    [  0  b_0 b_1 ...    b_{d_2}  ...    0    ]   (d_1 rows of b's)
    [            ...                          ]

The matrix in this expression is of size (d_1 + d_2) × (d_1 + d_2) and has d_2 rows
involving the a_i's and d_1 rows involving the b_i's. The columns above and below
the dividing line do not necessarily line up.
Proof. The condition given in Lemma 5.1.3 is the existence of f(z) and g(z) such
that p_2(z) f(z) = p_1(z) g(z), where f(z) = f_0 z^{d_1−1} + ··· + f_{d_1−1} and
g(z) = g_0 z^{d_2−1} + ··· + g_{d_2−1}. This may be written in matrix form as

    [g, −f] Syl(p_1, p_2) = 0,        (5.1.2)

where g and f stand for the row vectors of their coefficients. A nonzero [g, −f]
exists if and only if det(Syl(p_1, p_2)) = 0. •
The reader should write out a few low degree cases for himself or herself. For
example, the special case when d_1 = d_2 = 1 is

    Res = det [ a_0 a_1 ]
              [ b_0 b_1 ]

and R = a_0 b_1 − b_0 a_1. R = 0 if and only if the vectors (a_0, a_1) and (b_0, b_1) are
linearly dependent. This agrees with what we know: if two linear equations in one
variable have a common solution, then one is a multiple of the other.
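To make the construction concrete, here is a small numerical sketch (an illustration of ours; the example polynomials are hypothetical, not from the text) that builds Syl(p_1, p_2) and evaluates its determinant with numpy:

```python
import numpy as np

def sylvester(p, q):
    """Sylvester matrix Syl(p, q) for coefficient lists [a_0, ..., a_d],
    highest power first: d2 shifted rows of p, then d1 shifted rows of q."""
    d1, d2 = len(p) - 1, len(q) - 1
    S = np.zeros((d1 + d2, d1 + d2))
    for i in range(d2):
        S[i, i:i + d1 + 1] = p
    for i in range(d1):
        S[d2 + i, i:i + d2 + 1] = q
    return S

def resultant(p, q):
    return np.linalg.det(sylvester(p, q))

# p1 = (z - 1)(z - 2) and p2 = (z - 1)(z + 3) share the root z = 1;
# p3 = (z - 4)(z + 5) has no root in common with p1.
p1 = [1.0, -3.0, 2.0]
p2 = [1.0, 2.0, -3.0]
p3 = [1.0, 1.0, -20.0]
print(resultant(p1, p2))   # (numerically) zero
print(resultant(p1, p3))   # nonzero
```

In exact arithmetic the first determinant vanishes identically; in floating point we only see a value near zero, a point the discussion of numerical zero tests later in this chapter returns to.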
Remark 5.1.5 Treating the a_i and b_j as indeterminates, we see (looking ahead
to A.13.1) that R is a bihomogeneous polynomial of bidegree (d_2, d_1).
Theorems 5.1.2 and 5.1.4 may be combined to conclude the following.
Theorem 5.1.6 A polynomial p(z) = a_0 z^d + ··· + a_d, a_0 ≠ 0, has a multiple root
if and only if its discriminant Dis(p) is zero, where Dis(p) := Res(p, dp/dz).
Note that the discriminant condition, Dis(p) = 0, is a polynomial equation on
C[a_0, ..., a_d], so we are justified in saying that a generic polynomial of degree d has
d distinct roots.
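As a quick check of Theorem 5.1.6, the discriminant test is easy to run in a computer algebra system; the sketch below (ours, using sympy, not code from the text) applies Dis(p) := Res(p, dp/dz) to hypothetical examples with and without a multiple root:

```python
import sympy as sp

z = sp.symbols('z')

def dis(p):
    """Dis(p) := Res(p, dp/dz), computed symbolically."""
    return sp.resultant(p, sp.diff(p, z), z)

print(dis(z**2 - 2*z + 1))   # (z - 1)^2 has a double root, so this is 0
print(dis(z**2 - 1))         # distinct roots give a nonzero discriminant
```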
58 Numerical Solution of Systems of Polynomials Arising in Engineering and Science
We now collect some of the classical relations between the coefficients of a polyno-
mial p(z) = a_0 z^d + ··· + a_d ∈ C[z] and its roots, i.e., the solutions of p(z) = 0. We
follow Marden's beautiful book (Marden, 1966), which contains many more results
than we present here. This section is marked "optional," because it is not essential
to an understanding of the continuation method. Indeed, it is quite difficult to
find similar relations that apply to systems of multivariate polynomials, our main
subject of concern. We include this material as background, because it at least
gives a hint of what we might expect in the more general situation. Moreover, in
Remark 5.2.4, we show that the one-variable growth estimates given here yield growth
estimates for general affine algebraic sets. These estimates combined with the Noether
Normalization Theorem A.10.5 and the use of trace functions as in § 15.5.4 may be
developed into a geometric proof of the existence of the irreducible decomposition.
We start by getting numerical bounds on the roots of p(z) in terms of the
coefficients a_i. The basic trick here is an observation of Cauchy. For any complex
number a ∈ C and any real number r > 0, we let

    Δ_r(a) := {z ∈ C | |z − a| ≤ r}.

Theorem 5.2.2 (Cauchy) Let p(z) = a_0 z^d + ··· + a_d ∈ C[z] with a_0 ≠ 0. The
polynomial

    q(z) := |a_0| z^d − Σ_{i=1}^{d} |a_i| z^{d−i}

has a unique positive root R, and all the roots of p(z) are contained in the disk
Δ_R(0).
Proof. Without loss of generality we can assume that a_d ≠ 0, since otherwise we
could factor a power z^i with i > 0 out of the polynomials p(z) and q(z) and have
the condition that p(z) has a nonzero constant term.
Polynomials of One Variable 59
Consider the function h(x) := q(x)/x^d on x ∈ (0, ∞). Note that the derivative
h′(x) is positive for all x ∈ (0, ∞). This shows that h(x) is an increasing function
with at most one x ∈ (0, ∞) with h(x) = 0. Since

    lim_{x→0+} h(x) = −∞   and   lim_{x→∞} h(x) = |a_0| > 0,

we conclude from the intermediate value theorem that h(x) = 0 has at least one
solution. Thus q(z) has a unique root R on (0, ∞), and q(x) > 0 for real x > R.
Now we will assume that there is a root z* of p(z) which satisfies |z*| > R and
show we get a contradiction. We have p(z*) = 0, which gives

    |a_0| |z*|^d = |Σ_{i=1}^{d} a_i z*^{d−i}| ≤ Σ_{i=1}^{d} |a_i| |z*|^{d−i},

i.e., q(|z*|) ≤ 0, contradicting q(x) > 0 for real x > R. •

Theorem 5.2.3 Let p(z), q(z), and R be as in Theorem 5.2.2, let z_1, ..., z_d denote
the roots of p(z), and set r := max_i |z_i| and

    a := max_{1 ≤ i ≤ d} (|a_i/a_0| / C(d, i))^{1/i},

where C(d, i) denotes the binomial coefficient. Then

    a ≤ r ≤ R ≤ a / (2^{1/d} − 1).
Proof. We can assume that r and R are nonzero, since otherwise the result is trivial.
We have already shown that r ≤ R. The left inequality follows from the observation
that if we denote the roots of p(z) = 0 by z_1, ..., z_d, we have

    a_i/a_0 = (−1)^i Σ_{j_1 < ··· < j_i} z_{j_1} ··· z_{j_i},

a sum of C(d, i) terms each of modulus at most r^i, so that |a_i/a_0| ≤ C(d, i) r^i.
For the right hand inequality, using a as defined in the theorem, note that
|a_i/a_0| ≤ C(d, i) a^i, so q(R) = 0 gives

    R^d = Σ_{i=1}^{d} |a_i/a_0| R^{d−i} ≤ Σ_{i=1}^{d} C(d, i) a^i R^{d−i} = (R + a)^d − R^d.

Thus 2 R^d ≤ (R + a)^d, and taking dth roots, 2^{1/d} R ≤ R + a, i.e.,
R ≤ a/(2^{1/d} − 1). •

Remark 5.2.4 Write a polynomial p(z_1, ..., z_N) of total degree d as a polynomial
in z_N with coefficients a_i(z_1, ..., z_{N−1}). If (perhaps after a generic linear change
of coordinates) the degree of p in z_N equals d, then a_i, the coefficient of z_N^{d−i},
has degree at most i. Hence, setting r := sqrt(Σ_{j=1}^{N−1} |z_j|^2), for all sufficiently
large r, we have that |a_i(z_1, ..., z_{N−1})| ≤ C_i r^i, where C_i is a positive constant
independent of r. Thus Theorem 5.2.2 implies that for all (z_1, ..., z_{N−1}) ∈ C^{N−1}
with r sufficiently large, any solution (z_1, ..., z_N) of p(z_1, ..., z_N) = 0 satisfies
|z_N| ≤ C r for a positive constant C.
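The one-variable bounds are easy to test numerically; the sketch below (ours, for a hypothetical degree-4 polynomial) computes R as the positive root of q, the maximum root modulus r, and the quantity a = max_i (|a_i/a_0|/C(d, i))^{1/i}, and confirms a ≤ r ≤ R ≤ a/(2^{1/d} − 1):

```python
import numpy as np
from math import comb

a = np.array([1.0, -3.0, 0.5, 2.0, -1.0])   # a hypothetical p(z) of degree 4
d = len(a) - 1

# q(z) = |a_0| z^d - sum_i |a_i| z^(d-i); its unique positive root is R
q = np.concatenate(([abs(a[0])], -np.abs(a[1:])))
R = max(w.real for w in np.roots(q) if abs(w.imag) < 1e-9 and w.real > 0)

r = max(abs(w) for w in np.roots(a))         # max modulus of the roots of p
a_bnd = max((abs(a[i] / a[0]) / comb(d, i)) ** (1.0 / i) for i in range(1, d + 1))

print(a_bnd, r, R, a_bnd / (2 ** (1.0 / d) - 1))
```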
It is a numerical fact of life that constants are only known to (and computations
are only carried out with) limited numbers of digits. It is worth spending a little
time thinking through what this means for polynomials of a single variable, i.e., to
consider how closely numerical calculations match the algebraic-geometric picture.
If we were considering polynomials with coefficients in a finite field, it may well
happen that a polynomial is nonzero even though it evaluates to zero at all points of
the field, e.g., z(z − 1) over Z_2 = {0, 1}. One happy consequence of the Fundamental
Theorem of Algebra is that this does not happen over the complex numbers, i.e., a
given polynomial p(z) is only zero at a finite set x_1, ..., x_d. But what about the
situation when we use the floating point numbers on the computer?
At first sight there is nothing to worry about. Assuming 15 digits on our com-
puter we have on the order of 10^15 distinct numbers, and for a polynomial to be zero
at all of them, it would need to be of degree at least 10^15, which is absurdly large
for any application we know of. But there is a snag here. If, for a polynomial to
numerically be zero we mean it is less than some small constant, e.g., 10^{−15}, then
the Fundamental Theorem of Algebra is certainly false. Consider the normalized
Chebychev polynomial of order n, which is given by

    T̃_n(z) = Π_{i=1}^{n} (z − cos((i − 1/2)π/n)).
As Hamming eloquently points out (§ 28.5 Hamming, 1986), since the normalized
Chebychev polynomial of degree n is a real polynomial that oscillates between ±2^{1−n}
on the interval [−1, 1], the 51st of these, T̃_51(z), is < 10^{−15} on [−1, 1]. Thus,
although the exact polynomial has just 51 roots in this interval, the numerical
approximation of it in standard double precision arithmetic is zero to within round-off
error on the entire interval.
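Hamming's observation is easy to reproduce; the following sketch (ours, not from the text) evaluates T̃_51 in its product form on a grid over [−1, 1]:

```python
import numpy as np

def tcheb(n, x):
    """Monic (normalized) Chebychev polynomial in product form."""
    roots = np.cos((np.arange(1, n + 1) - 0.5) * np.pi / n)
    return np.prod(x[:, None] - roots[None, :], axis=1)

x = np.linspace(-1.0, 1.0, 20001)
m = np.abs(tcheb(51, x)).max()
print(m, 2.0 ** (1 - 51))   # both about 8.9e-16, i.e., below 10^-15
```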
Indeed, it is worth thinking about what we mean when we say we find a zero
of a polynomial p(z). We mean a floating point number x of some prescribed
number of digits (say 15 for simplicity of discussion) whose distance from one of
the k zeros of p(z) is less than some prescribed number, e.g., 10^{−15}. In light of
Hamming's observation, we might want to say, "all right, if p(x) is very small we
cannot conclude that x is close to a zero, but certainly, if x is close to a zero of p(z),
then p(x) is close to zero." Unfortunately, even this is false. Consider
far from 0, i.e., p(x) = - 2 . Even with 17 digit accuracy, the approximate root is
x = 27.999999999999905 and we still only have p(x) = -0.01.
To go a bit further in this direction let p(z) = z^d + a_1 z^{d−1} + ··· + a_d with
positive coefficients all of modest sizes, e.g., < 10. Let the degree be d = 15, and
consider the implications of Theorem 5.2.2. It implies that the roots of p(z) are
all within the disk of radius 24.66. Suppose that p̃(z) is the same as p(z) with the
four lowest degree terms dropped, that is, p̃(z) = z^15 + a_1 z^14 + ··· + a_11 z^4. Then,
also by Theorem 5.2.2, the polynomial p(z) − p̃(z) has all its roots within the disk
of radius 12.83. Then, for |z| > 24.66, the relative error, |p(z) − p̃(z)|/|p(z)|, of
approximating p(z) by p̃(z) is bounded by
Lemma 5.3.1 If p(z) = z^d + a_1 z^{d−1} + ··· + a_d ∈ C[z] is a monic polynomial,
then max_{|z| ≤ 1} |p(z)| ≥ 1.

Proof. Assume that max_{|z| ≤ 1} |p(z)| = c < 1. Then by Rouché's theorem (Hille, 1959), it
follows that z^d and z^d − p(z) have the same number of zeros within the unit disk.
Since these numbers are d and at most d − 1, they cannot be equal and we have shown
the lemma. •
This shows that the normalized Chebychev polynomials of any order, and in
fact, any polynomial with leading coefficient 1, are distinguishable from zero in the
unit disk. Though comforting, the real problem is that T̃_51(z) and its relatives are
very close to zero on a significant set within the unit disk. We start with a crude
order of magnitude result.
Lemma 5.3.2 Given a polynomial p(z) = z^d + a_1 z^{d−1} + ··· + a_d ∈ C[z] and a
positive number ε, the area of the set of z ∈ C such that |p(z)| < ε is at most dπε^{2/d}.
Proof. Let z_1, ..., z_d denote the roots of p(z). Let w denote a point such that
|p(w)| < ε. We claim that w lies in the union of disks

    ∪_{i=1}^{d} Δ_{ε^{1/d}}(z_i),

whose total area is at most dπε^{2/d}. If not, then we would have that |w − z_i| > ε^{1/d}
for every i. Thus we get the absurdity that ε > |p(w)| = |w − z_1| ··· |w − z_d| > ε. •
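Lemma 5.3.2 can be probed numerically; this sketch (ours, with randomly chosen roots as a hypothetical example) checks that sample points outside all the disks Δ_{ε^{1/d}}(z_i) indeed satisfy |p(w)| > ε:

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps = 5, 1e-6
roots = rng.standard_normal(d) + 1j * rng.standard_normal(d)   # a sample monic p
delta = eps ** (1.0 / d)

# Points farther than eps^(1/d) from every root must have |p(w)| > eps,
# so the region {|p| < eps} lies inside the union of the d disks.
ws = 3 * (rng.standard_normal(2000) + 1j * rng.standard_normal(2000))
outside = ws[np.min(np.abs(ws[:, None] - roots[None, :]), axis=1) > delta]
pvals = np.abs(np.prod(outside[:, None] - roots[None, :], axis=1))
print(len(outside), pvals.min())
```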
If the roots are sufficiently separated, the bound for the area is actually of the
form ≤ C_d π ε^{2/d}, where C_d is a universal constant bounded by 2. We see this as
follows.
Theorem 5.3.3 Let p(z) = z^d + a_1 z^{d−1} + ··· + a_d ∈ C[z] have roots z_1, ..., z_d,
and let ε be a positive number. If

    |z_i − z_j| > (d / (d − 1)^{(d−1)/d}) ε^{1/d}   for all i ≠ j,

then the area of the set of z ∈ C such that |p(z)| < ε is at most
(d / (d − 1)^{2(d−1)/d}) π ε^{2/d}.

Proof. It suffices to show that the region |p(z)| < ε is contained in the union of
disks

    ∪_{i=1}^{d} Δ_ρ(z_i),   ρ := ε^{1/d} / (d − 1)^{(d−1)/d}.

The set of z such that |p(z)| < ε has at most d connected components. This can
be seen by noting that z → p(z) is a d-sheeted branched cover. Moreover, by the
minimum modulus principle, each connected component contains at least one root
of p(z). Now let z be a point with |z − z_i| = ρ. For each j ≠ i, the triangle
inequality and the separation hypothesis give

    |z − z_j| ≥ |z_i − z_j| − |z − z_i| > (d / (d − 1)^{(d−1)/d}) ε^{1/d} − ρ = (d − 1)^{1/d} ε^{1/d},

so that

    |p(z)| = Π_{j=1}^{d} |z − z_j| ≥ ρ ((d − 1)^{1/d} ε^{1/d})^{d−1} = ε.

Thus the connected component of {|p(z)| < ε} containing z_i cannot cross the circle
|z − z_i| = ρ. Since the roots z_j are in distinct disks Δ_ρ(z_j), we are done. •
Conjecture 5.3.4 (Zero Region Bound) Let p(z) = z^d + a_1 z^{d−1} + ··· + a_d ∈ C[z]
be a complex polynomial and let ε be a positive number. Then the area of the set of
z ∈ C such that |p(z)| < ε is

    ≤ C_d π ε^{2/d},

where C_d is a constant only dependent on d and bounded by 2 for all sufficiently
large d.
If the degree is high, the assumption on the spread of the roots looks like
C(d) ε^{1/d}, where C(d) = d/(d − 1)^{(d−1)/d} slowly approaches 1 from above as d
grows. If the closest pair of roots is separated by a distance r, we may turn this
around and say that Theorem 5.3.3 only applies for ε < (r/C(d))^d ≈ r^d/(de), where
e = 2.718.... For a difficult case such as the Chebychev polynomial T̃_51(z), which
has d = 51 and two roots within r = 3.8 · 10^{−3}, one must have ε < 2.7 · 10^{−126}.
This ensures that the sets with \p(z)\ < e are in distinct disks centered on the roots.
Sharper bounds may be possible, but the message is that high degree polynomials
require high precision arithmetic, and any roots close to each other exacerbate the
difficulty.
5.4 Exercises
Exercise 5.1 (Resultants) We call the matrix appearing in (5.1.1) and (5.1.2)
the Sylvester matrix for the resultant.
(1) Write out the Sylvester matrix for the resultant of a general cubic and a general
quadratic.
(2) Let the cubic and quadratic have random coefficients. Use a numerical test of
the rank of the Sylvester matrix (singular value decomposition is best) to show
that it is nonsingular.
(3) Form the Sylvester matrix for
Numerically evaluate the determinant and find the rank of the Sylvester matrix.
Do p_1 and p_2 have a common factor? If so, use linear algebra to compute the
polynomials f(z), g(z) as in Lemma 5.1.3.
(4) Repeat the above for

    p_1 = z^3 − 2z^2 − z + 2,   p_2 = z^3 − (…)z^2 + (…)z + 1.
(5) Use the results of the last two items to form a conjecture about how the rank
of the Sylvester matrix relates to the number of common solutions. Prove it.
Be sure to account for the possibility of multiple roots.
(6) Pick any one of the polynomials in the preceding items and use Theorem 5.1.6
to show that it does not have a repeated root. What does the same test say
about z^3 + z^2 − z − 1?
(2) Make a contour plot of |T̃_10(z)| for z ∈ C in the unit disk, Δ_1(0).
(3) Zoom in around z = 1 until the contour lines separate the roots.
(4) Try this for larger n and see how far you can go before the roots near z = 1 can
no longer be separated.
(5) Plot the contour |T̃_n(z)| = 0.001 for n = 5, 10, 15, 20.
(6) How much of a problem is this for the probabilistic null test? Consider degree
and the precision of arithmetic in your answer.
Chapter 6
Other Methods
While the focus of this book is on homotopy methods, this chapter highlights some
of the most useful alternatives: exclusion methods, eliminants, and Gröbner bases.
We already indicated in § 1.1 that the eigenvalue approach is one of the most
effective means of solving a polynomial in one variable, but its extension to systems
in more than one variable requires significant symbolic preprocessing. In contrast,
we have seen that homotopy methods for one variable extend rather naturally to
multivariate systems, a matter that we take up in detail in Part 2. Exclusion
methods have this same property: the multivariate algorithm looks almost exactly
like the one-variable method. Numerical applications of eliminants and Gröbner
bases work the other way around: they reduce multivariate problems back to just
one variable so that an eigenvalue routine or other method for one variable can be
employed.
As our interest is in numerical methods, some readers may be surprised to
see resultants and Gröbner bases mentioned here. These are usually regarded as
symbolic approaches, applicable to systems with rational coefficients and computed
in exact arithmetic. But, in fact, even if we use a symbolic method for most of
the computation, we will generally have to rely on numerics to estimate the zeros
of the system. As a very simple example, consider the equation x^2 − 2 = 0, which
has no roots over the rational numbers. To proceed further symbolically, we must
add the symbol √2 to the number field, whereupon the roots can be expressed as
x = ±√2. This may be perfectly suitable for some purposes, but a scientist or
engineer will usually want to know that √2 ≈ 1.41421.... The situation is even
more dicey in general, because according to Galois theory, there is no symbolic
formula for the roots of a general polynomial of degree five or higher. In short, for
most practical purposes, it is not a question of whether to proceed symbolically
or numerically; rather, it is a question of how far to proceed symbolically before
turning to numerics. With this in mind, one may craft symbolic approaches that
lead naturally into numeric methods. It is from this viewpoint that we discuss how
eliminants and Gröbner basis methods can lead us to eigenvalue formulations for
computing solutions numerically.
There are a host of considerations relevant to choosing a solution method, including
• Does it find all solutions? What happens if there are isolated singular solutions?
How about higher-dimensional solutions?
• Does it provide error estimates and/or error bounds?
• Under what conditions is it efficient?
• Is it easy to implement? Are software packages readily available?
Since each method has an extensive literature on its own, full answers to such ques-
tions are beyond the scope of this book. We will not attempt detailed comparisons
here, nor in fact, will we even give in-depth descriptions of practical algorithms. Our
aim is only to introduce these alternatives to give the interested reader a starting
point for further investigation.
Then, |x| = max_i (x̄_i − x_i), where [x_i, x̄_i] is the range of the ith coordinate of
the box x, and S(x) bisects x along the mid-plane of this maximally
wide coordinate direction,¹ so k = 2 and ρ = 1/2. A nice consequence of this choice
is that the subdivisions of a box exactly cover the original box with no overlap
except for sharing a boundary face. This means that a solution point can be in
the interior of only one box, so we get duplicate copies of a solution only in the
rare instance that a bisection face passes exactly through it. A box in C^n can be
considered a box in R^{2n} having independent coordinates for the real and imaginary
parts of each complex coordinate.
The obvious question at this point is how to construct good exclusion tests.
The most common approach, popular because of its wide generality, is interval
arithmetic. An interval extension of a function f : R^n → R is a function F : IR^n →
IR, where IR ⊂ R^2 is the half-plane of intervals [a, b], a ≤ b, and f(x) ∈ F(x) for any
¹It can be advantageous to bisect along a smaller edge of the box, using derivative information
to inform the decision; see (Kearfott, 1997).
x ∈ x. That is, the interval extension function evaluated on an interval box gives an
interval that contains all possible values of the function evaluated on points in the
box. Clearly, if there is a solution of f(x) = 0 in box x, that is, if there is an x* ∈ x
such that f(x*) = 0, then F(x) must contain 0. Consequently, if 0 ∉ F(x), then the
box can be excluded, or in the notation above, T(f, x) = −1. The interval extension
does not have to be sharp, that is, it may give loose bounds on the actual image
of f(x), x ∈ x, and in practice, this is almost always the case, as sharp bounds are
prohibitively expensive to compute.
An interval extension of a polynomial in straight-line form can be computed
by concatenating interval extensions of each of the basic operations of negation,
addition, subtraction, multiplication, and integer powers (see § 1.2). For these, we
have the sharp bounds

    −[a_0, a_1] ⊆ [−a_1, −a_0]
    [a_0, a_1] + [b_0, b_1] ⊆ [a_0 + b_0, a_1 + b_1]
    [a_0, a_1] − [b_0, b_1] ⊆ [a_0 − b_1, a_1 − b_0]
    [a_0, a_1] · [b_0, b_1] ⊆ [min(a_0 b_0, a_0 b_1, a_1 b_0, a_1 b_1), max(a_0 b_0, a_0 b_1, a_1 b_0, a_1 b_1)]
    [a_0, a_1]^k ⊆ [(0 if a_0 a_1 < 0, else min(|a_0|, |a_1|)^k), max(|a_0|, |a_1|)^k],   k even
    [a_0, a_1]^k ⊆ [a_0^k, a_1^k],   k odd
                                                        (6.1.1)
When the operations are carried out in floating point, one must be careful to round
the upper limit of the output interval up and the lower limit down to be sure that
it properly contains all possible results. To evaluate a general polynomial function,
one may simply apply these interval operations at every stage of a straight-line
implementation of the function. Sharper bounds can be determined by considering
the special properties of a polynomial, as illustrated by the exponentiation formula
above: in principle we only need the multiplication formula to evaluate x^2 as x · x,
but the formula invokes the fact that x^2 is always nonnegative.
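A minimal model of these rules in Python (our own sketch; it uses ordinary floating point and omits the outward rounding that a careful implementation requires, and the sample function is hypothetical):

```python
import random

# Intervals as (lo, hi) pairs; these mirror the rules in (6.1.1).
def iadd(a, b): return (a[0] + b[0], a[1] + b[1])
def isub(a, b): return (a[0] - b[1], a[1] - b[0])
def imul(a, b):
    c = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
    return (min(c), max(c))
def ipow(a, k):
    if k % 2 == 1:
        return (a[0]**k, a[1]**k)
    lo = 0.0 if a[0]*a[1] < 0 else min(abs(a[0]), abs(a[1]))**k
    return (lo, max(abs(a[0]), abs(a[1]))**k)

def F(x, y):
    """Interval extension of f(x, y) = x^2 y - 2y + 3, straight-line style."""
    return iadd(isub(imul(ipow(x, 2), y), imul((2.0, 2.0), y)), (3.0, 3.0))

box = ((-1.0, 2.0), (0.5, 1.5))
lo, hi = F(*box)
random.seed(1)
pts = [(random.uniform(*box[0]), random.uniform(*box[1])) for _ in range(1000)]
vals = [x*x*y - 2*y + 3 for x, y in pts]
print((lo, hi), (min(vals), max(vals)))   # the interval contains the sampled range
```

Note that the computed interval is wider than the sampled range: the extension is valid but not sharp, just as the text describes.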
With only an exclusion test, we have a bisection method for narrowing potential
solution boxes down to size |x| < ε. But bisection becomes very expensive as the
dimension n grows, because we may generate as many as 2^n sub-boxes in the course
of bisecting each of the coordinates. The process is greatly expedited if an inclusion
test returns T(f, x) = 1 while |x| is still relatively large. An approach that can
provide this is the interval Newton test. Although the method can be refined in
various ways, the basic idea is to compute a Newton step using interval arithmetic
and test the overlap of the resulting box with the initial box. To be precise, the
interval Newton step is computed as

    N(f, x) = x̂ − F′(x)^{−1} f(x̂),

where x̂ is any point in x (typically the midpoint), F′ is an interval extension of the
Jacobian matrix of f, and the inversion is computed by Gaussian elimination using
interval arithmetic. If F′(x) includes singular matrices, the inversion will fail, and
the test is inconclusive. Otherwise, we have the following facts.
In any case, we can restrict further search for a solution to the box N(f, x) ∩ x, and
the general algorithm given above can easily be refined to take advantage of this.
If the intersection is empty, we may declare the test T(f, x) = −1.
Once the Newton test confirms that a box contains a unique solution, the box
can be constricted by repeated iterations of the interval Newton step. As in the
usual Newton method, convergence is quadratic, under certain assumptions on dif-
ferentiability and on the tightness of the interval extension, which are satisfied by
polynomial functions evaluated with interval arithmetic.
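In one variable the interval Newton step is easy to write down; the following sketch (ours, for the hypothetical example f(x) = x^2 − 2 on the box [1, 2]) shows both the inclusion test and the rapid shrinking of the box under repeated steps:

```python
import math

def idiv(a, b):
    """Interval division a / b, assuming 0 is not in b."""
    c = (a[0]/b[0], a[0]/b[1], a[1]/b[0], a[1]/b[1])
    return (min(c), max(c))

def newton_step(box):
    """One interval Newton step for f(x) = x^2 - 2 on box = (lo, hi)."""
    lo, hi = box
    mid = 0.5 * (lo + hi)
    fmid = mid * mid - 2.0            # f at the midpoint
    fprime = (2.0 * lo, 2.0 * hi)     # interval extension of f'(x) = 2x
    q = idiv((fmid, fmid), fprime)
    return (mid - q[1], mid - q[0])

box = (1.0, 2.0)
first = newton_step(box)
print(first)          # strictly inside (1, 2): the inclusion test succeeds
for _ in range(6):
    lo, hi = newton_step(box)
    box = (max(lo, box[0]), min(hi, box[1]))   # intersect with the current box
print(box, math.sqrt(2.0))
```

The first step maps [1, 2] to [1.375, 1.4375], strictly inside the original box, and the intersected iterates collapse quickly onto sqrt(2), illustrating the quadratic convergence mentioned above.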
This brief overview just gives a glimpse of the approach; for more information,
see the books (Alefeld & Herzberger, 1983; Kearfott, 1996; Moore, 1979; Neumaier,
1990). References (Allgower, Georg, & Miranda, 1992; Dian & Kearfott, 2003;
Xu, Zhang, & Wang, 1996) are also useful. Substantial effort has been expended
on methods to sharpen the interval tests or to reduce the computation required
(Georg, 2001, 2003; Kearfott, 1997), and software packages are available, including
IntBis (Kearfott & Novoa, 1990), ALIAS (Merlet, 2001), and IntLab (Rump, 1999).
A major strength of the approach is that the search can be conducted entirely
in the reals and limited to a finite region of space, so if one is only interested in such
solutions, effort is not expended elsewhere. The approach also easily generalizes to
non-polynomial functions, just by including interval extensions of other elementary
functions. (In fact, almost all the literature on the subject is for general nonlinear,
continuous functions.) Importantly, even though we are using floating point arith-
metic, we obtain not just an approximate answer, but also mathematically reliable
bounds and a guarantee that all solutions in the initial box are somewhere in the
final set of solution boxes.
The method has several weaknesses. First, the Newton test is inconclusive in the
neighborhood of singular solutions, even isolated ones, in which case the method
behaves like bisection and converges slowly. Worse, in the presence of a higher-
dimensional solution set that intersects the initial box, the method returns a set
of boxes covering that whole set. The number of such boxes grows exponentially
with the dimension of the solution set, so this sea of boxes can easily founder the
computation. Finally, interval arithmetic does not return sharp results, and with
every arithmetic operation, the looseness may accumulate. For functions with many
operations, the interval extensions may grossly overestimate the true bounds. This
also applies to the linear solving step in the interval Newton test, so that for large
dimensions n, loose bounds inevitably accumulate.
6.2.1 Resultants
Recall from Theorem 5.1.4 that the condition for two polynomials p_1(z) and p_2(z)
in one variable to have a common root is the vanishing of their Sylvester resultant
Res(p_1, p_2), a determinant in the coefficients of the two polynomials. Similarly, for
degrees d_1, ..., d_n, let p_i(x) be the polynomial in x = (x_1, ..., x_{n−1}) composed of all
monomials x^a with |a| ≤ d_i and with coefficient c_{i,a} on monomial x^a. This is called
the "universal polynomial system" of degree d_1, ..., d_n. There is a polynomial in the
coefficients c_{i,a} called the resultant, unique up to scale, such that the n polynomials
p_i in n − 1 variables x have a common root if and only if the resultant is zero
(Ch.3, Thm. 2.3 Cox, Little, & O'Shea, 1998).² We may denote the resultant as
Res_{d_1,...,d_n} to indicate its relation to the degrees of the polynomials. An exposition
on how to find the resultant for n > 2 is beyond our scope; see (Canny & Manocha,
1993; Cox et al., 1998; Manocha, 1993) for details. More generally, following the
notation introduced in Equation 1.2.3, for index sets I_i, i = 1, ..., n, we suppose
that polynomial p_i(x) is of the form p_i(x) = Σ_{a ∈ I_i} c_{i,a} x^a. Then, the condition
that the polynomials have a common root is again a resultant polynomial in the
coefficients, called the sparse resultant (Cox et al., 1998; Emiris, 1994, 1995; Gelfand,
Kapranov, & Zelevinsky, 1994), which we may denote as Res_{I_1,...,I_n}.
While Res_{d_1,d_2} is given in Equation 5.1.1 as the determinant of a matrix having
a single coefficient or zero in each entry, this is not true in the general case. For
universal polynomial systems, the resultant is a ratio of two such determinants, e.g.,
(Ch.3,Thm.4.9 Cox et al., 1998) and (Macaulay, 1902). For nongeneric coefficients,
such as when a system has specific integer coefficients or when a system is sparse,
the determinant in the denominator of such an expression may vanish, so that more
complicated formulae may have to be employed. Some conditions that guarantee
that the resultant has an expression as a single determinant, sometimes referred to
as a resultant of "Sylvester type," are given in (Sturmfels & Zelevinsky, 1994).
Although resultants apply to n polynomials in n — 1 variables, several techniques
exist for applying them to compute solutions to n polynomials in n variables. We
briefly touch on two of them here.
²Officially, the scale is made unique by adding an extra condition, as in (Cox et al., 1998), but
that is not of interest to us here.
where Ĩ_i is a new index set and c̃_{i,a}(x_n) are the corresponding coefficient polyno-
mials, these being derived from p_i after hiding x_n and collecting like terms. Then,
a necessary condition that p_1(x) = 0, ..., p_n(x) = 0 have a common solution is

    Res_{Ĩ_1,...,Ĩ_n}(c̃_{i,a}(x_n)) = 0,        (6.2.2)

where we mean to indicate that the resultant depends on all the coefficient poly-
nomials that appear in the system of equations. Since this is a polynomial in the
single variable x_n, we may solve it numerically via the eigenvalues of the companion
matrix or any other suitable numerical method.
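The companion matrix route from coefficients to roots can be sketched as follows (our own illustration with a hypothetical cubic, not code from the text):

```python
import numpy as np

def companion_roots(c):
    """Roots of c[0] x^d + ... + c[d] as eigenvalues of the companion matrix."""
    c = np.asarray(c, dtype=float)
    monic = c[1:] / c[0]               # [c_1, ..., c_d] after normalizing
    d = len(monic)
    C = np.zeros((d, d))
    C[1:, :-1] = np.eye(d - 1)         # ones on the subdiagonal
    C[:, -1] = -monic[::-1]            # last column: -c_d, ..., -c_1
    return np.linalg.eigvals(C)

# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
roots = companion_roots([1.0, -6.0, 11.0, -6.0])
print(np.sort(roots.real))   # approximately [1. 2. 3.]
```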
Equation 6.2.2 does not tell us how to find the corresponding values of the
remaining variables. We will not address this in a general way, but will content
ourselves to show how it can be done for systems of two equations in two variables.
We have p_1(x, y) = a_0(y) x^{d_1} + ··· + a_{d_1}(y) and p_2(x, y) = b_0(y) x^{d_2} + ··· + b_{d_2}(y),
where y is "hidden" in the coefficients. Looking back to the proof of Theorem 5.1.4,
we note that each column in Equation 5.1.2 corresponds to a power of x, that is,
we have the matrix equation

    0 = [g, −f] ·
        [ a_0 a_1 ··· a_{d_1}       0       0  ··· ]   [ x^{d_1+d_2−1} ]
        [  0  a_0 ··· a_{d_1−1}  a_{d_1}    0  ··· ]   [       :       ]
        [ b_0 b_1 ··· b_{d_2}       0       0  ··· ] · [       x       ]        (6.2.3)
        [  0  b_0 ··· b_{d_2−1}  b_{d_2}    0  ··· ]   [       1       ]
where, to save space, we have written g and f in place of the row vectors of their
coefficients. We may rename the matrices appearing in this equation as

    [g, −f] S(y) x = 0,

so that the resultant condition is just det S(y) = 0. Key to the proof of Theo-
rem 5.1.4 was that the vanishing of the resultant is necessary for the existence of
left null vectors [g, −f] satisfying [g, −f] S(y) = 0, but this also implies the exis-
tence of right null vectors x satisfying S(y) x = 0. So for each value of y satisfying
det S(y) = 0, we solve the linear homogeneous system S(y) x = 0 for x, and since
this is determined only up to scale, we recover x as the ratio of the last two entries
in x. This approach assumes the co-rank of S(y) is one at each solution for y,
otherwise x is not uniquely determined. Also, the final entry in the solution for x
must be nonzero for x to be well defined. We cannot go into the details of what to
do when these conditions fail.
Example 6.2.1 Using y as the hidden variable, the resultant formulation for the
system
    2x^2 − xy − y − 2 = 0
    x^2 − y^2 − 2x + 2y = 0

is

        [ 2   −y     −y − 2        0      ] [ x^3 ]
    0 = [ 0    2       −y       −y − 2    ] [ x^2 ]        (6.2.4)
        [ 1   −2   −y^2 + 2y       0      ] [  x  ]
        [ 0    1       −2      −y^2 + 2y  ] [  1  ]

The determinant of the matrix gives the resultant −12 + 16y + 11y^2 − 14y^3 + 3y^4,
whose roots are y = −1, 2/3, 2, 3. Substituting each of these in turn back into
Equation 6.2.4 and solving the homogeneous linear system, one obtains column
vectors whose last two entries are in the ratio x = −1, 4/3, 2, −1, respectively.
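The computation in this example is easy to reproduce; the sketch below (ours) builds S(y), finds the right null vector at each root of the resultant by singular value decomposition, and recovers x from the ratio of the last two entries, as described above:

```python
import numpy as np

def S(y):
    """S(y) from Equation 6.2.4, the Sylvester matrix of p1 and p2 in x."""
    return np.array([
        [2.0,   -y,     -y - 2.0,           0.0],
        [0.0,  2.0,           -y,      -y - 2.0],
        [1.0, -2.0, -y**2 + 2.0*y,          0.0],
        [0.0,  1.0,         -2.0, -y**2 + 2.0*y],
    ])

xs = []
for y in (-1.0, 2.0 / 3.0, 2.0, 3.0):    # roots of the resultant
    _, s, Vt = np.linalg.svd(S(y))
    v = Vt[-1]                 # right null vector, proportional to (x^3, x^2, x, 1)
    xs.append(v[2] / v[3])     # x = ratio of the last two entries
print(np.round(xs, 6))         # approximately [-1, 4/3, 2, -1]
```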
For nongeneric coefficients c_{i,a}, the hidden resultant formula Equation 6.2.2 can
fail to yield solutions of the system. The problem is that the system may have
a positive-dimensional solution set so that there is a solution x for every value of
x_n. This implies that the hidden-variable resultant must be identically zero. The
system may have isolated solution points in addition to the positive dimensional
solution set, but the resultant formula does not find them. An approach for dealing
with this situation can be found in (Canny, 1990).
Example 6.2.2 Consider a system of two quadratics of the form
    x^2 + (3y + 4)x + (2y^2 + 5y + 3) = 0
    x^2 + 7x + (−y^2 + 5y + 6) = 0.

Using y as the hidden variable, the resultant condition is

        [ 1   3y + 4   2y^2 + 5y + 3         0         ]
    det [ 0      1         3y + 4      2y^2 + 5y + 3   ] = 0.
        [ 1      7     −y^2 + 5y + 6         0         ]
        [ 0      1           7         −y^2 + 5y + 6   ]

A bit of algebra shows that this polynomial is identically zero, even though the
system has a nonsingular root, (x, y) = (−5, 1). The trouble is that the system also
has a singular solution set: x + y + 1 = 0.
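This failure is easy to confirm with a computer algebra system; the sketch below (ours, using sympy) shows the resultant vanishing identically, exhibits the common factor responsible for it, and verifies the isolated root that the formula misses:

```python
import sympy as sp

x, y = sp.symbols('x y')
p1 = x**2 + (3*y + 4)*x + (2*y**2 + 5*y + 3)
p2 = x**2 + 7*x + (-y**2 + 5*y + 6)

res = sp.resultant(p1, p2, x)     # hidden-variable resultant in y
print(sp.expand(res))             # identically 0
print(sp.factor(p1))              # both factor with the common line x + y + 1
print(sp.factor(p2))
print(p1.subs({x: -5, y: 1}), p2.subs({x: -5, y: 1}))   # the missed isolated root
```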
This failure of the hidden-variable resultant formula on nongeneric systems is
one of the major drawbacks of the approach. Also, the symbolic derivation of
resultant formulae can be an onerous task, even if done using computer algebra.
For example, a result due to B. Sturmfels, reported in (Cox et al., 1998), is that the
resultant for three general quadratics in two variables, Res_{2,2,2}, when fully expanded
as a degree 12 polynomial in the 18 coefficients of the system, has 21,894 terms.
This exaggerates the problem though, for when we apply the method to a system of
three quadratics in three variables having numerical coefficients, the hidden variable
resultant formula gives at most a degree 8 polynomial in the hidden variable. The
trick, then, is to use resultant theory to set up Sylvester-type matrix formulae and
operate on these, without expanding the associated determinants, see (Manocha,
1994). Such approaches can be very fast, especially for small or sparse systems,
which may outweigh the drawbacks. For large systems, the resultant formulae tend
to be unwieldy, and the method is no longer useful.
6.2.1.2 u-Resultants

Instead of hiding a variable to get n equations in n − 1 variables, one can add an
extra linear equation. This again leads to a matrix equation of the form

    A(x_n) m = 0,        (6.2.6)

generalizing Equation 6.2.3. When A(x_n) is square, the existence of a nontrivial
solution requires det A(x_n) = 0.
    Δ := det [ f_1(x_1, x_2, ..., x_{n−1})   ···   f_n(x_1, x_2, ..., x_{n−1}) ]
             [ f_1(α_1, x_2, ..., x_{n−1})   ···   f_n(α_1, x_2, ..., x_{n−1}) ]
             [              :                                   :              ]
             [ f_1(α_1, α_2, ..., α_{n−1})   ···   f_n(α_1, α_2, ..., α_{n−1}) ]

In the ith row of this equation, variables x_1, ..., x_{i−1} are replaced by α_1, ..., α_{i−1}.
If for any i we let x_i = α_i, then row i and row i + 1 will be identical and so the
determinant is zero. Cancelling out such factors, one obtains the Dixon polynomial

    δ(x_1, ..., x_{n−1}, α_1, ..., α_{n−1}) = Δ / Π_{i=1}^{n−1} (x_i − α_i).        (6.2.9)
When this determinant is expanded and like terms collected, it can be put into the
form δ = m_α W m_x, where m_α is a row vector of monomials in the α_i variables,
m_x is a column vector of monomials in the variables x_i, and W is a function of
the coefficients of f_1, ..., f_n. It is clear that for a common solution of the original
equations, the first row of the determinant is zero, so δ must also be zero. Moreover,
this will be true for arbitrary values of the auxiliary variables α_i. Consequently,
solutions must satisfy the matrix equation

    W m_x = 0.        (6.2.10)
Notice that if x is a solution to F, that is, f_i(x) = 0 for i = 1, ..., n, then
h(x) = 0 for any h ∈ I(F). Thus, any subset of polynomials in I(F) is potentially a
set that could be rewritten in the form of Equation 6.2.6 and used as an eliminant.
If the requirements on the number of equations and monomials can be met, and if
the consequent eliminant matrix is nonsingular, then a viable eliminant (possibly
including extraneous roots) has been found.
We call this a "heuristic method," because it is not based on an algorithm guar-
anteed to deliver a set of polynomials in the ideal having the necessary properties
to form an eliminant. See Stetter's book (Stetter, 2004) for more on finding such
formulations without resorting to Gröbner methods. (Gröbner bases are sketched
in the next section.)
A variant of this approach was presented in (Wampler, 2004) and also used in
(Su, Wampler, & McCarthy, 2004). Instead of hiding a variable, a set of equations
from I{J-) is written as a constant matrix A, depending only on the coefficients of the
original equations, times a set of monomials m, as Am = 0. To these, one appends
identity relations that are linear in one variable. For example, if in multidegree
notation, x^α and x^β are both in the set of monomials m, with α − β = [1, 0, …, 0],
then the identity x^α − x_1 x^β = 0 is an allowed identity. Such identities can be
appended to the equations from the ideal to form a system
[A; B + x_1 C] m = 0,   (6.2.16)
where the lower block is the collection of monomial identities. We are again in
the situation of Equation 6.2.6. An important characteristic of Equation 6.2.16
is that x\ appears linearly in the elimination matrix, so the numerical solution of
the problem falls within the purview of linear algebra and is, in fact, a sparse,
generalized eigenvalue problem.
To illustrate this last approach, let's return again to the example of three
quadratic equations.
Example 6.2.5 (Three quadratics revisited) Consider again three general qua-
dratics as in Equation 6.2.11. We may multiply each of the three original poly-
nomials by each of nine monomials {1, x, y, z, x^2, xy, xz, y^2, yz} to generate a set of
27 polynomials in the ideal. These polynomials contain 34 monomials, being all the
monomials of degree 4 or less, except for z^4. Thus, A in Equation 6.2.16 is a 27 × 34
matrix, but a numerical test shows that its rank is only 26. Keeping 26 independent
rows of A, we need 8 identities to produce a square system. These can be formed
using x as the eigenvariable and the 8 monomials {1, x, x^2, x^3, y, xy, z, xz}, that is,
the 8 identities are
x · {1, x, x^2, x^3, y, xy, z, xz} = {x, x^2, x^3, x^4, xy, x^2 y, xz, x^2 z}.
The net result is a 34 x 34 generalized eigenvalue problem in which x appears only
in the last 8 rows. A numeric test shows that for generic coefficients and a random
value of x, the matrix is nonsingular, so this is indeed an eliminant, and in fact,
generically there are no extraneous roots. The problem is sparse, as there are 10
Other Methods 81
nonzero entries in each of the first 26 rows and just two nonzero entries in each of
the last 8 rows (these being one appearance each of x and -1). Standard linear
algebra can be used to reduce the problem to size 8 before applying an eigenvalue
routine to solve for x.
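The same recipe can be sketched on a much smaller system. Assuming SciPy is available, the toy system below (my own choice, not the book's example) stacks two ideal rows and two monomial identities into [B + xC]m = 0 and solves the resulting generalized eigenvalue problem:

```python
import numpy as np
from scipy.linalg import eig

# Toy eliminant for  f1 = x + y - 3 = 0,  f2 = x*y - 2 = 0  (roots x in {1, 2}).
# Monomial vector m = [1, y, x, x*y]; the first two rows come from the ideal
# (constant coefficients only), the last two are the identities x*1 = x, x*y = xy.
B = np.array([[-3.0, 1.0, 1.0, 0.0],
              [-2.0, 0.0, 0.0, 1.0],
              [ 0.0, 0.0,-1.0, 0.0],
              [ 0.0, 0.0, 0.0,-1.0]])
C = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

# (B + x*C) m = 0  is the generalized eigenproblem  B m = x * (-C) m.
w, V = eig(B, -C)
fin = np.sort(w[np.abs(w) < 1e6].real)   # discard the two infinite eigenvalues
print(fin)                               # x-coordinates of the two roots

# Recover y from an eigenvector: scale the "1" entry to 1 and read off m[1] = y.
i = int(np.argmin(np.abs(w - 1)))
m = (V[:, i] / V[0, i]).real
print(m)                                 # approx. [1, y, x, xy] at (x, y) = (1, 2)
```

As in the book's example, the singular C produces infinite eigenvalues (two here), which are filtered out before reading off the finite roots x = 1, 2.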
6.3.1 Definitions
First, some terminology. We already introduced ideals in Definition 6.2.4 above. A
basis of an ideal is defined as follows.
Two bases for the same ideal, say F and H with I(F) = I(H), have the same set
of solutions, because each of them is in the ideal of the other. So, for the purpose
of equation solving, we may exchange one basis for another at our convenience.
Beginning with a system F that we wish to solve, one may, of course, append any
auxiliary polynomials in the ideal, these being algebraic combinations of the original
polynomials, without changing the ideal. We may also discard any polynomials that
can be generated from others remaining in the basis. In this way, we can manipulate
the polynomials into helpful forms without changing the solution set.
The key to organizing the process of changing bases is to establish a monomial
ordering that tells which of any two monomials is "greater."
We need one last property of a Gröbner basis. Any polynomial p has a unique
remainder r with respect to a Gröbner basis G such that p = g + r with g ∈ I(G)
and no term of r divisible by a leading monomial of G. In other words, all of the
monomials in r are in the normal set of G. The remainder can be computed by
initializing r = p, and if any term in r is divisible by a leading monomial of any
g ∈ G, we just add the appropriate multiple of g to r to cancel that term. Repeat
this until no term in r is so divisible. Let us denote the remainder as rem_G(p).
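Assuming SymPy is available (its `groebner` and `reduced` routines carry out exactly this division-with-remainder), a minimal sketch on a toy ideal of my own choosing:

```python
from sympy import symbols, groebner, expand, reduced

x, y = symbols('x y')
F = [x**2 - 2, y**2 - 3]                 # already a Groebner basis under lex order
G = groebner(F, x, y, order='lex')

p = expand((x + y)**2)                   # x**2 + 2*x*y + y**2
# Divide p by G:  p = sum(q_i * g_i) + r,  with no term of r divisible by a
# leading monomial of G; r is supported on the normal set {1, x, y, x*y}.
Q, r = reduced(p, list(G), x, y, order='lex')
print(r)                                 # 2*x*y + 5 = rem_G(p)
```

The returned quotients Q certify the division: p − (Q·G) equals the remainder r exactly.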
A n = λ n.   (6.3.18)
Hence, by computing remainders using the Gröbner basis, we have derived an eigen-
value problem. For each eigenvector n, we can get a unique solution x, because
either x_i is in the normal set, and hence appears in n, or else it is a leading monomial of G. In the latter case, we just
evaluate x_i using the Gröbner basis element that has it as the leading monomial.
By picking the constants c_0, …, c_n at random, the procedure is made more
robust than if one were to make a special choice, such as λ = x_n. With the
randomization, distinct solution points give distinct values of λ with probability one.
Still, a root with multiplicity greater than one can lead to a repeated eigenvalue, a
situation that requires extra care, as addressed in (Möller & Stetter, 1995).
We should note that the eigenvector in Equation 6.3.18 is defined only up to
scale. But the correct scale is easily discerned, because one of the members of the
normal set is 1. If the monomial 1 is not in the normal set, then the constant
polynomial p = 1 must be in the Grobner basis, which means that the system has
no solution.
The construction of the eigenvalue problem in Equation 6.3.18 has at its heart the
so-called multiplication map for the polynomial system. Eigenvalue problems can
be formed by devising other algorithms for constructing this map without using
Grobner bases and the Buchberger algorithm (Auzinger & Stetter, 1988; Mourrain,
1998; Stetter, 2004). Methods that take advantage of the sparse structure of the
polynomial system are described in (Emiris, 2003) and extensions of the approach
allowing it to treat systems with higher-dimensional components are in (D'Andrea &
Emiris, 2003). (For background on sparse structures, see § 8.5.) Related methods
and more can be found in the book (Dickenstein & Emiris, preprint).
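For a trivially simple ideal (again a toy of my own), the multiplication map can be assembled by computing remainders against a Gröbner basis; assuming SymPy and NumPy:

```python
import numpy as np
from sympy import symbols, reduced, Poly

x, y = symbols('x y')
G = [x**2 - 2, y**2 - 3]                   # a Groebner basis; normal set {1, x, y, x*y}
basis = [(0, 0), (1, 0), (0, 1), (1, 1)]   # exponent vectors of the normal set

def mult_matrix(f):
    """Matrix of multiplication by f on the quotient ring, in the normal-set basis."""
    cols = []
    for (i, j) in basis:
        _, r = reduced(f * x**i * y**j, G, x, y)   # remainder lies in the normal set
        d = Poly(r, x, y).as_dict()
        cols.append([float(d.get(m, 0)) for m in basis])
    return np.array(cols).T

Mx = mult_matrix(x)
# Its eigenvalues are the x-coordinates of the four solutions of
# {x**2 = 2, y**2 = 3}: +/- sqrt(2), each appearing twice.
ev = np.sort(np.linalg.eigvals(Mx).real)
print(ev)
```

The methods cited above construct precisely such a matrix, but by routes that avoid the Buchberger algorithm.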
If the system to be solved has integer or rational coefficients, the elimination and
Grobner methods can be carried forward in exact arithmetic through the stage of
forming an eigenvalue problem. After that point, floating point algorithms must be
employed. In that way, one may be sure that the calculations are rigorous up to the
eigenvalue routine, at which point at least one knows the number of solutions to
expect. For the elimination methods, exact arithmetic guarantees the determination
of the rank of the eliminant matrix, and for Grobner methods, it guarantees that
leading terms are determined properly. In floating point, either of these may require
judgements of whether small numbers should be declared zero, because they may
just be the figments of limited precision.
Unfortunately, exact arithmetic over the integers is often not feasible, because for
a series of computations, the number of digits usually grows ponderously large. To
avoid this, one may do the calculations over a finite field (i.e., over integers modulo
a moderately large prime number). For the purposes of determining the rank of an
eliminant matrix or finding the correct leading term of an S-polynomial, this will
almost certainly work correctly. A polynomial that is found to be nontrivial over
a finite field is certainly nontrivial over the integers, while the opposite direction is
not necessarily true, but holds with a high probability.
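A minimal sketch of modular rank determination, using only the Python standard library (the matrix and the prime below are illustrative):

```python
def rank_mod_p(A, p):
    """Rank of an integer matrix over Z/pZ, by Gaussian elimination (p prime)."""
    A = [[a % p for a in row] for row in A]
    rank, rows, cols = 0, len(A), len(A[0])
    for c in range(cols):
        # find a pivot in column c, at or below row `rank`
        piv = next((r for r in range(rank, rows) if A[r][c]), None)
        if piv is None:
            continue
        A[rank], A[piv] = A[piv], A[rank]
        inv = pow(A[rank][c], -1, p)               # modular inverse of the pivot
        A[rank] = [(a * inv) % p for a in A[rank]]
        for r in range(rows):
            if r != rank and A[r][c]:
                f = A[r][c]
                A[r] = [(a - f * b) % p for a, b in zip(A[r], A[rank])]
        rank += 1
    return rank

M = [[1, 2, 3], [4, 5, 6], [5, 7, 9]]              # third row = first + second
print(rank_mod_p(M, 10007))                        # 2
print(rank_mod_p([[2, 0], [0, 2]], 2))             # 0: an unlucky prime can lie
```

As the second call shows, a prime dividing the pivots underestimates the rank over the integers, which is why a moderately large prime is preferred in practice.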
In engineering problems, more often than not, the polynomials have real coeffi-
cients. What shall we do then? One option is simply to proceed in floating point
and make decisions about zero quantities taking round off into account. Another
way is to use a finite field calculation in parallel with the floating point, using the
exact arithmetic results to determine when quantities are zero and using the float-
ing point results as the value of the nonzero quantities. See (Losch, 1995) for a
discussion of this approach to Grobner bases as applied to kinematics problems.
If all one wants is to count the generic number of solutions for a family of
problems, one can sometimes choose a candidate system having integer coefficients
and proceed in exact arithmetic over a finite field. In general, this can only be
employed when the coefficients for the family are a Euclidean space, for if they are
defined by algebraic conditions, integer examples may not exist. See (Faugère &
Lazard, 1995) for examples of this technique applied to problems in kinematics
and also for a discussion of the validity of such demonstrations. (They are not
mathematical proof, but the authors argue that facts discovered in this way may be
more reliable than proofs constructed by fallible humans.) As we argued at the top
of the chapter, however, if one wants solutions, or more properly speaking, solution
estimates, then floating point must be invoked at some point.
6.6 Discussion
All of the methods mentioned above have their place. Exclusion methods work best
in low dimensions and when only the real solutions in a finite region are desired.
Positive dimensional solution sets outside the region of interest have no effect, but
inside the region, they can be devastating. Isolated roots with multiplicity greater
than one cause extra work, as at best, the box containing such a root must be
whittled down to size ε for the method to terminate.
Algorithmic resultant methods and Grobner methods work well on small sys-
tems, and in the case that all solutions are nonsingular and isolated, these can be
very fast as well. However, these methods can get very expensive for high degrees
and many variables. Clever heuristic eliminants can occasionally fill in where algo-
rithmic methods fail. The basic elimination methods described here can break down
completely when positive dimensional solutions exist and multiple roots can also
cause difficulties, although in both cases more advanced techniques can be brought
into play. Also, note that resultants and Grobner methods require the polynomials
to be expanded into sums of monomials; they do not work directly with straight-line
programs for evaluating polynomials. This expansion can significantly increase the
complexity of the calculations (see § 1.2). Work on using straight-line programs in
symbolic computations is relatively new (Krick, 2004).
The continuation methods that we propound have their own set of strengths
and weaknesses. In contrast to exclusion methods, continuation cannot be limited
to a pre-defined region and the only way to find all real solutions is to first find
all complex solutions and then pick out the real ones. For small systems where
elimination is still cheap, continuation can be orders of magnitude more expensive.
On the other hand, homotopy continuation is very robust in the face of multiple
roots and positive dimensional solutions. Homotopies can easily be set up to find all
isolated solutions and, as we shall see in Part III, can be adapted to catalog all the
positive dimensional solutions as well. Continuation can use straight-line programs
and the cost per solution point tends to grow mildly with the number of variables.
And unlike the other methods, continuation very naturally applies to families of
polynomial systems that are parameterized by real-valued physical quantities, such
as typically arise in engineering and science. We take up this last point in some
detail in the next chapter, as we concentrate on the continuation method exclusively
for the remainder of the book.
6.7 Exercises
(1) Solve Example 6.2.1. How large are the boxes when the interval Newton test
terminates?
(2) Solve Example 6.2.2. Try different termination settings for the size ε for which
boxes are put in the list B_ε. How does the number of boxes in the list depend
on ε?
(3) Try to solve the six-revolute serial-link inverse kinematic problem as formulated
in Equations (9.4.30), (9.4.31), and (9.4.32). For parameters, use the Manseur-
Doty example from Exercise 9.5. Beware of long running times. Why?
(1) Repeat Examples 6.2.1 and 6.2.2 by hand or using a symbolic manipulation
program.
(2) Convert Equation 6.2.4 to a generalized eigenvalue problem for y by adding
xy and y to the column of monomials and appending related identities. The
resultant matrix should no longer have any quadratic entries. Solve the problem
numerically with an eigenvalue routine (in Matlab, see qz). How do the results
for this 6 x 6 problem reconcile with the fact that we expect just four roots?
Coefficient-Parameter Homotopy
In the following paragraphs, we will state several versions of the basic coefficient-parameter theory. The most concise approach would be to give the most general
version first and then state the rest as corollaries, but for the sake of understanding,
let's work the other way, from simple to general.
Before stating the theorem, we need the concept of a Zariski open set. A Zariski
open set of an algebraic set A is any set derived by removing from A an algebraic
subset of A. If A is smooth and connected, e.g., C^m, then except for the trivial case
of the empty set, a Zariski open set of A is dense in A. It is almost all of A with all
the missing pieces equal to a lower-dimensional algebraic set. See § 12.1.1 for more
details.
F(z; q) : C^n × C^m → C^n,

that is, F(z; q) = {f_1(z; q), …, f_n(z; q)} and each f_i(z; q) is polynomial in both z and
q (see Definition 1.2.1). Furthermore, let N(q) denote the number of nonsingular
solutions as a function of q:
(1) N(q) is finite, and it is the same, say N, for almost all q ∈ C^m;
(2) For all q ∈ C^m, N(q) ≤ N;
(3) The subset of C^m where N(q) = N is a Zariski open set. That is, the exceptional
set Q* := {q ∈ C^m | N(q) < N} is an affine algebraic set contained within an
algebraic set of dimension m − 1;
(4) The homotopy F(z; φ(t)) = 0 with φ(t) : [0,1] → C^m \ Q* has N continuous,
nonsingular solution paths z(t) ∈ C^n;
(5) As t → 0, the limits of the solution paths of the homotopy F(z; φ(t)) = 0 with
φ(t) : [0,1] → C^m and φ(t) ∉ Q* for t ∈ (0,1] include all the nonsingular roots
of F(z; φ(0)) = 0.
Items 1 and 3 are implied by Corollary A.14.2; the quantity d_1 in that theorem
is the generic number of nonsingular roots, which we denote as N here. In the
terminology established in Chapter 4, the property N(q) = N holds generically
on q ∈ C^m. Item 2 holds because, by Theorem A.14.1, a nonsingular solution at a
parameter point q* ∈ C^m must extend to a nonsingular solution in the neighborhood
of q*. Hence, it would be a contradiction for the open neighborhood around an
exceptional parameter point q* to have fewer nonsingular roots than at q*. On the
other hand, in the reverse direction as we approach a point q*, it is possible for a
solution path to become singular or to diverge. Items 4 and 5 follow from similar
reasoning, because if the N nonsingular solution paths coming from q_1 did not
arrive at the nonsingular solutions of q_0, there would be more than N nonsingular
solutions in the neighborhood of q_0.
If we have all nonsingular solutions for one set of generic parameters q_1, items
4 and 5 allow us to find all nonsingular solutions to any system in the family by
continuation. We simply track all the solution paths along a path, φ(t), through
the parameter space that avoids the exceptional set, Q*, for t ∈ (0,1]. All that is
lacking is a method for constructing φ(t) to have the required property. But this is
easy, as the following lemma shows.
Lemma 7.1.2 Let A be a proper algebraic subset of C^m and fix q_0 ∈ C^m. Then,
for all q_1 in a nonempty Zariski open subset of C^m, the one-real-dimensional open
segment φ(t) = t q_1 + (1 − t) q_0, t ∈ (0, 1], is contained in C^m \ A.
Item 5 of Theorem 7.1.1 with Lemma 7.1.2 imply that for a given target set of
parameters q_0, almost any starting set of parameters q_1 will give a homotopy
whose solution paths include all the nonsingular solutions of F(z; q_0) = 0 at their
endpoints as t goes from 1 to 0 on the real line. If somehow we can arrange to solve
F(z; q_1) = 0 for a random, complex set of parameters q_1, we are ready to solve the
target system, because the one-real-dimensional open line segment of the homotopy
is contained in C^m \ Q* with probability one.
Suppose we have all the nonsingular solutions for only the particular system
F(z; q_1) = 0, with N(q_1) = N. Even though q_1 is generic, it could happen that we
wish to solve the system for a target q_0 for which the homotopy of Equation 7.1.1
fails. This means there is some relation between q_1 and q_0; for example, they might
both be real with a degenerate point on the real line segment joining them. Referring
to the proof of Lemma 7.1.2, we have that q_1 is not in the degenerate set, Q*, but
it is in the set of points lying on a real straight line from q_0 to a point of Q*.
When q_1 is generic, in the sense that N(q_1) = N, but not random complex
independent of q_0, can we still formulate a homotopy to find all nonsingular solu-
tions of F(z; q_0) = 0 with probability one? Yes: the answer is to follow a different
continuation path, one that is not the real straight-line segment from q_1 to q_0 and
that includes some extra parameter or parameters that can be chosen generically
to avoid degeneracies. Here are three, among many, possibilities:
• Pick a third random, complex parameter point p ∈ C^m and follow the broken-
line homotopy path from q_1 to p to q_0. Each of the two real-straight-line
segments will succeed with probability one, and so the concatenation of the
two will succeed also.
• Pick p as in the previous item, and employ a curved-path homotopy such as
Lemma 7.1.3 ("Gamma Trick") Fix a point q_0 ∈ C^m, a proper algebraic set
A ⊂ C^m, and a point q_1 ∈ C^m, q_1 ∉ A. For all γ ∈ C except for a finite number of
one-real-dimensional rays from the origin, the one-real-dimensional arc

φ(τ) = (γτ q_1 + (1 − τ) q_0) / (γτ + (1 − τ)),   τ ∈ (0, 1],

is contained in C^m \ A. Moreover, it suffices to choose γ from the unit circle |γ| = 1.

Proof. Since the set

T := {t ∈ C | (t q_1 + (1 − t) q_0) ∈ A}

is algebraic, it must either be all of C or a finite number of
points in C. But by assumption, t = 1 is not in T, so T must
be finite. The bilinear transform from τ to t maps [0,1] to a
circular arc in the Argand plane for t, leaving t = 0 with angle
equal to the angle of γ. Hence, any two choices of γ ≠ 0 having
different angles give distinct circular arcs that meet only in the
two points t = 0 and t = 1. This implies that there is only one
such arc through each t ∈ T, and each such arc is produced by
values of γ on a one-real-dimensional ray from the origin. For
all other values of γ ∈ C, the path φ(τ) for τ ∈ (0,1] is contained
in C^m \ A.
The final statement follows because each ray from the origin hits the unit circle,
|γ| = 1, in a single point. □
There are many alternative ways one could set up paths with the desired gener-
icity, but these simple approaches suffice. We have already seen the usefulness of a
variant of the "gamma trick" in the example of Figure 2.1, and we will return to it
in § 8.3.
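Assuming NumPy, a quick numerical illustration of the trick for the one-parameter family f(z; q) = z^2 − q (a toy of my own): the real segment from q_1 = 1 to q_0 = −1 passes through the singular parameter q = 0, while one common form of the gamma-trick arc stays away from it:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = np.exp(2j * np.pi * rng.random())   # random point on the unit circle

q1, q0 = 1.0, -1.0                          # start and target for f(z; q) = z**2 - q
ts = np.linspace(1e-6, 1.0, 10001)

straight = ts * q1 + (1 - ts) * q0          # real segment: hits q = 0 at t = 1/2
arc = (gamma * ts * q1 + (1 - ts) * q0) / (gamma * ts + (1 - ts))

print(np.min(np.abs(straight)))             # ~ 0: passes through the singularity
print(np.min(np.abs(arc)))                  # bounded away from 0 for non-real gamma
```

The arc only approaches the singular parameter when γ falls on a particular real ray, which a random complex γ avoids with probability one.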
Theorem 7.1.1 covers many of the cases that arise in practice, but situations
arise when more refined versions are useful. Some useful variants are:
(1) the variables z live on projective space or on a cross product of projective spaces
instead of on Euclidean space;
(2) we count solutions on a Zariski open subset of the variable space instead of on
the whole space, that is, solutions that satisfy prespecified algebraic conditions
are to be ignored;
(3) the parameters q live on an irreducible algebraic set in Euclidean space or in
projective space or in a cross product of projective spaces.
In the case that the variable space or the parameter space involves a projec-
tive factor, the system of equations must be multihomogeneous in a way that is
compatible with those spaces. Recall from § 3.6 the definition of a multiprojective
space as a product of projective spaces, for which we have the associated concept
of multihomogeneous polynomials.
Then,
(1) N(q, U, Q) is finite, and it is the same, say N(U, Q), for almost all q ∈ Q;
(2) For all q ∈ Q, N(q, U, Q) ≤ N(U, Q);
(3) The subset of Q where N(q, U, Q) = N(U, Q) is a Zariski open set; we denote
the exceptional set where N(q, U, Q) < N(U, Q) as Q*;
(4) The homotopy F(z; γ(t)) = 0 with γ(t) : [0,1] → Q \ Q* has N(U, Q) continuous,
nonsingular solution paths z(t) ∈ U;
(5) As t → 0, the limits of the solution paths of the homotopy F(z; γ(t)) = 0 with
γ(t) : [0,1] → Q and γ(t) ∉ Q* for t ∈ (0,1] include all the nonsingular roots
in U of F(z; γ(0)) = 0.
Note that computations will be done in z ∈ C^{n_1+1} × ⋯ × C^{n_m+1} but interpreted
as points in X. For each projective factor, we typically add an inhomogeneous hy-
perplane equation to make the scaling factor unique. This is the projective transfor-
mation technique described in Chapter 3. The constancy of the number of solutions
for the algebraic case still follows from Corollary A.14.2, which allows an even more
general situation than we use here. We require Q to be irreducible so that it is path
connected, which implies the constancy of the root count; if Q were not irreducible,
the root count could be different on different components of Q. Since C^n is a Zariski
open subset of P^n, Theorem 7.1.4 clearly includes Theorem 7.1.1, by letting m = 1,
U = C^n, and Q = C^m, an irreducible algebraic set.
Notice that in the generalized version of the theorem, we denote the generic
number of nonsingular solutions as N(U, Q), because the count may change if we
consider a different Zariski open set U for the variables or if we restrict the para-
meters to a different algebraic set Q. We will consider both of these possibilities in
the succeeding sections.
We can generalize the theorem further. It sometimes happens that the parame-
ters appear via analytic expressions instead of polynomial ones. That is, the coeffi-
cients of F(z; q) as a polynomial system in z may be trigonometric or other analytic
functions of q. All the same conclusions follow. This is discussed in § A.14.2, so we
omit further discussion here and simply state the analytic version of the theorem
in the following abbreviated form.
(3) The subset of Q where N(q, U, Q) = N(U, Q) is an analytic Zariski open set.
Elsewhere, without the qualifier analytic, we use the term Zariski open set to
mean the algebraic case. The inclusion of analytic in item 3 of Theorem 7.1.5 implies
a weaker condition than the algebraic case, as is to be expected since the set of
holomorphic functions is larger than the set of polynomial functions. The difference
is illustrated by the algebraic case f(z; q) = z^2 − q, which has N(q) = 2 everywhere
in C except q = 0, as compared to the analytic case of f(z; q) = z^2 − sin(q), which
has exceptions for q = kπ, k any integer. An algebraic equation can never have
an infinite number of isolated roots, but an analytic one can. Even so, an analytic
Zariski open set of C^m is path connected, so continuation will succeed.
A final generalization of the theorem is to consider not just nonsingular roots,
but isolated roots of any multiplicity. Theorem A.14.1 and Corollary A.14.2 are
general enough to justify a restatement of Theorem 7.1.4 for isolated roots. Care
must be taken in the restatement of items 2 and 5, as the limit behavior of multiple
roots as a parameter path approaches the exceptional set is more complicated than
for nonsingular roots. The fact is that in this limit only three things can happen:
a solution path can leave U by landing on X \ U (this may include paths going
to infinity); a solution path can land on a higher-dimensional solution component
and thus cease being an isolated point; and two or more solution paths may merge
to form an isolated solution whose multiplicity is the sum of those for the incom-
ing paths. The number of isolated roots of a given multiplicity can increase, but
only at the expense of a corresponding decrease in the number of roots having a
lower multiplicity.
In numerical work, the paths traced by roots of multiplicity greater than one
are hard to track, but in principle, singular path tracking is possible, see § 15.6. If
we track only nonsingular paths, item (5) tells us that we are assured of obtaining
all nonsingular roots of the target system, which is what was claimed in the earlier
theorems. To be assured of finding all isolated roots of the target system, we must
track all the generically isolated roots, as indicated when m in item (5) is equal to
μ in item (1). A special case of particular interest is when all the isolated roots of
a generic system in the family are nonsingular, that is, when μ = 1 in item (1) of
the theorem. Then, we can easily track all the isolated solution paths, and we are
assured that the endpoints of these include all isolated solutions, even those with
multiplicity greater than one.
It is important to note that where Theorems 7.1.1, 7.1.4, and 7.1.5 refer to a
polynomial system F(z; q), it is acceptable for F to be given in straight line form
(see Definition 1.2.4).
The foregoing describes the essence of the polynomial continuation method. To find
nonsingular solutions of the polynomial system p(z) = 0 in a Zariski open set U,
we do the following, a restatement in mathematical terms of the steps enumerated
in the introduction to Part II.
Ab Initio Procedure:
To find all solutions in a Zariski open set U of p(z) = 0.
(1) Embed p(z) : C^n → C^n as a member of a parameterized family F(z; q) : C^n ×
Q → C^n of polynomial systems. Denote by q_0 ∈ Q the particular parameter
values that correspond to p(z), that is, F(z; q_0) = p(z).
(2) Arrange the embedding such that we have starting parameters q_1 ∈ Q, q_1 ∉ Q*,
for which we either have or can compute all N(U, Q) nonsingular solutions to
F(z; q_1) = 0. Call these the "start points."
(3) Construct a continuous path γ(t) : [0,1] → Q such that γ(1) = q_1, γ(0) = q_0, and
γ(t) ∉ Q* for t in the real interval (0,1]. That is, γ(t) for t ∈ [0,1] connects the
start parameters to the target parameters without intersecting the exceptional
set, except possibly at t = 0.
(4) Follow the N(U, Q) solution paths of F(z; γ(t)) = 0 from t = 1 along the real
axis to the vicinity of t = 0. These paths begin at the start points, and we
propagate them towards t = 0 using a numerical path-tracking algorithm.
(5) In the neighborhood of t = 0, determine which paths are converging to nonsin-
gular solutions. Refine these to numerically approximate the solutions to the
desired accuracy.
(6) Keep only those roots which are in U, that is, eliminate those that lie on the
algebraic set C^n \ U.
Suppose that p(z) is not just a single system of interest, but rather it is a member
of a family of systems G(z; q) : X × Q′ → C^n of the sort we have been discussing:
p(z) = G(z; q) for some q ∈ Q′. For the sake of item 2 above, we may have had to
cast p(z) in a larger family of systems than G. That is, G(z; q) is F(z; q) restricted
to Q' C Q. This is often necessary when we have no generic member of G for which
we have (or can easily generate) all nonsingular solutions. The larger family F is
chosen in a way that provides such a start system. However, once we have solved an
initial generic member G, we can then solve any other member of G by parameter
continuation along paths in Q'. This can be advantageous because the generic root
count on G can be smaller (perhaps much smaller) than the generic root count for
F. To capture this advantage, one may apply a two-phase procedure as follows.
Two-Phase Procedure:
To find all solutions of G(z; q) = 0 in Zariski open set U for several parameter
points, say q_1, …, q_k ∈ Q′.
The first of these is the basic trigonometric identity for sine and cosine, and the
second says that the point (a c_θ, a s_θ) is distance c from the point (b, 0). Our parameters
are the physical parameters q = (a, b, c), and the variables are z = (c_θ, s_θ). The
coefficients in f_1 are constants and, when expanded out, the coefficients in f_2 are
quadratic polynomials in (a, b, c).
(Figure: a triangle with vertices (0, 0), (b, 0), and (a c_θ, a s_θ), with angle θ at the origin.)
change under scaling, and so for (a, b, c) = (5α, 4α, 3α) we have the same solution
points (c_θ, s_θ) = (4/5, ±3/5) for any nonzero, complex α.
One may wonder if there are any other solutions. The total degree of the system
is four, and its one-homogenization has two roots at infinity of the form [z_0, c_θ, s_θ] =
[0, 1, ±i], so there are only two finite roots. Here, the one-homogenization is obtained
via the substitutions c_θ = C_θ/z_0, s_θ = S_θ/z_0.
Next, we need a homotopy path from our starting system (a_1, b_1, c_1) = (5α, 4α, 3α) to the
target (a_0, b_0, c_0). The straight-line path

γ(t) = t(5α, 4α, 3α) + (1 − t)(a_0, b_0, c_0)   (7.3.6)
will suffice for almost all targets. It is not difficult to check that when α is complex
and the target is real, the values of t for which the path intersects the singularity
conditions are complex, unless the target itself is singular. So we will not encounter
any singularities for t on the real interval (0,1]. For a fixed complex-valued α, there
will exist complex targets for which the homotopy path hits a singularity, but if we
choose α at random, independent of the target, then there is a zero probability of
this failure.
It may be instructive¹ to consider what would happen if we were to choose a
homotopy path in the reals, say α = 1. The homotopy is still fine for any real
target that is inside the triangle inequalities, since these bound a convex region of
the real parameter space. However, a line segment connecting a real target outside
the triangle inequality region to a real start system inside must cross the singularity.
These real targets form a set of measure zero in C^3, so considering all targets in C^3,
the homotopy is still valid with probability one. But in practice, we usually want
to solve systems for real-valued parameters. This illustrates that it is important to
use some sort of complex randomizing factor in the homotopy so that real systems
are solved with probability one.
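Assuming NumPy, the homotopy of this example can be tracked with a bare-bones stepping-plus-Newton corrector; the target triangle (a, b, c) = (3, 2, 2), the complex scale α, and the step counts below are my own illustrative choices:

```python
import numpy as np

def F(z, q):
    c, s = z
    a, b, cl = q
    return np.array([c * c + s * s - 1,
                     (a * c - b) ** 2 + (a * s) ** 2 - cl ** 2])

def J(z, q):
    c, s = z
    a, b, cl = q
    return np.array([[2 * c, 2 * s],
                     [2 * a * (a * c - b), 2 * a * a * s]])

def track(z, q1, q0, steps=200, newton_iters=5):
    """Step t from 1 to 0 along the straight-line parameter path,
    correcting with Newton's method at each step (no adaptive stepping)."""
    for t in np.linspace(1.0, 0.0, steps + 1):
        q = tuple(t * np.asarray(q1) + (1 - t) * np.asarray(q0))
        for _ in range(newton_iters):
            z = z - np.linalg.solve(J(z, q), F(z, q))
    return z

alpha = 0.8 + 0.6j                       # complex scale for the start system
q1 = (5 * alpha, 4 * alpha, 3 * alpha)   # start triangle: roots (4/5, +/-3/5)
q0 = (3.0, 2.0, 2.0)                     # a real target triangle

ends = []
for z_start in (np.array([0.8, 0.6], dtype=complex),
                np.array([0.8, -0.6], dtype=complex)):
    z_end = track(z_start, q1, q0)
    ends.append(z_end)
    print(z_end, np.max(np.abs(F(z_end, q0))))
```

Both endpoints come out real, (c_θ, s_θ) = (3/4, ±√7/4), even though the intermediate parameter values are complex.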
Q_0 ⊃ Q_1 ⊃ Q_2 ⊃ ⋯
¹Exercise 7.1 at the end of the chapter is a good way to get a feel for the numerical behavior
of this simple homotopy.
each of which is an irreducible quasiprojective algebraic set, and a Zariski open set
U ⊂ C^n, the generic nonsingular root counts N(U, Q_i) obey the inequalities

N(U, Q_0) ≥ N(U, Q_1) ≥ N(U, Q_2) ≥ ⋯
that are known by other means, but they may also be certain pro forma conditions
that have been noticed to arise often. A common choice of the latter type, especially
when using monomial product homotopies, is the side condition s(z) = ∏_{i=1}^{n} z_i = 0,
which simply means that we are not interested in solutions that have any coordinate
equal to zero. This is equivalent to saying that we are working on the open set
U = (C*)^n, where C* = C \ {0}. We will see below the use of side conditions specific
to a particular application, such as two variables being equal: s(z) = z_1 − z_2 = 0.
In essence, even when we work on U = C^n, we are invoking a side condition on P^n:
we are ignoring solutions at infinity.
Side conditions work hand-in-hand with nested parameter homotopies. When-
ever we solve the first generic example in a parameter space, we check the solutions
against the side conditions. Then, when solving other problems in the same pa-
rameter space using the first example as the start system, we drop the solutions
that satisfy the side conditions from the list of start points for the continuation. In
some cases, the degenerate solutions specified by the side conditions vastly outnum-
ber the interesting ones, and the number of paths in the parameter continuation is
dramatically reduced.
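The bookkeeping is simple; a sketch assuming NumPy, with hypothetical start points and the side condition s(z) = z_1 z_2 = 0:

```python
import numpy as np

# Hypothetical start solutions for a parameter homotopy; the side condition
# s(z) = z1 * z2 = 0 marks solutions with a zero coordinate as degenerate.
starts = [np.array([1.0, 2.0]),
          np.array([0.0, 1.5]),       # lies on the side condition
          np.array([-2.0, 1e-12])]    # numerically on the side condition

def on_side_condition(z, tol=1e-8):
    """True if s(z) = prod(z_i) vanishes numerically."""
    return bool(np.any(np.abs(z) < tol))

tracked = [z for z in starts if not on_side_condition(z)]
print(len(tracked))                   # 1: only the nondegenerate start is tracked
```

Dropping the degenerate start points up front is exactly what shrinks the path count in later parameter-continuation runs.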
Some systems respect symmetry groups and we can reduce the number of paths to
follow accordingly. Suppose we have a mapping S : C^n → C^n such that for any
q ∈ Q, if F(z; q) = 0, then F(S(z); q) = 0. Furthermore, suppose that if z is a
nonsingular solution, then so is S(z). Often, F(S(z); q) is either exactly F(z; q) or
a rearrangement of the polynomials of F(z; q). For example, under the mapping
S : (x, y) ↦ (y, x), the polynomial system {xy − q_1, x^2 + y^2 + q_2} is invariant, whereas
the polynomials in the system {xy^3 − a, x^3y − a} interchange. In such cases, it is
clear that nonsingular roots map to nonsingular roots.
Using the notation S^2(z) = S(S(z)), S^3(z) = S(S(S(z))), etc., suppose k is the
smallest integer such that z = S^k(z). We say that F respects S as a symmetry
group of order k. The symmetry implies that for the homotopy F(z; q(t)) = 0, a
solution path z_0(t) is matched by the paths z_i(t) = S^i(z_0(t)), i = 1, …, k − 1. So we
only need to compute one of the k paths: we use S to compute the endpoints of the
matching paths without knowing their intermediate points. It can happen that for
the same symmetry mapping, roots appear in symmetry groups of different orders.
For example, for the system {xy^3 − 1, x^3y − 1} = 0 and the mapping (x, y) ↦ (y, x),
the root (x, y) = (1, 1) maps to itself, while the root ((1 + i)√2/2, −(1 + i)√2/2) is in
a group of order two. This must be taken into account when using symmetry to
reduce the number of solution paths.
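The bookkeeping just described can be checked directly on this example. The sketch below (plain Python; the rounding tolerance is an assumed numerical choice) writes down the eight solutions of {xy^3 − 1 = 0, x^3y − 1 = 0} explicitly and groups them into orbits of the swap S(x, y) = (y, x):

```python
import cmath

# Solutions of {x*y**3 - 1 = 0, x**3*y - 1 = 0}: dividing the equations
# gives y**2 = x**2, so y = x with x**4 = 1, or y = -x with x**4 = -1.
roots  = [(x, x)  for x in (cmath.exp(2j * cmath.pi * k / 4) for k in range(4))]
roots += [(x, -x) for x in (cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4))]

def swap(z):
    return (z[1], z[0])

def key(z, nd=8):
    # Hashable, comparison-friendly fingerprint of a point, rounded to nd digits.
    return tuple(round(c.real, nd) for c in z) + tuple(round(c.imag, nd) for c in z)

# Group the roots into orbits of S(x, y) = (y, x): each root is filed under
# the lexicographically smaller of its own key and its image's key.
orbits = {}
for z in roots:
    orbits.setdefault(min(key(z), key(swap(z))), []).append(z)

print(sorted(len(v) for v in orbits.values()))  # [1, 1, 1, 1, 2, 2]
```

Four roots are fixed by S and the remaining four fall into two orbits of order two, so a symmetry-aware run would track only six representative paths instead of eight.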
When we solve the first generic example in a parameter space, we usually must
resort to an ab initio procedure (§ 7.2), embedding the target system into a larger
family of systems. Since the members of this larger family generally do not respect
104 Numerical Solution of Systems of Polynomials Arising in Engineering and Science
the symmetry, we must follow all the paths in the first run. The symmetry can
still be useful as a check on the computation: do all roots appear in the requisite
symmetries? If so, we have some assurance that the numerical process was carried
out successfully. Then, in subsequent runs using Phase 2 of the two-phase parameter
homotopy procedure, the symmetry is used to reduce the number of paths in the
computation.
For the first significant example of this book, we examine an important family of
problems from mechanical engineering: the forward kinematics of Stewart-Gough
platform robots. As we will see shortly, there are a number of different options
for the design of such robots, and these can be organized into nested families of
robot types. These parameterized families are ideal for illustrating the concept of
parameter continuation.
A Stewart-Gough platform, shown schematically in Figure 7.2, is a type of
parallel-link robot, having a stationary base platform upon which a moving platform
is supported by six "legs." Each of these legs has a spherical (ball-and-socket) joint
at each end,2 with a prismatic joint (linearly telescoping) in between. The prismatic
joint is actuated, usually by a ball screw and electric motor, so that the distance
between the center of its adjacent universal and spherical joints can be controlled by
computer. That is, leg i, i = 1, …, 6, connects point A_i of the stationary platform
to point B_i of the moving platform, and we control the lengths L_i = |B_i − A_i|.
By proper coordination of the six leg lengths, the moving plate can be placed in
any position and orientation within a working volume (actually a six-dimensional
workspace, a subset of R^3 × SO(3)), whose boundaries are determined by the limits
of travel of the prismatic joints. Collisions between the legs can also limit the range
of motion.
These robots are best known as the mechanism beneath motion platforms for
aircraft flight simulators, but they are applicable to tasks as varied as aiming tele-
scopes or welding automotive bodies. The kinematics of these robots has been the
subject of extensive academic research, which we cannot begin to address here. We
refer the interested reader to (Merlet, 2000; Tsai, 1999) as a starting point.
Although many interesting algebraic problems arise in the study of these mech-
anisms, for the moment, we will consider only the so-called "forward kinematics"
problem, which is as follows:
Given: the geometry of the stationary and moving platforms and the six leg
lengths,
² One ball joint on each leg can be replaced by a universal joint to eliminate rotation of the leg
around its axis, but this does not alter the motion of the moving platform, our present object of
study.
Coefficient-Parameter Homotopy 105
Find: the position and orientation of the moving platform with respect to the
stationary one.
As usual, in what follows, we embed the real problem into complex space, so even
though only real values of the leg lengths are physically meaningful, we consider
complex L_i ∈ C. Similarly, we treat the robot workspace as C^3 × SO(3, C), where
SO(3, C) = {A ∈ C^{3×3} | A^T A = I, det A = 1}.
the position of point B_i in the reference frame of the stationary platform is written
(g e′ + e b_i e′)/(e e′),
where multiplication follows the rules for quaternions and g′ = (g_0, −g_1, −g_2, −g_3)
and e′ = (e_0, −e_1, −e_2, −e_3) are the quaternion conjugates of g and e. Clearly, we
must exclude the points that satisfy
e e′ = e_0^2 + e_1^2 + e_2^2 + e_3^2 = 0.  (7.7.8)
The Study quadric is exactly the condition that the translation g e′ be a pure
vector, and since b_i is a pure vector, so is e b_i e′. These facts, and the fact that the
length of a pure vector v, considered as a quaternion, is just v v′, allow us to write
the basic kinematic equations for the Stewart-Gough platform as
((g e′ + e b_i e′)/(e e′) − a_i)((g e′ + e b_i e′)/(e e′) − a_i)′ = L_i^2,  i = 1, …, 6.  (7.7.9)
Note that this system of equations immediately solves the "inverse" kinematic prob-
lem: given the position and orientation of the moving platform as [e,g], we can
calculate the leg lengths L_i. We are looking to solve the opposite problem: given
the L_i, find [e, g].
To proceed, we expand Equation 7.7.9 and multiply through by e e′ to get, for
i = 1, …, 6,
subject to the side condition s(e, g) ≠ 0 from Equation 7.7.8. The system F(e, g) = 0
is a set of seven homogeneous quadratic equations in [e, g] ∈ P^7.
The root count of 40 has been demonstrated in several ways: by polynomial
continuation (Raghavan, 1993), vector bundles and Chern classes from algebraic
geometry (Ronga & Vust, 1995), computer algebra using Gröbner bases (Lazard,
1993), and computation of a resultant using computer algebra (Mourrain, 1993).
See also (Mourrain, 1996).
The formulation of the problem we use here follows (Wampler, 1996a), wherein a
simple proof of 40 roots is given. The same formulation was derived independently
by Husty (Husty, 1996), who gave a procedure that uses computer algebra to derive
a degree-40 equation in one variable. This is but a small indication of the level of
interest this problem has attracted.
If we could solve the forward kinematics problem for just one general member
of C^42, we could solve any other member by parameter continuation. The question
of how to get that first solution set is addressed in the next chapter. For the mo-
ment, let us just say that the trick is to cast the Stewart-Gough forward kinematics
problems as members of a much larger family, the family of all systems of seven
quadrics on [e, g] ∈ P^7. General members of this family have 2^7 = 128 isolated
solution points, so we can find all isolated solutions for an initial Stewart-Gough
problem by tracking 128 solution paths for a homotopy defined in this larger space.
Doing so reveals that a generic Stewart-Gough platform, p_0 ∈ C^42 (chosen using a
random number generator), has 40 nonsingular solutions and 88 singular ones. The
singular solutions are on the degenerate set of Equation 7.7.8, so we can safely ignore
them as they are not of physical significance. In short, we have N(P^7, C^42) = 40
and only these roots are of interest.
Having the 40 isolated solutions x_0 ∈ F_{p_0}^{−1}(0) to a generic Stewart-Gough
platform, p_0 ∈ C^42, we are ready to apply parameter continuation within the family.
By Lemma 7.1.2, a straight-line path from p_0 to almost any other p_1 ∈ C^42 stays
generic, and so by Theorem 7.1.4, the 40 solution paths starting at x_0 for t = 1 of
the homotopy
H_SG((e, g); t) := F((e, g); t p_0 + (1 − t) p_1) = 0  (7.7.12)
will lead to a set of endpoints that contains all isolated solutions of F((e, g); p_1) = 0.
(We invoke the generalized Theorem 7.1.4 instead of the basic version, Theo-
rem 7.1.1, because we are working on projective space P^7.) There exist points
p* for which the line segment between p_0 and p*, parameterized by t ∈ [1, 0) in the
homotopy above, strikes a singular point. Such points are a set of measure zero in
C^42, but they do exist. If one happens to encounter such a problem, where some
homotopy paths founder before t approaches zero, all that is necessary is to first
continue from p_0 to another random point in C^42 before proceeding to the final tar-
get. Or, to accomplish the same thing, we may choose a random γ ∈ C and follow
the homotopy H_SG((e, g); t(s)) = 0 along a nonlinear path t(s) = s + γs(1 − s) on
the real segment s ∈ [1, 0]. In practice, unless one is solving a large number of such
problems, the exceptions to the linear homotopy path will almost certainly not be
encountered, so Equation 7.7.12 is sufficient.
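A quick check of why the nonlinear path helps: for γ with nonzero imaginary part, t(s) = s + γs(1 − s) has the required real endpoints but leaves the real t-segment everywhere in between, so it misses the measure-zero set of bad parameter values sitting on the straight-line path. A sketch (the particular value of γ is an arbitrary stand-in for a random complex constant):

```python
gamma = 0.4 + 0.9j  # arbitrary stand-in for the random complex constant

def t(s):
    # Nonlinear path with t(1) = 1 and t(0) = 0, complex in between.
    return s + gamma * s * (1 - s)

assert t(1) == 1 and t(0) == 0
interior = [t(k / 10) for k in range(1, 10)]
print(all(z.imag != 0 for z in interior))  # True: path avoids the real segment
```

Here Im t(s) = 0.9·s(1 − s) > 0 for s strictly between 0 and 1, which is exactly the detour around possible singular parameter points.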
[Figure: two schematic leg-connection patterns, labeled 4-4a and 4-4b]
These two connection patterns
are both 4-4 patterns, but they are topologically distinct. We will only address
a few of the possibilities in the next few paragraphs. A more complete catalog
of coincident-joint geometries and their root counts can be found in (Faugère &
Lazard, 1995).
Consider first the 4-4 connection pattern illustrated on the left above, which we
label 4-4a. It is given as a quasiprojective algebraic subset of C^42 by the equations
{a_1 = a_2, a_5 = a_6, b_2 = b_3, b_4 = b_5}. We may solve such an example by making it
the target system of either a total degree homotopy or the general Stewart-Gough
homotopy H_SG, because it is a member of both. Usually, it is more efficient to use
the 40-path option than the 128 paths of the total degree homotopy. But either
way, one finds only 16 solutions, with the rest of the paths having endpoints on the
degenerate condition, Equation 7.7.8. With 16 solutions for a generic example in
family 4-4a in hand, we can solve any other problem in that subfamily using H_SG
and only 16 paths.
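The specialization that defines such a subfamily is easy to express programmatically. In the sketch below (plain Python; the split of the 42 parameters into six base points, six platform points, and six leg lengths is our assumed bookkeeping, and the joint indices are 1-based as in the text), a generic platform is specialized to the 4-4a pattern by imposing {a_1 = a_2, a_5 = a_6, b_2 = b_3, b_4 = b_5}:

```python
import random

def rand_point():
    # A random point in C^3.
    return tuple(complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3))

# Generic 6-6 platform: 6 base joints and 6 platform joints (36 coordinates),
# plus 6 leg lengths -> 42 parameters in all.
a = [rand_point() for _ in range(6)]   # a[0] holds the point a_1, etc.
b = [rand_point() for _ in range(6)]
L = [complex(random.gauss(0, 1)) for _ in range(6)]

# Specialize to the 4-4a subfamily: a_1 = a_2, a_5 = a_6, b_2 = b_3, b_4 = b_5.
a[1], a[5] = a[0], a[4]
b[2], b[4] = b[1], b[3]

print(len(set(a)), len(set(b)))  # 4 4: four distinct joints on each platform
```

After the coincidences are imposed, each platform carries only four distinct joint locations, which is what the "4-4" label records.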
This is just the tip of the iceberg in terms of the possible subfamilies of the
Stewart-Gough platform. Figure 7.3 shows a family tree of six sub-families, with
arrows indicating inclusions (lower families in the figure are sub-families of higher
ones). At the top, "quad7" is the family of all systems of 7 quadrics, which contains
all of the Stewart-Gough platform systems. Table 7.1 lists these same families:
each is given a name, such as 4-4a, and the pattern of coincident joints is indicated
graphically. Ignore for the moment the families whose names end in "P;" these are
discussed in the next subsection. The number of nonsingular roots is indicated as
N. This will be the number of homotopy paths for a parameter homotopy starting
from a generic point in the family and ending at any other point in the family,
including any point in a family that is a subset of that family. For each family,
the dots in the table in its row indicate which families it belongs to. For example,
the first column is the family of all systems of 7 quadrics, which contains all of the
other families, so there is a dot in every cell of the first column. We can solve any
Stewart-Gough platform by a 128-path homotopy through the parameter space of
7 quadrics or by a 40-path homotopy through the space of general 6-6 platforms.
Of course, if the target system is a member of some other subfamily, it is more
efficient to work within that family after a first generic member of the family has
been solved by continuation in a family above it. This is why, for example, we need
the seven-quadric system to get the process started.
Table 7.1  Families of Stewart-Gough platforms and their generic nonsingular root
counts N. [The coincident-joint pattern of each family and the dots marking which
families each belongs to are shown graphically in the original table.]

    Name     N
    quad7    128
    6-6      40
    6-6P     20+20
    6-4      32
    6-4P     16+16
    4-4a     16
    4-4aP    8+8
    4-4b     24
    4-4bP    12+12
    3-3      8+8
We will have reason to study such a case later, in Part III.
In summary, for the forward kinematics of Stewart-Gough platforms, N(P^7, C^42) = 40, so any problem can
be solved using a 40-path homotopy. We have identified a number of sub-families
that have a reduced number of nonsingular solutions, and a homotopy that stays
within such a parameter subspace solves other members of the sub-family using
the reduced number of solution paths. Sub-families with planar platforms admit
a two-way symmetry which can be used to reduce the number of solution paths
by half.
We see that parameter continuation can be an effective way to explore such
nested families and discover the generic number of nonsingular roots for each. In
the exercises in the next section, we encourage the reader to experience this directly,
by running Matlab routines supplied for this purpose.
It should be mentioned that there are many other approaches to such a study.
In addition to studies of the general 6-6 case already mentioned (Husty, 1996; Mour-
rain, 1996; Raghavan, 1993; Ronga & Vust, 1995; Wampler, 1996a), for several of
the subfamilies, kinematicians have found elimination procedures reducing the prob-
lem to a single polynomial (Chen & Song, 1994; Nanua, Waldron, & Murthy, 1991;
Sreenivasan, Waldron, & Nanua, 1994; Zhang & Song, 1994) or have applied their
own variants of continuation (Sreenivasan & Nanua, 1992; Dhingra, Kohli, & Xu,
1992). An extensive study of coincident-joint sub-families using Gröbner bases can
be found in (Faugère & Lazard, 1995).
7.8 The Cheater's Homotopy

Among those who have some passing knowledge of developments in polynomial
continuation, there has sometimes been confusion between parameter homotopy and a
similar approach called the "cheater's homotopy" by its inventors (Li, Sauer, &
Yorke, 1989). Appearing in print before the article establishing "coefficient-
parameter homotopy" (Morgan & Sommese, 1989), the cheater's homotopy pre-
saged much of the flavor of the full parameter theory. Consequently, the cheater's
homotopy holds an important place in the development of the subject, even though
it was soon eclipsed by the more general parameter homotopy theory.
Rather than working in the natural parameter space Q associated to a system
f(z; q) = 0, the cheater's homotopy expands the parameter space by generic con-
stants b ∈ C^n. The method starts by solving the initial system f(z; q_1) + b = 0 for
generic q_1 ∈ Q and b ∈ C^n. Then, the finite, nonsingular solutions of this system
are used as start points in a homotopy to find all the finite, nonsingular solutions
to some other example in the family, say f(z; q_0) = 0, q_0 ∈ Q. This is done by
following the solution paths from t = 1 to t = 0 in the homotopy f(z; q(t)) + t b = 0,
where q(t) ∈ Q is a continuous path in Q with q(1) = q_1 and q(0) = q_0.
We can see immediately from the parameter homotopy theory that this approach
works: we have a generic start system (q_1, b) in an expanded parameter space Q × C^n
and the target system is given by (q_0, 0) ∈ Q × C^n. However, the addition of
the generic constants to each equation often destroys crucial structure, causing an
increase in the number of paths to track, often substantially. A simple example
that shows a big difference is
For general q, this has one nonsingular solution (x,y) = (q,q), so a parameter
homotopy will have just one path to track. But the start system for the cheater's
homotopy,
f((x, y); q, b) = f((x, y); q) + (b_1, b_2) = 0,  (7.8.14)
has six nonsingular solutions. Computing solutions of Equation 7.8.13 for several
different values of q by the cheater's homotopy requires six paths each time. The
added constants b_1 and b_2 destroy all the structure of the original system. This kind
of difference arises in meaningful problems as well; for the nine-point path synthesis
problem discussed in § 9.6.7, a parameter homotopy requires only 1442 solution
paths, whereas the cheater's homotopy would require at least 90,000 continuation
paths (see (Wampler, Morgan, & Sommese, 1992, 1997)). The difference is due to
the presence of positive dimensional solution components. Parameter homotopy
preserves these components and so the associated paths can be safely ignored. But
the cheater's homotopy perturbs these components, replacing them with thousands
of nonsingular paths that must be tracked.
The same property that makes the cheater's homotopy undesirable in the gen-
eral situation can make it the method of choice in certain specialized situations: the
addition of the random constants makes all finite roots nonsingular. For example,
Equation 7.8.13 has a quintuple root at the origin, (x,y) = (0,0). Adding the con-
stants as in Equation 7.8.14 perturbs this into five distinct roots. If we wish to have
the origin appear as the endpoint of nonsingular homotopy paths, the cheater's ho-
motopy will accomplish this. Usually though, our aims are in the opposite direction:
we would like to avoid computing degenerate solutions whenever possible.
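The univariate analogue of this smoothing effect can be written down in closed form: a quintuple root of x^5 = 0 at the origin splits, after adding a generic constant b, into the five simple fifth roots of −b. A sketch (the value of b is an arbitrary stand-in for a generic complex constant):

```python
import cmath

b = 0.3 + 0.4j  # arbitrary stand-in for a generic complex constant

# The five simple roots of x**5 + b = 0, i.e. the fifth roots of -b.
r = abs(b) ** 0.2
theta = cmath.phase(-b)
roots = [r * cmath.exp(1j * (theta + 2 * cmath.pi * k) / 5) for k in range(5)]

print(max(abs(x**5 + b) for x in roots) < 1e-9)  # True: five distinct simple roots
```

Each perturbed root is nonsingular, so each would be the endpoint of a well-conditioned homotopy path, which is precisely the specialized benefit described above.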
7.9 Exercises
The following exercises are intended to help the reader understand the principles
of parameter continuation and also to experience the numerical behavior of the
continuation method. They assume that the user has access to Matlab, and that
the package HOMLAB, available on the authors' websites, has been installed on the
Matlab search path. A user's guide to HOMLAB appears in Appendix C.
Demonstration codes are provided for most of the exercises, so they can be run
with minimal knowledge of Matlab commands. A few exercises require the user to
write or modify an m-file. Even those with minimal prior experience with Matlab
should be able to handle these after a little experimentation.
A few words about HOMLAB. The main output of the demonstration programs
is always stored in two arrays: xsoln and stats. Each column of xsoln contains a
solution of the system in homogeneous coordinates, and column i of stats compiles
some statistics on the numerics of the i-th solution.
HOMLAB treats all problems as formulated in a multiprojective space to take
advantage of the ability of the projective transformation to handle paths leading
to solutions at infinity. For the Stewart-Gough platform problems, this is natural,
since we have formulated them on P^7. The code requires that problems naturally
formulated in C^n, such as the initial triangle example, be homogenized for solution in
P^n. Typically, the homogeneous coordinate that is added in this process is appended
as the last row in xsoln. (See the user's guide for information on the full range of
options.) The function y = dehomog(xsoln, eps0) de-homogenizes solutions by dividing
through by the homogeneous coordinate for any solution for which the homogeneous
coordinate is nonzero as judged by the test abs(xsoln(n+1,:)) > eps0. Part of the
learning process of the exercises will be to see how to set tolerances such as
eps0.
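For readers following along in another language, here is an illustrative Python analogue of the de-homogenization step just described (this mirrors the stated behavior; it is not the HOMLAB code itself):

```python
def dehomog(col, eps0=1e-8):
    """Divide a homogeneous solution vector by its last coordinate when that
    coordinate is safely nonzero; otherwise report a point at infinity."""
    h = col[-1]
    if abs(h) > eps0:
        return [c / h for c in col[:-1]]
    return None  # homogeneous coordinate ~ 0: solution at (or near) infinity

print(dehomog([2.0, 4.0, 2.0]))    # [1.0, 2.0]
print(dehomog([1.0, 1.0, 1e-12]))  # None
```

The second call shows the role of the tolerance eps0: a solution whose homogeneous coordinate is numerically zero is flagged rather than divided through.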
The second output, stats, compiles some statistics for the run. Each column
of stats corresponds to the matching column in xsoln. Full information is given
in the user's guide. For the exercises to follow, we are mainly concerned with rows
2, 3, and 5, having the following meanings:
Exercise 7.1 (Triangle) This exercise experiments with file triangle.m, which
solves Ex. 7.3 using the parameter homotopy path given in Equation 7.3.6. It
uses a path tracker without an endgame to handle singular roots so that one can
see what happens in such cases. The routine allows the option of accepting a
randomly-generated, complex value for the path constant γ in Equation 7.3.6. Try
the following experiments:
(1) Solve several triangles of your own choice, accepting the option to use a random,
complex value for γ. Does the routine reliably return accurate solutions?
(2) Try again, but choose γ = 1. Can you find examples for which the routine fails?
Succeeds? Can you determine a condition on (a, b, c) that predicts success versus
failure?
(3) Now choose γ = 1 + 1i. Can you find an (a, b, c) for which the algorithm now
fails? What happens if you add a small random perturbation to the values?
(4) Enter an (a, b, c) that is on the boundary of the triangle inequality, for example,
(2, 1, 1). Let the routine pick a random value for γ. What happens? How about
for (a, b, c) = (2, 1, 1 + 1e−8)?
coordinates in P^7, so you will need to devise a scheme for judging that two such
points are equal. How closely do the points match?
(4) Run a real case and check that any complex solutions appear in conjugate pairs.
Change the parameters and see if the number of real roots changes.
(5) Solve a problem with real parameters, p ∈ R^42. Then, use 3-D graphics com-
mands to draw simple (stick-figure) models of the Stewart-Gough platform in
all its real poses.
Exercise 7.5 (Secant Homotopy) Let f(x; p) : C^n × Q → C^n be a system
of parameterized polynomials. Then the secant system derived from f(x; p) is
g(x; λ, μ, p_1, p_2) = λ f(x; p_1) + μ f(x; p_2).
(1) What is the parameter space for g(x; λ, μ, p_1, p_2) = 0? (Note, we may consider
[λ, μ] ∈ P^1. Why?) Denote the parameter space as Q′ in the following items.
(2) What is the relationship between the nonsingular root count for g, N_g(U, Q′),
and the one for f, N_f(U, Q), where U is any Zariski open subset⁴ of C^n?
(3) Suppose we know all N_f(U, Q) nonsingular roots of f(x; p_1) = 0 for some general
p_1 ∈ Q. We would like to use these as start points for a secant homotopy
(1) In (Wampler, 1996a), it is shown that the root count of 40 for general 6-6 plat-
forms follows from the fact that B_i is antisymmetric when we re-write Equa-
⁴ See § 12.1.1 for the definition.
Polynomial Structures
Fig. 8.1 Classes of Product Structures. Below line A, start systems can be solved using only
routines for solving linear systems of equations. Above line B, special methods must be designed
case-by-case.
Figure 8.1 shows a hierarchy of classes of special structures that are useful in
constructing homotopies. Each structure in the diagram is a member of the class
above it; for example, a total degree structure is a particular kind of multihomo-
geneous structure. (In particular, as we will shortly see, it is a one-homogeneous
structure.) As we ascend the hierarchy, each class of structures presents more and
more possibilities for matching a particular target system that we wish to solve. As
indicated on the right of the diagram, this means that we can select a more special
structure, usually with the aim of reducing the number of solution paths to track
in the homotopy. The trade-off we face in this ascent is indicated by the downward
pointing arrow on the left of the diagram: the lower structures allow us to select
start systems that are easier to solve. For some problems, the ascent up the dia-
gram pays handsomely in path reduction and may turn an intractable problem into
a solvable one. On the other hand, it can happen that solving a start system for a
higher structure can consume more computer time than is saved in path reduction.
Unfortunately, even just counting the number of roots of the start system can be
expensive, so it is a matter of experience to decide the most advantageous spot in
this hierarchy to solve a particular problem.
Two dashed lines appear in Figure 8.1 to demarcate significant differences in the
start systems of homotopies respecting the various special structures. Below Line A,
the start systems can be chosen in a factored form which permits all solutions to
be computed using simple combinatorics and routines for solving linear systems.
Thus, for these structures, the time spent solving the start system is insignificant
compared with tracking the solution paths to the target system. Above Line A,
some path tracking is usually required just to solve the start system. Furthermore,
above Line B, solving the start system usually requires the use of a homotopy based
on one of the structures below it in the hierarchy. Typically, these are not optimal
in the sense that some paths lead to degenerate points. Between the two lines lie the
monomial-product and Newton-polytope homotopies. These require path tracking
to solve the start system, but the homotopies involved can be specially designed
to produce all solutions of the start system without any extra paths leading to
degenerate solutions. In addition to the cost of the path tracking, the combinatoric
calculations can be significant.
In addition to differences in computation times, the position in the hierarchy
also has an effect on the complexity of the computer code that implements it. In
this regard, the two extremes are the simplest. All homotopies require routines for
path tracking. To this, a total degree homotopy adds a simple start system that
is almost trivially solved. Consequently, the corresponding computer code is as
simple as possible. At the other extreme, we may formulate a coefficient-parameter
homotopy in terms of the physical parameters of the engineering or science problem
at hand, a step which we must do in any case. The start system simply amounts
to choosing random, complex values for these parameters. The difficulty comes
in solving the start system. A simple way to proceed is to solve the start system
with a total degree homotopy. This may be expensive, but it only has to be done
once. After that, we may solve any target system in the same parameterized family
using only the paths from the nondegenerate solutions of that first start system.
So, once we have implemented a general-purpose solver for total degree homotopies,
coefficient-parameter homotopies require only a bit of data management to solve a
start system and store its nondegenerate solution list.
The other intermediate structures introduce intermediate levels of complexity
to a computer code. Multihomogeneous and linear product homotopies introduce
simple combinatorics into the enumeration of the start solutions. In contrast, the
combinatorics introduced by monomial homotopies have been the subject of signif-
icant mathematical study, of which we give only a hint in § 8.5.
A final important consideration in the choice of homotopy is numerical stability
and robustness. For the paths leading to nonsingular solutions, there is not much
difference to be expected in this regard no matter which homotopy is chosen. How-
ever, it can happen that if one uses a homotopy near the bottom of Figure 8.1, the
singular solutions may vastly outnumber the nonsingular ones. In some practical
situations, we may be satisfied to casually discard all badly-conditioned solutions
without wasting much computer time on them. This runs the risk of dropping out
some generically nonsingular solutions that happen to have marginal conditioning.
When we wish to be more careful about finding all nonsingular solutions, a great
deal of effort may be necessary to resolve all the badly-conditioned solutions. Mov-
ing up the hierarchy to a more special structure may eliminate these solutions from
the homotopy and avoid the cost and uncertainty of computing singular solutions.
In some cases, singular solutions remain but have reduced multiplicity, making them
easier to compute accurately using "singular endgames," (see Chapter 10).
With this general picture in mind, we will proceed to examine each of the special
structures in some detail. Before starting this journey, we present a discussion of
homotopy paths that is relevant to all the special structures. Then, we start at the
bottom of the diagram of Figure 8.1 and work our way up to structures of increasing
specificity. We give only simple examples in this chapter, postponing case studies
of more significant examples to the next chapter.
8.2 Notation
Throughout the remainder of this chapter, it will be convenient to use the following
notations.
(1) Let ⟨e_1, …, e_n⟩ be the n-dimensional vector space having basis elements
e_1, …, e_n and coefficients from C. Any point in this space may be written
in the form ∑_{i=1}^n c_i e_i with c_i ∈ C for all i. Note that we have not specified
anything about the basis elements: in the structures we discuss below these will
be variously individual variables, monomials, or polynomials.
(2) Let {p_1, …, p_n} ⊗ {q_1, …, q_m} be the product of two sets, that is, the set
{p_i ⊗ q_j, 1 ≤ i ≤ n, 1 ≤ j ≤ m} having nm elements. Throughout this
chapter, we take this product as the image inside the polynomial ring; that is,
x ⊗ y = y ⊗ x = xy is just the product of two polynomials.
(3) Define P × Q = {pq | p ∈ P, q ∈ Q}. Accordingly, ⟨P ⊗ Q⟩ is the space whose
members are sums of members of ⟨P⟩ × ⟨Q⟩. Since this includes a sum of one
item, we have ⟨P⟩ × ⟨Q⟩ ⊂ ⟨P ⊗ Q⟩.
(4) For repeated products, we use the shorthand notations P^(2) = P ⊗ P and
⟨P⟩^(2) = ⟨P⟩ × ⟨P⟩, and similarly for three or more products.
(5) We write an element of complex projective space P^n using square brackets, for
example, x = [x_0, x_1, …, x_n] ∈ P^n; see § 3.2.
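These set products are easy to experiment with if monomials are encoded by their exponent vectors, since multiplying monomials adds exponents. A small sketch (plain Python; the exponent-tuple encoding is our own choice):

```python
from itertools import product

def tensor(P, Q):
    """Set product P (x) Q of monomial sets encoded as exponent tuples:
    the product of z^a and z^b has exponent a + b, componentwise."""
    return {tuple(ai + bi for ai, bi in zip(a, b)) for a, b in product(P, Q)}

# Two variables: P = {1, x, y}, encoded by the exponents of (x, y).
P = {(0, 0), (1, 0), (0, 1)}
P2 = tensor(P, P)  # the repeated product P^(2)
print(sorted(P2))  # the six monomials of degree <= 2
```

The repeated product {1, x, y}^(2) consists of exactly the six monomials of degree at most two in two variables, matching the count C(2 + 2, 2) = 6 from the footnote on monomial counting below.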
As we shall soon see, in our hierarchy of special structures, Figure 8.1, all but
the top case (general coefficient-parameter structures) have parameters that appear
linearly. This means that the family of systems F(z; q) : C^n × C^m → C^n has the
linearity property
F(z; α q_1 + β q_2) = α F(z; q_1) + β F(z; q_2) for all α, β ∈ C.
The special structures of this chapter all obey this linearity condition because they
are parameterized by coefficients which multiply a basis set of monomials or poly-
nomials.
Since the parameter space, C^m, is linear, we can easily construct a homotopy
that stays in the parameter space while continuing from a start system, F(z; q_1), at
t = 1, to a target system, F(z; q_2), at t = 0, as
H(z, t) := F(z; t q_1 + (1 − t) q_2) = 0.
By Lemma 7.1.2, to solve the system for a given target q_2, we just need the solutions
at almost any starting q_1 ∈ C^m, from which we can follow the real straight-line path
t ∈ (0, 1]. However, in the case of an ab initio homotopy, where we have chosen
q_1 specially so that F(z; q_1) = 0 is easy to solve and where q_2 is a target that may
not be general, we cannot use the lemma: neither endpoint is chosen with complete
freedom in the parameter space.
Since we are going to choose q_1 in a way that guarantees it has the generic
number of nonsingular roots, we can use any of the alternatives mentioned in § 7.1.
Among these, the "gamma trick" of Lemma 7.1.3 leads to the homotopy
H(z, τ) := F(z; (γ τ q_1 + (1 − τ) q_2)/(1 + (γ − 1)τ)) = 0,
where γ ∈ C is chosen randomly and τ ∈ (0, 1]. For nonzero γ not on the negative
real axis and τ ∈ [0, 1], the denominator 1 + (γ − 1)τ ≠ 0. By the linearity of F(z; q)
with respect to q, we can clear the denominator to get
F(z; γ τ q_1 + (1 − τ) q_2) = 0
without changing the solution paths. It can save computation to further rewrite
this as
H(z, τ) := γ τ F(z; q_1) + (1 − τ) F(z; q_2) = 0.
This is sufficiently convenient that we state it formally below. The upshot is that
in the succeeding sections, we may concentrate on finding start systems for each of
the special structures. Any start system in the family will do, as long as it has the
generic number of roots. Recall from the previous chapter that the notation N(q, U, Q)
is the number of nonsingular roots in U of F(z; q) = 0 at the parameter point q ∈ Q.
Theorem 8.3.1 Suppose F(z; q) : C^n × C^m → C^n is polynomial in z and linear
in q, and let f(z) = F(z; q_0) for some given q_0 ∈ C^m. If g(z) = F(z; q*) with
N(q*, U, C^m) = N(U, C^m) for some Zariski open set U ⊂ C^n, then
(1) for almost all γ ∈ S^1, i.e., for all but finitely many complex numbers γ of
absolute value one, the homotopy
h(z, t) := γ t g(z) + (1 − t) f(z) = 0
Proof. This is a consequence of Theorem 7.1.4, Theorem 7.1.6, and Lemma 7.1.3
with rearrangements described above for linearly parameterized families. •
Through use of Theorem 8.3.1, as long as the parameters of the family of systems
appear linearly, all that we need to form a good homotopy is to find one start system
in the family having the generic number of nonsingular roots. Then, by picking γ
at random in C, the homotopy leads to all nonsingular solutions of a target system,
with probability one.
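To see this homotopy in action on the smallest possible example, the sketch below tracks the two paths of h(z, t) = γ t g(z) + (1 − t) f(z) for one quadratic in one variable, using naive stepping in t with a few Newton corrections per step (illustrative only: the fixed γ stands in for a random unit-circle constant, and a real tracker adds prediction, adaptive step control, and an endgame):

```python
import cmath

gamma = cmath.exp(0.7j)  # fixed stand-in for the random unit-circle constant

f  = lambda z: z * z - 3 * z + 2   # target system: roots 1 and 2
df = lambda z: 2 * z - 3
g  = lambda z: z * z - 1           # start system: roots +1 and -1
dg = lambda z: 2 * z

def track(z, steps=200, corrections=3):
    """Follow one path of h(z,t) = gamma*t*g(z) + (1-t)*f(z) = 0 from a root
    of g at t = 1 down to t = 0, with a few Newton corrections per step."""
    for k in range(steps - 1, -1, -1):
        t = k / steps
        for _ in range(corrections):
            h  = gamma * t * g(z) + (1 - t) * f(z)
            dh = gamma * t * dg(z) + (1 - t) * df(z)
            z  = z - h / dh
    return z

ends = sorted((track(z0) for z0 in (1, -1)), key=lambda z: z.real)
print([round(z.real, 6) for z in ends])  # [1.0, 2.0]
```

The two start roots of g flow to the two roots of the target f, exactly the picture guaranteed by Theorem 8.3.1 for almost every choice of γ.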
Let us now jump to the bottom of the hierarchy of Figure 8.1 and work our way
up. Although the lower structures can be justified as special cases of the higher
ones, it is better for building understanding and intuition to start with the simpler
cases. Not surprisingly, for the most part, this follows the historical development
of the subject.
f_i(z) ∈ ⟨{1, z_1, …, z_n}^{d_i}⟩,
where the parameter space is the set of coefficients multiplying the elements of the
vector space.
Since the parameters of F appear linearly, we can apply Theorem 8.3.1, if only
we can find a start system g ∈ F that has the generic number of nonsingular roots
and is easy to solve. We know from the classical Bezout Theorem for systems that
the number of finite, nonsingular solutions to a generic member of the total degree
family is N = d_1 ⋯ d_n. A simple system that achieves this bound is
g(z) = ( z_1^{d_1} − 1, z_2^{d_2} − 1, …, z_n^{d_n} − 1 )^T = 0.    (8.4.2)
We can solve the individual equations independently, obtaining d_i roots for z_i; the
solutions of the system g(z) = 0 are the d\ • • • dn combinations of these. It is easy
to see that all of these roots are nonsingular. So, even though it is very sparse,
g(z) has as many roots as the most general member of the total degree family. We
summarize the net result in the following theorem.
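The start points of Equation 8.4.2 are simply combinations of roots of unity, one d_i-th root per coordinate. A small Python/numpy sketch (the helper name is ours, not HOMLAB's):

```python
import numpy as np
from itertools import product

def total_degree_start_points(degrees):
    """All solutions of g_i(z) = z_i^{d_i} - 1 = 0: choose a d_i-th root of
    unity for each coordinate, giving d_1*...*d_n points in total."""
    axis_roots = [np.exp(2j * np.pi * np.arange(d) / d) for d in degrees]
    return [np.array(c) for c in product(*axis_roots)]

degrees = [2, 3]
pts = total_degree_start_points(degrees)
assert len(pts) == 2 * 3
for z in pts:
    # every point solves g(z) = 0; the Jacobian diag(d_i z_i^{d_i-1}) is
    # diagonal with nonzero entries there, so each root is nonsingular
    assert np.allclose([z[i]**d - 1 for i, d in enumerate(degrees)], 0)
```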
Theorem 8.4.1 For any f(z) in the total degree family and g(z) given by
Equation 8.4.2, the solution paths of

h(z, t) := γt g(z) + (1 − t) f(z) = 0

starting at the solutions of g(z) = 0 are nonsingular for t ∈ (0, 1] and their endpoints
as t → 0 include all of the nonsingular solutions of f(z) = 0 for almost all γ ∈
C, excepting a finite number of one-real-dimensional rays through the origin. In
¹Simple demonstration: a monomial z^a, z = (z_1, …, z_n), |a| ≤ d, can be written as a string of
d + n symbols as z^a = 1 ⋯ 1 × z_1 ⋯ z_1 × ⋯ × z_n ⋯ z_n, where the positions of the n occurrences
of the "×" symbol uniquely specify the monomial. Hence, the choices of n items in a list of n + d
things enumerate the monomials.
particular, restricting γ to the unit circle, γ = e^{iθ}, the exceptions are a finite number
of points θ ∈ [0, 2π].
Proof. Because the family of all polynomial systems with the specified degrees
is a vector space over the coefficients of its monomials, this follows directly from
Theorem 8.3.1 under the condition that g(z) = 0 has the generic nonsingular root
count. The classical Bezout Theorem says that d = ∏_{i=1}^n d_i is the generic root
count for this family, so we are done. □
Remark 8.4.2 The system g(z) from Equation 8.4.2 satisfies the conditions of
the theorem, and so it can be used as the start system of a homotopy to solve
f(z) = 0. There are, however, many viable alternatives. One that is occasionally
useful has g_i(z) a product of d_i generic linear factors. Using the notation of § 8.1,
we may write
g_i(z) ∈ (z_1, …, z_n, 1)^{(d_i)}.
The roots of this start system are found by choosing one factor from each equation
and solving the resulting linear system of equations. If we choose the coefficients
of all the linear factors at random, these linear systems will all be nonsingular with
probability one. Equation 8.4.2 is a special case in which
g_i(z) ∈ (z_i, 1)^{(d_i)}.
Instead of taking the classical Bezout Theorem as given, we can prove it with the
tools at hand. It is instructive to do so, because a slight generalization of the same
argument will apply for multihomogeneous structures in the next section. First, we
rephrase Bezout's Theorem in the current notation.
Theorem 8.4.3 (Projective Bezout) Given positive integers d_1, …, d_n, let
F(z, q) : C^{n+1} × Q → C^n be the family of homogeneous polynomial systems whose ith
function is a member of the vector space ⟨{z_0, z_1, …, z_n}^{d_i}⟩ and whose parameters
Q are the coefficients of this space. Then,

N(P^n, Q) = ∏_{i=1}^n d_i.
Corollary 8.4.4 (Affine Bezout) Given positive integers d_1, …, d_n, let F(z, q) :
C^n × Q → C^n be the family of polynomial systems whose ith function is a member
of the vector space ⟨{1, z_1, …, z_n}^{d_i}⟩ and whose parameters Q are the coefficients
of this space. Then,

N(C^n, Q) = ∏_{i=1}^n d_i.
Polynomial Structures 125
Proof. Let q* ∈ Q be the set of coefficients for the system g(z) = F(z; q*) as

g(z) = ( z_1^{d_1} − z_0^{d_1}, …, z_n^{d_n} − z_0^{d_n} )^T = 0.    (8.4.3)
Remark 8.4.5 We call d = ∏_{i=1}^n d_i the total degree of the system. Thus, we may
say that the number of finite, nonsingular roots of a system of n polynomials on C^n
is less than or equal to its total degree.
Remark 8.4.6 The system of Equation 8.4.3 can be used as the start system in
a homogeneous homotopy to solve n homogeneous polynomials in n + 1 variables,
using the homogeneous analogue of Theorem 8.4.1. In fact, it is very useful to
homogenize a target system and solve it on P^n, so that solution paths that would
diverge to infinity in C^n can be followed to their endpoints at infinity in P^n. See
Chapter 3 for more on this.
The total degree homotopy is easy to implement and very effective for systems
of dense polynomials. However, systems arising in practice often display patterns
of sparsity that result in fewer than the total degree number of roots. The next few
sections move up the hierarchy of Figure 8.1 to capture more of the structure of the
target system.
Clearly, for this system, the two-homogeneous treatment is more restrictive than
the one-homogeneous treatment. The corresponding start system is
g_1(x, y) ∈ (x, 1) × (y, 1),
g_2(x, y) ∈ (x, 1) × (x, 1).    (8.4.10)
A particular instance that is sufficient is

g_1(x, y) = x(y − 1) = 0,
g_2(x, y) = (x − 1)(x + 1) = 0,

which has two solutions (x, y) = (±1, 1). When solving this system, we cannot
choose the first factor x = 0 in the first equation as it is incompatible with either
factor in the second equation. This hints at the general phenomenon that we make
use of in multihomogeneous homotopies.
More formally, the structure used in a multihomogeneous treatment of a system
can be summarized as follows. We have n variables that are partitioned into m
disjoint subsets of size k_1, …, k_m (k_1 + ⋯ + k_m = n); that is, we have z ∈ C^n
written as

z = (z_1, …, z_m),  z_j ∈ C^{k_j}.

Furthermore, in the target system f(z), the degree of the ith polynomial f_i(z) with
respect to the jth set of variables z_j is d_{ij}. This can be written for i = 1, …, n as

f_i(z) ∈ ⟨{1, z_1}^{d_{i1}} ⊗ {1, z_2}^{d_{i2}} ⊗ ⋯ ⊗ {1, z_m}^{d_{im}}⟩.    (8.4.12)
We consider the family of all such systems, parameterized by the coefficients of all
the monomials that appear in this vector space. In the remainder of this section,
f(z) is a particular member of this family and N(C^n, Q) is the root count for
the family, where the parameters, forming space Q, are the coefficients of all the
monomials of the vector space specified by Equation 8.4.12.
As we will justify below, a start system that corresponds to F given by Equa-
tion 8.4.12 is g with

g_i(z) ∈ (z_1, 1)^{(d_{i1})} × ⋯ × (z_m, 1)^{(d_{im})}.    (8.4.13)

That is, g_i is the product of linear factors, with d_{ij} factors in the variables z_j. Let G
be the family of all such systems, having a parameter space Q′ consisting of the cross
product of the parameter spaces for the vector spaces of the factors. Clearly, after
expanding the product and collecting terms, each such g is in the family defined by
Equation 8.4.12, which defines a map φ : Q′ → Q. Let Q_g ⊂ Q denote the image
of Q′ under the map φ. We know that Q_g is irreducible, because Q′ is, so we may
speak of N(U, Q_g), the generic nonsingular root count of the start system family G
as a subfamily of F.
To find a solution of g(z) = 0, choose one factor from each equation and solve
these n linear equations simultaneously. One finds all of the solutions by ranging
over all possible choices of the factors. As we saw in the example of Equation 8.4.10,
some combinations of factors will be incompatible; in fact, we must choose exactly
k_j factors for each group of variables z_j.
There are several ways to count the number of solutions of the start system
g(z) = 0. Let D be the n × m matrix of nonnegative integers with entries d_{ij} and
let K = (k_1, …, k_m). For generic coefficients in the linear factors, we have a generic
root count that depends only on D and K. We'll call this function Bez(D, K) =
N(C^n, Q_g). Let s(K) be a list of length n containing k_j copies of j for j = 1, …, m.
From this, let π(K) be all the distinct permutations of the list s(K), of which there
will be n!/(k_1! ⋯ k_m!). Then, a direct formulation of the combinatoric process
described in the previous paragraph is
Bez(D, K) = Σ_{p ∈ π(K)} ∏_{i=1}^n d_{i, p_i}.    (8.4.14)
An equivalent definition is

Bez(D, K) = coeff( α_1^{k_1} ⋯ α_m^{k_m}, ∏_{i=1}^n (d_{i1} α_1 + ⋯ + d_{im} α_m) ),    (8.4.15)
where coeff(x,p(x)) reads as "the coefficient of monomial x in the polynomial p(x)."
A special case of this formula occurs when m = n, which implies k_j = 1,
j = 1, …, m. Then, D is a square matrix and Bez(D, K) = permanent(D), where
the permanent of a matrix is just like the determinant except all terms are added
without introducing negative signs on the odd permutations. If D has all nonzero
entries, then there are n! terms in the sum. The other extreme is the one-homogeneous
case m = 1, k_1 = n, for which we get one term, the total degree Bez(D, {n}) =
d_1 ⋯ d_n.
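Equation 8.4.14 is straightforward to evaluate by brute force, and the two special cases above give handy cross-checks. A Python sketch (the helper name is ours; deduplicating all n! permutations is wasteful but fine for small n):

```python
from itertools import permutations
from math import prod

def bez(D, K):
    """Multihomogeneous Bezout number, Equation 8.4.14: sum, over the
    distinct permutations p of the multiset s(K) (k_j copies of the group
    index j), of the products d_{1,p_1} * ... * d_{n,p_n}."""
    s = [j for j, k in enumerate(K) for _ in range(k)]
    return sum(prod(D[i][p[i]] for i in range(len(D)))
               for p in set(permutations(s)))

# m = 1: one group of all n variables gives the total degree d_1*...*d_n.
assert bez([[2], [3], [4]], [3]) == 24
# m = n, all k_j = 1: Bez(D, K) equals the permanent of the square matrix D.
assert bez([[1, 2], [3, 4]], [1, 1]) == 1*4 + 2*3
```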
Now, let's justify the use of this start system by proving the following theorem.
Theorem 8.4.7 (Multihomogeneous Bezout Theorem) Let F : C^n × Q →
C^n and G : C^n × Q_g → C^n be the families of systems specified by Equation 8.4.12
and Equation 8.4.13, respectively. Then

N(C^n, Q) = N(C^n, Q_g) = Bez(D, K),

where a formula for Bez(D, K) is given in Equation 8.4.14.
Proof. The proof is essentially the same as the proof of Corollary 8.4.4, except
we use multihomogenizations of F and G to compactify the solution domain. See
A solution to g = 0 must have at least one factor in each equation equal to zero. For
a generic g ∈ G, a choice of k_j factors in the group of variables {w_j, z_{j1}, …, z_{jk_j}}
from k_j different equations determines a unique point in the corresponding P^{k_j},
and a collective choice of one factor from each equation that has k_j factors in each
group of variables for j = 1, …, m gives one nonsingular solution of g = 0 in X.
These are the only possible choices, since any other choice must have more than
k_j factors in some group j and so has only the trivial solution (0, …, 0) ∉ P^{k_j}.
These are the same combinatorics that define Bez(D, K), so we have N(X, Q_g) =
Bez(D, K). Moreover, generically none of the roots are at infinity, and no other
solutions exist. Since the multiprojective space X is compact, by the same argument
used in Corollary 8.4.4, we have for the multihomogenized family F that N(X, Q) =
N(X, Q_g). Since generically none of the roots is at infinity, and since the affine roots
of the original inhomogeneous systems F and G are in one-to-one correspondence
with those of their multihomogenizations, the affine root counts are the same as the
root counts on X. □
particular, restricting γ to the unit circle, γ = e^{iθ}, the exceptions are a finite number
of points θ ∈ [0, 2π].
Proof. Because the family of all polynomial systems with the specified degrees is
linear with respect to the coefficients of its monomials, this follows directly from
Theorem 8.3.1 under the condition that g(z) = 0 has the generic nonsingular root
count. Theorem 8.4.7 establishes that Bez(D, K) is this count. □
f = (λ_1 A + λ_2 B)v = 0.

This becomes a conventional eigenvalue problem for A if we set λ_1 = 1 and B = −I.
The problem consists of n quadratic equations, thus the total degree is 2^n.
Partitioning the variables in the natural way as z_1 = v and z_2 = λ, we have
d_{i1} = d_{i2} = 1; that is, the equations are bilinear. The root count is the coefficient
of α_1^{n−1} α_2 in the polynomial (α_1 + α_2)^n, which is simply n. This agrees with the
well-known result from linear algebra.
A suitable start system has
g_i(v, λ) := (a_i^T v)(b_i^T λ) = 0,  i = 1, …, n,

where a_i ∈ C^n and b_i ∈ C^2 are chosen randomly. For k = 1, …, n, we choose the
second factor in the kth equation to solve for λ and solve the linear system formed
by the first factors from the remaining (n — 1) equations to get v. This gives n start
points.
Notice that the equations are all two-homogenized from the outset. To treat
these numerically, we may dehomogenize by appending a random, inhomogeneous
linear equation for v and one for A. This amounts to choosing a random patch
C"" 1 x C 1 on P"" 1 x P.
with, as usual, the parameters being the coefficients of the vector space.
For such a family, a sufficient family of start systems G has for the ith equation
N(U, Q) = N(U, Q_g).
Corollary 8.4.13 For any f(z) in the family defined by Equation 8.4.16 and a
generic g(z) from the family defined by Equation 8.4.17, the solution paths of

h(z, t) := γt g(z) + (1 − t) f(z) = 0

starting at the nonsingular roots of g(z) = 0 are nonsingular for t ∈ (0, 1] and their
endpoints at t = 0 include all the nonsingular roots of f(z) = 0, for all γ ∈ C
excepting a finite number of one-real-dimensional rays through the origin.
Proof. This is the usual application of Theorem 8.3.1 in light of Theorem 8.4.12.
•
The theorem and its corollary are quite simple to apply. Consider the two-variable
system f(x, y) = 0 of Equation 8.4.18.
We see that f_1 ∈ ⟨{x, y} ⊗ {1, x, y}⟩ and f_2 ∈ ⟨{x, y}^2 ⊗ {1, y}⟩, so we pick g_1 ∈
(x, y) × (1, x, y) and g_2 ∈ (x, y)^2 × (1, y). A particular example is

g(x, y) = ( (x + y)(1 + x + y), (x − y)(x + 2y)(1 + y) )^T = 0.    (8.4.19)
Although the total degree of g is 6, it has only 4 nonsingular roots, since (0, 0) is a
double root. Although we chose very simple coefficients, it is easy to see that this
is true for generic coefficients. Hence f has at most 4 nonsingular roots on C^2. We
give a more substantial example in the case studies below (see § 9.3).
It is easy to build a computer program that takes advantage of linear-product
homotopies, if we rely on the user to identify the product structure. Then, the
program forms a start system consisting of linear factors with coefficients picked by
a random number generator. This gives a system that is generic with probability
one. The program cycles through the various combinations of choosing one factor
from each equation and, if the resulting linear subsystem is full rank, its solution
is determined. This potential start point is a solution of the start system, but it
is a true start point of the homotopy only if it is nonsingular and it is in the set
U. We can check for singularity by numerically evaluating the condition number
of the Jacobian matrix of partial derivatives at the point. Assume U is defined
explicitly as the complement of the solution set of a given polynomial system, say
U = C^n \ s^{−1}(0), where s : C^n → C^m gives "side conditions" as in § 7.5. We need
only evaluate s at the potential start point to determine membership in U.
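Such a program is only a few lines in outline. In the sketch below (Python/numpy; the names and the condition-number threshold are our own choices, not HOMLAB's), each linear factor is stored as a coefficient vector c representing c[0] + c[1]z_1 + ⋯ + c[n]z_n, and we use the linear-product structure of Equation 8.4.19:

```python
import numpy as np
from itertools import product

def lp_start_points(factor_supports, nvars, rng, cond_max=1e8):
    """Solve a linear-product start system by cycling over all choices of
    one factor per equation.  factor_supports[i] lists, for equation i, the
    support of each linear factor as index tuples (0 = the constant 1,
    k = variable z_k); random coefficients are drawn on each support."""
    factors = []
    for eq in factor_supports:
        rows = []
        for support in eq:
            c = np.zeros(nvars + 1, dtype=complex)
            c[list(support)] = (rng.standard_normal(len(support))
                                + 1j * rng.standard_normal(len(support)))
            rows.append(c)                     # c[0] + sum_k c[k]*z_k
        factors.append(rows)
    pts = []
    for choice in product(*factors):           # one factor from each equation
        M = np.array([c[1:] for c in choice])
        b = -np.array([c[0] for c in choice])
        if np.linalg.cond(M) < cond_max:       # full rank => unique solution
            pts.append(np.linalg.solve(M, b))
    return pts

# Structure of Equation 8.4.19: g1 in (x,y) x (1,x,y), g2 in (x,y)^2 x (1,y).
rng = np.random.default_rng(3)
pts = lp_start_points([[(1, 2), (0, 1, 2)], [(1, 2), (1, 2), (0, 2)]], 2, rng)
# 2*3 = 6 combinations, of which two give (0,0) (a singular root), so only
# 4 of the potential start points are true, nonsingular start points.
```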
It is worth considering how a potential start point may fail to be a true start
point. It is singular if and only if it is a solution of two of the linear subsystems
formed by choosing one factor from each equation. Because the coefficients of
the linear factors are chosen randomly, this can only happen when a subset of the
variables is zero due to the lack of an inhomogeneous term in a corresponding subset
of factors. This is what happens, for example, in Equation 8.4.19, where (0,0)
appears twice. In the absence of this phenomenon, there is a zero probability that
the solution satisfies more than one linear subsystem, and it is therefore nonsingular.
Similarly, the only way a potential start point can fail to be in U is if it has a pattern
of zero entries that makes it satisfy s independent of the values of the nonzero entries.
In particular, one may choose to work on (C*)^n, in which case any solutions to g that
have one or more variables equal to zero can be cast aside. To understand these
statements better, see the proof of Theorem 8.4.14 below. For an inhomogeneous
linear factor, the base locus of (1, z_1, …, z_k) is empty, while for a homogeneous linear
factor, the base locus of (z_1, …, z_k) is the linear subspace z_1 = z_2 = ⋯ = z_k = 0.
Clearly, linear product structures include multihomogeneous structures which
include total degree structures. In HOMLAB, the Matlab code distributed for use
with this text (see Appendix C), the general-purpose code uses linear products.
The drivers for multihomogeneous and total degree homotopies construct equivalent
linear-product structures and then proceed as in the general linear-product case.
f_1, f_2 ∈ ⟨x, y, x^2 y, x y^2⟩.
These are cubics, so the total degree is 9. The two-homogeneous Bezout number is
the coefficient of αβ in (2α + 2β)^2, which is 8. The best linear-product structure
that contains the given monomials is ⟨{x, y} ⊗ {1, x} ⊗ {1, y}⟩. If we work on (C*)^2,
this structure has 6 roots. But the equations obey the following monomial product
structure
f_1, f_2 ∈ ⟨{1, xy} ⊗ {x, y}⟩.
This structure gives the same root count as the factored system
the monomial product theory does not apply, but the convex polytope approach
gives the same root count of 4, because xy is inside the "convex hull" of the other
monomials. Still, despite its limitations, it may occasionally be useful to analyze a
small system by monomial products. It also serves as a stepping stone to our final
product structure: polynomial products.
or what is equivalent,
Each vector space ⟨S_{ij}⟩ has its coefficients in C^{k_{ij}}, so the entire family of start systems
G of the form of Equation 8.4.22 has a parameterization as the cross product of all
of these Euclidean spaces, which is therefore just a big Euclidean space. But since
every g(z) ∈ G is also in F, we can cast G as a subfamily of F having parameter
space Q_g ⊂ Q. Clearly, Q_g is connected, because it is the image of a Euclidean
space under the map defined by expanding the product and collecting terms.
Accordingly, G(z; q) is just F(z; q) restricted to C^n × Q_g, where Q_g is the set of
systems in F that factor as in Equation 8.4.22.
The sufficiency of g(z) € G as a start system for any f(z) G F is established by
the following theorem.
N(U, Q) = N(U, Q_g).
In other words, the number of nonsingular roots in U for a generic start system,
one that factors in the specified way, is the same as the generic nonsingular root
count of the whole family. Such a start system g(z) is much easier to solve than
a general system in the family, because g_i(z) = 0 implies that at least one of
g_{ij}(z) = 0, j = 1, …, m_i.
Our earlier proofs of Corollary 8.4.4 and Theorem 8.4.7 hinged on showing that
the start system had no singular solutions and no solutions at infinity. The question
of excluding roots that satisfy some side conditions, that is, the limiting of the root
count to some Zariski open subset U, did not arise, because those start systems
will not generically have roots on any given quasiprojective set. In the case of
polynomial-product structures, a generic system g ∈ G may have singular solutions,
solutions at infinity (in some multihomogenization of C^n), or solutions on some
quasiprojective set. The inclusion of U in the theorem strengthens the result (as
compared to using just C^n in its place), because it will allow us to drop solutions
that generically lie on some quasiprojective set that we wish to ignore. So while
these possibilities give the formulation extra power to eliminate solution paths in
the homotopy, we must pay for them with a more difficult proof. In particular, we
must argue in more detail that in a continuation from a generic member of F to
a generic member of G, none of the nonsingular, finite solution paths end at such
degeneracies. The proof is a little long by the standards of this chapter, but we
attempt to keep the arguments elementary. This sacrifices some rigor and elegance,
but hopefully it grants the reader an easier grasp of the essential facts.
In the linear-product example of Equation 8.4.18 with start systems like Equa-
tion 8.4.19, we already saw an example of a singular solution to the start system
which also happens to lie on the affine algebraic set x = 0. These conditions persist
generically for the entire family of start systems for that example.
We pause here a moment to emphasize that the theorem can be readily applied
without understanding its proof. In fact, we will give only a sketch of a proof here,
as a rigorous one requires the language of line bundles and sheaves. The reader
who is versed in these technicalities may wish to consult (Morgan, Sommese, &
Wampler, 1995) for a better proof. The proof sketch below may be useful as a
guide to understanding the rigorous proof. On that note, some readers may wish
to skip to the end of the proof now.
This completes our tour of the product structures in Figure 8.1. In the next sec-
tion, we consider a different generalization of monomial products, using monomial
polytopes, which respect product structures but also take advantage of monomial
sparsity that is not captured by any breakdown into products.
A natural way to specify a family of polynomial systems is just to list the mono-
mials that may appear in each polynomial. The family is parameterized by the
coefficients. By the general coefficient-parameter theory, it is clear that there is a
root count associated to such a family. A remarkable theorem, repeated below, due
to Bernstein (Bernstein, 1975) tells how the root count depends on the pattern of
the monomials. Since the family is linear in its coefficients, we can use the homo-
topy of Theorem 8.3.1 to solve problems in the family, if only we can solve a start
system having the generic number of roots. Several methods for formulating and
solving such systems have been invented. We describe here only the basics, so that
the reader can appreciate the methodology, but due to the highly technical nature
of efficient combinatorial formulations, we defer to references for the details. After
reading this section, one might next wish to consult the review article (Li, 2003).
Before we can state Bernstein's theorem, we need a few definitions.
f_i(x; c_i) = Σ_{a ∈ S_i} c_{i,a} x^a,
where S_i ⊂ Z^n is the set of exponent vectors appearing in the monomials,
#(S_i) = m_i is the number of monomials, and c_{i,a} ∈ C is the coefficient for the
Laurent monomial x^a. The qualifier "Laurent" acknowledges that we allow nega-
tive exponents, which are disallowed in our usual definition of polynomials. The
set S_i is called the "support" of f_i, and its convex hull Q_i = conv(S_i) in R^n is its
"Newton polytope."² The polynomial family

f(x; c) = f(x; c_1, …, c_n) = {f_1(x; c_1), …, f_n(x; c_n)}

is parameterized by m_1 + m_2 + ⋯ + m_n coefficients for the support S = {S_1, …, S_n}.
When working on (C*)^n, multiplication of any equation by a monomial does
not change the root count, as the zero set of x^a p(x) = 0 is just the union of the
zero set of p(x) = 0 and the zero set of x^a = 0, the latter having no points that
²A convex polytope is a bounded region of n-dimensional real space enclosed by hyperplanes.
"Polytope" is to n dimensions as "polyhedron" is to three dimensions.
We say that the mixed volume, M(S_1, …, S_n), of supports S_1, …, S_n is the mixed
volume of their convex hulls.
This result is variously called the "Bernstein count," the "BKK bound" (a term
coined in (Canny & Rojas, 1991) in recognition of the contributions of (Kushnirenko,
1976) and (Khovanski, 1978)), the "polyhedral root count," or the "polytope root
count." We adopt the last convention as the most descriptive and precise.
If all the exponents are nonnegative, that is, if the system is polynomial in the
usual sense, there is a well-defined root count on C^n, which may be higher than the
polytope root count on (C*)^n. The count in C^n can be determined by the procedure
in (Li & Wang, 1996) with further refinements in (Huber & Sturmfels, 1997).
c_{11} = p(1, 1) − p(1, 0) − p(0, 1),

or in other words,

M(Q_1, …, Q_n) = Σ_{i=1}^n (−1)^{n−i} Σ_{σ ∈ C_i^n} vol_n( Σ_{j ∈ σ} Q_j ),    (8.5.26)

where the inner sum is a Minkowski sum of polytopes and C_i^n are the combinations
of n things taken i at a time.
It is instructive to see how the mixed volume relates to Bezout's theorem. Sup-
pose f_1(x, y) and f_2(x, y) are general polynomials of degree d_1 and d_2, respectively.
This implies that their support polytopes Q_1 and Q_2 are isosceles right triangles of
size d_1 and d_2, shown in Figure 8.2, and the Minkowski sum Q_1 + Q_2 is another
such triangle of size d_1 + d_2. Accordingly, by Equation 8.5.24, the root count is

M(Q_1, Q_2) = (d_1 + d_2)^2/2 − d_1^2/2 − d_2^2/2 = d_1 d_2,

which is, of course, the same result as given by Bezout's Theorem. The subtrac-
tion of areas is shown graphically in the drawing of Q_1 + Q_2. Alternatively, we
can visualize the definition of the mixed volume directly by drawing a picture of
λ_1 Q_1 + λ_2 Q_2, as shown at the right side of Figure 8.2. Only the area of the shaded
parallelogram scales as λ_1 λ_2, whereas the triangles scale as λ_1^2 and λ_2^2.
Fig. 8.2 Mixed volume for two polynomials of degree d_1 and d_2.
In a similar fashion, one may easily see that the mixed volume for two equations
having bidegrees (d_{1x}, d_{1y}) and (d_{2x}, d_{2y}) is d_{1x} d_{2y} + d_{1y} d_{2x}, in agreement with the
two-homogeneous Bezout count. Figure 8.3 shows this in a self-explanatory way.
Fig. 8.3 Mixed volume for polynomials with bidegrees (d_{1x}, d_{1y}) and (d_{2x}, d_{2y}).
Although the preceding examples only examine the two-variable case, the mixed
volume does in fact generalize the multihomogeneous Bezout count in any dimen-
sion. This relationship is pursued further in one of the exercises at the end of
the chapter.
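In two variables, Equation 8.5.26 reduces to M(Q_1, Q_2) = vol(Q_1 + Q_2) − vol(Q_1) − vol(Q_2), which can be computed directly from the supports. The stdlib-only Python sketch below (helper names ours) reproduces both of the counts just discussed:

```python
from itertools import product

def _cross(o, a, b):
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def hull(points):
    """Convex hull in 2-D (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def chain(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and _cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out[:-1]
    return chain(pts) + chain(reversed(pts))

def area(points):
    """Shoelace area of the convex hull of a point set."""
    h = hull(points)
    return abs(sum(h[i][0]*h[(i+1) % len(h)][1] - h[(i+1) % len(h)][0]*h[i][1]
                   for i in range(len(h)))) / 2

def mixed_area(S1, S2):
    """2-D mixed volume: vol(Q1 + Q2) - vol(Q1) - vol(Q2) (Equation 8.5.26)."""
    return area([(a[0]+b[0], a[1]+b[1]) for a, b in product(S1, S2)]) \
           - area(S1) - area(S2)

dense = lambda d: [(i, j) for i in range(d+1) for j in range(d+1-i)]    # degree d
box = lambda dx, dy: [(i, j) for i in range(dx+1) for j in range(dy+1)] # bidegree

assert mixed_area(dense(2), dense(3)) == 2 * 3          # Bezout: d1*d2
assert mixed_area(box(1, 2), box(2, 1)) == 1*1 + 2*2    # d1x*d2y + d1y*d2x
```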
Any linear product structure is exactly captured by the polytope root count,
as a linear product is just another way of saying that the monomials appear in a
certain pattern. There are, however, more general patterns that are captured by the
polytope formulation but not by any linear product formulation. Systems having
such patterns are said to be "sparse," because some of the monomials which could
appear in a total degree formulation are missing. Many of the problems that arise
in applications have such sparseness.
These diagrams hint at the main idea that underlies efficient algorithms for
computing the mixed volume. In each of the drawings of Q_1 + Q_2, notice that the
gray cells, whose areas sum to the mixed volume, are parallelograms having one
edge in common with Q_1 and one edge in common with Q_2. These are known as
the "mixed cells" in a "mixed subdivision" of Q_1 + Q_2. Mixed subdivisions are not
unique, as we show in Figure 8.5. It is only required to find one.
One approach to finding subdivisions for the mixed volume calculation is based
on "liftings." A lifting algorithm augments each polytope by adding an (n + l)th
coordinate axis and assigning a value using a lifting function. That is, a point a ∈ Q_i,
corresponding to monomial x^a in f_i(x), is lifted to (a, ω_i(a)), where the lifting
function, ω_i : Z^n → R, for the ith polytope assigns a lift value to each exponent
vector. If these assignments are chosen at random, the following procedure gives a
valid subdivision with probability one. Let Q′_i be the (n + 1)-dimensional polytope
derived from Q_i using ω_i. Then, one forms the Minkowski sum Q′_1 + ⋯ + Q′_n and
finds the lower convex hull. The projection of the edges of this lower hull onto
the original n coordinates gives a mixed subdivision, from which the mixed cells
can be readily identified and their volumes computed. In fact, for efficiency, one
avoids forming the convex hull of the Minkowski sum and instead searches for the
mixed cells directly. See (Gao & Li, 2000, 2003; Li, 2003; Li & Li, 2001; Huber &
Sturmfels, 1995; Verschelde, Gatermann, & Cools, 1996). In (Huber & Sturmfels,
1995), it is also shown how to take advantage of several of the equations having the
same support.
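For two polynomials in two variables, the lower-hull search can be prototyped by brute force: test every pair of edges, one per lifted support, solve for a common inner normal (γ_1, γ_2, 1), and keep the pair when that normal minimizes both lifted supports. The Python sketch below (our own names) uses supports and liftings consistent with the example of § 8.5.5 and recovers its two mixed cells:

```python
from itertools import combinations, product

def mixed_cells_2d(S1, w1, S2, w2, tol=1e-9):
    """Brute-force mixed-cell search: for each pair of candidate edges
    (a,a2) of S1 and (b,b2) of S2, solve for the inner normal (g1, g2, 1)
    that makes each edge level in its own lifted support, then keep the
    pair if that normal minimizes both lifted supports (i.e. both edges
    lie on the lower hulls with a common normal)."""
    cells = []
    for (a, a2), (b, b2) in product(combinations(S1, 2), combinations(S2, 2)):
        # <g,a> + w1(a) = <g,a2> + w1(a2), and likewise for (b, b2)
        m = [[a[0]-a2[0], a[1]-a2[1]], [b[0]-b2[0], b[1]-b2[1]]]
        r = [w1(a2) - w1(a), w2(b2) - w2(b)]
        det = m[0][0]*m[1][1] - m[0][1]*m[1][0]
        if abs(det) < tol:
            continue
        g1 = (r[0]*m[1][1] - r[1]*m[0][1]) / det
        g2 = (m[0][0]*r[1] - m[1][0]*r[0]) / det
        lift1 = lambda p: g1*p[0] + g2*p[1] + w1(p)
        lift2 = lambda p: g1*p[0] + g2*p[1] + w2(p)
        if (all(lift1(p) >= lift1(a) - tol for p in S1)
                and all(lift2(p) >= lift2(b) - tol for p in S2)):
            # cell volume = |det| of the two edge vectors
            cells.append(((a, a2), (b, b2), (g1, g2), abs(det)))
    return cells

# Supports and liftings matching the example of Section 8.5.5:
S1 = [(0, 0), (1, 0), (2, 2)]          # monomials 1, x, x^2 y^2
S2 = [(0, 0), (1, 0), (0, 1), (1, 1)]  # monomials 1, x, y, x y
cells = mixed_cells_2d(S1, lambda a: 0, S2, lambda a: a[0] + a[1])
```

Two cells come out, with inner normals (1, −1) and (−1, 1/2), and their areas 2 + 2 add up to the mixed volume 4.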
has a number of solutions equal to the volume of that cell. These solutions can be
found by elementary means. Altogether, the paths emanating from the mixed cells
give the full set of solutions to G(x) = 0, whose number totals to the mixed volume
of the system.
In principle, the homotopy could go directly to the target system F(x) = 0,
using the coefficients and monomials of F in Equation 8.5.29 instead of those of
G. In practice it is advisable to use the two-stage procedure, solving G and then
progressing to F. This is because target systems are often not generic in the family
defined by their support (that is, the coefficients may satisfy a degeneracy condition)
and this may cause the standard algorithm for solving H(x, 0) to fail.
8.5.5 Example
Rather than delve any deeper into the technicalities, let us simply show the workings
on the example of Equations (8.5.27) and (8.5.28). A choice of lifting functions as
ω_1 = 0 and ω_2(a) = (1, 1) · a yields the subdivision shown in Figure 8.4. To see
this, note that the Newton polytopes of the supports of the polynomials are
The lower hull of Q′_1 + Q′_2 has the faces shown in the figure with vertices
Using these liftings, the homotopy of Equation 8.5.29 applied to this example be-
comes
H(x, y, t) = ( h_1(x, y, t), h_2(x, y, t) )^T = ( 1 + ax + b x^2 y^2, 1 + cxt + dyt + exy t^2 )^T = 0.    (8.5.30)
The solution paths of H(x, y, t) = 0 are intimately related to the two mixed cells,
labeled A and B in the figure. It can be shown³ (Lemma 3.1, Huber & Sturmfels,
1995) that H(x, y, t) = 0 only has branches of the form

x(t) = x_0 t^{γ_1} (1 + O(t)),  y(t) = y_0 t^{γ_2} (1 + O(t)),    (8.5.31)

where the exponents (γ_1, γ_2) associated to a mixed cell are found as follows: one must
check that the inner product of (γ_1, γ_2, 1) with the lifted vertices takes a minimal
value of 0 on the cell. In the case at hand, this means that
0 = 1 + b x_0^2 y_0^2,
0 = 1 + d y_0.

These give two solutions

(x_0, y_0) = (± i d/√b, −1/d).
For each of these, we may use Equation 8.5.31 to predict the values of x(t),y(t)
for small t and then commence path tracking on the homotopy of Equation 8.5.30 to
t = 1.
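This cell-A computation is easy to verify numerically; a quick numpy check with randomly chosen coefficients b and d:

```python
import numpy as np

rng = np.random.default_rng(1)
b, d = rng.standard_normal(2) + 1j * rng.standard_normal(2)
# binomial start system for mixed cell A: 1 + b*x0^2*y0^2 = 0, 1 + d*y0 = 0
y0 = -1 / d
x0s = [1j * d / np.sqrt(b), -1j * d / np.sqrt(b)]   # since x0^2 = -d^2/b
for x0 in x0s:
    assert abs(1 + b * x0**2 * y0**2) < 1e-12
    assert abs(1 + d * y0) < 1e-12
```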
In similar fashion, the mixed cell B in Figure 8.4 is generated by the monomials
x, x^2 y^2 from f_1 and 1, x from f_2. This time the inner normal is (γ_1, γ_2, 1) =
(−1, 1/2, 1), so we get
which gives
Table 8.1 Various root counts for the toy example, Equation 8.6.33
Let us review by studying a "toy" example for which each product structure gives
a different root count.
Consider a system of four equations, each of the form
We have four variables z = (z_1, z_2, z_3, z_4) and forty parameters q_{ij}, i = 1, …, 4,
j = 1 , . . . , 10. Table 8.1 gives the root counts for various embeddings of the system.
• The combinatorics for the linear-product embedding follows those for the two-
homogeneous case, but the simultaneous choice of two factors (z_1, z_2) gives a
solution with z_1 = z_2 = 0, and two choices of the form (z_3, z_4) yield a similar
result. Thus, we get a smaller root count when working on (C*)^4 of 3 · 3 · (4 choose 2) = 54.
• The monomial product root count and the polytope root count are the same for
this system. Evaluation of the mixed volume by computer yields the count of 26.
• The polytope root count does not account for the fact that z_1 z_4 and z_2 z_3 do not
appear independently in the factors. The polynomial product structure cap-
tures this fact, and as a result the root count decreases to 6. To determine this
count, one must consider the 2^4 ways to choose one factor from each equation
in the corresponding start system:

g_i ∈ (z_1 z_4 − z_2 z_3, z_1, z_2) × (z_1 z_4 − z_2 z_3, z_3, z_4).

It turns out that only choices with two of each kind of factor give roots in (C*)^4.
Each of these (4 choose 2) = 6 combinations gives a single root.
Although for this example polynomial products give a lower count than the
polytope root count, that is not necessarily true in general. It depends on whether
the equations admit a favorable polynomial product. Often the polytope root count
is lowest. Other than that, the ordering in the table is fixed, as the structures lower
in the table are generalizations of those higher in the table, as indicated at the
outset of the chapter in Figure 8.1.
8.7 Exercises
The next chapter on case studies contains more challenging exercises connected to
applications. For now, the exercises are simpler, illustrative problems.
Exercise 8.1 (Warm Up) Use HOMLAB to solve the system 8.4.4 using
• a total-degree homotopy (see routine totdtab), and
• a two-homogeneous homotopy (see routine mhomtab).
Exercise 8.2 (Linear Products) Consider the system of Equation 8.4.18. What is its total degree, its two-homogeneous Bezout count, and its best linear-product root count? Solve it all three ways using the HOMLAB routines totdtab, mhomtab, and lpdtab.
Exercise 8.3 (Generalized Eigenvalues) Create a straight-line program for the
generalized eigenvalue problem
(λ_1 A + λ_2 B)v = 0,
where A and B are n × n matrices, [λ_1, λ_2] ∈ P^1, and v ∈ P^{n−1}. Solve a randomly generated example using a two-homogeneous homotopy with n paths. (Use routine lpdsolve.) Compare your result to the qz algorithm in Matlab.
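Although the exercise calls for HOMLAB and Matlab's qz, the generalized eigenvalue computation can be sketched in Python with numpy; this is an illustrative stand-in (not the book's routine), assuming a generic, invertible B:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Dehomogenize [lambda1, lambda2] in P^1 as (1, -mu): the pencil
# (lambda1*A + lambda2*B) v = 0 becomes A v = mu * B v, and for a
# generic (invertible) B we can use the ordinary eigenproblem of B^-1 A.
mu, V = np.linalg.eig(np.linalg.solve(B, A))
assert len(mu) == n   # n roots, matching the n-path homotopy

# Residual check: (A - mu_k B) v_k should be near zero for each pair.
res = [np.linalg.norm(A @ V[:, k] - mu[k] * (B @ V[:, k])) for k in range(n)]
print(max(res) < 1e-8)
```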
Exercise 8.6 (Circle Tangents) A circle of radius r and center (a, b) has the equation f(x, y) := (x − a)^2 + (y − b)^2 − r^2 = 0. The condition for a line through (x, y) and point (c, d) to be tangent to the circle is g(x, y) := (x − a)(x − c) + (y − b)(y − d) = 0.
(1) Assume r, a, b, c, d are given. Find the points of the circle where it is touched by
the tangents through (c, d). Do so by solving the system {/ = 0, g = 0}, then try
again by solving {/ = 0, f — g = 0}. Is there a difference in the number of paths
for a total degree homotopy? How about for a two-homogeneous homotopy?
(2) Assume two circles are given. Find the point pairs where a line is simultaneously tangent to both circles. Use the same trick as in item 1 to reduce the number of homotopy paths.
(3) Show that with the change of variables (z, z̄) := (x + iy, x − iy) and judicious linear combinations of the equations, the simultaneous tangents to two circles can be found with a system having total degree 8, linear-product root count 6, and polytope root count 4.
(4) What happens if the two circles are tangent to each other?
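Item 1's trick is visible in closed form: f − g = (x − a)(c − a) + (y − b)(d − b) − r^2 is linear, so the system {f = 0, f − g = 0} is just a line-circle intersection. An illustrative Python check with made-up values:

```python
import numpy as np

# Circle of radius r centered at (a, b); external point (c, d).
a, b, r = 0.0, 0.0, 1.0
c, d = 3.0, 4.0

# f - g = (x - a)(c - a) + (y - b)(d - b) - r^2 is linear, so
# parametrize the circle and solve R*cos(phi - psi) = r for the two
# tangency angles.
R = np.hypot(c - a, d - b)
psi = np.arctan2(d - b, c - a)
phi = np.array([psi + np.arccos(r / R), psi - np.arccos(r / R)])
pts = np.stack([a + r * np.cos(phi), b + r * np.sin(phi)], axis=1)

# Check the original tangency condition g(x, y) = 0 at both points.
for x, y in pts:
    g = (x - a) * (x - c) + (y - b) * (y - d)
    assert abs(g) < 1e-12
print(np.round(pts, 4))
```

Notice that a total-degree homotopy on {f = 0, g = 0} would track 4 paths, while {f = 0, f − g = 0} tracks only 2, which is the point of the exercise.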
Chapter 9
Case Studies
P_i(x_1, ..., x_N) = sum_{j_1=0}^{s_1} ··· sum_{j_N=0}^{s_N} a^{(i)}_{j_1 ··· j_N} x_{1 j_1} x_{2 j_2} ··· x_{N j_N}.   (9.1.1)
Notice that this is multilinear in the players' mixed strategies. Equilibrium occurs
for player A if, while holding B and C's mixed strategies fixed, every pure strategy
for A returns the same payoff. Otherwise, A would be motivated to bet more heavily
on the higher-paying pure strategy. Let e_k be a pure bet on the k-th strategy:
e_0 = (1, 0, ..., 0), e_1 = (0, 1, 0, ..., 0), etc. Then, a Nash equilibrium occurs when,
for i = 1, ..., N and k = 1, ..., s_i,
were such that more monomials vanish from the equations, such as may happen
when payoffs for two pure strategies are equal, then the polyhedral method could
provide a lower root count. For small systems, a multihomogeneous root count
can be done by hand while a general multihomogeneous routine for larger systems
remains a simple and efficient alternative to polyhedral approaches.
Let us take, for example, the case of N = 3 players, with players 1 and 2 having
s_1 + 1 = s_2 + 1 = 3 pure strategies each, and player 3 having just s_3 + 1 = 2 pure strategies,
so (s_1, s_2, s_3) = (2, 2, 1). By Equation 8.4.15, the multihomogeneous root count is
B = coeff(a^2 b^2 c^1, (b + c)^2 (a + c)^2 (a + b)^1)
  = coeff(a^2 b^2 c, b^2 (a + c)^2 (a + b)) + coeff(a^2 b^2, 2b (a + c)^2 (a + b))
  = coeff(a^2 c, (a + c)^2 (a + b)) + coeff(a^2 b, 2 (a + c)^2 (a + b)) = 2 + 2 = 4.
The explanation of the first line is that the exponents in a^2 b^2 c^1 match the dimensions of the space, P^{s_1} × P^{s_2} × P^{s_3}, on which we work, while those in the polynomial (b + c)^2 (a + c)^2 (a + b)^1 match the number of equations of each type, which are also s_1, s_2, s_3 by Equation 9.1.2. The factor (b + c)^2 says that the two equilibrium equations for player 1 do not involve player 1's bets, while the bets of players 2 and 3 appear linearly; similar factors come from the other two players' equilibrium conditions. It is clear, we hope, from this example how to generalize to other N and s_i.
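The coefficient extraction above can be verified mechanically with a tiny polynomial helper; this is an illustrative Python sketch (our own helper, not part of HOMLAB):

```python
from collections import defaultdict

def mul(p, q):
    """Multiply polynomials stored as {exponent-tuple: coefficient}."""
    r = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            r[tuple(x + y for x, y in zip(e1, e2))] += c1 * c2
    return dict(r)

def add(p, q):
    r = defaultdict(int, p)
    for e, c in q.items():
        r[e] += c
    return dict(r)

# The variables a, b, c as exponent vectors.
a, b, c = {(1, 0, 0): 1}, {(0, 1, 0): 1}, {(0, 0, 1): 1}

# Expand (b + c)^2 (a + c)^2 (a + b).
poly = {(0, 0, 0): 1}
for factor in (add(b, c), add(b, c), add(a, c), add(a, c), add(a, b)):
    poly = mul(poly, factor)

# The Bezout number is the coefficient of a^2 b^2 c^1.
print(poly[(2, 2, 1)])  # -> 4
```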
Another way to arrive at the same result is to examine the linear-product start system. For the (s_1, s_2, s_3) = (2, 2, 1) game, the 3-homogeneous start system is

<x_2> × <x_3>   }  player 1 equilibrium (two equations)
<x_1> × <x_3>   }  player 2 equilibrium (two equations)    (9.1.3)
<x_1> × <x_2>   }  player 3 equilibrium (one equation).
Among the 2^5 = 32 ways to choose one factor from each equation, we are limited to 2 choices each in x_1 and x_2 and only one choice for x_3. Making the choice for x_3 first, which can only be done 4 ways, one may see that all the other choices are forced. The valid choices, listing the factor taken from each of the five equations in order, are

(<x_3>, <x_2>, <x_1>, <x_1>, <x_2>),   (<x_2>, <x_3>, <x_1>, <x_1>, <x_2>),
(<x_2>, <x_2>, <x_3>, <x_1>, <x_1>),   (<x_2>, <x_2>, <x_1>, <x_3>, <x_1>).    (9.1.4)
The disparity between the multihomogeneous root count and the total degree, here 4 and 2^5 = 32, respectively, grows rapidly with the size of the problem. For example, for N = 4 players having 4 pure strategies each, the 4-homogeneous Bezout count is 13,833, while the total degree is 3^12 = 531,441.
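The count of 4 valid selections can also be confirmed by brute force; an illustrative Python sketch, where the labels x1, x2, x3 mark which player's variable group a factor constrains:

```python
from itertools import product

# Linear-product start system for the (s1, s2, s3) = (2, 2, 1) game:
# two player-1 equations with factors <x2> x <x3>, two player-2
# equations <x1> x <x3>, and one player-3 equation <x1> x <x2>.
factor_sets = [("x2", "x3"), ("x2", "x3"),
               ("x1", "x3"), ("x1", "x3"),
               ("x1", "x2")]

# A selection of one factor per equation yields a start point only if
# no player i is hit by more than s_i factors (since x_i lives in P^{s_i}).
cap = {"x1": 2, "x2": 2, "x3": 1}
valid = [sel for sel in product(*factor_sets)
         if all(sel.count(v) <= cap[v] for v in cap)]
print(len(valid))  # -> 4, matching Equation 9.1.4
```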
H2O ⇌ 2H + O,
gives rise to an equilibrium reaction equation governing the balance between the
constituents on the two sides, in this case
k X_H2O = X_H^2 X_O,
where k is an equilibrium constant that depends on temperature. (Equilibrium
constants for many reactions are available in standard tables, typically derived
from laboratory experiments.) To go with this reaction equation, the conservation
equations would be
2XH2O + XH = TH,
XH2O + XO = To,
where TH and To stand for the total amount of hydrogen and oxygen in the vessel.
Notice that the coefficient of 2 on XH2O in the conservation equation for hydrogen
comes from the fact that each water molecule has two hydrogen atoms. The conser-
vation equations are always linear, and the reaction equations are polynomial. The
three equations just given determine the equilibrium balance between water, hydrogen, and oxygen in a simple model that ignores molecular hydrogen and oxygen, H2 and O2.
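To make the toy water model concrete, here is a minimal Newton iteration in Python. The values k = 1, T_H = 3, T_O = 2 are illustrative (not physical data), chosen so the solution is X_H2O = X_H = X_O = 1; real problems would be solved by continuation, as discussed in this chapter.

```python
import numpy as np

# Toy equilibrium for H2O <-> 2H + O:
#   k*Xw - Xh^2*Xo = 0,  2*Xw + Xh - T_H = 0,  Xw + Xo - T_O = 0.
k, T_H, T_O = 1.0, 3.0, 2.0

def F(x):
    w, h, o = x
    return np.array([k * w - h**2 * o, 2 * w + h - T_H, w + o - T_O])

def J(x):
    w, h, o = x
    return np.array([[k, -2 * h * o, -h**2],
                     [2.0, 1.0, 0.0],
                     [1.0, 0.0, 1.0]])

x = np.array([0.5, 0.5, 0.5])   # positive initial guess
for _ in range(50):
    x = x - np.linalg.solve(J(x), F(x))
print(np.round(x, 6))   # -> [1. 1. 1.]
```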
Morgan presents a model (Model B in Chapter 9 of (Morgan, 1987)) involving eleven species formed from oxygen, hydrogen, carbon, and nitrogen. The reaction
equations, given in standard chemical notation at left and in polynomial form at
right, are:
O2 ⇌ 2O          k_1 X_O2 = X_O^2          (9.2.5)
H2 ⇌ 2H          k_2 X_H2 = X_H^2          (9.2.6)
N2 ⇌ 2N          k_3 X_N2 = X_N^2          (9.2.7)
CO2 ⇌ O + CO     k_4 X_CO2 = X_O X_CO      (9.2.8)
OH ⇌ O + H       k_5 X_OH = X_O X_H        (9.2.9)
H2O ⇌ O + 2H     k_6 X_H2O = X_O X_H^2     (9.2.10)
(1, X_O, X_H, X_O X_H, X_H^2, X_O X_H^2)
(1, X_CO, X_O X_CO)                                                  (9.2.17)
(1, X_O, X_H, X_CO, X_O^2, X_H^2, X_O X_H, X_O X_CO, X_O X_N)
(1, X_N, X_N^2, X_O X_N).
A four-homogeneous formulation gives a root count of 18, which is the lowest possi-
ble multihomogeneous count. We can improve on that slightly with a linear product
(1, X_O) × (1, X_H)^2
(1, X_O) × (1, X_CO)                              (9.2.18)
(1, X_O, X_H) × (1, X_O, X_H, X_CO, X_N)
(1, X_N) × (1, X_O, X_N),
which gives a root count of just 16. As always, a sparse monomial homotopy would
do just as well as the best linear product homotopy.
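The count of 16 can be checked by enumerating factor choices. Since every factor contains the constant 1, a selection of generic affine-linear factors has a finite root exactly when the factors can be matched to distinct variables (Hall's condition). An illustrative Python sketch:

```python
from itertools import permutations, product

# Variable sets of the factors in each linear-product equation of
# (9.2.18); the constant "1" is present in every factor.
eqs = [[{"O"}, {"H"}, {"H"}],                # (1,XO)(1,XH)^2
       [{"O"}, {"CO"}],                      # (1,XO)(1,XCO)
       [{"O", "H"}, {"O", "H", "CO", "N"}],  # third equation
       [{"N"}, {"O", "N"}]]                  # fourth equation

def solvable(sets):
    # Four generic affine equations in (XO, XH, XCO, XN) have a finite
    # root iff the chosen factors can be matched to distinct variables.
    return any(all(v in s for s, v in zip(sets, perm))
               for perm in permutations(["O", "H", "CO", "N"]))

count = sum(solvable(sel) for sel in product(*eqs))
print(count)  # -> 16, the linear-product root count
```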
In chemical equilibrium problems a significant numerical issue arises: equilibrium constants often have wide ranges of magnitude. For a temperature of 1000°, Morgan gives reciprocal equilibrium constants R_i = 1/k_i that range from 10^22.120 to 10^47.970. It is essential to rescale the variables and the equations to work in double
precision arithmetic. We will not discuss this issue here. The interested reader may
refer to Morgan's treatment in (Morgan, 1987) or (Meintjes & Morgan, 1987), or
study the implementation in the function scalepol distributed as part of HOMLAB.
This problem is treated in the exercises of this chapter.
G(e, g) = {e_1^2 − e_0^2, e_2^2 − e_0^2, e_3^2 − e_0^2, e_0^2 − g_0^2, e_0^2 − g_1^2, e_0^2 − g_2^2, e_0^2 − g_3^2} = 0.   (9.3.19)
We immediately see that this system has exactly 128 solutions, all of the form e_1 = ±e_0, ..., g_3 = ±e_0. The theory presented in §§ 8.3 and 8.4.1 shows that, with probability one, the solution paths of the homotopy
six leg equations, Equation 7.7.10, are the same, namely gg'. Hence, if we subtract
the equation for leg 1 from all the others, this term is eliminated from five of the
equations. That is, the system becomes
f_0(e, g) = ge' = 0,   (9.3.21)
f_1(e, g) = (b_1 b_1' + a_1 a_1' − L_1^2) ee' + (g b_1' e' + e b_1 g') − (g e' a_1' + a_1 e g')
            − (e b_1 e' a_1' + a_1 e b_1' e') + gg' = 0,   (9.3.22)
f_i(e, g) = (b_i b_i' + a_i a_i' − L_i^2) ee' + (g b_i' e' + e b_i g') − (g e' a_i' + a_i e g')
            − (e b_i e' a_i' + a_i e b_i' e') = 0,   i = 2, ..., 6.   (9.3.23)
f_0 ∈ <g> ⊗ <e>,   (9.3.24)
f_1 ∈ <e, g> ⊗ <e, g>,   (9.3.25)
f_i ∈ <e, g> ⊗ <e>,   i = 2, ..., 6.   (9.3.26)
The linear-product root count may be tallied up by noting that in picking one factor from each equation, we must never choose more than three of the form <e>, because choosing four or more forces e = 0, and we wish to ignore any solutions on that degenerate set. Accordingly, if we pick the factor <g> in g_0, we may pick either of two factors in g_1, and among the remaining five equations we may choose <e> from zero to three times. If instead we choose <e> in g_0, we must limit the last five equations to choose <e> at most twice. These observations give a root count of

2 [(5 choose 0) + (5 choose 1) + (5 choose 2) + (5 choose 3)] + 2 [(5 choose 0) + (5 choose 1) + (5 choose 2)] = 52 + 32 = 84.
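This tally reduces to counting binary selections; a quick Python check, where the labels g, e, m (for pure-g, pure-e, and mixed factors) are ours:

```python
from itertools import product

# Factor options per equation: g0 offers a pure <g> or a pure <e>
# factor; g1 offers two mixed <e,g> factors; g2..g6 each offer a
# mixed factor or a pure <e> factor.
options = [["g", "e"], ["m", "m"]] + [["m", "e"]] * 5

# A selection is valid unless it takes four or more pure <e> factors,
# which would force e = 0 on the degenerate set.
count = sum(1 for sel in product(*options) if sel.count("e") <= 3)
print(count)  # -> 84
```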
It is shown in (Wampler, 1996a) that the count of 40 for general Stewart-Gough
platforms is due to the antisymmetry of the mixed quadratic terms. That is, if we
write Equation 7.7.10 for leg i in the form,
for any relationships between the coefficients of the monomials, so these give 84
roots when applied to Equation 9.3.21.
The high points in the history of the problem begin in 1968 with (Pieper, 1968),
who gave a formulation of the general problem having total degree 64,000. This
¹ Ferdinand Freudenstein, Higgins Professor Emeritus of Mechanical Engineering, Columbia University.
upper bound was substantially sharpened in 1973 to only 32 (Roth, Rastegar, &
Scheinman, 1974), but it was not until 1980 that (Duffy & Crane, 1980) derived
a reduction of the problem to a single polynomial of degree 32. This essentially
solved the problem in the sense that good numerical methods exist for factoring
a polynomial in one variable and also in the sense that one could solve a generic
example and find the true root count. The count is only 16, since 16 of the 32 roots
were extraneous ones introduced by the reduction process. However, at the time this was not fully appreciated, and the prevailing attitude was that the problem could not be considered fully solved until a reduction to a single univariate polynomial of degree 16 was found. Besides, a numerical demonstration does not carry the full weight of mathematical proof.
It was into this scene that, in 1985, (Tsai & Morgan, 1985) introduced the
method of polynomial continuation to the kinematics community. They cast the
problem as eight quadratics (total degree 256) and found that only 16 endpoints
of the ensuing homotopy were valid solutions. Perhaps the most important con-
tribution of that work was not the confirmation of the count of 16, but rather the
demonstration that systems of polynomial equations could be solved reliably by
numerical means.
Work continued after that on two fronts: elimination methods and continuation.
(Primrose, 1986) gave the first real proof of the root count of 16, by showing that the
other 16 roots of the Crane-Duffy polynomial correspond to solutions at infinity for
the intermediate joints. Morgan and Sommese (Morgan & Sommese, 1987a) showed
that the Tsai-Morgan system had a two-homogeneous Bezout number of only 96,
the first application of multihomogeneous continuation. Finally, in 1988, Lee and
Liang (Lee & Liang, 1988) produced the long sought-after reduction to a univariate
polynomial of minimal degree, although it was a complicated procedure. A simpler
one was later given by (Raghavan & Roth, 1993), and a numerical treatment of this reduction as an eigenvalue problem was given by (Manocha & Canny, 1994).
Complementing all of these works, (Manseur & Doty, 1989) found an example with
all 16 solutions being real.
The reduction of a problem to a univariate polynomial of minimal degree has
two payoffs: it proves an upper bound on the root count and it leads to a numerical
solution. But it is not the only route to either of these. A system of equations
that admits a sharp root count via a multihomogeneous formulation or a monomial
polytope analysis suffices for proof, and continuation can provide the numerical
method. We should not fail to mention the extensive work in computer algebra
to compute Grobner bases as a means of proof; see (Cox et al., 1997) and (Cox
et al., 1998) as a beginning point to the extensive literature on this. Any reduction
of a problem to a Grobner basis can be converted for numerical solution to an
equivalent eigenvalue problem (Auzinger & Stetter, 1988; Moller & Stetter, 1995).
But even as late as Raghavan and Roth's paper, algorithms for computing Grobner
bases were not capable of handling a problem as difficult as the six-revolute inverse
position problem.
If we are willing to give up rigorous proof of the true root count, it is often
convenient to find a "good enough" formulation of a multivariate system with a
root count low enough that continuation can be reasonably applied. In this sense,
with the tremendous increase in computer power of late, even Pieper's original
formulation of total degree 64,000 might be considered within range. But we will
proceed below to give a much more amenable formulation than that.
The approach we give here, first published in (Wampler & Morgan, 1993), is
of a different cloth than all the others we have mentioned. Those others begin
with a formulation of the kinematics as a product of homogeneous transformation
matrices (Denavit & Hartenberg, 1955), (Chapter 12 Hartenberg & Denavit, 1964).
Reductions starting from that point lead to rather long algebraic expressions, as
one can see from the cited references and (Chap. 10 Morgan, 1987). Instead, we
write down a system mirroring closely the geometry of the problem, and proceed
to solve it in its unmodified form.
Let z_i ∈ R^3, i = 1, ..., 6, be unit vectors along the joint axes of a six-revolute serial-link chain; see Figure 9.1. The kinematic chain is completely described by finding the common normal between each pair of successive joint axes and listing three values: the "twist angle" α_i between joints i and i + 1, the distance a_i between these joint axes (a.k.a. the "link length"), and the distance d_i (a.k.a. the "joint offset") between successive common normals. If none of the successive joints are parallel², the common normal directions are

x_i = z_i × z_{i+1} / sin α_i,   i = 1, ..., 5,
where "×" means the vector cross-product in 3-space. Then, the six-revolute inverse position problem can be written as the system

(a_1 / sin α_1) z_1 × z_2 + Σ_{i=2}^{5} (d_i z_i + (a_i / sin α_i) z_i × z_{i+1}) = p̃,   (9.4.32)

where p̃ is a known vector from where the first common normal intersects joint 1 to where the last common normal intersects joint 6. The vectors z_0, x_0, and z_1 are known, being fixed in the ground, as are z_6 and x_6, being fixed in the last link, whose position and orientation are given. From these, and the known lengths and offsets of the links, p̃ is readily computed from p, and we take it as given. So we have 12 equations (vector Equation 9.4.32 is equivalent to 3 scalar ones) in 12 variables, which are the 3 elements each of z_2, z_3, z_4, z_5. Although these vectors naturally live in R^3, we will treat them as if they live in C^3, by the usual embedding.
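The common-normal construction x_i = z_i × z_{i+1} / sin α_i is easy to verify numerically; an illustrative numpy sketch with random axes (not a specific mechanism):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two random unit vectors along successive joint axes.
z1 = rng.standard_normal(3); z1 /= np.linalg.norm(z1)
z2 = rng.standard_normal(3); z2 /= np.linalg.norm(z2)

# Twist angle alpha between the axes, and the common normal direction
# x = z1 x z2 / sin(alpha).
alpha = np.arccos(np.clip(z1 @ z2, -1.0, 1.0))
x = np.cross(z1, z2) / np.sin(alpha)

# x is a unit vector perpendicular to both axes.
assert np.isclose(np.linalg.norm(x), 1.0)
assert abs(x @ z1) < 1e-12 and abs(x @ z2) < 1e-12
print("common normal verified")
```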
² See (Wampler & Morgan, 1993) for how to handle parallel links.
Among the equations, two are linear and the rest are quadratic, for a total degree of 2^10 = 1024. Using the two-homogeneous groupings (1, z_2, z_4) and (1, z_3, z_5), we get a lower root count of 2^4 (6 choose 3) = 320. Although this is quite a bit inflated over the true root count of 16, it is low enough that we have no trouble tracking all paths by continuation. Then, we can solve subsequent examples with only a sixteen-path parameter homotopy.
One of the most prevalent classes of mechanical systems consists of planar links
joined by rotational joints, also known as "pin joints," or simply "hinges." The
axes of all the joints in the mechanism are perpendicular to the plane of motion. In
reality, the links occupy three-dimensional volumes, and they can move in separate
parallel planes, but for the purpose of analyzing their motion, only their projection
onto one of these planes needs to be considered.
Consider the seven-bar assembly shown in Figure 9.2, consisting of four triangles and three simple bars. (We call this the "type a" seven-bar, as there are two other topological arrangements of interest; see Exercise 9.6.) For general dimensions of
the links, such an assembly is a structure, meaning that it will be rigid. However,
it is quite possible that if we disconnect a joint and reposition the pieces, we can
reconnect that same joint with the links in new relative positions. The question of
finding all such assembly configurations comes up in the study of related six-bar
and eight-bar linkages which have internal motion.
{1, θ_1, θ_2, θ_3} × {1, θ̄_1, θ̄_2, θ̄_3}.

{1, θ_1, θ_2} × {1, θ̄_1, θ̄_2}
{1, θ_2, θ_3} × {1, θ̄_2, θ̄_3}
{1, θ_3, θ_1} × {1, θ̄_3, θ̄_1}
{1, θ_1} × {1, θ̄_1}                    (9.5.37)
{1, θ_2} × {1, θ̄_2}
{1, θ_3} × {1, θ̄_3}
Of the same 20 combinations of factors that gave start points in the two-homogeneous formulation, six now do not give solutions. For example, we cannot simultaneously choose the initial factor from the first, fourth, and fifth equations, as we would then have three equations in only two variables, θ_1 and θ_2. From this, we see that the linear-product homotopy based on Equation 9.5.37 has a root count of 14.
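The count of 14 can be confirmed by enumerating the 2^6 factor selections of Equation 9.5.37 and testing the matching (Hall) condition; in this illustrative sketch, t1..t3 and c1..c3 stand for the θ's and their conjugate variables:

```python
from itertools import permutations, product

# Factor variable sets for the linear-product structure (9.5.37):
# the first factor of each equation is in the theta variables, the
# second in the conjugate variables.
pairs = [({"t1", "t2"}, {"c1", "c2"}),
         ({"t2", "t3"}, {"c2", "c3"}),
         ({"t3", "t1"}, {"c3", "c1"}),
         ({"t1"}, {"c1"}),
         ({"t2"}, {"c2"}),
         ({"t3"}, {"c3"})]

vars_ = ["t1", "t2", "t3", "c1", "c2", "c3"]

def solvable(sets):
    # Six generic linear equations in six variables have a root iff
    # the chosen factors can be matched to distinct variables.
    return any(all(v in s for s, v in zip(sets, perm))
               for perm in permutations(vars_))

count = sum(solvable(sel) for sel in product(*pairs))
print(count)  # -> 14
```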
Readers with a particular interest in planar linkages may wish to look at
(Wampler, 2001) to see an alternative solution approach which, when applied to
the seven-bar problems, converts them to eigenvalue problems of size 14, 16, or 18.
Fig. 9.3 Four-bar linkage. Heavy lines are rigid links, whereas thin lines are vectors. Open circles
mark hinge joints, and hash marks indicate a stationary link.
ball-shaped foot so that only its center position matters, not its orientation).
Body Guidance In this case, the entire motion of the coupler is at stake, both position and orientation. Such a machine might scoop up material in one location and carry it without spillage to deposit the contents in a second location. A four-bar might guide the motion of the scoop.
Four-bar synthesis means that we specify at the outset the desired motion, and
seek to find a four-bar that will produce it. Synthesis is the inverse process of
analysis, which seeks to describe the motion characteristics of a given mechanism.
We will proceed to write out the basic equations of four-bar motion, which can
be employed for analysis and for various kinds of synthesis, depending on which
quantities are given and which are treated as unknowns. We will then describe
several synthesis problems. Among these, the most challenging are path-synthesis
problems, and as we shall discuss in some detail, the most challenging of all is the
synthesis of a coupler curve to pass through nine given points.
From these basic equations, we can define a wide variety of synthesis problems,
varying in how many positions are prescribed and which of the symbols in the
above equations are known quantities versus variables.
Since φ_j is given, this is just a quadratic equation in ψ_j. After solving it, we can backtrack through Equation 9.6.40 to get θ_j and then Equation 9.6.38 for (p_j, p̄_j). Remember that "real" points on the motion curve are those points for which |ψ_j| = |θ_j| = 1.
We should mention that the engineering analysis of a four-bar under considera-
tion for a real machine would encompass much more than just plotting its motion
curve. One would need to consider, for example, the forces transmitted through the
links. This and other considerations are beyond the scope of the present discussion.
It is now easy to see that the total degree is 2^4 = 16, whereas a two-homogeneous structure {u, v, 1} ⊗ {ū, v̄, 1} still has a root count of six.
The two-homogeneous root count is not sharp. Expanding Equation 9.6.44, we see that the monomials uū, vv̄, and 1 appear on both sides and cancel, leaving only the monomials {u, ū, v, v̄, uv̄, vū}. This means that the origin (u, v, ū, v̄) = (0, 0, 0, 0) is always a solution, which is of no use as a four-bar. Also, after two-homogenizing via (u, v, ū, v̄) = (U/W, V/W, Ū/W̄, V̄/W̄), we see that there are two solutions at infinity, ([W, U, V], [W̄, Ū, V̄]) = ([0, 1, 0], [0, 0, 1]) and ([0, 0, 1], [0, 1, 0]). Since all three roots appear even for arbitrary coefficients, the root count on (C*)^4 is three, and accordingly, the polyhedral approach gives a mixed volume of three.
Burmester points, and the points (−a, −ā) are the Burmester centers, named after the man who first solved the problem (Burmester, 1888).
Eliminating φ_j from Equations 9.6.38, the equations for the left dyad are, for j = 1, ..., 4,

(p_j + x θ_j + a)(p̄_j + x̄ θ_j^{-1} + ā) = (p_0 + x θ_0 + a)(p̄_0 + x̄ θ_0^{-1} + ā).   (9.6.46)

This is almost identical in form to Equation 9.6.44, except this time the constant term does not cancel out, since p_j p̄_j ≠ p_0 p̄_0. Thus, the system has total degree 2^4 = 16, two-homogeneous root count (4 choose 2) = 6, and only 4 finite roots. (The same
two roots at infinity exist as in the function generator problem.) In fact, one
classical approach to solving the function generator problem is to use the principle
of kinematic inversion to convert it to the Burmester body guidance problem, but
we will not go into that here.
where we have used θ_0 = 1. For five precision points, i.e., n = 4, we have eight equations in the unknowns x, x̄, y, ȳ and θ_j, j = 1, ..., 4.
Expanding the products and cancelling terms, we have the system
where the θ_j multiplying each equation clears negative exponents. This gives eight cubic equations, for a total degree of 3^8 = 6561. This obviously misses the simple monomial structure of the system. We can do much better by noting that Equation 9.6.49 has the monomial structure {x, x̄, 1} ⊗ {1, θ_j, θ_j^2} and, similarly, Equation 9.6.50 has the monomial structure {y, ȳ, 1} ⊗ {1, θ_j, θ_j^2}. The reader may verify that this gives a six-homogeneous root count of 2^4 (4 choose 2) = 96.
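The six-homogeneous count can be verified by the same coefficient-extraction trick used earlier in the chapter: the count is the coefficient of a^2 b^2 t_1 t_2 t_3 t_4 in the product over j of (a + 2 t_j)(b + 2 t_j), where a, b track the {x, x̄} and {y, ȳ} groups and t_j tracks θ_j. An illustrative Python sketch with our own polynomial helper:

```python
from collections import defaultdict

def mul(p, q):
    """Multiply polynomials stored as {exponent-tuple: coefficient}."""
    r = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            r[tuple(x + y for x, y in zip(e1, e2))] += c1 * c2
    return dict(r)

def lin(i, coef):
    """coef times the i-th variable; exponent order (a, b, t1..t4)."""
    e = [0] * 6
    e[i] = 1
    return {tuple(e): coef}

def add(p, q):
    r = defaultdict(int, p)
    for e, c in q.items():
        r[e] += c
    return dict(r)

# Each precision point j contributes one equation of multidegree 1 in
# {x, xbar} and 2 in theta_j, and one of multidegree 1 in {y, ybar}
# and 2 in theta_j.
poly = {(0,) * 6: 1}
for j in range(4):
    poly = mul(poly, add(lin(0, 1), lin(2 + j, 2)))  # a + 2 t_j
    poly = mul(poly, add(lin(1, 1), lin(2 + j, 2)))  # b + 2 t_j

# Root count = coefficient of a^2 b^2 t1 t2 t3 t4.
print(poly[(2, 2, 1, 1, 1, 1)])  # -> 96
```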
The monomial structure is in truth sparser than the product structure just given would imply. Only the monomials {x θ_j^2, x θ_j, x̄, x̄ θ_j, θ_j} appear in Equation 9.6.49, and Equation 9.6.50 has a similar pattern. This allows solutions of the form x = y = θ_j = 0, so it is clear that the root count is lower than 96. In fact, the polyhedral mixed volume yields a root count of 36, which is sharp.
In 1923, Alt (Alt, 1923) noted that the extreme path-synthesis problem for four-bars
is to specify nine points on the coupler curve. Compared to the six-revolute serial-
link problem, this one has a longer chronology, but a shorter historical account. The
problem has so far proven to be invulnerable to reduction by hand, and it seems no
one as yet has made a serious attempt at it using computer algebra. To date, the
problem has only been solved by polynomial continuation.
After Alt, the main advance came in 1962, when Roth (Roth, 1962) (Roth & Freudenstein, 1963) abandoned analytical methods and invented an early form of the
continuation method, which he called the "bootstrap method." The work was done
using real variables, so Roth invented heuristics to work around difficulties which
we now recognize to be solution paths that meet and branch out into complex space.
Most bootstrap paths never found a solution, but nevertheless, the approach did
produce for the first time linkages to interpolate nine specified points. After the
invention of the cheater's homotopy (see § 7.8), Tsai and Lu (Tsai & Lu, 1989) used
a heuristic version of it to improve the yield of solutions, but a complete solution
was not found until 1992, by Wampler, Morgan, and Sommese (Wampler et al.,
1992). A follow-up discussion of this article (Wampler, Morgan, & Sommese, 1997)
showed how the approach could be specialized to design symmetric four-bar coupler
curves with a maximal specification of precision points (five points plus the line of
symmetry).
The system of equations is exactly the same as Equations (9.6.49) and (9.6.50),
except now a, a, b, b are unknown and the index ranges over j = 1 , . . . , 8. Accord-
ingly, the system has the product structure, for j = 1 , . . . , 8,
(l,x,x,a,a,ax,ax){l,0j,0?), (9.6.51)
(l,y,y,b,b,by,by) {1^3,6*). (9.6.52)
Using the fact that four general equations in the monomials {l,x,x,a,a,ax,ax}
have just 4 solutions (hint: introduce new variables n = ax and ft = ax), one sees
that this system has a root count of 212(®) = 286,720. This is the root count of the
formulation used to solve the problem in (Wampler et al., 1992), which at the time
was probably the largest polynomial system ever solved.
This is a case where symmetry can play a helpful role. It is easy to see that
swapping (x, x, a, a) with (y, y, b, b) leaves the equations reordered but otherwise
unchanged. If we can arrange our start system to have this same two-way symmetry,
we can track just half the number of paths. This can be done by using the same
random coefficients for the factors in Equation 9.6.51 as in Equation 9.6.52. Thus,
the problem can be solved using only 143,360 paths.
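The path counts quoted here are quick to check in Python (math.comb is the binomial coefficient):

```python
from math import comb

# Root count of the nine-point formulation: 2^12 * C(8, 4).
total_paths = 2**12 * comb(8, 4)
print(total_paths)        # -> 286720
print(total_paths // 2)   # -> 143360 paths under two-way symmetry
```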
This is far from the end of the story. The system has numerous solutions at infinity. Moreover, if (x, x̄, a, ā) = (y, ȳ, b, b̄), Equations 9.6.49 and 9.6.50 are identical, so there is a positive-dimensional solution component obeying this relation. Many continuation paths terminate on this singular set. Actual solution
of the problem showed that there were only 8652 nonsingular solutions, appearing
in 4326 pairs due to the two-way symmetry. Since the two-way symmetry amounts
to nothing more than swapping the labels between the left and right dyads of the
mechanism, we may say that there are 4326 distinct four-bars that interpolate nine
general points. Moreover, these appear in triplets, called Roberts cognates, which
not only go through the nine points but have exactly the same coupler curve. This
means there are just 1442 distinct four-bar coupler curves that pass through the
points. By using parameter continuation, we can solve subsequent examples using
only 1442 paths, about a 100-fold reduction from the 143,360 used to solve the
first example.
When dealing with a very sparse system like Equations (9.6.49) and (9.6.50), it is often advantageous to eliminate some variables. This is because one of the main costs of the continuation method is solving the linear systems for Euler prediction and Newton correction. The cost of linear solves grows as O(n^3) with the number of variables, unless sparse solving methods can be applied. In the problem at hand, we can eliminate all the θ_j variables without increasing the root count, thereby increasing efficiency when using a linear solver for full systems.
The elimination is accomplished by applying Cramer's rule for linear systems. The system

α_1 θ + α_2 θ^{-1} + α_3 = 0,
β_1 θ + β_2 θ^{-1} + β_3 = 0,   (9.6.53)

has solutions only if

δ_1 δ_2 + δ_3^2 = 0,   (9.6.54)

where

δ_1 = det [α_1 α_3; β_1 β_3],   δ_2 = det [α_2 α_3; β_2 β_3],   δ_3 = det [α_1 α_2; β_1 β_2].   (9.6.55)
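The resultant condition can be spot-checked numerically; in this illustrative Python sketch, δ_1, δ_2, δ_3 are taken as the 2×2 determinants det[α_1 α_3; β_1 β_3], det[α_2 α_3; β_2 β_3], det[α_1 α_2; β_1 β_2], one consistent choice of sign conventions:

```python
def deltas(al, be):
    # 2x2 determinants of the coefficient triples (al1, al2, al3),
    # (be1, be2, be3).
    d1 = al[0] * be[2] - al[2] * be[0]   # det [al1 al3; be1 be3]
    d2 = al[1] * be[2] - al[2] * be[1]   # det [al2 al3; be2 be3]
    d3 = al[0] * be[1] - al[1] * be[0]   # det [al1 al2; be1 be2]
    return d1, d2, d3

# Two equations al1*th + al2/th + al3 = 0 sharing the root th = 2:
# th + 2/th - 3 vanishes at th in {1, 2}; th + 10/th - 7 at th in {2, 5}.
al = (1.0, 2.0, -3.0)
be = (1.0, 10.0, -7.0)
d1, d2, d3 = deltas(al, be)
print(d1 * d2 + d3**2)   # -> 0.0: a common root exists

# No common root: th + 1/th vanishes only at th = +-i.
ga = (1.0, 1.0, 0.0)
e1, e2, e3 = deltas(al, ga)
print(e1 * e2 + e3**2)   # -> 10.0: nonzero, no common root
```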
Applying this to Equations (9.6.49) and (9.6.50) gives a system of 8 equations with
the monomial product structure
This reduced system has been the subject of further study. The mixed volume
of the reduced version of the system, computed by Verschelde (Verschelde, 1996)
(Verschelde et al., 1996), was found to be 83,977. The best root count known was
found by applying polynomial products (Morgan et al., 1995). The approach is to observe that Equation 9.6.54 admits the product decomposition {δ_1, δ_3} ⊗ {δ_2, δ_3}. A homotopy based on this decomposition has 18,700 paths appearing with two-way symmetry, so that only 9,350 paths must be tracked. However, the start system
itself must be solved by continuation since the subsystems obtained by choosing one
factor from each equation are not linear. The whole computation requires 24,300
paths. Although this is a substantial reduction in the number of paths, it requires a
specialized computer program, so one may prefer to use a general purpose algorithm
with more paths.
No matter which method is used to solve the first random example, considerable
efficiency is to be gained in subsequent examples by applying parameter continua-
tion to track only 1442 paths.
Table 9.1 Constants for the chemical equilibrium model, Exercise 9.2

Equilibrium constants   T = 1000°   T = 3000°   T = 6000°   |   Total concentrations
log10(1/k_1)             24.528       7.289       3.108     |   T_O   5.e-5
log10(1/k_2)             22.206       6.997       3.270     |   T_H   3.e-5
log10(1/k_3)             47.970      15.107       6.942     |   T_C   1.e-5
log10(1/k_4)             24.942       6.825       2.559     |   T_N   1.e-5
log10(1/k_5)             22.120       7.208       3.541     |
log10(1/k_6)             46.989      14.680       6.791     |
log10(1/k_7)             32.187      10.285       4.878     |
9.7 Exercises
Exercise 9.2 (Chemical Equilibrium) This exercise concerns the chemical system of § 9.2. Data for this problem is given in Table 9.1. (A typographical error in Morgan's Table 9-2, corrected here, reverses the constants T_C and T_H.)
(1) Carefully verify the 4-homogeneous and the linear-product root counts given in § 9.2.
(2) Find a 3-homogeneous formulation that also has a root count of 18.
(3) Follow the steps outlined in § 9.2 to derive expressions for the coefficients of the monomials listed in Equation 9.2.17 in terms of the mass conservation parameters T_H, T_O, T_C, T_N and the equilibrium constants k_1, ..., k_7.
(4) Use routine chemsys in HOMLAB to compute solutions to the system. First, choose random coefficients. Try the different start systems. Do you get the same number of finite roots each way? What do the roots at infinity look like? (Hint: stats(4,:) indicates the multiplicity of roots as determined by the endgame. See Chapter 10.)
(5) Compute the solutions for random parameters. Is the result the same as for
random coefficients?
(6) Compute the solutions for T = 6000°, 3000°, and 1000°. How many physically
meaningful roots are there (concentration values must be real and nonnegative)?
(7) The test in chemsys for real solutions checks if the imaginary part of the concentrations is less than 10^-6. Why is this not an adequate test for this problem?
Can you devise a better one? Can you spot complex conjugate pairs in the list
of "real" solutions?
(8) Try turning off scaling for T = 6000° and see what happens. What do you
think will happen for T = 3000°? Try it and see.
(9) (Open ended.) Why is T = 1000° so difficult? Can you devise a strategy to
treat this problem more easily?
The sole physically meaningful answer for T = 1000° is given in Table 9.2.
Exercise 9.3 (Stewart-Gough by total degree) Try running the Matlab file
stewart/sgtotdeg.m to solve the forward kinematics of a general 6-6 Stewart-
Gough platform.
(1) Confirm that among the 128 endpoints of the total-degree homotopy, 88 lie on the affine algebraic set {(e, g) : e = 0, gg' = 0}.
(2) The degenerate points are all singular. Why?
(3) Save the 40 nonsingular roots and use them as start points for parameter ho-
motopy, as directed in Exercise 7.4.
Exercise 9.4 (Stewart-Gough by LPD) HOMLAB provides a routine, called
lpdsolve, that creates a linear-product start system for a given product struc-
ture and tracks the resulting homotopy paths. The user must provide an m-file
function that computes the function value f(x) and its Jacobian matrix df/dx.
The script file stewart/sglpdhom.m does all of this for Stewart-Gough forward
kinematics problems.
(1) Run sglpdhom and check that it tracks 84 paths and obtains 40 nondegenerate
solution points for a general 6-6 platform.
(2) The routine warns that the start system has 30 singular solutions of the form e = 0. Can you see why these are present and why there are 30 of them? (Hint: they are nonsingular roots for some choice of factors in the start system G = {g_0, ..., g_6}, but singular as solutions of G.)
Exercise 9.6 (Seven-Bar Structures) The structure in Figure 9.2 is one of just
three topological arrangements of seven links in a structure that cannot be solved by
analyzing a five-bar or three-bar substructure. The other two are shown in Figures
9.4 and 9.5.
(1) Derive equations for each of the seven-bar structures in Figures 9.4 and 9.5 and
find linear product decompositions having root counts of 16 and 18, respectively.
Case Studies 173
(2) Create a single program using HOMLAB to solve any of the seven-bar structures
with a 20-path two-homogeneous homotopy. Solve a random example of each
type and verify the root counts of 14, 16, and 18.
(3) Create individual programs for the three cases using linear-product decompo-
sitions having the minimal number of paths. Run the same examples as you
used in the previous item and verify that the same solutions are found.
Verify that one of the solutions has x ≈ 0.71477 + 1.3365i. How many "real" solutions are there?
• For some real solutions, plot the coupler curve and verify that it passes through
the specified points. A "circuit defect" is said to occur if the real coupler curve
has two circuits and some precision points fall on each. Find examples with and
without circuit defects. Can you find an example having multiple real solutions
without circuit defects?
• Download one of the publicly available packages that implements polyhedral
homotopy and use it to solve the five-point problem.
Chapter 10
Endpoint Estimation
When the endpoint of a solution path is singular, there are several approaches that
can improve the accuracy of its estimate. All the singular endgames are based
on the fact that the homotopy continuation path z(t) approaching a solution of
H(z,0) = 0 as t —> 0 lies on a complex algebraic curve containing (x,0). In this
section we collect the facts that follow from this and underpin the methods. In
particular, we will see that the methods become valid only after the path z(t) has
been tracked into an "endgame operating zone" around t = 0. For very singular
endpoints, this operating zone may only be reached by increasing the number of
digits used. In § 10.4, we discuss in fuller detail what happens if one computes an
estimate while still outside of this operating zone.
Since this chapter is about local behavior of holomorphic functions, our homo-
topies H(z,t) will usually only need to be assumed holomorphic and not algebraic.
In § 10.2.1, we collect all the assumptions we use in one place.
(1) (x, 0) ∈ X;
(2) X ⊄ U × {0};
(3) X is an irreducible component of
In simpler terms, we know that the paths in our polynomial homotopies remain nonsingular for t ∈ (0, 1], so each path is one-dimensional and makes a steady
advancement as t goes to zero. The defining equations for the homotopy are all poly-
nomial, so the path is a complex analytic set. This is the essence of the conditions
stated above as applied to polynomial continuation.
Proof. Apply Corollary A.3.3 with the function g of that result set equal to t. □
We call c the winding number of X at (x, 0). Given an isolated solution (x, 0) of H(z, 0) = 0, there is a positive ε ∈ R such that for 0 < t < ε, H(z, t) = 0, considered as a system in z, has only nonsingular solutions in the vicinity of (x, 0). From this, it follows that the multiplicity of the solution as a solution of H(z, 0) = 0 is the sum of the winding numbers of the one-dimensional irreducible components of the solution set of H(z, t) at (x, 0). The nonsingularity condition is satisfied automatically for many algebraic systems.
Note that since the components z_i(φ(s)) are holomorphic functions of s, they can be expressed as convergent power series in s. We can consider these as fractional power series in t^{1/c}.
For the above representation of the components of z(t) to hold, we must be within a disk Δ_{r_c} of some radius r_c > 0 about the origin such that X ∩ π^{-1}(Δ_{r_c}) has either no branch point (in which case c = 1) or a branch point at (x, 0). We refer to r_c as the endgame convergence radius.
A good way to visualize the situation is to consider what happens when we track a solution path as t circles the origin in the complex (Argand) plane at a real radius r, say as t = re^{√−1 θ} as θ goes from 0 to 2π. We start at z_0 satisfying H(z_0, r) = 0 and follow the path implicitly defined by H(z, re^{√−1 θ}) = 0. For example, the reader may think about H(z, t) = z^2 − t(η − t) with η a small positive number. For almost all r, paths satisfying the basic setup above will remain nonsingular as we continue around such a circle, returning at the end of the loop either to z_0 again or to a distinct nonsingular solution z_1. For the example H(z, t) = z^2 − t(η − t), paths will remain nonsingular except for r = η, and we will go from z_0 = √(t(η − t)) to z_1 = −√(t(η − t)) or to z_1 = √(t(η − t)), depending on whether r < η or r > η. We
may then proceed around the circle again and again to return to solutions z_2, z_3, .... Since there are only a finite number of nonsingular solutions, after some number of such loops, the solution path must return to the original point; that is, for some k, we have z_k = z_0. In the example H(z, t) = z^2 − t(η − t), k = 2 or k = 1 depending on whether r < η or r > η. Considering this whole process again at a slightly smaller radius r', we generally expect the same picture again, meaning that we get a sequence of return points z'_0, z'_1, ..., z'_k with z'_k = z'_0 and z'_i, i = 1, ..., k, being the continuation of z_i as t goes from r to r' on the real line. However, there may be exceptional values r* of r where at least one of the loops hits a singularity, thus breaking continuity. In the example, this value is η. Stepping across this value, the return sequence may change such that the ith return values z_i and z'_i for r and r' with r > r* > r', i.e., on opposite sides of the exceptional value, are no longer joined by continuation of t from r to r' in the reals. The value of k that closes the sequence may change as well. The endgame convergence radius r_c is the smallest such exceptional value of r*: for all smaller radii, the return map remains stable and the winding number c of the path is the value of k in this range.
Remark 10.2.2 For simplicity we have slid over questions about whether one can indeed choose small enough open sets so that we can decompose the solution set of H(z, t) = 0 in a neighborhood of a solution (x, 0) into components so that, for the one-dimensional components, we have the desired uniformization result. The language of germs is the way to gently deal with these issues in a rigorous manner. We have included a short introduction to germs in § A.3.
The endgame operating zone can be empty in the case that the ill-conditioned
zone is larger than the convergence radius. However, whereas the convergence radius
is completely defined by the homotopy, the size of the ill-conditioned zone is not.
It depends on the precision of the arithmetic, so it can be made smaller by using
more digits. Roughly speaking, if we wish to estimate the endpoint with k digits
of accuracy, then we need to sample the path with k digits of accuracy also. Let
10^C denote the condition number of the Jacobian J(z, t) of H(z, t) with respect to the z-variables and some fixed norm. When we do a correction step of Newton's method we solve the equation J(z, t) δz = −H(z, t). Here we lose roughly C digits of accuracy. Computing with d digits of precision, we need d − C > k for success.²
By increasing d, one may effectively shrink the ill-conditioned zone. With enough
² This analysis of Newton's method is very rough, as the iterative nature of the method can correct some errors. It would be closer to the truth to say that Newton's method converges quadratically only to k < d − C digits, but even that is a rough generalization. Our comments are meant to give a correct general picture without a complicated analysis.
digits, one can ensure that the endgame operating zone is not empty.
Once inside the endgame operating zone, we can sample the path just for real
t or we can sample for complex t in the zone. For a given precision of arithmetic,
better accuracy in the estimate is achieved by sampling for complex t.
This leaves open two questions that must be answered in order to deploy the
method:
• How do we find the endgame operating zone so that we can sample within it?
• How do we determine the winding number c?
The only practical approach to finding the endgame operating zone seems to
be adaptive trial-and-error. Suppose we fix a pattern of sample points, {a s_1, ..., a s_K}, where a is a scaling factor for shrinking the sample pattern around the origin. Typically, a s_1 is real and we arrive at it by tracking t in (0, 1). The remaining sample points may be real or complex, but either way, we evaluate z at them by continuation. We may execute the endgame repeatedly for a geometrically decreasing set of scalings a_j = λ^j for some fixed real number λ ∈ (0, 1), say λ = 0.3.
When successive estimates of the endpoint agree to some pre-specified tolerance,
we declare the method a success and stop. If this tolerance is never satisfied, we
stop when the scaling gets so small that we can no longer accurately track paths
due to the ill-conditioning near t = 0. If this happens, we must report that the tol-
erance was not met and return as our best estimate the one for which the smallest
successive difference was found.
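The adaptive loop just described can be sketched in a few lines of code (an illustrative Python sketch, not part of HOMLAB; the function estimate_at, which would produce an endpoint estimate from samples at a given scaling, is a hypothetical stand-in for the sampling and fitting steps):

```python
def adaptive_endgame(estimate_at, lam=0.3, tol=1e-10, max_iter=50):
    """Shrink the sample pattern geometrically (scale lam**j) and declare
    success when two successive endpoint estimates agree to within tol."""
    best_est, best_diff, prev = None, float("inf"), None
    for j in range(max_iter):
        est = estimate_at(lam ** j)   # endpoint estimate from samples at this scale
        if prev is not None:
            diff = abs(est - prev)
            if diff < best_diff:
                best_diff, best_est = diff, est
            if diff <= tol:
                return est, True      # successive estimates agree: success
        prev = est
    return best_est, False            # tolerance never met: report best estimate

# Toy model: the estimate at scale sigma errs from the true endpoint 1.0 by 0.5*sigma.
z0, ok = adaptive_endgame(lambda sigma: 1.0 + 0.5 * sigma)
```

In a real endgame, each call to estimate_at would entail tracking the path to the new sample points and refitting; the stopping logic above mirrors the description in the text.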
There are several good ways to determine c. One is to directly measure the winding number by tracking a circular path, t = Re^{√−1 θ}, until the path closes up at θ = 2πc with c a positive integer, i.e., with z(Re^{2πc√−1}) = z(R). If R is inside the endgame operating zone, then c, the number of loops around the origin necessary to close the path, is the winding number. As always, there is the numerical problem of deciding when two approximate numbers, z(Re^{2πc√−1}) and z(R), are equal. This
is the same as the problem of needing to keep the allowed error in our tracking
small enough that we do not have path crossing.
A less computationally-expensive method for small c is to note that since c is
an integer, we can quickly test small values of c, say, from 1 to 4, for consistency
with a power-series fit to an oversampled data set. Such a data set can be obtained
with less path-tracking than would be required to find the winding number by path
closure. A method for determining c and estimating z(0) is as follows.
(1) Use continuation to collect sample values of z(t) for t = t_1, ..., t_K, t_{K+1}.
(2) For c = 1, ..., c_max, do the following.
(a) Transform the sample points into the s-plane, using s_i = t_i^{1/c}. The continuation path in t determines the proper matching angle of each s_i; that is, if t_i = Re^{√−1 θ} for R ∈ (0, 1), then s_i = R^{1/c} e^{√−1 θ/c}, taking R^{1/c} in the reals.
(b) Derivatives with respect to t at the sample points must also be converted to derivatives with respect to s using the value of c, e.g., dz/ds = (dz/dt)(dt/ds) = (dz/dt) c s^{c−1}.
(c) Fit an Mth-order power series, φ_c(s), to the samples at s_1, ..., s_K, as described above.
(d) Calculate the prediction error at the extra sample point as e_c = ‖φ_c(s_{K+1}) − z(t_{K+1})‖.
(3) Use the c that gives the smallest prediction error e_c as the estimate of the winding number, so φ(s) = φ_c(s). Estimate the path endpoint as z(0) ≈ φ(0).
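The steps above can be sketched as follows (illustrative Python, not HOMLAB code; for simplicity the Mth-order series fit of step 2c is replaced by plain polynomial interpolation of the K samples, and the derivative data of step 2b is not used):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial interpolating the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        w = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                w *= (x - xj) / (xi - xj)
        total += yi * w
    return total

def estimate_winding(ts, zs, cmax=4):
    """Pick c in 1..cmax minimizing the prediction error at the extra sample,
    and return (c, endpoint estimate phi_c(0)).  ts, zs hold K+1 samples on
    real t; the first K are used for the fit, the last one as the holdout."""
    best = None
    for c in range(1, cmax + 1):
        s = [t ** (1.0 / c) for t in ts]   # real t, so each s_i is real
        err = abs(lagrange_eval(s[:-1], zs[:-1], s[-1]) - zs[-1])
        if best is None or err < best[0]:
            best = (err, c, lagrange_eval(s[:-1], zs[:-1], 0.0))
    return best[1], best[2]

# Synthetic path with winding number 3: z(t) = 2 + 0.7 t^(1/3) + 0.2 t^(2/3).
ts = [0.01 * 0.5 ** i for i in range(5)]
zs = [2.0 + 0.7 * t ** (1 / 3) + 0.2 * t ** (2 / 3) for t in ts]
c, z0 = estimate_winding(ts, zs)
```

On the synthetic path above, the trial value c = 3 makes z a polynomial in s and so gives a prediction error near machine precision, while the other trial values do not.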
When used in conjunction with the adaptive method of determining the endgame
operating zone, one often observes that c = 1 gives the best prediction when the
path is far outside the convergence radius. As the path is tracked into the operating
zone, c settles into the correct value. This is because the order of the prediction
error for an incorrect value c' of the winding number is O(t^{1/c}), whereas for the correct value it is O(t^{M/c}).
One way to collect samples is in a geometric sequence along the reals: (t_0, t_1, t_2, ...) = (R, λR, λ^2 R, ...) for some λ ∈ (0, 1). Using z and dz/dt at two successive values t_i and t_{i+1}, one may make a cubic prediction of the next value at t_{i+2}. A nice feature of this sampling pattern is that it advances by adding just one sample point to the sequence, reusing the last two points of the previous sample. That is, at one iteration we use samples at (t_0, t_1, t_2) and at the next (t_1, t_2, t_3).
Such a geometric sequence can be used to determine the winding number without
trial and error. The value z(t) is approximately

z(t) = z(0) + a t^{1/c} + higher-order terms,

where a is the first coefficient in the fractional power series. Thus, z(R) − z(λR) ≈ a(1 − λ^{1/c}) R^{1/c} and z(λR) − z(λ^2 R) ≈ a(1 − λ^{1/c}) λ^{1/c} R^{1/c}, and so

(z(λR) − z(λ^2 R)) / (z(R) − z(λR)) ≈ λ^{1/c}.

Since we know λ, this can be used to estimate c, keeping in mind that c is a positive integer. This method can fail when a is zero or small, so that the first nonconstant term in the power series is order t^{2/c} or higher. A method that attempts to deal
with such subtleties is described in (Huber & Verschelde, 1998) (see also (Verschelde,
2000)). We shall not pursue this further here.
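A sketch of this ratio-based estimate (illustrative Python; in practice the three sample values would come from the path tracker):

```python
import math

def winding_from_ratio(zR, zlR, zl2R, lam, cmax=8):
    """Estimate c from samples z(R), z(lam*R), z(lam^2*R) using the relation
    (z(lam*R) - z(lam^2*R)) / (z(R) - z(lam*R)) ~ lam^(1/c)."""
    ratio = abs(zlR - zl2R) / abs(zR - zlR)   # magnitude of lam^(1/c)
    c_real = math.log(lam) / math.log(ratio)  # solve ratio = lam^(1/c) for c
    return max(1, min(cmax, round(c_real)))   # c must be a positive integer

# Synthetic path with c = 3: z(t) = 0.7 + 1.3 t^(1/3).
R, lam = 1e-4, 0.25
z = lambda t: 0.7 + 1.3 * t ** (1 / 3)
c = winding_from_ratio(z(R), z(lam * R), z(lam ** 2 * R), lam)
```

Rounding to the nearest integer, as above, is what makes the estimate usable even when the higher-order terms perturb the ratio slightly.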
As we approach t = 0, we can expect the predictions of the power series to be
quite accurate. Accordingly, we may use it in place of the linear predictor in the
predictor-corrector path tracker when collecting new samples. Of course, one should
use the current best estimate of c at each stage, which may change as the endgame
proceeds. Even when c is not correct, because the path has not fully entered the
endgame operating zone, the best estimate for c obtained by the above method will
generally be better than just assuming c = 1.
A final variation on the power-series method is worth mentioning. Once the
endgame operating zone is entered, it is valuable to quickly gather more samples to
raise the order of the prediction. This allows the process to converge to full accuracy
at larger values of t, before the ill-conditioned zone is encountered. Suppose we have
sampled along real t ∈ (0, R) and the prediction of φ_c(s) gives an accurate estimate of z(t_{K+1}) in step 2d. Then we may try to predict across the origin in s and use Newton's method to refine samples there. It is particularly convenient to gather a symmetric sample set −s_1, −s_2, ..., −s_K, because the odd-powered terms in the power series for ψ(s) = (φ(s) + φ(−s))/2 drop out, while ψ(0) = φ(0) = z(0). Consequently, with a change of variables to w = s^2, we can estimate z(0) with an Mth-order power series for ψ(w) that is the same as a (2M + 1)th-order power series for φ(s).
For double precision arithmetic and samples on the real line in t, experience has
shown that there is little profit in attempting the use of winding numbers greater
than four or five. For higher precision arithmetic, this limit can be extended. The
problem is that an Mth-order power series in s corresponds to a power series in t
of order only M/c. To get a good estimate, we will need a large value of M and
a numerically stable method of computing the estimate φ(0) without finding all
M + 1 coefficients of the power series. The Cauchy integral method of the next
section provides this.
t_k = Re^{2π√−1 kc/(M+1)}, then the trapezoid method gives exactly the average of the sample points:

z(0) ≈ (1/(M + 1)) Σ_{k=1}^{M+1} z(t_k).
Moreover, it is easily shown that this is the same result as would be obtained from
a power series fit to the same points.
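In code, the trapezoid average looks as follows (an illustrative Python sketch; z_of_t is a hypothetical stand-in for continuation of the path around the loop |t| = R):

```python
import cmath

def cauchy_endpoint(z_of_t, R, c, M):
    """Average z over M+1 points equally spaced along the c-fold loop |t| = R.
    By the trapezoid rule for the Cauchy integral, this estimates z(0)."""
    n = M + 1
    total = 0j
    for k in range(n):
        theta = 2 * cmath.pi * k * c / n          # angle in the t-plane
        total += z_of_t(R, theta)
    return total / n

# Path with winding number c = 2: z = 1 + sqrt(t), tracked continuously,
# so z(R, theta) = 1 + sqrt(R) * exp(sqrt(-1)*theta/2).
est = cauchy_endpoint(lambda R, th: 1 + cmath.sqrt(R) * cmath.exp(1j * th / 2),
                      0.01, 2, 15)
```

Note that the samples must follow the tracked branch continuously around the loop; evaluating a fixed branch of the square root instead would not close the path.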
The success of the Cauchy integral method depends on finding an appropriate
radius for the circular sample. As in the power-series method, we do not know a
priori the convergence radius. The most practical recourse is to discover it adap-
tively, by trying the method at geometrically decreasing radii. Convergence may
then be judged by agreement in winding number and endpoint estimate between
successive trials.
(w_1(0) + ··· + w_m(0)) / m

Each of the w_i(t) has a fractional power series, but their sum is holomorphic, that is, it has a power series with integer exponents. Thus, we may conveniently estimate z(0) by fitting an integer-exponent power series to the average of the w_i(t).
The main difficulty with this method is determining which solutions are con-
verging to the same endpoints. The difficulty arises because the estimate of the
endpoints of the individual paths is inaccurate unless the winding number is employed in the estimation. Only the average endpoint is well-behaved (holomorphic).
It may happen that the endgame is applied outside the endgame convergence radius,
either because there are insufficient digits to track within that radius or because
the endgame zone is not identified correctly. It is natural to ask what happens in
such a circumstance.
When there is a tight cluster of distinct solutions, the precision of the arithmetic
must be high enough not only to distinguish between them, but also to track paths
accurately near them. If the cluster is too tight in comparison to the precision of
the arithmetic, the end of path tracking, and hence the application of the endgame,
will occur outside the radius of convergence. There is the stability question of
whether the methods will compute some sort of average of some of the solutions of
the cluster. The methods do, in fact, have good stability properties, which hold in
a larger range than the endgame operating range.
The setup is that we have a holomorphic function, H(z, t) : C^N × U → C^N, where 0 ∈ U ⊂ C. Let π : C^N × U → U be the product projection. Of course, in practice this is our homotopy. We are trying to solve at t = 0.
We have introduced three interrelated methods. The Cauchy integral method
and the power series method are the most accurate. The clustering method of
§ 10.3.5 is less accurate but clearly fails gracefully: it gives the weighted average of
roots of the cluster.
The full gamut of possible behaviors of the methods when we are not in the
endgame operating region is not clear, but we can get some idea of the behavior
from the following examples.
Consider the simple example on C^2:

H(z, t) = z^2 − t^2 − ε^2 = 0.

If we track down to t = R and R < ε, then we are in the endgame operating region. If R > ε, we are not. Let's see what we end up computing. The solution set ℛ of H(z, t) = 0 over Δ_R(0) is a Riemann surface that can be shown to be biholomorphic to some annulus. The important point is that π^{−1}({t ∈ C : |t| = R}) is the union of two disjoint circles C_1 and C_2.
(1/2π) ∫_0^{2π} √(R^2 e^{2√−1 θ} + ε^2) dθ,
with a choice of one of the two branches of the square root. If R < ε, the Cauchy integral method yields the roots ±ε, depending on the choice of the branch. If R > ε, we get a function dependent on R. This integral is an elliptic integral, but for explicit values of R and ε it is easy to evaluate numerically.
Fixing ε = 10^{-7} and R = 10^{-5}, we get 0.64·10^{-5} − 0.50·10^{-7}√−1, which does not compare favorably with the actual roots ±10^{-7}. Indeed, the error 0.63·10^{-5} is two
orders of magnitude larger than the root. Since the Cauchy integral method applied
to an approximating polynomial gives the value at the origin of the approximating
polynomial, we see that choosing interpolation points on the circle C_1 or C_2, the
power series method will yield answers identical to the Cauchy integral method.
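The R < ε case of this example is easy to check numerically (illustrative Python; for R < ε the principal branch of the square root is already continuous on the circle, so no branch tracking is needed):

```python
import cmath

# Average of z(t) = sqrt(t^2 + eps^2) over the circle |t| = R.  For R < eps
# the principal branch is continuous on the circle and the average is eps,
# i.e., the Cauchy integral method recovers the root.
def circle_average(eps, R, n=256):
    total = 0j
    for k in range(n):
        t = R * cmath.exp(2j * cmath.pi * k / n)
        total += cmath.sqrt(t * t + eps * eps)
    return total / n

est = circle_average(eps=0.01, R=0.005)   # R < eps: inside the convergence radius
```

For R > ε the circle encloses the branch points ±ε√−1, and, as described above, the computed value depends on R rather than returning ±ε.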
It is important to realize that the trace method is not better than the power-
series or Cauchy integral method. Indeed, if we chose the paths w_1(t), ..., w_m(t)
apparently converging to a common root as in the trace method, and applied the
power series or Cauchy integral method to all the points and summed, we would
get the same sort of answer as in the trace method. Let's see this precisely for the
Cauchy integral method, realizing, as noted above, that this implies the analogous
statement for the power series method using interpolation points on the curves over
the circle \t\ = R.
We assume that over some small disk, Δ_R := {t ∈ C : |t| < R}, of radius R around 0, with Δ_R ⊂ U, the set H^{−1}(0) ∩ π^{−1}(Δ_R − 0) is a one-dimensional analytic set X with closure X̄ in Δ_R × C^N such that π_X : X → Δ_R and π_X̄ : X̄ → Δ_R are proper. This is phrased this way to allow the possibility that there is a positive-dimensional analytic solution set in the fiber over 0. By definition, proper means that the inverse image of any compact set is compact. One significance of properness for a holomorphic map is that the map has a well-defined sheet number on each irreducible component of X̄, e.g., see Corollary A.4.15. As mentioned previously, these conditions are satisfied for all of our homotopies. We are not assuming that we are in the endgame convergence radius. Theoretically this means that we do not necessarily have a map φ as in § 10.2.1. We still have the normalization mapping ν : ℛ → X̄, which for curves is the most classical special case of Theorem A.4.1. Here ℛ is a smooth curve (a Riemann surface in the terminology of complex analysis), ν is proper; and for a finite set of points B ⊂ Δ_R, the map ν restricted to ℛ minus the finite set π^{−1}(B) is a biholomorphism. When we are in the endgame convergence radius, ℛ is a disk and ν is φ.
Since π_X̄ extends to a neighborhood of X̄, ν extends to ν̄ : ℛ̄ → X̄, where ℛ̄ is a Riemann surface with boundary a union of circles, i.e., ∂ℛ̄ := ℛ̄ − ℛ is a union of disjoint smooth connected curves C_1, ..., C_L for some integer L ≥ 1.
Now the Cauchy integral method (Morgan et al., 1991) that we are using starts with a point p_0 ∈ ∂ℛ̄ and follows its continuation p as π(p) goes around the circle {t ∈ C : |t| = R} c times, where c is the minimum positive number of times it is necessary to go around the circle until p returns to p_0. Note that p traces out a connected component, C_p, of ∂ℛ̄ = ∪_j C_j containing p_0. We let c_j denote the cycle number associated to the curve C_j. In analogy with the cluster method we compute the integral
(1/(2π√−1 c_p)) ∫_{C_p} z (dt/t).    (10.4.2)
Corollary 10.4.2 If C_p = ∂ℛ̄, then equation (10.4.2) computes the average of c (counting multiplicities) solutions of H(z, 0) = 0.
Endpoints of homotopy solution paths can be divided into two types: isolated solutions and points on positive-dimensional solution sets. We say that z* ∈ C^N is an isolated root of f(z) = 0, f(z) : C^N → C^N, if for a small enough positive ε ∈ R, the ball B_ε(z*) ⊂ C^N defined by B_ε(z*) = {z ∈ C^N : |z − z*| < ε} contains no other root of f(z) = 0 besides z*. Isolated singular roots can be computed accurately without resorting to the kinds of singular endgames we have discussed above. This is
No matter how close one starts to the multiplicity-three isolated root at the origin,
(x,y) = (0,0), Newton's method diverges. See (Griewank & Osborne, 1983) for
more on how Newton's method behaves near such irregular singular roots. The
system of Equation 10.5.3 is very special in the sense that if the coefficient (29/16)
is changed to a generic value, Newton's method converges even though the origin
remains a root of multiplicity three. However, we do not wish to depend on this
kind of genericity, as we may indeed be given a system with an irregular singularity.
Moreover, even when Newton's method converges, its behavior may not be satisfactory. For a root of multiplicity μ > 1, its rate of convergence is only linear and the function must be evaluated with precision μ times greater than the accuracy desired in the estimated root. To be precise, consider a single polynomial f(z) : C → C with a root z* of multiplicity μ > 1. Denoting the kth Newton step as Δz_k, we have the iteration formulae

Δz_k = −f(z_{k−1})/f′(z_{k−1}),   z_k = z_{k−1} + Δz_k.

Let ε_k := z_k − z* be the error between the kth iterate and the true value z*. If the sequence of iterates converges to z*, then it obeys the following relation in the limit:

ε_{k+1} = ((μ − 1)/μ) ε_k + o(ε_k).

(A simple demonstration of this result can be found in (Ojika et al., 1983).) So for μ > 1, the convergence rate is linear with geometric ratio (μ − 1)/μ. For μ = 1, convergence is quadratic, a much faster process.
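The geometric ratio (μ − 1)/μ is easy to observe numerically; a minimal sketch for f(z) = z^3, which has a root of multiplicity μ = 3 at the origin (an illustrative example, not one of the systems from the text):

```python
# Newton's method on f(z) = z**3, with a multiplicity-3 root at z* = 0.
# Each step is z - f(z)/f'(z) = z - z/3 = (2/3) z, so the error shrinks by
# exactly the geometric ratio (mu - 1)/mu = 2/3 per iteration: linear convergence.
def newton_step(z):
    return z - z ** 3 / (3 * z ** 2)

z = 1.0
ratios = []
for _ in range(10):
    z_new = newton_step(z)
    ratios.append(z_new / z)   # error ratio e_{k+1}/e_k (here z* = 0, so error = z)
    z = z_new
```

After ten iterations the error has only shrunk by (2/3)^10, illustrating why a multiple root computed this way needs many iterations and extra precision.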
5,_1SL (104M)
p a — /i + 1
is a conservative estimate that guarantees that f^{(k)}(z), for all k < μ − 1, has exactly μ − k zeros in Δ_ρ(z_0). Even if the root is truly a multiple root due to the
structure of the equations, at any finite level of precision in floating point, it will
likely become a cluster of roots. However, the higher the precision, the tighter the
cluster, and so beyond some precision, the cluster radius p will become small enough
that condition 10.5.4 will be satisfied and the deflation maneuver will succeed.
This does not resolve the question of deciding whether a given polynomial has an
exact multiple root or it has a cluster of closely-spaced roots. As we have indicated
before, this is not a question that can be resolved in favor of a multiple root using
floating point arithmetic. If it is a cluster, a high enough level of precision will
reveal it, but if it is a true multiple root, only exact arithmetic can prove it.
random and set v = v_0 + Σ_{i=1}^{r} λ_i v_i, with unknowns λ_1, ..., λ_r ∈ C. Combining this condition with the system f(z), we have 2N equations in N + r unknowns
10.6 Exercises
Exercise 10.2 (Power Series Error Analysis) There are two sources of numer-
ical error in the estimate produced by the power series method: truncation error
due to the order of the fit and amplification in the fitting process of errors in the
sample points. Formulate the fitting process as the solution of a linear system whose
unknowns are the coefficients of the power series:
φ(s_i) = [ 1  s_i  s_i^2  ⋯  s_i^M ] [ a_0  a_1  a_2  ⋯  a_M ]^T
with p = 1/R?
• What sample pattern is best for a thin endgame operating zone, characterized
by having an ill-conditioned region almost as big as the convergence radius?
• Give two reasons why the Cauchy integral method is a good approach for end-
points with large winding numbers.
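As a starting point for this exercise, the fit can be posed as a Vandermonde solve (illustrative Python; fit_power_series is a hypothetical helper, and the growth of error amplification as the s_i cluster or as M grows is exactly the effect the exercise asks about):

```python
def fit_power_series(s, z):
    """Solve the Vandermonde system  z_i = sum_j a_j * s_i**j  for the
    coefficients a_0..a_M (M = len(s)-1) by Gaussian elimination."""
    n = len(s)
    A = [[si ** j for j in range(n)] + [zi] for si, zi in zip(s, z)]  # augmented
    for col in range(n):                       # forward elimination with pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for j in range(col, n + 1):
                A[r][j] -= f * A[col][j]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):             # back substitution
        a[r] = (A[r][n] - sum(A[r][j] * a[j] for j in range(r + 1, n))) / A[r][r]
    return a

# Recover the coefficients of z(s) = 2 - s + 0.5 s^2 from three samples.
s = [0.1, 0.2, 0.3]
z = [2 - si + 0.5 * si ** 2 for si in s]
a = fit_power_series(s, z)
```

Studying how the conditioning of this system depends on the placement of the s_i is one way to approach the error-amplification questions above.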
Exercise 10.4 (Multiprecision) (open research topic: see (Bates, Sommese, &
Wampler, 2005b)) The control settings for the endgame in HOMLAB reflect the
fact that Matlab computes in double precision. How should these be changed if
multiprecision arithmetic were available? If the precision of the arithmetic could be
changed at will during the endgame, how should the endgame algorithm best use
this capability?
has a multiplicity four isolated root at (x, y) = (0, 0). Show that one stage of
deflation gives a nonsingular system defining the root.
Exercise 10.6 (Deflation 2) Do the following for Griewank and Osborne's sys-
tem of Equation 10.5.3.
• Formulate Newton's method and experimentally observe that initial guesses
near (0,0) diverge.
• Use HOMLAB to solve the system with the power-series endgame and observe
the winding number of the origin (suggestion: use totdtab.m).
• Use deflation to obtain a new system for which the origin is a nonsingular
solution.
• How many stages of deflation are required? How many variables does the final
system have?
Chapter 11
Checking Results and Other Implementation Tips
This is a very short chapter to help those who might try to create their own con-
tinuation codes. These tips can also be useful in getting more secure results when
using an existing code.
Since continuation is a floating point numerical process, there is the possibility
of several kinds of failure. The first step in correcting a failure is recognizing that
it has happened. Sophisticated codes detect some failures automatically and take
corrective action. Whether done automatically or manually, the basic techniques
are similar.
11.1 Checks
There are two kinds of checks: local checks examine an endpoint in isolation using
numerical analysis of the iterative method used in the endgame, whereas global
checks use knowledge of the polynomial nature of the problem, primarily the fact
that we expect to find all isolated solutions.
If the path tracker fails mid-course, that fact should be flagged and a corrective
action taken. See § 11.2 below.
197
It is typical that nonsingular solutions will attain very small Newton residuals,
while the accuracy of singular ones will depend on the multiplicity of the root.
Without a singular endgame, a double root usually attains only about half the
accuracy of a nonsingular one. If the condition number is high enough (and we have
taken care that the bad conditioning is not due to poor scaling of the equations),
we can be relatively secure in classifying the root as singular and, if we are only
looking for the nonsingular roots, it can be discarded. It is more satisfying, of
course, to invoke a singular endgame and clean up the solution, if possible. Also,
higher-precision arithmetic can be invoked to clarify the situation.
Multiple Run Comparisons If one runs the same problem two or more times
with different choices for the random constants, the same results should be
obtained. This principle can be invoked at several levels.
• In a homotopy of the form h(z, t) = γtg(z) + (1 − t)f(z) (see Theorem 8.3.1 for details), this means using a different value for γ. Then, one should obtain the exact same list of path endpoints, because although the tracking
path has changed, it is a real-one-dimensional curve inside the same complex
curve and its destination point is the same. The association of start points
to endpoints likely will be permuted, however. If the endpoints from two
such runs cannot be sorted to match up, then one or both are in error, and
one can concentrate path re-runs on those paths whose endpoints have no
match in the other set.
• A stronger test than the above is to change the start system to another
in the same class. The start systems described in Chapter 8 all contain
random constants which can be reset to new values. Two such runs should
have the same set of nonsingular endpoints, which can be compared. The
singular endpoints will typically move, but usually these are not of primary
interest.
• For a parameterized family of systems, F(z; q) = 0, using the notation
of Chapter 7, one may solve two instances for different, randomly chosen,
values of the parameters q. The number of nonsingular roots should be
constant, but, of course, their values will change. To cross check them, one
can track paths from one to the other in a parameter homotopy F(z; tq_1 + (1 − t)q_2) = 0.
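The sorting-and-matching step in these comparisons can be sketched as follows (illustrative Python; real endpoint lists would come from two solver runs):

```python
def match_endpoints(run1, run2, tol=1e-8):
    """Greedily pair endpoints from two runs that agree to within tol.
    Returns the list of matched index pairs and the indices left unmatched."""
    unused = set(range(len(run2)))
    pairs, unmatched = [], []
    for i, z in enumerate(run1):
        hit = next((j for j in unused if abs(z - run2[j]) < tol), None)
        if hit is None:
            unmatched.append(i)        # no partner: re-run this path
        else:
            unused.discard(hit)
            pairs.append((i, hit))
    return pairs, unmatched

# Two runs with the same endpoints in permuted order should match completely.
run1 = [1 + 2j, 3 - 1j, 0.5 + 0j]
run2 = [0.5 + 1e-10j, 1 + 2j, 3 - 1j]
pairs, unmatched = match_endpoints(run1, run2)
```

Any endpoints reported in unmatched are the ones on which to concentrate path re-runs, as described above.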
Points with good quality measures at t = t_e and which pass the path-crossing test are ready for the endgame. Those which fail on either count should be re-run from the beginning, t = 1 to t = t_e, with different path-tracking parameters.
Paths that fail in one endgame might benefit from another. For example, the
power-series endgame in double precision is only effective up to c = 4, while the
Cauchy integral endgame has no such limit. But ultimately, the only way to compute
some difficult endpoints is to increase the precision of the arithmetic. We briefly
address these two issues next.
How much extra effort should be devoted to corrective actions depends on one's
aims. In an engineering problem, one might not care much about lost solution
paths. This is especially true if the trouble is due to a nearly singular endpoint,
as it may likely be useless for practical purposes anyway. However, if one is doing
an initial run to solve a random-parameter example in preparation for repeated
parameter continuations, then one wants to ensure that a full solution set has been
found. This is because there is no way to predict which of these starting solutions
will lead to the desired answers in a subsequent application.
11.3 Exercises
Exercise 11.1 (Checking) Revisit any problem from the exercises of previous
chapters; the six-revolute inverse position problem of Exercise 9.5 might be a good
choice. Do the following.
• Run the problem using standard settings in HOMLAB and make histograms of
condition number, function residual, and the homogeneous coordinate. Note
that for any of these quantities, a histogram of the exponents of the values in
scientific notation is more useful than a histogram of the values themselves. Use
routine pathcros to check for path crossings among the points in xendgame,
which is a list of the solutions for t = t_Endgame. Use pathcros again
for the list of solution points, xsoln, at t = 0. For any occurrence of multiple
paths having the same endpoint, check that the incoming paths have winding
numbers consistent with the multiplicity check described above.
• Loosen the path tracking tolerance so that pathcros discovers path crossing
errors.
• Return the path tracking tolerance to its default value, but this time cripple
the endgame by setting CycleMax=l. See what difference this makes in the
histograms.
Exercise 11.2 (Multiple-Run Checking) For any parameterized problem of
your choice, do a multiple-run global check that shows that the nonsingular solutions
for two independent total-degree runs match up under parameter homotopy.
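A crude version of the multiple-run matching asked for in Exercise 11.2 can be written as a nearest-neighbor pairing of the two endpoint lists. In practice one tracks the parameter homotopy paths between the two runs; the distance-based matching below is only a first sanity check, and all names are ours:

```python
import numpy as np

def match_solutions(run_a, run_b, tol=1e-8):
    """Greedily pair each solution from run A with its nearest unused
    neighbor in run B.  A complete pairing is evidence (not proof) that
    the two runs found the same set of nonsingular roots."""
    pairing = {}
    used = set()
    for i, pa in enumerate(run_a):
        candidates = [(np.linalg.norm(np.asarray(pa) - np.asarray(pb)), j)
                      for j, pb in enumerate(run_b) if j not in used]
        if not candidates:
            break
        d, j = min(candidates)
        if d < tol:
            pairing[i] = j
            used.add(j)
    return pairing

run1 = [np.array([1.0, 2.0]), np.array([-0.5, 0.25])]
run2 = [np.array([-0.5, 0.25]), np.array([1.0, 2.0])]
print(match_solutions(run1, run2))  # -> {0: 1, 1: 0}
```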
PART III
Positive Dimensional Solutions
Chapter 12
Some Concepts From Algebraic Geometry
In this chapter we discuss the basic properties of the different sorts of algebraic sets
that arise in the numerical solution of polynomial systems. The flexible "probability-
one" methods underlying the numerical approach to polynomial systems, developed
in Chapter 13, are based on the fact that given any system of polynomials, the set
of solutions breaks up into a finite number of irreducible components.
Recall that we say that an affine, projective, or quasiprojective algebraic set Z
is irreducible if Z_reg is connected. The dimension of an irreducible algebraic set Z
is defined to be the dimension of Z_reg as a complex manifold, which is half the dimension of Z_reg
as a real manifold. Irreducible components, discussed in § 12.2, are nice sets that
are almost manifolds. For example, the system
f(x, y) = x(y^2 − x^3)(y − 2)(3x + y) = 0    (12.0.1)

vanishes on the union of four irreducible components: the line x = 0, the cubic
curve y^2 = x^3, the line y = 2, and the line 3x + y = 0.
It is a striking and powerful fundamental fact that the most general solution set is
not much worse than this simple example.
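A quick numerical sanity check of this example, assuming the system reads f(x, y) = x(y^2 − x^3)(y − 2)(3x + y): the product vanishes at sample points drawn from each of the four components.

```python
# Assumed form of the example system (reconstructed from the text):
def f(x, y):
    return x * (y**2 - x**3) * (y - 2) * (3*x + y)

samples = [
    (0.0, 7.3),    # on the line x = 0
    (4.0, 8.0),    # on the curve y**2 = x**3 (8**2 == 4**3)
    (5.1, 2.0),    # on the line y = 2
    (1.0, -3.0),   # on the line 3x + y = 0
]
print([f(x, y) for (x, y) in samples])  # four zeros
```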
To even state this result, which is called the irreducible decomposition, we need
to make precise what is meant by an algebraic set. The aim of this chapter is to
familiarize the reader with the basic types of algebraic sets and their properties.
Four types of algebraic sets are useful to us: affine algebraic sets, projective
algebraic sets, quasiprojective algebraic sets, and constructible algebraic sets. The
first three of these were introduced briefly in Chapters 3 and 4.
We consider them in more detail in the succeeding sections.
In § 12.1, we revisit affine algebraic sets, i.e., the solution sets of systems of
polynomials on C^N, to discuss the topologies and the maps defined on them. In
§ 12.2, we discuss the irreducible decomposition for affine algebraic sets.
Often polynomials are homogeneous, e.g., f(x, y) = x^2 + y^2, and in this case
acknowledging that their solution set is naturally defined on P^N simplifies matters,
both conceptually and numerically. For this reason we introduced projective
algebraic sets, i.e., solution sets on P^N, in Chapter 3, and consider them further in
§ 12.3.
Often we need to consider all points in a projective algebraic set X except for
some that are in a second projective algebraic set Y, i.e., sets of the form X \ (X ∩ Y),
such as C^2 \ {(0,0)}. These sets, which include affine algebraic sets and projective
algebraic sets, are called quasiprojective algebraic sets. They are discussed in § 12.4.
A map f : X → Y between quasiprojective algebraic sets X and Y is said to be an
algebraic map if the graph of f is a quasiprojective subset of X × Y; see § 12.4 and
§ A.4 for more details.
Finally, we discuss constructible algebraic sets in § 12.5. These sets, which
include all quasiprojective algebraic sets, may be defined as the finite unions of
quasiprojective algebraic sets.
Constructible algebraic sets prove useful for two reasons. First, many natural sets,
e.g., images of algebraic sets or the set of points of the image of an algebraic map
where the fiber is a given dimension, are not quasiprojective, but are constructible
(see Theorem 12.5.6 and Lemma 12.5.9). Second, a constructible set A contained
in a quasiprojective set X is quite close to being an algebraic set, e.g., the closure
Ā of A in the complex topology is a quasiprojective algebraic subset of X (see
Lemma 12.5.3), and there is a dense Zariski open set U of Ā contained in A (see
Lemma 12.5.2).
We end with § 12.6, a brief discussion of multiplicity of algebraic sets. Roughly
speaking, this notion allows us to relate the algebraic degree of a system of equations
to the degrees of the irreducible components of the system's solution set. For a
single polynomial in several variables, this is a straightforward generalization of the
phenomenon of multiple roots (double roots, triple roots, etc.) that may appear
when factoring a polynomial in one variable. For systems of more than one equation,
the situation becomes a bit more delicate, as we shall discuss.
All four basic kinds of algebraic sets arise quite naturally in discussing the
solutions of polynomials on C^N, as we show by examples. We include in this chapter
only the rudimentary facts about these different classes of sets, with further useful
facts collected in Appendix A. As this book is focussed entirely on polynomial sys-
tems, we may sometimes drop the modifier "algebraic" and speak simply of "affine
sets," "projective sets," etc., but meaning these in the algebraic sense.
Before diving in, let's clarify briefly how quasiprojective sets include both pro-
jective and affine algebraic sets, and how constructible sets include them all. Since
quasiprojective sets are of the form X \ (X ∩ Y), where X and Y are both projective,
they include projective sets as the special case where Y is empty. As for affine
sets, recall that C^N is equal to P^N minus its hyperplane at infinity, H_∞, which is
itself a projective algebraic set; hence affine algebraic sets are quasiprojective as well.
12.1 Affine Algebraic Sets
Naively, an algebraic set is nothing more than the common zeros of a set of poly-
nomials. Making this precise and convenient to use takes some work.
We start with a polynomial system

f(x) := ( f_1(x_1, …, x_N), …, f_n(x_1, …, x_N) ),    (12.1.2)

and consider its set of common zeros

V(f_1, …, f_n) := {x ∈ C^N | f_1(x) = ⋯ = f_n(x) = 0}.
Such a set of common zeros is called an affine algebraic set. The word affine in
"affine algebraic set" signifies that the set is a closed subset of Euclidean space,
which is sometimes called affine space.
For a system f as above in Equation 12.1.2, we usually abbreviate V(f_1, …, f_n)
by V(f).
Example 12.1.1 The simplest polynomial system is p(x) = 0, where p(x) is a
monic polynomial of degree d in one variable with complex coefficients, i.e.,

p(x) = x^d + a_{d−1}x^{d−1} + ⋯ + a_1x + a_0,

with a_i ∈ C constants. As discussed in § 5.3, p(x) factors as
(x − x_1)^{μ_1} ⋯ (x − x_k)^{μ_k}. Thus
V(p) consists of the k distinct complex numbers x_i. The multiplicity of x_i equals μ_i (see
§ 12.6 for further discussion of multiplicity). Thus p(x) = x^3 − x^2 = x^2(x − 1) = 0
has a zero set consisting of 0 and 1.
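The multiple root is visible numerically; for instance, computing the roots of x^3 − x^2 with NumPy returns the root 0 twice:

```python
import numpy as np

# p(x) = x^3 - x^2 = x^2 (x - 1): the root 0 appears with multiplicity 2.
roots = np.roots([1, -1, 0, 0])   # coefficients of x^3 - x^2
print(sorted(np.abs(roots)))      # two roots at 0 and one at 1
```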
Unions of affine sets are affine, e.g., if A := V(f) for polynomials f :=
(f_1, …, f_r) and B := V(g) for polynomials g := (g_1, …, g_s), then A ∪ B is defined by
the set of all products of pairs, i.e.,

A ∪ B = V(f_i g_j, 1 ≤ i ≤ r, 1 ≤ j ≤ s).

Since any point is an affine set, i.e., (x*_1, …, x*_N) is defined by (x_1 − x*_1, …, x_N − x*_N),
we have that any finite set is an affine algebraic set. Lemma 12.4.3 will
show that these are the only compact affine sets.
For a single polynomial p(x_1, x_2) ∈ C[x_1, x_2] not equal to a constant, the solution
set is a nonempty one-dimensional affine algebraic set.
Example 12.1.2 A simple polynomial system on C^2 is given by x_1 = 0. Here
the solution set is the x_2-axis.
It is worth emphasizing that passing from a system f of polynomials to V(f)
throws away all multiplicity information. For example, on C, x^5 and x define
the same affine algebraic set V(x). Also note that C^N is the affine algebraic set
corresponding to the identically zero polynomial, and the empty set is the affine
algebraic set defined by a constant polynomial.
Here is a less trivial one-dimensional example of an affine algebraic set.
Example 12.1.3 Consider the polynomial w − z^2. The set

V(w − z^2) := {(z, w) ∈ C^2 | w − z^2 = 0}

is the graph of the function w = z^2, i.e., a parabola.
Since unions of affine algebraic sets are cut out by products of the defining
polynomials, as noted above, and

V(f) ∩ V(g) = V(f, g),

we conclude that affine algebraic sets in C^N are closed under finite unions and
intersections.
Given an arbitrary, possibly infinite, set of polynomials on C^N, the Noetherian
property for ideals in C[z_1, …, z_N] (see, e.g., (page 74 Cox et al., 1997)) guarantees
that there is always a finite subset of the polynomials with the same common zeros
on C^N. This guarantees that an arbitrary intersection of affine algebraic subsets of
C^N is an affine algebraic set. This implies that the set of affine algebraic subsets
of C^N that lie on a given affine algebraic set X ⊂ C^N satisfies the axioms to be the
closed sets of a topology on X, which is called the Zariski topology. Here the open
sets U ⊂ X are the sets X \ Y, where Y ⊂ C^N is an affine algebraic set contained
in X. Open sets in this topology are called Zariski open sets. Similarly, the affine
algebraic subsets of C^N that lie on the given affine algebraic set X ⊂ C^N are the
Zariski closed sets of X.
Besides the Zariski topology, there is the complex topology, which is also called
the classical topology. Given an affine algebraic set X ⊂ C^N, the complex topology
on X is the topology that X inherits from the usual Euclidean topology on C^N,
i.e., a basis of open sets on X at a point x* ∈ X is given by the intersections of X
with the balls

B_ε(x*) := {x ∈ C^N : ‖x − x*‖ < ε},  ε > 0.
Not every Zariski open set of an affine algebraic set X is of the form X \ V(g), e.g.,
in Example A.2.3, we show that the point {0} ⊂ C^N for N ≥ 2 is not of the form V(g) for a
single polynomial g.
A linear projection π : C^N → C^k is a map of the form

π(x) := (L_1(x), …, L_k(x)),

where

L_i(x) := a_{i0} + a_{i1}x_1 + ⋯ + a_{iN}x_N,   a_{ij} ∈ C.
We say that π is a generic linear projection if the coefficients a_{ij} are chosen "randomly."
Precisely speaking, this only has meaning in the context of some property
we are interested in. For example, in Theorem 12.1.5 below, we say that a generic
linear projection restricted to X is proper, which means that there is a Zariski open
dense subset of the a_{ij} ∈ C^{k×(N+1)} with the property that the restriction to X of
the linear projection, constructed from the a_{ij}, is proper. After a generic linear
change of coordinates, i.e., choosing N generic linear maps to C as new coordinates,
any projection along the coordinate axes is generic.
The simplest example of a nontrivial linear projection π : C^2 → C is given by
sending (x_1, x_2) to x_1. To see what this corresponds to in projective space, fix the
embeddings of
• C^2 into P^2, given by sending (x_1, x_2) → [1, x_1, x_2]; and
• C into P^1, given by sending x_1 → [1, x_1].
We now have a commutative diagram

  C^2  ⊂  P^2 \ {[0,0,1]}
  π↓            ↓π′
  C    ⊂  P^1

where the map π′ : P^2 \ {[0,0,1]} → P^1 is given by sending [x_0, x_1, x_2] → [x_0, x_1].
Given two distinct points a, b ∈ P^N, let (a, b) denote the unique line through them.
The map π′ is often referred to as the projection from {[0,0,1]} because we can
think of the map as sending each point x ∈ P^2 \ {[0,0,1]} to

(x, [0,0,1]) ∩ {x_2 = 0}.

Intuitively, we have a source of light at {[0,0,1]} and we send each point to the
shadow it casts on {x_2 = 0}. With projections, we are perfectly happy to change
the image by a linear transformation, and with this notion of equivalence, the
projection is uniquely determined by the point {[0,0,1]}. The point {[0,0,1]} is
called the center of the projection. Projections from points at infinity, i.e., points of
the form [0, a, b], correspond to linear projections C^2 → C given by sending (x_1, x_2)
to x_1 − (a/b)x_2 ∈ C, as illustrated in Figure 12.1.
From the point of view of projective space, there is nothing special about the
points at infinity, and indeed on occasion, e.g., (Sommese, Verschelde, & Wampler,
2001b) and (Calabri & Ciliberto, 2001), it is useful to project from points not at
infinity. The case of a projection C^2 → C with a finite center is illustrated in
Figure 12.2, where point c is the center of the projection. (We only draw the real
part.) The set of all the lines through c is equivalent to a projective space P^1,
and the projection of a point x is the line P(x) through x and c. To perform
calculations, we will often select a line, such as line L, and set π(x) := L ∩ P(x). No
matter which line we choose in place of L, the essential fact is that all points along
P(x) \ {c} have the same projection as point x. From this observation, it follows
that the projection is determined uniquely by the center c.
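The construction π(x) := L ∩ P(x) is easy to carry out numerically. The sketch below uses our own names and encodes the fixed line L as {p : n·p = d}; it also checks that a second point of P(x) \ {c} has the same image:

```python
import numpy as np

def project_from_center(x, c, n, d):
    """Projection of the plane from a finite center c: send x to the
    intersection of the line P(x) through c and x with a fixed line
    L = {p : n.p = d}.  Changing L only reparameterizes the image."""
    x, c, n = map(np.asarray, (x, c, n))
    s = (d - n @ c) / (n @ (x - c))   # solve n.(c + s*(x - c)) = d
    return c + s * (x - c)

c = np.array([1.0, 1.0])                 # center of the projection
n, d = np.array([0.0, 1.0]), 0.0         # L is the line x2 = 0
x = np.array([3.0, 2.0])
y = c + 2.5 * (x - c)                    # another point on P(x) \ {c}
print(project_from_center(x, c, n, d))   # [-1.  0.]
print(project_from_center(y, c, n, d))   # [-1.  0.]: same image as x
```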
for all y ∈ Y := π(X).
If dim X < k, then there is a Zariski dense open subset U ⊂ X such that π|_U : U →
π(U) is an isomorphism. If X is of pure dimension k, then π|_X is a branched
covering of degree deg X.
12.2 The Irreducible Decomposition

Given an affine algebraic set Z, we let Z_reg denote the set of smooth points of Z.
The set Z_reg is an open set, dense in Z, with Z \ Z_reg equal to a union of affine
algebraic sets, which is why smooth points are also referred to as regular points.
We say that Z is irreducible if Z_reg is connected. We would like to follow the
traditional, and very common, usage, e.g., (Mumford, 1995), and call an irreducible
affine algebraic set an affine variety. It is unfortunate that affine variety has been
used as a synonym for affine algebraic set by some authors. At this point it is safe
to say that anyone picking up a book on algebraic or complex geometry must check
whether varieties are irreducible or not (also reduced or nonreduced if that applies).
For example, in (Mumford, 1995) affine variety means irreducible affine algebraic
set, but in (Gunning & Rossi, 1965), a variety is a not necessarily irreducible reduced
analytic set. The word variety is easier to say than irreducible algebraic set, but,
to avoid confusion, we have reluctantly avoided use of this ancient word.
The irreducible decomposition of an affine algebraic set Z ⊂ C^N is the decomposition
Z := ∪_{a∈I} Z_a obtained by first decomposing Z_reg into the disjoint union of
connected components U_a and letting Z_a denote the closure of U_a. Here I is just
an index set assigning subscript numbers to the irreducible components. For many
of our algorithms, it will be useful to group the irreducible components according to
their dimensions, in which case we have index set I_i for dimension i, and we write

Z = ∪_{i=0}^{dim Z} Z_i,   with   Z_i := ∪_{j∈I_i} Z_{ij}.
Remark 12.2.1 (The algebraic situation) Though we will use the geometric approach
to solution sets, there is a natural approach based on the underlying algebra
of the polynomial system. Let I(f) ⊂ C[x_1, …, x_N] denote the ideal generated
by the polynomials f_1(x_1, …, x_N), …, f_n(x_1, …, x_N) making up the polynomial
system f. Note that V(f) = V(I(f)). Given an affine algebraic set S ⊂ C^N, let
I(S) ⊂ C[x_1, …, x_N] denote the ideal of polynomials vanishing on S.
An affine algebraic set Z is irreducible if and only if I(Z) is a prime ideal.
Given any ideal I ⊂ C[x_1, …, x_N], the radical √I of I is the ideal consisting of
all f ∈ C[x_1, …, x_N] such that f^k ∈ I for some k. The irreducible decomposition
is equivalent to the fact that for any ideal I ⊂ C[x_1, …, x_N], we can write √I =
∩_{a∈A} P_a, where the P_a are the finite number of minimal prime ideals containing I. For
example, √(x_1^2 x_2^2) = (x_1 x_2) = (x_1) ∩ (x_2).
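For a principal ideal this computation reduces to taking the squarefree part of the generator, which we can check with SymPy:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')

# For a principal ideal (p), the radical is generated by the squarefree
# part of p, i.e., the product of its distinct irreducible factors.
p = x1**2 * x2**2
_, factors = sp.factor_list(p)        # [(x1, 2), (x2, 2)]
radical_gen = sp.Integer(1)
for base, _exp in factors:
    radical_gen *= base
print(radical_gen)  # x1*x2
```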
The dimension of an arbitrary affine algebraic set Z is defined as

dim Z := max_{x∈Z} dim_x Z,

where dim_x Z is the maximum dimension of the irreducible components of Z through x.
Here is a basic fact about dimension, which follows from the general result (Theorem
III.C.14 Gunning & Rossi, 1965).
(1) Since the smooth points of an irreducible affine algebraic set Z are connected,
it follows that given any point z G Z, every Zariski open neighborhood of z is
irreducible. This can fail in the complex topology, as shown in the following
example.
Consider the curve Z := V(x_2^2 − x_1^2(x_1 + 1)) in the neighborhood
of the point z = (0,0). (A figure in the original shows the real part
of this curve.) Near the origin, the curve
resembles the two lines x_2 = ±x_1, so in a small neighborhood in the
complex topology it is not irreducible, even though globally the curve is one irreducible
piece. The solution set over the complexes is topologically a real
two-plane stretched and bent such that two points touch each
other. Local to the point of contact, it looks like two disks
touching transversely, but globally it is all one surface. This is
discussed in more detail in Example A.4.18.
(2) Real points of irreducible algebraic sets do not have to be connected, nor do
the components have to have the same dimensions. V(x_2^2 − x_1(x_1 − 1)(x_1 − 2))
is an example of the former, and V(x_2^2 − x_1^2(x_1 − 2)) is an example of the latter.
Nor does there have to be much relation between degrees and the number of real
isolated zeros; see (Example 13.6 Fulton, 1998).
12.3 Projective Algebraic Sets

Though, for applications, affine algebraic sets are the main interest, we must also
define projective algebraic sets. We need them to be able to discuss what happens
at infinity for a given polynomial system, and in particular to be able to carry
out accurate counts of solutions of polynomial systems. Also the behavior of pro-
jective algebraic sets is often easy to understand, e.g., see the Proper Mapping
Theorem A.4.3, and they can be used to understand the behavior of affine algebraic
sets. In this section we continue the discussion of projective sets started in § 3.5.
P^N is a compact manifold containing C^N as a dense open set. The natural
approach to the definition of algebraic sets on P^N is to define them as the solution
sets of finite numbers of whatever are the analogue for P^N of polynomials on C^N.
At first glance this does not look hopeful, since we cannot expect any nontrivial
global algebraic functions.
To see this consequence of the compactness of P^N, consider the representative
case of P^1. Polynomials on C are holomorphic functions, and so under any reasonable
definition, an algebraic function f on P^1 should be a holomorphic function. The
snag is that since P^1 is compact, it follows from continuity that |f(x)| has a maximum
at some point x* ∈ P^1. Then, by the Maximum Principle, Lemma A.2.7,
f(x) must be constant on an open neighborhood of x*, and therefore on all of P^1.
At first sight this is discouraging, but the key insight is that although there is
no reasonable class of algebraic functions on P^N, there are some "almost functions"
lying around, i.e., the homogeneous polynomials. It is important to realize that,
even though homogeneous polynomials are not functions on projective space, they
behave as "extensions" to P^N of polynomials on C^N. Later we will return to
homogeneous polynomials in § A.13 and see that they are the prototypical nontrivial
example of "sections of line bundles."
Before we give definitions, let's work out a simple representative example. Let
p(x_1, x_2) = x_1^2 − x_2 + 1 be a function on C^2. Regarding C^2 as the coordinate patch
U_0 ⊂ P^2 as above, we have in terms of the homogeneous coordinates [z_0, z_1, z_2] on
P^2 that x_1 = z_1/z_0 and x_2 = z_2/z_0. Thus the function x_1^2 − x_2 + 1 is represented by

(z_1/z_0)^2 − z_2/z_0 + 1 = (1/z_0^2)(z_1^2 − z_0 z_2 + z_0^2).

Under the identification of U_0 with C^2, it is easy to check that the closure in P^2
of the zero set V(p) is the zero set V(f) of the homogeneous polynomial f(z) :=
z_1^2 − z_0 z_2 + z_0^2.
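The homogenization step can be verified with SymPy by substituting x_1 = z_1/z_0, x_2 = z_2/z_0 and clearing the denominator:

```python
import sympy as sp

x1, x2, z0, z1, z2 = sp.symbols('x1 x2 z0 z1 z2')

p = x1**2 - x2 + 1
# On the patch z0 != 0 we have x1 = z1/z0 and x2 = z2/z0; clearing the
# denominator z0**2 yields the homogenization of p.
f = sp.expand(p.subs({x1: z1 / z0, x2: z2 / z0}) * z0**2)
print(f)                                           # equals z1**2 - z0*z2 + z0**2
print(sp.expand(f.subs({z0: 1, z1: x1, z2: x2})))  # recovers x1**2 - x2 + 1
```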
The following two examples indicate that counting solutions in C2, even when
we just have points, is not so clear cut as on C.
Example 12.3.1 Consider the system

f(x, y) := [ x − y^2, ax + by + c ] = 0.    (12.3.5)

The reader can check that if a ≠ 0, then there are two solutions to f(x, y) = 0
(counting multiplicities in the obvious way when b^2 − 4ac = 0). But what about
the case a = 0, b ≠ 0, where we only have one solution?
Example 12.3.2 Consider the system of two polynomials on C^2

f(x, y) := [ y − 1, y − 2 ].    (12.3.6)

We expect two lines to meet in a point, but these two parallel lines do not.
We already met similar systems in Chapter 3, so we know that the key to sim-
plifying solution counts is to homogenize the systems. In this way, Example 12.3.1
becomes
g(w, x, y) := [ wx − y^2, ax + by + cw ],    (12.3.7)
which now has for a = 0 a second solution point at infinity of [w, x, y] = [0, 1, 0] ∈ P^2,
formerly "missing" from the affine version.
Similarly, Example 12.3.2 becomes

g(w, x, y) := [ y − w, y − 2w ],    (12.3.8)

which now has the solution point at infinity along the x-axis, [w, x, y] = [0, 1, 0] ∈ P^2.
Note that Example 12.3.2 shows that if we have a system f on C^N, then the
closure in P^N of the set of solutions of f may be smaller than the set of
solutions V(f̄) of the associated system f̄ of homogeneous polynomials on P^N. In
that example, V(f) is empty, so its closure is too, whereas V(f̄) is the point {[0, 1, 0]}.
It is easily checked that V(f̄) ∩ C^N = V(f).
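Checking the extra solution at infinity in Example 12.3.1 amounts to evaluating the homogenized system, which we read as g(w, x, y) = (wx − y^2, ax + by + cw), at [w, x, y] = [0, 1, 0]:

```python
# Reading the homogenization of Example 12.3.1 as
# g(w, x, y) = (w*x - y**2, a*x + b*y + c*w):
def g(w, x, y, a, b, c):
    return (w*x - y**2, a*x + b*y + c*w)

# [w, x, y] = [0, 1, 0] solves g = 0 exactly when a = 0.
print(g(0, 1, 0, a=0, b=5, c=7))   # (0, 0)
print(g(0, 1, 0, a=2, b=5, c=7))   # (0, 2)
```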
12.4 Quasiprojective Algebraic Sets

Sets of the form X \ (X ∩ Y), where X, Y ⊂ P^N are projective algebraic sets, are
called quasiprojective algebraic sets. These include sets of the form X \ (X ∩ Y),
where X, Y ⊂ C^N are affine algebraic sets. The simplest nontrivial example of a
quasiprojective algebraic set which is neither projective nor affine is C^2 \ {0}.
As with affine algebraic sets, we can with no changes define the Zariski and
complex topologies and the notion of irreducibility.
The following is a basic fact.
Theorem 12.4.1 Let U be a Zariski open dense subset of a quasiprojective alge-
braic set X. Then the closure of U in X in the complex topology is X.
Finally we note that all the basic results such as the irreducible decomposition of
§ 12.2 hold for quasiprojective algebraic sets (respectively projective algebraic sets)
and not just for affine algebraic sets. The only difference is that the irreducible
components in this generality are not affine algebraic sets, but are only quasipro-
jective varieties (respectively projective algebraic sets). Using this we carry over all
the definitions of dimension. For example, a pure-dimensional quasiprojective set
is a quasiprojective set with all irreducible components having the same dimension.
Let X and Y be quasiprojective algebraic sets. We define an algebraic map
f : X → Y between X and Y to be a map such that for all x ∈ X and y = f(x)
there are affine open sets U ⊂ X containing x and V ⊂ Y containing y such that
f(U) ⊂ V and f : U → V is algebraic. The set X × Y is a quasiprojective set,
which may be shown by elaborating on § A.10.2. The graph of a map f : X → Y is
the set

graph(f) := {(x, f(x)) ∈ X × Y | x ∈ X}.
Proof. The first assertion follows immediately from (Chapter 4, Corollary (4.16)
Mumford, 1995).
The second assertion would follow if we knew it for irreducible quasiprojective al-
gebraic sets. Given any irreducible quasiprojective set, there is a connected smooth
manifold mapping onto it by Hironaka's Desingularization Theorem A.4.1. Since
connected manifolds are path connected, we are done. •
Proof. To see this, assume otherwise. By the irreducible decomposition from § 12.2,
we know that if X is compact and not finite, then X contains a compact irreducible
infinite affine algebraic set. We can assume without loss of generality that X is
this set. The absolute value of any coordinate function z_i restricted to X has a
maximum on X. By Lemma A.4.2, the restrictions of all the coordinate functions
are constants, and hence X is a single point. □
12.5 Constructible Algebraic Sets

Example 12.5.1 Consider the family of systems

F_{(t,u)}(x, y) := [ tx − u, ty − u ] = 0,    (12.5.9)

parameterized by (t, u) ∈ C^2. The set of (t, u) ∈ C^2 where F_{(t,u)}(x, y) = 0 has a
nonempty solution set is

{(0, 0)} ∪ {t ≠ 0}.

This set is not quasiprojective, but it is constructible.
Let X be a quasiprojective algebraic set. Let A(X) denote the set of closed
algebraic subsets of X. A(X) is closed under finite unions and arbitrary intersections.
The set T(X) of complements of the elements of A(X) are the open sets of
the Zariski topology of X. The set C(X) of constructible sets of X is the smallest
set of subsets of X that
• contains A(X) and
• is closed under a finite number of Boolean operations,
where the Boolean operations are union, intersection, and sending a subset of X
to its complement in X. Otherwise said, C(X) is the Boolean algebra of subsets
of X generated by A(X) (or equivalently T(X)). Constructible sets are the outer
limits of the type of sets that need to be considered in the numerical analysis of
polynomial systems. We will see that they arise naturally when working with affine
algebraic sets.
We present here a few key facts about constructible sets. A fuller discussion
may be found in (Chaps. AG.1 and AG.10 Borel, 1969).
Lemma 12.5.2 Let X be a quasiprojective algebraic set. Assume that A C X is
a constructible set such that A = X, where the closure is in the Zariski topology.
Then there exists a Zariski open and dense set U C X such that U C A.
When we take closures of constructible sets (and almost every set that comes
up in this book is at worst constructible) this lemma tells us it does not matter
whether we use the complex or Zariski topology: in either case we get the same
algebraic set. For this reason, we often do not specify which topology we are taking
the closure in.
It is useful to record the trivial case when a constructible set is automatically
algebraic, a corollary to Lemma 12.5.3.
Lemma 12.5.4 Let A be a constructible subset of an affine (respectively
projective, respectively quasiprojective) set. If A = Ā, e.g., if A is closed
in the complex topology, then A is an affine (respectively projective, respectively
quasiprojective) algebraic set.
Example 12.5.1 is a simple and fairly typical example of a constructible set.
Here it is said a slightly different way.
Example 12.5.5 Consider the map F : C^2 → C^2 which sends (z, w) → (z, zw).
This is a nice algebraic map, but the image is (C^2 \ {z = 0}) ∪ {(0, 0)}, which is
neither the set of zeros of a set of polynomials nor the complement of such a set.
This is about the worst the image of an algebraic map gets.
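Membership in the image of F(z, w) = (z, zw) can be decided directly from the description above; a tiny predicate makes the constructible structure concrete:

```python
# A point (a, b) lies in the image of F(z, w) = (z, z*w) iff a != 0
# (take w = b/a) or (a, b) = (0, 0) (take w arbitrary).
def in_image(a, b):
    return a != 0 or (a, b) == (0, 0)

print(in_image(2, 3), in_image(0, 0), in_image(0, 1))  # True True False
```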
Maps of algebraic sets that "should" be surjective often fail to be because the
domain lacks some points at infinity. For example, the map from V(zw − 1) ⊂ C^2 to
C obtained by sending (z, w) → z misses z = 0. Example 12.5.5 is also of this sort.
For this reason, it is often more useful to use the notion of a dominant map. A map
f : X → Y between quasiprojective algebraic sets is called dominant if the closure
of f(X) equals Y.
Lemma 12.5.8 Let f : X → Y be a dominant algebraic map from a quasiprojective
set X to an irreducible quasiprojective set Y. Then there exists a Zariski open dense
set V ⊂ Y contained in f(X).
12.6 Multiplicity

For a single polynomial p(x) on C^N, the irreducible decomposition takes the form

V(p) = ∪_{i=1}^{k} Z_{N−1,i},

where the Z_{N−1,i} are distinct affine varieties, i.e., distinct irreducible affine
algebraic sets. Moreover, dim Z_{N−1,i} = N − 1 for all i, and there are irreducible
polynomials q_i(x) such that Z_{N−1,i} = V(q_i) and p = q_1^{μ_1} ⋯ q_k^{μ_k};
the integer μ_i is the multiplicity of Z_{N−1,i} as a component of V(p). For example,
the origin is a multiplicity-two isolated solution of the system

z_1 = 0,
z_2^2 = 0.    (12.6.11)

In general, the multiplicity μ of an isolated solution x* of f(x) = 0 is the dimension,
as a complex vector space, of

O_{C^N,x*} / (f_1, …, f_n),

where
(1) O_{C^N,x*} is the ring of convergent power series centered at x*; and
(2) (f_1, …, f_n) is the ideal of O_{C^N,x*} generated by the polynomials f_i.
It is straightforward to see that when n = N = 1 this agrees with the notion of
multiplicity that we are used to, but it is certainly not clear what this means when
N > 1. Also, why convergent power series? It turns out this is just a convenience
for us. One could use instead formal power series, or the ring of rational functions
p(x)/q(x) with q(x*) ≠ 0. But the equivalence of the multiplicities obtained in these
different ways is not obvious!
In the special case n = N, μ has a simple geometrical interpretation. If x* is
a multiplicity-μ isolated solution of f(x) = 0 and you choose a generic vector v ∈
C^N sufficiently near 0, then f(x) = v has exactly μ nonsingular isolated solutions
x*_1, …, x*_μ near x*. By nonsingular we mean that the Jacobian matrix J, with
elements

J_{ij} := ∂f_i(x)/∂x_j,

is invertible at each of x*_1, …, x*_μ. This, in fact, implies that μ = 1 is equivalent to
the solution x* being nonsingular and isolated. Another consequence of this, in the case
n = N, is that with appropriate homotopies of the sort we construct, the number
of paths ending at x* equals the multiplicity.
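This geometric reading of multiplicity is easy to see numerically for n = N = 1. Take f(x) = x^2, whose solution x* = 0 has multiplicity 2, and perturb the right-hand side by a small generic v:

```python
import numpy as np

# x* = 0 solves f(x) = x^2 = 0 with multiplicity 2.  Perturbing the
# right-hand side to a small generic v splits it into two distinct
# nonsingular roots near x* (here +/- sqrt(v)).
v = 1e-6 * np.exp(0.7j)               # a small "generic" complex value
roots = np.roots([1, 0, -v])          # solutions of x^2 = v
print(len(roots), np.max(np.abs(roots)))  # 2 roots, both near 0
```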
Unfortunately, when n ≠ N, the meaning of multiplicity becomes a bit more
obscure, and not so closely connected to geometric intuition. This is a reflection
of the complexity of the nonreduced structures on points in higher dimensions,
i.e., the zero dimensional nonreduced schemes. Since we do not make much use
of multiplicity we do not pursue this. If you do, you need to put multiplicity
into a broader context of Hilbert functions, e.g., see the discussion of multiplicity
in (Hartshorne, 1977), and in particular (Exercise V.3.4c Hartshorne, 1977). The
books (Eisenbud, 1995; Fulton, 1998) are good algebraic references. See also (Bates,
Peterson, & Sommese, 2005a) for a numerical-symbolic algorithm for computing
multiplicity.
Multiplicity for us arises in another way. Consider C := V(x_2^2 − x_1^3) ⊂ C^2. The
multiplicity of C as a component of the solution set of x_2^2 − x_1^3 = 0 is 1. In this case,
it is useful to attach a multiplicity to each point of C. We define the multiplicity of
a point x* ∈ C as a point of C to be the multiplicity of x* as an isolated solution
of the system

[ x_2^2 − x_1^3, a_0 + a_1 x_1 + a_2 x_2 ] = 0,

where a_0 + a_1 x_1 + a_2 x_2 is a generic linear polynomial vanishing at x*. An excellent and very
readable reference for this sort of multiplicity is (Chapter 8 Fischer, 2001).
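Assuming the curve in this closing example is the cuspidal cubic C = V(x_2^2 − x_1^3), the recipe can be checked with SymPy: slicing with a generic line through the cusp x* = (0, 0) yields a root of multiplicity 2 there, while a generic line through a smooth point of C would give a simple root.

```python
import sympy as sp

x1, x2, t = sp.symbols('x1 x2 t')
lam = sp.Rational(3, 7)          # a "generic" slope (any nonzero rational works)

curve = x2**2 - x1**3            # assumed form of the curve C
# Slice C with the generic line x2 = lam*x1 through x* = (0, 0) and
# read off the multiplicity of the root t = 0 of the restricted equation.
sliced = curve.subs({x1: t, x2: lam * t})     # lam**2*t**2 - t**3
mult = sp.roots(sp.Poly(sliced, t))[sp.S.Zero]
print(mult)  # 2: the cusp is a double point of C
```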
12.7 Exercises
Exercise 12.1 (Solution Components) Solve the system on page 207 using a
total-degree homotopy. Do you get points on every component? How many?
Exercise 12.2 (Projection from a Point) Write out the formula for a projec-
tion C 2 —> C from center c onto the line {x2 = 0}.
Exercise 12.5 (Classifying Sets) Classify each of the following sets as affine,
projective, quasiprojective, or constructible. Remember that the classifications are
not mutually exclusive.
(1) V(xy − y − 1).
(2) The image of V(xy − y − 1) under the projection (x, y) → x.
(3) V(x^2 + y^2 + yz, xz − 2z^2).
(4) The set of quadratic equations in one variable that have two distinct roots.
(5) The nonsingular solution points of y^2 − x^2(x − 1) = 0.
(6) Points in C^2 that are not nonsingular solutions of y^2 − x^2(x − 1) = 0.
(7) Pairs of points in C 2 such that there is a unique line containing them.
(8) Pairs of points as in the previous item such that the line contains the origin.
Chapter 13

Suppose we are given a system of n polynomial equations on C^N,

f(x) := [ f_1(x), …, f_n(x) ] = 0.    (13.0.1)

Then, our object of study is the solution set of f = 0, which we often write as
Z = V(f).
As discussed in § 12.2, we know that any affine algebraic set decomposes as
Z := ∪_{i=0}^{dim Z} Z_i,   Z_i := ∪_{j∈I_i} Z_{ij},    (13.0.2)
where Z_i is the union of all i-dimensional irreducible components of Z, and where Z_{ij}
for j ∈ I_i are the finite number of distinct irreducible components of Z_i. Geometrically,
the Z_{ij} are the closures of the connected components of the set of manifold
points of Z. The algebraic set Z might be the entire solution set V(f) or it might
be the union of several of its irreducible pieces. In the latter case, once we have
built our encoding for Z, we wish to answer all our questions about Z alone, as if
all the components excluded from Z did not exist.
The purpose of this chapter is to motivate and describe an encoding of algebraic
sets that we call witness sets. A first look at these is given in an introductory section,
§ 13.1, without worrying about how we can compute them or even justifying that
they are well defined. In § 13.2, we present basic theory concerning the intersection
of irreducible components with linear spaces, which is the underpinning for our
formulation of witness sets, given precisely in §13.3. Next in §13.4, we define the rank
of a polynomial system and present a fast algorithm to compute it. Then, in § 13.5,
we show how the solution set of a system of polynomial equations relates to the
solution set of a system of random linear combinations of those same polynomials.
This prepares us for an algorithm in §13.6 to compute a loose inclusion of witness
sets, called witness supersets. The final section, § 13.7, uses these concepts and
procedures to obtain numerical methods to answer several of our basic questions
itemized above.
Much of this chapter is based on the article (Sommese & Wampler, 1996), where
the subject Numerical Algebraic Geometry was started, and its name coined. The
name was chosen to indicate that this subject would be to algebraic geometry what
numerical linear algebra is to linear algebra.
After this chapter, one major problem remains before we can compute the nu-
merical irreducible decomposition, Equation 13.1.3, namely, the witness point su-
persets are only a crude approximation to the numerical irreducible decomposition.
A lesser problem is that the procedures given in this chapter for finding the witness
point supersets are not as efficient as we would like.
The two chapters following this chapter show how to solve these problems. Chap-
ter 14 gives the efficient algorithm of (Sommese & Verschelde, 2000) to find the wit-
ness point supersets. Chapter 15 gives efficient algorithms (Sommese & Wampler,
1996; Sommese, Verschelde, & Wampler, 2001c, 2002b) to process the numerical
irreducible decomposition out of the witness point supersets.
The notion of witness set has developed over time. Originally in (Sommese &
Wampler, 1996) and continuing through (Sommese & Verschelde, 2000), the cen-
tral notion was that of generic point of a component, though all the information
contained in what we now call witness sets was being computed and used. In the suc-
cessive articles (Sommese et al., 2001c, 2002b), the notion of irreducible witness sets
was distilled out as the essential numerical output of our algorithms. The enriched
version of the witness sets for nonreduced components, presented in this chapter
for the first time, is based on the experience gained from (Sommese, Verschelde, &
Wampler, 2002a).

Basic Numerical Algebraic Geometry 229
What should we adopt as our numerical encoding of algebraic sets? Let's begin
by considering the simplest case, a zero-dimensional algebraic set Z. This is just a
finite set of points, so we can use as our encoding a list of the points. When we are
given a system of N polynomial equations in N unknowns, the methods of Part II
allow us to find a numerical approximation to all nonsingular solution points, and
in fact, those methods give us a list of homotopy path endpoints that includes all
isolated singular solution points as well, although we cannot readily sort these out
from singular endpoints on higher dimensional components. Nevertheless, we have
some confidence that the encoding of Z as a list of solution points is computable.
Moreover, up to the approximation of numerical roundoff, we can easily answer all
our questions about membership, union, intersection, etc. The subtlety regarding
isolated singular solutions is a concern that we can resolve by considering the larger
picture that includes higher dimensional components.
But what shall we do when Z is positive dimensional? Looking at natural
examples, e.g., the set V(x_1) ⊂ C², there are two obvious ways of encoding the
points of these algebraic sets.

The first approach is to use a parametric representation, e.g., representing
V(x_1) ⊂ C² as {(x_1, x_2) = (0, t) | t ∈ C}. Unfortunately, while parametric repre-
sentations are very useful, they are also rare. For example, in Remark A.2.10, we
sketch an argument showing that a curve as simple as V(x_2² − x_1(x_1 − 1)(x_1 − 2)) has
no parametric representation. A nice discussion of which curves have a parametric
representation may be found in (Abhyankar, 1990).
A second approach is to use defining equations. Since, by definition, algebraic sets
are solution sets of polynomial equations, we know that this approach has to work.
Indeed, this is the approach taken in computational algebra. Low degree equations
vanishing on an algebraic set are nothing to scoff at: they can be very useful.
Unfortunately, computing defining equations is numerically expensive. Furthermore,
such equations can be numerically unstable.
Numerical Algebraic Geometry rests on a third approach, using the notion of
witness sets. This natural data structure to encode algebraic sets is based on the
concept of generic points and the classical notion of a linear space section.
Since we are going to talk often about linear subspaces of CN, it is convenient
to introduce a shorthand notation for them. We use the following conventions:
• L^[i] ⊂ C^N denotes an affine linear subspace of dimension i; and
• L^<i> ⊂ C^N denotes an affine linear subspace of codimension i, or equivalently,
of dimension N − i.
Depending on context, it is sometimes easier to use the notation of codimension
instead of dimension, which is why we introduce both. Consider two generic linear
subspaces, L_1 and L_2. Their dimensions add under the operation of taking the span
of their union, while their codimensions add under the operation of intersection. If their
dimensions are complementary, their intersection is zero-dimensional; i.e., L^[i] ∩ L^<i> is a point.
The following fact, demonstrated in § 13.2, is the foundation of the notion of a
witness set. Let A ⊂ C^N be a pure i-dimensional algebraic set. Given a generic
affine linear subspace L^<i> ⊂ C^N, the set Â := L^<i> ∩ A consists of a well-defined
number d of points lying in A_reg. The number d is called the degree of A and
denoted deg A. We refer to Â as a set of witness points for A, and we call L^<i> the
associated slicing (N − i)-plane, or just slicing plane for short.
The number of witness points Â tells us the degree of the set A, and if we
determine the codimension of the slicing plane that cuts A in isolated points, we
have determined the dimension of A. However, to answer most any other question
about A, such as to test whether a given point x is in A, we need the ability to track
the paths of the witness points as the slicing plane is moved. When A is a pure
i-dimensional reduced component of a polynomial system f, then the witness points
Â are nonsingular roots of f restricted to L^<i> (more on this in a moment), so the
data structure W := (Â, L^<i>, f) is everything a nonsingular path tracker needs
to track solution paths starting at Â as L^<i> evolves continuously. Accordingly,
we call W a witness set for A. When A is not reduced we need a slightly richer
structure, which will be discussed in § 13.3.2 and § 13.3.3. We will see later, in
§ 15.2, how to generate from a witness set as large a number as we wish of widely
spaced points on A.
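The degree count described above can be observed directly in a small numerical experiment. The following sketch (assuming NumPy; it is an illustration of the idea, not the book's software) slices the cubic curve y² − x²(x − 1) = 0 with a random complex line and recovers its three witness points:

```python
import numpy as np

def witness_points_plane_curve(poly_xy, deg, rng):
    """Intersect the curve poly_xy(x, y) = 0 with a random affine line
    y = a*x + b and return the intersection points (the witness points)."""
    a, b = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    # Build the univariate polynomial p(x) = poly_xy(x, a*x + b) by sampling
    # at deg+1 points and interpolating (avoids symbolic substitution).
    xs = rng.standard_normal(deg + 1) + 1j * rng.standard_normal(deg + 1)
    vals = [poly_xy(x, a * x + b) for x in xs]
    coeffs = np.polyfit(xs, vals, deg)
    return [(x, a * x + b) for x in np.roots(coeffs)]

rng = np.random.default_rng(7)
f = lambda x, y: y**2 - x**2 * (x - 1)   # a plane cubic
W = witness_points_plane_curve(f, 3, rng)
assert len(W) == 3                        # deg V(f) = 3
assert all(abs(f(x, y)) < 1e-8 for (x, y) in W)
```

Interpolation at deg + 1 samples stands in for substituting the line into the curve; the three roots of the resulting univariate cubic confirm that the degree of the curve, and hence the number of witness points, is 3.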
The witness set data structure, more fully described in § 13.3, has many advan-
tages:
(1) it is stable and much cheaper numerically than finding defining equations;
(2) it is sparing of memory;
(3) it can be used to compute quantities of interest, e.g., if you really want defining
equations they may be computed from this encoding; and
(4) it is a special case of the notion of a linear space section, for which there is an
extensive theory (Beltrametti & Sommese, 1995).
Using witness sets, we can make numerical sense out of what it means to find the
solution set of a system of polynomials f(x) = 0 in Equation 13.0.1. We wish to find
a numerical irreducible decomposition that mirrors the irreducible decomposition of
Equation 13.0.2, by which we mean to find a collection of witness sets W_i for the
i-dimensional components V_i, which are themselves decomposed into irreducible
witness sets W_{ij} for the irreducible components V_{ij}, i.e.,

W := ∪_i W_i,    W_i := ∪_{j∈I_i} W_{ij}.                     (13.1.3)

Witness sets behave naturally under the union of algebraic sets. When two algebraic
sets, say A and B, have no components of
the same dimension, the witness set for their union is just a formal union of their
witness sets, that is, W(A ∪ B) := W(A) ∪ W(B).
However, when A and B have some irreducible components of the same dimension,
we require the witness sets of the components with the same dimension to have the
same slicing planes L. So, in the reduced case with A a pure-dimensional union of
components of V(f) for a system of polynomials f on C^N and B a pure-dimensional
union of components of V(g) for a possibly different system of polynomials g of the
same dimension as A, the formal union resolves to a single witness set with witness
point set Â ∪ B̂ and the common slicing plane L, where Â and B̂ are witness point
sets for A and B, respectively. The resolution of
unions in this fashion is not necessary, but it is convenient, and if two witness sets
have different slicing planes, they can always be brought to a common slicing plane
by homotopy continuation.
In computing a numerical irreducible decomposition, we are faced with the op-
posite problem of computing a union. Our procedures will first find the witness set
W_i for the i-dimensional component V_i; subsequently, its witness point set will be
partitioned into the irreducible witness point sets of the components V_{ij}.
In the above overview, we have claimed the existence of witness sets and asserted
some of their basic properties. This chapter aims to justify these assertions and
to describe some rudimentary algorithms based on them. Subsequent chapters will
discuss refinements and extensions. We begin in the next section with the basic facts
about intersecting irreducible components with linear spaces, thereby establishing
that witness sets do indeed exist and have the main properties that we asserted
above.
We use the terms slicing or linear slicing to mean intersecting algebraic sets with
linear spaces. The answer to the following question supports the use of linear slicing
and will give witness sets much of their power.
How does the irreducible decomposition, Equation 13.0.2, behave under slic-
ing by general hyperplanes?
The crucial value of linear slices is that they have good preservation properties, i.e.,
given a general hyperplane L ⊂ C^N, Z and Z ∩ L share several important properties.
An affine hyperplane (or hyperplane for short) L^<1> ⊂ C^N is the zero set of a
linear equation, which we denote

ℒ(x; a) := a_0 + a_1 x_1 + ⋯ + a_N x_N = 0,

with the a_i ∈ C not all zero for i ≥ 1. ℒ(x; a) and ℒ(x; b) have the same zero
set if and only if a = λb for some complex number λ ≠ 0. Thus affine hyperplanes
are parameterized by the subset of points [a_0, a_1, ..., a_N] ∈ P^N with a_i ∈ C not
all zero for i ≥ 1. The single point not in this set, [1, 0, ..., 0] ∈ P^N, corresponds
to the hyperplane at infinity. Similarly, we regard affine linear spaces L^<i> ⊂ C^N
as parameterized by i-tuples (A_1, ..., A_i) ∈ (P^N)^i, where A_j := [a_{j,0}, ..., a_{j,N}] and
the rank of the matrix

     [ a_{1,0} ⋯ a_{1,N} ]
A := [   ⋮    ⋱    ⋮    ]
     [ a_{i,0} ⋯ a_{i,N} ]
is i. The linear space is the zero set of the i linear equations, so we may write

ℒ(x; A) := A · (1, x_1, ..., x_N)^T = 0.                      (13.2.3)

Though we use this representation below, it is not optimal for i ≥ 2. For example,
given an invertible i x i matrix F, the linear equations associated to the matrix
(F • A) define the same affine linear space as the linear equations associated to A. A
much crisper parameterization is given by the use of the Grassmannian, as discussed
in § A.8.1.
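The parameterization just described is easy to realize numerically. The sketch below (assuming NumPy; the values of N and i and the helper name are illustrative choices) draws a random codimension-i slicing plane as an i × (N + 1) coefficient matrix and checks that left multiplication by an invertible F leaves the plane unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
N, i = 5, 2   # ambient dimension and codimension (illustrative values)

# Random extrinsic representation: i linear equations, rows [a_{j,0} ... a_{j,N}].
A = rng.standard_normal((i, N + 1)) + 1j * rng.standard_normal((i, N + 1))

def on_plane(A, x, tol=1e-9):
    """Check whether x in C^N satisfies A . (1, x_1, ..., x_N)^T = 0."""
    return np.linalg.norm(A @ np.concatenate(([1.0], x))) < tol

# A particular point on the plane: solve A[:, 1:] x = -A[:, 0] by least squares.
p, *_ = np.linalg.lstsq(A[:, 1:], -A[:, 0], rcond=None)
assert on_plane(A, p)

# Left-multiplying by an invertible i x i matrix F defines the same plane.
F = rng.standard_normal((i, i)) + 1j * rng.standard_normal((i, i))
assert on_plane(F @ A, p)
```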
We are interested in the relation between the solution set of the polynomial
system f(x) = 0 of Equation 13.0.1 and the augmented polynomial system

[ f_1(x)  ]
[   ⋮     ] = 0                                               (13.2.4)
[ f_n(x)  ]
[ ℒ(x; a) ]

on C^N, where ℒ(x; a) is a general linear equation. The basic facts are as follows.
Theorem 13.2.1 (Slicing Theorem) Let X ⊂ C^N denote a pure i-dimensional
affine algebraic set. There is a Zariski open dense subset U ⊂ P^N such that for
a ∈ U and L = V(ℒ(x; a)), X ∩ L is empty if i = 0, and X ∩ L is pure
(i − 1)-dimensional if i ≥ 1.

We say that a collection of hyperplanes L_1, ..., L_K is generic with respect to an
irreducible affine algebraic set X if, given any subset L_{i_1}, ..., L_{i_r} of r distinct
L_j, it follows that X ∩ L_{i_1} ∩ ⋯ ∩ L_{i_r} is either empty or of dimension dim X − r.

Proof. This follows from the stronger result (Lemma 1.7.2, Fulton, 1998). □

Clearly, a general A with i ≤ N has full rank i. Using standard techniques from
linear algebra, we can write the solutions of ℒ(x; A) = 0 in the form x = p + B · u,
where p ∈ C^N is a particular solution, the columns of B ∈ C^{N×(N−i)} form a basis
for the null space of the homogeneous part of ℒ(x; A), and u ∈ C^{N−i} collects the
intrinsic coordinates. Substituting into f gives the intrinsic system

f_ℒ(u) := f(p + B · u) = 0.                                   (13.2.5)

The solutions of the extrinsic and intrinsic systems are isomorphic under the map-
pings u ↦ p + B · u and x ↦ B†(x − p), where B† is the pseudoinverse of B.
Since f_ℒ : C^{N−i} → C^n has fewer equations and variables than f(x), the intrinsic
formulation can save computation compared to the extrinsic one.
From a geometric point of view, the extrinsic and intrinsic formulations are
identical: they both describe the intersection of V(f) with L^<i>. Furthermore, in a
situation where we wish to choose the slicing plane generically among the set of all
affine (N − i)-planes, it does not matter if we do so by choosing random coefficients
A as a point in (P^N)^i or if we choose random (p, B) for the intrinsic formulation.
Either way, we are choosing a random slicing (N − i)-plane from the Grassmannian
of all such planes in C^N.
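A minimal numerical check of the extrinsic/intrinsic equivalence, assuming NumPy (the least-squares particular solution and SVD null-space basis below are one illustrative way to construct (p, B), not the book's code):

```python
import numpy as np

rng = np.random.default_rng(1)
N, i = 4, 2                      # illustrative: slice C^4 with a codimension-2 plane

A = rng.standard_normal((i, N + 1)) + 1j * rng.standard_normal((i, N + 1))

# Intrinsic data (p, B): x = p + B u sweeps out the plane as u ranges over C^{N-i}.
p, *_ = np.linalg.lstsq(A[:, 1:], -A[:, 0], rcond=None)   # particular solution
B = np.linalg.svd(A[:, 1:])[2].conj().T[:, i:]            # null-space basis of the homogeneous part

u = rng.standard_normal(N - i) + 1j * rng.standard_normal(N - i)
x = p + B @ u

# x lies on the extrinsic plane ...
assert np.linalg.norm(A @ np.concatenate(([1.0], x))) < 1e-9
# ... and the pseudoinverse recovers the intrinsic coordinates, u = B^+(x - p).
u_back = np.linalg.pinv(B) @ (x - p)
assert np.linalg.norm(u_back - u) < 1e-9
```

Since the columns of B are orthonormal here, the pseudoinverse B† is just the conjugate transpose, which is why the round trip is numerically clean.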
The strong version of the slicing theorem, Theorem 13.2.2, gives us everything we
need to justify our definition of a witness set. It tells us that for an affine algebraic
set X, a generic L^<i> ⊂ C^N meets the irreducible components of X as follows.
• It misses any irreducible component of dimension less than i.
• It meets each irreducible component X_{ij} of dimension i in deg X_{ij} isolated
points, and these points do not lie on any other component.
• It intersects each irreducible component of dimension k > i in an irreducible alge-
braic set of dimension k − i.
Moreover, Theorem 13.2.2 implies that L^<i> will be generic with probability one if
we choose the coefficients of its defining linear equations at random from C^{i×(N+1)},
so it is easy to numerically generate generic slicing spaces. Accordingly, we adopt
the following definition.
Definition 13.3.1 (Witness Set) Let Z ⊂ C^N be an affine algebraic set, and let
X be a pure i-dimensional component of Z. Then a witness point set for X as a
component of Z is the set X ∩ L, where L ⊂ C^N is a linear space of codimension i
that is generic with respect to all the irreducible components of Z. A witness point
set for Z is just a collection of one witness point set at each dimension i, i =
0, ..., dim Z. Finally, when X is an irreducible algebraic set, we say that X ∩ L is
an irreducible witness point set.
The witness point set for a pure i-dimensional set X_i whose irreducible compo-
nents are X_{ij}, j ∈ I_i, is just

X_i ∩ L = ∪_{j∈I_i} (X_{ij} ∩ L).

If X_i has more than one irreducible component, then its witness point set can be
decomposed into a collection of irreducible witness point sets for its components.
The witness point sets are the main theoretical construct of interest in our
approach. The key to answering all of the questions at the top of this chapter comes
down to computing witness point sets. However, for a particular slicing plane L^<i>,
i > 0, a finite set of points in L^<i> does not uniquely identify any particular algebraic
set: many different algebraic sets pass through those same points. Accordingly, we
see that a witness point set alone is not a complete encoding of an algebraic set; for
that, we need to carry along additional symbolic information describing the set.
We wish to define a witness set for an affine algebraic set X to consist of a
witness point set plus such additional information required to uniquely define X.
Suppose that X is an i-dimensional irreducible solution component of a system of
polynomial equations f(x) = 0. If we have a symbolic formulation of f(x) on hand,
the set {X ∩ L^<i>, L^<i>, f} defines X uniquely: the witness points tell us which
component of V(f) is of interest. We hasten to add that situations may arise where
we define an affine algebraic set in some indirect manner such that, although there
must exist a set of polynomial equations that define the set, we do not necessarily
have such a set at hand. As such situations arise, our definition of a witness set will
be adapted to accommodate them.
While a witness set of the form {X ∩ L^<i>, L^<i>, f} is everything we need for
theoretical purposes, it is not always sufficient from the numerical point of view.
As a data structure in a computer program, we want to treat f as a pointer to a
black-box routine that, given a point x*, returns just the function value f(x*) and
the Jacobian matrix ∂f/∂x(x*), as floating point numbers. We would prefer not to
perform any symbolic manipulations to extract information from f. Even in the case
of a zero-dimensional algebraic set, which is just a finite set of points, a numerical
witness point set is just a list of approximations to those points. A witness set
should carry along enough additional information to allow us to numerically refine
these approximations to higher precision. Exactly what additional information we
carry along to numerically encode an algebraic set X will depend on the properties
of X and also on the initial symbolic information we have been given to uniquely
describe X. Accordingly, we will have several different flavors of witness set, but
each will include a witness point set and enough additional information to allow
us to use the witness set in our numerical algorithms. In an implementation of
these algorithms in computer code, the witness set would be a data structure that
includes a field identifying its flavor, and basic operations on witness sets need to
be able to handle all such flavors.
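As one illustration of such a data structure (the field names and flavor tags below are our own assumptions for the sketch, not the book's implementation), a witness-set record might look like:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

@dataclass
class WitnessSet:
    """A sketch of a witness-set record carrying a 'flavor' tag, as suggested
    in the text. Extra per-flavor data (deflation system, homotopy, ...) goes
    in the `extras` dictionary."""
    flavor: str                     # e.g. "reduced", "deflated", "homotopy"
    dim: int                        # dimension i of the component
    slice_coeffs: Any               # coefficients of the slicing plane L^<i>
    points: List[Any]               # numerical witness point approximations
    system: Optional[Callable] = None   # black-box evaluator for f
    extras: dict = field(default_factory=dict)

def formal_union(ws: List[WitnessSet]) -> List[WitnessSet]:
    """Formal union: witness sets at different dimensions simply coexist,
    ordered here by dimension for convenience."""
    return sorted(ws, key=lambda w: w.dim)

w1 = WitnessSet("reduced", 1, None, [(0.0, 1.0)])
w0 = WitnessSet("reduced", 0, None, [(2.0, 3.0)])
merged = formal_union([w1, w0])
assert [w.dim for w in merged] == [0, 1]
```

Basic operations (union, membership, refinement) would dispatch on the `flavor` field so that all flavors are handled uniformly, as the text suggests.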
In the next few paragraphs, we define three flavors of witness sets that are useful
in numerical work. We will not yet give numerical algorithms for computing such
sets; these come later in the chapter.
{X_i ∩ L^<i>, L^<i>, f}

as its witness set.

{W_{ij}, L^<i>, g, π}.

Of course, in our numerical work, W_{ij} will be a numerical approximation to the ideal
points. To refine the witness point set π(W_{ij}), we use Newton's method to refine
W(ε) := {x_w(ε) | w ∈ W}

consisting of the solutions of h(x, ε) = 0 that lead to the witness points W as ε → 0,
with ε ∈ (0, 1].

Our third flavor of witness set for an i-dimensional algebraic set X is, accord-
ingly, the data

{W, L^<i>, h(x, t), W(ε), ε},

where, in addition to the conditions that L^<i> is a generic (N − i)-dimensional linear
subspace of C^N with W := L^<i> ∩ X, we have h(x, t) and W(ε) satisfying

(1) for each point w ∈ W, we have a positive ε > 0 and a nonsingular path
    x_w(t) : (0, ε] → C^N
    with h(x_w(t), t) = 0 and lim_{t→0} x_w(t) = w; and
(2) W(ε) = {x_w(ε) | w ∈ W}.
A · x := A · (x_1, ..., x_N)^T,

where A is an n × N matrix, the rank of the system is the classical rank of the
matrix A and the corank is the dimension of the null space of A · x = 0.
Note that, given a system f as above, neither adding polynomial multiples of the
equations of f to the system nor replacing f with F · f, where F is an invertible n × n
matrix, changes the rank of the system.
Lemma 13.4.1 Let f(x) = 0 denote the system of n polynomials on C^N. Then
there is a Zariski open set Y ⊂ f(C^N) such that for y ∈ Y, V(f(x) − y) is smooth
of dimension equal to the corank of f. Moreover, the Jacobian matrix of f is of
rank equal to rank f at all points of V(f(x) − y).
An important consequence of the above is that for the dense Zariski open set
U := f^{−1}(Y), we have for all points x* ∈ U that the rank of the Jacobian of f evaluated
at x* equals rank f. This gives us a fast probability-one algorithm for the rank of
a system f. Explicitly, given a system f(x) of n polynomials on C^N, the rank
of f equals the rank of its Jacobian at a random point of C^N. To emphasize its
importance, we restate the algorithm below.
— return(r).
The numerical determination of the rank of the Jacobian matrix is best done using
the singular value decomposition.
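The restated algorithm amounts to only a few lines of numerical code. The sketch below (assuming NumPy; the tolerance and the example system are illustrative choices, not from the text) evaluates the Jacobian at a random complex point and counts singular values above a threshold:

```python
import numpy as np

def system_rank(jacobian, N, rng, tol=1e-8):
    """Probability-one rank of a polynomial system: evaluate the Jacobian at a
    random complex point and count singular values above a relative tolerance."""
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    s = np.linalg.svd(jacobian(x), compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Illustrative system: f = (x0*x1, x0*x1 + (x0*x1)^2). The second row of the
# Jacobian is a scalar multiple of the first, so the system rank is 1.
def jac(x):
    row1 = np.array([x[1], x[0]])                 # gradient of x0*x1
    return np.vstack([row1, (1 + 2 * x[0] * x[1]) * row1])

rng = np.random.default_rng(3)
assert system_rank(jac, 2, rng) == 1
```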
The rank intervenes in the study of systems in the following way, which will play
an important role in subsequent developments.
Remark 13.4.3 Surprisingly, prior to this book, the rank of a system had not
been defined explicitly in numerical algebraic geometry.
The definitions of rank and corank make sense for systems of algebraic func-
tions, e.g., rational functions, defined on an irreducible quasiprojective algebraic
set X. Lemma 13.4.1, Theorem 13.4.2, and the algorithm for the rank carry over
immediately with the same proofs to this situation. This generalization, which will
be needed in Chapter 16, is presented in § A.6.
Consider a system

       [ f_1(x) ]
f(x) = [   ⋮    ] = 0                                         (13.5.7)
       [ f_n(x) ]

of polynomials on C^N; the system is square when n = N. When we numerically
solve a system of equations, it is usually convenient, and sometimes necessary, to
have the same number of equations as unknowns.
The systems we wish to study might not be square. If n < N, we call the system
underdetermined, and if n > N, we call it overdetermined. However, if it is under-
determined, its rank is at most n, so by Theorem 13.4.2, its irreducible solution
components must be dimension at least N — n. We will work with such components
by slicing them with at least N — n hyperplanes, resulting in an augmented sys-
tem having at least as many equations as unknowns. Of course, when augmented
by slicing planes, square systems become overdetermined, and overdetermined sys-
tems stay overdetermined. Therefore, we see that the overdetermined case needs
attention.
To find the isolated solutions of an overdetermined system, n > N, the naive
approach is to pick out N equations, solve them, and check the solution points
against the remaining equations. This approach is fraught with peril. For example,
consider the system:

xy = 0
x(x − y) = 0                                                  (13.5.8)
y(x − y) = 0.

Any two of the three equations have a one-dimensional solution set, but all three
together have the origin (with multiplicity 3) as their solution set.
There is a natural procedure for obtaining a square system from the above
system, f(x) = 0. Given an N × n matrix of complex numbers Λ ∈ C^{N×n}, we can
form the square system

        [ Λ_{1,1} f_1 + ⋯ + Λ_{1,n} f_n ]
Λ · f = [              ⋮                ]
        [ Λ_{N,1} f_1 + ⋯ + Λ_{N,n} f_n ].
As we will show below, this square system has all the properties we need to compute
an irreducible decomposition of V(f), and in our first article (Sommese & Wampler,
1996) on Numerical Algebraic Geometry, this was our approach.
In the following paragraphs, we present a somewhat more general view of ran-
domization, which is essential in dealing with intersections of irreducible algebraic
sets (Sommese et al., 2004b). In particular, let us consider the system Λ · f(x),
where the k × n matrix Λ ∈ C^{k×n} is chosen generically. If k = N, this is a square
system, but we may consider k ≠ N as well. Let us discuss the principal facts about
this construction.
First note that this construction is only of interest if k < n. To see this, note
that if k = n, then a generic Λ is invertible. Consequently, the systems Λ · f(x) = 0
and f(x) = Λ^{−1} · Λ · f(x) = 0 are equivalent. If k > n, then we may break Λ ∈ C^{k×n}
into two submatrices by rows; the matrix Λ_1 formed from the first n rows is an
invertible matrix for a nonempty Zariski open set of the k × n matrices Λ ∈ C^{k×n}.
Let Λ_2 be the remaining (k − n) × n matrix formed from the last k − n rows of Λ,
and let

     [ Λ_1^{−1}          0_{n×(k−n)} ]
Γ := [                               ]
     [ −Λ_2 · Λ_1^{−1}   I_{k−n}     ]

with 0_{n×(k−n)} the n × (k − n) matrix with all zero entries and I_{k−n} the (k − n) ×
(k − n) identity matrix. Then Γ is invertible, and Λ · f(x) = 0 is equivalent to
Γ · Λ · f(x) = (f_1(x), ..., f_n(x), 0, ..., 0)^T = 0, so the case k > n adds nothing new.
For k < n, a similar reduction brings a generic randomization to the convenient form

[ f_1(x) ]        [ f_{k+1}(x) ]
[   ⋮    ]  + Λ′ [     ⋮      ]  = 0,
[ f_k(x) ]        [ f_n(x)    ]

where Λ′ is a generic k × (n − k) matrix.
For example, for the system f = (x_1² + x_2² − 1, x_2, x_1 − 1)^T with k = 2,
the randomization

[ x_1² + x_2² − 1 + λ_1(x_1 − 1) ]
[ x_2 + λ_2(x_1 − 1)             ]  = 0

would be better than the randomization

[ x_2 + λ_1(x_1² + x_2² − 1)     ]
[ x_1 − 1 + λ_2(x_1² + x_2² − 1) ]  = 0,

since there would be only two paths to follow using a total degree homotopy on the
former as opposed to four paths on the latter.
The key properties of randomization are given by the following simple theorem
of Bertini type.
Theorem 13.5.1 Let
7i(*)'
/(*) = : =0
Jn(x)_
be a system of polynomials on CN. Assume that A C CN is an irreducible affine
algebraic set. Then there is a nonempty Zariski open set U of k x n matrices
A e C fexn such that for A e U
(1) if dim A > N − k, then A is an irreducible component of V(f) if and only if it
is an irreducible component of V(Λ · f);
(2) if dim A = N − k, then A being an irreducible component of V(f) implies that A
is also an irreducible component of V(Λ · f); and
(3) if A is an irreducible component of V(f), its multiplicity as a solution compo-
nent of Λ · f(x) = 0 is greater than or equal to its multiplicity as a solution
component of f(x) = 0, with equality if either multiplicity is 1.
It is important to emphasize that although an irreducible component of V(f) is
an irreducible component of the randomized system, V(Λ · f), its multiplicity as an
irreducible solution component of Λ · f = 0 (if not 1) might be larger than as an
irreducible solution component of f = 0. The following system, which is equivalent
to the system of Equation 13.5.8, illustrates this:

xy = 0
x² = 0                                                        (13.5.9)
y² = 0.
The origin is an isolated solution of multiplicity 3. The randomized square system
is

x(y + μ_1 x) = 0
                                                              (13.5.10)
y(x + μ_2 y) = 0.
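One containment direction of Theorem 13.5.1 is easy to check numerically: every zero of f is a zero of Λ · f, while a zero of Λ · f only forces the value f(x) to lie in the null space of Λ. A sketch, assuming NumPy, using the system of Equation 13.5.9:

```python
import numpy as np

rng = np.random.default_rng(5)

# f = (xy, x^2, y^2): the system of Equation 13.5.9; the origin is its only zero.
f = lambda x, y: np.array([x * y, x**2, y**2])

# A generic 2 x 3 randomization matrix gives a square system Lam . f on C^2.
Lam = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
rand_f = lambda x, y: Lam @ f(x, y)

# Every solution of f = 0 is a solution of Lam . f = 0 ...
assert np.linalg.norm(rand_f(0.0, 0.0)) < 1e-14

# ... but Lam . f(x, y) = 0 only constrains f(x, y) to the one-dimensional
# null space of Lam, spanned by the last right singular vector:
v = np.linalg.svd(Lam)[2].conj().T[:, -1]
assert np.linalg.norm(Lam @ v) < 1e-12 and np.linalg.norm(v) > 0.5
```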
Remark 13.6.2 Since for generic L^<i>, Z ∩ L^<i> is empty for i > dim Z, we see
that the witness point supersets for all dimensions greater than dim Z are empty.
Let V_i be the union of all the i-dimensional irreducible components of V(f), and
let Ŵ_i be a witness superset for V_i. If V_i is not the maximal dimensional component
of V(f), then a linear space L^<i> will meet the higher dimensional components, and
Ŵ_i = W_i ∪ J_i                                               (13.6.11)
– If i ≥ N − rank f,
  * Compute S := HomSolve({R(f; N − i), ℒ}).
  * Let W := {s ∈ S | f(s) = 0}.
  * return(W, L).
– Else, return(W := ∅, L).
Theorem 13.6.3 For 0 ≤ i ≤ N, there is a Zariski open dense set U of the space
of coefficients (a, Λ) such that for (a, Λ) ∈ U, algorithm WitnessSupi returns a
witness point superset for the i-dimensional component of V(f).
For i = N, the algorithm uses the probabilistic null test to see if all the functions
in f are the zero polynomial. For all other i ≥ N − rank f, we solve a square system
of size N. The statement of the algorithm above uses an extrinsic formulation of
slicing. To work intrinsically, we just change a few lines, and in so doing, decrease
the size of the square system we solve to only N − i.
When i = N — 1, the system to be solved has only one variable, so the call to
HomSolve could be replaced by any other method for solving polynomials in one
variable.
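For a single polynomial on C², the randomize/slice/solve/filter pattern of WitnessSupi can be mimicked in a few lines. This is a toy sketch assuming NumPy, with univariate root-finding standing in for HomSolve; the function name and tolerances are our own illustrative choices:

```python
import numpy as np

def witness_superset_dim1(f_xy, total_deg, rng):
    """Sketch of WitnessSupi for i = 1 on one polynomial f(x, y) in C^2:
    append one random slice y = a*x + b, solve the resulting univariate
    polynomial (standing in for HomSolve), and keep only solutions with
    f = 0 (the filtering step of the algorithm)."""
    a, b = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    # Interpolate p(x) = f(x, a*x + b) from total_deg + 1 samples.
    xs = rng.standard_normal(total_deg + 1) + 1j * rng.standard_normal(total_deg + 1)
    coeffs = np.polyfit(xs, [f_xy(x, a * x + b) for x in xs], total_deg)
    S = [(x, a * x + b) for x in np.roots(coeffs)]
    return [s for s in S if abs(f_xy(*s)) < 1e-8]

rng = np.random.default_rng(11)
f = lambda x, y: y**2 - x**3
W1 = witness_superset_dim1(f, 3, rng)
assert len(W1) == 3    # deg V(y^2 - x^3) = 3; no junk in this pure 1-dimensional case
```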
With WitnessSupi available to find a witness superset for the i-dimensional
component of V(f), it is a simple matter to assemble a collection of such sets for
every possible dimension. To be explicit, we display the full algorithm, WitnessSuper,
below.
Recall that in the case of nonreduced components, we wish to include in our nu-
merical witness sets additional information to allow robust numerical computation
of the witness points, either a deflation formulation as in § 13.3.2 or a homotopy
formulation as in § 13.3.3. Clearly, a homotopy is available inside algorithm
WitnessSup; we merely have to return the information. A deflation formulation can
be returned if one is used in the endgame of HomSolve. Notice that deflation
can only work on the true witness points in the witness superset, because these
are isolated solutions, whereas the junk points are not. So before trying to deploy
deflation, we need to separate the junk from the witness points, which requires the
methods of Chapter 15.
13.6.1 Examples
In the following examples, the tables summarizing runs of algorithm WitnessSuper
have columns labeled as follows:
The number of function evaluations depends on details of the path tracker and
the endgame, including the various control settings they use. The figures reported
here are for the default settings in HomLab. These numbers will change slightly
in repeat runs, because the paths depend on random choices of the slices and of the
randomization used to square up a system. They are included to give a sense of where
the algorithm spends most of its effort.
Example 13.6.4 Consider the system given in Equation 12.0.1, whose first
polynomial is x(y² − x³)(x − 1).
This is consistent with the fact that V(f) has a degree 4 component at dimension 1,
decomposable into V(x) and V(y2 - x3). The superset at dimension 1 has four
points and no junk. At dimension 0, all paths must end on V(f) as there is no
slice involved. (This would not necessarily be true if / had more equations than
variables.) Of the 30 path endpoints, 28 are singular. The true witness points are
the two nonsingular points in the witness superset; the other 28 are junk. Junk
points are always singular, but it would be erroneous to conclude that singular
points in the superset are necessarily junk. In fact, if the factor (x − 1) in the
first equation were changed to (x − 1)², the zero-dimensional solution points would
become double points and therefore would be singular.
Example 13.6.5 The system

          [ x²y − xy − 2y ]
f(x, y) = [               ] = 0
          [ xy³ − y       ]

leads to the following results from WitnessSuper:
At dimension 1, we have one witness point for the set V(y). At dimension 0, two
of twelve paths go to infinity, leaving ten points in the witness superset. The six
singular points in the zero-dimensional witness superset are in fact junk: they all
have y = 0. The remaining four points are the finite isolated roots in V(f).
Example 13.6.6 Running WitnessSuper on the equations for SO(3), see Ex-
ample 13.4.4, one obtains the following table. Note that the rank test saves us from
trying to compute witness points for dimension 2, which would have required 192
paths.
In all the examples, we may observe that the number of function calls grows
as we descend dimensions. This is due both to an increase in the number of paths
(which grows geometrically) and also a general tendency for the number of calls
per path to increase. Not reflected in the table is the additional fact that the
number of variables climbs as we descend, so the linear solving routine used in
prediction/correction iterations will be more expensive. So, by every measure, the
bottom run is by far the most expensive. This underscores the importance of using
the rank of the system to eliminate low-dimensional runs.
In this section we follow (Sommese & Wampler, 1996) and show how the witness
supersets immediately give some numerical algorithms. Subsequent chapters will
present more efficient algorithms, so the main point here is to recognize the capa-
bilities that witness supersets make possible.
Note that the return condition must be true for i = N − dim_p Z, and it cannot be
true for smaller i. For i = 0, procedure LocalSlice amounts to just the probabilistic
null test on a point p′ near p.
This algorithm is conceptual in the sense that we have not specified a description
of Z or an implementation of LocalSlice. A numerical implementation of the
algorithm must deal with all of these. To that end, let us assume Z = V(f) for a
system of polynomials f : C^N → C^n. For a square polynomial system g : C^m →
C^m, the methods of Part II provide us with homotopy methods that give a list of
solution points in V(g) that includes all isolated points, so we may adapt this to
implement LocalSlice.
With these considerations in mind, we may construct a numerically viable
method for finding local dimension, as follows. Where overdetermined systems
appear in the algorithm, we use the randomization procedure of § 13.5 to reduce
to the same number of equations as unknowns. This version of the algorithm re-
places LocalSlice with a call to homotopy procedure S = HomSolve(g) for solving
square systems.
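As an illustration of the squaring-up step only (a minimal sketch; the function `square_up` and the example system are ours, not HOMLAB routines), the randomization of § 13.5 multiplies an overdetermined system by a random complex matrix, so every zero of the original system remains a zero of the squared system:

```python
import numpy as np

rng = np.random.default_rng(7)

def square_up(f, n_eqs, n_vars):
    # Multiply the overdetermined system by a random complex matrix,
    # giving as many equations as unknowns; every zero of f stays a zero.
    R = rng.normal(size=(n_vars, n_eqs)) + 1j * rng.normal(size=(n_vars, n_eqs))
    return lambda x: R @ f(x)

# Three equations in two unknowns with the common zero (1, 1):
f = lambda x: np.array([x[0] + x[1] - 2, x[0] * x[1] - 1, x[0]**2 - x[1]**2])
g = square_up(f, 3, 2)
print(np.linalg.norm(g(np.array([1.0, 1.0]))) < 1e-12)   # True
```

The converse does not hold: the randomized system can have extra isolated solutions not satisfying f, which is why they are later discarded.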
We have not dealt with multiplicities in this algorithm. Thus the algorithm
gives a way of deciding whether the reduced algebraic set defined by f = 0 is an algebraic
subset of the reduced algebraic set defined by g = 0.
This algorithm is a translation of the algorithm from van der Waerden's classic
(§93 to §98 van der Waerden, 1950). It is a strength of our numerical model of
generic points that they model the classical generic points closely enough that
such results of classical algebraic geometry translate without difficulty.
13.8 Summary
13.9 Exercises
Several of these exercises refer to routines from HOMLAB (see Appendix C). Routine
witsup.m implements algorithm WitnessSuper. If the system to be analyzed is
provided in tableau form, script wsuptab.m will sort the polynomials by descending
degree and then call witsup.m.
Exercise 13.2 (Inclusion Test) Use witsup.m to find witness points for the
twisted cubic, V(y − x^2, z − x^3), and also for V(xy − z^2, xz − y^2). Apply the
inclusion test to see if either of these contains the other.
Exercise 13.3 (Seven-Bar Linkage) Refer to Figure 9.5 and derive a set of six
equations similar to the ones in Equations 9.5.33–9.5.36, consisting of three loop
equations and three unit magnitude conditions. Compute a witness superset for
general link parameters (a_0, b_0, c_0, a_1, a_2, b_2, a_3, b_3, ℓ_4, ℓ_5, ℓ_6) ∈ C^11. Then repeat
the exercise arbitrarily choosing a_0 = −0.3 − 1i, c_0 = −1, a_1 = 0.28, b_2 = 0.37, ℓ_6 =
0.55, and setting the remaining parameters with the formulae b_0 = 0, a_2 = a_0 b_2/c_0,
a_3 = a_1, b_3 = a_1(a_0 − c_0)/a_0, ℓ_4 = ℓ_6 |a_0/c_0|, and ℓ_5 = |b_2|. Make a table like those
shown in § 13.6.1.
Chapter 14
A Cascade Algorithm for Witness Supersets
This chapter revisits the construction of a witness superset for the solution set of a
system f(x) = 0 of n polynomials on C^N, a topic addressed earlier in § 13.6. The
algorithm, WitnessSuper, from that section leaves room for improvement from both
a theoretical and a practical point of view. To understand why this might be so,
let us assume that we use total degree homotopies to solve the systems arising in
the algorithm. Without loss of generality, we may assume that we have squared up
the system, so n = N, and that we have sorted the polynomials f_i(x) from the system
f(x) by descending degree, so that letting d_i = deg f_i, we have d_1 ≥ ··· ≥ d_N.
Under these conditions, WitnessSuper tracks

∑_{j=0}^{N} ∏_{i=1}^{j} d_i

paths. In comparison, it is a classical fact, e.g., (12.3.1 Fulton, 1998), that given
the irreducible components Z_{ij} of V(f), with Z_{ij} occurring with multiplicity μ_{ij}, it
follows that

∑_{i,j} μ_{ij} deg Z_{ij} ≤ ∏_{i=1}^{N} d_i.
At first sight this does not look so terrible. In the case when all the d_i = 2,
there are 2^{N+1} − 1 paths to be tracked in the algorithm to find at most 2^N solutions. Since,
all other things being equal, computational work is proportional to the number of
paths followed, this amounts to only about twice as much work as is theoretically
needed. But all other things are not equal! Paths that do not lead to witness points
often end up going to singular solutions. This can be expensive.
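The two counts are easy to compare concretely. The sketch below assumes, as the 2^{N+1} − 1 total suggests, that the run at dimension i tracks ∏_{j=1}^{N−i} d_j paths (with an empty product, i.e., 1, for i = N); the helper names are ours:

```python
from math import prod

def witness_super_paths(degs):
    # WitnessSuper: one run per dimension i = N, ..., 0, where the run at
    # dimension i tracks prod(d_1, ..., d_{N-i}) paths (empty product = 1).
    N = len(degs)
    return sum(prod(degs[:k]) for k in range(N + 1))

def cascade_paths(degs):
    # The cascade of Sec. 14.1 tracks only the total-degree count.
    return prod(degs)

degs = [2] * 5                       # all d_i = 2 with N = 5
print(witness_super_paths(degs))     # 63 = 2^(N+1) - 1
print(cascade_paths(degs))           # 32 = 2^N
```

Sorting the degrees in descending order, as the text prescribes, minimizes the partial products in the first sum.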
In § 14.1, we explain an algorithm that follows only ∏_{i=1}^{N} d_i paths in the total
degree case. These paths are tracked in N stages, yielding at each stage a witness
superset for each successively smaller dimension, and hence, we call this the cascade
algorithm. In the worst case, all the paths survive to the end of the cascade, requir-
ing the equivalent of N ∏_{i=1}^{N} d_i paths to track, but in the typical case, many paths
terminate early in the process, making the algorithm relatively efficient. Moreover,
to survive to the next stage, a path must remain nonsingular, which helps keep
computational cost down and reliability high.
The version we present here differs slightly from its first appearance in
(Sommese & Verschelde, 2000). The most notable difference is the removal of slack
variables, which were never used in actual implementations. The new presenta-
tion draws on Theorem 13.2.2 to establish the genericity of the slicing hyperplanes,
removing any dependence on the order in which they are used.
For ease of reading, in this chapter we act as if components are reduced, e.g., we
talk about witness supersets (W_i, L_i, f) instead of (W_i, L_i, f, h_i(x,t), W_i(ε), ε). All
our arguments and algorithms hold for nonreduced components also. For example,
the cascade algorithm for witness supersets produces the h_i(x,t) whether the com-
ponents are reduced or nonreduced, and to obtain the W_i(ε) we would only need
to have the t variable in the homotopies in the cascade algorithm take on the value
t = ε for an appropriate small value of ε.
This short chapter has only two sections: the description of the cascade algo-
rithm in § 14.1, and presentation of some examples of its use in § 14.2.
Any solution u* of g(u) = 0 maps to a point x* = b + B·u* ∈ L, and such points that
also satisfy f(x*) = 0 are the witness points that we seek. Whatever the values
of n and N may be, we use this approach to convert the problem of analyzing
A Cascade Algorithm for Witness Supersets 257
t^{(i)} = (t_1, …, t_i, 0, …, 0).
By Theorem 13.2.2, there is a Zariski open dense set U ⊂ C^{N×(N+1)} such that for
A := [a_0 A_1] ∈ U all subsets of the N linear equations in the system

L(x) := a_0 + A_1 · x = 0

are generic with respect to all the irreducible components of V(f), where a_0 is the
first column of the N × (N+1) matrix A and A_1 is the remaining columns. The witness
point superset for dimension i is a finite set of points containing all isolated solutions
of the system

F(A, x, t^{(i)}) := [ f(x) ; T(t^{(i)}) · L(x) ] = 0,   where T(t^{(i)}) := diag(t_1, …, t_i, 0, …, 0),

for nonzero values of t_1, …, t_i. The zeros on the diagonal of T(t^{(i)}) knock out N − i
of the linear equations in L(x), leaving us with a system of N + i equations in
x ∈ C^N.
To obtain all isolated solutions of F(A, x, t) = 0 by homotopy methods, we
need a square system. Theorem 13.5.1 tells us that there is a Zariski open dense
set U ⊂ C^{N×N} such that for Λ ∈ U the isolated solutions of F(A, x, t) = 0 are
contained in those of

ℰ(Λ, A, x, t) := f(x) + Λ · T(t) · L(x) = 0.
Accordingly, a witness point superset for the i-dimensional component of V(f) can
be found by computing all isolated solutions to

ℰ_i(Λ, A, x) := ℰ(Λ, A, x, t^{(i)}) = 0.
For clarity, let's denote the k-th row of L(A, x) as L_k(x) and the k-th entry in f(x)
as f_k(x), so that we write ℰ(Λ, A, x, t) expanded as

ℰ(Λ, A, x, t) := [ f_1(x), …, f_N(x) ]^T + Λ · [ t_1 L_1(x), …, t_N L_N(x) ]^T.   (14.1.6)
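A small numeric sketch of this embedding (the helper names f, L, and E below are ours) shows how trailing zero t-values deactivate the corresponding slices: a point on a one-dimensional component of V(f) that also satisfies the first slice solves ℰ for t = (t_1, 0):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2

# Toy square system f : C^2 -> C^2 whose zero set contains the line y = 0.
def f(x):
    return np.array([x[1] * (x[0] - 2), x[1] * (x[0] + 1)])

A = rng.normal(size=(N, N + 1)) + 1j * rng.normal(size=(N, N + 1))   # [a_0 A_1]
Lam = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))         # randomizer

def L(x):
    # The N generic slicing equations a_0 + A_1 . x.
    return A[:, 0] + A[:, 1:] @ x

def E(x, t):
    # Embedding of Equation 14.1.6: f(x) + Lam . (t_k * L_k(x)).
    return f(x) + Lam @ (np.asarray(t) * L(x))

# With t = (t_1, 0) only the first slice is active, so a point on the
# component y = 0 that also satisfies L_1 = 0 solves E for any t_1.
x_w = np.array([-A[0, 0] / A[0, 1], 0.0])     # L_1(x_w) = 0 and f(x_w) = 0
print(np.linalg.norm(E(x_w, (0.7, 0.0))) < 1e-10)   # True
```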
We summarize what we have done, with some additional useful conclusions, in
the following theorem, carrying over the notation of the preceding paragraphs.
Theorem 14.1.1 For a given polynomial system f : C^N → C^N, there is a Zariski
open dense set U ⊂ C^{N×N} × C^{N×(N+1)} such that for (Λ, A) ∈ U and any integer i
satisfying 0 ≤ i ≤ N, it follows that
(1) a witness point superset for all i-dimensional components of V(f) is a subset
of the isolated solutions of ℰ_i(Λ, A, x) = 0; and
(2) if x′ is a solution of ℰ_i(Λ, A, x) = 0 then either:
(a) x′ is in a component of V(f) of dimension at least i, and L_k(x′) = 0 for all
1 ≤ k ≤ i; or
(b) x′ is an isolated nonsingular solution of ℰ_i(Λ, A, x) = 0 and L_k(x′) ≠ 0 for
all 1 ≤ k ≤ i, and the number of such x′ is the same for all (Λ, A) ∈ U.
Proof. If we show that for each specific choice of i, a Zariski open dense set U_i ⊂
C^{N×N} × C^{N×(N+1)} exists with the above properties, then the intersection of the
U_i may be taken as U. Therefore we can assume without loss of generality that we
have a specific i.
The first assertion of the theorem and item (2a) both follow from the generic
slicing result in Theorem 13.2.2, the randomization technique of Theorem 13.5.1,
and the logic given in the paragraphs leading up to the theorem. Item (2b) is a
direct application of Bertini's Theorem A.8.7. □
Having embedded all the systems of interest into ℰ(Λ, A, x, t), we now turn to the
cascade for solving the ℰ_i(Λ, A, x) = 0 as i descends from N to 0. With probability
one, a random choice of (Λ, A) satisfies the genericity conditions of Theorem 14.1.1.
Choosing them so, we consider them fixed and suppress them from our notation,
hence writing ℰ(x, t) for the embedding and ℰ_i(x) for the i-th embedded system.
Define the level i nonsolutions as the set of solutions x′ of ℰ_i(x) = 0 with
L_i(x′) ≠ 0. Denote these by M_i. They depend on the choice of Λ and A, but by
Theorem 14.1.1, the number of them, which we denote ν_i, is independent of (Λ, A) ∈ U.
Each ℰ_i(x) is in the family of systems ℰ(x, t) for a particular value of t^{(i)} ∈ C^N.
Moreover, holding t^{(i−1)} fixed but letting t_i vary, we can view ℰ_i(x; t_i) = 0 as a
parameterized family of systems which includes as a special case ℰ_{i−1}(x) = ℰ_i(x; 0).
By the principles of parameter continuation, see § 7.4, if we can solve ℰ_i(x; t_i) = 0
for a generic t_i ∈ C, then we can use those solutions as start points in a homotopy
H_{ji}(x, s) := ℰ(x, (t_1, …, t_j, s t_{j+1}, …, s t_i, 0, …, 0)) = 0   (14.1.7)
The Cascade Algorithm simply asserts that by tracking the level i nonsolutions
using the homotopy H_{ji}(x, s) = 0 of Equation 14.1.7, we get the level j nonsolutions
plus a witness point superset for the dimension j components of V(f).
The randomness of t^{(i)} in the homotopy of Equation 14.1.7 simplifies the proof
of Theorem 14.1.2, but in practice all the t_i can be 1. They are just scaling factors
on the linear equations (see Equation 14.1.6), and since the linear coefficients are
already chosen generically in the matrix A, the generic choice of t^{(i)} is redundant.
For the same reason, when we track paths from s = 1 to s = 0, we may do so on
the real interval (0, 1] with a probability of success equal to one (see Lemma 7.1.2).
Assume that we know by some other means that dim V(f) ≤ K < N. Then,
we can start the cascade by solving ℰ_K(x) = 0 using any homotopy that will find
all isolated solutions, for example, a total degree homotopy. We can check for the
trivial case when V(f) = C^N using the probabilistic null test, and so we usually
start at level K = N − 1. Alternatively, one might use the algorithm TopDimen
from § 13.7.1 to determine a lower starting dimension for the cascade.
A final important note: at the top of the section, we began by squaring up the
system f to size r = rank f. Theorem 14.1.1 applies to the square system; call it f′
and its witness superset Ŵ. If the original system f has more than r equations,
then Ŵ may include points which do not satisfy f. We simply discard these.
For simplicity, we state the algorithm concentrating on the witness point sets.
The linear slicing equations are easily constructed from b, B, and A.
14.2 Examples
Example 14.2.1 For the system

f(x, y, z) := [ x(y^2 − x^3)(z − 1) ; ⋯ ] = 0,

the cascade results are as follows. There is a new column called "fail" to record
that some paths did not converge well.
Example 14.2.2 The system

f(x, y) := [ x^2 y − xy − 2y ; 3xy^4 − y ] = 0

leads to the following results from the cascade:
Example 14.2.3 Running the cascade on the equations for SO(3), we obtain
For each stage after the first, the nonsolutions of the previous stage become the
start points of the next, so the number of paths can only decrease at each stage.
Examples like the SO(3) problem are the worst case for the cascade as far as the
total number of paths is concerned, because all the paths survive to the last stage.
A saving grace is that the number of function evaluations per path falls dramatically
after the top dimension. We can only surmise that the initial homotopy between a
generic start system and our sliced target has longer, perhaps more twisted, paths,
while the cascade homotopies connect highly related systems, so that the paths are
short and relatively straight. The rise in nfe for the final dimension of the SO(3)
problem is due to the solutions at infinity being singular, thus requiring a more
expensive endgame to compute them accurately.
Comparing these tables to the ones in § 13.6.1, we see that Cascade consis-
tently tracks more paths than WitnessSuper, but the total number of function
evaluations is almost the same. We experience some numerical difficulty on the first
cascade example, but it still returned a correct witness superset. There is one clear
difference in performance: Cascade returns a smaller superset than WitnessSuper
on each of these examples. This means the supersets contain fewer junk points.
This is particularly notable in the zero-dimensional sets for Example 14.2.1, for
which WitnessSuper gave a set of 30 points containing 28 junk points, while
Cascade gave a set of only 9 points containing 7 junk points. When we move on
to computing a numerical irreducible decomposition, the first step is to remove the
junk points. It is quite advantageous to have fewer of them at the outset.
14.3 Exercises
Exercise 14.1 (Comparisons) Run Example 14.2.2 and Example 14.2.3 using
HOMLAB. Do so both using witsup.m, an implementation of WitnessSuper of the
previous chapter, and using cascade.m, an implementation of Cascade. Compare
run times for the two methods. Use the profiler tool in Matlab to track which routine
is using the most computation. If you are using the "tableau" format, supplied for
both examples, see how much you can improve performance by writing an efficient
straight-line program.
Chapter 15
The Numerical Irreducible Decomposition
Let Z be an affine algebraic set in C^N. This means that Z is the solution set of some
system of polynomials f : C^N → C^n, i.e., Z = V(f). In a typical situation, we start
with f as given, and we seek to find a description of its solution set. In other cases,
such as we address in the next chapter, Z may be only a portion of the full solution
set of the polynomials on hand. But for the moment, it does no harm to think of Z
as the full solution set V(f). No matter its origins, Z has a decomposition into its
pure-dimensional parts Z_i, i.e., Z = ∪_{i=0}^{dim Z} Z_i with dim Z_i = i. Furthermore, each
Z_i can be decomposed into irreducible pieces Z_{ij}, i.e., Z_i = ∪_{j∈I_i} Z_{ij}, where each
Z_{ij} is a distinct irreducible component and the index sets I_i are finite.
Our goal is to find a numerical irreducible decomposition, that is, we wish to
find witness point sets, W_{ij} := Z_{ij} ∩ L^{c(i)}, for each irreducible component Z_{ij} of Z,
where L^{c(i)} is a generic linear space of codimension i. From Chapters 13 and 14,
we have algorithms WitnessSuper and Cascade that, given a polynomial system
f, find a witness superset for V(f). That is, for each dimension i, they give a set
Ŵ_i ⊇ W_i := Z_i ∩ L^{c(i)}, which contains all the witness points for all the irreducible
components of dimension i along with some possible junk points. Accordingly, our
goal becomes to find the breakups

Ŵ_i = W_i ∪ J_i = (∪_{j∈I_i} W_{ij}) ∪ J_i   (15.0.1)

where J_i ⊂ (∪_{j>i} Z_j) ∩ L^{c(i)}. To achieve this, we show
• how to trim the junk points out of the witness supersets, Ŵ_i, to obtain the
witness sets, W_i, i = 0, …, dim Z, and
• how to decompose a witness set, W_i, into its irreducible components, W_{ij}.
In the sequel, Chapter 16, we present methods for finding witness supersets for the
intersection of algebraic sets. The methods of the current chapter will apply equally
well to those witness supersets.
One way to approach the processing task is to employ a membership test. Junk
points in a witness superset at dimension i are members of a component of dimension
greater than i, so one way of detecting them is to start at the highest dimension
and work down, eliminating any points found to be members of higher-dimensional
components.
We will outline three variations on a procedure to complete this task, each based
on a different type of membership test. The details of how to implement the tests
follow in subsequent sections.
In this and the following sections, we denote the witness superset for dimension
i as Ŵ_i, which is composed of a witness set W_i for dimension i plus, possibly, some
junk points, J_i. In addition to witness points, witness sets and witness supersets
carry along linear slicing planes and some description of Z in a form that allows
witness points to be refined numerically. When we speak of a point w ∈ Ŵ_i, it is
implied that w is in the witness point set for Ŵ_i.
Before employing membership tests, we reduce the amount of work by partially
categorizing the points in the witness superset. The first observation is that all
points in the top-dimensional witness superset are true witness points: there is no
junk in the top dimension. This is because, by definition, the junk points in a witness
superset for an i dimensional component of Z must lie in some higher-dimensional
component of Z.
A second observation is that any nonsingular points in Ŵ_i must be true witness
points. Assume that Z = V(f), where f is a system of polynomials. A point w ∈ Ŵ_i
lies in Z ∩ L^{c(i)}, so letting f_{L^{c(i)}}(x) denote the restriction of f to the linear space
L^{c(i)}, we have f_{L^{c(i)}}(w) = 0. Then, w is nonsingular if the Jacobian matrix of
partial derivatives for f_{L^{c(i)}} has full column rank.¹ For this purpose, it does not
matter whether the linear slice is represented extrinsically or intrinsically. (See
§ 13.2.1 for explanation of these terms.) Nonsingularity implies that the point is an
isolated point of Z ∩ L^{c(i)}. In contrast, junk points in Ŵ_i lie in a component of Z
of dimension greater than i and hence in a component of Z ∩ L^{c(i)} of dimension at
least one.
The final observation builds on the second. A point w ∈ Ŵ_i is a true witness
point if, and only if, it is an isolated solution to f_{L^{c(i)}}(x) = 0. Any test of local
dimension can serve to distinguish between junk points and witness points. If a point
w ∈ Z ∩ L^{c(i)} ⊂ C^N is not isolated, then the slice Z ∩ L^{c(i)} must intersect a closed
hypersurface surrounding w. Interval arithmetic might be used to find that the point
is isolated by showing that none of the 2N faces of a rectangular box enclosing w
¹ We use the usual convention that rows of the Jacobian correspond to functions in f and
columns correspond to variables.
contains a solution. Alternatively, a heuristic like the one in (Kuo et al., 2004) could
be used to find nearby points on Z ∩ L^{c(i)}, thereby showing that the point is not
isolated and must be junk.
Taking such observations into account, we mark some points in Ŵ as true witness
points, we discard any points known to be junk, and we mark the remaining ones
as needing further investigation. If local dimension testing of the sort mentioned
in the previous paragraph is reliably complete, then no questionable points remain,
but we do not count on that outcome in what follows.
We can complete the decomposition with one of several types of membership
test. The first of these has the following inputs and outputs.
This membership test yields a complete algorithm for the numerical irreducible
decomposition of a witness superset as follows.
On each pass through the main loop, at least one point w is removed from Ŵ_k,
so eventually it is emptied out and the algorithm descends to the next dimension.
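The descending loop can be sketched abstractly as follows, with the membership test left as an oracle; the function names and the toy data below are hypothetical, not part of the book's algorithms:

```python
def junk_remove(supersets, member):
    # supersets: {dimension: set of points}, as produced by a witness
    #   superset algorithm; member(w, W, i) is an oracle answering whether
    #   point w lies on the i-dimensional component carrying witness set W.
    dims = sorted(supersets, reverse=True)
    witness = {dims[0]: set(supersets[dims[0]])}   # no junk at the top
    for k in dims[1:]:
        # a point at dimension k is junk iff it lies on some higher-
        # dimensional component; keep only the genuine witness points
        witness[k] = {w for w in supersets[k]
                      if not any(member(w, witness[i], i)
                                 for i in witness if i > k)}
    return witness

# Toy run: 'a', 'b' are genuine 0-dim witness points; 'j1', 'j2' lie on the
# 1-dim component represented by {'p', 'q'} and are junk.
on_curve = {'j1', 'j2', 'p', 'q'}
member = lambda w, W, i: w in on_curve
out = junk_remove({1: {'p', 'q'}, 0: {'a', 'b', 'j1', 'j2'}}, member)
print(sorted(out[0]))   # ['a', 'b']
```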
The Numerical Irreducible Decomposition 269
With this test available, one can remove all junk points as follows. Remember
that the top dimensional component of a witness superset contains no junk points.
With the junk removed, it remains to partition the witness sets at each dimension
into irreducible witness sets. The monodromy method, though not complete on its
own, is useful for this task.
To make the monodromy approach complete, we may employ a trace test. This
test can also be used on its own, without monodromy, to find irreducible decompo-
sitions. Its format is as follows.
and shows that they have the properties that we rely on here.
The fundamental capability upon which all the membership tests depend is the
ability to sample a component given a witness point on it. Recall from § 13.3
that as a theoretical construct, witness points are the isolated points of intersection
between an affine algebraic set and a generic linear space. To sample a component,
we simply move the linear space in continuous fashion, i.e., move it along a real-
one-dimensional path through the Grassmannian of linear spaces. As long as the
prescribed path avoids a proper algebraic subset of nongeneric linear spaces, the
intersection with the component remains isolated and defines a real-one-dimensional
path of points on the component. Sampling is just the process of setting up such
paths and following the points of intersection. Suppose the algebraic set under study
is Z, and let L(s) denote a continuous path of linear spaces that are generic with
respect to Z for s ∈ (0, 1]. Then, x(s) := Z ∩ L(s) is a path of isolated points, with
a well-defined endpoint x(0) = lim_{s→0} x(s). When we choose L(0) to be generic
also, then x(0) is a new sample point lying on the same component as x(1) and on
no other component of Z.
As a numerical construct, witness sets carry along extra information that allows
a numerical approximation to a witness point to be refined to higher precision. This
same information allows us to update the witness point when the slicing plane is
moved slightly, hence we can numerically follow the path x(s). The details vary
according to whether the component is reduced, deflated or nonreduced.
A linear space can be represented extrinsically as a set of linear equations L(x) =
a + A·x = 0 or intrinsically as x(u) = b + B·u. In the extrinsic form, a linear
interpolation between two such spaces of the same dimension, say L_1(x) and L_0(x),
can be written as

L(x, s) := s L_1(x) + (1 − s) L_0(x).

If the coefficients of L_1(x) and L_0(x) are chosen at random, then L(x, s) = 0 defines
a linear space of that same dimension for all s ∈ [0, 1], with probability one.
Intrinsically defined paths work in an analogous way, so we don't write out the
details. In the rest of this section, we write only extrinsic formulations, but it
should be understood that intrinsic ones can be used instead, usually with some
increase in efficiency for implementations.
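The sketch below samples the circle x² + y² = 1 in C² by moving a slice from one random complex line to another, applying plain Newton corrections at each step of s; all names are ours, and a real implementation would add a predictor and adaptive step control:

```python
import numpy as np

rng = np.random.default_rng(1)

def line(L, x, y):
    return L[0] + L[1] * x + L[2] * y

def F(z, s, L1, L0):
    # circle x^2 + y^2 = 1 sliced by the interpolated line L(x, s)
    x, y = z
    return np.array([x**2 + y**2 - 1,
                     s * line(L1, x, y) + (1 - s) * line(L0, x, y)])

def J(z, s, L1, L0):
    x, y = z
    c = s * L1 + (1 - s) * L0
    return np.array([[2 * x, 2 * y], [c[1], c[2]]])

def start_point(L):
    # exact witness point of circle ∩ line: substitute x = -(a + c y)/b
    a, b, c = L
    y = np.roots([c**2 / b**2 + 1, 2 * a * c / b**2, a**2 / b**2 - 1])[0]
    return np.array([-(a + c * y) / b, y])

def track(z, L1, L0, steps=200):
    # move the slice from L1 (s = 1) to L0 (s = 0), correcting as we go
    for s in np.linspace(1.0, 0.0, steps):
        for _ in range(4):
            z = z - np.linalg.solve(J(z, s, L1, L0), F(z, s, L1, L0))
    return z

L1 = rng.normal(size=3) + 1j * rng.normal(size=3)
L0 = rng.normal(size=3) + 1j * rng.normal(size=3)
z0 = track(start_point(L1), L1, L0)
print(np.linalg.norm(F(z0, 0.0, L1, L0)) < 1e-6)   # True: new sample on Z ∩ L0
```

Because the lines are random complex, the path of slices avoids the nongeneric set with probability one, as the text explains.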
(1) p_d(e) ≠ 0, in which case there are no elements of P_d(C^N) vanishing on X and
deg X must be greater than d; or
(2) p_d(e) = 0, which by genericity implies that p_d(x) vanishes identically on X. In this
case, deg X ≤ d, and if this is the smallest d for which such a polynomial exists,
we know that deg X = d, and in fact, X = V(p_d). Consequently, we have a
membership test: y ∈ X if, and only if, p_d(y) = 0.
Thus, we may proceed progressively d = 1, 2, … until we find a d for which there is
a polynomial p_d(x) vanishing on X. We know that deg X is at most the cardinality
of the witness set for dimension N − 1, which limits the complexity of the method.
Now assume that 0 < k < N − 1. Take a generic linear projection π : C^N →
C^{k+1}. We know by Theorem A.10.5 that π is generically one-to-one and proper on
X, and in particular, that π(X) is an algebraic hypersurface with deg π(X) = deg X.
We sample X as usual and project each sample point x to y = π(x) ∈ C^{k+1}. Just
as above, we now find a polynomial q_d(y) of minimal degree that vanishes on the
projected samples and we conclude that π(X) = V(q_d).
Any point x′ ∈ π^{−1}(π(X)) satisfies q_d(π(x′)) = 0, so at first blush, q_d does
not seem adequate for testing membership in X. However, it is sufficient for testing
membership for a finite set F ⊂ C^N, because a general projection such as π has the
property that for all x* ∈ F, π(x*) ∈ π(X) if, and only if, x* ∈ X. So choosing the
projection at random, we have a probability-one membership test for points x* ∈ F:
x* ∈ X if, and only if, q_d(π(x*)) = 0. This is all we need for algorithm Member1.
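A minimal numeric sketch of this interpolation test for the twisted cubic (k = 1, N = 3): project samples to C², fit a vanishing polynomial of least degree via an SVD nullspace, and use it for membership. All helper names below are ours:

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(5)

def monomials(y, d):
    # all monomials of total degree <= d in the entries of y
    vals = []
    for deg in range(d + 1):
        for combo in combinations_with_replacement(range(len(y)), deg):
            m = 1.0
            for i in combo:
                m = m * y[i]
            vals.append(m)
    return np.array(vals)

def fit_vanishing_poly(samples, d, tol=1e-8):
    # return coefficients c with c . monomials(y, d) = 0 on all samples,
    # or None if no such polynomial of degree <= d exists
    V = np.array([monomials(y, d) for y in samples])
    _, sv, Vh = np.linalg.svd(V)
    return Vh[-1].conj() if sv[-1] < tol * sv[0] else None

# Sample the twisted cubic X = {(t, t^2, t^3)} and project to C^2.
P = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
ts = rng.normal(size=30) + 1j * rng.normal(size=30)
samples = [P @ np.array([t, t**2, t**3]) for t in ts]

for d in range(1, 5):
    c = fit_vanishing_poly(samples, d)
    if c is not None:
        break
print(d)   # 3 = deg X, as the theory predicts

q = lambda x: c @ monomials(P @ x, d)
t = 0.37 + 0.2j
print(abs(q(np.array([t, t**2, t**3]))) < 1e-6)   # True: on X
print(abs(q(np.array([1.0, 2.0, 5.0]))) > 1e-3)   # True: off X
```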
The main problem with this approach is that the number of monomials, the binomial
coefficient (k+1+d choose d), grows rapidly with the
dimension k and degree d of the component. Also, fitting polynomials of high degree
know which points match which pieces. It doesn't matter as far as the membership
test is concerned; if we track all the witness points, we still get all the endpoints,
without knowing which ones are on which piece.
According to the above, we may now write pseudocode for Member2 as follows.
Obviously, we can save some computation by tracking the paths one at a time
and returning a positive result as soon as one ends on y. The worst case is when
y ∉ X, because then we always have to track all the paths to find this out.
15.4.1 Monodromy
The same principle underlying the homotopy membership test leads directly to the
concept of monodromy. In our context, the basic idea is that if L(s) ⊂ G \ G* is
a one-real-dimensional closed loop, that is, L(0) = L(1), then the set of witness
points at s = 1 is equal to those at s = 0, i.e., W = X ∩ L(1) = X ∩ L(0). This
is true both when X is irreducible and when it is the union of irreducibles. What
makes this useful to us is that, although the set of points is the same, the paths
leaving at s = 1 may arrive back at s = 0 in permuted order. A path beginning at
point u ∈ W and arriving at point v ∈ W with u ≠ v demonstrates that u and v
are in the same irreducible component. This is just the homotopy membership test
applied on a closed loop.
When we begin with a witness set W for a pure-dimensional component X,
such as would be generated by successive application of algorithms Cascade and
JunkRemove, we do not know how many irreducible components X contains.
Any partition of the points is possible, from every witness point lying in its own
linear component to all witness points on the same component of degree #W.
Each connection between distinct witness points found by monodromy restricts
the possible break up. This is how algorithm Monodromy is used in algorithm
IrrDecompPure of § 15.1. Pseudocode for the monodromy algorithm follows.
and we wish to express w as a function of z, i.e., we wish to make sense out of the
expression √z with z ∈ C. We would like this to be a global function, but this
is not possible in any continuous way. Let us assume it is possible and see what
goes wrong. At z = 1, we need √1 to be set to either 1 or −1. Let's assume that
√1 is set to 1; the case of −1 is identical. For z = e^{iθ} we have either √z = e^{iθ/2}
or √z = −e^{iθ/2}. Since √1 = √(e^{i0}) we conclude by continuity that √z = e^{iθ/2}.
The trouble comes when we go full circle and reach √(e^{i2π}). By continuity we have
√(e^{i2π}) = e^{iπ} = −1.
The easiest classical solution of the problem of defining √z (or ln(z) for that
matter) is to slit the plane from 0 to −∞, e.g., remove the real numbers from 0 to
−∞ from C. On the slit plane there are two "branches" of √z. One has √1 set to
1 and the other has √1 set to −1. Similarly, setting a more complicated polynomial
p(z, w) = 0 with the w-degree equal to d, we will have functions w = q_i(z) for
i = 1, …, d solving p(z, w) = 0. Each branch of the solution is defined on
an appropriately slit region of the plane. Analytic continuation is the classical
name for the process of extending the function, e.g., extending √z defined in a
small neighborhood of 1 to a function on a larger region. Hille has a nice detailed
discussion of analytic continuation (Chapter 10 Hille, 1962).
Notice that trying to define √z and tracking √(e^{iθ}) as z goes around the unit
circle leads to a permutation of the set {1, −1} of roots of z^2 = 1. Looking at this
a bit more abstractly, we have that w^2 − z = 0 defines an algebraic curve X in C^2.
Projection to the z variable gives a two-sheeted branched cover π : X → C. Over
C* := C \ {0}, we have that π : X \ {(0, 0)} → C* is a two-sheeted unramified cover
with the fiber over a point z being (z, w), with w running over the "two square
roots" of z. The fundamental group of C* is the additive group Z, and we have the
monodromy action of Z on the fiber of π over a fixed basepoint, e.g., 1. The even
elements of Z leave {1, −1} fixed and odd integers send 1 to −1 and −1 to 1.
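This permutation is easy to observe numerically: track both sheets of w² − z = 0 by Newton correction as z runs once around the unit circle (a sketch only; a real tracker would add a predictor):

```python
import numpy as np

def newton(w, z, steps=8):
    # correct w toward the root of w^2 - z nearest the current sheet
    for _ in range(steps):
        w = w - (w**2 - z) / (2 * w)
    return w

sheets = np.array([1.0 + 0j, -1.0 + 0j])          # fiber over z = 1
for theta in np.linspace(0.0, 2 * np.pi, 200)[1:]:
    z = np.exp(1j * theta)
    sheets = np.array([newton(w, z) for w in sheets])

print(np.allclose(sheets, [-1, 1]))   # True: going once around swaps the sheets
```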
How does this apply to decomposing an algebraic set into its irreducible com-
ponents? Let's assume we have a pure k-dimensional affine algebraic set X ⊂ C^N.
Let π : X → C^k denote the restriction to X of a generic linear projection from C^N
to C^k. Note that by genericity we conclude from the Noether Normalization The-
orem 12.1.5 that π is a proper d := deg X branched covering of C^k. The union of
the set where π is not a covering and the set where X is not a manifold forms a proper algebraic
subset X′ ⊂ X with dim X′ < dim X. Since π is proper, we know that π(X′) is an alge-
braic subset of C^k by the proper mapping theorem A.4.3. Moreover, since the fibers
of the map π are finite, we know that dim π(X′) = dim X′ < dim X = k. Thus, let-
ting X = X_1 ∪ ⋯ ∪ X_r denote the decomposition of X into irreducible components,
we have that Y := C^k \ π(X′) and X°_i := X_i \ π^{−1}(π(X′)) are all irreducible and
connected. Moreover, letting X° equal the manifold ∪_{i=1}^{r} X°_i, the map π : X° → Y is
a d-sheeted unramified covering map.
Fix a basepoint y* ∈ Y and consider the monodromy action of π_1(Y, y*), the
fundamental group of Y with basepoint y*, on F := π^{−1}(y*). Note we have a
decomposition

F = F_1 ∪ ⋯ ∪ F_r,   (15.4.5)

where F_i := F ∩ X°_i, and the decomposition into the orbits of the monodromy action,

F = F′_1 ∪ ⋯ ∪ F′_{r′}.   (15.4.6)

Since the X°_i are connected, we see that the decomposition given in Equation 15.4.5
is compatible with F = ∪_{j=1}^{r′} F′_j in the sense that each F′_j is a subset of one of the
F_i. The immediate question that raises itself is:
Question 15.4.1 Do we have r = r′, and is each F′_j equal to one of the F_i?
Question 15.4.2 Is there a cheap way of checking that the breakup of F into the
F′_j equals the breakup of F into the F_i?
The answer to this is yes. Based on Theorem A.12.2, the trace test to certify the
breakup is explained in § 15.5.
Remark 15.4.3 (Monodromy over general bases) Everything we said above works
equally well for F := p^{−1}(y), where p : X → Y is a proper finite-to-one covering
map from a pure-dimensional quasiprojective manifold X onto a connected quasi-
projective manifold Y and y is a point we treat as a basepoint.
p(x) = ∏_{i=1}^{d} (x − λ_i) = ∑_{i=0}^{d} (−1)^i t_i x^{d−i},   (15.5.7)

where the t_i are elementary symmetric functions of the roots, i.e., t_0 := 1, and for
i > 0,

t_i := ∑_{1 ≤ j_1 < ⋯ < j_i ≤ d} λ_{j_1} ⋯ λ_{j_i}.
The parameterized versions of these t_i are the traces we are interested in. Before
we turn to the parameterized situation, let us note an interpretation of the above
t_i as traces of matrices. Recall from linear algebra that the trace of a matrix is the
sum of its diagonal elements. Let

A := diag(λ_1, …, λ_d).

The trace of A is clearly t_1. The matrix A induces linear transformations Λ^i A of
the exterior products Λ^i C^d. Using the basis

{e_{j_1} ∧ ⋯ ∧ e_{j_i} | 1 ≤ j_1 < j_2 < ⋯ < j_i ≤ d},

where e_k is the d-tuple with zero entries in all places but the k-th place, where there
is a 1, we see that the trace of Λ^i A is t_i.
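A quick numeric check of this interpretation (the helper code is ours): the trace of Λ^i A, computed as the sum of products over i-element index subsets, reproduces the coefficients of ∏(x − λ_i):

```python
import numpy as np
from itertools import combinations
from math import prod

lam = np.array([2.0, -1.0, 3.0, 0.5])      # eigenvalues of A = diag(lam)
d = len(lam)

# t_i = trace of the induced map on the i-th exterior power of C^d:
# diag(lam) acts diagonally on the wedge basis, with entries prod over subsets.
t = [sum(prod(lam[j] for j in S) for S in combinations(range(d), i))
     for i in range(d + 1)]                 # t_0 = 1 (empty product)

# Cross-check the expansion prod(x - lam_i) = sum (-1)^i t_i x^(d-i):
coeffs = np.poly(lam)                       # [1, -t_1, t_2, -t_3, t_4]
print(np.allclose(coeffs, [(-1)**i * t[i] for i in range(d + 1)]))   # True
```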
(w − λ(x_1)) ⋯ (w − λ(x_d)).

Expanded we have

∑_{i=0}^{d} (−1)^i t_i w^{d−i},

where t_0 = 1 and t_i for i > 0 denotes the elementary symmetric function

∑_{1 ≤ j_1 < ⋯ < j_i ≤ d} λ(x_{j_1}) ⋯ λ(x_{j_i}).
smooth, and Remark A.2.6 when Y is merely normal, we conclude that tr_{π,i}(λ)(y)
extends to Y. This gives a holomorphic extension of tr_{π,i}(λ)(y) to all of Y, which
we also denote tr_{π,i}(λ). The functions tr_{π,i}(λ)(y) are algebraic functions. This is a
consequence of the characterization (discussed briefly in § A.1) of algebraic functions
by their growth.
The equation corresponding to the relation between the t_i and the λ_i in § 15.5.2
is the key equation

∑_{i=0}^{d} (−1)^i tr_{π,i}(λ)(y) λ(x)^{d−i} = 0.   (15.5.10)
Though √(z_1^3) is not well-defined, the unordered pair {√(z_1^3), −√(z_1^3)} is well defined,
and thus tr_{π,1}(λ_1(z_1, z_2))(z_1) is well-defined.
Consider the function λ_1(z_1, z_2) := z_2. Substituting into Equation 15.5.11, the
first trace of the function z_2 is found to be

tr_{π,1}(z_2)(z_1) = √(z_1^3) + (−√(z_1^3)) = 0.

Note that √(z_1^3) is only well defined if we choose a branch of the square root, but
whichever branch we choose, we have 0. Similarly

tr_{π,2}(z_2)(z_1) = √(z_1^3) · (−√(z_1^3)) = −z_1^3.

Substituting these traces into the key equation, Equation 15.5.10, gives

0 = (1)z_2^2 − (0)z_2^1 + (−z_1^3)z_2^0 = z_2^2 − z_1^3.
Note the linear projection given above is far from generic, e.g., if the linear
projection was generic, we know that the degree of the projection restricted to X
would equal the degree of X, i.e., deg V(z_2^2 − z_1^3) = 3.
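The key equation is easy to verify numerically on this example (a sketch with our own variable names): for a random z_1, the traces 1, 0, and −z_1³ recover z_2² − z_1³ = 0 on both sheets of the fiber:

```python
import numpy as np

rng = np.random.default_rng(11)

# Fiber of z2^2 = z1^3 over a random z1: the two branches +/- sqrt(z1^3).
z1 = rng.normal() + 1j * rng.normal()
w = np.sqrt(z1**3)                        # one branch; the fiber is {w, -w}
tr0, tr1, tr2 = 1.0, w + (-w), w * (-w)   # traces: 1, 0, -z1^3
for z2 in (w, -w):
    val = tr0 * z2**2 - tr1 * z2 + tr2    # key equation, Eq. 15.5.10
    print(abs(val) < 1e-12)               # True on both sheets
```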
Now assume that F′ is not equal to any of the sets F_i. It must be properly
contained in one of the F_i, say F′ ⊂ F_1. Let q_1, …, q_b be the points of F′ and
let q_0 be a point of F_1 \ F′. It follows from Theorem A.12.2 that there is a path
c : S^1 → C, where S^1 is the set of complex numbers of absolute value 1, with c(1) = 0,
such that monodromy under L_0 + c(t)v takes q_1, q_2, …, q_b to q_0, q_2, …, q_b. Since
\sum_{q \in F'} \lambda(q(s)) is linear in s, we conclude that
Remark 15.5.2 The usefulness of linear traces has been well recognized, e.g.,
(Sasaki, 2001).

In (Rupprecht, 2004) it was asserted, in the codimension-one case, that the converse
holds: that \sum_{q \in F'} \lambda(q(s)) being linear in s implies that F′ is one of the F_i. That
proof, as explained in (Sommese et al., 2002b), has two serious gaps.
Theorem 15.5.1 justifies our use of the linear trace test in the numerical irre-
ducible decomposition algorithm IrrDecompPure of § 15.1. We are ready to state
the procedure for the algorithm Trace, as follows.
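The linear trace test at the heart of Trace can be sketched in a few lines. In this hypothetical illustration the component is the unit circle x^2 + y^2 = 1, the moving slice is the line y = s, and linearity in s is checked by sampling at three equally spaced values (none of these choices come from the text):

```python
import numpy as np

def fiber_x(s):
    # x-coordinates of the two witness points on the slice y = s
    r = np.sqrt(complex(1 - s ** 2))
    return [r, -r]

def trace(s, subset):
    # sum of the x-coordinates over a subset of the witness points
    pts = fiber_x(s)
    return sum(pts[i] for i in subset)

def is_linear(vals, tol=1e-9):
    # three samples at equally spaced s are collinear iff the middle
    # one is the average of the outer two
    return abs(vals[1] - 0.5 * (vals[0] + vals[2])) < tol

svals = (0.1, 0.2, 0.3)
full = [trace(s, [0, 1]) for s in svals]   # the full witness set
part = [trace(s, [0]) for s in svals]      # a proper subset
assert is_linear(full)         # complete component: trace is linear in s
assert not is_linear(part)     # incomplete subset: trace is not linear
```

The actual algorithm obtains the moving witness points by path tracking rather than by solving directly, but the linearity check is the same.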
parameter point q*, where singular means that the Jacobian matrix df/dz(z*;q*)
has rank less than n. This solution will continue to other isolated singular solutions
on an open set in C^m (see § A.14.1), and as described in Theorem 7.1.6, we may wish
to track such a solution along a continuous path in parameter space, say q(s) ⊂ C^m,
where q(0) = q*. In general, this would be a nearly intractable numerical problem,
but we have a little extra leverage if we have obtained the solution point (z*,q*) as
the endpoint of a nonsingular solution path to a homotopy h(x,t;q*) = 0. Then,
we may define the doubly parameterized homotopy
H(x, t, s) := h(x, t; q(s)) = 0. (15.6.14)
At its root, singular path tracking is based on a singular endgame. For each
value of s, the point on the singular path is the limit as t → 0 of a nonsingular
path. In Chapter 10, we discussed how to estimate such endpoints with the power-
series endgame or the related Cauchy integral endgame. Both of these work by
building a local model of the solution path for small t. The gist of singular path
tracking is to update this local model as we advance s and in essence, replay the
endgame at every s.
The power-series endgame and the Cauchy integral endgame both collect sample
data on the incoming paths of the homotopy to determine the winding number c
and to build a local model of the holomorphic function φ(η) from Lemma 10.2.1,
where t = η^c. The singular path tracker uses prediction/correction techniques to
update the local model as we step along the path. Recall from Chapter 10 that
a cluster of μ paths approaching the same endpoint may break into cycles, each
cycle having a winding number, say c, such that the solution path closes up as t
circles the origin c times. Although we will not argue the issue carefully here, it is
clear that these cycles also continue in the local neighborhood. In a nutshell, the
closing up of the solution path in c loops is an algebraic condition that holds
at the generic parameter q*, so it continues on an open subset in the neighborhood
of q*. The endgame convergence radius within which the local model holds varies
as q(s) varies with s. It may become zero within a proper algebraic subset of the
parameter space, but by Lemma 7.1.2, a one-real-dimensional path between two
generic parameter points will miss the degenerate set with probability one.
Therefore, at each value of s, we have a nonzero endgame operating zone as
in Figure 10.1, with a convergence radius and an ill-conditioned zone. If we use
sufficiently high precision, the ill-conditioned zone stays inside the convergence radius for all s ∈ [0,1], and our task is to track the local model along this endgame
operating zone. As we have several ways of formulating an endgame based on the
local model, the details of tracking the model must be adjusted accordingly. In
essence though, all the methods are similar. For conciseness, it is helpful to adopt
the notation that for a cluster of points C = {w_1, …, w_c}, we let H(C, t, s) = 0
mean H(wi, t, s) = 0 for i = 1,..., c. Also, the following definitions are convenient.
Definition 15.6.1 A convergent cluster (C, t_0, s) = ({w_1, …, w_c}, t_0, s) with
286 Numerical Solution of Systems of Polynomials Arising in Engineering and Science
H(C, t_0, s) = 0 is such that t_0 is inside the endgame convergence radius for fixed s
and, for i = 1, …, c, the solution path of H(w, t, s) = 0 beginning at (w_i, t_0, s) continues to (w_{i+1}, t_0, s) as t travels once around the circle |t| = t_0. For this definition,
w_c continues to w_1.
By requiring that t_0 is inside the convergence radius, we implicitly require that
all the points in the cluster approach the same endpoint w* as t → 0 and that the
same cyclic mapping from w_i to w_{i+1} holds under continuation around a circle
for every t ≠ 0 in the disk Δ_{t_0}(0). In other words, the projection (w, t, s) → t
gives a proper c-sheeted finite mapping from the solution set of H(w, t, s) = 0 in a
neighborhood of (w*, 0, s) to a neighborhood of 0 ∈ C. We call w* the convergence
point of the cluster.
Definition 15.6.2 For fixed s, the convergence point of a convergent cluster is
the common endpoint as t —» 0 of the solution paths of H(w, t, s) = 0 emanating
from each cluster point (w_i, t_0, s).
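For a cluster that obeys the local model exactly, the Cauchy integral endgame reduces to averaging samples collected while t loops around the circle. A toy sketch (the model path and all constants below are invented for illustration, not taken from the text):

```python
import numpy as np

# local model of a cluster with winding number c = 3:
# w(eta) = w_star + a*eta, with t = eta^c, so as t circles the
# origin c times, eta traverses its own circle exactly once
c = 3
w_star, a = 1.5 + 0.5j, 0.8 - 0.3j
t0 = 1e-4                       # stand-in for the endgame radius
M = 24                          # samples taken over the c loops in t
etas = t0 ** (1.0 / c) * np.exp(2j * np.pi * np.arange(M) / M)
samples = w_star + a * etas     # cluster points observed while looping

# trapezoid rule for (1/(2*pi*i)) * contour integral of w(eta)/eta:
# for uniform samples on the circle this is just the mean
estimate = samples.mean()
assert abs(estimate - w_star) < 1e-12
```

In the singular path tracker, this estimate is recomputed (and the sample data refreshed) as s advances.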
The nonsingular path tracking algorithm of § 2.3 can be adapted to our current
situation to arrive at the following singular path tracking algorithm.
In the context of witness points generated by the cascade algorithm, the paths
of the cluster points are nonsingular away from t = 0. Accordingly, the usual
prediction/correction techniques for nonsingular paths apply.
The adjustment step for reconditioning must select a new value of t, which will
be held constant in the next prediction step. One sensible way to select it is to use
the largest value for which the singular endgame meets the convergence tolerance e.
If the endgame meets the tolerance on the first try at the current value t', it may
be useful to try increasing t. If the endgame fails, we try decreasing t, unless the condition
of the Jacobian matrix indicates that failure may be due to having entered the
ill-conditioned zone around t = 0. With such rules in place, the value of t can
adaptively decrease and increase as s proceeds.
Similar to the nonsingular path tracker, we adaptively adjust the step length h
by halving it when the correction step or the reconditioning step fail. On the other
hand, if these steps both succeed several times in a row, we try doubling h.
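That step-length policy is easy to state as code. A minimal sketch (the function name, thresholds, and the three-successes rule are illustrative defaults, not prescribed by the text):

```python
def update_step(h, succeeded, streak, h_min=1e-10, h_max=0.1, grow_after=3):
    """Halve h on failure; after several consecutive successes, try doubling."""
    if not succeeded:
        return max(h / 2, h_min), 0       # halve and reset the success streak
    streak += 1
    if streak >= grow_after:
        return min(2 * h, h_max), 0       # reward a run of successes
    return h, streak

h, streak = 0.01, 0
h, streak = update_step(h, succeeded=False, streak=streak)
assert (h, streak) == (0.005, 0)          # failure: step halved
for _ in range(3):                        # three successes in a row
    h, streak = update_step(h, succeeded=True, streak=streak)
assert (h, streak) == (0.01, 0)           # step doubled back up
```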
A variant of the procedure is to save some computation by applying recondi-
tioning only occasionally to verify that the cluster is convergent. One criterion for
deciding when to recondition is to monitor the condition number of the Jacobian
matrix dH/dw along the paths. Even more computation might be saved by tracking
only one path in the cluster along s holding t constant, and when the condition of
the Jacobian matrix indicates reconditioning is necessary, to regenerate the other
points in the cluster by looping t around the origin. This risks path crossing, be-
cause it is not clear how to set the reconditioning criterion to ensure that t has
remained within the convergence radius as s progresses. There is very little expe-
rience at this point to judge whether such variants can be made both reliable and
efficient. By reconditioning at every step, we have greater assurance that the local
model remains valid for the whole extent of s ∈ [0,1].
The techniques we have discussed show that in principle singular path track-
ing is feasible, although in practice a fully satisfactory approach is still a matter
of research. The approach was first presented in (Sommese et al., 2002a), which
also reports on some initial experiments with the technique of using the condition
number to decide when to recondition.
It may seem that we could completely avoid singular path tracking by using
deflation to convert problems into nonsingular path tracking problems. This is
true in the context of witness points generated by the cascade algorithm, because
such points are isolated solutions cut out by the slicing procedure. However, in
Chapter 16, we will see how to find witness points for a set defined as the intersection
of two given algebraic sets, say A and B. If A and B are both components of the
same system of equations f(x) = 0, then although a slice of appropriate dimension
cuts out a unique point on the intersection set, such a point is not an isolated
solution of the system obtained by appending the linear slicing equations to f(x) =
0. Consequently, witness points for A D B are defined only as singular endpoints of
solution paths in a new kind of homotopy, called the diagonal homotopy, and such
points can be moved along A ∩ B only by singular path tracking. Of course, it could
be that a more elaborate form of deflation could desingularize these points as well;
such a procedure could be subject matter for a new line of inquiry.
To raise the bar even higher, consider intersecting two algebraic sets whose
witness points are only known as singular endpoints of a diagonal homotopy. Then,
we could have a very difficult singular path tracking problem in which each point
in the convergent cluster is itself only known as the convergence point of a prior
homotopy. We have yet to face such a nasty calculation, but it is quite within the
scope of numerical algebraic geometry to consider it.
15.7 Exercises
Exercise 15.1 (Degree of p(x)) Conclude that deg p(x) = d for the polynomial
in Equation 15.5.12 by showing that

(1) the highest degree with which x_N occurs is d; and
(2) by genericity of x_N we know that there is at least one fiber of π on which x_N
is nowhere zero, and therefore that tr_{π,d}(x_N)(x_1, …, x_{N−1}) is not identically
zero.
Exercise 15.2 (Spherical Parallelogram Mechanism) Pick two unit vectors
a_1, a_2 ∈ R^3 and a random value of α ∈ R. Let b_1, b_2, b_3 ∈ R^3. Consider the system
of polynomial equations

a_1^T b_1 = α,  a_2^T b_2 = α,  b_1^T b_2 = a_1^T a_2,  b_1^T b_1 = 1,  b_2^T b_2 = 1,  b_3 = (b_1 + b_2)/2.

These eight equations describe a curve in (b_1, b_2, b_3) ∈ C^9. Find a numerical irreducible decomposition of that curve. Report the number of irreducible components
and their degrees.
Exercise 15.3 (Griffis-Duffy Decomposition) Revisit Exercise 14.4 and find
the irreducible decomposition. Do it again for the special case when b_i = a_i and
c_i = 1, i = 1, …, 6. Report the number of irreducible components and their degrees.
Exercise 15.4 (Seven-Bar Problem) Use exhaustive trace testing to show that
the one-dimensional component of the seven-bar system presented in Exercise 13.3
is irreducible.
Chapter 16

Intersection of Algebraic Sets
show promise. It is hoped that the approach might solve some problems that were
previously too large to solve in one blow by the traditional approaches of Part II.
A good idea of the way the diagonal intersection algorithm proceeds can be gleaned
by studying a special case. Assume that we have two polynomials f, g on C^2. Let
A be an irreducible component of V(f) and let B be an irreducible component of
V(g). We would like to find A ∩ B. Assume that A has degree d_1 and B has
degree d_2, and let α_1, …, α_{d_1} and β_1, …, β_{d_2} be witness point sets of A and B,
respectively. That is, for generic linear equations L_A(x) = 0 and L_B(x) = 0, we
assume that we have already computed the intersections V(L_A) ∩ A = {α_1, …, α_{d_1}}
and V(L_B) ∩ B = {β_1, …, β_{d_2}}.
Note that A ∩ B can be interpreted as solutions to a system on C^4 by a procedure
from algebraic geometry called reduction to the diagonal (Ex. 13.15 Eisenbud, 1995).
The procedure is to form the system

F(x_1, x_2, y_1, y_2) = \begin{bmatrix} f(x_1, x_2) \\ g(y_1, y_2) \\ x_1 - y_1 \\ x_2 - y_2 \end{bmatrix} = 0.

The solutions of the system consist of points (x_1^*, x_2^*, x_1^*, x_2^*) ∈ C^4 with (x_1^*, x_2^*)
a point of V(f, g). This identification respects components and all multiplicity
structure. In particular, all the irreducible components of A ∩ B have corresponding
irreducible components in V(F).
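Reduction to the diagonal is easy to check symbolically on a small instance (the two polynomials below are hypothetical examples, not from the text):

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')
f = x1**2 + x2**2 - 1        # f(x1, x2): a conic
g = y1 - y2**2               # g(y1, y2): a parabola

# reduction to the diagonal: F = (f(x), g(y), x1 - y1, x2 - y2) on C^4
F = [f, g, x1 - y1, x2 - y2]
sols = sp.solve(F, [x1, x2, y1, y2], dict=True)

# every solution lies on the diagonal and projects to a point of V(f, g)
for s in sols:
    assert sp.simplify(s[x1] - s[y1]) == 0
    assert sp.simplify(s[x2] - s[y2]) == 0
    assert abs(complex(sp.N(f.subs(s)))) < 1e-10

# same number of solutions as solving f = g = 0 directly in (x1, x2)
direct = sp.solve([f, g.subs({y1: x1, y2: x2})], [x1, x2], dict=True)
assert len(sols) == len(direct)
```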
Ignoring for a moment the two diagonal linears, let's consider the set

V(f(x_1, x_2), g(y_1, y_2)).

Clearly, A × B is an irreducible component of this set. To see this, remember that
an algebraic set being irreducible means by definition that its set of smooth points
is connected. To see that the set of smooth points of A × B, (A × B)_reg, is connected, note
that (A × B)_reg = A_reg × B_reg, and that the product of connected sets is connected.

Moreover, we know a set of witness points of A × B, i.e., the set of points
{(α_i, β_j) | i = 1, …, d_1, j = 1, …, d_2} is the intersection of A × B with the linear
space V(L_A(x_1, x_2), L_B(y_1, y_2)).
Consider the homotopy

H(x, y, t) := \begin{bmatrix} f(x_1, x_2) \\ g(y_1, y_2) \\ t\,L_A(x_1, x_2) + (1 - t)(x_1 - y_1) \\ t\,L_B(y_1, y_2) + (1 - t)(x_2 - y_2) \end{bmatrix} = 0.

One can show that the endpoints as t → 0 of the solution paths of H(x, y, t) = 0 starting at the points
(α_i, β_j) at t = 1 include (using the identification given by reduction to the diagonal)
all the isolated points of A ∩ B.
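Here is a small end-to-end numerical sketch of this special case. Everything concrete below is an illustrative choice, not from the text: A is the unit circle V(x_1^2 + x_2^2 − 1), B is the line V(y_1 − y_2), the slicing lines L_A, L_B have arbitrary coefficients, and a complex factor γ multiplies the t-terms (in the spirit of the gamma trick of Chapter 7) to keep the d_1 d_2 = 2 paths away from real discriminant points. The endpoints recover A ∩ B = {±(1/√2, 1/√2)}:

```python
import numpy as np

f  = lambda x: x[0]**2 + x[1]**2 - 1.0      # A = V(f): unit circle
g  = lambda y: y[0] - y[1]                  # B = V(g): the line y1 = y2
LA = lambda x: 0.3*x[0] + 1.1*x[1] - 0.7    # slicing lines with
LB = lambda y: 0.9*y[0] + 0.2*y[1] - 0.4    # arbitrary coefficients
gamma = np.exp(0.7j)                        # "gamma trick" constant

def H(v, t):
    x, y = v[:2], v[2:]
    return np.array([f(x), g(y),
                     gamma*t*LA(x) + (1 - t)*(x[0] - y[0]),
                     gamma*t*LB(y) + (1 - t)*(x[1] - y[1])])

def jac(v, t, h=1e-7):
    # finite-difference Jacobian (adequate here since H is holomorphic)
    J = np.zeros((4, 4), dtype=complex)
    for j in range(4):
        e = np.zeros(4, dtype=complex); e[j] = h
        J[:, j] = (H(v + e, t) - H(v - e, t)) / (2 * h)
    return J

def newton(v, t, iters=8):
    for _ in range(iters):
        v = v - np.linalg.solve(jac(v, t), H(v, t))
    return v

# start points at t = 1: the product witness set (alpha_i, beta_j)
alphas = [np.array([-0.6, 0.8]), np.array([12/13, 5/13])]  # circle ∩ L_A
beta = np.array([4/11, 4/11])                              # line ∩ L_B

ends = []
for alpha in alphas:
    v = np.concatenate([alpha, beta]).astype(complex)
    for t in np.linspace(1.0, 0.0, 201)[1:]:   # naive tracking: step, then correct
        v = newton(v, t)
    ends.append(v)

r = 1 / np.sqrt(2)
assert np.allclose(sorted(e[0].real for e in ends), [-r, r], atol=1e-6)
for e in ends:                 # endpoints lie on the diagonal, on A and on B
    assert np.allclose(e[:2], e[2:], atol=1e-6)
    assert abs(f(e[:2])) < 1e-6 and abs(g(e[2:])) < 1e-6
```

A production tracker would use adaptive steps and endgames as in Chapters 2 and 10; the fixed 200-step schedule above is only to keep the sketch short.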
The general case is conceptually not much harder, although the procedural de-
tails get a bit technical. We sketch only the main idea here. We use notation similar
to that above, but now work in higher dimensions. That is, let A ⊂ V(f) ⊂ C^N and
B ⊂ V(g) ⊂ C^N be irreducible algebraic sets, with f and g as polynomial systems.
Let dim A = a and dim B = b. The main idea is that, letting x ∈ C^N be the variables for A and y ∈ C^N those for B, we wish to find the irreducible decomposition
of the diagonal polynomial system, namely x — y, restricted to Ax B. The cascade
homotopies of Chapter 14 carry over with A x B in place of Euclidean space. In
short, we have an embedding like Equation 14.1.4 that includes all of the systems
for slicing witness sets at every dimension. As in the cascade method on Euclidean
space, we need to square up systems as necessary. Omitting detailed argumentation, this just amounts to choosing random, complex matrices M_f, M_g, M_xy, S, U, v
with dimensions as follows:

Matrix  | M_f   | M_g   | M_xy  | S     | U   | v
rows    | N − a | N − b | a + b | a + b | N   | N
columns | #(f)  | #(g)  | N     | N     | 2N  | 1
The result is a system of 2N polynomials:

\mathcal{E}(x, y, t) = \begin{bmatrix} M_f \cdot f(x) \\ M_g \cdot g(y) \\ M_{xy}(x - y) + S \cdot T(t) \cdot \left( U \cdot \begin{bmatrix} x \\ y \end{bmatrix} + v \right) \end{bmatrix} = 0,   (16.1.1)

where T(t) is an N × N diagonal matrix with entries t_1, …, t_N. Just as in the regular
cascade method, we choose t_1, …, t_N randomly, and a witness set for dimension i
is found by solving the equations \mathcal{E}(x, y, t^{(i)}) = 0, where t^{(i)} = (t_1, …, t_i, 0, …, 0).
To get started, note that we have at the outset the solutions (α_i, β_j) ∈ C^N × C^N,
i = 1, …, deg A, j = 1, …, deg B, of the system \mathcal{F}(x, y) = 0, where

\mathcal{F}(x, y) = \begin{bmatrix} M_f \cdot f(x) \\ M_g \cdot g(y) \\ L_A(x) \\ L_B(y) \end{bmatrix} = 0.   (16.1.2)
Now, the top dimensional component of A ∩ B is at most k_1 := min(a, b) and the
lowest is at least k_0 := max(0, a + b − N). We solve for dimension k_1 by tracking
the solution paths of a homotopy deforming the system (16.1.2) at s = 1 into \mathcal{E}(x, y, t^{(k_1)}) = 0 at s = 0,
from each of the start points (α_i, β_j) at s = 1, to get at s = 0 three kinds of points:
witness points on the diagonal x − y = 0, points at infinity, and "nonsolutions." The
nonsolutions at dimension i are the start points for the homotopy to dimension i − 1,
\mathcal{E}(x, y, (t_1, …, t_{i-1}, s, 0, …, 0)) = 0,   (16.1.4)
whose solution paths we follow from s = 1 to 0. This is a brief, but procedurally
complete, description of the diagonal intersection method.
As outlined above, each homotopy is a system of 2N equations in 2N unknowns.
In (Sommese, Verschelde, & Wampler, 2004c), it is shown how to consistently reduce
the size of the homotopy by using intrinsic formulations of the linear equations.
Finally, it is important to note that the output of the diagonal homotopy method
is a witness superset. We still need to remove junk points and, if desired, break
the witness sets into irreducible witness sets. The algorithms of Chapter 15 are
directly applicable.
With the diagonal intersection algorithm in hand, we have much more flexibility in
how we solve systems of polynomials. For example, we can subdivide a system into
two sets of polynomials, compute the irreducible decomposition of each, and then use
the diagonal method to intersect each irreducible component of the first subsystem
with each one of the second. With a little bookkeeping, for eliminating duplications
and so on, we get a numerical irreducible decomposition for the whole system.
Taking this approach to the extreme, we may first find witness sets for each
polynomial individually, and then intersect these one-by-one. We call this solving
the system equation-by-equation (Sommese et al., 2004e). The approach is most
easily described in terms of a flowchart, shown in Figure 16.1. The post-processing of
points coming out of the diagonal homotopy discards duplicates and checks whether
singular points are junk. In the junk removal box, we have used the shorthand V(W)
to mean the algebraic set witnessed by W. We also allow an affine algebraic set
Q to be pre-specified for discarding points on known degenerate sets or sets not
of interest. For example, should we wish to work on (C^*)^N, Q is the union of the
coordinate planes, x_i = 0, any i.
The flowchart also includes two tests that eliminate some witness points of the
subsystems before they get to the diagonal homotopy routine. The one on the
left, "f_{k+1} = 0?", recognizes that if a witness point satisfies the new equation,
then the set it represents does too, and it passes to the output without change
of dimension. The similar test on the right, "f_i(x) = 0, any i < k?", discards points on components we have already found. Such tests are
cheap compared to running the diagonal homotopy, so it is useful to employ them.
The pruning of points in the flowchart can be made more stringent if all we wish
to find are the nonsingular isolated points of the system. Supposing that the original
system is square, f : C^N → C^N, we can keep in the output for V(f_1, …, f_i) just the
nonsingular witness points for dimension N − i. There are not enough polynomials
Intersection of Algebraic Sets 293
16.2.1 An Example
Consider once more the system given in Equation 12.0.1, which we treated with
WitnessSuper in Example 13.6.4 and with Cascade in Example 14.2.1. The
equations are
\begin{bmatrix} f_1(x, y) \\ f_2(x, y) \end{bmatrix} = \begin{bmatrix} x(y^2 - x^3)(x - 1) \\ x(y^2 - x^3)(y - 2)(3x + y) \end{bmatrix} = 0.
It is easy to confirm by hand that the equation-by-equation algorithm flows as
follows. The numbers next to the flow lines indicate how many points flow that
direction. Counting the computation of witness points for the individual equations,
there are a total of 5 + 6 + 2 = 13 homotopy paths in the procedure for this problem.
This compares to 36 paths for WitnessSuper and 39 paths for Cascade.
[Flow diagram: the witness sets #W_1 = 5 and #W_2 = 6 for the individual equations feed the diagonal homotopy, whose output is the witness set for V(f_1, f_2).]
16.3 Exercises
Exercise 16.1 (Flowchart) Draw a diagram showing how witness points flow
when the equation-by-equation method is applied to the system of Example 13.6.5.
(Hint: some points coming from the diagonal homotopy go to infinity.)
Exercise 16.2 (Eigenvalues) Chart the flow of witness points for an equation-
by-equation treatment of the eigenvalue problem described in § 16.2. Assume the
size of the matrix is n × n. The output of the diagonal homotopy at each stage
consists of only nonsingular points and points at infinity. How many paths are
tracked in total?
[Figure: flowchart for stage k, with witness points passing through a Diagonal Homotopy box and a Discard box on the way to the output witness sets for V(f_1, …, f_{k+1}).]
Fig. 16.1 Stage k of equation-by-equation generation of witness sets for V(f_1, …, f_N) ⊂ C^N \ Q.
The witness sets are subscripted by codimension and superscripted by stage. Q is some pre-specified algebraic set on which we wish to ignore solutions.
Appendices
Appendix A
Algebraic Geometry
The complex neighborhoods introduced in § 12.1.1 are convenient because they may
be chosen small enough to discard global information. Loosely speaking, they let us
put local properties of a space "under the microscope." When using complex neigh-
borhoods it is often useful to choose local coordinates which are not polynomials.
Here is a typical example.
Example A.1.1 Consider the affine algebraic set Z := V(w^2 − z). We have a
map π : Z → C given by π(z, w) = z. There are two points in the fiber π^{-1}(1)
over 1, i.e., (1, 1) and (1, −1). As we will see, Z is a manifold, and a natural
parameterization of Z at (1, 1) ∈ Z is given by (z, √z), where we choose the branch
of √z with √1 = 1, and stay in a neighborhood of 1, e.g., {z ∈ C | z ∉ (−∞, 0]},
where the branch gives a well-defined function.
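In floating point, numpy's principal square root realizes exactly this branch (cut along (−∞, 0], with √1 = 1); a two-line check:

```python
import numpy as np

def param(z):
    # local parameterization of Z = V(w^2 - z) near (1, 1): z -> (z, sqrt(z)),
    # using the principal branch of the square root (cut along (-inf, 0])
    return z, np.sqrt(np.complex128(z))

assert param(1.0)[1] == 1.0                  # the branch with sqrt(1) = 1
for z in [1.0, 1.2 + 0.3j, 0.8 - 0.1j]:      # points in a neighborhood of 1
    _, w = param(z)
    assert abs(w**2 - z) < 1e-12             # the image lies on Z
```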
For doing algebraic geometry over the complex numbers, it has been standard
for over a century to use holomorphic functions, such as the function √z in Example A.1.1 and the exponential e^z.
When talking about holomorphic functions, we use the complex topology unless
we explicitly say otherwise, e.g., that a set is a Zariski open set.
A function / defined on an open set V C CN is said to be a holomorphic function
on V if, given any x = (x_1, …, x_N) ∈ V, there exists a neighborhood U ⊂ V of x
Algebraic Geometry 301
Just as in one complex variable, there are many equivalent ways of defining holomorphic functions, e.g., in terms of the Cauchy-Riemann equations. We refer the
reader to (Fritzsche & Grauert, 2002; Gunning, 1990; Gunning & Rossi, 1965) for
more on holomorphic functions. We need only a few facts about them. The first
is the obvious fact that polynomials are holomorphic. Locally, polynomials and
holomorphic functions look and behave the same, but when looked at globally,
holomorphic functions can be much wilder than polynomials, e.g., e^z − 1 has infinitely many complex zeros. On the other hand, there are many results that assert
that a holomorphic function with growth as moderate as an "algebraic function" is
an "algebraic function." For example, any holomorphic function f on C^N with the
property that there is a constant C > 0 and an integer K > 0 such that |f(z)| ≤ C(1 + ||z||)^K for all z ∈ C^N is a polynomial.
A manifold is a metric space that locally looks like Euclidean space. The definition of differentiable manifold requires some technicalities because manifolds can have different
degrees of smoothness. You need a set {U_α | α ∈ I} of open sets which covers the
manifold, i.e., X = ∪_{α∈I} U_α, and for each U_α, a map φ_α : U_α → R^n that gives a
homeomorphism of U_α onto an open set of R^n. Moreover:
Holomorphic functions satisfy very strong constraints that are often considerably
stronger when the domain of the functions is at least two-dimensional. For example,
there are several convenient extension theorems.
Proof. The map A is given by functions A_1, …, A_M, and has a unique extension
to U, since the A_i extend uniquely to U by the single function version of Hartogs'
Theorem, e.g., (page 307 Fritzsche & Grauert, 2002). Since the holomorphic functions g_i(A_1(z), …, A_M(z)) are identically zero on U \ K, the extensions to U are
identically zero. Thus A(U) ⊂ Y. □
Remark A.2.2 Theorem A.2.1 is not true with Y merely a complex analytic
subset of an open set U ⊂ C^M. For example, if G is the open unit ball in C^2, K is
the closed ball in C^2 of radius 1/2, and Y = U := G \ K, the result is false. It is true
whenever U is a holomorphically convex open set of C^M, see (page 75 Fritzsche &
Grauert, 2002). Such sets include C^N and open balls.
Here is a typical use of Hartogs' Theorem.
Example A.2.3 Let X := C^N \ 0 with N ≥ 2. Then X is not isomorphic to
an affine algebraic set. To see this, assume otherwise, that it is isomorphic via
F : X → X' to an affine algebraic set X' ⊂ C^M for some positive integer M.
Then, since X' is closed, any sequence x_n ∈ X converging to 0 ∈ C^N cannot have
its images F(x_n) converge in C^M. But such a sequence does converge, since by
Hartogs' Theorem A.2.1, the mapping F has a holomorphic and hence continuous
extension to C^N.
The following simple result puts Example A.2.3 in perspective.
Lemma A.2.4 Let g be an algebraic function on an affine algebraic set X ⊂ C^N.
Then X \ V(g) is isomorphic to an affine algebraic set.
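For g a polynomial, the isomorphism is the standard graph construction (a sketch; extending it to general algebraic functions takes a little more care):

```latex
X \setminus V(g) \;\cong\; X_g := \{\, (x, u) \in X \times \mathbb{C} \mid u\, g(x) = 1 \,\},
\qquad x \mapsto \bigl(x, 1/g(x)\bigr), \quad (x, u) \mapsto x .
```

X_g is cut out in C^{N+1} by the equations of X together with the single polynomial u g − 1, so it is an affine algebraic set.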
\begin{bmatrix}
\frac{\partial f_1}{\partial z_1} & \cdots & \frac{\partial f_1}{\partial z_N} \\
\vdots & & \vdots \\
\frac{\partial f_N}{\partial z_1} & \cdots & \frac{\partial f_N}{\partial z_N}
\end{bmatrix}

is invertible at x.
\begin{bmatrix}
\frac{\partial f_k}{\partial z_1} & \cdots & \frac{\partial f_k}{\partial z_N}
\end{bmatrix}
The analogues of the many consequences of the differentiable implicit function the-
orem hold with no change. For example, we have a corollary that we will use below.
\begin{bmatrix}
\frac{\partial \phi_1}{\partial z_1} & \cdots & \frac{\partial \phi_1}{\partial z_m} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
\frac{\partial \phi_m}{\partial z_1} & \cdots & \frac{\partial \phi_m}{\partial z_m} & 0 & \cdots & 0 \\
\frac{\partial \phi_{m+1}}{\partial z_1} & \cdots & \frac{\partial \phi_{m+1}}{\partial z_m} & 1 & \cdots & 0 \\
\vdots & & \vdots & & \ddots & \\
\frac{\partial \phi_N}{\partial z_1} & \cdots & \frac{\partial \phi_N}{\partial z_m} & 0 & \cdots & 1
\end{bmatrix}
The reader might observe that in Examples 12.1.3 and 12.1.4, there is a one-to-one and onto mapping C → V(w^2 − z) given by sending w ∈ C to (w^2, w). Note that
the differential of the mapping everywhere has rank one, and the map (z, w) → w
gives an inverse. Given this, it is natural to hope that given a smooth point x of
an affine algebraic set Z, there is a Zariski open dense neighborhood U C Z of x
which can be identified with a Zariski open dense subset of some Euclidean space.
It is a fact of life that this is false.
Example A.2.10 Let X ⊂ C^2 denote the affine algebraic set defined by p(z, w) =
w^2 − z(z − 1)(z − 2) = 0. Since

V(p, ∂p/∂z, ∂p/∂w)

is the empty set, it follows from Corollary A.2.9 that V(p) is a manifold. It can be
shown that V(p), as a differentiable manifold, is homeomorphic to a torus minus one
point, i.e., homeomorphic to S^1 × S^1 minus a point, where S^1 denotes the circle
S^1 := {z ∈ C | |z| = 1}. Any Zariski open set U ⊂ V(p) is the complement of a
finite set on V(p). Thus, there will be two differentiable embeddings of the circle
S^1 into V(p) that meet transversely in only one point. But there can be no such
maps of S^1 into C. One of the beauties of Zariski open sets is that they are very
big. The problem here, though, is that Zariski open sets are too big.
For example, at (1, 1) in Example A.1.1, φ can be taken to be z → (z, √z).
Note that by Corollary A.2.9, it follows that given a smooth point x ∈ Z, there
are holomorphic coordinates z_1, …, z_N defined on a complex open set U ⊂ C^N containing x, such that z_i(x) = 0 for all i, and such that U ∩ Z = V(z_{m+1}, …, z_N).
This integer m is defined to be the complex dimension of Z at the regular point x ∈ Z.
The complex dimension of Z at x is half the usual dimension of Z considered as
a topological manifold at x. We typically use the word dimension for complex di-
mension and refer to the usual dimension as the real dimension. For example, the
complex dimension of C is one and the real dimension is two. It is traditional to
denote the smooth points of a quasiprojective set Z by Z_reg. The points in Z \ Z_reg
are called singular points. The singular points of Z are denoted Sing(Z). The di-
mension of Z at a smooth point is well defined. A nice argument for this follows
by adapting the very short argument for differentiable manifolds (page 7 Milnor,
1965). We gave a general definition of dimension in § 12.2 based on the irreducible
decomposition.
One difficulty with deciding at which points an algebraic set is smooth is that
the defining equations for the set might have too much information packed into them.
Here is an example where the defining functions will not suffice.
Example A.2.11 Let Z := V(z^2) ⊂ C. In this case, Z = V(z) also, and using
the defining equation z, we see that Z is a manifold. The problem with the defining
equation z^2 is that it also includes multiplicity information about Z.
Remark A.2.12 There is no easy computational solution to the problem posed
by the last example. The set of smooth points of an affine algebraic set V(f) is
Zariski open and dense, but the prescription for the singular set is nontrivial.
Given an affine algebraic set Z ⊂ C^N, Z = V(I(Z)), where I(Z) denotes the
ideal of polynomials in C[z_1, …, z_N] that vanish on Z. One version of Hilbert's
Nullstellensatz, e.g., (Cox et al., 1997), says that given an ideal I ⊂ C[z_1, …, z_N],
then I(V(I)) = √I, where √I, the radical of I, consists of all polynomials g such
that g^k ∈ I for some positive integer k. For example, on C, √(z^3) = (z). The
passing from an ideal to its radical throws away all multiplicity information.
The radical intervenes in the algebraic characterization of the set of smooth
points of an affine algebraic set. Let g_1, …, g_M be a basis of the radical of the ideal
generated by f. It follows from (Chapter 1A Mumford, 1995) that the singular set, Sing(V(f)), of
V(f) is equal to the set of points of V(f) where the Jacobian matrix

\begin{bmatrix}
\frac{\partial g_1}{\partial z_1} & \cdots & \frac{\partial g_1}{\partial z_N} \\
\vdots & & \vdots \\
\frac{\partial g_M}{\partial z_1} & \cdots & \frac{\partial g_M}{\partial z_N}
\end{bmatrix}

has rank less than N − dim V(f). It must be noted that for a fixed N, M can be arbitrarily large.
A special case of the above, mainly useful for making illustrative examples, is the
following.
Lemma A.2.13 Let p ∈ C[z_1, …, z_N]. The singular set of V(p) is contained in

V\left(p, \frac{\partial p}{\partial z_1}, \ldots, \frac{\partial p}{\partial z_N}\right).
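This containment is easy to exercise symbolically. The sketch below applies it to the smooth cubic of Example A.2.10 and, for contrast, to the cuspidal cubic w^2 − z^3 (the use of sympy is an illustrative choice):

```python
import sympy as sp

z, w = sp.symbols('z w')

# the smooth cubic of Example A.2.10
p = w**2 - z*(z - 1)*(z - 2)
sing_p = sp.solve([p, sp.diff(p, z), sp.diff(p, w)], [z, w], dict=True)
assert sing_p == []        # V(p, dp/dz, dp/dw) is empty: V(p) is a manifold

# the cuspidal cubic is singular exactly at the origin
q = w**2 - z**3
sing_q = sp.solve([q, sp.diff(q, z), sp.diff(q, w)], [z, w], dict=True)
assert sing_q == [{z: 0, w: 0}]
```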
These special sets are irreducible and naturally occur as parameter spaces.
There are situations, e.g., in the study of endgames in Chapter 10, when we want to
look carefully at behavior in a neighborhood of a point. In such situations specifying
a fixed neighborhood of the point is inconvenient, and the notion of a germ of a
complex analytic set improves clarity. (Chap. II, Sec. E Gunning & Rossi, 1965) is
an excellent place for becoming comfortable with germs of complex analytic sets.
Since we will not be talking about germs of other types of sets, e.g., germs of affine
algebraic sets, which are defined analogously using the Zariski topology in place of
the complex topology, we will often refer to the germ of a complex analytic set as
a germ of an analytic set or a germ.
Given a point x ∈ C^N, we define an equivalence relation on complex analytic
sets containing x: if X ⊂ U and X' ⊂ U' are two complex analytic sets defined on
open neighborhoods of x in C^N, then we say that X and X' have the same germ
at x if there is an open neighborhood V ⊂ U ∩ U' with X ∩ V = X' ∩ V. Thus the
complex analytic sets V(z) and V(zw) define the same germ of a complex analytic
set at (0, 4) but not at (0, 0).
Given a point w = (w_1, …, w_N) ∈ C^N, we let ||w|| = \sqrt{|w_1|^2 + \cdots + |w_N|^2}
denote the Euclidean norm, and we denote the ball of radius r about a point x by
B_r(x), i.e.,

B_r(x) := {z ∈ C^N | ||z − x|| < r}.
We emphasize this is equivalent to saying there is a positive number ε' such that
for all positive ε less than ε', there are representatives of X and the X_i in
B_ε(x) satisfying the conclusions of the theorem.
Note if we have an irreducible affine algebraic set X ⊂ C^N, it may well happen
that the germ of a complex analytic set defined by X at a point x ∈ X is not
irreducible, e.g., X = V(z_2^2 − z_1^2 − z_1^3) at (0, 0), which is discussed in detail in
Example A.4.18.
We say an algebraic set or a complex analytic space X is irreducible at a point
x ∈ X if the complex analytic germ defined by X at x is irreducible. In this case X
is often said to be locally irreducible at x. We say
that X is locally irreducible if it is irreducible at all points. In the literature, being
irreducible at a point x is sometimes referred to as being topologically unibranch at
x (page 43 Mumford, 1995).
We define the dimension of a germ X at a point x as the maximum of the
dimensions of the irreducible germs occurring in the irreducible decomposition of X.
The following result is important in Chapter 10.
Theorem A.3.2 Let X be a germ of a one-dimensional complex analytic set at a
point x of C^N. If X is irreducible at the point x, then there exists a representative
of X on a neighborhood of x, which by abuse of notation we also denote by X, and
a holomorphic map φ : Δ_1(0) → X from the open unit disk Δ_1(0) ⊂ C to X such
that φ(0) = x and φ gives a biholomorphism of Δ_1(0) \ 0 and φ(Δ_1(0)) \ x.
Proof. This is sometimes referred to as the local uniformization theorem for one-
dimensional analytic sets. It is the one-dimensional complex analytic version of
Theorem A.4.1.
A proof for it can be based on the local parameterization theorem (Gunning,
1970), which, in its simplest form for a pure one-dimensional complex analytic set X,
says that given a point x ∈ X, there exists a finite proper surjection π : U → Δ_1(0),
where U is an open neighborhood of x and where 0 = π(x). Since X is one-
Proof. Let w denote any coordinate on Δ_1(0), which is 0 at 0. The power series
expansion of g(φ(w)) is given by

    g(φ(w)) = g(x) + Σ_{k=c}^∞ a_k w^k.
The Hironaka Desingularization Theorem, which holds for both complex algebraic
sets and complex analytic spaces, is highly nontrivial, but extremely useful. Given
a quasiprojective algebraic set X (respectively, a complex analytic space X), a
desingularization f : X̃ → X of X is a quasiprojective manifold X̃ (respectively, a
complex manifold X̃) and a proper surjective algebraic (respectively, holomorphic)
map f : X̃ → X such that f|_{f^{-1}(X_reg)} : f^{-1}(X_reg) → X_reg is an isomorphism
(respectively, a biholomorphism) with f^{-1}(X_reg) Zariski open and dense in X̃. X_reg
is always Zariski open and dense in X.
More refined versions of the result tell us that we may choose the desingulariza-
tion map so that the inverse image of the singular set under the desingularization
map is a union of smooth codimension one algebraic sets which meet transversely.
See (Lipman, 1975) for a nice exposition of this result.
In the case when all components of X are of dimension one, Theorem A.4.1 is
simply the normalization of X, e.g., see (Fischer, 1976).
The Hironaka Desingularization Theorem makes many facts that are easy for
manifolds carry over immediately to general algebraic sets.
Here is one simple example often referred to as the maximum principle, e.g.,
(Theorem III.B.16 Gunning & Rossi, 1965).
Theorem A.4.20 will give another illustration of the clarity brought by using
Hironaka's theorem.
The proper mapping theorem of Grauert assures us that many operations with
algebraic sets yield algebraic sets.
Proof. The analytic statement may be found in (Fischer, 1976). This, or the
simple fact that in the complex topology proper maps take closed sets to closed
sets, automatically implies the algebraic statements. To see this, note that if X is
projective, affine, or quasiprojective algebraic, we know that π(X) is constructible
by Theorem 12.5.6. Since π(X) is closed, we have the conclusion from Lemma 12.5.4.
□
Proof. Let y be a point not in the image of X. By dominance we can find a sequence
of points y_j ∈ f(X) with y_j converging to y. Choose x_j ∈ X with f(x_j) = y_j. By
the definition of properness, there is a neighborhood U that contains y such that
f^{-1}(U) is compact. Thus there is a subsequence of the x_j that converges to a point
x ∈ X. By continuity of f, we have the contradiction that f(x) = y. □
    { x ∈ X | dim_x f^{-1}(f(x)) ≥ k }

is a quasiprojective algebraic set (respectively, a complex analytic space).
Remark A.4.6 Let f : X → Y be an algebraic map between algebraic sets. As
we see from Example 12.5.5, the sets

    { y ∈ Y | dim f^{-1}(y) ≥ k }

do not have to be algebraic sets, though by Theorem A.4.5 and by Theorem 12.5.6,
they are constructible.
Using Theorem A.4.5 and Theorem A.4.3 we have the following result.
Corollary A.4.7 Let f : X → Y be a proper algebraic mapping of quasiprojective
algebraic sets. For each integer k ≥ 0, the set { y ∈ Y | dim f^{-1}(y) ≥ k } is a closed
quasiprojective subset of Y.
Finally we have the very useful Factorization Theorem of Remmert and Stein
(III Corollary 11.5 Hartshorne, 1977). Note that finite-to-one proper maps are called
finite maps by algebraic geometers, e.g., (Hartshorne, 1977).
The following general lemma, which is a special case of (III Proposition 10.6
Hartshorne, 1977), is often useful.
Lemma A.4.9 Let f : X → Y be an algebraic map from a quasiprojective al-
gebraic set X to a quasiprojective algebraic set Y. Let X_r denote the closure of
the set of points x ∈ X_reg such that rank df_x ≤ r. Then
dim f(X_r) ≤ r.
The algebraic analogue of Sard's Theorem, e.g., ((3.7) Mumford, 1995), is much
crisper than the usual Sard's theorem for differentiable maps. It is responsible,
through the Bertini theorems of § A.9, for many of the strong probability-one state-
ments in this book.
Remark A.4.11 The differentiable form of Theorem A.4.10 is quite weak. For
example, consider the infinitely differentiable map f : R^2 → R defined by

    f(x, y) := exp(1/(x^2 + y^2 − 1))  if x^2 + y^2 < 1;
    f(x, y) := 0                       if x^2 + y^2 ≥ 1.

The image of this map is [0, e^{-1}]. Over the dense subset (0, e^{-1}) of the image [0, e^{-1}],
f is of maximal rank, but f^{-1}((0, e^{-1})) is far from dense in R^2.
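The claimed image can be checked numerically (a sketch of the map in the remark; the function and constants are exactly those above):

```python
import math

# The smooth bump-like map of Remark A.4.11: exp(1/(x**2 + y**2 - 1)) inside
# the unit disk, glued to 0 outside.  Its image is the interval [0, exp(-1)].

def f(x, y):
    r2 = x * x + y * y
    return math.exp(1.0 / (r2 - 1.0)) if r2 < 1.0 else 0.0

assert f(0.0, 0.0) == math.exp(-1.0)       # the maximum 1/e, taken at the origin
assert f(2.0, 3.0) == 0.0                  # identically 0 off the open disk
assert 0.0 < f(0.5, 0.5) < math.exp(-1.0)  # interior values fill out (0, 1/e)
print("image of f is contained in [0, 1/e]")
```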
Another useful fact is that generically dimensions add.
Proof. By Theorem A.4.5, there is a dense Zariski open set V ⊂ X such that
dim_x f^{-1}(f(x)) is a constant k for all x ∈ V, and for all x ∈ Z := X \ V,
dim_x f^{-1}(f(x)) > k. Using Theorem A.4.10, we see that k = dim X − dim Y.
We will be done if we show that f(Z) is not Zariski dense. Assume it was.
Then we would have an irreducible component Z′ of Z mapped dominantly to
Y. Using Theorem A.4.10 we conclude that a dense set of points x ∈ Z′ satisfy
dim_x f_{Z′}^{-1}(f_{Z′}(x)) = dim Z′ − dim Y. But this gives the contradiction that

    dim X − dim Y = k < dim_x f_{Z′}^{-1}(f_{Z′}(x)) = dim Z′ − dim Y < dim X − dim Y.
□
The following result (Corollary, page 138 Fischer, 1976) is useful for analyzing
not necessarily proper maps. The algebraic case with the Zariski topology follows
from it by using Theorem 12.5.6.
Using this and Grauert's Proper Mapping Theorem A.4.3, we have an extremely
important existence result.
Theorem A.4.17 Let f : X → Y be a holomorphic map between complex analytic
spaces. Assume that Y is irreducible and that all irreducible components of X have
dimension at least equal to dim Y. Assume that y ∈ Y and x is an isolated point
of f^{-1}(y). If Y is locally irreducible at y, then there are arbitrarily small complex
neighborhoods U ⊂ X of x and V ⊂ Y of y such that f|_U : U → V is a proper
surjective map with finite fibers.
Proof. Choose a neighborhood X′ of x. By Lemma A.4.16, there are open complex
neighborhoods U ⊂ X′ of x and V ⊂ Y of y such that f|_U : U → V is proper. By
Theorem A.4.3, f(U) is a complex analytic subspace of V. Since x is isolated, we
would be done in the algebraic case by Corollary A.4.12 and the irreducibility of Y
at y. In the complex analytic case, we instead use Theorem A.4.13, which implies
dim f(U) = dim Y. From this and the irreducibility of Y at y, we conclude that
f(U) contains a complex open neighborhood V′ of y. The rest of the result follows
by replacing V by V′, and U by U ∩ f^{-1}(V′). □
Example A.4.18 The local irreducibility is needed for the above result. Let
X := C and let Y := V(g) ⊂ C^2 be defined by g(x, y) = y^2 − x^2(x + 1). Consider
the algebraic map f from X to Y given by f(t) = (−(1 + t^2), √−1 (t + t^3)).
The reader can check that f is surjective and one-to-one everywhere except at
±√−1, which are both mapped to (0,0). A small complex neighborhood U of (0,0)
on Y is biholomorphic to a small complex neighborhood of 0 on V(xy). A small
neighborhood of t = √−1 does not map onto a neighborhood of (0,0) in Y, but
only onto a complex neighborhood of (0,0) on one of the irreducible components of
U at (0,0).
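The claims in the example are easy to verify numerically (an illustration; the map and curve are exactly those above, with √−1 written as 1j):

```python
# Check of Example A.4.18: f(t) = (-(1 + t**2), i*(t + t**3)) lands on the
# nodal cubic Y = V(y**2 - x**2*(x + 1)), and t = i and t = -i both map to
# the node (0, 0).

def g(x, y):
    return y ** 2 - x ** 2 * (x + 1)

def f(t):
    return (-(1 + t ** 2), 1j * (t + t ** 3))

for t in (0.3, -1.7, 0.4 + 0.9j, 2.0 - 0.5j):
    x, y = f(t)
    assert abs(g(x, y)) < 1e-9       # f(t) lies on Y

assert max(abs(c) for c in f(1j) + f(-1j)) < 1e-12   # both hit the node
print("f maps +/- i to the node of the cubic")
```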
The following result underpins most constructions of homotopies. It asserts that
under minimal conditions, isolated solutions of a system in a family of systems are
limits of isolated solutions of nearby systems.
Corollary A.4.19 Let X and Y be irreducible quasiprojective algebraic sets. Let
f(x; y) be a system of N := dim X algebraic functions on X × Y. Let π : X × Y → Y
denote the product projection. Assume that x* is an isolated solution of f(x; y*) = 0
for some point y* such that Y is locally irreducible at y*. Then each irreducible
component Z of V(f) containing (x*, y*) satisfies the following properties:
(1) dim Z = dim Y; and
Proof. In the following argument, we will repeatedly replace algebraic sets by dense
Zariski open sets. For the most part, we call these shrunk sets by the same names.
Replacing X by X_reg, we may assume that X is smooth. By shrinking Y and
replacing X by the inverse image under π of the shrunk Y, we may assume that
Y is smooth. Similarly, using Chevalley's Theorem 12.5.6, we may assume that
π(X) = Y. By using the algebraic Sard's Theorem A.4.10 and shrinking X and Y
further, we may further assume that π is of maximal rank.
Let X̄ denote an irreducible projective algebraic set in which X is Zariski open.
Let X̂ denote the closure of Graph(π) ⊂ X × Y in X̄ × Y. The induced map
π̂ : X̂ → Y extending π is proper. By Hironaka's Theorem A.4.1, there exists a
desingularization f : X̃ → X̂. Thus, following Theorem A.4.8, we can factor π̂ ∘ f
as σ ∘ p, where p : X̃ → Z is an algebraic map with connected fibers; where Z is an
irreducible quasiprojective algebraic set; and σ : Z → Y is a finite-to-one proper
algebraic map.
By Corollary A.4.15, there is a Zariski open dense set U′ ⊂ Y such that U′ and
σ^{-1}(U′) are smooth and σ|_{σ^{-1}(U′)} : σ^{-1}(U′) → U′ is a covering. Thus by shrinking
we may assume without loss of generality that σ : Z → Y is a finite-to-one covering;
Z is smooth; and that p(X̃) = Z.
As we shrink Z we may automatically shrink Y so as not to lose the properties
already obtained. To see this, let V′ be a Zariski open set of Z. Since the image
under σ of the proper algebraic subset Z \ V′ is a proper algebraic subset, we may
replace Y by U := Y \ σ(Z \ V′) and Z by V := σ^{-1}(Y \ σ(Z \ V′)) to still have an
algebraic finite-to-one covering map σ : V → U between manifolds.
By using the algebraic Sard Theorem A.4.10 on p we may, after shrinking, as-
sume without loss of generality that p is of maximal rank.
Since X is smooth and f is an isomorphism on the inverse image of the regular
points, we may regard X as a subset of X̃. By using Chevalley's Theorem 12.5.6,
we may assume that p|_X : X → Z surjects onto Z.
Since all fibers of p are smooth and connected, they are irreducible. We conclude
that all fibers of p|_X are smooth and connected. Indeed, if this failed, we would
have a fiber of p which is irreducible, and which after removing a proper algebraic
subset is disconnected.
Taking U as the final shrunk Y, V := σ^{-1}(U), and W as the final shrunk X, we
have finished the proof of the theorem. □
Besides algebraic mappings, there is a more general notion of mapping that is often
very useful.
On C, the assignment f : x ↦ 1/x is a well-defined function on C \ {0}. Using
the identification of x ∈ C with [1, x] ∈ P^1, it is natural to think of f as extending
to a function on all of C that takes 0 to the value ∞ equal to [0, 1]. In this case,
the algebraic set

    { (x, [x, 1]) ∈ C × P^1 | x ∈ C }

is the graph of the map x → 1/x regarded as a map from C → P^1. This is the
simplest example of a rational mapping.
A rational mapping f : X → Y between quasiprojective algebraic sets is defined
to be an algebraic set Γ ⊂ X × Y such that there is a Zariski open dense set U ⊂ X
such that Γ ∩ (U × Y) is the graph of an algebraic map and the closure of Γ ∩ (U × Y)
is Γ. Every algebraic mapping is a rational mapping, but rational mappings are much
more general. A rational mapping often has a set of indeterminacy, where it cannot be
defined.

    (z_1, z_2; [z_1, z_2]) ∈ C^2 × P^1.

F does not define an algebraic map at (0,0). Indeed, there is no way to assign a
value there, since z_2/z_1 has different constant values on different lines through the
origin. The origin is the set of indeterminacy of f.
Rational mappings may fail to be algebraic mappings even though they are
continuous maps. For example, consider the algebraic map g sending t ∈ C to
(z_1, z_2) := (t^2, t^3) ∈ C^2. The image of g is the affine curve C′ defined by z_1^3 − z_2^2 = 0.
The map g is one-to-one and onto. The inverse from C′ to C may be checked to be
a rational mapping: it is the restriction of the rational function z_2/z_1. However, it
is not an algebraic mapping, since t as a function on C′ is not the restriction of a
polynomial p ∈ C[z_1, z_2].
Proof. By Theorem A.4.10, we know there is a Zariski open set U ⊂ f(X) such that
V(f(x) − y) is smooth and such that the Jacobian matrix of f has rank equal to
dim f(X) at all points of V(f(x) − y). Corollary A.4.12 gives that dim V(f(x) − y) =
N − dim f(X). □
Let V and f be as in Theorem A.6.1. For the dense Zariski open set U :=
f^{-1}(V) ∩ X_reg, we have for all points x* ∈ U that the rank of the Jacobian of f evaluated
at x* equals rank f. This gives us a quick probability-one algorithm for the rank
of a system f. Given an algebraic system f(x) of n algebraic functions on an
irreducible N-dimensional quasiprojective set X, the rank of f equals the rank of
the Jacobian at a random point of X.
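The probability-one rank computation can be sketched in a few lines (a toy system chosen here for illustration; any numerical rank routine, e.g., an SVD, could replace the elimination below):

```python
import random

# Probability-one rank of a polynomial system, computed as the rank of its
# Jacobian at a random complex point.  Example system (hypothetical):
#   f1 = x*y, f2 = x**2*y, f3 = 0  on C^2, whose rank is 2.

def jacobian(x, y):
    # partial derivatives of (x*y, x**2*y, 0) with respect to (x, y)
    return [[y, x],
            [2 * x * y, x * x],
            [0, 0]]

def numerical_rank(m, tol=1e-8):
    # rank via Gaussian elimination with partial pivoting
    m = [row[:] for row in m]
    rows, cols = len(m), len(m[0])
    rank, r = 0, 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]), default=None)
        if pivot is None or abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        rank += 1
    return rank

random.seed(2)
x = complex(random.uniform(-1, 1), random.uniform(-1, 1))
y = complex(random.uniform(-1, 1), random.uniform(-1, 1))
print(numerical_rank(jacobian(x, y)))  # 2, the rank of the system
```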
The following is useful.
Corollary A.6.2 Let f(x) = 0 denote a system of n algebraic functions on an
irreducible quasiprojective set X. If the rank of the Jacobian of f at some point
x ∈ X_reg is k, then rank f ≥ k.
Proof. The set of points of X_reg where the Jacobian has rank less than or equal to k is a
quasiprojective subset of X_reg. If it is dense, then rank f = k. If it is not dense,
then the rank of the Jacobian is greater than k on a Zariski dense set, which would
imply rank f > k. □
The rank of a system is a useful invariant, but from the viewpoint of the Bertini
Theorems, a closely related invariant, the projective rank of a system, plays a more
central role. The first-time reader may safely ignore the rest of this section and any
mention of the projective rank of a system. The importance of projective rank stems
from Theorem A.8.6, which states that projective rank controls the nonemptiness
of the zero sets in Bertini Theorems.
The projective rank of a system f is the dimension of the closure of the image
of the rational mapping given by sending x ∈ X \ V(f) to [f_1(x), …, f_n(x)]. We
denote the projective rank of f by rank_P f. A system f on C^N having projective
rank N is called a big system.
Remark A.6.4 For an algebraic line bundle and a system of algebraic sections
f_1, …, f_n, rank does not make good sense, but projective rank does. It is closely
related to the notion of Kodaira dimension, e.g., (Iitaka, 1982).
Lemma A.6.5 Let f be a system of n algebraic functions on an irreducible qua-
siprojective algebraic set X. Then rank f − 1 ≤ rank_P f ≤ rank f.
Proof. The rational mapping used in the definition of projective rank factors as
the composition of the map in the definition of rank followed by the map C^n \
{0} → P^{n−1}, which sends (z_0, …, z_{n−1}) → [z_0, …, z_{n−1}]. Since each fiber of the
map C^n \ {0} → P^{n−1} has dimension one, we are done. □
The above proof makes clear how the two ranks may fail to be equal. Before we
make this precise in Lemma A.6.7, let us give a definition and an example.
Let X ⊂ P^n be an irreducible projective set. X is said to be a cone with
vertex x ∈ X if for some hyperplane H of P^n not containing x, the projection
π_x : P^n \ {x} → H maps X \ {x} to a set whose closure has dimension less than
dim X. An irreducible affine algebraic set X ⊂ C^n is said to be a cone with vertex
x ∈ X if, when we regard C^n as a subset of P^n, the closure of X in P^n is a cone
with vertex x.
Proof. The proof of this is immediate from the definitions and left to the reader. □

    rank_P f = rank( f_2/f_1, …, f_n/f_1 ),

where the system of quotients is defined on X \ V(f_1).
Proof. The proof of this, and the independence of the choice of the not identically
zero f_1, follows immediately from the definitions and is left to the reader. □
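For a concrete instance of the quotient formula (a hypothetical example, not from the text): the system f = (x, y, x + y) on C^2 has rank 2, but its image is a plane through the origin, i.e., a cone, and the quotient system (f_2/f_1, f_3/f_1) has rank 1, so here rank_P f = rank f − 1.

```python
# The system f = (x, y, x + y) on C^2 has rank 2, but its image is a plane
# through the origin (a cone), so rank_P f = rank(f2/f1, f3/f1) = 1: the two
# quotients y/x and (x + y)/x = 1 + y/x differ by a constant, so the Jacobian
# of the quotient system is everywhere singular.

def quotient_jacobian(x, y):
    # Jacobian of (y/x, (x + y)/x) with respect to (x, y)
    return [[-y / x ** 2, 1 / x],
            [-y / x ** 2, 1 / x]]

x, y = 0.7 + 0.3j, -0.4 + 0.9j
j = quotient_jacobian(x, y)
det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
assert abs(det) < 1e-12      # singular: projective rank < 2
assert abs(j[0][0]) > 1e-12  # nonzero row: projective rank is exactly 1
print("rank_P f = 1 while rank f = 2")
```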
In this section we will give a detailed discussion of certain special families of poly-
nomials, which are useful in the study of polynomial systems.
with z := [z_0, z_1] ∈ P^1?
(3) Since multiplying an equation by a nonzero complex number does not change
the solution set of the polynomial, should we make the convention that c is
not the point (c_0, …, c_d) ∈ C^{d+1}, but instead [c_0, …, c_d] ∈ P^d? Doing this, of
course, implicitly throws away the identically zero polynomial, which corresponds
to c = 0.
For simplicity, we look only at polynomials with no restrictions on the c_i, i.e.,
we assume that (z, c) = (z, c_0, …, c_d) ∈ C^{d+2}, corresponding to polynomials of
degree ≤ d. The different choices raised by the issues listed above are treated in a
similar way.
Let's introduce some notation. We let Z_d ⊂ C^{d+2} denote the solution set of
p(z, c) = 0. We let π : Z_d → C^{d+1} denote the map induced by the projection
(z, c) → c, and we let ρ : Z_d → C denote the map induced by the projection
(z, c) → z.
Note that for any given c ∈ C^{d+1}, π^{-1}(c) consists of the points (z, c) satisfying
p(z, c) = c_0 + c_1 z + … + c_d z^d = 0. It is important that the zero set Z_d ⊂ C^{d+2}
of p(z, c) = 0 is a connected (d+1)-dimensional complex manifold. Indeed, Z_d has
dimension d + 1 since it is defined by a single algebraic function on an irreducible
(d+2)-dimensional algebraic set. Moreover, since ∂p(z, c)/∂c_0 = 1, it is a consequence
of the implicit function theorem that Z_d is smooth.
Let us consider the points c for which p(z, c) has multiple roots, i.e., fewer than d
distinct roots. This corresponds to points c for which the equations

    p(z, c) = 0,   ∂p(z, c)/∂z = 0      (A.7.2)

have a common root. The classical prescription, e.g., (Chapter 1 Walker, 1962)
and (Cox et al., 1997), for how to eliminate z from these equations constructs the
discriminant of p(z, c), a polynomial of degree 2d − 1 in c, which is the resultant
of the two polynomials in the system (A.7.2). We only discuss resultants briefly in
§ 6.2.1 of this book. For us it will be enough to note that
(1) for some c*, e.g., the c* corresponding to the polynomial z^d − 1 = 0, the roots
are distinct;
(2) for c in a complex neighborhood of c* the roots remain distinct; and
(3) the set S in C^{d+2} defined by the system (A.7.2) is an affine algebraic set.
We know from item (3) and Chevalley's Theorem 12.5.6 that π(S) is a constructible
set. We know that the closure D of π(S) in the complex topology is an affine
algebraic set by Lemma 12.5.3. By item (2) we conclude that D ≠ C^{d+1} and thus
that the Zariski open set C^{d+1} \ D is nonempty, and hence dense.
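For the smallest interesting case d = 2, the discriminant-as-resultant construction can be carried out by hand (a sketch; the 3 × 3 Sylvester determinant below is the classical resultant of p and ∂p/∂z for a quadratic, a polynomial of degree 2d − 1 = 3 in the coefficients):

```python
from fractions import Fraction

# The discriminant of p(z, c) = c0 + c1*z + c2*z**2 as the resultant of p and
# dp/dz: the determinant of the 3x3 Sylvester matrix.  Each matrix entry is
# linear in c, so the determinant has degree 2d - 1 = 3 in c.

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def resultant_p_dp(c0, c1, c2):
    # Sylvester matrix of p = c2*z**2 + c1*z + c0 and p' = 2*c2*z + c1
    return det3([[c2, c1, c0],
                 [2 * c2, c1, 0],
                 [0, 2 * c2, c1]])

# z**2 - 1 has distinct roots: nonzero resultant.
print(resultant_p_dp(Fraction(-1), Fraction(0), Fraction(1)))   # -4
# z**2 has a double root at 0: the resultant vanishes.
print(resultant_p_dp(Fraction(0), Fraction(0), Fraction(1)))    # 0
```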
    p(z, c) = Σ_{|J| ≤ d} c_J z^J

We define the universal function F(λ, x) := Σ_{i=1}^n λ_i f_i(x) on C^n × X.
It is traditional in this context to refer to the solution set V(f) of the set of
algebraic functions f_1(x), …, f_n(x) as the base locus of the set of functions. We
will not use this language, but the reader should be aware of it.
Z_F := V(F) is a quasiprojective algebraic set with Z_F ∩ [C^n × (X_reg \ V(f))]
smooth. Moreover, there is a Zariski open dense set U ⊂ C^n such that the restriction
of the projection map π : C^n × (X_reg \ V(f)) → C^n to Z_F ∩ π^{-1}(U) is either empty
or a maximal rank map. This is important enough to state as a theorem.
Theorem A.7.1 (Simple Bertini's Theorem) Let f(x) := {f_1, …, f_n} be a
system of algebraic functions on an irreducible quasiprojective algebraic set X.
There is a Zariski open dense set U ⊂ C^n such that for (λ_1, …, λ_n) ∈ U,
it follows that g := Σ_{i=1}^n λ_i f_i has a possibly empty quasiprojective zero set Z
such that Z ∩ (X_reg \ V(f)) is smooth with the differential dg nowhere zero on
Z ∩ (X_reg \ V(f)).
Proof. First note that we can assume that X is smooth and V(f) is empty, by
simply replacing X with X_reg \ V(f) and renaming. Note that if rank f = 0, then
each g is constant and the theorem is vacuously true. Therefore we can assume that
rank f > 0.
We have the "universal function" F(λ, x) := Σ_{i=1}^n λ_i f_i(x) defined for (λ, x) ∈
C^n × X. Z_F := V(F) ⊂ C^n × X is smooth by the same reasoning as used in § A.7.1.
Consider the maps π_1 : Z_F → C^n and π_2 : Z_F → X induced by the projections
C^n × X → C^n and C^n × X → X respectively. The fiber π_2^{-1}(x) for any x ∈ X is an
affine hyperplane of C^n. It can be further checked that given any x ∈ X, there is a
neighborhood O of x in the complex topology such that π_2^{-1}(O) is biholomorphic
to C^{n−1} × O. Thus Z_F is a bundle over X and therefore irreducible of dimension
dim X + n − 1.
We are in the situation of Theorem A.4.10, and would be done if we knew that
π_1 is dominant. Assume it is not. Then there is a Zariski open dense set U ⊂ C^n
such that for λ ∈ U we would have that V(Σ_{i=1}^n λ_i f_i) = ∅, and the theorem holds
vacuously for such λ. □
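A minimal numeric illustration of the theorem (a hypothetical two-function system; what is being checked is the generic smoothness claim, not the proof):

```python
import random

# Illustration of Theorem A.7.1 with the hypothetical system f1 = x**2, f2 = y
# on C^2: for any lambda with l2 != 0, g = l1*x**2 + l2*y has gradient
# (2*l1*x, l2), which never vanishes, so V(g) is smooth.  The "bad" set
# {l2 = 0} is a proper Zariski-closed subset, so a random lambda avoids it
# with probability one (g = x**2 alone has dg = 0 along its zero set).

random.seed(7)
l1 = complex(random.uniform(-1, 1), random.uniform(-1, 1))
l2 = complex(random.uniform(-1, 1), random.uniform(-1, 1))

def grad_g(x, y):
    return (2 * l1 * x, l2)

# Sample points of V(g): for any x, the point (x, -l1*x**2/l2) lies on V(g).
for _ in range(50):
    x = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    y = -l1 * x * x / l2
    gx, gy = grad_g(x, y)
    assert abs(gx) + abs(gy) > 1e-8   # dg nowhere zero on V(g)
print("generic combination has a smooth zero set")
```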
Proof. The set π_2^{-1}(x) for x ∈ X may be identified with the linear space of Λ ∈ C^{s×n}
satisfying Λ · f(x) = 0. Since f(x) ≠ 0, this space has codimension s. It can be
further checked that given any x ∈ X there is a neighborhood O of x in the complex
topology such that π_2^{-1}(O) is biholomorphic to C^{s(n−1)} × O. From this it follows that
the set Z_F ⊂ C^{s×n} × X consisting of (Λ, x) such that F(Λ, x) = 0 is an irreducible
set. □
Proof. Since at any point x ∈ V(F) ⊂ C^{s×n} × X at least one of the f_i is nonzero,
we can see that all the partial derivatives

    ∂F_j(Λ, x)/∂λ_{j,i}
After we develop a few results on linear projections and subspaces, we will prove
Theorem A.8.7, the analogue for systems of Theorem A.7.1.
"Generic" projections have been used since classical times to reduce questions about
general algebraic sets to questions about hypersurfaces. Here we present the basic
facts that we need. We follow the presentation in (Sommese et al., 2001c) closely.
A linear projection π : C^N → C^m is a surjective affine map

    π(x) = a + A·x,      (A.8.3)

where

    a = (a_{1,0}, …, a_{m,0})^T,   A = (a_{i,j}),  1 ≤ i ≤ m,  1 ≤ j ≤ N,   and   x = (x_1, …, x_N)^T,      (A.8.4)
i.e., π_1^{-1}(π_1(0)) is parallel to π_2^{-1}(π_2(0)). So in the special case of linear projections
from C^N → C^{N−1} with N ≥ 2, we can consider the projections to be parameterized
by the lines through the origin, or equivalently by the hyperplane at infinity H_∞ :=
V(z_0), where we regard C^N as embedded in P^N by (x_1, …, x_N) → [1, x_1, …, x_N].
A.8.1 Grassmannians
We defined the N-dimensional projective space P^N in § 3.2 as the set of lines through
the origin in C^{N+1}. Replacing lines by (m+1)-dimensional linear subspaces through
the origin leads to the notion of a Grassmannian. We define the Grassmannian of
(m+1)-planes in (N+1)-space to be the set of all (m+1)-dimensional linear
subspaces of C^{N+1} through the origin. Equivalently, this is the space of linear P^m's
in P^N. We denote this space Gr(m, N). The reader should be aware that there is a
second convention in the literature where the focus is on C^{N+1}, and the space we
denote Gr(m, N) is denoted Gr(m + 1, N + 1).
An (m+1)-dimensional subspace of C^{N+1} through the origin is determined by
m + 1 elements of C^{N+1} spanning it. In analogy with homogeneous coordinates on projective
space, we may represent an element of Gr(m, N) by an (m+1) × (N+1) matrix
A. Conversely, we would like an (m+1) × (N+1) matrix A to represent an
element of Gr(m, N). For A to represent an (m+1)-dimensional linear subspace, A
must have rank m + 1, e.g., for projective space P^N = Gr(0, N), the (N+1)-tuples
[z_0, …, z_N] ∈ P^N are not allowed to have all entries 0. In analogy with homogeneous
coordinates on P^N, if G is an (m+1) × (m+1) invertible matrix, then the rank
m + 1 matrices A and G · A represent the same linear subspace.
As with P^N, we can define embeddings of C^{(m+1)×(N−m)} into Gr(m, N). Indeed,
given an (m+1) × (N−m) matrix B, if we send it to [I_{m+1}  B], we have a one-to-one
mapping. We take this as giving a neighborhood of any of the elements of the image
of this map. We can construct other embeddings of C^{(m+1)×(N−m)} whose unions
cover Gr(m, N), but do not do so since we do not need this.
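These matrix conventions are easy to experiment with (a sketch; the `rref` routine below is a generic reduced-row-echelon form over the rationals, not code from the book):

```python
from fractions import Fraction

# Matrix representatives of points of Gr(m, N): a rank m+1 matrix A of shape
# (m+1) x (N+1), with A and G*A (G invertible) representing the same subspace.
# Row reduction produces a canonical representative, so two matrices give the
# same point of the Grassmannian iff their reduced row echelon forms agree.

def rref(m):
    m = [[Fraction(v) for v in row] for row in m]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [v / m[r][c] for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                m[i] = [a - m[i][c] * b for a, b in zip(m[i], m[r])]
        r += 1
    return m

# A point of Gr(1, 3): a 2 x 4 matrix of rank 2, in the chart [I_2 | B].
A = [[1, 0, 2, 3],
     [0, 1, 4, 5]]
# Multiply by an invertible G = [[2, 1], [1, 1]]: the subspace is unchanged.
GA = [[2 * a + 1 * b for a, b in zip(A[0], A[1])],
      [1 * a + 1 * b for a, b in zip(A[0], A[1])]]
print(rref(GA) == rref(A))   # True
```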
Grassmannians are connected projective manifolds. As we saw above, the di-
mension of Gr(m, N) is (m + 1) × (N − m). There is a natural embedding

    Gr(m, N) → P^{binom(N+1, m+1) − 1}
Proof. Using the identification of affine linear spaces of C^N with linear spaces in
P^N, we only need to show this result in the case of P^N. Note that dim π_2^{-1}(X) =
n + m(N − m). Thus π_1(π_2^{-1}(X)) cannot be dense, because if it were
Proof. This theorem is proved by reasoning similar to that for Theorem A.8.1.
We only prove the case when X is projective algebraic (the quasiprojective case
requires the projective case plus an application of Lemma 12.5.2). Fix a point
x ∈ X ⊂ P^N. The set of L ∈ Gr(m, N) that contain x, i.e., G_1 := π_1(π_2^{-1}(x)), is isomorphic
to Gr(m − 1, N − 1) and thus irreducible and m(N − m) dimensional. The set of L
containing x and a point y ≠ x is isomorphic to Gr(m − 2, N − 2) and
thus (m − 1)(N − m) dimensional. The set W of L that contain x and some other
point y of X is thus of dimension at most (m − 1)(N − m) + n. Here we are using
the fact that W is projective. To see this, let q_2 : π_1^{-1}(G_1) → P^N denote the algebraic
mapping induced by π_2; W is the set π_1(q_2^{-1}(X)).
Since dim G_1 − dim W ≥ m(N − m) − ((m − 1)(N − m) + n) = N − m − n ≥ 1,
we conclude W is a proper algebraic subset of the irreducible projective algebraic
set G_1. Thus there is a Zariski dense open set G_1 \ W of m-dimensional projective
linear spaces L containing x and no other point of X.
The tangent space assertions follow by a dimension count showing that the space
of L ∈ G_1 such that T_{L,x} ∩ T_{X,x} ≠ {x} inside T_{P^N,x} is of dimension less than dim G_1. The
details are left to the reader. □
Proof. Using the same sort of reasoning used in Theorem A.8.2 or a repeated use
of Theorem A.7.1 gives this result. □
where

    A = (a_{i,j}),  0 ≤ i ≤ m,  0 ≤ j ≤ N,   and   z = (z_0, …, z_N)^T,

and where L is the linear projective space P^{N−m−1} ⊂ P^N defined by the vanishing
of the linear equations Az. L is the center of the projection. Theoretically, we
work with equivalence classes of projections, considering two projections π_1, π_2 from
P^N onto P^m equivalent if they have a common center L and there is a projective
linear isomorphism T : P^m → P^m with T(π_1(x)) = π_2(x) on P^N \ L. Note that
two projections from P^N to P^m are equivalent if and only if they have the same
center L. Thus the linear projections P^N → P^m are naturally parameterized by the
Grassmannian Gr(N − m − 1, N) of (N − m − 1)-dimensional linear spaces L ⊂ P^N.
Geometrically, the projection π_L has a simple description. Let L be the center
of the projection. Choose any P^m ⊂ P^N with the property that L ∩ P^m = ∅. Given
a point x ∈ P^N \ L and letting ⟨x, L⟩ denote the linear subspace P^{N−m} ⊂ P^N
generated by x and L, the projection from P^N to P^m with center L sends x to
⟨x, L⟩ ∩ P^m.
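As a sketch of this description in coordinates (a hypothetical choice of center; the target P^m is taken to be the first two coordinates): with N = 3, m = 1, and center L = V(z_0, z_1), the projection is π_L([z_0 : z_1 : z_2 : z_3]) = [z_0 : z_1], and every point of ⟨x, L⟩ off L has the same image.

```python
# Linear projection pi_L : P^3 -> P^1 with center the line L = V(z0, z1):
# pi([z0:z1:z2:z3]) = [z0:z1].  Every point of the span <x, L> other than
# points of L itself has the same image, matching <x, L> ∩ P^m.

def pi(z):
    return (z[0], z[1])          # homogeneous coordinates on P^1

def proportional(a, b):
    # equality as points of P^1: the 2x2 determinant vanishes
    return abs(a[0] * b[1] - a[1] * b[0]) < 1e-12

x = (1.0, 2.0, 3.0, 4.0)
l = (0.0, 0.0, -1.0, 5.0)        # a point of the center L = V(z0, z1)

# Points of <x, L>: s*x + t*l.  For s != 0 they all project to [1 : 2].
for s, t in [(1.0, 0.0), (1.0, 7.0), (0.5, -2.0), (3.0, 1.0)]:
    p = tuple(s * xi + t * li for xi, li in zip(x, l))
    assert proportional(pi(p), pi(x))
print("the fiber of pi_L through x is <x, L> minus L")
```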
The projections π_L from P^N to P^m that are extensions of projections from C^N
to C^m are precisely the projections with centers L ⊂ H_∞. Indeed, let y_1, …, y_m be
coordinates on C^m and let the usual embedding of C^m into P^m be given by

    (y_1, …, y_m) → [w_0, …, w_m] = [1, y_1, …, y_m],
Since we must have a linear equation in x_1, …, x_N when we dehomogenize with
respect to w_0, we conclude that A is of the form

    A = [ 1        0        ⋯   0
          a_{1,0}  a_{1,1}  ⋯   a_{1,N}
          ⋮
          a_{m,0}  a_{m,1}  ⋯   a_{m,N} ].

Writing u := (a_{1,0}, …, a_{m,0})^T, we see that an equivalent form for A is

    A = [ 1   0        ⋯   0
          0   a_{1,1}  ⋯   a_{1,N}
          ⋮
          0   a_{m,1}  ⋯   a_{m,N} ],

i.e., composing with a projective linear isomorphism of P^m absorbs u.
    dim X ∩ H_∞ ≤ dim X − 1.
Proof. Applying Lemma A.8.4 to the closure of f(X), there is a dense Zariski open set U of
Λ ∈ C^{s×n} such that dim (Λ·f)(X) = min{ dim f(X), s }. □
Proof. This follows from application of Theorem A.8.2 to the closure of the image
of X in P^{n−1}. □
Proof. The set U′ of Λ with rank equal to min{s, n} is dense and Zariski open.
Therefore, by replacing any dense Zariski open set U ⊂ C^{s×n} that is constructed
below with its intersection with U′, we may assume that all Λ ∈ U have rank equal
to min{s, n}.
By replacing X with X \ V(f) we can assume that the f_i have no common zeros.
Denote V(F(Λ, x)) ⊂ C^{s×n} × X by Z_F. By Lemma A.7.2, Z_F is irreducible of
dimension dim X + s(n − 1). By Lemma A.7.3, Z_F ∩ (C^{s×n} × X_reg) is smooth.
Let π_1 : Z_F → C^{s×n} denote the algebraic map induced by the product projection
C^{s×n} × X → C^{s×n} and let π_2 : Z_F → X denote the algebraic map induced by the
product projection C^{s×n} × X → X.
From Theorem A.4.10, we conclude there is a Zariski open dense set U ⊂ C^{s×n} such
that for Λ ∈ U and Z_Λ := π_2(π_1^{-1}(Λ)),

    Z_Λ ∩ X_reg

is smooth with the differentials dF_j spanning the normal bundle of Z_Λ ∩ X_reg. □
In this section we present a general Bertini Theorem about the solution sets of
systems.
Since the intersection of any finite number of dense Zariski open sets is Zariski
open and dense, we can (and typically do) apply Bertini's Theorem to conclude that
a generic choice of some parameters leads to a long list of generic properties. To state
such a result succinctly, let us define the constellation of algebraic sets associated to
a finite number of quasiprojective subsets X_1, …, X_r of a quasiprojective set X to
be the collection of sets obtained by repeatedly doing, in any order, the operations
of
(1) taking irreducible components;
(2) taking intersections;
(3) taking the singular set of a quasiprojective algebraic set;
(4) taking finite unions; and
(5) given two sets A, B, taking the set A \ A ∩ B.
Lemma A.9.1 The constellation of algebraic sets, C, associated to a finite num-
ber of quasiprojective subsets X_1, …, X_r of an algebraic set X is a finite set of
quasiprojective sets.
Proof. All these operations start with quasiprojective algebraic sets and produce
quasiprojective algebraic sets.
To prove that C is finite, it suffices to show that the set of all the irreducible
components of the quasiprojective sets obtained by these operations is finite.
Since an irreducible quasiprojective algebraic set A minus a proper algebraic
subset remains irreducible, the last operation leads only to the finite number of
quasiprojective algebraic sets A \ A ∩ B for the collection of quasiprojective sets
A, B generated by the first four operations. Thus it suffices to prove the finiteness of
the collection of sets generated from X_1, …, X_r by repeated use of the operations
(1), (2), (3), and (4).
Since any intersection is a finite union of intersections of irreducible quasiprojec-
tive algebraic sets, it suffices to prove the finiteness of the collection of sets obtained
by starting with X_1, …, X_r and repeatedly doing only the operations (1), (2), and
(3). Note that the operations of taking intersections of irreducible sets and taking
singular sets decrease dimensions if they lead to anything new. Thus, since
dimension is finite, the operations (1), (2), and (3) lead to only a finite number
of quasiprojective sets. □
    dim V(g_{i_1}, …, g_{i_r}) ∩ (Z \ V(g)) = k − r

and V(g_{i_1}, …, g_{i_r}) ∩ (Z_reg \ V(g)) is smooth with the differentials dg_{i_1}, …, dg_{i_r}
having rank r in the tangent space T_{Z,x} at any x ∈ V(g_{i_1}, …, g_{i_r}) ∩ (Z_reg \ V(g)).
Given an s × n complex matrix

    Λ = (λ_{i,j}),  1 ≤ i ≤ s,  1 ≤ j ≤ n,

we let Λ(j_1, …, j_b) denote the s × b submatrix

    Λ(j_1, …, j_b) := (λ_{i,j_k}),  1 ≤ i ≤ s,  1 ≤ k ≤ b.
Theorem A.9.2 (Bertini Theorem for Constellations) Let f_1, …, f_n be a
set of algebraic functions on a quasiprojective set X. Given any finite number
A_1, …, A_m of quasiprojective subsets of X, let C denote the constellation of quasi-
projective sets associated to these sets.
Then there is a Zariski dense open set U ⊂ C^{s×n}, such that for Λ ∈ U and any list
1 ≤ j_1 < ⋯ < j_b ≤ n, the functions

    (g_1, …, g_s)^T := Λ(j_1, …, j_b) · (f_{j_1}, …, f_{j_b})^T
Proof. Since the intersection of dense Zariski open subsets of C^{s×n} is dense and
Zariski open, it suffices to prove the result for a single irreducible set Z ∈ C of some
dimension k.
Further, if we showed that for a given list 1 ≤ i_1 < ⋯ < i_r ≤ s, the result is
true for g_{i_1}, …, g_{i_r}, where

    (g_1, …, g_s)^T := Λ(j_1, …, j_b) · (f_{j_1}, …, f_{j_b})^T

with Λ in a dense Zariski open set U(i_1, …, i_r; j_1, …, j_b) ⊂ C^{s×n}, we will be done
by taking the intersection of these open sets indexed by the finite number of lists
of integers 1 ≤ i_1 < ⋯ < i_r ≤ s and 1 ≤ j_1 < ⋯ < j_b ≤ n.
Therefore, by renaming, it suffices to prove that there is a dense Zariski open set
U ⊂ C^{s×n} for any s ≤ n such that, setting

    (g_1, …, g_s)^T := Λ · (f_1, …, f_n)^T,

it follows that
There are many versions of Bertini's Theorem in the literature, e.g., (Example 12.1.11 Fulton, 1998). For a further discussion of Bertini theorems, see also (§ 1.7 Beltrametti & Sommese, 1995).
334 Numerical Solution of Systems of Polynomials Arising in Engineering and Science
There are some natural embeddings of algebraic sets that are useful. For simplic-
ity, we give versions for projective algebraic sets, though similar constructions are
equally useful for affine algebraic sets.
The Veronese embedding of degree d is the map P^N → P^{C(N+d,d)-1} obtained by sending the point [z_0, ..., z_N] to the point with homogeneous coordinates made out of all the monomials {z^J : |J| = d}, where we use multidegree notation. The restrictions of the linear equations on P^{C(N+d,d)-1} to the image of the Veronese embedding give all the degree d equations of P^N.
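As a concrete illustration (a standard example, not taken from the text above), the smallest nontrivial case is N = 1, d = 2:

```latex
% Degree-2 Veronese embedding of P^1 into P^2:
\nu_2 : \mathbb{P}^1 \to \mathbb{P}^2, \qquad
\nu_2([z_0, z_1]) = [z_0^2,\; z_0 z_1,\; z_1^2].
% With [y_0, y_1, y_2] homogeneous coordinates on P^2, the image is the
% smooth conic V(y_0 y_2 - y_1^2), and restricting a general linear form
% a_0 y_0 + a_1 y_1 + a_2 y_2 to the image pulls back to the general
% degree-2 form a_0 z_0^2 + a_1 z_0 z_1 + a_2 z_1^2 on P^1.
```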
P^{N_1} × ... × P^{N_r} → P^{Π_{i=1}^r (N_i + 1) - 1}

given by sending the point [z_{1,0}, ..., z_{1,N_1}; ...; z_{r,0}, ..., z_{r,N_r}] to the point with homogeneous coordinates made out of all the monomials z_{1,j_1} ... z_{r,j_r}.
Remark A.10.1 The degree of the image of the Segre embedding of the multiprojective space P^{N_1} × ... × P^{N_r} in P^{Π_{i=1}^r (N_i + 1) - 1} is the multihomogeneous Bezout number for the system with N := Σ_{i=1}^r N_i equations all of type (1, ..., 1). This may be checked, e.g., using Equation 8.4.15, to be

C(N; N_1, ..., N_r) = N! / (N_1! ... N_r!).
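As a quick numeric check of this formula (plain Python for illustration; the function name `segre_degree` is ours, not from the text):

```python
# Illustration (plain Python): the degree of the Segre image of
# P^{N_1} x ... x P^{N_r}, i.e. the multinomial coefficient
# N! / (N_1! ... N_r!) with N = N_1 + ... + N_r, as in Remark A.10.1.
from math import factorial

def segre_degree(dims):
    N = sum(dims)
    deg = factorial(N)
    for Ni in dims:
        deg //= factorial(Ni)
    return deg

print(segre_degree([1, 1]))  # P^1 x P^1 in P^3 is a quadric surface: degree 2
print(segre_degree([1, 2]))  # P^1 x P^2 in P^5: degree 3
```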
On a theoretical level, the Segre embedding shows that subsets of multiprojec-
tive spaces defined by multihomogeneous equations may be regarded as projective
algebraic sets.
One case is of special interest.
Remark A.10.3 (Measuring Degrees) Measuring degree by using the Segre embedding gives the smallest possible values of all the consistent ways of measuring the degrees of pure-dimensional algebraic sets. Other consistent ways may be obtained by using the Veronese embedding on the different projective spaces followed by a Segre embedding. In the language of line bundles mentioned briefly in § A.13, such a choice is equivalent to choosing an ample line bundle L on M := P^{N_1} × ... × P^{N_r} and then defining the L-degree deg_L(X) of a pure k-dimensional X ⊂ M to be c_1(L)^k · X, where c_1(L) is the first Chern class of L.
Δ := {(z, w) ∈ X × X | z = w}.
Proof. Embed C^N into P^N by the map (z_1, ..., z_N) → [x_0, ..., x_N] = [1, z_1, ..., z_N]. Let H_0 := V(x_0) denote the hyperplane at infinity. Then the closure of Sec(X) in P^N meets H_0 in a proper algebraic subset of that closure. Thus dim H_0 ∩ cl(Sec(X)) < dim cl(Sec(X)), and so H_0 is not contained in cl(Sec(X)). Therefore, we can choose a point p ∈ H_0 not in cl(Sec(X)). Let π : C^N → C^{N-1} be a linear projection whose fibers are lines having direction p. No fiber can go through two distinct points of X, since if one did, p would be in cl(Sec(X)) ∩ H_0. Since this is true for a Zariski open set of p ∈ H_0, it follows from § A.8 that it is true for a Zariski open set of projections. •
Lemma A.10.6 Let X be an affine algebraic subset of C^N, all of whose irreducible components are of dimension ≤ k. Fix a finite set S ⊂ C^N. For a general linear
Algebraic Geometry 337
Proof. Since the lemma is vacuous if m = N, we can assume that m ≤ N - 1 and thus that k ≤ N - 2. We can reduce by induction to the case when m = N - 1. Let H_∞ := P^{N-1} denote the hyperplane at infinity in P^N. Let y be a point of S. If y ∉ X, consider the map φ_y : X → H_∞ given by sending x ∈ X to the point φ_y(x) equal to the intersection of H_∞ with the line spanned by x and y. If y ∈ X ∩ S, let φ_y : X \ {y} → H_∞ be the analogous map. The union T of the closures of the images of these maps as y runs over the set S has dimension at most dim X. Since dim X = k ≤ N - 2 < dim H_∞, we conclude that the projection corresponding to a general point of H_∞ \ T has the desired properties. •
The following result is classical (p. 7 Mumford, 1995). A proof follows, e.g., from
the construction given in § 15.5.4.
Given a reduced affine algebraic set X, the following classical lemma lets us
construct polynomials whose set of common zeros is the underlying set of X.
Lemma A.10.8 Let X be an affine subset of C^N, all of whose irreducible components are of dimension k < N. Given N + 1 generic projections π_i : C^N → C^{k+1} with q_i the defining degree-deg X polynomial of π_i(X) for i = 0, ..., N, the set of common zeros of the polynomials q_0(π_0(x)), ..., q_N(π_N(x)) is X.
Proof. Choose a generic projection π_N : C^N → C^{k+1} and let q_N be the defining deg X polynomial of π_N(X). Then q_N(π_N(x)) vanishes on an (N - 1)-dimensional set X_N containing X. Let S be a finite set consisting of one point from each irreducible component of X_N \ X. Choose a generic projection π_{N-1} : C^N → C^{k+1}. By Lemma A.10.6, π_{N-1}(S) ∩ π_{N-1}(X) = ∅, and thus the set of common zeros X_{N-1} of q_N(π_N(x)), q_{N-1}(π_{N-1}(x)) minus X is of dimension at most N - 2. This step can be repeated, in an induction, to give the conclusion of the lemma. •
In classical projective geometry there is a simple but basic duality between points and hyperplanes. To make this precise, let P^N denote the N-dimensional projective
Proof. (Kleiman, 1986) is a good reference for this result and related material. •
Note this result says that in the case when X is a curve in P^2 and not a line, the rational map X → X' gives an isomorphism from a Zariski open set of X to a Zariski open set of X'. To see this, note that the image of X is either a point or a curve. If it is a point, then X'' = X is a line. So we have that if X is not a line, its image is a curve. The rational mapping X' → X is a well-defined map on the smooth points of X'. From this we conclude that X → X' could not be r-to-one for any r > 1. We need a special consequence of this result.
Corollary A.11.2 Let C be a pure dimension-one, not necessarily irreducible, algebraic subset of P^2. Assume that C has no irreducible components of degree one. Then C'' = C. Further, let x be a general point of any one of the components C_1 of C, with ℓ the tangent line to C_1 at x. Then the defining equation of C given by Theorem A.10.7, restricted to ℓ, has x as a zero of multiplicity two with all other zeros of multiplicity one.
Proof. Since C'' = C for an irreducible curve and the components of C are all of degree greater than one, we have from Theorem A.11.1 that the images of the components of C are distinct irreducible curves. Choosing a general point x
Proof. Note that q : F → X is a fiber bundle with the fibers isomorphic to the Grassmannian Gr(N - k, N). Thus the set q^{-1}(X_i) is irreducible, and therefore the Zariski dense open subset p^{-1}(U) ∩ q^{-1}(X_i) ⊂ q^{-1}(X_i) is also irreducible. Since y is general, p^{-1}(y) consists of smooth points of the irreducible, and hence path-connected, manifold (p^{-1}(U) ∩ q^{-1}(X_i))_reg. The monodromy action under a path connecting two distinct points of p^{-1}(y) gives the transitivity. •
Proof. First we may discard all degree one components, since they do not affect the veracity of the theorem. Next we reduce to the case when k = 1. Since B is general, we know from part 3) of Theorem 13.2.1 that each X_i is irreducible. The map π may now be regarded as the linear map from C^N to C with L_s of the form L_0 + sv for a fixed vector v ∈ C^N with v ∉ L_0. By renaming if necessary, we may assume that X is one-dimensional.
Next we take a general projection π' : C^N → C. The generic linear map Π := (π, π') : C^N → C^2 maps X generically one-to-one to its image by Theorem A.10.5. Let π_i denote the projection of C^2 onto its ith factor. There is a Zariski open dense set V' of C such that π_1^{-1}(V') ∩ Π(X) is smooth and π_1 : π_1^{-1}(V') ∩ Π(X) → V' is a d := deg X sheeted covering map. Since π = π_1 ∘ Π, we may regard V' as an open subset of V. Since every immersion g : S^1 → V' gives an immersion g : S^1 → V, it suffices to prove the result for V'. This reduces us to the case of a curve in C^2 with V a family of lines parameterized by an open Zariski dense set of a line in the dual P^2 to the P^2 containing C^2.
This case follows in two steps. First we prove the statement for the family U of all affine lines in C^2. This follows using Corollary A.11.2 and a modification of the proof of the classical statement when X is an irreducible curve, e.g., (page 111 Arbarello, Cornalba, Griffiths, & Harris, 1985).

The proof for V ⊂ U follows from a theorem (Theorem, § 5.2, Part II Goresky & MacPherson, 1988) of Lefschetz type asserting that the homomorphism π_1(V, y) → π_1(U, y) induced by the inclusion V ⊂ U is a surjection.

We refer the reader to (Sommese et al., 2002b) for a more detailed proof. •
We have mentioned earlier that homogeneous functions are not functions on projec-
tive space, though they are functions on a related Euclidean space. One difficulty
posed by this is that the usual statements for algebraic functions on affine alge-
braic sets are not literally true for homogeneous functions on projective space. If
homogeneous functions on projective space were the only issue, we could state the
results for polynomials with slight rewording for homogeneous functions. But, faced
with a number of very useful generalizations of homogeneous functions, e.g., biho-
mogeneous and more generally multihomogeneous functions, this is not a viable
approach. In this section we first introduce bihomogeneous and multihomogeneous
polynomials, and then define line bundles and their sections.
condition p_{ij} p_{jk} = p_{ik} guarantees the identifications are well defined. There is a further natural, but involved, definition of when different covers and choices of p_{ij} lead to the "same" line bundle. Sections, which are basically like graphs of functions, are defined as a choice of algebraic functions σ_i : U_i → C with the property that σ_j(x) = σ_i(x) p_{ij}(x) for x ∈ U_i ∩ U_j and all 0 ≤ i ≤ l, 0 ≤ j ≤ l.
An algebraic line bundle L on a quasiprojective algebraic set X is spanned by a vector space V of global sections of L if for each point x ∈ X, there is at least one section s ∈ V such that s(x) ≠ 0.
For example, letting [z_0, z_1] denote homogeneous coordinates on P^1, we may cover P^1 with U_0 := P^1 \ V(z_1) and U_1 := P^1 \ V(z_0). We have the coordinate z := z_0/z_1 on U_0 and w := 1/z = z_1/z_0 on U_1. We may form the line bundle O_{P^1}(d) by taking the data consisting of the function p_{01} = 1/z^d on U_0 ∩ U_1. We may regard a homogeneous polynomial p(z_0, z_1) of degree d > 0 as a section of O_{P^1}(d) by assigning σ_0(z) := p(z, 1) to U_0 and σ_1(z) := p(1, 1/z) to U_1; for convenience we are working with the z coordinate. Note, as required,

σ_1(z) = p(1, 1/z) = z^{-d} p(z, 1) = σ_0(z) p_{01}(z).
Note that O_{P^1}(0) is the trivial bundle, and for d > 0 the bundles O_{P^1}(d) are spanned. There are no other sections of O_{P^1}(d) besides the ones just constructed using the homogeneous polynomials.
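The transition-function bookkeeping can be checked numerically. The following sketch (plain Python for illustration; the sample polynomial and all names are ours) verifies that the two local representatives of a degree-d section differ by p_01(z) = 1/z^d on the overlap:

```python
# Illustration (plain Python, not HOMLAB code): a degree-d homogeneous
# polynomial p(z0, z1) gives a section of O_P1(d); on U_0 ∩ U_1 the local
# representatives satisfy sigma_1 = sigma_0 * p_01 with p_01(z) = 1/z^d.

d = 3

def p(z0, z1):
    # a sample homogeneous cubic (our choice, for illustration)
    return z0**3 + 2.0*z0*z1**2 - z1**3

def sigma0(z):
    # local representative on U_0, coordinate z = z0/z1
    return p(z, 1.0)

def sigma1(z):
    # representative on U_1, expressed in the z coordinate for convenience
    return p(1.0, 1.0/z)

for z in (0.5, 2.0, -1.25):
    assert abs(sigma1(z) - sigma0(z) / z**d) < 1e-9
print("transition relation sigma_1 = sigma_0 / z^d verified")
```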
On P^N, the line bundles are not much more complicated than the ones just constructed for P^1. They are in one-to-one correspondence with the integers d, with the line bundle corresponding to d being denoted O_{P^N}(d). For d < 0 the only algebraic section of O_{P^N}(d) is the 0-section, i.e., the choice of a cover U_i of P^N and σ_i = 0 for all i. For d = 0 we have the trivial bundle, whose only sections are the constant functions, and for d > 0 the algebraic sections are again in one-to-one correspondence with the homogeneous polynomials of degree d.

It turns out that, up to equivalence, the only algebraic line bundle on C^N is the trivial line bundle.
Any algebraic line bundle L on an irreducible projective algebraic set X gives rise to a well-defined element c_1(L) in the second integral cohomology group H^2(X, Z) of X. This element c_1(L) is called the first Chern class of L. If L has a not identically zero section s, then c_1(L) is Poincaré dual to the zero set Z of s.
Let us assume we have line bundles L_1, ..., L_N on an irreducible projective algebraic set X of dimension N. If the line bundles are spanned by global sections, then given general sections s_i of L_i for i = 1, ..., N, it follows that the system

s_1(z) = 0
   ⋮            (A.13.6)
s_N(z) = 0

has exactly (c_1(L_1) ... c_1(L_N)) [X] isolated solutions and they are all nonsingular.
where l_i(x) = a_{i,0} + a_{i,1} x_1 + ... + a_{i,N} x_N for generic choices of all the a_{i,j}. By the theory of vector bundles it may be checked that this system has exactly N + 1 nonsingular isolated solutions.
Before we start the proof of this theorem we would like to show what it says
about down-to-earth polynomial systems.
Let X := P^{N_1} × ... × P^{N_r} be a product of projective spaces. Consider "systems of polynomials" σ consisting of N := Σ_{i=1}^r N_i equations where the kth equation has the nonnegative multidegrees d_{k,1}, ..., d_{k,r} with respect to the multihomogeneous structure. Letting π_j be the product projection of X onto the jth factor P^{N_j}, σ is a section of the bundle

⊕_{k=1}^N ( π_1^* O_{P^{N_1}}(d_{k,1}) ⊗ ... ⊗ π_r^* O_{P^{N_r}}(d_{k,r}) ).
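The number of isolated solutions counted by such a multihomogeneous system (its multihomogeneous Bezout number, cf. Equation 8.4.15) is the coefficient of Π_j α_j^{N_j} in Π_k (Σ_j d_{k,j} α_j). A brute-force sketch (plain Python for illustration; the function name is ours, not from the text):

```python
# Illustration (plain Python): multihomogeneous Bezout number of a system
# on P^{N_1} x ... x P^{N_r}.  degrees[k][j] is the degree of the k-th
# equation in the j-th variable group; dims[j] = N_j.  The count is the
# coefficient of prod_j a_j^{N_j} in prod_k (sum_j degrees[k][j]*a_j),
# computed here by brute-force expansion (fine for small systems).
from itertools import product
from collections import Counter

def multihomogeneous_bezout(degrees, dims):
    r = len(dims)
    assert len(degrees) == sum(dims), "need N = sum N_j equations"
    total = 0
    # choose, for each equation, which variable group contributes
    for choice in product(range(r), repeat=len(degrees)):
        counts = Counter(choice)
        if all(counts[j] == dims[j] for j in range(r)):
            term = 1
            for k, j in enumerate(choice):
                term *= degrees[k][j]
            total += term
    return total

# two bilinear equations on P^1 x P^1: 2 solutions
print(multihomogeneous_bezout([[1, 1], [1, 1]], [1, 1]))  # 2
```

When every equation has multidegree (1, ..., 1), this count reduces to the multinomial coefficient of Remark A.10.1.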
Systems of polynomials that arise in engineering and science often depend on pa-
rameters. In this section, we take a general approach to polynomial systems with
parameters, and discuss what we can say about the dependence of solution sets on
the parameters. There are two questions we are interested in:
(1) What properties hold for general values of the parameters, e.g., a well-defined number of isolated solutions?
(2) Given some property for a system with a special value of the parameters, e.g., having an isolated solution, what can we conclude for general values of the parameters?
Since the proofs require material beyond the scope of this book, we refer to references
for essential points. Our approach is the same as (Morgan & Sommese, 1989),
though the focus there was mainly on isolated solutions of systems.
Let

f(x; q) := [ f_1(x_1, ..., x_N; q_1, ..., q_M), ..., f_n(x_1, ..., x_N; q_1, ..., q_M) ]^T.   (A.14.8)
f(x;q), (A.14.9)
(3) the multiplicity of (x*; q*) as a solution of f(x; q*) = 0 equals the sum of the multiplicities of the isolated solutions of f(x; q') = 0 for q' ∈ π(U) and x ∈ U ∩ (X × {q'}).
For q_1 = q_2, the system has isolated solutions, but for q_1 ≠ q_2, there are no solutions.
Theorem A.14.5 Assume that M + N ≥ n and that there is an isolated solution (x*; q*) of f(x; q*) = 0 where f(x; q) is as in Equation A.14.9. There is a germ of an irreducible complex analytic set Q containing q* with dim Q ≥ M - (n - N) such that for all points q' in arbitrarily small open sets U ⊂ Q containing q*, f(x; q') = 0 has isolated solutions near (x*; q*).
Proof. First randomize to get a square system. Using Theorem 13.5.1, it follows that the point (x*; q*) is an isolated solution of the randomized system. By Theorem A.14.1, (x*; q*) is a point on an irreducible analytic space X_0 with π_{X_0} dominant and a finite-to-one branched cover in the neighborhood of (x*; q*). We know that appending any n - N of the equations of f(x; q) to the randomized system, we
Lemma A.14.7 There is a Zariski open dense set U ⊂ Y such that either π^{-1}(U) is empty or π_{π^{-1}(U)} : π^{-1}(U) → U maps every irreducible component of π^{-1}(U) surjectively onto U.
Proof. To see this, note that there are finitely many irreducible components Z of X. The set π(Z) is constructible by Theorem 12.5.6, and so its closure is either Y or a proper algebraic subset of Y. Setting U equal to the complement of the union of the proper algebraic sets arising in this way, we can assume π(Z) is dense in U for every component Z of X_U, the solution set of f(x; q) over U. We know, by Lemma 12.5.8, that for such a Z there is a Zariski open dense set of Y contained in π(Z). By taking the intersection of these sets, we get a Zariski open dense set U with the desired property. •
Lemma A.14.8 There is a Zariski open dense set U ⊂ Y such that given any irreducible component Z of π^{-1}(U), π_{π^{-1}(U)} : π^{-1}(U) → U maps Z surjectively onto U with every fiber of π_Z having dimension exactly dim Z - M.

Proof. The argument follows from Corollary A.4.7 combined with the same reasoning as Lemma A.14.7. •
Lemma A.14.9 There is a Zariski open dense set U ⊂ Y such that given any distinct irreducible components Z_1 and Z_2 of π^{-1}(U), either Z_1 ∩ Z_2 = ∅ or π_{π^{-1}(U)} : π^{-1}(U) → U maps every irreducible component W of Z_1 ∩ Z_2 surjectively onto U with every fiber of π_W having dimension exactly dim W - M.
Many results such as this are immediate consequences of the generic flatness theorem, a useful algebraic result of Grothendieck, e.g., (pg. 57 Mumford, 1966), which Frisch (Frisch, 1967) showed holds for holomorphic maps between complex analytic spaces. We are not going to define flatness, but geometrically it says that "fibers change without discontinuity." Good places to read about flatness (and some of the results that justify such a statement) are (pg. 146-161 Fischer, 1976) and (Chapter III.10 Mumford, 1999).

The generic flatness theorem mentioned above says that there is a Zariski open set U ⊂ Y such that either π^{-1}(U) is empty or π_{π^{-1}(U)} : π^{-1}(U) → U is a flat surjection. From here on we will assume that π^{-1}(U) is not empty, since the statements we show are all trivially true in the empty case.
We would like there to be a Zariski open dense set U ⊂ Y such that:

(1) For any y_1, y_2 ∈ U, both Z_{y_1} and Z_{y_2} have the same number of irreducible components of a given dimension, degree, and multiplicity in their respective fibers X_{y_1}, X_{y_2}; and
(2) their components so matched up vary continuously as y_1 moves to y_2.

The first assertion is true, and so is the second, but different paths from y_1 to y_2 may match up the decompositions in different ways, i.e., there may be nontrivial monodromy.
One way of approaching this is to take the irreducible decomposition of Z_U := π^{-1}(U), i.e.,

Z_U = ⋃_{i=1}^{dim Z_U} ⋃_{k ∈ J_i} Z_{U,i,k}.   (A.14.11)

Note we are using Lemma A.14.8, which tells us that given any irreducible component Z_{U,i,k} of Z_U, dim Z_{U,i,k} = M + dim Z_{U,i,k,y} = M + i, where Z_{U,i,k,y} = Z_{U,i,k} ∩ (X × {y}).
Proof. Assume that it is not true, for the U selected in Lemmas A.14.7, A.14.8, and A.14.9, that Z_{U,i,k} ∩ π^{-1}(y) is a union of the irreducible components Z_{y,i,j} of Z_y. Then one of the components Z_{y,i,j} of Z_y must contain one of the components of Z_{U,i,k} ∩ π^{-1}(y). Moreover one of the components Z_{U,i',k'} of Z_U must contain Z_{y,i,j}. Thus we get that Z_{U,i',k'} ∩ Z_{U,i,k} contains a component W with fiber under π of dimension i. But this means W is dense in Z_{U,i,k}, which gives the absurdity that Z_{U,i,k} ⊂ Z_{U,i',k'}.

By Theorem A.4.20, we may shrink U to a smaller dense Zariski open set U, so that each Z_{U,i,k} contains a smooth Zariski open set W such that for all y ∈ U, W ∩ π^{-1}(y) is dense in Z_{U,i,k} ∩ π^{-1}(y); and π : W → U is of maximal rank with all fibers having the same number of irreducible components. •
It is a natural question to ask whether the results in this section are true when the parameters do not vary algebraically but only vary holomorphically. The short answer is "yes, with certain minor modifications." Because it is useful to allow complex analytic parameters, we explain what we mean by this and moreover state the generalization of the above results with the changes needed to prove them. In this one subsection, Zariski topology refers to the Zariski topology using zero sets of sets of holomorphic functions.
The simplest case is a system

f(x; q) := [ f_1(x_1, ..., x_N; q_1, ..., q_M), ..., f_n(x_1, ..., x_N; q_1, ..., q_M) ]^T   (A.14.12)

where

f_i(x; q) = Σ_{|I| ≤ d_i} a_I(q) x^I,
f(x;q), (A.14.13)
ℒ = O_{P^N}(d_1) ⊕ ... ⊕ O_{P^N}(d_n),

and f(x; q) = f_1(x; q) ⊕ ... ⊕ f_n(x; q). As before, the first-time reader should assume that X := C^N, Y := C^M, and f(x; q) = 0 is as in Equation A.14.12.
As above we let π : V(f) → Y denote the holomorphic mapping induced by the product projection X × Y → Y. If Z is an irreducible component of V(f), then by Theorem A.4.3, π(Z̄) is a complex analytic subspace of Y. Since π(Z̄ \ Z) is a proper complex analytic subspace of π(Z̄), we conclude that U := π(Z̄) \ π(Z̄ \ Z) ⊂ π(Z) is a Zariski open dense subset of π(Z). This plays the role of Lemma 12.5.8.
We will continue to replace U by Zariski open dense subsets of U as needed, and call them by the name U. This implies that each irreducible component of π^{-1}(U) maps surjectively onto U. We state only the analogues of Theorem A.14.1 and Corollary A.14.2.
Theorem A.14.11 If n = N and if there is an isolated solution (x*; q*) of f(x; q*) = 0, then (x*; q*) ∈ X_0. Moreover there are arbitrarily small open sets U ⊂ X × Y that contain (x*; q*) and such that
There is much to be said for the motto "learn by doing," and in our case, this
means solving polynomial systems with numerical continuation. Even though this
book offers substantially all the information one would need to write a solver from
scratch, that is rather far beyond the level of commitment most readers will muster.
To provide an easy entry to the area, we provide a suite of m-file routines called
HOMLAB for performing polynomial continuation in the Matlab environment. Af-
ter gaining experience with HOMLAB, one may wish to download one of several
freely available software packages for polynomial continuation. These may offer
speed advantages and advanced options, such as polytope methods, not available
in HOMLAB. Some of these have been adapted to run on multi-processor machines
for large computations.
A partial listing of packages available as of the writing of this book is as follows.
• HOMLAB runs in the Matlab environment and implements general linear prod-
uct homotopy and parameter homotopy. See Appendix C.
• HOMPACK, HOMPACK90, POLSYS_PLP are a sequence of increasingly sophisticated continuation algorithms, written in Fortran. The "PLP" in POLSYS_PLP stands for Partitioned Linear Products, a special case of the general linear products discussed in § 8.4.3. This code finds only isolated solutions for square systems (same number of equations as variables).
• PHoM is a C++ code that implements polyhedral homotopies (see § 8.5). This
package finds isolated solutions for square systems.
• PHCpack implements a variety of homotopies in a menu-driven interface that
includes all the structures discussed in Chapter 8, except polynomial products.
In addition to isolated roots, the algorithms from Part III of this book for han-
dling positive dimensional solutions, nonsquare systems, etc., are implemented.
This package is written by our collaborator, J. Verschelde, and it has been the
experimental platform for validation of these algorithms. Both executables and
Ada source code are available.
• Algorithms for mixed volume computations can be found on T.Y. Li's webpage. Mixed volume computation is the most difficult phase of a polyhedral homotopy (§ 8.5).
Appendix C
HOMLAB, a suite of scripts and functions for the Matlab environment, is designed as
an easy entry into the use of polynomial continuation and, for the experienced user,
as a platform for experimental development of new methods. Many of the exercises
of this book assume the availability of HOMLAB and special routines using HOM-
LAB functions are provided for some exercises. The use of a routine for a particular
exercise is described in the exercise statement itself, while the general structure and
use of HOMLAB is documented below. The best way to learn HOMLAB is simply to
work the exercises in the order they appear in this book. These progress from the
simple application of the core path-tracking routine to successively more sophisti-
cated homotopies that use it. Help describing the usage of individual routines, say, endgamer.m, is available by typing "help endgamer" at the Matlab prompt. The
main text of this book is the reference for the methodologies used and the help
facility just mentioned is the reference for individual routines. However, to help the
user in getting started quickly, we provide this user's guide.
We assume the user has at least a minimal acquaintance with Matlab; in par-
ticular, the user must know how to write and execute simple scripts and functions.
A script is a sequence of Matlab commands recorded in a file, say "myscript.m," which are executed by typing >> myscript at the Matlab prompt, here indicated as ">>." (Scripts can also be called within other scripts or functions.) A function
is a file, say "myfunc.m," which starts with a declaration line something like
function [out1,out2]=myfunc(in1,in2,in3)
followed by lines of Matlab code that compute the two outputs, out1,out2, from the three inputs in1,in2,in3. This function might be called as

[a,b]=myfunc(0.1,[1 3],x)
where x is an existing variable in the workspace. For more on using Matlab, please
see the Matlab documentation.
As URLs are often subject to change, we suggest that the packages be located by use of a search engine.
C.1 Preliminaries
C.1.5 Installation
As a suite of m-files, HOMLAB becomes functional by simply adding the folder containing the routines to Matlab's search path. The folder for the current release, HOMLAB1.0, is HomLab10. Let's say that you have copied this folder onto your machine with the full path name of c:\mypath\HomLab10, where "mypath" could be any path in the file structure of your machine. There are three basic options for adding HOMLAB to the Matlab path:

• In Matlab (v.6.5 and above), use "File -> Set Path" on the Matlab menu bar to launch a dialog box for setting the path and use it to add c:\mypath\HomLab10 and its subfolders to the top of the search path. The change becomes effective immediately in the current session, while the "Save" button in the dialog box records it for future sessions.
• At the Matlab prompt, use the command

>> addpath c:\mypath\HomLab10

HOMLAB will then be available for the current session only. Similarly, add the subfolders of HomLab10 to the path.
• Create a file called startup.m in a directory already on Matlab's search path and put the appropriate addpath commands there. HOMLAB will then be available for all future sessions.

Any one of these three options is sufficient. See the Matlab help facility to obtain more detailed instructions on modifying the search path.

To test if the installation is successful, type >> simpltst at the Matlab prompt. If all is well, the session should look something like:
>> simpltst
Number of start points = 2
elapsed_time =
     0
Path 1
elapsed_time =
   1.4100e-001
Path 2
elapsed_time =
   3.1300e-001
The solutions are:
  1.0000e+000 -2.2204e-016i  -1.0000e+000 -1.1102e-016i
  1.0000e+000 -1.6653e-016i  -1.0000e+000 -1.6653e-016i
  1.0000e+000                 1.0000e+000 +5.5511e-017i
>>
The times will vary according to your machine, and the tiny imaginary parts of the answers will typically change with each run. This test solves the simple system

x^2 - 1 = 0,  xy - 1 = 0,

which, homogenized using the coordinate w, becomes

x^2 - w^2 = 0,  xy - w^2 = 0,
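As an independent sanity check of the reported solutions (plain Python, not part of HOMLAB), the affine system x^2 - 1 = 0, xy - 1 = 0 has exactly the two solutions (1, 1) and (-1, -1):

```python
# Verify the two affine solutions of x^2 - 1 = 0, x*y - 1 = 0,
# matching the real parts of the HOMLAB session output above.
solutions = [(x, 1.0 / x) for x in (1.0, -1.0)]  # x^2 = 1 forces x = +/-1
for x, y in solutions:
    assert abs(x * x - 1.0) < 1e-12
    assert abs(x * y - 1.0) < 1e-12
print(solutions)  # [(1.0, 1.0), (-1.0, -1.0)]
```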
user's responsibility to verify that the homotopy is valid in the sense that the
linear combination of two functions from the family is still in the same family.
This is the least used option, but as shown in Exercise 7.6, it is sometimes
handy.
The usual process involves creating two m-files:
• a function defining the system to be solved
• a script that sets up the required data structures before calling endgamer to
get the solutions.
The exception is if one chooses to specify the function in "tableau" form (§ C.3.1),
in which case the function evaluation routine is already provided. Facilities are
available to make the whole process easy in the most common formulations, while
the more advanced user can directly access the basic routines to implement special-
ized homotopies. In the next few sections, we illustrate each of the main options by
examining example scripts and functions.
HOMLAB allows a target system to be defined in one of two ways: as a fully expanded sum of terms or as a black box function. The fully expanded form is convenient for simple, sparse polynomials, while user-written functions are more flexible and often more efficient. Parameterized families of systems must always be written as a user-defined function, but the underlying functions that HOMLAB uses for evaluating fully expanded functions can be employed in a user-defined function as well.
x^2 - x - 2 = 0,  xy - 1 = 0,

The total degree is 2 · 2 = 4, and there are two finite solutions [x, y, w] = [2, 0.5, 1] and [-1, -1, 1], and a double root at infinity at [0, 1, 0].
More information on the solution script totdtab is given in § C.4. A related
script, lpdtab, can be used to solve tableau-style systems using multihomogeneous
or general linear-product homotopies. Using this capability, a two-path version of
the above would be as follows.
% Define the target system in tableau form
eop = [ 0 -1 0 ];  % marker for end of polynomial
tableau = [
 1  2  0
-1  1  0
-2  0  0
eop
 1  1  1
-1  0  0
eop
];
% define a linear-product decomposition
xw=[1 0 1]; yw=[0 1 1];
LPDstruct=[
xw; xw;
xw; yw; ];
HomStruct=[];  % default to 1-homogeneous
% decode tableau and solve with linear-product homotopy
lpdtab
% display the dehomogenized solutions
disp('The solutions are:'); disp(dehomog(xsoln,1e-8))
See § C.4 for details on specifying linear-product structures and see the help for
lpdtab for details on how this script automatically homogenizes the tableau-style
polynomial using the information in HomStruct.
These scripts parse the tableau matrix into a more basic form and pass the results to a built-in function, ftabcall, which in turn calls function ftabsys. The latter can be used directly if one wishes to write a straight-line function (see next section) while specifying some subset of the polynomials in tableau form. For details, use the help facility or look at the source code for ftabsys.
function [f,fx]=function_name(x)

where x is the input variable list, f is the output function value, and fx is the output Jacobian matrix of partial derivatives ∂f/∂x. The function must be homogeneous, possibly multihomogeneous. The careful reader might raise an objection that a homogeneous polynomial on P^n is not truly a function (see § 12.3), but for our purposes we consider it as a function on C^{n+1}, which it certainly is. The script which defines the linear-product homotopy appends random linear equations to effect the projective transformation of § 3.7, one such equation for each projective subspace when working multihomogeneously.
To repeat the example above of the system

x^2 - x - 2 = 0,  xy - 1 = 0,

function [f,fz]=simplefcn(z)
% Straight-line function for
% x^2-x-2=0, xy-1=0
x=z(1); y=z(2); w=z(3);
f = [
    x^2-x*w-2*w^2
    x*y-w^2
    ];
fz = [
    2*x-w,  0,  -x-4*w
    y,      x,  -2*w
    ];
This is not really useful for such a simple example, but it can be significant for more
complicated systems. Notice the use of the homogeneous coordinate w.
Similarly, parameterized functions must also be homogeneous in the unknowns, but not necessarily in the parameters. The Matlab format for a parameterized family of systems is simply

function [f,fx,fp]=function_name(x,p)

where the third output, fp, is the matrix of derivatives ∂f/∂p. Here is a complete specification for the intersection of two circles, where a subfunction for a single circle is used twice.
function [f,fx,fp]=twocircle(x,p)
% Straight-line function for intersection of two circles
f=zeros(2,1); fx=zeros(2,3); fp=zeros(2,6);
[f(1),fx(1,:),fp(1,1:3)]=onecircle(x,p(1:3));
[f(2),fx(2,:),fp(2,4:6)]=onecircle(x,p(4:6));
%
function [f,fx,fp]=onecircle(z,p)
% straight-line function for one circle
% parameters are [cx;cy;r^2] where (cx,cy)=center, r=radius
x=z(1); y=z(2); w=z(3);
cx=p(1); cy=p(2); rsq=p(3);
a=x-cx*w;
b=y-cy*w;
f = 0.5*( a^2 + b^2 - rsq*w^2 );
fx = [ a, b, -cx*a-cy*b-rsq*w ];
fp = [ -w*a, -w*b, -0.5*w^2 ];
The most error-prone part of writing a straight-line program is in generating
the derivatives. To aid in debugging, utilities are provided to numerically check the
coding of the function. See § C.3.4.
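The idea behind those utilities is simply to compare the hand-coded Jacobian with finite differences at a random point. A minimal sketch of the comparison, written here in Python rather than Matlab, with illustrative names and tolerances:

```python
import numpy as np

def simplefcn(z):
    # Python translation of the homogenized example and its Jacobian.
    x, y, w = z
    f = np.array([x**2 - x*w - 2*w**2, x*y - w**2])
    fz = np.array([[2*x - w, 0.0, -x - 4*w],
                   [y,       x,   -2*w    ]])
    return f, fz

def jacobian_error(fun, n, h=1e-7):
    # Largest discrepancy between the coded Jacobian and central differences
    # at a random point; a large entry flags a miscoded derivative.
    rng = np.random.default_rng(1)
    z = rng.standard_normal(n)
    _, fz = fun(z)
    fd = np.zeros_like(fz)
    for j in range(n):
        e = np.zeros(n); e[j] = h
        fd[:, j] = (fun(z + e)[0] - fun(z - e)[0]) / (2*h)
    return np.abs(fz - fd).max()

assert jacobian_error(simplefcn, 3) < 1e-6
```

A deliberately wrong entry in fz would show up immediately as an error far above the tolerance, which is exactly what the HOMLAB checkers report.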
364 Numerical Solution of Systems of Polynomials Arising in Engineering and Science
C.3.3 Homogenization
It is highly recommended that all systems presented to HOMLAB be defined in ho-
mogeneous form. This is the user's responsibility, with the exception that tableau-
style systems will be homogenized automatically. Homogenization is recommended
because path endpoints approaching infinity are very common, and the projective
transformation available after homogenization keeps both the magnitudes of the
coordinates and the arclengths of the homotopy paths finite. If one wishes to com-
pute homotopy paths without homogenization, the path-tracker routines endgame
and tracker will still work, but they do not include any special stopping conditions
for diverging solutions, which may therefore take up inordinate computation time.
(Such paths can never make it to t = 0, so eventually they must fail either on the
minimum step size condition or on the limit on the number of steps.)
The choices of a linear product structure and a multihomogenization are inde-
pendent. For example, if a system is bilinear, one can reflect this in the linear-
product structure while one-homogenizing the system. The one-homogeneous start
system will have solutions at infinity, but HOMLAB will ignore these. If the system
is two-homogenized instead, respecting the bilinear structure, the linear-product
start system has no solutions at infinity. This is a bit cleaner mathematically, but
in practical terms, both formulations have the same number of solution paths to
follow.
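The path-count claim can be verified by a small computation: when every variable group contributes one projective dimension, the multihomogeneous Bezout number is the permanent of the multidegree matrix. A Python sketch (not HOMLAB code) for the running example under its two-homogenization:

```python
from itertools import permutations

# Multidegree matrix for the two-homogenized system: rows are the equations,
# columns are the variable groups [x,u] and [y,v].
D = [[2, 0],   # x^2 - x*u - 2*u^2: degree 2 in [x,u], degree 0 in [y,v]
     [1, 1]]   # x*y - u*v: degree 1 in each group
# With two groups, each of projective dimension one, the 2-homogeneous
# Bezout number is the permanent of D.
bezout = sum(D[0][p[0]] * D[1][p[1]] for p in permutations(range(2)))
# bezout == 2: the same two paths as the one-homogeneous linear-product count.
```

The total-degree count for the same system would be 2 × 2 = 4, so either structured formulation halves the number of paths.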
To be clear, consider again the example
x^2 - x - 2 = 0, xy - 1 = 0.
In a one-homogeneous treatment using coordinates [x, y, w] ∈ P^2, we have the equa-
tions
x^2 - xw - 2w^2 = 0, xy - w^2 = 0,
and we must specify a compatible homogeneous structure:
HomStruct=[1 1 1];
which directs HOMLAB to append an inhomogeneous linear equation ax + by + cw = 1
for some random, complex {a, b, c}, thereby choosing a random patch on P^2. (For
a discussion of projective spaces, see Chapter 3.) To get a two-path homotopy, we
specify the linear-product structure
LPDstruct=[1 0 1; 1 0 1; 1 0 1; 0 1 1];
that is, f1 ∈ (x,w) × (x,w) and f2 ∈ (x,w) × (y,w). This start system has a
double root at infinity of [x,y,w] = [0,1,0], but HOMLAB will ignore it. The
two-homogeneous treatment of the same system using coordinates {[x,u], [y,v]} ∈
P^1 × P^1 is
x^2 - xu - 2u^2 = 0, xy - uv = 0.
HomLab User's Guide 365
The compatible HomStruct is, assuming the coordinates are ordered as (x,y,u,v),
HomStruct=[1 0 1 0; 0 1 0 1];
which directs HOMLAB to append the patch equations
ax + 0y + bu + 0v = 1, 0x + cy + 0u + dv = 1,
for random, complex values of {a, b, c, d}. This picks a random patch on each of the
two P^1 subspaces. Now, the two-path linear-product decomposition is
LPDstruct=[1 0 1 0; 1 0 1 0; 1 0 1 0; 0 1 0 1];
that is, f1 ∈ (x,u) × (x,u) and f2 ∈ (x,u) × (y,v).
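The effect of an appended patch equation can be made concrete: because the system is homogeneous, any root may be rescaled to satisfy the random patch equation and remains a root. A Python illustration (not HOMLAB code) using the one-homogenized example and its affine root (x, y) = (2, 1/2):

```python
import numpy as np

def fhom(z):
    # One-homogenized example: x^2 - x*w - 2*w^2, x*y - w^2
    x, y, w = z
    return np.array([x**2 - x*w - 2*w**2, x*y - w**2])

rng = np.random.default_rng(2)
a, b, c = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# Affine root (x,y) = (2, 1/2) is [2, 0.5, 1] in homogeneous coordinates;
# rescale it so the random patch equation a*x + b*y + c*w = 1 holds.
v = np.array([2.0, 0.5, 1.0], dtype=complex)
v /= a*v[0] + b*v[1] + c*v[2]
assert np.abs(fhom(v)).max() < 1e-12          # still a root after rescaling
assert abs(a*v[0] + b*v[1] + c*v[2] - 1) < 1e-12  # and it lies on the patch
```

This is exactly why the projective transformation costs nothing mathematically: it only selects which representative of each homogeneous root the tracker computes.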
Since Matlab is a numerical package, there has been no attempt to automate dif-
ferentiation and homogenization except for the simple case of fully expanded poly-
nomials via the ftableau function. Otherwise, this onerous task falls to the user.
Symbolic packages can be employed to preprocess functions in this way and then
copy the results into an m-file function.
The most error-prone step in defining a straight-line program for a function
is in giving formulae for the partial derivatives. A helpful way of checking these
is to compare the computed derivatives with a computation based on numerical
differentiation. The function must also be homogenized, which can also be checked
numerically. The following checking utilities are provided for these purposes.
function [fxerr]=chekffun(fname,nx,eps0)
-> checks derivatives of target functions
[f,fx]=myfunc(x)
function [fxerr,fperr]=chekpfun(fname,nx,np,eps0)
-> checks derivatives for parameterized functions
[f,fx,fp]=myfunc(x,p)
function [homerr]=chekhmog(fname,HomStruct,mdeg)
-> checks multihomogenization of function [f]=myfunc(x).
User provides homogeneous structure and multidegree matrix.
function [homerr]=chekhmog(fname,HomStruct,deg,lpd)
-> checks multihomogenization of function [f]=myfunc(x).
User provides homogeneous structure, total degrees, and
linear-product structure; the code computes the multidegree matrix
from these.
In each case, the checking is done at a random point x ∈ C^nx. The functions provide
a numerical comparison and also use the Matlab spy function to graphically show
which elements are suspicious, having an error greater than eps0. If eps0 is omitted
from the call, it defaults to 10^-6. Note that high-level scripts define a global FFUN,
which can be used for fname.
One of the two main options in HOMLAB is the linear-product homotopy, imple-
mented in the script lpdsolve. With appropriate settings, this script performs the
equivalent of a total-degree homotopy, a multihomogeneous homotopy, or a general
linear-product homotopy. For total-degree and multihomogeneous homotopies and
a tableau-style function definition, the higher-level scripts totdtab and mhomtab
automatically perform some preliminary processing steps for you before initiating
lpdsolve.
Let's first see all the set-up information required by lpdsolve by studying a
script to solve a simple system specified in straight-line form. Such a function
is treated as a "black box," so the user must supply all the structural informa-
tion necessary to specify the linear-product formulation. To this end, consider the
straight-line function called simplefcn in § C.3 above, which implements the system
x^2 - x - 2 = 0, xy - 1 = 0.
does not exist at all. It makes no difference in this case, but in cases where some
endpoints arrive at infinity only at the target (as t —• 0), multihomogenization can
change their representation, and sometimes this can make them numerically more
tame. For instance, a singular double root at infinity might break into two distinct
nonsingular roots at infinity.
To show how HomStruct is used to set up a homotopy in a cross product of pro-
jective spaces, let's rework the running example. First, we need a two-homogenized
version of the equations.
function [f,fz]=simplefcn2(z)
% Straight-line function for
% x^2-x-2=0, xy-1=0
% Two-homogeneous version on [x,u] \times [y,v]
x=z(1); y=z(2);
u=z(3); v=z(4);
f = [
x^2-x*u-2*u^2
x*y-u*v
];
fz = [
2*x-u, 0, -x-4*u, 0
y, x, -v, -u
];
lpdsolve
disp('The solutions are:'); disp(dehomog(xsoln,1e-8))
In general, the groupings in the linear factors specified in LPDstruct do not have
to be copies of those in HomStruct, as indeed, they are different in the two-path,
path function must be passed from the top level script to the homotopy evaluation
function via global variables. Moreover, we write myfunc in homogeneous form, so
projective transformation equations must be appended. The script parsolve takes
care of all the formatting once the minimal set of information has been established.
Let's assume that p1 and p0 are in memory along with the start points, as matrix
startpoint, listed columnwise and satisfying f(x,p1) = 0. Then, a script for
solving the system f(x,p0) = 0 is as follows, assuming myfunc implements f(x,p).
Here, lin_path is a pre-defined function for a linear path. Clearly, HomStruct must
be set to agree with the homogenization that has been applied to the user-defined
myfunc.
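The convention a path function follows is simply to return the parameter value and its t-derivative. A Python sketch of a linear path (illustrative; not the HOMLAB lin_path.m itself):

```python
import numpy as np

def lin_path(p1, p0, t):
    # Linear parameter path: p(t) = t*p1 + (1-t)*p0, so p(1) = p1 and p(0) = p0.
    return t*p1 + (1 - t)*p0, p1 - p0   # (p, dp/dt)

p1 = np.array([0.0, 1.0, 2.0])
p0 = np.array([2.0, 1.0, 0.0])
p, dpdt = lin_path(p1, p0, 0.25)
# p    -> [1.5, 1.0, 0.5]
# dpdt -> [-2.0, 0.0, 2.0]
```

A user-written mypath would keep the same two-output shape but could, for example, move along a circle or geodesic in a non-Euclidean parameter space, returning the correct dp/dt for that curve.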
When the parameter space is non-Euclidean, a more general type of path is needed. The
user must provide the definition in the form
function [p,dpdt]=mypath(pl,pO,t)
where mypath.m is a user-written m-file function. Then, set PATHFUN='mypath'; be-
fore calling parsolve to execute the homotopy. See the source code for lin_path.m
for an example to follow.
The workhorse routine is endgamer.m, which tracks solution paths for a homotopy
h(x,t) = 0 from a list of startpoint solutions of h(x,1) = 0 to their endpoints
satisfying h(x,0) = 0. Specifically, endgamer has the usage
[xsoln,stats,xendgame]=endgamer(startpoint,hfun)
(1) tracks the path to the beginning of the endgame, t=t_endgame, a global control
variable;
(2) records the solution at t=t_endgame as a column in xendgame;
(3) executes the power-series endgame (§ 10.3.3), monitoring the convergence cri-
terion and stopping when either convergence is reached or when one of several
protective stopping conditions is satisfied;
(4) records the best solution estimate, as judged by the convergence criterion, as a
column in xsoln, and records statistics concerning the solution as a column in
stats.
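At its core, the tracking phase is an Euler predictor followed by a short Newton corrector at each step. The following Python sketch (a toy one-variable version, not HOMLAB's tracker; the complex constant gamma is an illustrative stand-in for the random gamma trick) follows the two paths of the running example from the start system x^2 - 1 = 0:

```python
def track(f, df, g, dg, x, nsteps=200, newton_iters=3):
    # Track one path of h(x,t) = t*gamma*g(x) + (1-t)*f(x) from t=1 to t=0,
    # using an Euler predictor and a few Newton corrector iterations per step.
    gamma = 0.6 + 0.8j                       # illustrative "random" constant
    h  = lambda x, t: t*gamma*g(x) + (1-t)*f(x)
    hx = lambda x, t: t*gamma*dg(x) + (1-t)*df(x)
    ht = lambda x, t: gamma*g(x) - f(x)      # dh/dt
    dt = 1.0 / nsteps
    t = 1.0
    for _ in range(nsteps):
        x = x + dt * ht(x, t) / hx(x, t)     # predict: dx/dt = -ht/hx, t decreasing
        t -= dt
        for _ in range(newton_iters):        # correct on h(., t)
            x = x - h(x, t) / hx(x, t)
    return x

f, df = (lambda x: x**2 - x - 2), (lambda x: 2*x - 1)   # target roots: 2 and -1
g, dg = (lambda x: x**2 - 1),     (lambda x: 2*x)       # start roots: +1 and -1
roots = sorted(track(f, df, g, dg, complex(s)).real for s in (1, -1))
```

The real tracker adds adaptive step control (stepmin, maxit, maxnfe) and hands the final stretch over to the power-series endgame instead of running fixed steps all the way to t = 0.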
The details of all the required control settings are given next, followed by a
detailed description of the outputs.
• stepmin = 1e-6; The minimum step size in t, below which a path is de-
clared as having failed.
• maxit = 3; The maximum number of Newton iterations allowed in the
corrector. If the designated convergence criterion is not met within this
number of iterations, the step is a failure and the step size will be halved.
• maxnfe = 1500; The maximum number of function evaluations allowed per
path. This limits the amount of computing time that a diverging path may
consume. For well-scaled, homogenized homotopies, this criterion should
rarely come into play.
• epstiny = 1e-12; In rare instances, a path may fail due to vanishing of
the tangent vector (dh/dx)^-1 dh/dt. This is detected using the tolerance
epstiny. Typically, when this occurs, it is a signal that the homotopy is not
properly formed, possibly an error in a user-written function for evaluation
of the derivatives.
End Game (used by endgamer) This routine calls tracker to get to the start of
the endgame, then runs the power-series endgame.
• stepstart = 0.1; The initial step size for t in the tracker.
• epsbig = 1e-4; The tracking accuracy to be maintained in the initial
phase of tracking. This is the convergence tolerance for the corrector. If a
path does not successfully reach the endgame, it is tried once more from
the beginning with a tighter tolerance of epsbig/100.
• epssmall = 1e-6; The tracking accuracy to be maintained in the
endgame.
• t_endgame = 0.1; The value of t where the endgame starts.
• tstop = 1e-10; The value of t where the endgame gives up.
• tratio = 0.3; During the endgame, samples are taken for t in a geometric
series where t_k = tratio*t_{k-1}. The value 0.3 is a compromise between the
need to spread the samples out for a well-conditioned fit (tratio smaller)
and the need to stay away from t = 0, where the path may be singular.
• eps_end = 1e-10; The criterion for deciding when the endpoint estimate
has converged. When two successive estimates agree to this tolerance, suc-
cess is declared.
• CycleMax = 4; This is the maximum winding number tested by the power-
series endgame. In double precision, the endgame is rarely successful above
winding number c = 4.
• maxerrup = 10; The endgame keeps a record of the smallest change in the
endpoint estimate in successive iterations. (This is compared to eps_end
for declaring success.) Usually, this measure improves with each succes-
sive iteration, unless the path gets too close to t = 0 before converging.
However, in the early stages of the endgame, the convergence measure can
sometimes increase briefly before entering the endgame operating zone. If
there are more than maxerrup successive iterations without improving on this
record, the endgame is terminated.
x=dehomog(xsoln,eps0);
where eps0 is the magnitude of the homogenizing coordinate below which a solu-
tion is declared to be at infinity. This form assumes that the solutions are one-
homogenized and that the homogenizing coordinate is the last entry. Any solution
determined to be at infinity is rescaled by its largest element, while a finite one is
rescaled by the homogenizing coordinate. This is usually what one wants, but the
result can be a bit surprising if eps0 is made so small that a poorly computed
solution at infinity gets erroneously rescaled as if it were finite.
A more elaborate form must be used for multihomogenized solutions:
x=dehomog(xsoln,eps0,HomStruct,homvar);
where HomStruct identifies the membership in the various homogeneous groupings,
and homvar is a list of the row number for each homogenizing variable. If homvar is
missing, the last variable of each group is assumed by default to be the homogenizing
variable for that group.
In either the short form or the long form, dehomog sets one variable of each
homogeneous group to one.
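The dehomogenization rule just described is compact enough to sketch; this Python version (illustrative, not the HOMLAB dehomog.m) implements the short, one-homogeneous form:

```python
import numpy as np

def dehomog(xsoln, eps0):
    # One-homogeneous convention: the last row is the homogenizing coordinate w.
    # |w| <= eps0: point at infinity -> rescale by the largest-magnitude entry.
    # |w| >  eps0: finite point      -> divide by w, so the last entry becomes 1.
    out = np.array(xsoln, dtype=complex)
    for j in range(out.shape[1]):
        w = out[-1, j]
        if abs(w) <= eps0:
            out[:, j] /= out[np.argmax(np.abs(out[:, j])), j]
        else:
            out[:, j] /= w
    return out

sols = np.array([[4.0, 1e-3],
                 [1.0, 1.0],
                 [2.0, 1e-14]])   # column 2 is (numerically) at infinity
d = dehomog(sols, 1e-8)
# d[:,0] -> [2, 0.5, 1]: the finite root (x,y) = (2, 1/2)
```

Choosing eps0 too small here would route the second column through the finite branch and divide by 1e-14, producing the surprising blow-up mentioned above.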
Bibliography
Bonn, 1978), Vol. 730 of Lecture Notes in Math. (pp. 77-88). Berlin: Springer.
Chu, M. T., Li, T.-Y., & Sauer, T. (1988). Homotopy method for general A-matrix
problems. SIAM J. Matrix Anal. Appl., 9(4), 528-536.
Cox, D., Little, J., & O'Shea, D. (1997). Ideals, varieties, and algorithms. Under-
graduate Texts in Mathematics. New York: Springer-Verlag, second edition. An
introduction to computational algebraic geometry and commutative algebra.
Cox, D., Little, J., & O'Shea, D. (1998). Using algebraic geometry, Vol. 185 of
Graduate Texts in Mathematics. New York: Springer-Verlag.
D'Andrea, C., & Emiris, I. Z. (2003). Sparse resultant perturbations. In Algebra,
geometry, and software systems (pp. 93-107). Berlin: Springer.
Datta, R. S. (2003). Using computer algebra to find Nash equilibria. Proceedings of
the 2003 International Symposium on Symbolic and Algebraic Computation (pp.
74-79). New York: ACM.
Davidenko, D. F. (1953a). On a new method of numerical solution of systems of
nonlinear equations. Doklady Akad. Nauk SSSR (N.S.), 88, 601-602.
Davidenko, D. F. (1953b). On approximate solution of systems of nonlinear equa-
tions. Ukrain. Mat. Zurnal, 5, 196-206.
Davis, P. J. (1975). Interpolation and approximation. New York: Dover Publications
Inc. Republication, with minor corrections, of the 1963 original, with a new
preface and bibliography.
Decker, W., Greuel, G.-M., & Pfister, G. (1999). Primary decomposition: algo-
rithms and comparisons. In Algorithmic algebra and number theory (Heidelberg,
1997) (pp. 187-220). Berlin: Springer.
Decker, W., & Schreyer, F.-O. (2001). Computational algebraic geometry today.
In Applications of algebraic geometry to coding theory, physics and computation
(Eilat, 2001), Vol. 36 of NATO Sci. Ser. II Math. Phys. Chem. (pp. 65-119).
Dordrecht: Kluwer Acad. Publ.
Decker, W., & Schreyer, F.-O. (2005). Solving polynomial equations: Foundations,
algorithms, and applications, to appear.
Denavit, J., & Hartenberg, R. S. (1955). A kinematic notation for lower pair mech-
anisms based on matrices. J. Appl. Mechanics, 22, 215-221. Trans. ASME, vol.
77.
Dhingra, A., Kohli, D., & Xu, Y. X. (1992). Direct kinematic of general Stewart
platforms. DE-Vol. 45, Robotics, Spatial Mechanisms, and Mechanical Systems
(pp. 107-112). ASME.
Dian, J., & Kearfott, R. B. (2003). Existence verification for singular and nonsmooth
zeros of real nonlinear systems. Math. Comp., 72(242), 757-766.
Dickenstein, A., & Emiris, I. Z. (Eds.), (preprint). Solving polynomial equa-
tions: Foundations, algorithms, and applications. Berlin Heidelberg New York:
Springer-Verlag.
Dietmaier, P. (1998). The Stewart-Gough platform of general geometry can have 40
real postures. In J. Lenarcic, & M. L. Husty (Eds.), Advances in robot kinematics:
Press.
Gunning, R. C. (1990). Introduction to holomorphic functions of several variables.
Vol. II. The Wadsworth & Brooks/Cole Mathematics Series. Monterey, CA:
Wadsworth & Brooks/Cole Advanced Books & Software. Local theory.
Gunning, R. C., & Rossi, H. (1965). Analytic functions of several complex variables.
Englewood Cliffs, N.J.: Prentice-Hall Inc.
Hamming, R. W. (1986). Numerical methods for scientists and engineers. New
York: Dover Publications Inc., second edition.
Harris, J. (1995). Algebraic geometry, Vol. 133 of Graduate Texts in Mathematics.
New York: Springer-Verlag. A first course, Corrected reprint of the 1992 original.
Hartenberg, R. S., & Denavit, J. (1964). Kinematic synthesis of linkages. McGraw-
Hill, N.Y.
Hartshorne, R. (1977). Algebraic geometry. New York: Springer-Verlag. Graduate
Texts in Mathematics, No. 52.
Hille, E. (1959). Analytic function theory. Vol. 1. Introduction to Higher Mathe-
matics. Ginn and Company, Boston.
Hille, E. (1962). Analytic function theory. Vol. II. Introductions to Higher Mathe-
matics. Ginn and Co., Boston, Mass.-New York-Toronto, Ont.
Hodge, W. V. D., & Pedoe, D. (1994a). Methods of algebraic geometry. Vol. I. Cam-
bridge Mathematical Library. Cambridge: Cambridge University Press. Book I:
Algebraic preliminaries, Book II: Projective space, Reprint of the 1947 original.
Hodge, W. V. D., & Pedoe, D. (1994b). Methods of algebraic geometry. Vol. II. Cam-
bridge Mathematical Library. Cambridge: Cambridge University Press. Book III:
General theory of algebraic varieties in projective space, Book IV: Quadrics and
Grassmann varieties, Reprint of the 1952 original.
Hodge, W. V. D., & Pedoe, D. (1994c). Methods of algebraic geometry. Vol.
III. Cambridge Mathematical Library. Cambridge: Cambridge University Press.
Book V: Birational geometry, Reprint of the 1954 original.
Hoşten, S., & Shapiro, J. (2000). Primary decomposition of lattice basis ideals. J.
Symbolic Comput., 29(4-5), 625-639. Symbolic computation in algebra, analysis,
and geometry (Berkeley, CA, 1998).
Huang, Y., Wu, W., Stetter, H. J., & Zhi, L. (2000). Pseudofactors of multivariate
polynomials. Proceedings of the 2000 International Symposium on Symbolic and
Algebraic Computation (St. Andrews) (pp. 161-168). New York: ACM.
Huber, B., Sottile, F., & Sturmfels, B. (1998). Numerical Schubert calculus. J.
Symbolic Comput., 26(6), 767-788. Symbolic numeric algebra for polynomials.
Huber, B., & Sturmfels, B. (1995). A polyhedral method for solving sparse polyno-
mial systems. Math. Comp., 64(212), 1541-1555.
Huber, B., & Sturmfels, B. (1997). Bernstein's theorem in affine space. Discrete
Comput. Geom., 17(2), 137-141.
Huber, B., & Verschelde, J. (1998). Polyhedral end games for polynomial continu-
ation. Numer. Algorithms, 18(1), 91-108.
Huber, B., & Verschelde, J. (2000). Pieri homotopies for problems in enumerative
geometry applied to pole placement in linear systems control. SIAM J. Control
Optim., 38(4), 1265-1287.
Husty, M. L. (1996). An algorithm for solving the direct kinematics of general
Stewart-Gough platforms. Mechanism Machine Theory, 31(4), 365-380.
Husty, M. L., & Karger, A. (2000). Self-motions of Griffis-Duffy type parallel
manipulators. Proceedings of the 2000 IEEE Int. Conf. Robotics and Automation,
CDROM, San Francisco, CA, April 24-28, 2000. IEEE.
Iitaka, S. (1982). Algebraic geometry, Vol. 76 of Graduate Texts in Mathematics.
New York: Springer-Verlag. An introduction to birational geometry of algebraic
varieties, North-Holland Mathematical Library, 24.
Innocenti, C. (1995). Polynomial solution to the position analysis of the 7-link Assur
kinematic chain with one quaternary link. Mechanism Machine Theory, 30(8),
1295-1303.
Isaacson, E., & Keller, H. B. (1994). Analysis of numerical methods. New York:
Dover Publications Inc. Corrected reprint of the 1966 original [Wiley, New York].
Kearfott, R. B. (1996). Rigorous global search: continuous problems, Vol. 13 of Non-
convex Optimization and its Applications. Dordrecht: Kluwer Academic Publish-
ers.
Kearfott, R. B. (1997). Empirical evaluation of innovations in interval branch and
bound algorithms for nonlinear systems. SIAM J. Sci. Comp., 18(2), 574-594.
Kearfott, R. B., & Novoa, M. (1990). Algorithm 681: INTBIS, a portable interval
Newton/bisection package. ACM Trans. Math. Softw., 16(2), 152-157.
Kearfott, R. B., & Xing, Z. (1994). An interval step control for continuation meth-
ods. SIAM J. Numer. Anal., 31(3), 892-914.
Keller, H. B. (1981). Geometrically isolated nonisolated solutions and their approx-
imation. SIAM J. Numer. Anal., 18(5), 822-838.
Kendig, K. (1977). Elementary algebraic geometry. New York: Springer-Verlag.
Graduate Texts in Mathematics, No. 44.
Khovanski, A. G. (1978). Newton polyhedra, and the genus of complete intersec-
tions. Funktsional. Anal. i Prilozhen., 12(1), 51-61.
Kleiman, S. L. (1986). Tangency and duality. Proceedings of the 1984 Vancou-
ver conference in algebraic geometry, Vol. 6 of CMS Conf. Proc. (pp. 163-225).
Providence, RI: Amer. Math. Soc.
Knuth, D. E. (1981). The art of computer programming. Vol. 2. Addison-Wesley
Publishing Co., Reading, Mass., second edition. Seminumerical algorithms,
Addison-Wesley Series in Computer Science and Information Processing.
Krick, T. (2004). Straight-line programs in polynomial equation solving. In F.
Cucker, R. DeVore, P. Olver, & E. Siili (Eds.), Foundations of computational
mathematics, Minneapolis 2002. Cambridge University Press.
Kuo, Y.-C., Li, T.-Y., & Wu, D. (2004). Determining whether a numerical solution
of a polynomial system is isolated, preprint.
26(5), 1241-1251.
Li, T. Y., Wang, T., & Wang, X. (1996). Random product homotopy with minimal
BKK bound. In The mathematics of numerical analysis (Park City, UT, 1995),
Vol. 32 of Lectures in Appl. Math. (pp. 503-512). Providence, RI: Amer. Math.
Soc.
Li, T.-Y., & Wang, X. (1991). Solving deficient polynomial systems with homotopies
which keep the subschemes at infinity invariant. Math. Comp., 56(194), 693-710.
Li, T.-Y., & Wang, X. (1992). Nonlinear homotopies for solving deficient polynomial
systems with parameters. SIAM J. Numer. Anal., 29(4), 1104-1118.
Li, T.-Y., & Wang, X. (1996). The BKK root count in C^n. Math. Comp., 65(216),
1477-1484.
Li, T.-Y., & Zheng, Z. (2004). A rank-revealing method and its applications.
preprint.
Lipman, J. (1975). Introduction to resolution of singularities. In Algebraic geometry
(Proc. Sympos. Pure Math., Vol. 29, Humboldt State Univ., Arcata, Calif., 1974)
(pp. 187-230). Providence, R.I.: Amer. Math. Soc.
Lo Cascio, M. L., Pasquini, L., & Trigiante, D. (1989). Simultaneous determina-
tion of polynomial roots and multiplicities: an algorithm and related problems.
Ricerche Mat., 38(2), 283-305.
Losch, S. (1995). Parallel redundant manipulators based on open and closed normal
Assur chains. In J.-P. Merlet, & B. Ravani (Eds.), Computational kinematics '95,
Proceedings of the Second Workshop held in Sophia Antipolis, September 4-6,
1995, Vol. 40 of Solid Mechanics and its Applications (pp. x+310). Dordrecht:
Kluwer Academic Publishers Group.
Lu, Y., Sommese, A. J., & Wampler, C. W. (2005). Finding all real solutions of
polynomial systems: I the curve case, in preparation.
Macaulay, F. (1902). On some formulas in elimination. Proc. London Math. Soc.,
3, 3-27.
Manocha, D. (1993). Efficient algorithms for multipolynomial resultant. The Com-
puter Journal, 36, 485-496.
Manocha, D. (1994). Solving systems of polynomial equations. IEEE Comput.
Graph. Appl., 36, 46-55.
Manocha, D., & Canny, J. F. (1994). Efficient inverse kinematics for general 6R
manipulators. IEEE Trans. Rob. Auto., 10(5), 648-657.
Manseur, R., & Doty, K. (1989). A robot manipulator with 16 real inverse kinematic
solution set. Int. J. Robotics Res., 8(5), 75-79.
Marden, M. (1966). Geometry of polynomials. Second edition. Mathematical Sur-
veys, No. 3. Providence, R.I.: American Mathematical Society.
Mavroidis, C., & Roth, B. (1995a). Analysis of overconstrained mechanisms. ASME
J. Mech. Design, 117, 69-74.
Mavroidis, C., & Roth, B. (1995b). New and revised overconstrained mechanisms.
ASME J. Mech. Design, 117, 75-82.
Mayer St-Onge, B., & Gosselin, C. M. (2000). Singularity analysis and represen-
tation of the general Gough-Stewart platform. Int. J. Robotics Research, 19,
Li 1— ZOO.
Meintjes, K., & Morgan, A. P. (1987). A methodology for solving chemical equilib-
rium systems. Appl. Math. Comput., 22, 333-361.
Merlet, J.-P. (1989). Singular configurations of parallel manipulators and Grass-
mann geometry. Int. J. Robotics Research, 8, 45-56.
Merlet, J.-P. (2000). Parallel robots. Kluwer Academic Publishers, Dordrecht, The
Netherlands.
Merlet, J.-P. (2001). A parser for the interval evaluation of analytical functions and
its applications to engineering problems. J. Symbolic Computation, 31, 475-486.
Mignotte, M., & Stefanescu, D. (1999). Polynomials. Springer Series in Discrete
Mathematics and Theoretical Computer Science. Springer-Verlag, Singapore. An
algorithmic approach.
Milnor, J. W. (1965). Topology from the differentiable viewpoint. Based on notes
by David W. Weaver. The University Press of Virginia, Charlottesville, Va.
Möller, H. M. (1998). Gröbner bases and numerical analysis. In Gröbner bases and
applications (Linz, 1998), Vol. 251 of London Math. Soc. Lecture Note Ser. (pp.
159-178). Cambridge: Cambridge Univ. Press.
Möller, H. M., & Stetter, H. J. (1995). Multivariate polynomial equations with
multiple zeros solved by matrix eigenproblems. Num. Math., 70, 311-329.
Moore, R. E. (1979). Methods and applications of interval analysis, Vol. 2 of SIAM
Studies in Applied Mathematics. Philadelphia, Pa.: Society for Industrial and
Applied Mathematics (SIAM).
Morgan, A. P. (1983). A method for computing all solutions to systems of polyno-
mial equations. ACM Trans. Math. Software, 9(1), 1-17.
Morgan, A. P. (1986a). A homotopy for solving polynomial systems. Appl. Math.
Comput., 18(1), 87-92.
Morgan, A. P. (1986b). A transformation to avoid solutions at infinity for polyno-
mial systems. Appl. Math. Comput., 18(1), 77-86.
Morgan, A. P. (1987). Solving polynomial systems using continuation for engineering
and scientific problems. Prentice-Hall, Englewood Cliffs, N.J.
Morgan, A. P., & Sommese, A. J. (1987a). A homotopy for solving general poly-
nomial systems that respects m-homogeneous structures. Appl. Math. Comput.,
101-113.
Morgan, A. P., & Sommese, A. J. (1987b). Computing all solutions to polynomial
systems using homotopy continuation. Appl. Math. Comput., 115-138. Errata:
Appl. Math. Comput. 51 (1992), p. 209.
Morgan, A. P., & Sommese, A. J. (1989). Coefficient-parameter polynomial con-
tinuation. Appl. Math. Comput., 29(2), 123-160. Errata: Appl. Math. Comput.
51:207(1992).
Morgan, A. P., & Sommese, A. J. (1990). Generically nonsingular polynomial
Mumford, D. (1999). The red book of varieties and schemes, Vol. 1358 of Lecture
Notes in Mathematics. Berlin: Springer-Verlag, expanded edition. Includes the
Michigan lectures (1974) on curves and their Jacobians, With contributions by
E. Arbarello.
Nanua, P., Waldron, K. J., & Murthy, V. (1991). Direct kinematic solution of a
Stewart platform. IEEE Trans, on Robotics and Automation, 6(4), 438-444.
Neumaier, A. (1990). Interval methods for systems of equations, Vol. 37 of Ency-
clopedia of Mathematics and its Applications. Cambridge: Cambridge University
Press.
Nielsen, J., & Roth, B. (1999). Solving the input/output problem for planar mech-
anisms. ASME J. Mech. Design, 121(2), 206-211.
Ojika, T. (1987). Modified deflation algorithm for the solution of singular problems.
I. A system of nonlinear algebraic equations. J. Math. Anal. Appl., 123, 199-221.
Ojika, T., Watanabe, S., & Mitsui, T. (1983). Deflation algorithm for the multiple
roots of a system of nonlinear equations. J. Math. Anal. Appl., 96, 463-479.
Pan, V. Y. (1997). Solving a polynomial equation: some history and recent progress.
SIAM Rev., 39(2), 187-220.
Pasquini, L., & Trigiante, D. (1985). A globally convergent method for simultane-
ously finding polynomial roots. Math. Comp., 44(169), 135-149.
Pernkopf, F., & Husty, M. L. (2002). Singularity analysis of spatial Stewart-Gough
platforms with planar base and platform. Proc. ASME Design Eng. Tech. Conf.,
Montreal, Canada, Sept. 30-Oct. 2, 2002.
Pieper, D. L. (1968). The kinematics of manipulators under computer control. PhD
thesis, Computer Science Dept., Stanford University.
Primrose, E. J. F. (1986). On the input-output equation of the general 7R-
mechanism. Mechanism Machine Theory, 21(6), 509-510.
Raghavan, M. (1991). The Stewart platform of general geometry has 40 config-
urations. Proc. ASME Design and Automation Conf., vol. 32-2 (pp. 397-402).
ASME.
Raghavan, M. (1993). The Stewart platform of general geometry has 40 configura-
tions. ASME J. Mech. Design, 115, 277-282.
Raghavan, M., & Roth, B. (1993). Inverse kinematics of the general 6R manipulator
and related linkages. ASME J. Mech. Design, 115, 502-508.
Raghavan, M., & Roth, B. (1995). Solving polynomial systems for the kinematic
analysis and synthesis of mechanisms and robot manipulators. ASME J. Mech.
Design, 117, 71-79.
Roberts, S. (1875). On three-bar motion in plane space. Proc. London Math. Soc.,
VII, 14-23.
Rojas, J. M. (1994). A convex geometric approach to counting the roots of a
polynomial system. Theoret. Comput. Sci., 133(1), 105-140. Selected papers of
the Workshop on Continuous Algorithms and Complexity (Barcelona, 1993).
Rojas, J. M. (1999). Toric intersection theory for affine root counting. J. Pure Appl.
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2002a). A method for tracking
singular paths with application to the numerical irreducible decomposition. In
Algebraic geometry (pp. 329-345). Berlin: de Gruyter.
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2002b). Symmetric functions
applied to decomposing solution sets of polynomial systems. SIAM J. Numer.
Anal., 40(6), 2026-2046.
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2003). Numerical irreducible
decomposition using PHCpack. In Algebra, geometry, and software systems (pp.
109-129). Berlin: Springer.
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004a). Advances in polynomial
continuation for solving problems in kinematics. ASME J. Mech. Design, 126(2),
262-268.
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004b). Homotopies for in-
tersecting solution components of polynomial systems. SIAM J. Numer. Anal.,
42(4), 1552-1571.
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004c). An intrinsic homotopy
for intersecting algebraic varieties. J. Complexity, to appear.
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004d). Numerical factorization
of multivariate complex polynomials. Theoretical Computer Science, 315, 651-669.
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004e). Solving polynomial
systems equation by equation, in preparation.
Sommese, A. J., & Wampler, C. W. (1996). Numerical algebraic geometry. In The
mathematics of numerical analysis (Park City, UT, 1995), Vol. 32 of Lectures in
Appl. Math. (pp. 749-763). Providence, RI: Amer. Math. Soc.
Sosonkina, M., Watson, L. T., & Stewart, D. E. (1996). Note on the end game in
homotopy zero curve tracking. ACM Trans. Math. Software, 22(3), 281-287.
Sreenivasan, S. V., & Nanua, P. (1992). Solution of the direct position kinematics
problem of the general Stewart platform using advanced polynomial continuation.
DE-Vol. 45, Robotics, Spatial Mechanisms, and Mechanical Systems (pp. 99-106).
ASME.
Sreenivasan, S. V., Waldron, K. J., & Nanua, P. (1994). Closed-form direct dis-
placement analysis of a 6-6 Stewart platform. Mechanism Machine Theory, 29(6),
855-864.
Stetter, H. J. (2004). Numerical polynomial algebra. Philadelphia, PA: Society for
Industrial and Applied Mathematics (SIAM).
Stoer, J., & Bulirsch, R. (2002). Introduction to numerical analysis, Vol. 12 of Texts
in Applied Mathematics. New York: Springer-Verlag, third edition. Translated
from the German by R. Bartels, W. Gautschi and C. Witzgall.
Sturmfels, B. (2002). Solving systems of polynomial equations, Vol. 97 of CBMS
Regional Conference Series in Mathematics. Published for the Conference Board
of the Mathematical Sciences, Washington, DC.
underdetermined, 241
universal field, 52
universal function, 323
universal system, 323
upper semicontinuity of dimension, 312
variety, 8
vector bundle, 341, 343
Veronese embedding, 334
Wilkinson polynomials, 11
winding number, 180, 182, 183
witness point superset, 244, 245, 255
witness set, see witness point set, 8, 229,
235
witness superset, 253, 256