
Identification, Assessment and Correction of Ill-Conditioning and Numerical Instability in Linear and Integer Programs
Ed Klotz (klotz@us.ibm.com), Math Programming Specialist, IBM

2014 IBM Corporation

IBM Software Group

Objective
Enable more precise assessment of ill conditioning in linear and integer programs
Multiple metrics available to assess ill conditioning
Discuss some techniques to treat the symptoms and causes of ill conditioning in LP and MIP models


Outline
Finite precision computing fundamentals
Description of ill conditioning
Assessment of ill conditioning
Alternate metrics for ill conditioning
Numerical stability of algorithms
Identification and treatment of symptoms of ill conditioning
Identification and treatment of sources of ill conditioning
Anomalies, misconceptions, inconsistencies and contradictions
Examples that illustrate modeling pitfalls that can contribute to ill conditioning

Formulation alternatives

Conclusions


Finite Precision Computing Fundamentals


64 bit double precision is most commonly used in scientific applications
32 bit single precision requires less memory, but is less accurate
Memory savings not significant for LP and MIP solvers anymore
128 bit quad precision is more accurate, but requires more memory and computing time
Many real numbers cannot be represented exactly as floating point numbers
Base of floating point representation determines those that can
Base 2 typically used
Numbers that are integer linear combinations of (positive and negative) powers of 2 can be represented exactly, within the limits of the mantissa length and the minimum and maximum possible exponents
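The representability rule above can be checked directly. The following is a minimal sketch using only the Python standard library; the helper name `is_exact_double` is hypothetical and chosen for illustration. It compares the exact rational value of a decimal literal with the exact rational value of the double that stores it.

```python
from fractions import Fraction


def is_exact_double(literal):
    """Return True if the decimal string `literal` is stored without error as a 64 bit double."""
    # Fraction(float(...)) is the exact value of the stored double;
    # Fraction(literal) is the exact value of the decimal text.
    return Fraction(float(literal)) == Fraction(literal)


# 0.5 = 2^-1 and 0.375 = 2^-2 + 2^-3 are sums of powers of 2; 0.1 and 0.2 are not.
for literal in ("0.5", "0.375", "3.0", "0.1", "0.2"):
    print(literal, "exact" if is_exact_double(literal) else "rounded")
```

Values such as 0.1 look harmless in decimal but already carry a representation error before any arithmetic is done on them.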


Example: 64 Bit IEEE Double Representation


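[Figure: bit layout of a 64 bit IEEE double: sign (1 bit), exponent (11 bits, about 4 decimal digits), mantissa (53 bits with one bit implicit, about 16 decimal digits)]
Absolute round-off error in the stored value:
1/3 = 0.333...: error $\approx 10^{-16}/3$
10000/3 = 3333.33...: error $\approx 10^{-12}/3$
1/30000 = 0.0000333...: error $\approx 10^{-20}/3$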


IEEE 64 Bit Double Addition


[Figure: adding 1/3 and 1/30000 as 64 bit IEEE doubles (sign, 11 bit exponent, 53 bit mantissa). The exponent of 1/30000 is shifted to match that of 1/3 before the mantissas are added.]
Absolute round-off error of 1/3: $10^{-16}/3$
Absolute round-off error of 1/30000: $10^{-20}/3$


IEEE 64 Bit Double Addition


[Figure: adding 1/3 and 10000/3 as 64 bit IEEE doubles (sign, 11 bit exponent, 53 bit mantissa)]
Absolute round-off error shown: $10^{-12}/3$


IEEE 64 Bit Division


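Subtract exponents, divide mantissas
Errors in representation in mantissa determine magnitude of round-off error
Don't divide big numbers by small numbers in data calculations
For $a \gg b$, compare $a/b$ and $b/a$, where $\varepsilon$ is the error in $b$

With $a = 3$, $b = 1/30000$, $\varepsilon = 10^{-8}$:
$b/a \sim (b + \varepsilon)/a = b/a + \varepsilon/a$ (error $\sim 10^{-8}$)
$a/b \sim a/(b + \varepsilon) = a/b - a\varepsilon/((b + \varepsilon)b)$ (error $\sim 10^{0}$)

With $a = 3$, $b = 1/30000$, $\varepsilon = 10^{-16}$:
$b/a \sim (b + \varepsilon)/a = b/a + \varepsilon/a$ (error $\sim 10^{-16}$)
$a/b \sim a/(b + \varepsilon) = a/b - a\varepsilon/((b + \varepsilon)b)$ (error $\sim 10^{-8}$)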
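The comparison of a/b and b/a can be reproduced numerically. This is a small sketch in exact rational arithmetic, so that the only error present is the perturbation eps in b; the function name `quotient_errors` is hypothetical. The slide's error figures are order-of-magnitude estimates, so the printed values agree with them only roughly.

```python
from fractions import Fraction


def quotient_errors(a, b, eps):
    """Return (error in b/a, error in a/b) when b is perturbed to b + eps."""
    err_b_over_a = abs((b + eps) / a - b / a)
    err_a_over_b = abs(a / (b + eps) - a / b)
    return err_b_over_a, err_a_over_b


a = Fraction(3)
b = Fraction(1, 30000)
for eps in (Fraction(1, 10**8), Fraction(1, 10**16)):
    small, large = quotient_errors(a, b, eps)
    print(f"eps={float(eps):.0e}  b/a error={float(small):.2e}  a/b error={float(large):.2e}")
```

Dividing the large number by the small one amplifies the same perturbation by many orders of magnitude compared with the reverse division.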

Implications of Finite Precision Representation


Simply representing the model data can introduce round-off errors
Larger numbers have larger absolute round-off errors in their representations
Arithmetic calculations can introduce additional round-off errors
Arithmetic calculations on numbers of the same order of magnitude are more accurate than calculations on numbers of different orders of magnitude
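Both implications can be observed with a few lines of standard-library Python. The sketch below measures the exact representation error of three values of different magnitude, then adds two numbers of very different magnitude; `representation_error` is a hypothetical helper name.

```python
from fractions import Fraction


def representation_error(numerator, denominator):
    """Exact absolute error of the double nearest to numerator/denominator."""
    exact = Fraction(numerator, denominator)
    stored = Fraction(numerator / denominator)  # exact value of the stored double
    return abs(stored - exact)


err_small = representation_error(1, 30000)
err_mid = representation_error(1, 3)
err_large = representation_error(10000, 3)
for name, err in (("1/30000", err_small), ("1/3", err_mid), ("10000/3", err_large)):
    print(f"{name}: absolute error = {float(err):.2e}")

# Adding numbers of very different magnitude: the small operand is lost entirely.
big = 1.0e17
absorbed = (big + 1.0) - big
print("(1e17 + 1) - 1e17 =", absorbed)
```

The absolute error grows with the magnitude of the stored value, and the sum silently drops the smaller operand once it falls below the spacing of doubles near the larger one.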


Description
Ill Conditioning
Does the flap of a butterfly's wings in Brazil set off a tornado in Texas?
[Figure: the same meteorological model run twice, once with input data to 3 decimal places (.506) and once with input data to 6 decimal places (.506127)]


Problem definition
Ill Conditioning
Small change in input leads to big change in output

Given $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $y = f(x)$
For $y + \Delta y = f(x + \Delta x)$, compute bound $\kappa$: $\|\Delta y\| \le \kappa \|\Delta x\|$
Can we quantitatively measure ill conditioning?
For many mathematical systems or models, quantitative measures have yet to be discovered. But, sometimes we can measure it.
Specifically, we can measure ill conditioning when solving square linear systems of equations


Condition Number of a Square Matrix (Turing, 1948; Rice, 1966)
CPLEX solves square linear systems of form: $Bx = b$
exact solution is: $x = B^{-1} b$
How will a change $\Delta b$ to the input vector b affect the computed solution x?
$x + \Delta x = B^{-1}(b + \Delta b)$
$\Delta x = B^{-1} \Delta b$
Cauchy-Schwarz inequality: $\|\Delta x\| \le \|B^{-1}\| \, \|\Delta b\|$
Cauchy-Schwarz for original system: $\|b\| \le \|B\| \, \|x\|$
Combine and rearrange: $\|\Delta x\| / \|x\| \le \|B\| \, \|B^{-1}\| \, \|\Delta b\| / \|b\|$

Condition Number of a Square Matrix (ctd.)


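CPLEX solves square linear systems of form: $Bx = b$
exact solution is: $x = B^{-1} b$
How will a change $\Delta B$ to the input matrix B affect the computed solution x?
$(B + \Delta B)(x + \Delta x) = b$
$Bx + B \Delta x + \Delta B \, x + \Delta B \, \Delta x = b$
$B \Delta x = -\Delta B (x + \Delta x)$
$\Delta x = -B^{-1} \Delta B (x + \Delta x)$
Cauchy-Schwarz inequality: $\|\Delta x\| \le \|B^{-1}\| \, \|\Delta B\| \, \|x + \Delta x\|$
Rearrange: $\|\Delta x\| / \|x + \Delta x\| \le \|B^{-1}\| \, \|\Delta B\|$
Multiply by $\|B\| / \|B\|$: $\|\Delta x\| / \|x + \Delta x\| \le \|B\| \, \|B^{-1}\| \, \|\Delta B\| / \|B\|$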

Condition Number
Condition number of B is defined as $\kappa(B) = \|B\| \, \|B^{-1}\|$
$\|\Delta x\| / \|x\| \le \kappa(B) \, \|\Delta b\| / \|b\|$
$\|\Delta x\| / \|x + \Delta x\| \le \kappa(B) \, \|\Delta B\| / \|B\|$
As condition number increases, potential change in solution relative to (normwise) change in data also increases
Even if the modeler doesn't change the data, finite precision computers can introduce small changes
Machine precision for 64 bit double ~ 1e-16
Just moving from a Windows machine to an AIX machine can change precision enough to significantly influence results on an ill conditioned linear system
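To make the bound concrete, here is a minimal sketch for a 2x2 system in pure Python. It uses the infinity norm (an assumption; the slides do not fix a norm), and the matrix, right-hand side and helper names are illustrative choices rather than anything taken from CPLEX.

```python
def inf_norm(M):
    """Matrix infinity norm: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def condition_number(M):
    """kappa(B) = ||B|| * ||B^-1|| in the infinity norm."""
    return inf_norm(M) * inf_norm(inverse_2x2(M))


def solve_2x2(M, rhs):
    inv = inverse_2x2(M)
    return [inv[0][0] * rhs[0] + inv[0][1] * rhs[1],
            inv[1][0] * rhs[0] + inv[1][1] * rhs[1]]


B = [[1.0, 1.0], [1.0, 1.0001]]   # two nearly identical rows
b = [2.0, 2.0001]                 # exact solution is x = (1, 1)
db = [0.0, 0.0001]                # small change to the right-hand side

x = solve_2x2(B, b)
x_pert = solve_2x2(B, [b[0] + db[0], b[1] + db[1]])

kappa = condition_number(B)
rel_change_b = max(abs(v) for v in db) / max(abs(v) for v in b)
rel_change_x = max(abs(p - q) for p, q in zip(x_pert, x)) / max(abs(v) for v in x)

print(f"kappa(B)          = {kappa:.3e}")
print(f"relative change b = {rel_change_b:.3e}")
print(f"relative change x = {rel_change_x:.3e}")
print(f"bound kappa * db  = {kappa * rel_change_b:.3e}")
```

A relative change of about 5e-5 in b moves the solution from roughly (1, 1) to roughly (0, 2), which is within the factor permitted by the condition number.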


Assessment of Ill Conditioning


What constitutes a large or small value of $\kappa$?
Depends on machine, data and algorithm precision ($\varepsilon$), algorithm tolerances ($t$)
Ill conditioning can occur when round off error associated with machine precision is large enough to influence algorithm decisions
$\|\Delta x\| / \|x\| \le \kappa(B) \, \|\Delta b\| / \|b\|$: can this exceed $t$?
Classify based on threshold defined by $t / \varepsilon$
Four distinct categories
Example: CPLEX has default algorithmic tolerances of 1e-6, runs double precision arithmetic on machines with precision of ~1e-16
$t / \varepsilon$ = 1e-6/1e-16 = 1e+10 is a key threshold


Assessment of Ill Conditioning


Condition number is a bound for the increase of the error: $\|\Delta x\| / \|x\| \le \kappa(B) \, \|\Delta b\| / \|b\|$
Basic epsilons:
Machine precision (double): 1e-16
Default feasibility and optimality tolerance: 1e-6
Classification of condition numbers for LP bases:
Stable: $\kappa(B) <$ 1e+7
Suspicious: 1e+7 $\le \kappa(B) <$ 1e+10
Unstable: 1e+10 $\le \kappa(B) <$ 1e+14
Ill-posed: 1e+14 $\le \kappa(B)$
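The four categories translate directly into a small lookup. The sketch below hard-codes the thresholds listed above; the function name `classify_kappa` is hypothetical and this is not a CPLEX routine.

```python
def classify_kappa(kappa):
    """Map a basis condition number to the category names used above."""
    if kappa < 1e7:
        return "stable"
    if kappa < 1e10:
        return "suspicious"
    if kappa < 1e14:
        return "unstable"
    return "ill-posed"


# The key threshold t / eps for default tolerance 1e-6 and double precision 1e-16.
tolerance, machine_precision = 1e-6, 1e-16
print("t / eps =", tolerance / machine_precision)

for kappa in (3.2e4, 5.0e8, 2.2e12, 3.5490e16):
    print(f"kappa = {kappa:.4e}: {classify_kappa(kappa)}")
```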


Assessment of Ill Conditioning


What about MIP?
Previously, CPLEX provided Kappa for the associated fixed LP:
Some indication of condition of primal solution
But optimal basis for fixed LP may not match that of node LP that yielded the associated integer solution
Optimality proof of MIP is based on pruning during tree search and thus not available with final solution
How reliable is it?
Need to monitor condition number of all optimal bases used during Branch-and-Cut search
Performance impact
Can be mitigated by sampling


Assessment of Ill Conditioning


MIP Kappa feature, available starting with CPLEX 12.2
Sample from the series of condition numbers
New parameter CPX_PARAM_MIPKAPPA with settings:
-1: off
0: auto (defaults to off)
1: sample
2: use every optimal basis
Classification thresholds provide percentages of each category
Provide an assessment for users unfamiliar with ill conditioning
If enabled, categorize condition numbers of optimal bases as Stable, Suspicious, Unstable or Ill-posed

Assessment of Ill Conditioning


MIP Kappa sample output:
Branch-and-cut subproblem optimization:
Max condition number:            3.5490e+16
Percentage of stable bases:      0.0%
Percentage of suspicious bases:  86.9%
Percentage of unstable bases:    13.0%
Percentage of ill-posed bases:   0.1%
Attention level:                 0.048893
CPLEX encountered numerical difficulties while solving this model.

Attention level
=0 if only stable bases encountered
>0 if at least one basis encountered that is not stable
Max value is 1 (all bases ill-posed)
Not linear


Implications of Ill Conditioning

Now that we can better assess the meaning of the basis condition numbers, what can we do about it?
Ill conditioning can occur under perfect arithmetic.
For some models, even perfect data, algorithm and machine precision may not address the problem.
Consider adjustments to existing formulation, or alternate formulations that provide the solution to the ill conditioned model.
But, in most cases, finite precision can perturb the exact system of equations we wish to solve, resulting in significant changes to the computed solution.
Calculate data, formulate model and configure algorithm to keep such perturbations as small as possible
Condition number provides a worst case bound on the effect
CPLEX provides good quality solutions on majority of models containing some basis condition numbers in [1e+10, 1e+14]


Implications of Ill Conditioning


Sources of perturbations
Finite precision representation of exact data
Calculation of problem data in finite precision
Truncation of calculated data
Good idea if based on knowledge of the model and associated physical system (cleaning up the model data)
Bad idea if done arbitrarily without considering the implications for the model and associated physical system (garbage in, garbage out).
Errors in algorithmic calculations of data
Statistical methods to predict demand for production planning or asset returns for consideration in a financial portfolio
Errors in physical measurements of data values
Any other differences between the conceptual perfect precision calculation and the practical finite precision calculation
Example: addition and multiplication no longer associative and distributive under finite precision
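The loss of associativity is easy to observe. This short sketch sums the same three doubles in two orders and compares the results with the exact rational sum; the particular values are an illustrative choice.

```python
from fractions import Fraction

left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("(0.1 + 0.2) + 0.3 =", repr(left))
print("0.1 + (0.2 + 0.3) =", repr(right))
print("equal in doubles: ", left == right)

# In exact arithmetic on the stored doubles the two groupings agree.
exact_left = (Fraction(0.1) + Fraction(0.2)) + Fraction(0.3)
exact_right = Fraction(0.1) + (Fraction(0.2) + Fraction(0.3))
print("equal exactly:    ", exact_left == exact_right)
```

An optimizer that sums the same row in a different order, for example after a reordering of columns, can therefore see slightly different values for the same mathematical quantity.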


Alternate interpretations of Ill Conditioned Basis


Condition number of Simplex Solutions
Simplex solution is intersection point of n hyperplanes

$\Delta b$ = change in hyperplane (input); $\Delta x$ = change in solution (output)
$\kappa = \|\Delta x\| / \|\Delta b\| \approx 1$: well conditioned!


$\kappa = \|\Delta x\| / \|\Delta b\| \gg 1$: ill conditioned!


[Figure: hyperplane intersections for a stable solution and for an ill-conditioned solution]
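The geometric picture can be sketched in two dimensions, where hyperplanes are lines. The example below, with illustrative line data and hypothetical helper names, shifts one line by the same amount in two configurations and measures how far the intersection point moves.

```python
import math


def intersect(line1, line2):
    """Intersection of a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


def intersection_shift(line1, line2, delta):
    """Distance the intersection moves when the right-hand side of line2 changes by delta."""
    x0, y0 = intersect(line1, line2)
    a2, b2, c2 = line2
    x1, y1 = intersect(line1, (a2, b2, c2 + delta))
    return math.hypot(x1 - x0, y1 - y0)


delta = 0.001
# Perpendicular lines: x + y = 2 and x - y = 0.
shift_perpendicular = intersection_shift((1.0, 1.0, 2.0), (1.0, -1.0, 0.0), delta)
# Nearly parallel lines: x + y = 2 and x + 1.001 y = 2.001.
shift_nearly_parallel = intersection_shift((1.0, 1.0, 2.0), (1.0, 1.001, 2.001), delta)

print(f"perpendicular lines:   intersection moves {shift_perpendicular:.2e}")
print(f"nearly parallel lines: intersection moves {shift_nearly_parallel:.2e}")
```

The same shift of 0.001 moves the intersection by less than 0.001 in the first case and by more than 1 in the second.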


Alternate interpretations of Ill Conditioned Basis


Distance to singularity of a matrix is the reciprocal of its condition number (Gastinel, Kahan).
$\mathrm{dist}_p(B) := \min \{ \|\Delta B\|_p / \|B\|_p : B + \Delta B \text{ singular} \}$
$\mathrm{dist}_p(B) = 1 / \kappa_p(B)$
Implies that linear combinations of rows or columns of B that are close to 0 imply ill conditioning:
if $B^T \lambda = v$, $\|v\| < \varepsilon$, $\|\lambda\| \gg \varepsilon$, B is close to singular, and hence ill conditioned
$\lambda$ provides a certificate of ill conditioning; its support identifies rows to examine


Alternate interpretations of Ill Conditioned Basis


Diagonals of U in LU factorization provide a proxy for basis condition number
$B = L D U$
$B^{-1} = U^{-1} D^{-1} L^{-1}$


Implications for the Practitioner (Data Input)


Can't avoid perturbations due to machine precision
$\|\Delta x\| / \|x\| \le \kappa(B) \, \|\Delta b\| / \|b\|$: can this exceed $t$?
But, increasing tolerances when condition number is high can prevent algorithmic decisions based on round off error associated with machine precision.
More precise input values are better
Always calculate and input model data in double precision
Machine precision for 32 bit floats ~1e-8
Condition numbers > 1e+2 could result in algorithmic decisions based on machine precision based round off error
If you really need to use single precision in the model data, increase the algorithm's default tolerances above 1e-6
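The cost of passing data through single precision can be measured with the standard library alone. In this sketch, `to_single` is a hypothetical helper that rounds a double to the nearest 32 bit float via `struct` and converts it back.

```python
import struct


def to_single(value):
    """Round a Python float (64 bit) to the nearest 32 bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


value = 1.0 / 3.0
single_error = abs(to_single(value) - value)
print(f"1/3 stored as double: {value!r}")
print(f"1/3 stored as single: {to_single(value)!r}")
print(f"error introduced by single precision: {single_error:.2e}")

# With default tolerance t = 1e-6 and single precision eps ~ 1e-8,
# the threshold t / eps drops from 1e+10 to about 1e+2.
print("t / eps for single precision data:", 1e-6 / 1e-8)
```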

Implications for the Practitioner (Data Calculation)


Minimize perturbations involving other factors
$\|\Delta x\| / \|x\| \le \kappa(B) \, \|\Delta b\| / \|b\|$: can this exceed $t$?
Model data values
Don't divide big numbers by small numbers in data calculations
Increases round off error
Make sure all procedures that calculate the data are implemented in a numerically stable manner
Less round off error if all data values of similar order of magnitude
Mix of large and small numbers results in more shifting of the exponents, loss of precision in the mantissa.
Use CPLEX's aggressive scaling if unavoidable


Implications for the Practitioner (Formulation)


Avoid nearly linearly dependent rows or columns
if $B^T \lambda = v$, $\|v\| < \varepsilon$, $\|\lambda\| \gg \varepsilon$, B is close to singular, and hence ill conditioned
Such linear combinations of rows and columns often arise from round off error in the data


Implications for the Practitioner (Formulation)


Imprecise model data values and near singular matrices (example)
Avoid rounding if you can, or round as precisely as possible
Matrices can be ill conditioned despite small spread of coefficients
Exact formulation:
Maximize x1 + x2
c1: 1/3 x1 + 2/3 x2 = 1
c2: x1 + 2 x2 = 3
Imprecisely rounded, single [double] precision:
Maximize x1 + x2
c1: .33333333 x1 + .66666667 x2 = 1 (results in near singular matrix)
[c1: .3333333333333333 x1 + .666666666666667 x2 = 1] (better)
c2: x1 + 2 x2 = 3
Scale to integral value whenever possible:
Maximize x1 + x2
c1: x1 + 2 x2 = 3 (best)
c2: x1 + 2 x2 = 3
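The effect of the 8 digit rounding can be verified in exact rational arithmetic. The sketch below computes the determinant and an infinity-norm condition number (the norm is an assumption) of the 2x2 constraint matrix; the helper names are hypothetical.

```python
from fractions import Fraction


def determinant(M):
    (a, b), (c, d) = M
    return a * d - b * c


def condition_number_inf(M):
    """Infinity-norm condition number of a nonsingular 2x2 matrix, computed exactly."""
    (a, b), (c, d) = M
    det = determinant(M)
    norm = max(abs(a) + abs(b), abs(c) + abs(d))
    inv_norm = max(abs(d) + abs(b), abs(c) + abs(a)) / abs(det)
    return norm * inv_norm


exact = [[Fraction(1, 3), Fraction(2, 3)], [Fraction(1), Fraction(2)]]
rounded = [[Fraction("0.33333333"), Fraction("0.66666667")], [Fraction(1), Fraction(2)]]

det_exact = determinant(exact)
det_rounded = determinant(rounded)
kappa_rounded = condition_number_inf(rounded)

print("determinant, exact data:  ", det_exact)
print("determinant, rounded data:", float(det_rounded))
print(f"condition number, rounded data: {float(kappa_rounded):.3e}")
```

With exact data the two rows are dependent (determinant 0), so the redundancy can be detected; the rounded rows are technically independent, but only by 1e-8, which yields a condition number near 1e+9.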

Numerical Stability of Algorithms


Numerical instability and ill conditioning are not the same
Ill conditioning can occur under perfect precision; numerical instability is specific to finite precision
Informally, an algorithm is numerically unstable if it performs calculations that introduce unnecessarily large amounts of round-off error
Formally, numerical stability (or lack thereof) involves error analysis
Given $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $y = f(x)$
Forward error analysis: $\Delta y = fl(f(x)) - f(x)$
Backward error analysis: $\Delta x : f(x + \Delta x) = fl(f(x))$
Forward: change in computed solution due to round-off errors
Backward: change in model (under perfect precision) required to achieve finite precision result
An algorithm is numerically stable when the bound on the backward error is small relative to the error in the input


Numerical Stability of Algorithms


Sources of numerical instability in finite precision algorithms and calculations
Performing arithmetic operations on numbers of dramatically different orders of magnitude
Look for mathematically equivalent calculation on numbers of more similar magnitude
Algorithms that rely on ill conditioned subproblems
Example: Gomory cuts become almost parallel in cutting plane algorithm as it nears convergence
Ill-conditioned transformations of the problem*
Example: LU factorization calculated with numerically unstable pivot selections (more on this soon)
Calculations involving large intermediate values compared to final solution values*
Small relative errors in large intermediate values are much larger relative to the final value
* source: Higham, Accuracy and Stability of Numerical Algorithms
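A standard textbook illustration of the first and last items (not taken from the slides) is the difference of two square roots. The naive form works with large intermediate values and adds 1 to a number many orders of magnitude larger; the algebraically equivalent form avoids the subtraction.

```python
import math


def naive_difference(x):
    """sqrt(x + 1) - sqrt(x): subtracts two large, nearly equal intermediate values."""
    return math.sqrt(x + 1.0) - math.sqrt(x)


def stable_difference(x):
    """Mathematically equivalent form 1 / (sqrt(x + 1) + sqrt(x)) with no cancellation."""
    return 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))


x = 1.0e20
naive = naive_difference(x)
stable = stable_difference(x)
print("naive form: ", naive)
print("stable form:", stable)
```

The naive form returns 0 because the 1 is absorbed when added to 1e20, while the stable form returns about 5e-11, the correct value to double precision.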


Numerical Stability of Algorithms


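Example: stability test in the LU factorization calculation*
$\tilde{A} = \begin{pmatrix} \varepsilon & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1/\varepsilon & 1 \end{pmatrix} \begin{pmatrix} \varepsilon & -1 \\ 0 & 1 + 1/\varepsilon \end{pmatrix}$
1. Divide small numbers into much larger ones
2. Large intermediate values
3. Ill conditioned transformation
$A = \begin{pmatrix} 1 & 1 \\ \varepsilon & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \varepsilon & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -(1 + \varepsilon) \end{pmatrix}$
(Both matrices are well conditioned)
*source: Higham, Accuracy and Stability of Numerical Algorithms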


Numerical Stability of Algorithms


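Resulting round-off error
$fl(\tilde{L}) \, fl(\tilde{U}) = \begin{pmatrix} 1 & 0 \\ 1/\varepsilon & 1 \end{pmatrix} \begin{pmatrix} \varepsilon & -1 \\ 0 & 1/\varepsilon \end{pmatrix} = \begin{pmatrix} \varepsilon & -1 \\ 1 & 0 \end{pmatrix}$ (lower right entry should be 1)
$\tilde{A} - fl(\tilde{L}) \, fl(\tilde{U}) = \begin{pmatrix} \varepsilon & -1 \\ 1 & 1 \end{pmatrix} - \begin{pmatrix} \varepsilon & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$
$fl(L) \, fl(U) = \begin{pmatrix} 1 & 0 \\ \varepsilon & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \varepsilon & \varepsilon - 1 \end{pmatrix}$ (lower right entry should be $-1$)
$A - fl(L) \, fl(U) = \begin{pmatrix} 1 & 1 \\ \varepsilon & -1 \end{pmatrix} - \begin{pmatrix} 1 & 1 \\ \varepsilon & \varepsilon - 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & -\varepsilon \end{pmatrix}$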
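The two factorizations can be replayed in double precision. This is a minimal sketch of a 2x2 LU factorization without pivoting, applied to both row orderings with eps = 1e-20 (an illustrative value); the function names are hypothetical and this is not how CPLEX factorizes a basis.

```python
def lu_2x2_no_pivot(A):
    """LU factorization of a 2x2 matrix using the (1,1) entry as pivot."""
    (a11, a12), (a21, a22) = A
    l21 = a21 / a11               # multiplier: huge when the pivot a11 is tiny
    u22 = a22 - l21 * a12         # rounded in floating point
    return [[1.0, 0.0], [l21, 1.0]], [[a11, a12], [0.0, u22]]


def multiply_2x2(L, U):
    return [[sum(L[i][k] * U[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def residual(A):
    """Entrywise A - fl(L) fl(U)."""
    L, U = lu_2x2_no_pivot(A)
    product = multiply_2x2(L, U)
    return [[A[i][j] - product[i][j] for j in range(2)] for i in range(2)]


eps = 1.0e-20
res_tiny_pivot = residual([[eps, -1.0], [1.0, 1.0]])    # pivot on eps
res_unit_pivot = residual([[1.0, 1.0], [eps, -1.0]])    # rows swapped, pivot on 1

print("lower right residual, pivot eps:", res_tiny_pivot[1][1])
print("lower right residual, pivot 1:  ", res_unit_pivot[1][1])
```

Pivoting on eps reproduces the error of 1 in the lower right entry; pivoting on 1 leaves an error no larger than eps, which is why a stability test rejects the tiny pivot.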


Numerical Stability of Algorithms


Implications for the practitioner
Optimizer developers are responsible for stability of their algorithms
CPLEX applies such tests in factorizations, ratio tests, presolve operations, other places
Simpler software that doesn't use numerically stable procedures in the implementation will not be consistently reliable
Practitioner is responsible for stability of algorithms used to calculate model data for optimizer
Statistical packages that do predictive analytics as input for prescriptive analytics
Watch out for ill-conditioned transformations of the problem
Watch out for data calculations involving large intermediate values compared to final data values


Identification of symptoms of ill conditioning


Tools for assessing presence of ill conditioning or excessive round-off error
Examine problem statistics of model before starting the optimization
Mixtures of large and small coefficients
Indications of nearly linearly dependent rows
Values with repeating decimal places
Examine node or iteration log during the optimization
Loss of feasibility for LP/QP solves
Large iteration counts for node relaxations
Examine solution quality after the optimization
Significant primal or dual residuals often indicate large basis condition numbers
(If available) Examine the diagonals of U in B=LU (MINOS)
(If available) Run the MIP Kappa feature for MIPs (CPLEX)


Identification of symptoms of ill conditioning: ns1687037 (http://plato.asu.edu/ftp/lptestset/)
Problem statistics:
Variables          : 43749 [Nneg: 36001, Box: 874, Free: 6874]
Objective nonzeros : 24000
Linear constraints : 50622 [Greater: 38622, Equal: 12000]
  Nonzeros         : 1406739
  RHS nonzeros     : 24000

Variables          : Min LB: 0.000000      Max UB: 3.000000
Objective nonzeros : Min   : 1.000000      Max   : 100.0000
Linear constraints :
  Nonzeros         : Min   : 1.987766e-08  Max   : 1364210.
  RHS nonzeros     : Min   : 0.0005000000  Max   : 5.030775e+07

Wide range of coefficients; smallest below default feasibility, optimality tolerances
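The two warning signs in these statistics can be quantified with a few lines of Python. The sketch below uses the minimum and maximum matrix nonzeros reported above; `coefficient_report` is a hypothetical helper, not a CPLEX function.

```python
import math


def coefficient_report(min_abs, max_abs, tolerance=1e-6):
    """Summarize the spread of nonzero magnitudes against a solver tolerance."""
    ratio = max_abs / min_abs
    return {
        "ratio": ratio,
        "orders_of_magnitude": math.log10(ratio),
        "smallest_below_tolerance": min_abs < tolerance,
    }


# Matrix nonzero range reported in the problem statistics for ns1687037.
report = coefficient_report(min_abs=1.987766e-08, max_abs=1364210.0)
print(f"largest / smallest coefficient: {report['ratio']:.3e}")
print(f"orders of magnitude spanned:    {report['orders_of_magnitude']:.1f}")
print(f"smallest below 1e-6 tolerance:  {report['smallest_below_tolerance']}")
```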


Identification of symptoms of ill conditioning: ns1687037 (http://plato.asu.edu/ftp/lptestset/)
Iteration log #1: Loss of feasibility after basis refactorization
Iteration: 674222 Dual objective = 913.318204
Iteration: 674228 Dual objective = 913.318204
<3 more refactorizations>
Iteration: 674256 Dual objective = 913.318205
Iteration: 674258 Dual objective = 913.318205
Removing perturbation.
Iteration: 674259 Scaled dual infeas = 12123.146176
Iteration: 674772 Scaled dual infeas = 595.276887
Elapsed time = 16959.32 sec. (6667412.82 ticks, 674876 iterations)
...
Elapsed time = 17138.76 sec. (6737439.02 ticks, 681930 iterations)
Iteration: 681949 Scaled dual infeas = 0.000002
...
Iteration: 682542 Scaled dual infeas = 0.000000
Iteration: 682624 Dual objective = -19559.930294
Iteration: 682896 Dual objective = -18109.597465
Elapsed time = 17160.88 sec. (6747443.75 ticks, 682941 iterations)

Annotations:
Iterations 674222 to 674258: should only refactor every 100+ iters
Iteration 674259: massive loss of feasibility
Iteration 682624: objective much worse


Identification of symptoms of ill conditioning: ns1687037 (http://plato.asu.edu/ftp/lptestset/)
Iteration log #2: Increase in Markowitz tolerance after frequent refactorizations of the basis
Iteration: 783543 Dual objective = 3.635840
Iteration: 783548 Dual objective = 3.635840
Iteration: 783550 Dual objective = 3.635840
Iteration: 783553 Dual objective = 3.635840
Iteration: 783556 Dual objective = 3.635840
Removing shift (209).
Markowitz threshold set to 0.99999
Iteration: 783558 Dual objective = 3.635695

Annotations:
Iterations 783543 to 783556: should only refactor every 100+ iters
Markowitz threshold change: CPLEX reacts to signs of trouble
Iteration 783558: dual feasibility preserved


Identification of symptoms of ill conditioning: ns1687037 (http://plato.asu.edu/ftp/lptestset/)
Solution quality (available in all CPLEX APIs)
Max. unscaled (scaled) bound infeas.        = 8.39528e-07 (8.39528e-07)
(Reduce feasibility tolerance, continue optimizing to decrease bound infeasibilities)
Max. unscaled (scaled) reduced-cost infeas. = 2.31959e-08 (2.31959e-08)
(Reduce optimality tolerance, continue optimizing to decrease reduced cost infeasibilities)
Max. unscaled (scaled) Ax-b resid.          = 3.51461e-07 (1.16886e-12)
(If exceeds feasibility tolerance, CPLEX feasibility decisions based on round off error)
Max. unscaled (scaled) c-B'pi resid.        = 1.18561e-13 (1.18561e-13)
(If exceeds optimality tolerance, CPLEX optimality decisions based on round off error)
Max. unscaled (scaled) |x|                  = 24139.1 (24139.1)
Max. unscaled (scaled) |slack|              = 48278.2 (48278.2)
Max. unscaled (scaled) |pi|                 = 76.2637 (76.2637)
Max. unscaled (scaled) |red-cost|           = 100 (100)
Condition number of scaled basis            = 2.2e+12
(Use to assess sensitivity of solution to perturbations in the model data)


Treatment of symptoms of ill conditioning


Distinguish the symptoms from the cause
Treatment of symptoms often not as robust, but it may provide a quick resolution to a pressing problem
CPLEX parameters to treat symptoms
Set the scale parameter to 1
Geometric mean based scaling works well on models with wide range of coefficients
Increase the Markowitz tolerance from its default of 0.01 to .90 or larger (max of .99999)
Tightens the pivot threshold in the row stability test of the LU factorization
Equivalently, tightens the bound on the sub diagonal elements of L from 1/.01 to 1/.9
Turn on the numerical emphasis parameter
Causes CPLEX to invoke internal logic to perform more accurate calculations (including quad precision for the LU factorization)
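To show why geometric mean based scaling helps with a wide coefficient range, here is a generic textbook-style sketch on a small dense matrix. It is an assumption-laden illustration, not CPLEX's internal scaling: each row, then each column, is divided by the geometric mean of its smallest and largest nonzero magnitudes. The matrix and function names are made up for the example.

```python
import math


def coefficient_ratio(A):
    """Ratio of the largest to the smallest nonzero magnitude in A."""
    nonzeros = [abs(v) for row in A for v in row if v != 0.0]
    return max(nonzeros) / min(nonzeros)


def geometric_mean_scale(A, passes=1):
    """Scale rows, then columns, by sqrt(min * max) of their nonzero magnitudes."""
    A = [row[:] for row in A]          # work on a copy
    m, n = len(A), len(A[0])
    for _ in range(passes):
        for i in range(m):
            nz = [abs(v) for v in A[i] if v != 0.0]
            if nz:
                s = math.sqrt(min(nz) * max(nz))
                A[i] = [v / s for v in A[i]]
        for j in range(n):
            nz = [abs(A[i][j]) for i in range(m) if A[i][j] != 0.0]
            if nz:
                s = math.sqrt(min(nz) * max(nz))
                for i in range(m):
                    A[i][j] /= s
    return A


A = [[1e-8, 1.0, 0.0],
     [1.0, 1e4, 1.0],
     [0.0, 1.0, 1e6]]
scaled = geometric_mean_scale(A, passes=1)
print(f"coefficient ratio before scaling: {coefficient_ratio(A):.1e}")
print(f"coefficient ratio after scaling:  {coefficient_ratio(scaled):.1e}")
```

Scaling treats the symptom (the spread the algorithm sees) without changing the model; whether the tiny coefficients are meaningful remains a modeling question.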


Treatment of symptoms of ill conditioning


ns1687037
Problem stats indicated wide range of coefficients in matrix
Well suited for setting scale parameter to 1
Removed all problems (loss of feasibility, overly frequent basis refactorizations) seen in iteration logs
Results: Huge reductions in run times for dual simplex and barrier, modest reduction for primal simplex:

Algorithm settings    Primal      Dual       Barrier
Default               14214.6     21094.2    1258.23
Scaling=1             11164.64    907.5      83.52

Solution quality was better as well


Treatment of causes of ill conditioning


ns1687037
Problem stats indicated wide range of coefficients in matrix
Linear constraints :
  Nonzeros         : Min   : 1.987766e-08  Max   : 1364210.
  RHS nonzeros     : Min   : 0.0005000000  Max   : 5.030775e+07
Are the small coefficients of 1e-8 meaningful or due to round-off error in the data calculations?
Changing them to 0 results in an LP that solves to optimality within 1 second, just with presolve
Suggests these coefficients have meaning, but may cause trouble for CPLEX's default feasibility or optimality tolerances of 1e-6
Modeller or owner of data needs to assess whether these coefficients are meaningful


Treatment of causes of ill conditioning


ns1687037
Consider the constraints that contain the tiny coefficients
Fortunately, they appear repeatedly in small subsets of constraints
Reformulation of one subset will apply to other subsets

R0002624: 50150 C0024008 + 50150 C0024010 + 50150 C0024012 +
          50150 C0024014 + 50150 C0024016 + 113600 C0024020 +
          50150 C0024024 + 113600 C0024026 + 113600 C0024038 +
          ... +
          69070 C0025728 + 69070 C0025734 + 47585 C0025738 +
          50150 C0025742 + 50150 C0025744 + 69070 C0025748 +
          C0025749 = 50307748
R0002625: - C0025749 + C0025750 >= 0
R0002626: C0025749 + C0025750 >= 0
R0002627: 1.9877659e-8 C0025750 - C0025751 = 0
R0002628: C0000001 - C0025751 >= 0
R0002629: C0000002 - C0025751 >= -0.0005
R0002630: C0000003 - C0025751 >= -0.0008
R0002631: C0000004 - C0025751 >= -0.0009

Annotation (next to R0002625 to R0002628): These variables only appear in constraints on this slide


Treatment of causes of ill conditioning


ns1687037

R0002624: 50150 C0024008 + 50150 C0024010 + 50150 C0024012 +
          50150 C0024014 + 50150 C0024016 + 113600 C0024020 +
          50150 C0024024 + 113600 C0024026 + 113600 C0024038 +
          ... +
          69070 C0025728 + 69070 C0025734 + 47585 C0025738 +
          50150 C0025742 + 50150 C0025744 + 69070 C0025748 +
          C0025749 = 50307748
// C0025749 free; all others >= 0
R0002625: - C0025749 + C0025750 >= 0
R0002626: C0025749 + C0025750 >= 0
R0002627: 1.9877659e-8 C0025750 - C0025751 = 0
R0002628: C0000001 - C0025751 >= 0
R0002629: C0000002 - C0025751 >= -0.0005
R0002630: C0000003 - C0025751 >= -0.0008
R0002631: C0000004 - C0025751 >= -0.0009

Annotations:
C0024008 through C0025748 in R0002624: these vars appear in other constraints
R0002625 and R0002626: C0025750 = |C0025749|
R0002627: scale abs. value of violation for R0002624
C0000001 through C0000004: penalty variables; appear here and in objective


Treatment of causes of ill conditioning


ns1687037
Objective contains only variables like C0000001,...,C0000004
Piecewise linear, higher cost for larger absolute violation of R0002624
Move the scaling of absolute violation in the constraints to the objective
Dramatically improves the coefficient spread in the constraint matrix, LU factors
Better: use unscaled violation. Objective value will be larger, but, if needed, recapture actual value after the optimization

Multiply R0002627 through R0002631 by 1e+8:
R0002627: 1.9877659 C0025750 - 1e+8 C0025751 = 0
R0002628: 1e+8 C0000001 - 1e+8 C0025751 >= 0
R0002629: 1e+8 C0000002 - 1e+8 C0025751 >= -50000
R0002630: 1e+8 C0000003 - 1e+8 C0025751 >= -80000
R0002631: 1e+8 C0000004 - 1e+8 C0025751 >= -90000

Substitute x' = (1e+8) x:
R0002627: 1.9877659 C0025750 - C0025751 = 0
R0002628: C0000001 - C0025751 >= 0
R0002629: C0000002 - C0025751 >= -50000
R0002630: C0000003 - C0025751 >= -80000
R0002631: C0000004 - C0025751 >= -90000
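The equivalence of the two forms can be checked in exact arithmetic. The sketch below takes one illustrative point (a violation of 1000, chosen arbitrarily) that satisfies R0002627 and R0002629 in the original form, applies x' = (1e+8) x, and confirms that the rescaled rows hold; the Python variable names stand in for the model's column names.

```python
from fractions import Fraction

TINY = Fraction("1.9877659e-8")      # original coefficient in R0002627
SCALED = Fraction("1.9877659")       # coefficient after multiplying the row by 1e+8
FACTOR = 10**8


def original_rows_hold(violation, scaled_violation, penalty):
    """R0002627 and R0002629 in the original formulation."""
    return (TINY * violation - scaled_violation == 0
            and penalty - scaled_violation >= Fraction("-0.0005"))


def reformulated_rows_hold(violation, scaled_violation, penalty):
    """R0002627 and R0002629 after the substitution x' = (1e+8) x."""
    return (SCALED * violation - scaled_violation == 0
            and penalty - scaled_violation >= -50000)


violation = Fraction(1000)                        # plays the role of C0025750
scaled_violation = TINY * violation               # C0025751 in the original model
penalty = scaled_violation - Fraction("0.0005")   # C0000002, tight in R0002629

ok_original = original_rows_hold(violation, scaled_violation, penalty)
ok_reformulated = reformulated_rows_hold(violation, FACTOR * scaled_violation, FACTOR * penalty)

spread_before = 1 / TINY             # coefficients 1 and 1.9877659e-8
spread_after = SCALED / 1            # coefficients 1.9877659 and 1
print("original rows satisfied:    ", ok_original)
print("reformulated rows satisfied:", ok_reformulated)
print(f"coefficient spread before: {float(spread_before):.2e}, after: {float(spread_after):.2f}")
```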


Treatment of causes of ill conditioning


ns1687037
Reformulation:
R0002624: 50150 C0024008 + 50150 C0024010 + 50150 C0024012 +
50150 C0024014 + 50150 C0024016 + 113600 C0024020 +
50150 C0024024 + 113600 C0024026 + 113600 C0024038 +
... +
69070 C0025728 + 69070 C0025734 + 47585 C0025738 +
50150 C0025742 + 50150 C0025744 + 69070 C0025748 +
C0025749 = 50307748
R0002625: - C0025749 + C0025750 >= 0
R0002626: C0025749 + C0025750 >= 0
R0002627: 1.9877659 C0025750 - C0025751 = 0
R0002628: C0000001 - C0025751 >= 0
R0002629: C0000002 - C0025751 >= -50000
R0002630: C0000003 - C0025751 >= -80000
R0002631: C0000004 - C0025751 >= -90000


Treatment of causes of ill conditioning


ns1687037
Run times for original formulation:

Algorithm settings    Primal      Dual       Barrier
Default               14214.6     21094.2    1258.23
Scaling=1             11164.64    907.5      83.52

Run times for modified formulation:

Algorithm settings    Primal      Dual       Barrier
Default               2310.9      2926.5     41.4
Scaling=1             6890.8      1054.7     68.2

Common sources of ill conditioning


Ill conditioning can be caused by large or small subsets of constraints and variables in the model
Such subsets can be difficult to isolate
Build up a list of common sources, use that before more model specific analysis


Common sources of ill conditioning


Mixture of large and small coefficients in the model
Does not guarantee large basis condition numbers
Additional round-off in floating point representation, arithmetic calculations enables modest condition numbers to magnify error in computed solutions above optimizer tolerances
Imprecise data resulting in near singular matrices
Near singular matrices have large condition numbers
Long sequences of transfer constraints
Mixture of large and small coefficients is implicit rather than explicit
This is not a comprehensive list
Add items based on your own modelling experiences


Common sources
Imprecise model data values

$\mathrm{dist}_p(B) := \min \{ \|\Delta B\|_p / \|B\|_p : B + \Delta B \text{ singular} \}$
$\mathrm{dist}_p(B) = 1 / \kappa_p(B)$
Avoid rounding if you can, or round as precisely as possible
Matrices can be ill conditioned despite small spread of coefficients
Exact formulation:
Maximize x1 + x2
c1: 1/3 x1 + 2/3 x2 = 1
c2: x1 + 2 x2 = 3
Imprecisely rounded, single [double] precision:
Maximize x1 + x2
c1: .33333333 x1 + .66666667 x2 = 1 (results in near singular matrix)
[c1: .3333333333333333 x1 + .666666666666667 x2 = 1] (better)
c2: x1 + 2 x2 = 3
Scale to integral value whenever possible:
Maximize x1 + x2
c1: x1 + 2 x2 = 3 (best)
c2: x1 + 2 x2 = 3

Common sources of ill conditioning


Long sequences of transfer constraints
$x_1 = 2 x_2$
$x_2 = 2 x_3$
$x_3 = 2 x_4$
$\vdots$
$x_{n-1} = 2 x_n$
$x_n = 1$
$x_j \ge 0$ for $j = 1, \ldots, n$
All coefficients have same order of magnitude
All coefficients can be represented exactly as IEEE doubles
How bad can it be?


Common sources of ill conditioning


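Long sequences of transfer constraints (ctd)
If any one variable > 0, all the others are basic as well
$\kappa \approx 3 \cdot 2^{n}$
Bound from condition number is fairly tight
$\tilde{B} = \begin{pmatrix} 1 & -2 & & & \\ & 1 & -2 & & \\ & & \ddots & \ddots & \\ & & & 1 & -2 \\ & & & & 1 \end{pmatrix}$
$\tilde{B}^{-1} = \begin{pmatrix} 1 & 2 & 4 & 8 & \cdots & 2^{n-1} \\ & 1 & 2 & 4 & \cdots & 2^{n-2} \\ & & 1 & 2 & \cdots & 2^{n-3} \\ & & & \ddots & & \vdots \\ & & & & & 1 \end{pmatrix}$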
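The growth of the condition number with n can be confirmed with integer arithmetic. This sketch builds the bidiagonal basis for the transfer constraints, inverts it by back substitution, and evaluates the condition number in the infinity norm (an assumption about the norm); all function names are hypothetical.

```python
def transfer_basis(n):
    """n x n basis of the chain x_i - 2 x_{i+1} = 0 (i < n), x_n = 1."""
    return [[1 if j == i else (-2 if j == i + 1 else 0) for j in range(n)]
            for i in range(n)]


def inverse_unit_upper(B):
    """Inverse of an upper triangular matrix with unit diagonal, by back substitution."""
    n = len(B)
    inv = [[0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            rhs = 1 if i == col else 0
            inv[i][col] = rhs - sum(B[i][k] * inv[k][col] for k in range(i + 1, n))
    return inv


def inf_norm(M):
    return max(sum(abs(v) for v in row) for row in M)


def transfer_kappa(n):
    B = transfer_basis(n)
    return inf_norm(B) * inf_norm(inverse_unit_upper(B))


print("first row of the inverse for n = 5:", inverse_unit_upper(transfer_basis(5))[0])
for n in (5, 10, 20, 40):
    print(f"n = {n:2d}: kappa = {transfer_kappa(n)}")
```

In this norm the value is exactly 3(2^n - 1), so roughly 30 constraints in the chain are enough to push the condition number past 1e+10 even though every coefficient is 1 or 2.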


Common sources of ill conditioning


Long sequences of transfer constraints (ctd)
Substitute out variables:
$x_1 = 2 x_2$
$x_2 = 2 x_3$ ($x_1 = 4 x_3$)
$x_3 = 2 x_4$ ($x_1 = 8 x_4$)
$\vdots$
$x_{n-1} = 2 x_n$ ($x_1 = 2^{n-1} x_n$)
$x_n = 1$
$x_j \ge 0$ for $j = 1, \ldots, n$
Small change in $x_n$ propagates into large change in $x_1$


Diagnostics for ill conditioning and numerical instability


Consider the list of common sources first
Look at solution values
Extremely large primal or dual values can identify small subsets of constraints and variables involved in the ill-conditioning
Then look at the basis and its inverse for large values
C API programs available among IBM Technotes*
For MIPs, consider MIP Kappa feature
C API program to export node LPs with condition numbers above a user supplied threshold is available as well*
Look at solution values, basis values or inverse values after locating a node LP with ill conditioned optimal basis
*http://www-01.ibm.com/support/docview.wss?uid=swg21662382


Diagnostics for ill conditioning and numerical instability


Consider other metrics for ill conditioning
Distance to singularity
Find linear combination of rows or columns that is closest to 0
Since B is nonsingular, no linear combination is exactly 0
Min $e^T (u + w)$
s.t.
c1: $B^T \lambda - v = 0$ ($v$ is a nonzero linear combination of the rows of B)
c2: $v = u - w$ (used to minimize $|v|$ in the objective)
c3: $e^T \lambda = 1$
$\lambda, v$ free; $u, w \ge 0$
Nonzeros of optimal $\lambda^*$ identify a single subset of constraints involved in ill-conditioning if optimal objective value is small relative to $\lambda^*$


Diagnostics for ill conditioning and numerical instability


Challenges with distance to singularity LP
B is known to be ill conditioned; that's why this LP is of interest
May be better to start with all slack basis
Turn on numerical emphasis
$\lambda^*$ may identify a large number of constraints, making diagnosis difficult
Solve a MIP instead
Min $e^T (y + z)$
s.t.
c1: $\lambda = u - w$
c2: $u - y \le 0$
c3: $w - z \le 0$
(c1 to c3 count the nonzero elements of $\lambda$; makes use of objective)
c4: $B^T \lambda = s - t$
c5: $e^T s + e^T t \le \varepsilon$
c6: $e^T \lambda = 1$
(c4 to c6: nonzero linear combo of rows of B that is close to 0)
$\lambda$ free; $s, t, u, w \ge 0$; $y, z \in \{0, 1\}$


Diagnostics for ill conditioning and numerical instability


Challenges with distance to singularity MIP
B is known to be ill conditioned, as with LP
Need to choose a small value for the bound $\varepsilon$ on the norm of the row combination
Conceptually want $\varepsilon = 1/\kappa$, but that is too small for the large $\kappa$ associated with an ill
conditioned basis
Need to use larger value, which could compromise the usefulness of the
resulting solution

Solving a MIP can be much harder than solving an LP


Anomalies, Misconceptions, Inconsistencies and Contradictions

Unscaled infeasibilities
3y1 + 5y2 + 799988z <= 800000, y1, y2 in [0,100000], z binary
After default scaling: 3/799988 y1 + 5/799988 y2 + z <= 800000/799988

y1 = 0, y2 = 2.5, z = 1: lhs = 1 + 12.5/799988, rhs = 1 + 12/799988
Scaled infeasibility = .5/799988 ~= 6.25e-7
Unscaled: lhs = 800000.5, rhs = 800000; unscaled infeasibility = .5

Remedies
Improve ratio of largest to smallest coefficients to make default scaling more effective
If bounds of 100000 can be reduced to 1000, constraint becomes 3y1 + 5y2 + 7988z <= 8000;
scaled infeasibility increases to ~6.25e-5, which exceeds the default feasibility tolerance
of 1e-6 and so excludes y1 = 0, y2 = 2.5, z = 1
If binary involved, try to replace with indicator constraints: z = 1 -> 3y1 + 5y2 <= 12
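
The arithmetic on this slide can be reproduced with a short sketch. Dividing the row by its largest absolute coefficient is an assumption made here to mimic the "after default scaling" row shown above; it is not a description of how CPLEX scales in general, and the helper name `infeasibility` is illustrative.

```python
def infeasibility(coeffs, rhs, point):
    """Violation of the constraint sum(coeffs * point) <= rhs (0 if satisfied)."""
    lhs = sum(a * v for a, v in zip(coeffs, point))
    return max(0.0, lhs - rhs)

coeffs = [3.0, 5.0, 799988.0]   # 3 y1 + 5 y2 + 799988 z <= 800000
rhs = 800000.0
point = [0.0, 2.5, 1.0]         # y1 = 0, y2 = 2.5, z = 1

unscaled = infeasibility(coeffs, rhs, point)

# Assumed row scaling: divide the whole row by its largest |coefficient|.
scale = max(abs(a) for a in coeffs)
scaled = infeasibility([a / scale for a in coeffs], rhs / scale, point)

FEAS_TOL = 1e-6                 # default feasibility tolerance quoted in the slides
print("unscaled infeasibility:", unscaled)
print("scaled infeasibility:  ", scaled)
print("accepted under scaled test:", scaled <= FEAS_TOL)
```

The point violates the original row by 0.5, yet it passes a 1e-6 test on the scaled row, which is the anomaly the slide describes.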



Anomalies, Misconceptions, Inconsistencies and Contradictions

Models on the edge of feasibility and infeasibility
This can happen with well-conditioned models
Example:

c1: -x1 + 24x2 <= 21
-infinity <= x1 <= 3
x2 >= 1.00000008
For CPLEX's default feasibility tolerance of 1e-6, different bases
can legitimately result in a declaration of feasibility or infeasibility


Anomalies, Misconceptions, Inconsistencies and Contradictions

Models on the edge of feasibility and infeasibility
Different bases legitimately declare the model feasible or infeasible

c1: -x1 + 24x2 <= 21
-infinity <= x1 <= 3
x2 >= 1.00000008

Presolve (all slack basis)
c1 row min: -1 * 3 + 24 * 1.00000008 = 21 + 1.92e-6 > 21
(decreasing x1 or increasing x2 from its bound will not reduce the infeasibility)
Presolve off, defaults otherwise (uses all slack basis again)
Primal simplex - Infeasible: Infeasibility = 1.9199999990e-06
CPLEX > display solution reduced
Variable Name      Reduced Cost
x1                 -1.000000
x2                 24.000000
Constraint Name    Slack Value
slack c1           -0.000002


Anomalies, Misconceptions, Inconsistencies and Contradictions

Models on the edge of feasibility and infeasibility
Different bases legitimately declare the model feasible or infeasible

c1: -x1 + 24x2 <= 21
-infinity <= x1 <= 3
x2 >= 1.00000008

Presolve off, defaults (starting basis of x2)
x1 = 3, slack on c1 = 0 -> x2 = 1.0
Bound violation on x2 of 8e-8 is within CPLEX's default feasibility
tolerance of 1e-6
Feasible and optimal basis, since objective is all zero

What should we do?
Consistent (infeasible) results obtained with feasibility tolerance of
1e-8
Feasibility and optimality tolerances should be smaller than
smallest legitimate data value
Need to decide if .00000008 is legitimate or round-off error
Legitimate: Reduce tolerances
Round-off: Clean data; change lower bound of x2 to 1.0
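
The two verdicts can be reproduced by evaluating each basis by hand. This is a sketch of the arithmetic only, not of what CPLEX does internally; the function names and the `verdicts` helper are illustrative assumptions.

```python
X1_UB = 3.0
X2_LB = 1.00000008
RHS = 21.0

def all_slack_violation():
    """x1 at its upper bound, x2 at its lower bound: violation of c1."""
    return (-X1_UB + 24.0 * X2_LB) - RHS

def x2_basic_violation():
    """x1 at its upper bound, slack on c1 at 0, so x2 is determined by c1:
    violation of the lower bound on x2."""
    x2 = (RHS + X1_UB) / 24.0
    return X2_LB - x2

def verdicts(tol):
    """Feasible/infeasible verdict of each basis for a given feasibility tolerance."""
    return {
        "all-slack basis": "feasible" if all_slack_violation() <= tol else "infeasible",
        "x2 basic": "feasible" if x2_basic_violation() <= tol else "infeasible",
    }

for tol in (1e-6, 1e-8):
    print(tol, verdicts(tol))
```

With a tolerance of 1e-6 the two bases disagree (violations of about 1.92e-6 and 8e-8); with 1e-8 both report infeasibility, which is the consistent result mentioned above.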


Examples from publicly available test sets


ns1687037 (previously discussed)
Wide range of coefficients
Setting scaling to 1 helped
Recognize that smallest coefficients involved
penalties on constraint violations that could be
moved into the objective function, improving
numerics of basis factorization
Reformulating the model to improve the numerics
yielded additional improvements, addressed the
underlying cause of the problem


Examples from publicly available test sets


cdma (unsolved MIP from unstable test set of MIPLIB 2010)

        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound     ItCnt     Gap

*     0+    0                      -1.97987e+14  -6.51749e+17     71155     ---
      0     0  -6.33335e+16   704  -1.97987e+14  -6.33335e+16     71155     ---
      0     0  -6.31118e+16   702  -1.97987e+14    Cuts: 1870    185865     ---
      0     0  -6.26076e+16   631  -1.97987e+14    Cuts: 1870    292616     ---
...
      0     0  -5.87363e+16  1206  -3.21168e+15    Cuts: 1604   4545331     ---
*     0+    0                      -7.75173e+15  -5.87363e+16   4654552  657.72%
...
*     0+    0                      -1.16804e+16  -5.81032e+16   5626309  397.44%
      0     0  -5.80853e+16  1585  -1.16804e+16     Cuts: 566   5632163  397.29%
Heuristic still looking.
      0     2  -5.80853e+16  1583  -1.16804e+16  -5.80853e+16   5633601  397.29%
Elapsed time = 52951.48 sec. (11682283.45 ticks, tree = 0.01 MB, solutions = 17)
      1     3  -5.80295e+16  1444  -1.16804e+16  -5.80853e+16   5643488  397.29%
...
  12862 10763  -4.10127e+16  1032  -1.46845e+16  -4.29552e+16  29639775  192.52%
Elapsed time = 71901.33 sec. (18469780.15 ticks, tree = 33.05 MB, solutions = 24)
  12866 10767  -3.86482e+16   979  -1.46845e+16  -4.29552e+16  29661467  192.52%


Examples from publicly available test sets


cdma (unsolved MIP from unstable test set of MIPLIB 2010)
Problem statistics:
Variables            : 7891 [Fix: 1, Box: 3655, Binary: 4235]
Objective nonzeros   : 2383
Linear constraints   : 9095 [Less: 8390, Greater: 645, Equal: 60]
  Nonzeros           : 168227
  RHS nonzeros       : 4145

Variables            : Min LB: 0.000000   Max UB: 1.000000e+07
Objective nonzeros   : Min   : 1.000000   Max   : 1.724400e+11
Linear constraints   :
  Nonzeros           : Min   : 1.000000   Max   : 5.000000e+07
  RHS nonzeros       : Min   : 1.000000   Max   : 3000000.

Examples from publicly available test sets


cdma (unsolved MIP from unstable test set of MIPLIB 2010)
Typical node LP iteration log
Wasted iterations slow
node throughput

Iteration log ...


Iteration: 1 Scaled dual infeas = 0.000130
Iteration: 8 Scaled dual infeas = 0.000069
Iteration: 12 Dual objective = -6614900586660791.000000
Iteration: 23 Scaled dual infeas = 0.000107
Iteration: 32 Dual objective = -6614900586660791.000000
Iteration: 46 Dual infeasibility = 0.000038
Iteration: 52 Dual objective = -6614900586660791.000000
Iteration: 58 Dual infeasibility = 0.000038
Iteration: 64 Dual objective = -6614900586660791.000000
Maximum unscaled reduced-cost infeasibility = 7.62939e-06.
Maximum scaled reduced-cost infeasibility = 7.62939e-06.
Dual simplex - Optimal: Objective = -6.6149005867e+15
...


Examples from publicly available test sets


cdma (unsolved MIP from unstable test set of MIPLIB 2010)
Solution quality for same node LP
Max. unscaled (scaled) bound infeas. = 1.81311e-07 (1.81311e-07)
Max. unscaled (scaled) reduced-cost infeas. = 7.62939e-06 (7.62939e-06)
Max. unscaled (scaled) Ax - b resid. = 9.99989e-10 (6.10345e-14)
Max. unscaled (scaled) c - B'pi resid. = 7.3125 (7.3125)
Max. unscaled (scaled) |x| = 5596 (55296)
Max. unscaled (scaled) |slack| = 6.12827e+06 (10.6908)
Max. unscaled (scaled) |pi| = 8.67329e+16 (4.02442e+17)
Max. unscaled (scaled) |red-cost| = 4.31401e+17 (4.31401e+17)
Condition number of scaled basis = 5.2e+08

Reasonable optimal basis condition number
Incoming variable choice based on round-off error

Examples from publicly available test sets


cdma (unsolved MIP from unstable test set of MIPLIB 2010)
Large objective coefficients, not large basis condition numbers,
cause slow node throughput
Model is numerically unstable, not ill conditioned
16 base 10 digits of accuracy for IEEE doubles, objective coefficients on
the order of 1e+11
Round-off error on the order of 1e-5 just to represent them
Modest basis condition numbers of 1e+8 can magnify this to 1e+8*1e-5 = 1e+3
Default optimality tolerance: 1e-6

Simplex method pivots heavily influenced by round-off error
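
The order-of-magnitude estimate above can be checked directly. This sketch uses `math.ulp` (Python 3.9 or later) to get the spacing between adjacent doubles at the size of the largest objective coefficient; the coefficient and condition number are the values reported in the problem statistics and solution quality output earlier, and treating kappa times the spacing as a worst-case magnification is the same rough bound the slide uses.

```python
import math

largest_obj_coef = 1.7244e11   # max objective coefficient from the cdma problem statistics
kappa = 5.2e8                  # condition number of the scaled basis from the solution quality output
optimality_tol = 1e-6          # default optimality tolerance quoted in the slides

# Spacing between adjacent IEEE doubles at this magnitude.
spacing = math.ulp(largest_obj_coef)

# Rough worst-case magnification of a representation error of that size.
magnified = kappa * spacing

print("spacing at 1.7244e11:", spacing)
print("kappa * spacing:     ", magnified)
print("exceeds optimality tolerance:", magnified > optimality_tol)
```

The spacing is 2^-15, roughly 3e-5, so even before any arithmetic is performed the coefficients carry representation error larger than the optimality tolerance.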

What can we do?


Take a closer look at the objective function


Examples from publicly available test sets


cdma (unsolved MIP from unstable test set of MIPLIB 2010)
Histogram of objective coefficients*

OBJECTIVE
Range             Count
[10^0,10^1]:         31
[10^9,10^10]:      1236
[10^11,10^12]:     1116

All binaries; relative contribution to objective is below CPLEX's default
relative MIP gap
Current objective poorly scaled; would be well scaled if we deleted the
31 small objective coeffs.
Large objective coefficients problematic for dual feasibility of
node LP, dual residuals in solution quality
*Using program from
https://www-304.ibm.com/support/docview.wss?uid=swg21400100.
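
A histogram of this kind is simple to produce for any coefficient list. The sketch below is not the IBM program referenced in the footnote; the function name `magnitude_histogram` and the sample coefficients are hypothetical, chosen only to show the bucketing by power of ten.

```python
import math
from collections import Counter

def magnitude_histogram(coefficients):
    """Count nonzero coefficients by decade: key k covers [10^k, 10^(k+1))."""
    buckets = Counter()
    for c in coefficients:
        if c != 0.0:
            buckets[math.floor(math.log10(abs(c)))] += 1
    return dict(buckets)

# Hypothetical sample, not the cdma data.
sample = [1.0, 5.0, 2.0e9, 1.7244e11, 0.0]
hist = magnitude_histogram(sample)

for k in sorted(hist):
    print(f"[10^{k},10^{k + 1}): {hist[k]}")
```

Gaps of several empty decades between populated buckets, as in the cdma objective, are a quick signal that the small group may be negligible relative to the large one.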


Examples from publicly available test sets


cdma (unsolved MIP from unstable test set of MIPLIB 2010)
31 binaries with relatively small objective coefficients have
negligible impact on the objective value
Especially when current solutions have relative MIP gaps of over 150%

Remove them from the objective


Remaining objective coefficients are all on the order of [1e+9,1e+12]
Rescale by 1e+9

Much better results with adjusted model


Much faster node throughput
Much better intermediate results regarding MIP gap
Moderately better final MIP gap after ~20 hours

More to be done
MIP gap remains challenging
But at least now node throughput sufficiently fast to consider MIP
parameter tuning, other changes to formulation


Examples from publicly available test sets


cdma
Implications
Mixture of large and small coefficients can be problematic
Consider solving sequence of problems with a hierarchical objective
rather than solving a problem with a single, blended objective

Examined node log to discover slow node throughput was a major
performance bottleneck
Examined node LP iteration log, solution quality, problem statistics
to identify large dual residuals as the primary source of slow node
LP solve times
Adjusted formulation to obtain a well scaled objective


Examples from publicly available test sets


de063155 (LP from
http://www.sztaki.hu/meszaros/public_ftp/lptestset/problematic/)
CPLEX solves it in less than 0.1 seconds
Iteration logs indicate no sign of trouble
Problem statistics and solution quality raise questions regarding the
solution and the associated physical system
Is the solution acceptable?
Depends
Examine the problem statistics and solution quality to find out.


Examples from publicly available test sets


de063155 (LP from
http://www.sztaki.hu/meszaros/public_ftp/lptestset/problematic/)
Problem stats
Variables            : 1488 [Nneg: 756, Fix: 205, Box: 215, Free: 228, Other: 84]
Objective nonzeros   : 852
Linear constraints   : 852 [Less: 360, Equal: 492]
  Nonzeros           : 4553
  RHS nonzeros       : 777

Variables            : Min LB: -10000.00      Max UB: 30.90000
Objective nonzeros   : Min   : 1.279580e-05   Max   : 1000.000
Linear constraints   :
  Nonzeros           : Min   : 2.106480e-07   Max   : 8.354500e+11
  RHS nonzeros       : Min   : 0.0002187500   Max   : 4.227560e+17

Examples from publicly available test sets


de063155 (LP from
http://www.sztaki.hu/meszaros/public_ftp/lptestset/problematic/)
Solution quality:
There are no bound infeasibilities.
There are no reduced-cost infeasibilities.
Max. unscaled (scaled) Ax-b resid.    = 747.949 (5.12641e-08)
Max. unscaled (scaled) c-B'pi resid.  = 7.74852e-10 (8.51025e-06)
Max. unscaled (scaled) |x|            = 3.10148e+13 (3.76112e+07)
Max. unscaled (scaled) |slack|        = 3.75814e+07 (3.75814e+07)
Max. unscaled (scaled) |pi|           = 62061.1 (5.106e+09)
Max. unscaled (scaled) |red-cost|     = 6.78639e+09 (8.5923e+09)
Condition number of scaled basis      = 1.7e+08
Using aggressive scaling or turning on numerical emphasis does not
improve the solution quality [verify scaling]


Examples from publicly available test sets


de063155
Is the solution quality a problem?

Max. unscaled (scaled) Ax-b resid.    = 747.949 (5.12641e-08)
Max. unscaled (scaled) c-B'pi resid.  = 7.74852e-10 (8.51025e-06)
Max. unscaled (scaled) |x|            = 3.10148e+13 (3.76112e+07)
Max. unscaled (scaled) |slack|        = 3.75814e+07 (3.75814e+07)
Max. unscaled (scaled) |pi|           = 62061.1 (5.106e+09)
Max. unscaled (scaled) |red-cost|     = 6.78639e+09 (8.5923e+09)
Condition number of scaled basis      = 1.7e+08
Unscaled primal residuals are large in an absolute sense, but not relative to
primal solution values
Solution quality does not hinder performance
Practitioner must assess acceptability in context of the physical system being
modelled
Look at the constraints with the large absolute residuals
Need more computing precision if not acceptable
Or need to reformulate model in order to eliminate large matrix and right hand side
coefficients
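
A quick way to put the absolute residual in context is to divide it by the magnitudes reported alongside it. The sketch below uses only numbers from the output above; comparing against the largest right hand side is an assumption made for illustration, since the slides do not say which row carries the largest residual.

```python
max_abs_residual = 747.949    # Max. unscaled Ax-b resid.
max_abs_x = 3.10148e13        # Max. unscaled |x|
max_abs_rhs = 4.227560e17     # largest RHS nonzero from the problem statistics

relative_to_x = max_abs_residual / max_abs_x
# Assumption: only meaningful if the residual sits in a row with an RHS of this size.
relative_to_rhs = max_abs_residual / max_abs_rhs

print("residual relative to max |x|:  ", relative_to_x)
print("residual relative to max |rhs|:", relative_to_rhs)
```

Relative values this small are at or near what 64 bit doubles can deliver, which is why more precision or a reformulation is the remedy if the absolute residual is not acceptable for the physical system.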

Examples from publicly available test sets


iprob (LP from
http://www.sztaki.hu/meszaros/public_ftp/lptestset/problematic/)
Problem stats are benign
CPLEX and most other solvers indicate the model is infeasible
CPLEX's conflict refiner generates a conflict that is feasible
CPLEX's final basis has a condition number of ~1e+7, but small
changes to the model make it feasible
There are some claims on the web that the model is feasible
Investigate the source of these inconsistencies


Examples from publicly available test sets


iprob
Problem stats are benign (but still useful)
Square linear system with all free variables

Variables            : 3001 [Free: 3001]
Objective nonzeros   : 3000
Linear constraints   : 3001 [Equal: 3001]
  Nonzeros           : 9000
  RHS nonzeros       : 3000

Variables            : Min LB: all infinite   Max UB: all infinite
Objective nonzeros   : Min   : 1.000000       Max   : 1.000000
Linear constraints   :
  Nonzeros           : Min   : 0.01030000     Max   : 9940.000
  RHS nonzeros       : Min   : 1.000000       Max   : 1.000000

Has a feasible solution if the square basis matrix of all free
variables is nonsingular

Examples from publicly available test sets


iprob LP file
Minimize
obj.func: c2 + c3 + c4 + ... + c2999 + c3000 + c3001
Subject to
r1: 0.373 c2 + 1.23 c3 + 39.4 c4 + ... + 0.798 c3000 + 0.0889 c3001 = 1
r2: - 0.373 c1 + c2 = 1
r3: - 1.23 c1 + c3 = 1
r4: 39.4 c1 + c4 = 1
...
r3000: 0.798 c1 + c3000 = 1
r3001: - 0.0889 c1 + c3001 = 0
All variables but c1 appear in the first constraint
Variable c1 appears in the remaining 3000 constraints, with one other
variable
In each of these constraints, c1 has a coefficient whose absolute value is the
coefficient of the other variable in the first constraint


Examples from publicly available test sets


iprob, model formulation
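$$
\begin{aligned}
\min\ & \sum_{j=2}^{3001} x_j \\
\text{subject to}\ & r_1:\ \sum_{j=2}^{3001} \alpha_j x_j = 1 \\
& r_k:\ \beta_k x_1 + x_k = b_k, \qquad k = 2,\dots,3001 \\
& x \text{ free}
\end{aligned}
$$

Here $\alpha_j$ is the coefficient of $x_j$ in $r_1$ and $\beta_k$ is the coefficient of $x_1$ in $r_k$, with $|\beta_k| = \alpha_k$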

Examples from publicly available test sets


iprob, structure of free variable basis matrix
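With $\alpha_j$ the coefficient of $x_j$ in $r_1$ and $\beta_k$ the coefficient of $x_1$ in $r_k$
(columns ordered $x_1, x_2, \dots, x_{3001}$; rows ordered $r_1, r_2, \dots, r_{3001}$):

$$
B = \begin{pmatrix}
0 & \alpha_2 & \alpha_3 & \cdots & \alpha_{3001} \\
\beta_2 & 1 & & & \\
\beta_3 & & 1 & & \\
\vdots & & & \ddots & \\
\beta_{3001} & & & & 1
\end{pmatrix}
$$

After permuting x1, r1 to last row and column:

$$
B = \begin{pmatrix} I & \beta \\ \alpha^T & 0 \end{pmatrix}
$$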

Examples from publicly available test sets


iprob, closed form representation of LU factors and basis inverse

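With $\alpha$ the vector of $r_1$ coefficients and $\beta$ the vector of $x_1$ coefficients in the remaining rows:

$$
B = \begin{pmatrix} I & \beta \\ \alpha^T & 0 \end{pmatrix}, \qquad
L = \begin{pmatrix} I & 0 \\ \alpha^T & 1 \end{pmatrix}, \qquad
U = \begin{pmatrix} I & \beta \\ 0 & -\alpha^T\beta \end{pmatrix}
$$

$$
B^{-1} = \begin{pmatrix}
I - \dfrac{\beta\alpha^T}{\alpha^T\beta} & \dfrac{\beta}{\alpha^T\beta} \\[2mm]
\dfrac{\alpha^T}{\alpha^T\beta} & -\dfrac{1}{\alpha^T\beta}
\end{pmatrix}
$$

Singular when $\alpha^T\beta = 0$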

Examples from publicly available test sets


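iprob, singular when $\delta = \alpha^T\beta = 0$
($\alpha$: coefficients of $r_1$; $\beta$: coefficients of $x_1$ in the remaining rows)

$$
B = \begin{pmatrix} I & \beta \\ \alpha^T & 0 \end{pmatrix}; \qquad \text{let } v = (-\alpha, 1), \text{ then}
$$

$$
\begin{aligned}
v^T B &= 0 && \text{(proof of singularity)} \\
v^T b &\ne 0 && \text{(proof of infeasibility)}
\end{aligned}
$$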
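
As the deck notes later, this structure is easy to reproduce with a few constraints and variables. The sketch below builds a 3 by 3 instance of the permuted basis, writing alpha for the r1 coefficients and beta for the x1 column. Those two names, the data values and the helper functions are assumptions made for illustration and do not come from the iprob file.

```python
def build_basis(alpha, beta):
    """Return B = [[I, beta], [alpha^T, 0]] as a list of rows."""
    m = len(alpha)
    rows = [[1.0 if i == j else 0.0 for j in range(m)] + [beta[i]] for i in range(m)]
    rows.append(list(alpha) + [0.0])
    return rows

def left_multiply(v, matrix):
    """Compute the row vector v^T * matrix."""
    return [sum(v[i] * matrix[i][j] for i in range(len(matrix)))
            for j in range(len(matrix[0]))]

alpha = [1.0, 2.0]     # hypothetical r1 coefficients
beta = [2.0, -1.0]     # hypothetical x1 coefficients, chosen so alpha^T beta = 0
delta = sum(a * b for a, b in zip(alpha, beta))

B = build_basis(alpha, beta)
v = [-a for a in alpha] + [1.0]     # certificate v = (-alpha, 1)
b = [1.0, 1.0, 1.0]                 # right hand sides (rows r_k first, r1 last)

vTB = left_multiply(v, B)
vTb = sum(vi * bi for vi, bi in zip(v, b))

print("alpha^T beta =", delta)
print("v^T B =", vTB)   # all zeros: rows of B are linearly dependent
print("v^T b =", vTb)   # nonzero: the equality system has no solution
```

Changing any entry of alpha or beta slightly makes the inner product nonzero, and the same system then has a unique solution whose size grows as the inner product approaches 0.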


Examples from publicly available test sets


iprob
Singular when $\delta = \alpha^T\beta = 0$ ($\alpha$: coefficients of $r_1$; $\beta$: coefficients of $x_1$ in the remaining rows)
Becomes increasingly ill conditioned as $\delta \to 0$
Gastinel and Kahan's metric
Definition of condition number of B

$\delta = 0$ for the data instance at the web site (under perfect precision)
CPLEX and the other solvers correctly declare infeasibility
Any slight perturbation to the data causes $\delta \ne 0$
Problem then is feasible (although potentially ill conditioned)
Model is on the boundary of feasibility and infeasibility

Excellent model for exploring ill conditioning
Model structure easily reproducible with just a few constraints and variables

Examples from publicly available test sets


iprob
Implication for practitioner
Needs to assess meaning of small and 0 values of $\delta = \alpha^T\beta$ in the context of
the physical system being modelled
Can check the model data in advance to calculate the value of $\delta$
Declare infeasible without solving in the singular case
Consider reformulating if need to solve the boundary cases
Additional precision beyond 64 bits may help for small values
Exact rational arithmetic would be better.

Examples from publicly available test sets


ns2122603 (http://miplib.zib.de/miplib2010-unstable.php).
Red model from MIPLIB 2010, never solved to optimality
Wide range of coefficients like cdma
Highly variable results
Different objective values claimed optimal with irrelevant changes to
optimization
Dramatic variability in run time as well


Examples from publicly available test sets


ns2122603, problem stats
Variables            : 19300 [Nneg: 11712, Binary: 7588]
Objective nonzeros   : 5880
Linear constraints   : 24754 [Less: 14706, Greater: 7560, Equal: 2488]
  Nonzeros           : 77044
  RHS nonzeros       : 10088

Variables            : Min LB : 0.000000     Max UB : 1.000000
Objective nonzeros   : Min    : 1.000000     Max    : 450.0000
Linear constraints   :
  Nonzeros           : Min    : 0.04166670   Max    : 1.000000e+08
  RHS nonzeros       : Min    : 0.02083335   Max    : 1.000000e+08

Imprecise rounding? (minimum nonzeros 0.04166670 and 0.02083335)
Wide coefficient range

Examples from publicly available test sets


ns2122603, MIP Kappa output:
Max condition number: 6.0818e+23
Percentage (number) of stable bases: 1.89% (108)
Percentage (number) of suspicious bases: 60.29% (3446)
Percentage (number) of unstable bases: 35.22% (2013)
Percentage (number) of ill-posed bases: 2.61% (149)
Attention level: 0.137747
kappa = 1e+23 -> perturbations of 1e-16 can be magnified to 1e-16*1e+23 =
1e+7
Need to look at node LPs with ill-posed optimal bases
http://www-01.ibm.com/support/docview.wss?uid=swg21662382 contains a
program that writes out node LPs with optimal basis condition numbers that
exceed a threshold provided by the user
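
The categories in the MIP Kappa output can be mimicked with a few thresholds. The cutoffs 1e+7, 1e+10 and 1e+14 used below are an assumption based on the commonly documented CPLEX categories (the deck itself only singles out 1e+10 and 1e+14 as important), and both function names are illustrative.

```python
def classify_basis(kappa):
    """Map a basis condition number to a MIP-Kappa-style category (assumed thresholds)."""
    if kappa < 1e7:
        return "stable"
    if kappa < 1e10:
        return "suspicious"
    if kappa < 1e14:
        return "unstable"
    return "ill-posed"

def worst_case_error(kappa, perturbation=1e-16):
    """Rough bound on how far a perturbation of the given size can be magnified."""
    return kappa * perturbation

print(classify_basis(6.0818e23), worst_case_error(6.0818e23))  # max condition number above
print(classify_basis(5.2e8), worst_case_error(5.2e8))          # cdma node LP, for comparison
```

A filter like this is a reasonable first pass when deciding which exported node LPs deserve a closer look at their solution, basis and inverse values.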


Examples from publicly available test sets


ns2122603, ill conditioned node LP solution quality
Max. unscaled (scaled) bound infeas.         = 1.72662e-12 (1.72662e-12)
Max. unscaled (scaled) reduced-cost infeas.  = 3.97904e-13 (3.97904e-13)
Max. unscaled (scaled) Ax - b resid.         = 7.73721e-07 (9.67152e-08)
Max. unscaled (scaled) c - B'pi resid.       = 4.33376e-12 (4.33376e-12)
Max. unscaled (scaled) |x|                   = 4.8e+09 (6e+08)
Max. unscaled (scaled) |slack|               = 4.81015e+09 (6e+08)
Max. unscaled (scaled) |pi|                  = 66773.2 (6.71089e+07)
Max. unscaled (scaled) |red-cost|            = 1.00002e+08 (1.34218e+08)
Condition number of scaled basis             = 8.6e+17

Large primal values (large red. costs too, but consider primal value first
since primal residuals are larger)

Examples from publicly available test sets


ns2122603, ill conditioned node LP primal variable values
Variable Name    Solution Value
C0213            0.007973
C0217            0.025673
...
C10142           36.450029
C10458           4799996160.003072
...
C11441           2399998080.001536
C11442           2399998080.001536
...
C11711           2583222987.267681
C11712           7.851078
...
C19155           0.000476
All other variables in the range 1-16642 are 0.

Examples from publicly available test sets


ns2122603, constraints associated with large primal variable value
Intersecting Constraints, Variable C11441, Model ns2122603
Variable C11441 coefficients
Linear Constraint:    Coefficient:
Linear Objective      0
R13353                -0.0416667     (1/24 ????)
R14613                1
R15873                1
R17133                1
R18393                1
R13353: C10181 - 0.0416667 C11441 - 100000000 C12701 +
100000000 C14025 >= -100000000
Variable Name    Solution Value
C11441           2399998080.001536   (2400000000? Change in input by 1e-7 changes output by ~2e+3)
The variable ``C12701'' is 0.
The variable ``C14025'' is 0.
The variable ``C10181'' is 0.
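
The suspicious solution value can be traced back to the rounded coefficient with two divisions. This sketch assumes, as the constraint and the zero values of the other variables suggest, that C11441 is determined by 0.0416667 * C11441 = 100000000; the variable names in the code are illustrative.

```python
rounded_coef = 0.0416667     # coefficient as it appears in the model
exact_coef = 1.0 / 24.0      # the value it was presumably rounded from
big_m = 1.0e8                # right hand side magnitude in R13353

x_rounded = big_m / rounded_coef
x_exact = big_m / exact_coef

print("with 0.0416667:", x_rounded)
print("with 1/24:     ", x_exact)
print("difference:    ", x_exact - x_rounded)
print("coefficient change:", rounded_coef - exact_coef)
```

A change of a few times 1e-8 in the coefficient moves the solution value by about 1920, which matches the solution value reported above.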

Examples from publicly available test sets


ns2122603
Change .0416667 to 1/24
Do likewise with some other coefficients in the model

Resulting constraint, after rescaling by 24:
R13353: 24 C10181 + 2400000000 C14025 - 2400000000 C12701
- 1.0 C11441 >= -2400000000

Inconsistencies, bad MIP Kappa stats persist
Mixture of large and small coefficients remains


Examples from publicly available test sets


ns2122603

Large coefficients appear to be big M values (on the binaries C14025 and C12701)
R13353: 24 C10181 + 2400000000 C14025 - 2400000000 C12701
- 1.0 C11441 >= -2400000000
Try to remove the large coefficients using indicator variables
First need to interpret constraint
C14025 = 0 and C12701 = 1 -> C11441 - 24 C10181 <= 0
Otherwise, nonbinding
Need a binary that assumes value of 1 when C14025 = 0 and C12701 = 1, 0
otherwise

Z >= C12701 - C14025
Z <= C12701
Z + C12701 + C14025 <= 2
Z = 1 -> C11441 - 24 C10181 <= 0
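
The three linear constraints on Z can be verified by enumerating the four combinations of the two binaries. This is a small check written for this purpose; the function name `feasible_z` is illustrative.

```python
from itertools import product

def feasible_z(c12701, c14025):
    """Values of binary Z that satisfy the three linking constraints."""
    return [z for z in (0, 1)
            if z >= c12701 - c14025
            and z <= c12701
            and z + c12701 + c14025 <= 2]

table = {(c12701, c14025): feasible_z(c12701, c14025)
         for c12701, c14025 in product((0, 1), repeat=2)}

for (c12701, c14025), zs in table.items():
    print(f"C12701={c12701} C14025={c14025} -> Z in {zs}")
```

Z is forced to 1 exactly when C12701 = 1 and C14025 = 0, and forced to 0 in the other three cases, so the indicator constraint is active only where the original big M row was binding.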

Examples from publicly available test sets


ns2122603
Problem stats after replacing big M constraints with indicators
Variables               : 23080 [Nneg: 11712, Binary: 11368]
Objective nonzeros      : 5880
Linear constraints      : 28534 [Less: 18486, Greater: 7560, Equal: 2488]
  Nonzeros              : 84604
  RHS nonzeros          : 10088
Indicator constraints   : 2520 [Greater: 2520]
  Nonzeros              : 3780
  RHS nonzeros          : 1260

Variables               : Min LB : 0.000000   Max UB : 1.000000
Objective nonzeros      : Min    : 1.000000   Max    : 450.0000
Linear constraints      :
  Nonzeros              : Min    : 1.000000   Max    : 110555.0       (previously [4e-2, 1e+8])
  RHS nonzeros          : Min    : 1.000000   Max    : 1.295001e+07
Indicator constraints   :
  Nonzeros              : Min    : 1.000000   Max    : 24.00000
  RHS nonzeros          : Min    : 1.000000   Max    : 1.000000

Examples from publicly available test sets


ns2122603, indicator formulation
Consistent results regardless of parameter settings, random seeds
MIP model remains challenging
Now have a chance to address the MIP performance issues


Summary of examples from publicly available test sets


ns1687037 (LP; previously discussed)
Wide range of coefficients
Setting scaling to 1 helped
Reformulating the model to improve the numerics yielded additional
improvements, addressed the underlying cause of the problem
Moving the scaling issue from the matrix to the objective removed the
numerical problems from the basis matrix

cdma (MIP)
Basis condition numbers OK
Wide range of objective coefficients was the real problem
Separate large objective coefficients, rescale
Faster node throughput yields significantly better solutions faster,
but solving MIP to optimality remains challenging

Summary of publicly available examples (ctd).


de063155 (LP)
No performance problem; solves within a second
Problem statistics, solution quality are cause for concern
Large data values, significant absolute residuals that are relatively small
Need to assess whether residuals are acceptable in the context of the associated system
being modelled

iprob (LP)
Depending on data values, can be ill posed/singular, highly ill conditioned or well
conditioned
Ill posed for the specific data instance in the test set

Straightforward to mathematically represent the model


Closed form representation of relevant basis matrices

Use closed form representation to filter ill posed data instances


Decide based on associated system being modelled what to do

ns2122603 (MIP)
Ill conditioned basis matrices yield inconsistent results
Replacing constraints with large big M values with CPLEX's indicator constraints
improves the formulation, yielding consistent results
Solving the MIP to optimality remains challenging

Summary and Conclusion


Key observations
Ill conditioning can occur under perfect arithmetic
But finite precision introduces perturbations that ill conditioning may magnify
Can't avoid machine precision

Algorithms cannot be ill conditioned, but they can be numerically unstable
Avoid unnecessary round-off errors in computations
Ill conditioned transformations in algorithms promote numerical instability
Stability tests prevent dividing smaller numbers into much larger ones
In CPLEX's algorithms
In data calculations, including algorithms (e.g., predictive analytics)

Most LP and MIP solvers use absolute rather than relative tolerances
Models with limited accuracy or significant round-off error may require larger
tolerances


Summary and Conclusion


Key recommendations
Make sure optimizer doesn't make algorithmic decisions based on round-off error
Distinguish meaningful data from round-off error
Large basis condition numbers can magnify round-off error
Solution quality can help assess this
Compare primal and dual residuals to feasibility and optimality tolerances, respectively

Avoid single precision data calculations whenever possible


Trade-off between memory savings and accuracy is usually unfavorable
If necessary, use larger feasibility and optimality tolerances

Build a list of known sources of ill conditioning


Try to reproduce numerical issues on smaller model instances
Make use of the numerous different metrics available to measure and assess ill
conditioning

Summary and Conclusions


Tools for assessing presence of ill conditioning or
excessive round-off error

Examine problem statistics of model before starting the optimization
Mixtures of large and small coefficients
Indications of nearly linearly dependent rows

Examine node or iteration log during the optimization
Loss of feasibility for LP/QP solves
Large iteration counts for node relaxations

Examine solution quality after the optimization
Significant primal or dual residuals often indicate large basis
condition numbers

(If available) Examine the diagonals of U in B=LU (MINOS)
(If available) Run the MIP Kappa feature for MIPs (CPLEX)


Summary and Conclusion


Assessing high condition numbers
Increasing algorithm tolerances can mitigate effects of large
condition numbers
But improvements to formulation, accuracy of input are more robust

1e+10, 1e+14 are important thresholds on common machines when
using 64 bit double precision
CPLEX's MIP Kappa feature enables assessment of ill conditioning
in MIPs

CPLEX solves most models with some or all of these problems
just fine
Familiarity with these remedies can save the practitioner precious
time when they do encounter a badly ill conditioned model.


Discussion
What other features in CPLEX Optimization Studio would
help you bridge the gap between the mathematical model
and the practical application?
How useful would a minimal subset of an ill conditioned
model that remains ill conditioned be?
Even if it involved a significant number of constraints and
variables?

Let us know (klotz@us.ibm.com).


References/Further Reading
More detailed discussion in INFORMS TutORials in Operations Research 2014
Higham, Accuracy and Stability of Numerical Algorithms
Duff, Erisman and Reid, Direct Methods for Sparse Matrices
Gill, Murray and Wright, Practical Optimization
Golub and Van Loan, Matrix Computations
Floating point arithmetic:

http://pages.cs.wisc.edu/~smoler/x86text/lect.notes/arith.flpt.html

MIP Performance tuning and formulation strengthening:

Klotz, Newman. Practical Guidelines for Solving Difficult Mixed Integer Programs

http://www.sciencedirect.com/science/article/pii/S1876735413000020

LP performance issues
Klotz, Newman. Practical Guidelines for Solving Difficult Linear Programs
http://www.sciencedirect.com/science/article/pii/S1876735412000189

Converting repeating decimals into rational fractions:

http://en.wikipedia.org/wiki/Repeating_decimal#Converting_repeating_decimals_to_fractions

Backup Material

Problem Definition
Ill Conditioning
Motivated by work of meteorologist & mathematician Edward Lorenz
Lorenz focused on small changes in initial conditions, resulting
trajectories in nonlinear meteorological models
Lorenz subsequently became a pioneer in the field of Chaos Theory

Ill conditioning extends beyond the nonlinear meteorological models


on which Lorenz worked
More generally, a mathematical model or system is ill conditioned
when a small change in the input can result in a large change to the
computed solution


Alternate interpretations of Ill Conditioned Basis


Skeel's condition number: $\| \, |B^{-1}| \cdot |B| \, \|$

Invariant under row scaling
Not invariant under column scaling

If significantly smaller than regular condition number, some rows of
the matrix have larger norms than others

Alternate interpretations of Ill Conditioned Basis


Skeel's condition number:
Example (http://www.hsl.rl.ac.uk/specs/mc75.pdf):
c1: 3300 x1 + 1e-11 x2 = 1
c2: x1 + 3300 x2 + 1e-11 x3 = 1
c3: x2 + 3300 x3 + 1e-11 x4 = 1
c4: 10000 x2 + 10000 x3 + 330000000 x4 = 1   // much larger row norm
xj free, j=1,...,4

Skeel condition number: 1.0061


CPLEX exact condition number (no scaling): 100036
CPLEX exact condition number (default scaling): 10.0091


Alternate interpretations of Ill Conditioned Basis


Skeel's condition number: $\| \, |B^{-1}| \cdot |B| \, \|$

Why does this metric measure sensitivity to perturbations?
We saw how $\kappa(B) = \|B\| \, \|B^{-1}\|$ measured potential magnification
of error in the solution relative to perturbation in the input
What is the underlying theoretical justification for $\| \, |B^{-1}| \cdot |B| \, \|$?

Alternate interpretations of Ill Conditioned Basis


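What is the underlying theoretical justification for $\| \, |B^{-1}| \cdot |B| \, \|$?

Use absolute values on individual components instead of norms during derivation
Use componentwise perturbation instead of norms: $|\Delta B| \le \varepsilon |B|$ ($\varepsilon > 0$)

$$
\begin{aligned}
Bx &= b && \text{(original system)} \\
(B + \Delta B)(x + \Delta x) &= b && \text{(perturbed system)} \\
\Delta x &= -B^{-1} \Delta B \, (x + \Delta x) && \text{(combine and rearrange)} \\
|\Delta x| &= |B^{-1} \Delta B \, (x + \Delta x)| \le |B^{-1}| \, |\Delta B| \, |x + \Delta x| \le \varepsilon \, |B^{-1}| \, |B| \, |x + \Delta x| && \text{(componentwise abs value)} \\
\frac{\|\Delta x\|}{\|x + \Delta x\|} &\le \varepsilon \, \| \, |B^{-1}| \cdot |B| \, \|
\end{aligned}
$$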

Examples
Consider alternate formulations to improve numerics
Fixed costs on continuous variables using big Ms:

$$
\begin{aligned}
\text{Minimize}\ & c^T x + f^T z && (c, f \ge 0) \\
\text{subject to}\ & Ax = b \\
& x_i - M z_i \le 0 && \text{(only constraint with $z_i$; mixture of large and small numbers)} \\
& x_i \ge 0,\ 0 \le z_i \le 1 \\
& z_i \text{ integer}
\end{aligned}
$$

LP relaxation solution
$x_i \le M z_i \iff x_i / M \le z_i \implies z_i = x_i / M$
CPLEX default integrality tolerance: 1e-5
$x_i = 100,\ M = 10^{10} \implies z_i = x_i / M = 10^{-8}$: integer feasible solution within
integrality tolerance that violates intent of the model (trickle flow)
$z_i$ not eligible for branching unless $M \le 10^{7}$
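
The trickle flow threshold follows from one division. The sketch below evaluates the LP relaxation value of z for two values of big-M against the default integrality tolerance quoted above; the function names are illustrative, and the test is a plain distance-to-nearest-integer check, not CPLEX's internal logic.

```python
INTEGRALITY_TOL = 1e-5   # CPLEX default integrality tolerance quoted in the slides

def relaxed_z(x, big_m):
    """Smallest z allowed by x - M z <= 0 in the LP relaxation."""
    return x / big_m

def accepted_as_integer(z, tol=INTEGRALITY_TOL):
    """True if z is within tol of the nearest integer."""
    return abs(z - round(z)) <= tol

x = 100.0
for big_m in (1e10, 1e6):
    z = relaxed_z(x, big_m)
    print(f"M = {big_m:g}: z = {z:g}, accepted as integer: {accepted_as_integer(z)}")
```

With M = 1e+10 the value z = 1e-8 is accepted as 0 while x = 100 still flows, so the fixed cost is never paid; with M = 1e+6 the value z = 1e-4 exceeds the tolerance and the variable remains a branching candidate.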

Examples
To get correct answers with big-M formulation
Use smallest possible value of big-M that doesn't violate intent of model
Bound strengthening in CPLEX presolve often does this automatically

Set integrality tolerance to 0


Set simplex tolerances to minimum values, 1e-9
Ask for more accuracy on a potentially ill-conditioned system
Turn on numerical emphasis parameter

Many users are unfamiliar with issues


Frequent source of CPLEX customer calls
One of most popular CPLEX FAQs
But should they have to be?


Examples
Indicator constraint formulation for fixed costs on continuous
variables

$$
\begin{aligned}
\text{Minimize}\ & c^T x + f^T z && (c, f \ge 0) \\
\text{subject to}\ & Ax = b \\
& z_i = 0 \implies x_i \le 0 && \text{(CPLEX branches on these directly)} \\
& x_i \ge 0,\ 0 \le z_i \le 1 \\
& z_i \text{ integer}
\end{aligned}
$$

LP relaxation solution
$x_i = 100,\ z_i = 0$
indicator constraint i requires branching (integer feasible solutions
aligned with intent of the model)

Examples
Which approach to use?
Indicator formulation is a more precise representation of the model
Indicator and big-M formulation equivalent when M = infinity

If we can use modest values for big-M, indicator formulation
tends to be weaker
Use indicator constraints, let CPLEX decide whether to
replace with big-Ms if preprocessing can deduce big-M
values of modest size
Presolve tightens the indicator formulation (improved further in
CPLEX 12.2.0)
Presolve on indicators (improved)
Node presolve on indicators
Probing on by default
Probing on indicator constraints
Re-presolve by default