Objective
Enable more precise assessment of ill conditioning in
linear and integer programs
Multiple metrics available to assess ill conditioning
Outline
Finite precision computing fundamentals
Description of ill conditioning
Assessment of ill conditioning
Alternate metrics for ill conditioning
Numerical stability of algorithms
Identification and treatment of symptoms of ill conditioning
Identification and treatment of sources of ill conditioning
Anomalies, misconceptions, inconsistencies and contradictions
Examples that illustrate modeling pitfalls that can contribute to ill
conditioning
Formulation alternatives
Conclusions
128-bit (quadruple) precision is more accurate than 64-bit double precision, but requires more memory
and computing time
[Figure: floating-point representation (sign, exponent, mantissa), illustrated with a 4-digit (11-bit) mantissa. In double precision the absolute round-off error is about $10^{-16}/3$ for 1/3, $10^{-12}/3$ for 10000/3 and $10^{-20}/3$ for 1/30000. Adding 1/30000 to 1/3 shifts the exponent of the smaller operand, so its low-order digits are lost; adding 1/3 to 10000/3 yields an absolute round-off error of about $10^{-12}/3$, set by the larger operand. Panel annotations: errors ranging from about $10^{-16}$ to about $10^{0}$, for $a = 3$, $b = 1/30000$, $\epsilon = 10^{-16}$.]
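The round-off errors quoted in the figure can be checked exactly in Python. `fractions.Fraction` converts a float to the exact rational value it stores, so subtracting the true rational gives the exact round-off. This is a minimal sketch, and the helper `roundoff` is illustrative rather than part of the original material:

```python
from fractions import Fraction
import sys

def roundoff(numerator: int, denominator: int) -> tuple[float, float]:
    """Return (absolute, relative) error of storing numerator/denominator as a double."""
    exact = Fraction(numerator, denominator)
    # Fraction(float) is exact, so this is precisely the binary value that was stored.
    stored = Fraction(numerator / denominator)
    abs_err = abs(stored - exact)
    return float(abs_err), float(abs_err / exact)

for num, den in [(1, 3), (10000, 3), (1, 30000)]:
    a, r = roundoff(num, den)
    print(f"{num}/{den}: absolute error {a:.3e}, relative error {r:.3e}")

print("unit round-off:", sys.float_info.epsilon / 2)
```

Each relative error stays below the unit round-off $2^{-53} \approx 1.1 \times 10^{-16}$. The absolute error, by contrast, scales with the magnitude of the stored value.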
2014 IBM Corporation
Description
Ill Conditioning
Does the flap of a butterfly's wings in Brazil set off a tornado in
Texas?
[Figure: a meteorological model run with data to 3 decimal places (.506) and with data to 6 decimal places (.506127)]
Problem definition
Ill Conditioning
Small change in input leads to big change in output
Given $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $y = f(x)$
For $y + \Delta y = f(x + \Delta x)$, compute a bound $\kappa$: $\|\Delta y\| \le \kappa \|\Delta x\|$
Can we quantitatively measure ill conditioning?
For many mathematical systems or models, quantitative measures have
yet to be discovered. But sometimes we can measure it.
Specifically, we can measure ill conditioning when solving square linear
systems of equations
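$$Bx = b \;\Rightarrow\; x = B^{-1} b$$
$$B(x + \Delta x) = b + \Delta b \;\Rightarrow\; x + \Delta x = B^{-1}(b + \Delta b) \;\Rightarrow\; \Delta x = B^{-1} \Delta b$$
Cauchy-Schwarz inequality:
$$\|\Delta x\| \le \|B^{-1}\|\,\|\Delta b\|$$
Cauchy-Schwarz for original system:
$$\|b\| \le \|B\|\,\|x\|$$
Combine and rearrange:
$$\frac{\|\Delta x\|}{\|x\|} \le \|B\|\,\|B^{-1}\|\,\frac{\|\Delta b\|}{\|b\|}$$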
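The following is a quick numerical check of this bound. It assumes NumPy and uses a made-up, nearly singular 2x2 matrix. The perturbation is chosen close to the direction that $B^{-1}$ magnifies most, so the observed amplification comes close to $\|B\|\,\|B^{-1}\|$:

```python
import numpy as np

# Nearly singular basis: the two rows differ only in the fourth decimal place.
B = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x = np.array([1.0, 1.0])
b = B @ x

# ||B|| * ||B^-1|| in the 2-norm, as in the derivation above.
kappa = np.linalg.norm(B, 2) * np.linalg.norm(np.linalg.inv(B), 2)

# Perturb the right-hand side; by linearity, B dx = db.
db = 1e-6 * np.array([1.0, -1.0])
dx = np.linalg.solve(B, db)

rel_db = np.linalg.norm(db) / np.linalg.norm(b)
rel_dx = np.linalg.norm(dx) / np.linalg.norm(x)
amplification = rel_dx / rel_db

print(f"kappa = {kappa:.4e}, observed amplification = {amplification:.4e}")
```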
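$$Bx = b \;\Rightarrow\; x = B^{-1} b$$
$$(B + \Delta B)(x + \Delta x) = b$$
$$Bx + B\Delta x + \Delta B\,x + \Delta B\,\Delta x = b \;\Rightarrow\; B\Delta x = -\Delta B\,(x + \Delta x)$$
$$\Delta x = -B^{-1}\Delta B\,(x + \Delta x)$$
Cauchy-Schwarz inequality:
$$\|\Delta x\| \le \|B^{-1}\|\,\|\Delta B\|\,\|x + \Delta x\|$$
Rearrange:
$$\frac{\|\Delta x\|}{\|x + \Delta x\|} \le \|B^{-1}\|\,\|\Delta B\|$$
Multiply by $\|B\| / \|B\|$:
$$\frac{\|\Delta x\|}{\|x + \Delta x\|} \le \|B\|\,\|B^{-1}\|\,\frac{\|\Delta B\|}{\|B\|}$$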
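The same kind of check applies to a perturbation of the matrix itself. This is again a sketch with made-up data, assuming NumPy:

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
dB = np.array([[0.0, 0.0],
               [0.0, 1e-7]])  # perturb a single matrix entry

x = np.linalg.solve(B, b)
x_pert = np.linalg.solve(B + dB, b)  # this is x + dx
dx = x_pert - x

kappa = np.linalg.norm(B, 2) * np.linalg.norm(np.linalg.inv(B), 2)
lhs = np.linalg.norm(dx) / np.linalg.norm(x_pert)
rhs = kappa * np.linalg.norm(dB, 2) / np.linalg.norm(B, 2)
print(f"||dx||/||x+dx|| = {lhs:.3e} <= kappa*||dB||/||B|| = {rhs:.3e}")
```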
Condition Number
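Condition number of B is defined as
$$\kappa(B) = \|B\|\,\|B^{-1}\|$$
so that
$$\frac{\|\Delta x\|}{\|x\|} \le \kappa(B)\,\frac{\|\Delta b\|}{\|b\|}, \qquad \frac{\|\Delta x\|}{\|x + \Delta x\|} \le \kappa(B)\,\frac{\|\Delta B\|}{\|B\|}$$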
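Basic epsilons:
Machine precision (double): 1e-16
Default feasibility and optimality tolerances: 1e-6
Stable: $\kappa(B) < 10^{7}$
Suspicious: $10^{7} \le \kappa(B) < 10^{10}$
Unstable: $10^{10} \le \kappa(B) < 10^{14}$
Ill-posed: $10^{14} \le \kappa(B)$
CPLEX MIP kappa computation parameter:
-1: off
0: auto (defaults to off)
1: sample
2: use every optimal basis
[Figure: distribution of bases by category: stable, suspicious, unstable, ill-posed]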
Attention level
=0 if only stable bases encountered
>0 if at least one basis encountered that is not stable
Max value is 1 (all bases ill-posed)
Not linear
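The sketch below shows how the basis categories and the attention level fit together. The thresholds and the weights 0.01, 0.3 and 1 come from CPLEX's documentation of MIP kappa statistics as recalled here, not from these slides. Treat them as assumptions and check the documentation for your CPLEX version. The function names are hypothetical:

```python
from collections import Counter

# Assumed CPLEX MIP kappa category boundaries (lower bound of each class).
SUSPICIOUS, UNSTABLE, ILL_POSED = 1e7, 1e10, 1e14
# Assumed attention-level weight per category.
WEIGHTS = {"stable": 0.0, "suspicious": 0.01, "unstable": 0.3, "ill-posed": 1.0}

def category(kappa: float) -> str:
    """Classify a basis condition number."""
    if kappa < SUSPICIOUS:
        return "stable"
    if kappa < UNSTABLE:
        return "suspicious"
    if kappa < ILL_POSED:
        return "unstable"
    return "ill-posed"

def attention_level(kappas: list[float]) -> float:
    """Weighted fraction of non-stable bases: 0 if all stable, 1 if all ill-posed."""
    counts = Counter(category(k) for k in kappas)
    return sum(WEIGHTS[c] * n for c, n in counts.items()) / len(kappas)

sample = [1e3] * 8 + [1e8, 1e12]  # 8 stable, 1 suspicious, 1 unstable basis
print(attention_level(sample))
```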
But in most cases, finite precision perturbs the exact system
of equations we wish to solve, and for an ill conditioned system these
perturbations can significantly change the computed solution.
Calculate the data, formulate the model and configure the algorithm to keep such
perturbations as small as possible
$\kappa = \|\Delta x\| / \|\Delta b\| \approx 1$: well conditioned!
[Figure: stable solution vs. ill-conditioned solution]
$$\mathrm{dist}_p(B) := \min\left\{ \frac{\|\Delta B\|_p}{\|B\|_p} : B + \Delta B \text{ singular} \right\}$$
$$\mathrm{dist}_p(B) = \frac{1}{\kappa_p(B)}$$
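For the 2-norm, the closest singular matrix comes from the singular value decomposition (the Eckart-Young theorem), which makes the identity easy to verify numerically. This is a sketch with an arbitrary matrix, assuming NumPy:

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# B = U diag(s) V^T. Removing the smallest singular triplet gives the
# nearest singular matrix in the 2-norm.
U, s, Vt = np.linalg.svd(B)
dB = -s[-1] * np.outer(U[:, -1], Vt[-1, :])

rel_dist = np.linalg.norm(dB, 2) / np.linalg.norm(B, 2)
inv_kappa = 1.0 / np.linalg.cond(B, 2)
smallest_sv_after = np.linalg.svd(B + dB, compute_uv=False)[-1]

print(f"||dB||/||B|| = {rel_dist:.6e}, 1/kappa_2(B) = {inv_kappa:.6e}")
print(f"smallest singular value of B + dB: {smallest_sv_after:.1e}")
```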
if B = v, || v ||< ,
$$B = LDU, \qquad B^{-1} = U^{-1} D^{-1} L^{-1}$$
Make sure all procedures that calculate the data are implemented in a
numerically stable manner
Less round-off error if all data values are of similar order of magnitude
Mix of large and small numbers results in more shifting of the exponents and loss
of precision in the mantissa.
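A few lines of Python show the effect. At $10^{16}$ the gap between adjacent doubles is 2, so adding 1 to it has no effect, and the order of a mixed-magnitude sum matters. Here `math.fsum` serves only as an exact reference:

```python
import math

big, small = 1e16, 1.0
print(math.ulp(big))        # spacing between adjacent doubles near 1e16
print(big + small == big)   # the 1.0 is lost in the mantissa shift

values = [1e16, 1.0, -1e16]
naive = sum(values)         # left to right: (1e16 + 1.0) - 1e16
exact = math.fsum(values)   # tracks the lost low-order bits
print(naive, exact)
```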
Use CPLEX's aggressive scaling if unavoidable
Such linear combinations of rows and columns often arise from round-off
error in the data
Given $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $y = f(x)$
Forward error analysis: $\Delta y = \mathrm{fl}(f(x)) - f(x)$
Backward error analysis: $\Delta x$ such that $f(x + \Delta x) = \mathrm{fl}(f(x))$
Forward: change in computed solution due to round-off errors
Backward: change in model (under perfect precision) required to achieve finite
precision result
An algorithm is numerically stable when the bound on the backward error is
small relative to the error in the input
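The sketch below illustrates the distinction, assuming NumPy. It applies `numpy.linalg.solve` (LU with partial pivoting, a backward stable method) to the notoriously ill conditioned Hilbert matrix. The backward error stays near machine precision, while the forward error is many orders of magnitude larger, consistent with forward error being bounded by roughly $\kappa(H)$ times the backward error:

```python
import numpy as np

n = 12
# Hilbert matrix: a classic ill conditioned but nonsingular example.
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
b = H @ x_true

x_hat = np.linalg.solve(H, b)

# Normwise backward error: relative change to H that makes x_hat an exact solution.
backward = np.linalg.norm(b - H @ x_hat) / (np.linalg.norm(H, 2) * np.linalg.norm(x_hat))
# Forward error: relative distance from the true solution.
forward = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

print(f"kappa_2(H) = {np.linalg.cond(H):.1e}")
print(f"backward error = {backward:.1e}, forward error = {forward:.1e}")
```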
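Without pivoting ($\epsilon$ tiny, e.g. below machine precision):
$$A = \begin{pmatrix} \epsilon & -1 \\ 1 & 1 \end{pmatrix} = LU = \begin{pmatrix} 1 & 0 \\ \epsilon^{-1} & 1 \end{pmatrix} \begin{pmatrix} \epsilon & -1 \\ 0 & \epsilon^{-1} + 1 \end{pmatrix}$$
$$\mathrm{fl}(L)\,\mathrm{fl}(U) = \begin{pmatrix} 1 & 0 \\ \epsilon^{-1} & 1 \end{pmatrix} \begin{pmatrix} \epsilon & -1 \\ 0 & \epsilon^{-1} \end{pmatrix} = \begin{pmatrix} \epsilon & -1 \\ 1 & 0 \end{pmatrix} \quad \text{((2,2) entry should be 1)}$$
$$A - \mathrm{fl}(L)\,\mathrm{fl}(U) = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$
With row pivoting:
$$\tilde{A} = \begin{pmatrix} 1 & 1 \\ \epsilon & -1 \end{pmatrix} = LU = \begin{pmatrix} 1 & 0 \\ \epsilon & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -(1 + \epsilon) \end{pmatrix}$$
$$\mathrm{fl}(L)\,\mathrm{fl}(U) = \begin{pmatrix} 1 & 0 \\ \epsilon & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \epsilon & \epsilon - 1 \end{pmatrix}$$
$$\tilde{A} - \mathrm{fl}(L)\,\mathrm{fl}(U) = \begin{pmatrix} 0 & 0 \\ 0 & -\epsilon \end{pmatrix}$$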
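The effect is easy to reproduce in double precision. The sketch below uses a hand-written 2x2 LU (helper names are illustrative) with $\epsilon = 10^{-20}$. In floating point the pivoted residual comes out as exactly zero, because $\epsilon - 1$ rounds to $-1$:

```python
def lu_2x2_no_pivot(a):
    """Doolittle LU of a 2x2 matrix without row exchanges, in float arithmetic."""
    l21 = a[1][0] / a[0][0]
    L = [[1.0, 0.0], [l21, 1.0]]
    U = [[a[0][0], a[0][1]], [0.0, a[1][1] - l21 * a[0][1]]]
    return L, U

def matmul_2x2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def residual(a, L, U):
    """A - fl(L) fl(U), entrywise."""
    lu = matmul_2x2(L, U)
    return [[a[i][j] - lu[i][j] for j in range(2)] for i in range(2)]

eps = 1e-20  # well below double precision machine epsilon
A = [[eps, -1.0], [1.0, 1.0]]

R_nopiv = residual(A, *lu_2x2_no_pivot(A))
A_piv = [A[1], A[0]]  # partial pivoting: move the larger entry |1| > |eps| to the top
R_piv = residual(A_piv, *lu_2x2_no_pivot(A_piv))

print("without pivoting:", R_nopiv)
print("with pivoting:   ", R_piv)
```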
[Table: problem statistics (variables, objective nonzeros, linear constraints, nonzeros, RHS nonzeros)]
[Iteration log excerpt: objective value 3.635840 on five consecutive lines, then 3.635695]
Should only refactor every 100+ iterations
CPLEX reacts to signs of trouble
Dual feasibility preserved
= 8.39528e-07 (8.39528e-07)
= 3.51461e-07 (1.16886e-12)
(If this exceeds the feasibility tolerance, CPLEX's feasibility decisions are based on round-off error)
Max. unscaled (scaled) c-B'pi resid.
= 1.18561e-13 (1.18561e-13)
(If this exceeds the optimality tolerance, CPLEX's optimality decisions are based on round-off error)
Max. unscaled (scaled) |x|
= 24139.1 (24139.1)
= 48278.2 (48278.2)
= 76.2637 (76.2637)
= 100 (100)
= 2.2e+12
Settings \ Algorithm   Primal      Dual       Barrier
Default                14214.6     21094.2    1258.23
Scaling=1              11164.64    907.5      83.52
[Problem statistics excerpt: Min 1.987766e-08, Max 1364210.; Min 0.0005000000, Max 5.030775e+07]
Suggests these coefficients have meaning, but they may cause trouble for CPLEX's
default feasibility and optimality tolerances of 1e-6
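A summary like the one above can be computed for your own data before calling the solver. In this sketch, `coefficient_report` is a hypothetical helper, not a CPLEX function. The coefficient list reuses the min and max values from the excerpt plus two made-up entries, and `tol` stands in for a 1e-6 feasibility or optimality tolerance:

```python
import math
from collections import Counter

def coefficient_report(coefs, tol=1e-6):
    """Summarize the magnitudes of the nonzero coefficients of a model."""
    mags = [abs(c) for c in coefs if c != 0.0]
    lo, hi = min(mags), max(mags)
    # Decade k counts magnitudes in [10^k, 10^(k+1)).
    decades = Counter(math.floor(math.log10(m)) for m in mags)
    return {
        "min": lo,
        "max": hi,
        "ratio": hi / lo,
        "below_tol": sum(1 for m in mags if m < tol),
        "decades": dict(sorted(decades.items())),
    }

report = coefficient_report([1.987766e-08, 0.0005, 3.2, 1364210.0])
print(report)
```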
Better: use unscaled violation. Objective value will be larger, but, if needed,
recapture actual value after the optimization
Settings \ Algorithm   Primal      Dual       Barrier
Default                2310.9      2926.5     41.4
Scaling=1              6890.8      1054.7     68.2
Common sources
Imprecise model data values
$$\begin{aligned} x_1 &= 2x_2 \\ x_2 &= 2x_3 \\ x_3 &= 2x_4 \\ &\;\;\vdots \\ x_{n-1} &= 2x_n \\ x_n &= 1 \\ x_j &\ge 0 \quad \text{for } j = 1, \ldots, n \end{aligned}$$
All coefficients have same order of magnitude
All coefficients can be represented exactly as IEEE doubles
How bad can it be?
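$$\tilde{B} = \begin{pmatrix} 1 & -2 & & & \\ & 1 & -2 & & \\ & & \ddots & \ddots & \\ & & & 1 & -2 \\ & & & & 1 \end{pmatrix}, \qquad \tilde{B}^{-1} = \begin{pmatrix} 1 & 2 & 4 & \cdots & 2^{n-1} \\ & 1 & 2 & \cdots & 2^{n-2} \\ & & 1 & \cdots & 2^{n-3} \\ & & & \ddots & \vdots \\ & & & & 1 \end{pmatrix}$$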
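The following sketch assumes NumPy, and `chain_basis` is a hypothetical helper. It confirms that even though every coefficient is exactly representable, $\kappa_\infty(\tilde{B}) = \|\tilde{B}\|_\infty \|\tilde{B}^{-1}\|_\infty = 3(2^n - 1)$ grows exponentially with $n$:

```python
import numpy as np

def chain_basis(n):
    """Upper bidiagonal basis of x_i - 2 x_{i+1} = 0 (i < n), x_n = 1."""
    return np.eye(n) - 2.0 * np.eye(n, k=1)

for n in (10, 20, 30):
    B = chain_basis(n)
    B_inv = np.linalg.inv(B)
    # The first row of the inverse is 1, 2, 4, ..., 2^(n-1).
    print(n, B_inv[0, -1], np.linalg.cond(B, np.inf))
```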
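$$\begin{aligned} x_1 &= 2x_2 \\ x_2 &= 2x_3 \quad (x_1 = 4x_3) \\ x_3 &= 2x_4 \quad (x_1 = 8x_4) \\ &\;\;\vdots \\ x_{n-1} &= 2x_n \quad (x_1 = 2^{n-1}x_n) \\ x_n &= 1 \\ x_j &\ge 0 \quad \text{for } j = 1, \ldots, n \end{aligned}$$
Small change in $x_n$ propagates into large change in $x_1$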
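Back substitution makes the propagation explicit. In this pure-Python sketch (the helper `solve_chain` is illustrative), multiplying by 2 is exact in binary floating point. The only error therefore comes from perturbing $x_n$, and that error reaches $x_1$ multiplied by $2^{n-1}$:

```python
def solve_chain(n, xn=1.0):
    """Back-substitute x_i = 2 x_{i+1}, starting from x_n."""
    x = [0.0] * n
    x[-1] = xn
    for i in range(n - 2, -1, -1):
        x[i] = 2.0 * x[i + 1]
    return x

n = 30
delta = 1e-9
x = solve_chain(n)
x_pert = solve_chain(n, 1.0 + delta)
change_x1 = x_pert[0] - x[0]
print(f"x_n changed by {delta:.0e}; x_1 changed by {change_x1:.3e}")
```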
Then look at the basis and its inverse for large values
C API programs available among IBM Technotes*
*http://www-01.ibm.com/support/docview.wss?uid=swg21662382
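$$\begin{aligned} \min\;& e^T(u + w) \\ \text{s.t.}\;& c_1: B^T\lambda - v = 0 \\ & c_2: v = u - w \\ & c_3: e^T\lambda = 1 \\ & \lambda, v \text{ free};\; u, w \ge 0 \end{aligned}$$
v is a nonzero linear
combination of the rows of B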
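$$\begin{aligned} \min\;& e^T(y + z) \\ \text{s.t.}\;& c_1: \lambda = u - w \\ & c_2: u - y \le 0 \\ & c_3: w - z \le 0 \\ & c_4: B^T\lambda = s - t \\ & c_5: e^T s + e^T t \le \epsilon \\ & c_6: e^T\lambda = 1 \\ & \lambda \text{ free};\; s, t, u, w \ge 0;\; y, z \in \{0, 1\} \end{aligned}$$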
Nonzero linear
combo of rows of B
that is close to 0
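The LPs above measure the combination in the 1-norm, normalized by $e^T\lambda = 1$. A solver-free alternative, swapped in here rather than taken from the slides, uses the singular value decomposition. The SVD finds the unit 2-norm weights $\lambda$ that minimize $\|B^T\lambda\|_2$; that minimum equals the smallest singular value, and rows with large $|\lambda_i|$ identify the near-dependence. This is a sketch with a made-up matrix, assuming NumPy:

```python
import numpy as np

# Rows 0, 1 and 2 are nearly dependent: row0 + row1 - row2 is about 1e-10.
B = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 1.0, 3.0, 0.0],
              [1.0, 3.0, 3.0 + 1e-10, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(B)
lam = U[:, -1]          # unit-norm weights on the rows of B
combo = lam @ B         # B^T lam: the linear combination of rows
involved = np.flatnonzero(np.abs(lam) > 0.1)

print("smallest singular value:", s[-1])
print("||B^T lam||_2:", np.linalg.norm(combo))
print("rows involved:", involved)
```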
Remedies
c1: $-x_1 + 24x_2 \le 21$;
$x_1 \le 3$;
$x_2 \ge 1.00000008$
For CPLEX's default feasibility tolerance of 1e-6, different bases
can legitimately result in a declaration of feasibility or infeasibility
Row min of c1: $-1 \cdot 3 + 24 \cdot 1.00000008 = 21 + 1.92 \times 10^{-6} > 21$
(decreasing $x_1$ or increasing $x_2$ from its bound will not reduce the infeasibility)
Presolve off, defaults otherwise (uses all slack basis again)
Primal simplex - Infeasible: Infeasibility = 1.9199999990e-06
CPLEX> display solution reduced
Variable Name           Reduced Cost
x1 -1.000000
x2 24.000000
Constraint Name Slack Value
slack c1 -0.000002
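The following sketch reproduces the arithmetic behind this example. The second point only illustrates how bound violations smaller than the tolerance can make c1 feasible. It is not a trace of an actual CPLEX run:

```python
TOL = 1e-6  # CPLEX's default feasibility tolerance, per the slide

def row_activity(x1, x2):
    return -x1 + 24.0 * x2

# Both variables at their bounds: x1 at its upper bound 3, x2 at its lower bound.
x1, x2 = 3.0, 1.00000008
violation = row_activity(x1, x2) - 21.0
print(f"violation at bounds: {violation:.4e}")

# A basis that leaves x1 and x2 slightly outside their bounds, each by
# less than TOL, can satisfy c1.
x1_alt, x2_alt = 3.0 + 0.9e-6, 1.00000008 - 0.9e-6
bound_viol = max(x1_alt - 3.0, 1.00000008 - x2_alt)
print(f"bound violations {bound_viol:.2e}, row activity {row_activity(x1_alt, x2_alt):.8f}")
```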
Node  Left     Objective  IInf  Best Integer    Cuts/Best Bound     ItCnt     Gap
   0                             -1.97987e+14      -6.51749e+17     71155
   0     0  -6.33335e+16   704   -1.97987e+14      -6.33335e+16     71155
   0     0  -6.31118e+16   702   -1.97987e+14        Cuts: 1870    185865
   0     0  -6.26076e+16   631   -1.97987e+14        Cuts: 1870    292616
   0+    0                       -1.16804e+16      -5.81032e+16   5626309  397.44%
0 0 -5.80853e+16 1585 -1.16804e+16 Cuts: 566 5632163 397.29%
Heuristic still looking.
0 2 -5.80853e+16 1583 -1.16804e+16 -5.80853e+16 5633601 397.29%
Elapsed time = 52951.48 sec. (11682283.45 ticks, tree = 0.01 MB, solutions = 17)
1 3 -5.80295e+16 1444 -1.16804e+16 -5.80853e+16 5643488 397.29%
...
12862 10763 -4.10127e+16 1032 -1.46845e+16 -4.29552e+16 29639775 192.52%
Elapsed time = 71901.33 sec. (18469780.15 ticks, tree = 33.05 MB, solutions = 24)
12866 10767 -3.86482e+16 979 -1.46845e+16 -4.29552e+16 29661467 192.52%
[Problem statistics excerpt: 5.000000e+07; 3000000.]
Reasonable optimal
basis condition number
OBJECTIVE
Range            Count
[10^0,10^1]      31
[10^9,10^10]     1236
[10^11,10^12]    1116
More to be done
MIP gap remains challenging
But at least node throughput is now fast enough to consider MIP
parameter tuning and other changes to the formulation
Square linear
system with all free
variables
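$$\begin{aligned} \min\;& \sum_{j=2}^{3000} x_j \\ \text{subject to}\;& r_1: \sum_{j=2}^{3001} \alpha_j x_j = 1 \\ & r_k: \beta_k x_1 + x_k = b_k, \quad k = 2, \ldots, 3001 \\ & x \text{ free} \end{aligned}$$
Basis matrix (rows $r_1, \ldots, r_{3001}$; columns $x_1, \ldots, x_{3001}$), with $\alpha = (\alpha_2, \ldots, \alpha_{3001})^T$ and $\beta = (\beta_2, \ldots, \beta_{3001})^T$:
$$B = \begin{pmatrix} 0 & \alpha^T \\ \beta & I \end{pmatrix} = LU, \qquad L = \begin{pmatrix} 1 & \alpha^T \\ 0 & I \end{pmatrix}, \qquad U = \begin{pmatrix} 0 - \alpha^T\beta & 0 \\ \beta & I \end{pmatrix}$$
Singular when $\alpha^T\beta = 0$; otherwise
$$B^{-1} = \frac{1}{\alpha^T\beta} \begin{pmatrix} -1 & \alpha^T \\ \beta & (\alpha^T\beta)\, I - \beta\alpha^T \end{pmatrix}$$
If $B$ is singular, some $v$ satisfies $v^T B = 0$; the system has no solution if also $v^T b \ne 0$. Here $v = (1; -\alpha)$ gives $v^T B = (-\alpha^T\beta,\; 0) = 0$ when $\alpha^T\beta = 0$.
$\alpha^T\beta = 0$ for the data instance at the web site (under perfect precision)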
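Here is a small-scale sketch of the same structure, assuming NumPy. `iprob_basis`, `alpha` and `beta` are made up and do not reproduce the iprob data. The sketch shows $\det(B) = -\alpha^T\beta$ and the growth of $\kappa(B)$ as $\alpha^T\beta \to 0$. It also shows the left null vector $v = (1; -\alpha)$ in the singular case:

```python
import numpy as np

def iprob_basis(alpha, beta):
    """Basis [[0, alpha^T], [beta, I]] of the square system above, at small size."""
    m = len(alpha)
    B = np.zeros((m + 1, m + 1))
    B[0, 1:] = alpha
    B[1:, 0] = beta
    B[1:, 1:] = np.eye(m)
    return B

alpha = np.array([1.0, 2.0, 3.0])
v = np.concatenate(([1.0], -alpha))  # candidate left null vector (1; -alpha)

for gap in (1e-2, 1e-8):
    beta = np.array([3.0, 0.0, -1.0 + gap])  # alpha^T beta = 3 * gap
    B = iprob_basis(alpha, beta)
    print(f"alpha^T beta = {alpha @ beta:.1e}, det(B) = {np.linalg.det(B):.2e}, "
          f"cond(B) = {np.linalg.cond(B):.1e}")

beta_singular = np.array([3.0, 0.0, -1.0])  # alpha^T beta = 0
print("v^T B =", v @ iprob_basis(alpha, beta_singular))
```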
Variable   Solution Value
C0213      0.007973
C0217      0.025673
...
C10142     36.450029
C10458     4799996160.003072
...
C11441     2399998080.001536   (2400000000?)
C11442     2399998080.001536   (2400000000?)
...
C11711     2583222987.267681
C11712     7.851078
...
C19155     0.000476
(Change in input by 1e-7 changes output by ~2e+3)
Binaries
Z C12701 - C14025
Z C12701
Z + C12701+C14025 2
Z 1 C11441 24C10181 0
cdma (MIP)
iprob (LP)
Depending on the data values, can be ill posed/singular, highly ill conditioned or well
conditioned
Ill posed for the specific data instance
ns2122603 (MIP)
Ill conditioned basis matrices yield inconsistent results
Replacing constraints that have large big-M values with CPLEX's indicator constraints
improves the formulation, yielding consistent results
Solving the MIP to optimality remains challenging
In CPLEX's algorithms
In data calculations, including algorithms (e.g., predictive analytics)
Most LP and MIP solvers use absolute rather than relative tolerances
Models with limited accuracy or significant round-off error may require larger
tolerances
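The example below shows why absolute tolerances interact with coefficient magnitudes. The row and the point are hypothetical. The relative measure used here (violation divided by the largest activity term) is one common choice, assumed for illustration; it is not how CPLEX measures feasibility:

```python
def violation(row, x, rhs):
    """Violation of sum(row[j] * x[j]) <= rhs, plus the row's largest activity term."""
    terms = [a * v for a, v in zip(row, x)]
    return sum(terms) - rhs, max(abs(t) for t in terms)

TOL = 1e-6

# Hypothetical row with large coefficients; the point overshoots the rhs by about 1e-5.
row, rhs = [1e7, 1e7], 2e7
x = [1.0, 1.0 + 1e-12]
abs_viol, scale = violation(row, x, rhs)
rel_viol = abs_viol / max(1.0, scale)

print(f"absolute violation {abs_viol:.1e} (fails an absolute test at {TOL:g})")
print(f"relative violation {rel_viol:.1e} (would pass a relative test)")
```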
Discussion
What other features in CPLEX Optimization Studio would
help you bridge the gap between the mathematical model
and the practical application?
How useful would a minimal subset of an ill conditioned
model that remains ill conditioned be?
Even if it involved a significant number of constraints and
variables?
References/Further Reading
More detailed discussion in INFORMS TutORials in Operations Research 2014
Higham, Accuracy and Stability of Numerical Algorithms
Duff, Erisman and Reid, Direct Methods for Sparse Matrices
Gill, Murray and Wright, Practical Optimization
Golub and Van Loan, Matrix Computations
Floating point arithmetic: http://pages.cs.wisc.edu/~smoler/x86text/lect.notes/arith.flpt.html
Klotz and Newman, Practical Guidelines for Solving Difficult Mixed Integer Programs: http://www.sciencedirect.com/science/article/pii/S1876735413000020
LP performance issues: Klotz and Newman, Practical Guidelines for Solving Difficult Linear Programs: http://www.sciencedirect.com/science/article/pii/S1876735412000189
Backup Material
Problem Definition
Ill Conditioning
Motivated by the work of meteorologist and mathematician Edward Lorenz
Lorenz studied how small changes in initial conditions affect the resulting
trajectories in nonlinear meteorological models
Lorenz subsequently became a pioneer in the field of Chaos Theory
$\|\,|B^{-1}|\,|B|\,\|$
$x_j$ free, $j = 1, \ldots, 4$
We saw how $\kappa(B) = \|B\|\,\|B^{-1}\|$ measured potential magnification
of error in the solution relative to perturbation in the input
What is the underlying theoretical justification for $\|\,|B^{-1}|\,|B|\,\|$?
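$$Bx = b$$
$$(B + \Delta B)(x + \Delta x) = b, \qquad |\Delta B| \le \epsilon\,|B| \;\text{(componentwise, } \epsilon > 0)$$
(Combine and rearrange)
$$\Delta x = -B^{-1}\Delta B\,(x + \Delta x)$$
$$|\Delta x| = |B^{-1}\Delta B\,(x + \Delta x)| \le |B^{-1}|\,|\Delta B|\,|x + \Delta x|$$
$$\|\Delta x\|_\infty \le \|x + \Delta x\|_\infty\; \|\,|B^{-1}|\,|\Delta B|\,\|_\infty \le \epsilon\, \|x + \Delta x\|_\infty\; \|\,|B^{-1}|\,|B|\,\|_\infty$$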
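One practical consequence is that $\|\,|B^{-1}|\,|B|\,\|$ (often called Skeel's condition number), unlike $\kappa(B)$, does not change when the rows of $B$ are scaled. It is therefore not fooled by a badly scaled but otherwise benign row. This is a sketch in the $\infty$-norm with a made-up matrix, assuming NumPy:

```python
import numpy as np

def kappa_inf(B):
    """Standard condition number ||B|| ||B^-1|| in the infinity norm."""
    return np.linalg.norm(B, np.inf) * np.linalg.norm(np.linalg.inv(B), np.inf)

def skeel(B):
    """Skeel condition number || |B^-1| |B| ||_inf."""
    return np.linalg.norm(np.abs(np.linalg.inv(B)) @ np.abs(B), np.inf)

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
D = np.diag([1.0, 1e8])  # badly scaled second row
for name, M in (("A", A), ("D A", D @ A)):
    print(f"{name}: kappa_inf = {kappa_inf(M):.3e}, Skeel = {skeel(M):.3e}")
```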
Examples
Consider alternate formulations to improve numerics
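Fixed costs on continuous variables using big Ms:
$$\begin{aligned} \text{Minimize}\;& c^T x + f^T z \quad (c, f \ge 0) \\ \text{subject to}\;& Ax = b \\ & x_i - M z_i \le 0 \quad \text{(only constraint with } z_i\text{)} \\ & x_i \ge 0,\; 0 \le z_i \le 1 \\ & z_i \text{ integer} \end{aligned}$$
LP relaxation solution:
$$x_i \le M z_i \;\Rightarrow\; x_i / M \le z_i \;\Rightarrow\; z_i = x_i / M$$
CPLEX default integrality tolerance: 1e-5
$x_i = 100$, $M = 10^{10}$ $\Rightarrow$ $z_i = x_i / M = 10^{-8}$
$z_i$ not eligible for branching unless $M \le 10^{7}$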
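The short sketch below reproduces the arithmetic above. The helper names are hypothetical. The integrality test (distance to the nearer of 0 and 1, compared with the tolerance) is a simplified model of how an integrality tolerance is applied:

```python
INT_TOL = 1e-5  # CPLEX's default integrality tolerance, per the slide

def relaxed_z(x_i, big_m):
    """With f_i >= 0, the LP relaxation drives z_i down to its smallest feasible value x_i / M."""
    return x_i / big_m

def needs_branching(z, tol=INT_TOL):
    """z counts as integral if it lies within tol of 0 or 1."""
    return min(z, 1.0 - z) > tol

x_i = 100.0
for big_m in (1e10, 1e8, 1e6):
    z = relaxed_z(x_i, big_m)
    print(f"M = {big_m:.0e}: z_i = {z:.0e}, branch on z_i? {needs_branching(z)}")
```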
Examples
To get correct answers with big-M formulation
Use the smallest possible value of big-M that doesn't violate the intent of the model
Bound strengthening in CPLEX presolve often does this automatically
Examples
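Indicator constraint formulation for fixed costs on continuous
variables
$$\begin{aligned} \text{Minimize}\;& c^T x + f^T z \quad (c, f \ge 0) \\ \text{subject to}\;& Ax = b \\ & z_i = 0 \;\Rightarrow\; x_i \le 0 \quad \text{(CPLEX branches on these directly)} \\ & x_i \ge 0,\; 0 \le z_i \le 1 \\ & z_i \text{ integer} \end{aligned}$$
LP relaxation solution:
$x_i = 100$, $z_i = 0$ $\Rightarrow$ indicator constraint $i$ is violated and requires branching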
Examples
Which approach to use?
Indicator formulation more precise representation of model
Indicator and big-M formulation equivalent when M=