
Optimization is the activity of selecting, from the set of possible solutions to a problem, the solution that is best with respect to a predefined criterion. This definition implies the existence of the following components:

1. A technical problem consisting of the mathematical computation of a solution;

2. The existence of several solutions to the same problem;

3. A criterion for selecting the optimal solution.

The objective function is the mathematical expression of the optimization criterion. It must reflect the economic efficiency of the process and at the same time satisfy the operating objectives of any chemical process: operational safety and compliance with quality requirements.

The optimization problem is a mathematical procedure for selecting one solution, out of a set of possible ones, based on the evaluation of the objective function. A great many problems in mathematics, statistics, engineering, economics, and the applied sciences can be formulated as optimization problems.
In mathematics, the term optimization refers to the study of problems of the form:

Given: a function f : A → R from some set A to the real numbers.
Sought: an element x0 in A such that f(x0) ≤ f(x) for all x in A ("minimization"), or such that f(x0) ≥ f(x) for all x in A ("maximization").

Such a formulation is sometimes called a mathematical program (a term not directly related to computer programming, but still in use, for example in linear programming). Many real-world as well as theoretical problems can be modeled in this framework.

Typically, A is a subset of the Euclidean space Rn, often specified as a set of constraints, equalities or inequalities, that the members of A must satisfy. The elements of A are called feasible solutions. The function f is called the objective function, or cost function. A feasible solution that minimizes (or maximizes, if that is the goal) the objective function is called an optimal solution.

In general, there may be several local minima and maxima, where a local minimum x* is defined as a point for which there exists some δ > 0 such that for all x with ‖x − x*‖ ≤ δ the inequality

f(x*) ≤ f(x)

holds; that is, on some ball around x* all function values are greater than or equal to the value at that point. Local maxima are defined similarly. In general, local minima are easy to find; additional information about the problem (for example, that the function is convex) is needed to be sure that the solution found is the global minimum.
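As a quick illustration of that last point, a local gradient-based method can converge to different minima depending on where it starts. The following sketch (the quartic and both starting points are chosen purely for illustration, not taken from the text) uses scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

# f(x) = x^4 - 3x^2 + x has two local minima; only one of them is global
f = lambda x: x[0]**4 - 3*x[0]**2 + x[0]

for start in (-2.0, 2.0):
    res = minimize(f, x0=np.array([start]))
    print(f"start {start:+.1f} -> x* = {res.x[0]:+.4f}, f(x*) = {res.fun:+.4f}")

Started from -2 the method finds the global minimum near x ≈ -1.3; started from +2 it stops at the local minimum near x ≈ 1.1, which is exactly why extra structure such as convexity is needed to certify global optimality.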

One example is that of blending problems. Frequently the question is to maximize or minimize some attribute of the blend, usually its cost, with respect to the blend's composition. Questions can likewise be posed about the extreme content of a given chemical component in a blend.

Linear programming

Maximize: S1x1 + S2x2 (maximizing revenue—revenue is the "objective function")

Subject to:

0 ≤ x1 + x2 ≤ L (limit on area)
0 ≤ F1x1 + F2x2 ≤ F (limit on fertilizer)
0 ≤ P1x1 + P2x2 ≤ P (limit on insecticide)
x1 ≥ 0, x2 ≥ 0 (a negative area cannot be planted).
Linear programming
Contents

1 History
2 Uses
3 Standard form
   3.1 Example
4 Augmented form (slack form)
   4.1 Example
5 Duality
   5.1 Example
   5.2 Another example
6 Covering/packing dualities
   6.1 Examples
7 Complementary slackness
8 Theory
   8.1 Existence of optimal solutions
   8.2 Optimal vertices (and rays) of polyhedra
9 Algorithms
   9.1 Basis exchange algorithms
       9.1.1 Simplex algorithm of Dantzig
       9.1.2 Criss-cross algorithm
   9.2 Interior point
       9.2.1 Ellipsoid algorithm, following Khachiyan
       9.2.2 Projective algorithm of Karmarkar
       9.2.3 Path-following algorithms
   9.3 Comparison of interior-point methods versus simplex algorithms
   9.4 Approximate Algorithms for Covering/Packing LPs
10 Open problems and recent work
11 Integer unknowns
12 Integral linear programs
13 Solvers and scripting (programming) languages
14 See also
15 Notes
16 References

Linear programming (LP; also called linear optimization) is a method to achieve the best


outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements are
represented by linear relationships. Linear programming is a special case of mathematical
programming (mathematical optimization).

More formally, linear programming is a technique for the optimization of a linear objective function,


subject to linear equality and linear inequality constraints. Its feasible region is a convex polytope,
which is a set defined as the intersection of finitely many half spaces, each of which is defined by a
linear inequality. Its objective function is a real-valued affine function defined on this polyhedron. A
linear programming algorithm finds a point in the polyhedron where this function has the smallest (or
largest) value if such a point exists.

Linear programs are problems that can be expressed in canonical form:

maximize cTx
subject to Ax ≤ b
and x ≥ 0,
where x represents the vector of variables (to be determined), c and b are vectors of (known)
coefficients, A is a (known) matrix of coefficients, and (·)T is the matrix transpose. The
expression to be maximized or minimized is called the objective function (cTx in this case). The
inequalities Ax ≤ b and x ≥ 0 are the constraints which specify a convex polytope over which the
objective function is to be optimized. In this context, two vectors are comparable when they have
the same dimensions. If every entry in the first is less-than or equal-to the corresponding entry in
the second then we can say the first vector is less-than or equal-to the second vector.

Linear programming can be applied to various fields of study. It is used in business


and economics, but can also be utilized for some engineering problems. Industries that use
linear programming models include transportation, energy, telecommunications, and
manufacturing. It has proved useful in modeling diverse types of problems in
planning, routing, scheduling, assignment, and design.

History

Leonid Kantorovich

The problem of solving a system of linear inequalities dates back at least as far as Fourier, who in
1827 published a method for solving them,[1] and after whom the method of Fourier–Motzkin
elimination is named.

The first linear programming formulation of a problem that is equivalent to the general linear
programming problem was given by Leonid Kantorovich in 1939, who also proposed a method for
solving it.[2] He developed it during World War II as a way to plan expenditures and returns so as to
reduce costs to the army and increase losses incurred by the enemy. About the same time as
Kantorovich, the Dutch-American economist T. C. Koopmans formulated classical economic
problems as linear programs. Kantorovich and Koopmans later shared the 1975 Nobel prize in
economics.[1] In 1941, Frank Lauren Hitchcock also formulated transportation problems as linear
programs and gave a solution very similar to the later Simplex method;[2] Hitchcock had died in 1957
and the Nobel prize is not awarded posthumously.

During 1946-1947, George B. Dantzig independently developed a general linear programming formulation to use for planning problems in the US Air Force. In 1947, Dantzig also invented the simplex method, which for the first time efficiently tackled the linear programming problem in most cases. When Dantzig arranged a meeting with John von Neumann to discuss his simplex method, von Neumann immediately conjectured the theory of duality by realizing that the problem he had been working on in game theory was equivalent. Dantzig provided formal proof in an unpublished report "A Theorem on Linear Inequalities" on January 5, 1948.[3] Postwar, many industries found its use in their daily planning.
Dantzig's original example was to find the best assignment of 70 people to 70 jobs. The computing
power required to test all the permutations to select the best assignment is vast; the number of
possible configurations exceeds the number of particles in the observable universe. However, it
takes only a moment to find the optimum solution by posing the problem as a linear program and
applying the simplex algorithm. The theory behind linear programming drastically reduces the
number of possible solutions that must be checked.

The linear-programming problem was first shown to be solvable in polynomial time by Leonid
Khachiyan in 1979, but a larger theoretical and practical breakthrough in the field came in 1984
when Narendra Karmarkar introduced a new interior-point method for solving linear-programming
problems.

Uses
Linear programming is a considerable field of optimization for several reasons. Many practical
problems in operations research can be expressed as linear programming problems. Certain special
cases of linear programming, such as network flow problems and multicommodity flow problems are
considered important enough to have generated much research on specialized algorithms for their
solution. A number of algorithms for other types of optimization problems work by solving LP
problems as sub-problems. Historically, ideas from linear programming have inspired many of the
central concepts of optimization theory, such as duality, decomposition, and the importance
of convexity and its generalizations. Likewise, linear programming is heavily used
in microeconomics and company management, such as planning, production, transportation,
technology and other issues. Although the modern management issues are ever-changing, most
companies would like to maximize profits or minimize costs with limited resources. Therefore, many
issues can be characterized as linear programming problems.

Standard form
Standard form is the usual and most intuitive form of describing a linear programming problem. It
consists of the following three parts:

- A linear function to be maximized, e.g.

f(x1, x2) = c1x1 + c2x2

- Problem constraints of the following form, e.g.

a11x1 + a12x2 ≤ b1
a21x1 + a22x2 ≤ b2
a31x1 + a32x2 ≤ b3

- Non-negative variables, e.g.

x1 ≥ 0
x2 ≥ 0

The problem is usually expressed in matrix form, and then becomes:

maximize {cTx | Ax ≤ b, x ≥ 0}
Other forms, such as minimization problems, problems with constraints on alternative forms, as well
as problems involving negative variables can always be rewritten into an equivalent problem in
standard form.

Example
Suppose that a farmer has a piece of farm land, say L km2, to be planted with either wheat or barley
or some combination of the two. The farmer has a limited amount of fertilizer, F kilograms, and
insecticide, P kilograms. Every square kilometer of wheat requires F1 kilograms of fertilizer
and P1 kilograms of insecticide, while every square kilometer of barley requires F2 kilograms of
fertilizer and P2 kilograms of insecticide. Let S1 be the selling price of wheat per square kilometer,
and S2 be the selling price of barley. If we denote the area of land planted with wheat and barley
by x1 and x2 respectively, then profit can be maximized by choosing optimal values for x1 and x2. This
problem can be expressed with the following linear programming problem in the standard form:

Maximize: S1x1 + S2x2 (maximize the revenue—revenue is the "objective function")

Subject to:

x1 + x2 ≤ L (limit on total area)
F1x1 + F2x2 ≤ F (limit on fertilizer)
P1x1 + P2x2 ≤ P (limit on insecticide)
x1 ≥ 0, x2 ≥ 0 (cannot plant a negative area).

Which in matrix form becomes:

maximize [S1 S2] [x1 x2]T

subject to

[ 1   1  ]            [ L ]
[ F1  F2 ] [x1 x2]T ≤ [ F ]
[ P1  P2 ]            [ P ]

[x1 x2]T ≥ 0.
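As a hedged sketch, the same problem can be handed to an off-the-shelf LP solver. All numbers below (L, F, P, the prices S1, S2, and the per-km² requirements) are invented for illustration; scipy.optimize.linprog minimizes, so the revenue is negated:

from scipy.optimize import linprog

# illustrative data, not from the text
L, F, P = 10.0, 300.0, 100.0      # available area, fertilizer, insecticide
S1, S2 = 5.0, 4.0                 # selling prices of wheat and barley
F1, F2 = 40.0, 25.0               # fertilizer needed per km^2
P1, P2 = 10.0, 5.0                # insecticide needed per km^2

c = [-S1, -S2]                    # negate: linprog minimizes by default
A_ub = [[1.0, 1.0], [F1, F2], [P1, P2]]
b_ub = [L, F, P]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("areas (x1, x2):", res.x, " maximum revenue:", -res.fun)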

Augmented form (slack form)


Linear programming problems can be converted into an augmented form in order to apply the
common form of the simplex algorithm. This form introduces non-negative slack variables to replace
inequalities with equalities in the constraints. The problems can then be written in the following block
matrix form:

Maximize Z in:

[ 1  −cT  0 ] [ Z  ]   [ 0 ]
[ 0   A   I ] [ x  ] = [ b ]
              [ xs ]

x, xs ≥ 0

where xs are the newly introduced slack variables, and Z is the variable to be maximized.

Example

The example above is converted into the following augmented form:

Maximize: S1x1 + S2x2 (objective function)

Subject to:

x1 + x2 + x3 = L (augmented constraint)
F1x1 + F2x2 + x4 = F (augmented constraint)
P1x1 + P2x2 + x5 = P (augmented constraint)
x1, x2, x3, x4, x5 ≥ 0

where x3, x4, x5 are (non-negative) slack variables, representing in this example the unused area, the amount of unused fertilizer, and the amount of unused insecticide.

In matrix form this becomes:

Maximize Z in:

[ 1  −S1  −S2  0  0  0 ] [ Z  ]   [ 0 ]
[ 0   1    1   1  0  0 ] [ x1 ]   [ L ]
[ 0   F1   F2  0  1  0 ] [ x2 ] = [ F ]
[ 0   P1   P2  0  0  1 ] [ x3 ]   [ P ]
                         [ x4 ]
                         [ x5 ]

x1, ..., x5 ≥ 0.
Duality
Every linear programming problem, referred to as a primal problem, can be converted into a dual
problem, which provides an upper bound to the optimal value of the primal problem. In matrix form,
we can express the primal problem as:

Maximize cTx subject to Ax ≤ b, x ≥ 0;

with the corresponding symmetric dual problem,


Minimize bTy subject to ATy ≥ c, y ≥ 0.

An alternative primal formulation is:

Maximize cTx subject to Ax ≤ b;

with the corresponding asymmetric dual problem,


Minimize bTy subject to ATy = c, y ≥ 0.

There are two ideas fundamental to duality theory. One is the fact that (for the symmetric dual) the
dual of a dual linear program is the original primal linear program. Additionally, every feasible
solution for a linear program gives a bound on the optimal value of the objective function of its dual.
The weak duality theorem states that the objective function value of the dual at any feasible solution
is always greater than or equal to the objective function value of the primal at any feasible solution.
The strong duality theorem states that if the primal has an optimal solution, x*, then the dual also has
an optimal solution, y*, and cTx*=bTy*.
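A minimal numeric sketch of strong duality: solve a small primal max{cTx : Ax ≤ b, x ≥ 0} and its symmetric dual min{bTy : ATy ≥ c, y ≥ 0} and compare the two optimal values (the data A, b, c are made up for illustration):

import numpy as np
from scipy.optimize import linprog

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([10.0, 15.0])
c = np.array([3.0, 2.0])

# primal: max cTx s.t. Ax <= b, x >= 0 (negate c, since linprog minimizes)
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

# dual: min bTy s.t. ATy >= c, y >= 0 (multiply by -1 to get <= constraints)
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

print("primal optimum cTx* =", -primal.fun)
print("dual optimum   bTy* =", dual.fun)   # equal, by strong duality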

A linear program can also be unbounded or infeasible. Duality theory tells us that if the primal is
unbounded then the dual is infeasible by the weak duality theorem. Likewise, if the dual is
unbounded, then the primal must be infeasible. However, it is possible for both the dual and the
primal to be infeasible. As an example, consider the linear program:

Maximize: 2x1 − x2

Subject to:

x1 − x2 ≤ 1
−x1 + x2 ≤ −2
x1, x2 ≥ 0.

Here both problems are infeasible: the primal constraints require x1 − x2 ≤ 1 and x1 − x2 ≥ 2 simultaneously, and the dual constraints are contradictory in the same way.

Example
Revisit the above example of the farmer who may grow wheat and barley with the set provision of some L land, F fertilizer and P pesticide. Assume now that unit prices y1, y2, y3 for each of these means of production (inputs) are set by a planning board. The planning board's job is to minimize the total cost of procuring the set amounts of inputs while providing the farmer with a floor on the unit price of each of his crops (outputs), S1 for wheat and S2 for barley. This corresponds to the following linear programming problem:

Minimize: L·y1 + F·y2 + P·y3 (minimize the total cost of the means of production as the "objective function")

Subject to:

y1 + F1·y2 + P1·y3 ≥ S1 (the farmer must receive no less than S1 for his wheat)
y1 + F2·y2 + P2·y3 ≥ S2 (the farmer must receive no less than S2 for his barley)
y1, y2, y3 ≥ 0 (prices cannot be negative).

Which in matrix form becomes:

Minimize: [L F P] [y1 y2 y3]T

Subject to:

[ 1  F1  P1 ] [y1 y2 y3]T ≥ [ S1 ]
[ 1  F2  P2 ]               [ S2 ]

[y1 y2 y3]T ≥ 0.

The primal problem deals with physical quantities. With all inputs available in limited quantities, and
assuming the unit prices of all outputs are known, what quantities of outputs to produce so as to
maximize total revenue? The dual problem deals with economic values. With floor guarantees on all
output unit prices, and assuming the available quantity of all inputs is known, what input unit pricing
scheme to set so as to minimize total expenditure?

To each variable in the primal space corresponds an inequality to satisfy in the dual space, both
indexed by output type. To each inequality to satisfy in the primal space corresponds a variable in
the dual space, both indexed by input type.

The coefficients that bound the inequalities in the primal space are used to compute the objective in
the dual space, input quantities in this example. The coefficients used to compute the objective in
the primal space bound the inequalities in the dual space, output unit prices in this example.

Both the primal and the dual problems make use of the same matrix. In the primal space, this matrix
expresses the consumption of physical quantities of inputs necessary to produce set quantities of
outputs. In the dual space, it expresses the creation of the economic values associated with the
outputs from set input unit prices.
Since each inequality can be replaced by an equality and a slack variable, this means each primal variable corresponds to a dual slack variable, and each dual variable corresponds to a primal slack variable. This relation allows us to speak about complementary slackness.

Another example
Sometimes, one may find it more intuitive to obtain the dual program without looking at the program
matrix. Consider the following linear program:

minimize c1x1 + c2x2 + ... + cmxm + d1t1 + d2t2 + ... + dntn

subject to

a1,jx1 + a2,jx2 + ... + am,jxm + ejtj ≥ gj, for j = 1, ..., n,
fixi + bi,1t1 + bi,2t2 + ... + bi,ntn ≥ hi, for i = 1, ..., m,
xi ≥ 0, tj ≥ 0.

We have m + n conditions and all variables are non-negative. We shall define m + n dual variables: yj and si.

Since this is a minimization problem, we would like to obtain a dual program that is a lower bound of the primal. In other words, we would like the sum of all the right-hand sides of the constraints to be maximal under the condition that for each primal variable the sum of its coefficients does not exceed its coefficient in the linear function. For example, x1 appears in n + 1 constraints. If we sum its constraints' coefficients we get a1,1y1 + a1,2y2 + ... + a1,nyn + f1s1. This sum must be at most c1. As a result we get:

maximize g1y1 + g2y2 + ... + gnyn + h1s1 + h2s2 + ... + hmsm

subject to

ai,1y1 + ai,2y2 + ... + ai,nyn + fisi ≤ ci, for i = 1, ..., m,
ejyj + b1,js1 + b2,js2 + ... + bm,jsm ≤ dj, for j = 1, ..., n,
yj ≥ 0, si ≥ 0.

Note that we assume in our calculation steps that the program is in standard form. However, any linear program may be transformed to standard form and it is therefore not a limiting factor.

Covering/packing dualities

A covering LP is a linear program of the form:

Minimize: bTy,
Subject to: ATy ≥ c, y ≥ 0,

such that the matrix A and the vectors b and c are non-negative.

The dual of a covering LP is a packing LP, a linear program of the form:

Maximize: cTx,
Subject to: Ax ≤ b, x ≥ 0,

such that the matrix A and the vectors b and c are non-negative.
Examples
Covering and packing LPs commonly arise as a linear programming relaxation of a combinatorial
problem and are important in the study of approximation algorithms.[4] For example, the LP
relaxations of the set packing problem, the independent set problem, and the matching problem are
packing LPs. The LP relaxations of the set cover problem, the vertex cover problem, and
the dominating set problem are also covering LPs.

Finding a fractional coloring of a graph is another example of a covering LP. In this case, there is
one constraint for each vertex of the graph and one variable for each independent set of the graph.

Complementary slackness
It is possible to obtain an optimal solution to the dual when only an optimal solution to the primal is
known using the complementary slackness theorem. The theorem states:

Suppose that x = (x1, x2, ... , xn) is primal feasible and that y = (y1, y2, ... , ym) is dual feasible. Let


(w1, w2, ..., wm) denote the corresponding primal slack variables, and let (z1, z2, ... , zn) denote the
corresponding dual slack variables. Then x and y are optimal for their respective problems if and
only if

xj zj = 0, for j = 1, 2, ... , n, and

wi yi = 0, for i = 1, 2, ... , m.
So if the i-th slack variable of the primal is not zero, then the i-th variable of the dual is equal to zero.
Likewise, if the j-th slack variable of the dual is not zero, then the j-th variable of the primal is equal
to zero.

This necessary condition for optimality conveys a fairly simple economic principle. In standard form
(when maximizing), if there is slack in a constrained primal resource (i.e., there are "leftovers"), then
additional quantities of that resource must have no value. Likewise, if there is slack in the dual
(shadow) price non-negativity constraint requirement, i.e., the price is not zero, then there must be
scarce supplies (no "leftovers").
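This condition is easy to verify numerically. The sketch below re-solves the small made-up primal/dual pair from the duality snippet and checks that every product of a slack with its matching variable vanishes, up to solver tolerance:

import numpy as np
from scipy.optimize import linprog

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([10.0, 15.0])
c = np.array([3.0, 2.0])

x = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2).x     # primal optimum
y = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2).x  # dual optimum

w = b - A @ x                # primal slack variables w_i
z = A.T @ y - c              # dual slack variables z_j
print("w_i * y_i:", w * y)   # all ~0: slack resource => zero shadow price
print("x_j * z_j:", x * z)   # all ~0: active variable => tight dual constraint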

Theory
Existence of optimal solutions
Geometrically, the linear constraints define the feasible region, which is a convex polyhedron.
A linear function is a convex function, which implies that every local minimum is a global minimum; similarly, a linear function is a concave function, which implies that every local maximum is a global maximum. An optimal solution need not exist, for two reasons. First, if two constraints are
inconsistent, then no feasible solution exists: For instance, the constraints x ≥ 2 and x ≤ 1 cannot be
satisfied jointly; in this case, we say that the LP is infeasible. Second, when the polytope is
unbounded in the direction of the gradient of the objective function (where the gradient of the
objective function is the vector of the coefficients of the objective function), then no optimal value is
attained.
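Both failure modes are easy to observe with a solver. In this sketch, scipy's linprog reports status code 2 for an infeasible model and 3 for an unbounded one (variables are non-negative by default):

from scipy.optimize import linprog

# infeasible: x >= 2 (written as -x <= -2) and x <= 1 cannot both hold
inf = linprog(c=[1.0], A_ub=[[-1.0], [1.0]], b_ub=[-2.0, 1.0])
print(inf.status, inf.message)   # status 2: infeasible

# unbounded: maximize x (minimize -x) with no upper bound on x
unb = linprog(c=[-1.0], A_ub=[[-1.0]], b_ub=[0.0])
print(unb.status, unb.message)   # status 3: unbounded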

Optimal vertices (and rays) of polyhedra


Otherwise, if a feasible solution exists and if the (linear) objective function is bounded, then the
optimum value is always attained on the boundary of the optimal level set, by the maximum
principle for convex functions (alternatively, by the minimum principle for concave functions): Recall
that linear functions are both convex and concave. However, some problems have distinct optimal
solutions: For example, the problem of finding a feasible solution to a system of linear inequalities is
a linear programming problem in which the objective function is the zero function (that is, the
constant function taking the value zero everywhere): For this feasibility problem with the zero-
function for its objective-function, if there are two distinct solutions, then every convex combination of
the solutions is a solution.

The vertices of the polytope are also called basic feasible solutions. The reason for this choice of
name is as follows. Let d denote the number of variables. Then the fundamental theorem of linear
inequalities implies (for feasible problems) that for every vertex x* of the LP feasible region, there
exists a set of d (or fewer) inequality constraints from the LP such that, when we treat
those d constraints as equalities, the unique solution is x*. Thereby we can study these vertices by
means of looking at certain subsets of the set of all constraints (a discrete set), rather than the
continuum of LP solutions. This principle underlies the simplex algorithm for solving linear programs.
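The following brute-force sketch makes that concrete in two variables: every pair of constraints is treated as a pair of equalities, the intersection point is kept if it satisfies all constraints, and the objective is evaluated at the surviving vertices (the data are the same made-up numbers used in the duality sketch):

import itertools
import numpy as np

# feasible region written as G x <= h: two resource constraints plus x >= 0
G = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, 0.0], [0.0, -1.0]])
h = np.array([10.0, 15.0, 0.0, 0.0])
c = np.array([3.0, 2.0])

best_val, best_x = -np.inf, None
for i, j in itertools.combinations(range(len(G)), 2):
    M = G[[i, j]]
    if abs(np.linalg.det(M)) < 1e-12:
        continue                          # parallel constraints: no vertex
    x = np.linalg.solve(M, h[[i, j]])
    if np.all(G @ x <= h + 1e-9):         # keep only feasible intersections
        if c @ x > best_val:
            best_val, best_x = c @ x, x

print("optimal vertex:", best_x, " value:", best_val)

With d = 2 variables, every vertex is the unique solution of 2 constraints treated as equalities, which is exactly the "basic feasible solution" picture described above; the simplex algorithm searches these vertices far more cleverly than this exhaustive loop.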

Algorithms

In a linear programming problem, a series of linear constraints on the variables produces a convex feasible region of possible values for those variables. In the two-variable case this region has the shape of a convex simple polygon.

Basis exchange algorithms


Simplex algorithm of Dantzig

The simplex algorithm, developed by George Dantzig in 1947, solves LP problems by constructing a


feasible solution at a vertex of the polytope and then walking along a path on the edges of the polytope to vertices with non-decreasing values of the objective function until an optimum is reached. In many practical problems, "stalling" occurs: many pivots are made with no increase in the
objective function.[5][6] In rare practical problems, the usual versions of the simplex algorithm may
actually "cycle".[6] To avoid cycles, researchers developed new pivoting rules.[7][8][5][6][9][10]

In practice, the simplex algorithm is quite efficient and can be guaranteed to find the global optimum
if certain precautions against cycling are taken. The simplex algorithm has been proved to solve "random" problems efficiently, i.e. in a cubic number of steps,[11] which is similar to its behavior on practical problems.[5][12] However, the simplex algorithm has poor worst-case behavior: Klee and Minty
constructed a family of linear programming problems for which the simplex method takes a number
of steps exponential in the problem size.[5][8][9] In fact, for some time it was not known whether the
linear programming problem was solvable in polynomial time, i.e. of complexity class P.

Criss-cross algorithm

Like the simplex algorithm of Dantzig, the criss-cross algorithm is a basis-exchange algorithm that
pivots between bases. However, the criss-cross algorithm need not maintain feasibility, but can pivot
rather from a feasible basis to an infeasible basis. The criss-cross algorithm does not
have polynomial time-complexity for linear programming. Both algorithms visit all 2^D corners of a (perturbed) cube in dimension D, the Klee–Minty cube, in the worst case.[10][13]

Interior point
Ellipsoid algorithm, following Khachiyan

This is the first worst-case polynomial-time algorithm for linear programming. To solve a problem which has n variables and can be encoded in L input bits, this algorithm uses O(n^4 L) pseudo-arithmetic operations on numbers with O(L) digits. This long-standing complexity issue was resolved by Leonid Khachiyan in 1979 with the introduction of the ellipsoid method. The convergence analysis has (real-number) predecessors, notably the iterative methods developed by Naum Z. Shor and the approximation algorithms by Arkadi Nemirovski and D. Yudin.

Projective algorithm of Karmarkar

Khachiyan's algorithm was of landmark importance for establishing the polynomial-time solvability of linear programs. The algorithm was not a computational break-through, as the simplex method is more efficient for all but specially constructed families of linear programs. However, Khachiyan's algorithm inspired new lines of research in linear programming. In 1984, N. Karmarkar proposed a projective method for linear programming. Karmarkar's algorithm improved on Khachiyan's worst-case polynomial bound (giving O(n^3.5 L)). Karmarkar claimed that his algorithm was much faster in practical LP than the simplex method, a claim that created great interest in interior-point methods.[14]

Path-following algorithms

In contrast to the simplex algorithm, which finds an optimal solution by traversing the edges between vertices on a polyhedral set, interior-point methods move through the interior of the feasible region. Since then, many interior-point methods have been proposed and analyzed. Early successful implementations were based on affine scaling variants of the method. For both theoretical and practical purposes, barrier function or path-following methods have been the most popular since the 1990s.[15]
Comparison of interior-point methods versus simplex algorithms
The current opinion is that the efficiency of good implementations of simplex-based methods and
interior point methods are similar for routine applications of linear programming.[15] However, for
specific types of LP problems, it may be that one type of solver is better than another (sometimes
much better), and that the structure of the solutions generated by interior point methods versus
simplex-based methods are significantly different, with the support set of active variables being typically smaller for the latter.[16]

LP solvers are in widespread use for optimization of various problems in industry, such as
optimization of flow in transportation networks.[17]
Approximate Algorithms for Covering/Packing LPs
List of unsolved problems in computer science Covering and packing LPs can be solved
approximately in nearly-linear time. That is,
Does linear programming admit a strongly if matrix A is of dimension n×m and
polynomial-time algorithm? has N non-zero entries, then there exist
algorithms that run in time O(N·(log
N) /ε ) and produce O(1±ε) approximate solutions to given covering and packing LPs. The best
O(1) O(1)

known sequential algorithm of this kind runs in time O(N + (log N)·(n+m)/ε2),[18] and the best known
parallel algorithm of this kind runs in O((log N)2/ε3) iterations, each requiring only a matrix-vector
multiplication which is highly parallelizable.[19]

Open problems and recent work


There are several open problems in the theory of linear programming, the solution of which would
represent fundamental breakthroughs in mathematics and potentially major advances in our ability to
solve large-scale linear programs.

Does LP admit a strongly polynomial-time algorithm?

Does LP admit a strongly polynomial algorithm to find a strictly complementary solution?

Does LP admit a polynomial algorithm in the real number (unit cost) model of computation?

This closely related set of problems has been cited by Stephen Smale as among the 18 greatest unsolved problems of the 21st century. In Smale's words, the third version of the problem "is the
main unsolved problem of linear programming theory." While algorithms exist to solve linear
programming in weakly polynomial time, such as the ellipsoid methods and interior-point techniques,
no algorithms have yet been found that allow strongly polynomial-time performance in the number of
constraints and the number of variables. The development of such algorithms would be of great
theoretical interest, and perhaps allow practical gains in solving large LPs as well.

Although the Hirsch conjecture was recently disproved for higher dimensions, it still leaves the
following questions open.

Are there pivot rules which lead to polynomial-time Simplex variants?

Do all polytopal graphs have polynomially bounded diameter?

These questions relate to the performance analysis and development of simplex-like methods. The immense efficiency of the simplex algorithm in practice despite its exponential-time theoretical performance hints that there may be variations of simplex that run in polynomial or even strongly polynomial time. The Simplex
algorithm and its variants fall in the family of edge-following algorithms, so named because they
solve linear programming problems by moving from vertex to vertex along edges of a polytope. This
means that their theoretical performance is limited by the maximum number of edges between any
two vertices on the LP polytope. As a result, we are interested in knowing the maximum graph-
theoretical diameter of polytopal graphs. It has been proved that all polytopes have subexponential
diameter. The recent disproof of the Hirsch conjecture is the first step to prove whether any polytope
has superpolynomial diameter. If any such polytopes exist, then no edge-following variant can run in
polynomial time. Questions about polytope diameter are of independent mathematical interest.

Simplex pivot methods preserve primal (or dual) feasibility. On the other hand, criss-cross pivot
methods do not preserve (primal or dual) feasibility—they may visit primal feasible, dual feasible or
primal-and-dual infeasible bases in any order. Pivot methods of this type have been studied since
the 1970s. Essentially, these methods attempt to find the shortest pivot path on the arrangement
polytope under the linear programming problem. In contrast to polytopal graphs, graphs of
arrangement polytopes are known to have small diameter, allowing the possibility of a strongly polynomial-time criss-cross pivot algorithm without resolving questions about the diameter of general
polytopes.[10]

Integer unknowns
If all of the unknown variables are required to be integers, then the problem is called an integer
programming (IP) or integer linear programming (ILP) problem. In contrast to linear programming,
which can be solved efficiently in the worst case, integer programming problems are in many
practical situations (those with bounded variables) NP-hard. 0-1 integer programming or binary
integer programming (BIP) is the special case of integer programming where variables are
required to be 0 or 1 (rather than arbitrary integers). This problem is also classified as NP-hard, and
in fact the decision version was one of Karp's 21 NP-complete problems.

If only some of the unknown variables are required to be integers, then the problem is called
a mixed integer programming (MIP) problem. These are generally also NP-hard because they are
even more general than ILP programs.

There are however some important subclasses of IP and MIP problems that are efficiently solvable,
most notably problems where the constraint matrix is totally unimodular and the right-hand sides of
the constraints are integers or, more generally, where the system has the total dual integrality (TDI)
property.
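As a small hedged sketch of an ILP, recent SciPy versions (1.9 and later) expose a mixed-integer solver, scipy.optimize.milp; here it solves a tiny 0-1 knapsack, with all data invented for illustration:

import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

values = np.array([6.0, 5.0, 4.0])    # value of each item (made up)
weights = np.array([5.0, 4.0, 3.0])   # weight of each item (made up)

res = milp(
    c=-values,                                          # milp minimizes
    constraints=LinearConstraint(weights.reshape(1, -1), ub=10.0),
    integrality=np.ones(3),                             # all variables integer
    bounds=Bounds(0, 1),                                # 0-1 variables
)
print("chosen items:", res.x, " total value:", -res.fun)  # expect value 11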

Advanced algorithms for solving integer linear programs include:

- cutting-plane method
- branch and bound
- branch and cut
- branch and price
- if the problem has some extra structure, it may be possible to apply delayed column generation.

Such integer-programming algorithms are discussed by Padberg and in Beasley.

Integral linear programs

A linear program in real variables is said to be integral if it has at least one optimal solution which is integral. Likewise, a polyhedron is said to be integral if every one of its nonempty faces contains an integral point.

Integral linear programs are of central importance in the polyhedral aspect of combinatorial
optimization since they provide an alternate characterization of a problem. Specifically, for any
problem, the convex hull of the solutions is an integral polyhedron; if this polyhedron has a
nice/compact description, then we can efficiently find the optimal feasible solution under any linear
objective. Conversely, if we can prove that a linear programming relaxation is integral, then it is the
desired description of the convex hull of feasible (integral) solutions.

Note that terminology is not consistent throughout the literature, so one should be careful to distinguish the following two concepts:

- in an integer linear program, described in the previous section, variables are forcibly constrained to be integers, and this problem is NP-hard in general;
- in an integral linear program, described in this section, variables are not constrained to be integers; rather, one has proven somehow that the continuous problem always has an integral optimal value (assuming c is integral), and this optimal value may be found efficiently since all polynomial-size linear programs can be solved in polynomial time.

Solvers and scripting (programming) languages

MINTO (Mixed Integer Optimizer, an integer programming solver which uses branch and bound algorithm) has publicly available source code[20] but is not open source.
Nonlinear programming

In mathematics, nonlinear programming (NLP) is the process of solving an optimization


problem defined by a system of equalities and inequalities, collectively termed constraints, over a set of unknown real variables, along with an objective function to be maximized or minimized, where some of the constraints or the objective function are nonlinear.[1] It is the sub-field of mathematical
optimization that deals with problems that are not linear.

Contents

1 Applicability
2 The general non-linear optimization problem (NLP)
3 Possible types of constraint set
4 Methods for solving the problem
5 Examples
   5.1 2-dimensional example
   5.2 3-dimensional example
6 Applications
7 See also
8 References
9 Further reading
10 External links

Applicability
A typical nonconvex problem is that of optimising transportation costs by selection from a set of
transportation methods, one or more of which exhibit economies of scale, with various connectivities
and capacity constraints. An example would be petroleum product transport given a selection or
combination of pipeline, rail tanker, road tanker, river barge, or coastal tankship. Owing to economic
batch size the cost functions may have discontinuities in addition to smooth changes.

Modern engineering practice involves much numerical optimization. Except in certain narrow but
important cases such as passive electronic circuits, engineering problems are non-linear, and they
are usually very complicated.

In experimental science, some simple data analysis (such as fitting a spectrum with a sum of peaks
of known location and shape but unknown magnitude) can be done with linear methods, but in
general these problems, also, are non-linear. Typically, one has a theoretical model of the system
under study with variable parameters in it and a model of the experiment or experiments, which may
also have unknown parameters. One tries to find a best fit numerically. In this case one often wants
a measure of the precision of the result, as well as the best fit itself.

The general non-linear optimization problem (NLP)

The problem can be stated simply as:

- to maximize some variable such as product throughput

or

- to minimize a cost function f(x),

where

f : Rn → R, x ∈ Rn,

s.t. (subject to)

gi(x) ≤ 0 for each i ∈ {1, ..., m}
hj(x) = 0 for each j ∈ {1, ..., p}.

Possible types of constraint set


There are several possibilities for the nature of the constraint set, also known as the feasible set
or feasible region.

An infeasible problem is one for which no set of values for the choice variables satisfies all the
constraints. That is, the constraints are mutually contradictory, and no solution exists.

A feasible problem is one for which there exists at least one set of values for the choice variables
satisfying all the constraints.

An unbounded problem is a feasible problem for which the objective function can be made to exceed
any given finite value. Thus there is no optimal solution, because there is always a feasible solution
that gives a better objective function value than does any given proposed solution.

Methods for solving the problem


If the objective function f is linear and the constrained space is a polytope, the problem is a linear
programming problem, which may be solved using well known linear programming solutions.

If the objective function is concave (maximization problem), or convex (minimization problem) and


the constraint set is convex, then the program is called convex and general methods from convex
optimization can be used in most cases.
If the objective function is a ratio of a concave and a convex function (in the maximization case) and
the constraints are convex, then the problem can be transformed to a convex optimization problem
using fractional programming techniques.

Several methods are available for solving nonconvex problems. One approach is to use special
formulations of linear programming problems. Another method involves the use of branch and bound techniques, where the program is divided into subclasses to be solved with convex
(minimization problem) or linear approximations that form a lower bound on the overall cost within
the subdivision. With subsequent divisions, at some point an actual solution will be obtained whose
cost is equal to the best lower bound obtained for any of the approximate solutions. This solution is
optimal, although possibly not unique. The algorithm may also be stopped early, with the assurance
that the best possible solution is within a tolerance from the best point found; such points are called
ε-optimal. Terminating to ε-optimal points is typically necessary to ensure finite termination. This is
especially useful for large, difficult problems and problems with uncertain costs or values where the
uncertainty can be estimated with an appropriate reliability estimation.

Under differentiability and constraint qualifications, the Karush–Kuhn–Tucker (KKT)


conditions provide necessary conditions for a solution to be optimal. Under convexity, these
conditions are also sufficient. If some of the functions are non-differentiable, subdifferential versions
of Karush–Kuhn–Tucker (KKT) conditions are available.[2]

Examples
2-dimensional example

The intersection of the line with the constrained space represents the solution. The line is the best
achievable contour line (locus with a given value of the objective function).

A simple problem can be defined by the constraints

x1 ≥ 0
x2 ≥ 0
x1² + x2² ≥ 1
x1² + x2² ≤ 2

with an objective function to be maximized

f(x) = x1 + x2

where x = (x1, x2).
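A hedged sketch of solving this with a general-purpose NLP routine (scipy's SLSQP method; the starting point is arbitrary, and inequality constraints are supplied in the g(x) ≥ 0 convention that scipy expects):

import numpy as np
from scipy.optimize import minimize

objective = lambda x: -(x[0] + x[1])   # maximize x1 + x2 via minimization

constraints = [
    {"type": "ineq", "fun": lambda x: x[0]**2 + x[1]**2 - 1.0},  # >= 1
    {"type": "ineq", "fun": lambda x: 2.0 - x[0]**2 - x[1]**2},  # <= 2
]

res = minimize(objective, x0=[0.5, 1.0], method="SLSQP",
               bounds=[(0, None), (0, None)], constraints=constraints)
print("x* =", res.x, " f(x*) =", -res.fun)   # expect x* ≈ (1, 1), f(x*) ≈ 2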

3-dimensional example

The intersection of the top surface with the constrained space in the center represents
the solution

Another simple problem can be defined by the constraints

x1² − x2² + x3² ≤ 2
x1² + x2² + x3² ≤ 10

with an objective function to be maximized

f(x) = x1x2 + x2x3

where x = (x1, x2, x3).

Applications
Nonlinear optimization methods are used in engineering, for example to construct computational
models of oil reservoirs.[3]
Dynamic programming

In mathematics, computer science, economics, and bioinformatics, dynamic programming is a


method for solving a complex problem by breaking it down into a collection of simpler subproblems.
It is applicable to problems exhibiting the properties of overlapping subproblems[1] and optimal
substructure (described below). When applicable, the method takes far less time than other methods
that don't take advantage of the subproblem overlap (like depth-first search).
In order to solve a given problem, using a dynamic programming approach, we need to solve
different parts of the problem (subproblems), then combine the solutions of the subproblems to
reach an overall solution. Often when using a more naive method, many of the subproblems are
generated and solved many times. The dynamic programming approach seeks to solve each
subproblem only once, thus reducing the number of computations: once the solution to a given
subproblem has been computed, it is stored or "memo-ized": the next time the same solution is
needed, it is simply looked up. This approach is especially useful when the number of repeating
subproblems grows exponentially as a function of the size of the input.
Dynamic programming algorithms are used for optimization (for example, finding the shortest path
between two points, or the fastest way to multiply many matrices). A dynamic programming
algorithm will examine the previously solved subproblems and will combine their solutions to give the
best solution for the given problem. The alternatives are many, such as using a greedy algorithm,
which picks the locally optimal choice at each branch in the road. The locally optimal choice may be
a poor choice for the overall solution. While a greedy algorithm does not guarantee an optimal
solution, it is often faster to calculate. Fortunately, some greedy algorithms (such as minimum
spanning trees) are proven to lead to the optimal solution.
For example, let's say that you have to get from point A to point B as fast as possible, in a given city,
during rush hour. A dynamic programming algorithm will look at finding the shortest paths to points
close to A, and use those solutions to eventually find the shortest path to B. On the other hand, a
greedy algorithm will start you driving immediately and will pick the road that looks the fastest at
every intersection. As you can imagine, this strategy might not lead to the fastest arrival time, since
you might take some "easy" streets and then find yourself hopelessly stuck in a traffic jam.

Contents

1 Overview
   1.1 Dynamic programming in mathematical optimization
   1.2 Dynamic programming in bioinformatics
   1.3 Dynamic programming in computer programming
2 Example: Economic optimization
   2.1 Optimal consumption and saving
3 Examples: Computer algorithms
   3.1 Dijkstra's algorithm for the shortest path problem
   3.2 Fibonacci sequence
   3.3 A type of balanced 0–1 matrix
   3.4 Checkerboard
   3.5 Sequence alignment
   3.6 Tower of Hanoi puzzle
   3.7 Egg dropping puzzle
       3.7.1 Faster DP solution using a different parametrization
   3.8 Matrix chain multiplication
4 History
5 Algorithms that use dynamic programming
6 See also
7 References

Overview
Figure 1. Finding the shortest path in a graph using optimal substructure; a straight line indicates a single
edge; a wavy line indicates a shortest path between the two vertices it connects (other nodes on these paths
are not shown); the bold line is the overall shortest path from start to goal.

Dynamic programming is both a mathematical optimization method and a computer programming


method. In both contexts it refers to simplifying a complicated problem by breaking it down into
simpler subproblems in a recursive manner. While some decision problems cannot be taken apart
this way, decisions that span several points in time do often break apart recursively; Bellman called
this the "Principle of Optimality". Likewise, in computer science, a problem that can be solved
optimally by breaking it into subproblems and then recursively finding the optimal solutions to the
subproblems is said to have optimal substructure.

If subproblems can be nested recursively inside larger problems, so that dynamic programming
methods are applicable, then there is a relation between the value of the larger problem and the
values of the subproblems.[2] In the optimization literature this relationship is called the Bellman
equation.

Dynamic programming in mathematical optimization


In terms of mathematical optimization, dynamic programming usually refers to simplifying a decision
by breaking it down into a sequence of decision steps over time. This is done by defining a
sequence of value functions V1, V2, ..., Vn, with an argument y representing the state of the system
at times i from 1 to n. The definition of Vn(y) is the value obtained in state y at the last time n. The
values Vi at earlier times i = n − 1, n − 2, ..., 2, 1 can be found by working backwards, using
a recursive relationship called the Bellman equation. For i = 2, ..., n, Vi−1 at any state y is calculated
from Vi by maximizing a simple function (usually the sum) of the gain from a decision at time i − 1
and the function Vi at the new state of the system if this decision is made. Since Vi has already been
calculated for the needed states, the above operation yields Vi−1 for those states. Finally, V1 at the
initial state of the system is the value of the optimal solution. The optimal values of the decision
variables can be recovered, one by one, by tracking back the calculations already performed.

Dynamic programming in bioinformatics


Dynamic programming is widely used in bioinformatics for tasks such as sequence alignment, protein folding, RNA structure prediction and protein-DNA binding. The first dynamic programming algorithms for protein-DNA binding were developed in the 1970s independently by Charles DeLisi in the USA[3] and by Georgii Gurskii and Alexander Zasedatelev in the USSR.[4] Recently these algorithms have
become very popular in bioinformatics and computational biology, particularly in the studies
of nucleosome positioning and transcription factor binding.
Dynamic programming in computer programming

There are two key attributes that a problem must have in order for dynamic programming to be
applicable: optimal substructure and overlapping subproblems. If a problem can be solved by
combining optimal solutions to non-overlapping subproblems, the strategy is called "divide and
conquer" instead. This is why mergesort and quicksort are not classified as dynamic programming
problems.

Optimal substructure means that the solution to a given optimization problem can be obtained by the
combination of optimal solutions to its subproblems. Consequently, the first step towards devising a
dynamic programming solution is to check whether the problem exhibits such optimal substructure.
Such optimal substructures are usually described by means of recursion. For example, given a
graph G=(V,E), the shortest path p from a vertex u to a vertex v exhibits optimal substructure: take
any intermediate vertex w on this shortest path p. If p is truly the shortest path, then it can be split
into subpaths p1 from u to w and p2 from w to v such that these, in turn, are indeed the shortest paths
between the corresponding vertices (by the simple cut-and-paste argument described in Introduction
to Algorithms). Hence, one can easily formulate the solution for finding shortest paths in a recursive
manner, which is what the Bellman–Ford algorithm or the Floyd–Warshall algorithm does.
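A compact sketch of that recursion in code: the Floyd–Warshall algorithm builds shortest u→v paths from optimal subpaths u→w and w→v, allowing one extra intermediate vertex w at a time (the 4-vertex example graph below is invented):

import math

INF = math.inf
# adjacency matrix of edge weights; INF marks a missing edge
d = [
    [0,   3,   INF, 7  ],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1  ],
    [2,   INF, INF, 0  ],
]
n = len(d)

for w in range(n):            # allow vertex w as an intermediate point
    for u in range(n):
        for v in range(n):
            # optimal substructure: the best u->v path via w splits at w
            d[u][v] = min(d[u][v], d[u][w] + d[w][v])

print(d[0][3])   # 6, via the subpaths 0->1->2 and 2->3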

Overlapping subproblems means that the space of subproblems must be small, that is, any recursive
algorithm solving the problem should solve the same subproblems over and over, rather than
generating new subproblems. For example, consider the recursive formulation for generating the
Fibonacci series: Fi = Fi−1 + Fi−2, with base case F1 = F2 = 1. Then F43 = F42 + F41, and F42 = F41 + F40.
Now F41 is being solved in the recursive subtrees of both F43 as well as F42. Even though the total
number of subproblems is actually small (only 43 of them), we end up solving the same problems
over and over if we adopt a naive recursive solution such as this. Dynamic programming takes
account of this fact and solves each subproblem only once.

Figure 2. The subproblem graph for the Fibonacci sequence. The fact that it is not a tree indicates overlapping subproblems.
This can be achieved in either of two ways.

- Top-down approach: This is the direct fall-out of the recursive formulation of any problem. If the
solution to any problem can be formulated recursively using the solution to its subproblems, and
if its subproblems are overlapping, then one can easily memoize or store the solutions to the
subproblems in a table. Whenever we attempt to solve a new subproblem, we first check the
table to see if it is already solved. If a solution has been recorded, we can use it directly,
otherwise we solve the subproblem and add its solution to the table.

- Bottom-up approach: Once we formulate the solution to a problem recursively in terms of its
subproblems, we can try reformulating the problem in a bottom-up fashion: try solving the
subproblems first and use their solutions to build-on and arrive at solutions to bigger
subproblems. This is also usually done in a tabular form by iteratively generating solutions to
bigger and bigger subproblems by using the solutions to small subproblems. For example, if we
already know the values of F41 and F40, we can directly calculate the value of F42.

Some programming languages can automatically memoize the result of a function call with a


particular set of arguments, in order to speed up call-by-name evaluation (this mechanism is referred to as call-by-need). Some languages make it possible portably (e.g. Scheme, Common Lisp or Perl).
Some languages have automatic memoization built in, such as tabled Prolog and J, which supports
memoization with the M. adverb.[5] In any case, this is only possible for a referentially
transparent function.

Example: Economic optimization

Optimal consumption and saving

A mathematical optimization problem that is often used in teaching dynamic programming to economists (because it can be solved by hand[6]) concerns a consumer who lives over the periods t = 0, 1, 2, ..., T and must decide how much to consume and how much to save in each period.

Let ct be consumption in period t, and assume consumption yields utility u(ct) = ln(ct) as long as the consumer lives. Assume the consumer is impatient, so that he discounts future utility by a factor b each period, where 0 < b < 1. Let kt be capital in period t. Assume initial capital is a given amount k0 > 0, and suppose that this period's capital and consumption determine next period's capital as kt+1 = A·kt^a − ct, where A is a positive constant and 0 < a < 1. Assume capital cannot be negative. Then the consumer's decision problem can be written as follows:

maximize Σt=0..T b^t·ln(ct) subject to kt+1 = A·kt^a − ct ≥ 0 for all t = 0, 1, ..., T.

Written this way, the problem looks complicated, because it involves solving for all the choice variables c0, c1, ..., cT. (Note that k0 is not a choice variable—the consumer's initial capital is taken as given.)

The dynamic programming approach to solving this problem involves breaking it apart into a sequence of smaller decisions. To do so, we define a sequence of value functions Vt(k), for t = 0, 1, ..., T, T + 1, which represent the value of having any amount of capital k at each time t. Note that VT+1(k) = 0, that is, there is (by assumption) no utility from having capital after death.

The value of any quantity of capital at any previous time can be calculated by backward induction using the Bellman equation. In this problem, for each t = 0, 1, ..., T, the Bellman equation is

Vt(kt) = max ( ln(ct) + b·Vt+1(kt+1) ), subject to kt+1 = A·kt^a − ct ≥ 0.

This problem is much simpler than the one we wrote down before, because it involves only two decision variables, ct and kt+1. Intuitively, instead of choosing his whole lifetime plan at birth, the consumer can take things one step at a time. At time t, his current capital kt is given, and he only needs to choose current consumption ct and saving kt+1.

To actually solve this problem, we work backwards. For simplicity, the current level of capital is denoted as k. VT+1(k) is already known, so using the Bellman equation once we can calculate VT(k), and so on until we get to V0(k), which is the value of the initial decision problem for the whole lifetime. In other words, once we know VT−j+1(k), we can calculate VT−j(k), which is the maximum of ln(cT−j) + b·VT−j+1(A·k^a − cT−j), where cT−j is the choice variable and A·k^a − cT−j ≥ 0.

Working backwards, it can be shown that the value function at time t = T − j is

VT−j(k) = a·(1 + ab + (ab)² + ... + (ab)^j)·ln(k) + vT−j

where each vT−j is a constant, and the optimal amount to consume at time t = T − j is

cT−j(k) = A·k^a / (1 + ab + (ab)² + ... + (ab)^j)

which can be simplified to

cT−j(k) = A·k^a·(1 − ab) / (1 − (ab)^(j+1)).

We see that it is optimal to consume a larger fraction of current wealth as one gets older, finally consuming all remaining wealth in period T, the last period of life.
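A numeric sketch of this backward induction (the parameter values and grid sizes are invented for illustration, and np.interp approximates the value function between grid points):

import numpy as np

A, a, b, T = 1.0, 0.5, 0.9, 10        # illustrative parameters only
kgrid = np.linspace(1e-3, 1.0, 401)   # grid of capital levels
V = np.zeros_like(kgrid)              # V_{T+1} = 0: no utility after death

c0 = None
for t in range(T, -1, -1):            # backward induction on the Bellman equation
    newV = np.empty_like(kgrid)
    newC = np.empty_like(kgrid)
    for i, k in enumerate(kgrid):
        resources = A * k**a          # A k^a is split between c_t and k_{t+1}
        cands = np.linspace(1e-4, resources - 1e-4, 400)
        w = np.log(cands) + b * np.interp(resources - cands, kgrid, V)
        best = np.argmax(w)
        newV[i], newC[i] = w[best], cands[best]
    V = newV
    if t == 0:
        c0 = newC

# compare with the closed form c_0(k) = A k^a (1 - ab) / (1 - (ab)^(T+1));
# the grid answer should agree approximately
k0 = 0.5
i0 = np.searchsorted(kgrid, k0)
print("grid DP:     ", c0[i0])
print("closed form: ", A * k0**a * (1 - a*b) / (1 - (a*b)**(T + 1)))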

Examples: Computer algorithms


Dijkstra's algorithm for the shortest path problem
From a dynamic programming point of view, Dijkstra's algorithm for the shortest path problem is a
successive approximation scheme that solves the dynamic programming functional equation for the
shortest path problem by the Reaching method.[7][8][9]

In fact, Dijkstra's explanation of the logic behind the algorithm,[10] namely

Problem 2. Find the path of minimum total length between two given nodes P and Q.

We use the fact that, if R is a node on the minimal path from P to Q, knowledge of the latter implies the knowledge of the minimal path from P to R.

is a paraphrasing of Bellman's famous Principle of Optimality in the context of the shortest path


problem.

Fibonacci sequence
Here is a naïve implementation of a function finding the nth member of the Fibonacci sequence,
based directly on the mathematical definition:
function fib(n)
    if n <= 1 return n
    return fib(n − 1) + fib(n − 2)

Notice that if we call, say,  fib(5) , we produce a call tree that calls the function on the same value
many different times:

fib(5)
fib(4) + fib(3)
(fib(3) + fib(2)) + (fib(2) + fib(1))
((fib(2) + fib(1)) + (fib(1) + fib(0))) + ((fib(1) + fib(0)) + fib(1))
(((fib(1) + fib(0)) + fib(1)) + (fib(1) + fib(0))) + ((fib(1) + fib(0)) + fib(1))

In particular,  fib(2)  was calculated three times from scratch. In larger examples, many more
values of  fib , or subproblems, are recalculated, leading to an exponential time algorithm.

Now, suppose we have a simple map object, m, which maps each value of  fib  that has already
been calculated to its result, and we modify our function to use it and update it. The resulting
function requires only O(n) time instead of exponential time (but requires O(n) space):

var m := map(0 → 0, 1 → 1)
function fib(n)
    if key n is not in map m
        m[n] := fib(n − 1) + fib(n − 2)
    return m[n]

This technique of saving values that have already been calculated is called memoization; this is the
top-down approach, since we first break the problem into subproblems and then calculate and store
values.
In the bottom-up approach, we calculate the smaller values of  fib  first, then build larger values
from them. This method also uses O(n) time since it contains a loop that repeats n − 1 times, but it
only takes constant (O(1)) space, in contrast to the top-down approach which requires O(n) space to
store the map.
function fib(n)
    if n = 0
        return 0
    else
        var previousFib := 0, currentFib := 1
        repeat n − 1 times // loop is skipped if n = 1
            var newFib := previousFib + currentFib
            previousFib := currentFib
            currentFib := newFib
        return currentFib

In both examples, we only calculate  fib(2)  one time, and then use it to calculate
both  fib(4)  and  fib(3) , instead of computing it every time either of them is evaluated.

Note that the above method actually takes Ω(n²) time for large n, because addition of two integers with Ω(n) bits each takes O(n) time. (The nth Fibonacci number has Ω(n) bits.) Also, there is a closed form for the Fibonacci sequence, known as Binet's formula, from which the nth term can be computed in approximately O(M(n) log n) time, where M(n) is the cost of multiplying two n-bit numbers; this is more efficient than the above dynamic programming technique. However, the simple recurrence directly gives the matrix form that leads to an approximately O(M(n) log n) algorithm by fast matrix exponentiation.
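A sketch of the matrix form just mentioned: the powers of [[1, 1], [1, 0]] contain the Fibonacci numbers, and exponentiation by repeated squaring needs only O(log n) matrix multiplications (each multiplication on big integers costs more as n grows, which is where the extra factors in the running time come from):

def mat_mult(X, Y):
    # 2x2 integer matrix product
    return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
            [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

def fib(n):
    result = [[1, 0], [0, 1]]          # identity matrix
    base = [[1, 1], [1, 0]]            # the Fibonacci recurrence matrix
    while n > 0:                       # exponentiation by squaring
        if n & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        n >>= 1
    return result[0][1]                # entry (0, 1) of the n-th power is F(n)

print(fib(10))   # 55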

A type of balanced 0–1 matrix


Consider the problem of assigning values, either zero or one, to the positions of an n × n matrix, with n even, so that each row and each column contains exactly n / 2 zeros and n / 2 ones. We ask how many different assignments there are for a given n. For example, when n = 4, four possible solutions are

0 1 0 1     0 0 1 1     1 1 0 0     1 0 1 0
1 0 1 0     0 0 1 1     1 1 0 0     0 1 0 1
0 1 0 1     1 1 0 0     0 0 1 1     1 0 1 0
1 0 1 0     1 1 0 0     0 0 1 1     0 1 0 1

There are at least three possible approaches: brute force, backtracking, and dynamic programming.

Brute force consists of checking all assignments of zeros and ones and counting those that have balanced rows and columns (n / 2 zeros and n / 2 ones). As there are 2^(n²) possible assignments, this strategy is not practical except maybe up to n = 6.

Backtracking for this problem consists of choosing some order of the matrix elements and
recursively placing ones or zeros, while checking that in every row and column the number of
elements that have not been assigned plus the number of ones or zeros are both at least n / 2.
While more sophisticated than brute force, this approach will visit every solution once, making it
impractical for n larger than six, since the number of solutions is already 116,963,796,250 for n = 8,
as we shall see.

Dynamic programming makes it possible to count the number of solutions without visiting them all.
Imagine backtracking values for the first row – what information would we require about the
remaining rows, in order to be able to accurately count the solutions obtained for each first row
value? We consider k × n boards, where 1 ≤ k ≤ n, whose rows contain n/2 zeros and n/2 ones. The function f to which memoization is applied maps vectors of n pairs of integers to the number of admissible boards (solutions). There is one pair for each column, and its two components indicate respectively the number of zeros and ones that have yet to be placed in that column. We seek the value of f((n/2, n/2), (n/2, n/2), ..., (n/2, n/2)) (n arguments, or one vector of n elements). The process of subproblem creation involves iterating over every one of the C(n, n/2) possible assignments for the top row of the board, and going through every column, subtracting one from the appropriate element of the pair for that column, depending on whether the assignment for the top row contained a zero or a one at that position. If any one of the results is negative, then the assignment is invalid and does not contribute to the set of solutions (recursion stops). Otherwise, we have an assignment for the top row of the k × n board and recursively compute the number of solutions to the remaining (k − 1) × n board, adding the numbers of solutions for every admissible assignment of the top row and returning the sum, which is being memoized. The base case is the trivial subproblem, which occurs for a 1 × n board. The number of solutions for this board is either zero or one, depending on whether the vector is a permutation of n/2 (0, 1) and n/2 (1, 0) pairs or not. For example, in the first two boards shown above the sequences of vectors would be

    k = 4:  ((2, 2) (2, 2) (2, 2) (2, 2))        ((2, 2) (2, 2) (2, 2) (2, 2))
            top row:  0 1 0 1                    top row:  0 0 1 1
    k = 3:  ((1, 2) (2, 1) (1, 2) (2, 1))        ((1, 2) (1, 2) (2, 1) (2, 1))
            top row:  1 0 1 0                    top row:  0 0 1 1
    k = 2:  ((1, 1) (1, 1) (1, 1) (1, 1))        ((0, 2) (0, 2) (2, 0) (2, 0))
            top row:  0 1 0 1                    top row:  1 1 0 0
    k = 1:  ((0, 1) (1, 0) (0, 1) (1, 0))        ((0, 1) (0, 1) (1, 0) (1, 0))
            top row:  1 0 1 0                    top row:  1 1 0 0
            ((0, 0) (0, 0) (0, 0) (0, 0))        ((0, 0) (0, 0) (0, 0) (0, 0))

The number of solutions (sequence A058527 in the OEIS) is 1, 2, 90, 297200, 116963796250, ... for n = 0, 2, 4, 6, 8, ...
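A direct Python transcription of this memoized count (a sketch; the helper names are ours):

from functools import lru_cache
from itertools import combinations

def count_balanced(n):
    # all ways to choose which n/2 positions of a row receive a one
    rows = [frozenset(c) for c in combinations(range(n), n // 2)]

    @lru_cache(maxsize=None)
    def f(state):            # state[j] = (zeros, ones) still needed in column j
        if all(pair == (0, 0) for pair in state):
            return 1         # all n rows placed: one admissible board
        total = 0
        for ones in rows:    # iterate over the C(n, n/2) possible top rows
            new = []
            for j, (z, o) in enumerate(state):
                z, o = (z, o - 1) if j in ones else (z - 1, o)
                if z < 0 or o < 0:
                    break    # invalid assignment: recursion stops
                new.append((z, o))
            else:
                total += f(tuple(new))
        return total

    return f(tuple((n // 2, n // 2) for _ in range(n)))

print(count_balanced(4))   # 90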

Checkerboard
Consider a checkerboard with n × n squares and a cost-function c(i, j) which returns a cost
associated with square i, j (i being the row, j being the column). For instance (on a 5 × 5
checkerboard),

    5 |  6  7  4  7  8
    4 |  7  6  1  1  4
    3 |  3  5  7  8  2
    2 |  –  6  7  0  –
    1 |  –  – *5*  –  –
         1  2  3  4  5

Thus c(1, 3) = 5

Let us say you had a checker that could start at any square on the first rank (i.e., row) and you wanted to know the shortest path (the sum of the costs of the visited squares is at a minimum) to get to the last rank, assuming the checker could move only diagonally left forward, diagonally right forward, or straight forward. That is, a checker on (1,3) can move to (2,2), (2,3) or (2,4).

    5 |
    4 |
    3 |
    2 |     x  x  x
    1 |        o
         1  2  3  4  5

This problem exhibits optimal substructure. That is, the solution to the entire problem relies on
solutions to subproblems. Let us define a function q(i, j) as

q(i, j) = the minimum cost to reach square (i, j).

If we can find the values of this function for all the squares at rank n, we pick the minimum and
follow that path backwards to get the shortest path.

Note that q(i, j) is equal to the minimum cost to get to any of the three squares below it (since
those are the only squares that can reach it) plus c(i, j). For instance:

    5 |
    4 |        A
    3 |     B  C  D
    2 |
    1 |
         1  2  3  4  5

Now, let us define q(i, j) in somewhat more general terms:

    q(i, j) = infinity                                                    if j < 1 or j > n
    q(i, j) = c(i, j)                                                     if i = 1
    q(i, j) = min( q(i−1, j−1), q(i−1, j), q(i−1, j+1) ) + c(i, j)        otherwise

The first line of this equation is there to make the recursive property simpler (when dealing with the edges, we need only one recursion). The second line, the base case, says what happens in the first rank. The third line, the recursion, is the important part. It is similar to the A, B, C, D example. From this definition we can write straightforward recursive code for q(i, j). In the following pseudocode, n is the size of the board, c(i, j) is the cost function, and min() returns the minimum of a number of values:

function minCost(i, j)
    if j < 1 or j > n
        return infinity
    else if i = 1
        return c(i, j)
    else
        return min( minCost(i−1, j−1), minCost(i−1, j), minCost(i−1, j+1) ) + c(i, j)

It should be noted that this function only computes the path-cost, not the actual path.
We will get to the path soon. This, like the Fibonacci-numbers example, is horribly slow
since it wastes time recomputing the same shortest paths over and over. However, we
can compute it much faster in a bottom-up fashion if we store path-costs in a two-
dimensional array  q[i, j]  rather than using a function. This avoids recomputation;
before computing the cost of a path, we check the array  q[i, j]  to see if the path cost
is already there.

We also need to know what the actual shortest path is. To do this, we use another
array  p[i, j] , a predecessor array. This array implicitly stores the path to any
square s by storing the previous node on the shortest path to s, i.e. the predecessor. To
reconstruct the path, we lookup the predecessor of s, then the predecessor of that
square, then the predecessor of that square, and so on, until we reach the starting
square. Consider the following code:

function computeShortestPathArrays()
    for x from 1 to n
        q[1, x] := c(1, x)
    for y from 1 to n
        q[y, 0] := infinity
        q[y, n + 1] := infinity
    for y from 2 to n
        for x from 1 to n
            m := min(q[y−1, x−1], q[y−1, x], q[y−1, x+1])
            q[y, x] := m + c(y, x)
            if m = q[y−1, x−1]
                p[y, x] := −1
            else if m = q[y−1, x]
                p[y, x] := 0
            else
                p[y, x] := 1

Now the rest is a simple matter of finding the minimum and printing it.

function computeShortestPath()
    computeShortestPathArrays()
    minIndex := 1
    min := q[n, 1]
    for i from 2 to n
        if q[n, i] < min
            minIndex := i
            min := q[n, i]
    printPath(n, minIndex)

function printPath(y, x)
    print(x)
    print("<-")
    if y = 2
        print(x + p[y, x])
    else
        printPath(y−1, x + p[y, x])

Sequence alignment

In genetics, sequence alignment is an important application where dynamic programming is essential.[11] Typically, the problem consists of transforming one sequence into another using edit
operations that replace, insert, or remove an element. Each operation has an associated cost, and
the goal is to find the sequence of edits with the lowest total cost.

The problem can be stated naturally as a recursion, a sequence A is optimally edited into a
sequence B by either:

1. inserting the first character of B, and performing an optimal alignment of A and the tail of B
2. deleting the first character of A, and performing the optimal alignment of the tail of A and B
3. replacing the first character of A with the first character of B, and performing optimal
alignments of the tails of A and B.

The partial alignments can be tabulated in a matrix, where cell (i,j) contains the cost of the optimal
alignment of A[1..i] to B[1..j]. The cost in cell (i,j) can be calculated by adding the cost of the relevant
operations to the cost of its neighboring cells, and selecting the optimum.
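A short Python tabulation of this recursion, assuming unit cost for each insert, delete, and replace operation:

def edit_distance(a, b):
    # cost[i][j] = optimal cost of aligning a[:i] with b[:j]
    m, n = len(a), len(b)
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        cost[i][0] = i                       # delete all of a[:i]
    for j in range(n + 1):
        cost[0][j] = j                       # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = 0 if a[i - 1] == b[j - 1] else 1
            cost[i][j] = min(cost[i][j - 1] + 1,           # insert
                             cost[i - 1][j] + 1,           # delete
                             cost[i - 1][j - 1] + replace) # replace / match
    return cost[m][n]

print(edit_distance("kitten", "sitting"))   # 3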

Different variants exist, see Smith–Waterman algorithm and Needleman–Wunsch algorithm.

Tower of Hanoi puzzle

A model set of the Towers of Hanoi (with 8 disks)

An animated solution of the Tower of Hanoi puzzle for T(4,3).


The Tower of Hanoi or Towers of Hanoi is a mathematical game or puzzle. It consists of three
rods, and a number of disks of different sizes which can slide onto any rod. The puzzle starts with
the disks in a neat stack in ascending order of size on one rod, the smallest at the top, thus making a
conical shape.

The objective of the puzzle is to move the entire stack to another rod, obeying the following rules:

 Only one disk may be moved at a time.


 Each move consists of taking the upper disk from one of the rods and sliding it onto another rod,
on top of the other disks that may already be present on that rod.
 No disk may be placed on top of a smaller disk.

The dynamic programming solution consists of solving the functional equation

S(n,h,t) = S(n-1,h, not(h,t)) ; S(1,h,t) ; S(n-1,not(h,t),t)

where n denotes the number of disks to be moved, h denotes the home rod, t denotes the target
rod, not(h,t) denotes the third rod (neither h nor t), ";" denotes concatenation, and

S(n, h, t) := solution to a problem consisting of n disks that are to be moved from rod h to rod
t.

Note that for n=1 the problem is trivial, namely S(1,h,t) = "move a disk from rod h to rod t"
(there is only one disk left).
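The functional equation translates directly into code; a minimal Python sketch, with the rods numbered 0, 1, 2 and a list of moves playing the role of the concatenation ";":

def hanoi(n, h, t):
    """Moves that carry n disks from rod h to rod t."""
    if n == 1:
        return [(h, t)]
    other = 3 - h - t          # the third rod, neither h nor t
    return hanoi(n - 1, h, other) + [(h, t)] + hanoi(n - 1, other, t)

moves = hanoi(3, 0, 2)
print(len(moves), moves)       # 7 moves = 2**3 - 1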

The number of moves required by this solution is 2ⁿ − 1. If the objective is to maximize the number of moves (without cycling) then the dynamic programming functional equation is slightly more complicated and 3ⁿ − 1 moves are required.[12]

Egg dropping puzzle

The following is a description of the instance of this famous puzzle involving n=2 eggs and a building
with H=36 floors:[13]

Suppose that we wish to know which stories in a 36-story building are safe to drop eggs from, and which will cause the eggs to break on landing (using U.S. English terminology, in which the first floor is at ground level). We make a few assumptions:

 An egg that survives a fall can be used again.


 A broken egg must be discarded.
 The effect of a fall is the same for all eggs.
 If an egg breaks when dropped, then it would break if dropped from a higher window.
 If an egg survives a fall, then it would survive a shorter fall.
 It is not ruled out that the first-floor windows break eggs, nor is it ruled out that eggs can
survive the 36th-floor windows.
If only one egg is available and we wish to be sure of obtaining the right result, the
experiment can be carried out in only one way. Drop the egg from the first-floor window; if it
survives, drop it from the second-floor window. Continue upward until it breaks. In the worst
case, this method may require 36 droppings. Suppose 2 eggs are available. What is the
least number of egg-droppings that is guaranteed to work in all cases?

To derive a dynamic programming functional equation for this puzzle, let the state of the dynamic programming model be a pair s = (n,k), where

n = number of test eggs available, n = 0, 1, 2, 3, ..., N.

k = number of (consecutive) floors yet to be tested, k = 0, 1, 2, ..., H.

For instance, s = (2,6) indicates that two test eggs are available and 6 (consecutive)
floors are yet to be tested. The initial state of the process is s = (N,H)
where N denotes the number of test eggs available at the commencement of the
experiment. The process terminates either when there are no more test eggs (n = 0)
or when k = 0, whichever occurs first. If termination occurs at state s = (0,k)
and k > 0, then the test failed.

Now, let

W(n,k) = minimum number of trials required to identify the value of the critical floor under the
worst-case scenario given that the process is in state s = (n,k).

Then it can be shown that[14]

W(n,k) = 1 + min{max(W(n − 1, x − 1), W(n,k − x)): x = 1, 2, ..., k }

with W(n,1) = 1 for all n > 0 and W(1,k) = k for all k. It is easy to solve this equation iteratively by
systematically increasing the values of n and k.
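A minimal Python sketch of this iterative solution (the names are ours):

def worst_case_trials(n_eggs, k_floors):
    # w[n][k] = W(n, k): worst-case number of trials with n eggs and k floors
    w = [[0] * (k_floors + 1) for _ in range(n_eggs + 1)]
    for k in range(1, k_floors + 1):
        w[1][k] = k                                    # one egg: test floor by floor
    for n in range(2, n_eggs + 1):
        for k in range(1, k_floors + 1):
            w[n][k] = 1 + min(max(w[n - 1][x - 1], w[n][k - x])
                              for x in range(1, k + 1))
    return w[n_eggs][k_floors]

print(worst_case_trials(2, 36))   # 8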

An interactive online facility is available for experimentation with this model as well as with other
versions of this puzzle (e.g. when the objective is to minimize the expected value of the number of
trials.)[14]

Faster DP solution using a different parametrization

Notice that the above solution takes O(nk²) time with a DP solution. This can be improved to O(nk log k) time by binary searching on the optimal x in the above recurrence, since W(n − 1, x − 1) is increasing in x while W(n, k − x) is decreasing in x, so a local minimum of max(W(n − 1, x − 1), W(n, k − x)) is a global minimum. Also, by storing the optimal x for each cell in the DP table and referring to its value for the previous cell, the optimal x for each cell can be found in constant time, improving it to O(nk) time. However, there is an even faster solution that involves a different parametrization of the problem:

Let k be the total number of floors such that the eggs break when dropped from the kth floor (the example above is equivalent to taking k = 37: surviving all 36 floors is treated as breaking at a virtual 37th floor).

Let m be the minimum floor from which the egg must be dropped to be broken.

Let f(t, n) be the maximum number of values of m that are distinguishable using t tries and n eggs.

Then f(t, 0) = f(0, n) = 1 for all t, n ≥ 0.

Let a be the floor from which the first egg is dropped in the optimal strategy.

If the first egg broke, m is from 1 to a and distinguishable using at most t − 1 tries and n − 1 eggs.

If the first egg did not break, m is from a + 1 to k and distinguishable using t − 1 tries and n eggs.

Therefore f(t, n) = f(t − 1, n − 1) + f(t − 1, n).

Then the problem is equivalent to finding the minimum x such that f(x, n) ≥ k.

To do so, we could compute {f(t, n) : 0 ≤ t ≤ x} in order of increasing t, which would take O(nx) time. Thus, if we separately handle the case of n = 1, the algorithm would take O(n√k) time, since x = O(√k) once n ≥ 2.

But the recurrence relation can in fact be solved, giving f(t, n) = Σ_{i=0..n} C(t, i), which can be computed in O(n) time using the identity C(t, i + 1) = C(t, i) · (t − i)/(i + 1) for all i ≥ 0.

Since f(t, n) is nondecreasing in t, we can binary search on t to find x, giving an O(n log k) algorithm.
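Putting the closed form and the binary search together, a Python sketch (the helper names are ours):

from math import comb

def f(t, n):
    # f(t, n) = sum_{i=0}^{n} C(t, i): values of m distinguishable with t tries, n eggs
    return sum(comb(t, i) for i in range(n + 1))

def min_tries(n_eggs, k):
    lo, hi = 1, k                 # t = k (testing floor by floor) always suffices
    while lo < hi:                # binary search on t, since f is nondecreasing in t
        mid = (lo + hi) // 2
        if f(mid, n_eggs) >= k:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_tries(2, 37))   # 8, matching the DP solution above (W(2, 36) = 8)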

Matrix chain multiplication


Matrix chain multiplication is a well-known example that demonstrates the utility of dynamic programming. For example, engineering applications often have to multiply a chain of matrices. It is not surprising to find matrices of large dimensions, for example 100×100. Therefore, our task is to multiply matrices A1, A2, ..., An. As we know from basic linear algebra, matrix multiplication is not commutative, but is associative; and we can multiply only two matrices at a time. So, we can multiply
this chain of matrices in many different ways, for example:

((A1 × A2) × A3) × ... An


A1×(((A2×A3)× ... ) × An)
(A1 × A2) × (A3 × ... An)

and so on. There are numerous ways to multiply this chain of matrices. They will all
produce the same final result, however they will take more or less time to compute,
based on which particular matrices are multiplied. If matrix A has dimensions m×n and
matrix B has dimensions n×q, then matrix C=A×B will have dimensions m×q, and will
require m*n*q scalar multiplications (using a simplistic matrix multiplication algorithm for
purposes of illustration).

For example, let us multiply matrices A, B and C. Let us assume that their dimensions
are m×n, n×p, and p×s, respectively. Matrix A×B×C will be of size m×s and can be
calculated in two ways shown below:

1. A×(B×C): this order of matrix multiplication will require nps + mns scalar multiplications.
2. (A×B)×C: this order will require mnp + mps scalar multiplications.

Let us assume that m = 10, n = 100, p = 10 and s = 1000. So, the first way to multiply the chain will require 1,000,000 + 1,000,000 calculations. The second way will require only 10,000 + 100,000 calculations. Obviously, the second way is faster, and we should multiply the matrices using that arrangement of parentheses.

Therefore, our conclusion is that the order of parentheses matters, and that our task is to find the optimal order of parentheses.

At this point, we have several choices, one of which is to design a dynamic programming algorithm that will split the problem into overlapping subproblems and calculate the optimal arrangement of parentheses. The dynamic programming solution is presented below.

Let's call m[i,j] the minimum number of scalar multiplications needed to multiply a chain of matrices from matrix i to matrix j (i.e. Ai × ... × Aj, where i ≤ j). We split the chain at some matrix k, such that i ≤ k < j, and try to find out which combination produces the minimum m[i,j].

The formula is:

    if i = j:  m[i,j] = 0
    if i < j:  m[i,j] = min over all possible values of k of ( m[i,k] + m[k+1,j] + p(i−1)·p(k)·p(j) )

where k ranges from i to j − 1, and

 p(i−1) is the row dimension of matrix i,
 p(k) is the column dimension of matrix k,
 p(j) is the column dimension of matrix j.

This formula can be coded as shown below, where the input parameter "chain" is the chain of matrices, i.e. A1, A2, ..., An:

function OptimalMatrixChainParenthesis(chain)
    n = length(chain)
    for i = 1, n
        m[i,i] = 0                  // it takes no calculations to multiply one matrix
    for len = 2, n
        for i = 1, n − len + 1
            j = i + len − 1
            m[i,j] = infinity       // so that the first calculation updates it
            for k = i, j − 1
                q = m[i,k] + m[k+1,j] + p[i−1]*p[k]*p[j]
                if q < m[i,j]       // the new order of parentheses is better than what we had
                    m[i,j] = q      // update
                    s[i,j] = k      // record which k to split on, i.e. where to place the parentheses

So far, we have calculated values for all possible m[i, j], the minimum number of calculations to multiply a chain from matrix i to matrix j, and we have recorded the corresponding "split point" s[i, j]. For example, if we are multiplying chain A1×A2×A3×A4, and it turns out that m[1, 3] = 100 and s[1, 3] = 2, that means that the optimal placement of parentheses for matrices 1 to 3 is (A1×A2)×A3, and multiplying those matrices will require 100 scalar calculations.

This algorithm will produce "tables" m[ , ] and s[ , ] that will have entries for all possible values of i and j. The final solution for the entire chain is m[1, n], with the corresponding split at s[1, n]. Unraveling the solution will be recursive, starting from the top and continuing until we reach the base case, i.e. multiplication of single matrices.

Therefore, the next step is to actually split the chain, i.e. to place the parentheses where they (optimally) belong. For this purpose we could use the following algorithm:

function PrintOptimalParenthesis(s, i, j)
    if i = j
        print "A"i
    else
        print "("
        PrintOptimalParenthesis(s, i, s[i, j])
        PrintOptimalParenthesis(s, s[i, j] + 1, j)
        print ")"

Of course, this algorithm is not useful for actual multiplication. This algorithm is just a
user-friendly way to see what the result looks like.

To actually multiply the matrices using the proper splits, we need the following
algorithm:
function MatrixChainMultiply(chain from 1 to n)       // returns the final matrix, i.e. A1×A2×...×An
    OptimalMatrixChainParenthesis(chain from 1 to n)  // this will produce s[ . ] and m[ . ] "tables"
    OptimalMatrixMultiplication(s, chain from 1 to n) // actually multiply

function OptimalMatrixMultiplication(s, i, j)         // returns the result of multiplying a chain of matrices from Ai to Aj in the optimal way
    if i < j
        // keep splitting the chain and multiplying the matrices on the left and right sides
        LeftSide = OptimalMatrixMultiplication(s, i, s[i, j])
        RightSide = OptimalMatrixMultiplication(s, s[i, j] + 1, j)
        return MatrixMultiply(LeftSide, RightSide)
    else if i = j
        return Ai                                     // matrix at position i
    else
        print "error, i <= j must hold"

function MatrixMultiply(A, B)                         // function that multiplies two matrices
    if columns(A) = rows(B)
        for i = 1, rows(A)
            for j = 1, columns(B)
                C[i, j] = 0
                for k = 1, columns(A)
                    C[i, j] = C[i, j] + A[i, k]*B[k, j]
        return C                                      // return only after all entries are filled
    else
        print "error, incompatible dimensions."

References
1. S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani, Algorithms, p. 173, available at http://www.cs.berkeley.edu/~vazirani/algorithms.html
2. Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C. (2001), Introduction to Algorithms (2nd ed.), MIT Press & McGraw–Hill, ISBN 0-262-03293-7, pp. 327–8.
3. DeLisi, Biopolymers, 1974, Volume 13, Issue 7, pages 1511–1512, July 1974.
4. Gurskiĭ GV, Zasedatelev AS, Biofizika, 1978 Sep–Oct; 23(5): 932–46.
5. "M. Memo". J Vocabulary. J Software. Retrieved 28 October 2011.
6. Stokey et al., 1989, Chap. 1.
7. Sniedovich, M. (2006), "Dijkstra's algorithm revisited: the dynamic programming connexion" (PDF), Journal of Control and Cybernetics 35 (3): 599–620. Online version of the paper with interactive computational modules.
8. Denardo, E. V. (2003), Dynamic Programming: Models and Applications, Mineola, NY: Dover Publications, ISBN 978-0-486-42810-9.
9. Sniedovich, M. (2010), Dynamic Programming: Foundations and Principles, Taylor & Francis, ISBN 978-0-8247-4099-3.
10. Dijkstra 1959, p. 270.
11. Eddy, S. R., "What is dynamic programming?", Nature Biotechnology, 22, 909–910 (2004).
12. Moshe Sniedovich (2002), "OR/MS Games: 2. The Towers of Hanoi Problem", INFORMS Transactions on Education 3 (1): 34–51.
13. Konhauser J. D. E., Velleman, D., and Wagon, S. (1996). Which Way Did the Bicycle Go? Dolciani Mathematical Expositions No. 18. The Mathematical Association of America.
14. Sniedovich, M. (2003). "The joy of egg-dropping in Braunschweig and Hong Kong". INFORMS Transactions on Education, 4(1): 48–64.
15. Dean Connable Wills, Connections between combinatorics of permutations and algorithms and geometry.
16. http://www.wu-wien.ac.at/usr/h99c/h9951826/bellman_dynprog.pdf
17. Nocedal, J.; Wright, S. J.: Numerical Optimization, page 9, Springer, 2006.

Inventory theory

In operations research, inventory theory studies, by means of mathematical models, the economic processes of stock-keeping, with the goal of making decisions of maximum economic efficiency. Without being a mathematical theory in the strict sense (since it has no specific axiomatization and no theorems of its own), inventory theory studies the most representative models in this category. It also aims to formulate research methods applicable to all particular stocking models.
For twice-differentiable functions, unconstrained problems can be solved by finding the points where the gradient of the objective function is zero (the stationary points) and using the Hessian matrix to classify the type of each point. If the Hessian is positive definite, the point is a local minimum; if it is negative definite, a local maximum; and if it is indefinite, a saddle point.
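For instance, a stationary point can be classified by the signs of the Hessian's eigenvalues; a small NumPy sketch on an example function of our own choosing:

import numpy as np

# f(x, y) = x**2 - y**2 has a stationary point at the origin
hessian = np.array([[2.0, 0.0],
                    [0.0, -2.0]])
eigenvalues = np.linalg.eigvalsh(hessian)
if np.all(eigenvalues > 0):
    print("local minimum")
elif np.all(eigenvalues < 0):
    print("local maximum")
else:
    print("saddle point")    # printed here: the eigenvalues are 2 and -2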

Such a stationary point can be found by starting from an initial guess and then approaching it iteratively through one of the following methods:

 gradient descent
 Newton's method
 conjugate gradient
 line search

If the function is convex over the region of interest, then any local minimum is also a global minimum. Fast methods exist for optimizing twice-differentiable convex functions. Constrained problems can be transformed into unconstrained problems with the help of Lagrange multipliers.

Here are a few popular heuristic methods:

 genetic algorithms
 evolution strategies
 differential evolution
Sequential quadratic programming


Sequential quadratic programming (SQP) is an iterative method for nonlinear optimization. SQP
methods are used on problems for which the objective function and the constraints are
twice continuously differentiable.

SQP methods solve a sequence of optimization subproblems, each of which optimizes a quadratic
model of the objective subject to a linearization of the constraints. If the problem is unconstrained,
then the method reduces to Newton's method for finding a point where the gradient of the objective
vanishes. If the problem has only equality constraints, then the method is equivalent to
applying Newton's method to the first-order optimality conditions, or Karush–Kuhn–Tucker
conditions, of the problem. SQP methods have been implemented in many packages,
including NPSOL, SNOPT, NLPQL, OPSYC, OPTIMA, MATLAB, GNU Octave and SQP.
Algorithm basics
Consider a nonlinear programming problem of the form

    min over x of f(x)
    subject to    h(x) ≥ 0,  g(x) = 0.

The Lagrangian for this problem is

    L(x, λ, σ) = f(x) − λᵀ h(x) − σᵀ g(x),

where λ and σ are Lagrange multipliers. At an iterate x_k, a basic sequential quadratic programming algorithm defines an appropriate search direction d_k as a solution to the quadratic programming subproblem

    min over d of  f(x_k) + ∇f(x_k)ᵀ d + ½ dᵀ ∇²xx L(x_k, λ_k, σ_k) d
    subject to     h(x_k) + ∇h(x_k)ᵀ d ≥ 0,
                   g(x_k) + ∇g(x_k)ᵀ d = 0.

Note that the term f(x_k) in the expression above may be left out for the minimization problem, since it is constant.
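As a practical illustration, SciPy's SLSQP solver (a sequential least-squares QP method in the same family) can be applied to a small equality-constrained problem; the toy objective and constraint below are our own:

import numpy as np
from scipy.optimize import minimize

# minimize f(x, y) = (x - 1)^2 + (y - 2.5)^2  subject to  x + y = 3
objective = lambda v: (v[0] - 1.0) ** 2 + (v[1] - 2.5) ** 2
constraint = {'type': 'eq', 'fun': lambda v: v[0] + v[1] - 3.0}

result = minimize(objective, np.array([0.0, 0.0]),
                  method='SLSQP', constraints=[constraint])
print(result.x)   # approximately [0.75, 2.25]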
Gradient descent
For the analytical method called "steepest descent", see Method of steepest descent.

Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using


gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate
gradient) of the function at the current point. If instead one takes steps proportional to the positive of
the gradient, one approaches a local maximum of that function; the procedure is then known
as gradient ascent.
Gradient descent is also known as steepest descent, or the method of steepest descent. When
known as the latter, gradient descent should not be confused with the method of steepest
descent for approximating integrals.


Description
Illustration of gradient descent.

Gradient descent is based on the observation that if the multivariable function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a, −∇F(a). It follows that, if

    b = a − γ ∇F(a)

for γ small enough, then F(a) ≥ F(b). With this observation in mind, one starts with a guess x0 for a local minimum of F, and considers the sequence x0, x1, x2, ... such that

    x_{n+1} = x_n − γ_n ∇F(x_n),  n ≥ 0.

We have

    F(x0) ≥ F(x1) ≥ F(x2) ≥ ...,

so hopefully the sequence (x_n) converges to the desired local minimum. Note that the value of the step size γ is allowed to change at every iteration. With certain assumptions on the function F (for example, F convex and ∇F Lipschitz) and particular choices of γ (e.g., chosen via a line search that satisfies the Wolfe conditions), convergence to a local minimum can be guaranteed. When the function F is convex, all local minima are also global minima, so in this case gradient descent can converge to the global solution.

This process is illustrated in the picture to the right. Here F is assumed to be defined on the plane, and its graph is assumed to have a bowl shape. The blue curves are the contour lines, that is, the regions on which the value of F is constant. A red arrow originating at a point shows the direction of the negative gradient at that point. Note that the (negative) gradient at a point is orthogonal to the contour line going through that point. We see that gradient descent leads us to the bottom of the bowl, that is, to the point where the value of the function F is minimal.

Examples
Gradient descent has problems with pathological functions such as the Rosenbrock function shown
here.

The Rosenbrock function has a narrow curved valley which contains the minimum. The bottom
of the valley is very flat. Because of the curved flat valley the optimization is zig-zagging slowly
with small stepsizes towards the minimum.

The "Zig-Zagging" nature of the method is also evident below, where the gradient ascent

method is applied to  .


Limitations
For some of the above examples, gradient descent is relatively slow close to the minimum:
technically, its asymptotic rate of convergence is inferior to many other methods. For poorly
conditioned convex problems, gradient descent increasingly 'zigzags' as the gradients point nearly
orthogonally to the shortest direction to a minimum point. For more details, see the comments below.
For non-differentiable functions, gradient methods are ill-defined. For locally Lipschitz problems and especially for convex minimization problems, bundle methods of descent are well-defined. Non-descent methods, like subgradient projection methods, may also be used.[1] These methods are typically slower than gradient descent. Another alternative for non-differentiable functions is to
"smooth" the function, or bound the function by a smooth function. In this approach, the smooth
problem is solved in the hope that the answer is close to the answer for the non-smooth problem
(occasionally, this can be made rigorous).

Solution of a linear system


Gradient descent can be used to solve a system of linear equations, reformulated as a quadratic minimization problem, e.g., using linear least squares. The solution of

    Ax = b

in the sense of linear least squares is defined as minimizing the function

    F(x) = ‖Ax − b‖².

In traditional linear least squares for real A and b the Euclidean norm is used, in which case

    ∇F(x) = 2Aᵀ(Ax − b).

In this case, the line search minimization, finding the locally optimal step size γ on every iteration, can be performed analytically, and explicit formulas for the locally optimal γ are known.[2]

For solving linear equations, gradient descent is rarely used, with the conjugate gradient method being one of the most popular alternatives. The speed of convergence of gradient descent depends on the maximal and minimal eigenvalues of AᵀA, while the speed of convergence of conjugate gradients has a more complex dependence on the eigenvalues, and can benefit from preconditioning. Gradient descent also benefits from preconditioning, but this is not done as commonly.
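A small NumPy sketch of gradient descent on the least-squares objective F(x) = ‖Ax − b‖², using the gradient 2Aᵀ(Ax − b) derived above; the data and step-size rule are illustrative:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)

# fixed step size below 2/L, with L = 2 * lambda_max(A^T A)
gamma = 0.5 / np.linalg.eigvalsh(A.T @ A).max()

x = np.zeros(3)
for _ in range(5000):
    x = x - gamma * 2 * A.T @ (A @ x - b)   # gradient step on ||Ax - b||^2

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True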

Solution of a non-linear system


Gradient descent can also be used to solve a system of nonlinear equations. Below is an example that shows how to use the gradient descent to solve for three unknown variables, x1, x2, and x3. This example shows one iteration of the gradient descent.

Consider the nonlinear system of equations

    3x1 − cos(x2·x3) − 3/2 = 0
    4x1² − 625x2² + 2x2 − 1 = 0
    exp(−x1·x2) + 20x3 + (10π − 3)/3 = 0,

and suppose we have the function

    G(x) = ( 3x1 − cos(x2·x3) − 3/2,
             4x1² − 625x2² + 2x2 − 1,
             exp(−x1·x2) + 20x3 + (10π − 3)/3 )ᵀ,

where x = (x1, x2, x3)ᵀ, and the objective function

    F(x) = ½ G(x)ᵀ G(x).

With initial guess x⁽⁰⁾ = (0, 0, 0)ᵀ, we know that

    x⁽¹⁾ = x⁽⁰⁾ − γ0 ∇F(x⁽⁰⁾),

where ∇F(x) = J_G(x)ᵀ G(x), the Jacobian matrix J_G(x) being

    J_G(x) = [  3                 x3 sin(x2·x3)      x2 sin(x2·x3)
                8x1              −1250x2 + 2         0
               −x2 exp(−x1·x2)   −x1 exp(−x1·x2)     20 ].

Then, evaluating these terms at x⁽⁰⁾ = (0, 0, 0)ᵀ gives

    G(x⁽⁰⁾) = (−2.5, −1, 10.472)ᵀ,   J_G(x⁽⁰⁾) = [ 3 0 0 ; 0 2 0 ; 0 0 20 ],

so that

    ∇F(x⁽⁰⁾) = J_G(x⁽⁰⁾)ᵀ G(x⁽⁰⁾) = (−7.5, −2, 209.44)ᵀ,

and F(x⁽⁰⁾) = 58.456.

[Figure: an animation showing the first 83 iterations of gradient descent applied to this example. Surfaces are isosurfaces of F at the current guess x⁽ⁿ⁾, and arrows show the direction of descent. Due to a small and constant step size, the convergence is slow.]

Now a suitable γ0 must be found such that F(x⁽¹⁾) ≤ F(x⁽⁰⁾). This can be done with any of a variety of line search algorithms. One might also simply guess γ0 = 0.001, which gives

    x⁽¹⁾ = (0.0075, 0.002, −0.20944)ᵀ.

Evaluating the objective function at this value yields

    F(x⁽¹⁾) = 23.306.

The decrease from F(x⁽⁰⁾) = 58.456 to the next step's value F(x⁽¹⁾) = 23.306 is a sizable decrease in the objective function. Further steps would reduce its value until a solution to the system was found.

Comments
Gradient descent works in spaces of any number of dimensions, even in infinite-dimensional ones.
In the latter case the search space is typically a function space, and one calculates the Gâteaux
derivative of the functional to be minimized to determine the descent direction.[3]

The gradient descent can take many iterations to compute a local minimum with a required accuracy if the curvature in different directions is very different for the given function. For such functions, preconditioning, which changes the geometry of the space to shape the function level sets like concentric circles, cures the slow convergence. Constructing and applying preconditioning can be computationally expensive, however.

The gradient descent can be combined with a line search, finding the locally optimal step size γ on every iteration. Performing the line search can be time-consuming. Conversely, using a fixed small γ can yield poor convergence.

Methods based on Newton's method and inversion of the Hessian using conjugate gradient techniques can be better alternatives.[4][5] Generally, such methods converge in fewer iterations, but the cost of each iteration is higher. An example is the BFGS method, which consists in calculating on every step a matrix by which the gradient vector is multiplied to go into a "better" direction, combined with a more sophisticated line search algorithm to find the "best" value of γ. For extremely large problems, where computer memory issues dominate, a limited-memory method such as L-BFGS should be used instead of BFGS or the steepest descent.

Gradient descent can be viewed as Euler's method for solving the ordinary differential equation x′(t) = −∇f(x(t)) of a gradient flow.

A computational example
The gradient descent algorithm is applied to find a local minimum of the function f(x) = x^4 − 3x^3 + 2, with derivative f′(x) = 4x^3 − 9x^2. Here is an implementation in the Python programming language.
# From calculation, we expect that the local minimum occurs at x = 9/4

x_old = 0
x_new = 6            # the algorithm starts at x = 6
gamma = 0.01         # step size
precision = 0.00001

def f_derivative(x):
    return 4 * x**3 - 9 * x**2

while abs(x_new - x_old) > precision:
    x_old = x_new
    x_new = x_old - gamma * f_derivative(x_old)

print("Local minimum occurs at", x_new)

The step size in the above piece of code has to be tuned to the system at hand, and convergence can be made faster by using an adaptive step size. In the above case the step size is not adaptive: it stays at 0.01, which can sometimes cause the method to fail by diverging from the minimum.
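One common adaptive choice is a backtracking line search enforcing a sufficient-decrease (Armijo) condition; a sketch adapting the example above:

def f(x):
    return x**4 - 3 * x**3 + 2

def f_derivative(x):
    return 4 * x**3 - 9 * x**2

x = 6.0
for _ in range(100):
    g = f_derivative(x)
    step = 1.0
    # halve the step until the Armijo sufficient-decrease condition holds
    while f(x - step * g) > f(x) - 0.5 * step * g * g:
        step *= 0.5
    x -= step * g

print("Local minimum occurs at", x)   # close to 9/4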

Extensions
Gradient descent can be extended to handle constraints by including a projection onto the set of
constraints. This method is only feasible when the projection is efficiently computable on a computer.
Under suitable assumptions, this method converges. This method is a specific case of the forward-
backward algorithm for monotone inclusions (which includes convex programming and variational
inequalities).[6]

Fast proximal gradient method


Another extension of gradient descent is due to Yurii Nesterov from 1983,[7] and has been subsequently generalized. He provides a simple modification of the algorithm that enables faster convergence for convex problems. Specifically, if the function F is convex and ∇F is Lipschitz, and it is not assumed that F is strongly convex, then the error in the objective value generated at each step k by the gradient descent method will be bounded by O(1/k). Using the Nesterov acceleration technique, the error decreases at O(1/k²).[8]

The momentum method


Yet another extension, which reduces the risk of getting stuck in a local minimum, as well as speeding up the convergence considerably in cases where the process would otherwise zig-zag heavily, is the momentum method, which uses a momentum term in analogy to "the mass of Newtonian particles that move through a viscous medium in a conservative force field".[9] This method is often used as an extension to the backpropagation algorithms used to train artificial neural networks.
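A sketch of the momentum update on an ill-conditioned quadratic, where a plain fixed-step gradient method tends to zig-zag; the step size and momentum coefficient below are conventional choices of ours, not prescribed by the source:

import numpy as np

# ill-conditioned quadratic: f(v) = 0.5 * (v[0]**2 + 25 * v[1]**2)
grad = lambda v: np.array([v[0], 25.0 * v[1]])

v = np.array([10.0, 1.0])
velocity = np.zeros(2)
gamma, beta = 0.04, 0.9        # step size and momentum coefficient
for _ in range(300):
    velocity = beta * velocity - gamma * grad(v)   # accumulate a "velocity"
    v = v + velocity

print(v)   # near the minimizer (0, 0)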

References
 Mordecai Avriel (2003). Nonlinear Programming: Analysis and Methods. Dover Publishing. ISBN 0-486-43227-0.
 Jan A. Snyman (2005). Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. Springer Publishing. ISBN 0-387-24348-8.
 Cauchy, Augustin (1847). Méthode générale pour la résolution des systèmes d'équations simultanées. pp. 536–538.

1. Kiwiel, Krzysztof C. (2001). "Convergence and efficiency of subgradient methods for quasiconvex minimization". Mathematical Programming (Series A) 90 (1) (Berlin, Heidelberg: Springer). pp. 1–25. doi:10.1007/PL00011414. ISSN 0025-5610. MR 1819784.
2. Yuan, Ya-xiang (1999). "Step-sizes for the gradient method" (PDF). AMS/IP Studies in Advanced Mathematics (Providence, RI: American Mathematical Society) 42 (2): 785.
3. G. P. Akilov, L. V. Kantorovich, Functional Analysis, Pergamon Press, 2nd sub. edition, ISBN 0-08-023036-9, 1982.
4. W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge University Press, New York, 1992.
5. T. Strutz: Data Fitting and Uncertainty (A Practical Introduction to Weighted Least Squares and Beyond). Vieweg+Teubner, Wiesbaden 2011, ISBN 978-3-8348-1022-9.
6. P. L. Combettes and J.-C. Pesquet, "Proximal splitting methods in signal processing", in: Fixed-Point Algorithms for Inverse Problems in Science and Engineering (H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Editors), pp. 185–212. Springer, New York, 2011.
7. Yu. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Springer, 2004, ISBN 1-4020-7553-7).
8. Fast Gradient Methods, lecture notes by Prof. Lieven Vandenberghe for EE236C at UCLA.
9. Qian, Ning (January 1999). "On the momentum term in gradient descent learning algorithms" (PDF). Neural Networks 12 (1): 145–151. Retrieved 17 October 2014.
10. "Momentum and Learning Rate Adaptation". Willamette University. Retrieved 17 October 2014.
11. Geoffrey Hinton; Nitish Srivastava; Kevin Swersky. "6-3 - The momentum method". YouTube. Retrieved 18 October 2014. Part of a lecture series for the Coursera online course Neural Networks for Machine Learning.

Proximal gradient method


Proximal gradient methods are a generalized form of projection used to solve non-differentiable convex optimization problems. Many interesting problems can be formulated as convex optimization problems of the form

    min over x ∈ ℝᴺ of  f1(x) + f2(x) + ... + fn(x),

where f1, ..., fn are convex functions defined from ℝᴺ to the extended reals, some of which are non-differentiable. This rules out conventional smooth optimization techniques like the steepest descent method, the conjugate gradient method, etc. There is a specific class of algorithms which can solve the above optimization problem. These methods proceed by splitting, in that the functions f1, ..., fn are used individually so as to yield an easily implementable algorithm. They are called proximal because each non-smooth function among f1, ..., fn is involved via its proximity operator. The iterative shrinkage-thresholding algorithm, projected Landweber, projected gradient, alternating projections, the alternating-direction method of multipliers, and alternating split Bregman are special instances of proximal algorithms. Details of proximal methods are discussed in Combettes and Pesquet.[1] For the theory of proximal gradient methods from the perspective of and with applications to statistical learning theory, see proximal gradient methods for learning.


Notations and terminology

Let ℝᴺ, the N-dimensional Euclidean space, be the domain of the function f : ℝᴺ → (−∞, +∞]. Suppose C is a non-empty convex subset of ℝᴺ. Then the indicator function of C is defined as

    i_C(x) = 0 if x ∈ C, and +∞ if x ∉ C.

The p-norm (p ≥ 1) is defined as

    ‖x‖_p = ( |x1|^p + ... + |xN|^p )^(1/p).

The distance from x ∈ ℝᴺ to C is defined as

    D_C(x) = min over y ∈ C of ‖x − y‖.

If C is closed and convex, the projection of x ∈ ℝᴺ onto C is the unique point P_C x ∈ C such that D_C(x) = ‖x − P_C x‖.

The subdifferential of f at x is given by

    ∂f(x) = { u ∈ ℝᴺ : for every y ∈ ℝᴺ, (y − x)ᵀ u + f(x) ≤ f(y) }.

Projection onto convex sets (POCS)


One of the widely used convex optimization algorithms is POCS (projection onto convex sets). This algorithm is employed to recover or synthesize a signal satisfying several convex constraints simultaneously. Let f_i be the indicator function of the non-empty closed convex set C_i modeling a constraint. This reduces to the convex feasibility problem, which requires us to find a solution lying in the intersection of all the convex sets C_i. In the POCS method each set C_i is incorporated by its projection operator P_{C_i}, so in each iteration x is updated as

    x^{(k+1)} = P_{C_1} P_{C_2} ... P_{C_n} x^{(k)}.

However, beyond such problems projection operators are not appropriate and more general operators are required to tackle them. Among the various generalizations of the notion of a convex projection operator that exist, proximity operators are best suited for other purposes.

Definition
The proximity operator of the function f at x is defined via the minimization problem

    minimize over y ∈ ℝᴺ:  f(y) + ½ ‖x − y‖².

For every x ∈ ℝᴺ, this minimization problem admits a unique solution, which is denoted by prox_f(x).

The proximity operator of f is characterized by the inclusion

    p = prox_f(x)  ⇔  x − p ∈ ∂f(p).

If f is differentiable, then the above relation reduces to

    x − p = ∇f(p).
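For example, the proximity operator of f(x) = t‖x‖₁ has the closed form known as soft thresholding, the building block of the iterative shrinkage-thresholding algorithm mentioned above; a Python sketch with toy data of our own:

import numpy as np

def prox_l1(x, t):
    # proximity operator of t*||.||_1: soft thresholding
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# ISTA sketch for min 0.5*||Ax - b||^2 + lam*||x||_1 on random data
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
lam = 0.5

x = np.zeros(10)
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant of the smooth part
for _ in range(2000):
    x = prox_l1(x - step * A.T @ (A @ x - b), step * lam)

print(x)   # typically a sparse vector with several exact zeros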

Examples
Special instances of proximal gradient methods are
 Projected Landweber
 Alternating projection
 Alternating-direction method of multipliers

Landweber iteration
The Landweber iteration or Landweber algorithm is an algorithm to solve ill-posed linear inverse
problems, and it has been extended to solve non-linear problems that involve constraints. The
method was first proposed in the 1950s,[1] and it can now be viewed as a special case of many other
more general methods.[2]


Basic algorithm
The original Landweber algorithm [1] attempts to recover a signal x from measurements y. The linear version assumes that y = Ax for a linear operator A. When the problem is in finite dimensions, A is just a matrix.

When A is nonsingular, an explicit solution is x = A⁻¹y. However, if A is ill-conditioned, the explicit solution is a poor choice since it is sensitive to any errors made on y. If A is singular, this explicit solution doesn't even exist. The Landweber algorithm is an attempt to regularize the problem, and is one of the alternatives to Tikhonov regularization. We may view the Landweber algorithm as solving

    min over x of ½ ‖Ax − y‖²

using an iterative method. For ill-posed problems, the iterative method may be purposefully stopped before convergence.

The algorithm is given by the update

    x_{k+1} = x_k + ω Aᵀ(y − A x_k),

where the relaxation factor ω satisfies 0 < ω < 2/σ1(A)². Here σ1(A) is the largest singular value of A. If we write f(x) = ½ ‖Ax − y‖², then the update can be written in terms of the gradient,

    x_{k+1} = x_k − ω ∇f(x_k),

and hence the algorithm is a special case of gradient descent.

Discussion of the Landweber iteration as a regularization algorithm can be found in.[3][4]
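A NumPy sketch of the basic iteration on toy data (the data and iteration count are illustrative):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((15, 5))
y = rng.standard_normal(15)

omega = 1.0 / np.linalg.norm(A, 2) ** 2    # relaxation factor below 2/sigma_1(A)^2
x = np.zeros(5)
for _ in range(3000):
    x = x + omega * A.T @ (y - A @ x)      # Landweber update

# converges to the least-squares solution
print(np.allclose(x, np.linalg.lstsq(A, y, rcond=None)[0]))   # True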

Nonlinear extension
In general, the updates generated by x_{k+1} = x_k − τ ∇f(x_k) will produce a sequence (x_k) that converges to a minimizer of f whenever f is convex and the stepsize τ is chosen such that 0 < τ < 2/L, where L is the Lipschitz constant of ∇f (in the linear case above, the spectral norm of AᵀA). Since this is a special type of gradient descent, there currently is not much benefit to analyzing it on its own as the nonlinear Landweber, but such analysis was performed historically by many communities not aware of unifying frameworks.

The nonlinear Landweber problem has been studied in many papers in many communities; see, for example,.[5]

Extension to constrained problems


If f is a convex function and C is a convex set, then the problem

    min over x ∈ C of f(x)

can be solved by the constrained, nonlinear Landweber iteration, given by

    x_{k+1} = P_C( x_k − τ ∇f(x_k) ),

where P_C is the projection onto the set C. Convergence is guaranteed under the same kind of stepsize bound as above.[6] This is again a special case of projected gradient descent (which is a special case of the forward–backward algorithm) as discussed in.[2]

Applications
Since the method has been around since the 1950s, it has been adopted and rediscovered by many
scientific communities, especially those studying ill-posed problems. In X-ray computed
tomography it is called SIRT - simultaneous iterative reconstruction technique. It has also been used
in the computer vision community[7] and the signal restoration community.[8] It is also used in image
processing, since many image problems, such as deconvolution, are ill-posed. Variants of this
method have been used also in sparse approximation problems and compressed sensing settings.

Projections onto convex sets


In mathematics, projections onto convex sets (POCS), sometimes known as the alternating projection method, is a method to find a point in the intersection of two closed convex sets. It is a very simple algorithm and has been rediscovered many times.[1] The simplest case, when the sets are affine spaces, was analyzed by John von Neumann.[2][3] The case when the sets are affine spaces is special, since the iterates not only converge to a point in the intersection (assuming the intersection is non-empty) but in fact to the orthogonal projection of the initial iterate onto the intersection. For general closed convex sets, the limit point need not be the projection. Classical work on the case of two closed convex sets shows that the rate of convergence of the iterates is linear.[4][5] There are now extensions that consider cases when there are more than two sets, or when the sets are not convex,[6] or that give faster convergence rates. Analysis of POCS and related methods attempts to show that the algorithm converges (and if so, to find the rate of convergence), and whether it converges to the projection of the original point. These questions are largely settled for simple cases, but remain a topic of active research for the extensions. There are also variants of the algorithm, such as Dykstra's projection algorithm. See the references in the further reading section for an overview of the variants, extensions and applications of the POCS method; a good historical background can be found in section III of.[7]


Algorithm

Example on two circles

The POCS algorithm solves the following problem:

    find x ∈ C ∩ D,

where C and D are closed convex sets.

To use the POCS algorithm, one must know how to project onto the sets C and D separately. The algorithm starts with an arbitrary value for x0 and then generates the sequence

    x_{k+1} = P_C( P_D( x_k ) ).

The simplicity of the algorithm explains some of its popularity. If the intersection of C and D is non-empty, then the sequence generated by the algorithm will converge to some point in this intersection. Unlike Dykstra's projection algorithm, the solution need not be a projection onto the intersection of C and D.
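A concrete two-set illustration in Python, with the sets chosen by us: alternating projections between a disc and a half-plane.

import numpy as np

def proj_disc(x):          # projection onto C: the unit disc centered at the origin
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def proj_halfplane(x):     # projection onto D: the half-plane { (u, v) : u >= 0.5 }
    return np.array([max(x[0], 0.5), x[1]])

x = np.array([-3.0, 2.0])  # arbitrary starting point x0
for _ in range(100):
    x = proj_disc(proj_halfplane(x))

print(x)   # converges to a point in the intersection of C and D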

Related algorithms
Example of averaged projections variant

The method of averaged projections is quite similar. For the case of two closed convex sets C and D, it proceeds by

    x_{k+1} = ½ ( P_C(x_k) + P_D(x_k) ).

It has long been known to converge globally.[8] Furthermore, the method is easy to generalize to more than two sets; some convergence results for this case are in.[9]

The averaged projections method can be reformulated as the alternating projections method using a standard trick. Consider the set

    E = { (x, y) : x ∈ C, y ∈ D },

which is defined in the product space ℝⁿ × ℝⁿ. Then define another set, also in the product space:

    F = { (x, y) : x = y }.

Thus finding C ∩ D is equivalent to finding E ∩ F.

To find a point in E ∩ F, use the alternating projection method. The projection of a vector (x, y) onto the set F is given by ((x + y)/2, (x + y)/2), and the projection onto E is (P_C x, P_D y). Hence

    (x_{k+1}, y_{k+1}) = P_F( P_E( (x_k, y_k) ) ) = ( ½(P_C x_k + P_D y_k), ½(P_C x_k + P_D y_k) ).

Since x_{k+1} = y_{k+1}, and assuming x0 = y0, then x_k = y_k for all k, and hence we can simplify the iteration to

    x_{k+1} = ½ ( P_C(x_k) + P_D(x_k) ).
Augmented Lagrangian method
Augmented Lagrangian methods are a certain class of algorithms for
solving constrained optimization problems. They have similarities to penalty methods in that they
replace a constrained optimization problem by a series of unconstrained problems and add a penalty
term to the objective; the difference is that the augmented Lagrangian method adds yet another
term, designed to mimic a Lagrange multiplier. The augmented Lagrangian is not the same as
the method of Lagrange multipliers.

Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an
additional penalty term (the augmentation).

The method was originally known as the method of multipliers, and was studied much in the 1970s and 1980s as a good alternative to penalty methods. It was first discussed by Magnus Hestenes in 1969[1] and by Powell in 1969.[2] The method was studied by R. Tyrrell Rockafellar in relation to Fenchel duality, particularly in relation to proximal-point methods, Moreau–Yosida regularization, and maximal monotone operators; these methods were used in structural optimization. The method was also studied by Dimitri Bertsekas, notably in his 1982 book,[3] together with extensions involving nonquadratic regularization functions, such as entropic regularization, which gives rise to the "exponential method of multipliers," a method that handles inequality constraints with a twice differentiable augmented Lagrangian function.

Since the 1970s, sequential quadratic programming (SQP) and interior point methods (IPM) have had increasing attention, in part because they more easily use sparse matrix subroutines from numerical software libraries, and in part because IPMs have proven complexity results via the theory of self-concordant functions. The augmented Lagrangian method was rejuvenated by the optimization systems LANCELOT and AMPL, which allowed sparse matrix techniques to be used on seemingly dense but "partially separable" problems. The method is still useful for some problems.[4] Around 2007, there was a resurgence of augmented Lagrangian methods in fields such as total-variation denoising and compressed sensing. In particular, a variant of the standard augmented Lagrangian method that uses partial updates (similar to the Gauss–Seidel method for solving linear equations), known as the alternating direction method of multipliers or ADMM, gained some attention.


General method
Let us say we are solving the following constrained problem:

    min over x of f(x)
    subject to    c_i(x) = 0 for all i ∈ I.

This problem can be solved as a series of unconstrained minimization problems. For reference, we first list the penalty method approach:

    min over x of Φ_k(x) = f(x) + μ_k Σ_{i ∈ I} c_i(x)².

The penalty method solves this problem, then at the next iteration it re-solves the problem using a larger value of μ_k (and using the old solution as the initial guess or "warm start").

The augmented Lagrangian method uses the following unconstrained objective:

    min over x of Φ_k(x) = f(x) + (μ_k/2) Σ_{i ∈ I} c_i(x)² + Σ_{i ∈ I} λ_i c_i(x),

and after each iteration, in addition to updating μ_k, the variable λ is also updated according to the rule

    λ_i ← λ_i + μ_k c_i(x_k),

where x_k is the solution to the unconstrained problem at the kth step, i.e. x_k = argmin over x of Φ_k(x).

The variable λ is an estimate of the Lagrange multiplier, and the accuracy of this estimate improves at every step. The major advantage of the method is that unlike the penalty method, it is not necessary to take μ_k → ∞ in order to solve the original constrained problem. Instead, because of the presence of the Lagrange multiplier term, μ_k can stay much smaller.

The method can be extended to handle inequality constraints. For a discussion of practical improvements, see.[4]
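A minimal sketch of the method on a toy equality-constrained problem (minimize x² + y² subject to x + y = 1); the inner solver and penalty schedule are our own choices:

import numpy as np
from scipy.optimize import minimize

f = lambda v: v[0] ** 2 + v[1] ** 2        # objective
c = lambda v: v[0] + v[1] - 1.0            # equality constraint c(v) = 0

lam, mu = 0.0, 1.0
v = np.zeros(2)
for _ in range(10):
    # solve the unconstrained augmented Lagrangian subproblem at this (lam, mu)
    aug = lambda w: f(w) + lam * c(w) + 0.5 * mu * c(w) ** 2
    v = minimize(aug, v).x
    lam += mu * c(v)                       # multiplier update
    mu *= 2.0                              # penalty may grow, but need not go to infinity

print(v)   # approximately [0.5, 0.5], the constrained minimizer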

Comparison with penalty methods

In [4] it is suggested that the augmented Lagrangian method is generally preferred to the quadratic penalty method, since there is little extra computational cost and the parameter μ need not go to infinity, thus avoiding ill-conditioning.
Alternating direction method of multipliers
The alternating direction method of multipliers (ADMM) is a variant of the augmented Lagrangian scheme that uses partial updates for the dual variables. This method is often applied to solve problems such as

    min over x of f(x) + g(x).

This is equivalent to the constrained problem

    min over x, y of f(x) + g(y)   subject to   x = y.

Though this change may seem trivial, the problem can now be attacked using methods of constrained optimization (in particular, the augmented Lagrangian method), and the objective function is separable in x and y. The dual update requires solving a proximity function in x and y at the same time; the ADMM technique allows this problem to be solved approximately by first solving for x with y fixed, and then solving for y with x fixed. Rather than iterating until convergence (like the Jacobi method), the algorithm proceeds directly to updating the dual variable and then repeating the process. This is not equivalent to the exact minimization, but surprisingly, it can still be shown that this method converges to the right answer (under some assumptions). Because of this approximation, the algorithm is distinct from the pure augmented Lagrangian method.
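A compact sketch of ADMM on the lasso problem min ½‖Ax − b‖² + λ‖z‖₁ subject to x = z, a standard showcase; the data are our own:

import numpy as np

def soft(v, t):
    # soft thresholding: proximal operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 15))
b = rng.standard_normal(40)
lam, rho = 0.5, 1.0

x = np.zeros(15)
z = np.zeros(15)
u = np.zeros(15)                               # scaled dual variable
M = np.linalg.inv(A.T @ A + rho * np.eye(15))  # pre-factored x-update system
for _ in range(500):
    x = M @ (A.T @ b + rho * (z - u))          # x-update: quadratic subproblem
    z = soft(x + u, lam / rho)                 # z-update: proximal step on the l1 term
    u = u + x - z                              # dual update

print(z)   # sparse lasso solution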

The ADMM can be viewed as an application of the Douglas-Rachford splitting algorithm, and the
Douglas-Rachford algorithm is in turn an instance of the Proximal point algorithm; details can be
found here.[5] There are several modern software packages that solve basis pursuit and variants and use the ADMM; such packages include YALL1 (2009), SpaRSA (2009) and SALSA (2009).

Stochastic Optimization
Stochastic optimization considers the problem of minimizing a loss function with access to noisy
samples of (gradient of) the function. The goal is to have an estimate of the optimal parameter
(minimizer) per new sample. ADMM is originally a batch method. However, with some modifications
it can also be used for stochastic optimization. Since in the stochastic setting we only have access to noisy samples of the gradient, we use an inexact approximation of the Lagrangian, built from the sampled gradient with a time-varying step size η_{k+1}.[6]

The alternating direction method of multipliers (ADMM) is a popular method for online and distributed
optimization on a large scale,[7] and is employed in many applications, e.g.[8][9][10] ADMM is often
applied to solve regularized problems, where the function optimization and regularization can be
carried out locally, and then coordinated globally via constraints. Regularized optimization problems
are especially relevant in the high dimensional regime since regularization is a natural mechanism to
overcome ill-posedness and to encourage parsimony in the optimal solution, e.g., sparsity and low
rank. Due to the efficiency of ADMM in solving regularized problems, it has a good potential for
stochastic optimization in high dimensions. However, conventional stochastic ADMM methods suffer
from curse of dimensionality. Their convergence rate is proportional to square of the dimension and
in practice they scale poorly. See figure REASON vs Stochastic ADMM

Recently, a general framework has been proposed for stochastic optimization in high-dimensions
that solves this bottleneck by adding simple and cheap modifications to ADMM.,[11][12] The method is
called REASON (Regularized Epoch-based Admm for Stochastic Optimization in high-dimensioN).
The modifications are in terms of added projection which goes a long way and results in logarithmic
dimension dependency. REASON can be performed on any regularized optimization with any
number of regularizers. The specific cases of sparse optimization framework and noisy
decomposition framework are discussed further. In both cases, REASON obtains minimax optimal
convergence rate. REASON provides the first online guarantees for noisy matrix decomposition.
Experiment results show that in aforementioned cases, REASON outperforms state-of-the-art.

Convex optimization
Convex minimization, a subfield of optimization, studies the problem of minimizing convex functions over convex sets. The convexity property can make optimization in some sense "easier" than the general case; for example, any local minimum must be a global minimum.

Given a real vector space X together with a convex, real-valued function

    f : C → ℝ

defined on a convex subset C of X, the problem is to find any point x* in C for which the number f(x) is smallest, i.e., a point x* such that

    f(x*) ≤ f(x) for all x ∈ C.

The convexity of f makes the powerful tools of convex analysis applicable. In finite-dimensional normed spaces, the Hahn–Banach theorem and the existence of subgradients lead to a particularly satisfying theory of necessary and sufficient conditions for optimality, a duality theory generalizing that for linear programming, and effective computational methods.

Convex minimization has applications in a wide range of disciplines, such as automatic control systems, estimation and signal processing, communications and networks, electronic circuit design, data analysis and modeling, statistics (optimal design), and finance. With recent improvements in computing and in optimization theory, convex minimization is nearly as straightforward as linear programming. Many optimization problems can be reformulated as convex minimization problems. For example, the problem of maximizing a concave function f can be re-formulated equivalently as the problem of minimizing the function −f, which is convex.


Convex optimization problem

An optimization problem (also referred to as a mathematical programming problem or minimization problem) of finding some x* ∈ C such that

    f(x*) = min { f(x) : x ∈ C },

where C ⊆ ℝⁿ is the feasible set and f : ℝⁿ → ℝ is the objective, is called convex if C is a closed convex set and f is convex on ℝⁿ.[1][2]

Alternatively, an optimization problem of the form

    minimize f(x)
    subject to g_i(x) ≤ 0,  i = 1, ..., m,

is called convex if the functions f, g_1, ..., g_m : ℝⁿ → ℝ are all convex.[3]

Theory
The following statements are true about the convex minimization problem:

 if a local minimum exists, then it is a global minimum.


 the set of all (global) minima is convex.
 for each strictly convex function, if the function has a minimum, then the
minimum is unique.

These results are used by the theory of convex minimization along with geometric notions from functional analysis (in Hilbert spaces) such as the Hilbert projection theorem, the separating hyperplane theorem, and Farkas' lemma.

Standard form
Standard form is the usual and most intuitive form of describing a convex minimization problem. It
consists of the following three parts:

 A convex function f(x) : ℝⁿ → ℝ to be minimized over the variable x
 Inequality constraints of the form gᵢ(x) ≤ 0, where the functions gᵢ are convex
 Equality constraints of the form hᵢ(x) = 0, where the functions hᵢ are affine. In practice, the terms "linear" and "affine" are often used interchangeably. Such constraints can be expressed in the form hᵢ(x) = aᵢᵀx + bᵢ, where aᵢ is a column vector and bᵢ a real number.

A convex minimization problem is thus written as

minimize f(x) subject to gᵢ(x) ≤ 0, i = 1, …, m, and hᵢ(x) = 0, i = 1, …, p.

Note that every equality constraint h(x) = 0 can be equivalently replaced by a pair of inequality constraints h(x) ≤ 0 and −h(x) ≤ 0. Therefore, for theoretical purposes, equality constraints are redundant; however, it can be beneficial to treat them specially in practice.

Following from this fact, it is easy to understand why hᵢ(x) = 0 has to be affine as opposed to merely being convex. If hᵢ is convex, the constraint hᵢ(x) ≤ 0 describes a convex set, but the constraint hᵢ(x) ≥ 0 describes a concave one. Therefore, the only way for hᵢ(x) = 0 to describe a convex set is for hᵢ to be affine.
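As an illustrative sketch (not part of the original article), the standard form maps almost one-to-one onto modelling libraries such as cvxpy; the matrix A, vector b and the constraints below are made-up example data:

    # Requires the cvxpy package; A, b and the constraints are hypothetical.
    import cvxpy as cp
    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    b = np.array([1.0, 2.0])
    x = cp.Variable(2)

    objective = cp.Minimize(cp.sum_squares(A @ x - b))   # convex f(x)
    constraints = [x >= 0,          # inequality constraints g_i(x) <= 0
                   cp.sum(x) == 1]  # affine equality constraint h(x) = 0
    cp.Problem(objective, constraints).solve()
    print(x.value)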

Examples
The following problems are all convex minimization problems, or can be transformed into convex minimization problems via a change of variables:

 Least squares
 Linear programming
 Convex quadratic minimization with linear constraints
 Quadratic minimization with convex quadratic constraints
 Conic optimization
 Geometric programming
 Second-order cone programming
 Semidefinite programming
 Entropy maximization with appropriate constraints

Lagrange multipliers
Consider a convex minimization problem given in standard form by a cost function f(x) and inequality constraints gᵢ(x) ≤ 0 for 1 ≤ i ≤ m. Then the domain X is:

X = {x : g₁(x) ≤ 0, …, gₘ(x) ≤ 0}.

The Lagrangian function for the problem is

L(x, λ₀, …, λₘ) = λ₀f(x) + λ₁g₁(x) + … + λₘgₘ(x).

For each point x in X that minimizes f over X, there exist real numbers λ0, ..., λm, called Lagrange
multipliers, that satisfy these conditions simultaneously:

1. x minimizes L(y, λ₀, λ₁, …, λₘ) over all y in X,
2. λ₀ ≥ 0, λ₁ ≥ 0, …, λₘ ≥ 0, with at least one λₖ > 0,
3. λ₁g₁(x) = 0, …, λₘgₘ(x) = 0 (complementary slackness).

If there exists a "strictly feasible point", i.e., a point z satisfying

g₁(z) < 0, …, gₘ(z) < 0,

then the statement above can be upgraded to assert that λ₀ = 1.

Conversely, if some x in X satisfies conditions 1–3 for scalars λ₀, …, λₘ with λ₀ = 1, then x is certain to minimize f over X.
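A one-dimensional worked example (added for concreteness; not part of the original text): minimize f(x) = x² subject to g₁(x) = 1 − x ≤ 0. With λ₀ = 1, the Lagrangian is

    L(x, \lambda_1) = x^2 + \lambda_1 (1 - x).

Stationarity in x gives 2x − λ₁ = 0, and complementary slackness gives λ₁(1 − x) = 0. Taking λ₁ = 0 would force x = 0, which is infeasible (g₁(0) = 1 > 0); hence λ₁ > 0, so x = 1 and λ₁ = 2, and indeed x = 1 minimizes f over the feasible set [1, ∞).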

Methods
Convex minimization problems can be solved by the following contemporary methods:[4]

 "Bundle methods" (Wolfe, Lemaréchal, Kiwiel),
 Subgradient projection methods (Polyak), and
 Interior-point methods (Nemirovskii and Nesterov).

Other methods of interest:

 Cutting-plane methods
 Ellipsoid method
 Subgradient method
 Dual subgradients and the drift-plus-penalty method

Subgradient methods can be implemented simply and so are widely used.[5] Dual subgradient methods are subgradient methods applied to a dual problem. The drift-plus-penalty method is similar to the dual subgradient method, but takes a time average of the primal variables.
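To make the iteration concrete, here is a minimal sketch (added in this rewrite; the objective is a made-up example) of a subgradient method with the classical divergent-series step size 1/k:

    # Subgradient method for the nondifferentiable convex f(x) = |x - 3|.
    def subgradient_method(x0=0.0, iters=1000):
        x = x0
        for k in range(1, iters + 1):
            g = 1.0 if x > 3 else (-1.0 if x < 3 else 0.0)  # a subgradient at x
            x -= (1.0 / k) * g   # step sizes sum to infinity but tend to zero
        return x

    print(subgradient_method())  # approaches the minimizer x = 3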

Convex minimization with good complexity: Self-concordant barriers
The efficiency of iterative methods is poor for the class of convex problems as a whole, because this class includes "bad guys" whose minimum cannot be approximated without a large number of function and subgradient evaluations;[6] thus, to have practically appealing efficiency results, it is necessary to make additional restrictions on the class of problems. Two such classes are problems with special barrier functions: first, self-concordant barrier functions, according to the theory of Nesterov and Nemirovskii, and second, self-regular barrier functions, according to the theory of Terlaky and coauthors.

Quasiconvex minimization
Problems with convex level sets can be efficiently minimized, in theory. Yurii Nesterov proved that
quasi-convex minimization problems could be solved efficiently, and his results were extended by
Kiwiel.[7] However, such theoretically "efficient" methods use "divergent-series" stepsize rules, which
were first developed for classical subgradient methods. Classical subgradient methods using
divergent-series rules are much slower than modern methods of convex minimization, such as
subgradient projection methods, bundle methods of descent, and nonsmooth filter methods.

Solving even close-to-convex but non-convex problems can be computationally intractable. Minimizing a unimodal function is intractable, regardless of the smoothness of the function, according to results of Ivanov.[8]

Convex maximization
Conventionally, the definition of the convex optimization problem (we recall) requires that the objective function f to be minimized and the feasible set be convex. In the special case of linear programming (LP), the objective function is both concave and convex, and so LP can also consider the problem of maximizing an objective function without confusion. However, for most convex minimization problems, the objective function is not concave, and therefore such problems are formulated in the standard form of convex optimization problems, that is, minimizing the convex objective function.
For nonlinear convex minimization, the associated maximization problem obtained by substituting
the supremum operator for the infimum operator is not a problem of convex optimization, as
conventionally defined. However, it is studied in the larger field of convex optimization as a problem
of convex maximization.[9]

The convex maximization problem is especially important for studying the existence of maxima.
Consider the restriction of a convex function to a compact convex set: Then, on that set, the function
attains its constrained maximum only on the boundary.[10] Such results, called "maximum principles",
are useful in the theory of harmonic functions, potential theory, and partial differential equations.

The problem of minimizing a quadratic multivariate polynomial on a cube is NP-hard.[11] In fact, the quadratic minimization problem is NP-hard even when the matrix has only one negative eigenvalue.[12]

Extensions
Advanced treatments consider convex functions that can attain positive infinity as well; the indicator function of convex analysis is zero on its underlying convex set and positive infinity otherwise.

Extensions of convex functions include biconvex, pseudo-convex, and quasi-convex functions.

Partial extensions of the theory of convex analysis and iterative methods for approximately solving non-convex minimization problems occur in the field of generalized convexity ("abstract convex analysis").

Optimization problem
In mathematics and computer science, an optimization problem is the problem of finding the best solution from all feasible solutions. Optimization problems can be divided into two categories depending on whether the variables are continuous or discrete. An optimization problem with discrete variables is known as a combinatorial optimization problem. In a combinatorial optimization problem, we are looking for an object such as an integer, permutation or graph from a finite (or possibly countably infinite) set.

Contents

 1 Continuous optimization problem
 2 Combinatorial optimization problem
   o 2.1 NP optimization problem
 3 References

Continuous optimization problem

The standard form of a (continuous) optimization problem is[1]

minimize f(x) subject to gᵢ(x) ≤ 0, i = 1, …, m, and hⱼ(x) = 0, j = 1, …, p,

where

 f(x) : ℝⁿ → ℝ is the objective function to be minimized over the variable x,
 gᵢ(x) ≤ 0 are called inequality constraints, and
 hⱼ(x) = 0 are called equality constraints.

By convention, the standard form defines a minimization problem. A maximization problem can be treated by negating the objective function.

Combinatorial optimization problem

Formally, a combinatorial optimization problem A is a quadruple (I, f, m, g), where

 I is a set of instances;
 given an instance x ∈ I, f(x) is the set of feasible solutions;
 given an instance x and a feasible solution y of x, m(x, y) denotes the measure of y, which is usually a positive real.
 g is the goal function, and is either min or max.

The goal is then to find for some instance x an optimal solution, that is, a feasible solution y with

m(x, y) = g {m(x, y′) : y′ ∈ f(x)}.

For each combinatorial optimization problem, there is a corresponding decision problem that asks whether there is a feasible solution for some particular measure m₀. For example, if there is a graph G which contains vertices u and v, an optimization problem might be "find a path from u to v that uses the fewest edges". This problem might have an answer of, say, 4. A corresponding decision problem would be "is there a path from u to v that uses 10 or fewer edges?" This problem can be answered with a simple 'yes' or 'no'.

In the field of approximation algorithms, algorithms are designed to find near-optimal solutions to
hard problems. The usual decision version is then an inadequate definition of the problem since
it only specifies acceptable solutions. Even though we could introduce suitable decision
problems, the problem is more naturally characterized as an optimization problem.[2]

NP optimization problem
An NP-optimization problem (NPO) is a combinatorial optimization problem with the following additional conditions.[3] Note that the polynomials referred to below are functions of the size of the respective functions' inputs, not the size of some implicit set of input instances.

 the size of every feasible solution y ∈ f(x) is polynomially bounded in the size of the given instance x,
 the languages {x : x ∈ I} and {(x, y) : y ∈ f(x)} can be recognized in polynomial time, and
 m is polynomial-time computable.

This implies that the corresponding decision problem is in NP. In computer science, interesting optimization problems usually have the above properties and are therefore NPO problems. A problem is additionally called a P-optimization (PO) problem if there exists an algorithm which finds optimal solutions in polynomial time. Often, when dealing with the class NPO, one is interested in optimization problems for which the decision versions are NP-complete. Note that hardness relations are always with respect to some reduction. Due to the connection between approximation algorithms and computational optimization problems, reductions which preserve approximation in some respect are preferred for this subject over the usual Turing and Karp reductions. An example of such a reduction is the L-reduction. For this reason, optimization problems with NP-complete decision versions are not necessarily called NPO-complete.[4]

NPO is divided into the following subclasses according to their approximability:[3]

 NPO(I): Equals FPTAS. Contains the Knapsack problem.
 NPO(II): Equals PTAS. Contains the Makespan scheduling problem.
 NPO(III): The class of NPO problems that have polynomial-time algorithms which compute solutions with a cost at most c times the optimal cost (for minimization problems) or at least 1/c of the optimal cost (for maximization problems). In Hromkovič's book, excluded from this class are all NPO(II) problems unless P = NP. Without the exclusion, equals APX. Contains MAX-SAT and metric TSP.
 NPO(IV): The class of NPO problems with polynomial-time algorithms approximating the optimal solution by a ratio that is polynomial in a logarithm of the size of the input. In Hromkovič's book, all NPO(III) problems are excluded from this class unless P = NP. Contains the set cover problem.
 NPO(V): The class of NPO problems with polynomial-time algorithms approximating the optimal solution by a ratio bounded by some function of n. In Hromkovič's book, all NPO(IV) problems are excluded from this class unless P = NP. Contains the TSP and Max Clique problems.

Another class of interest is NPOPB, NPO with polynomially bounded cost functions. Problems with
this condition have many desirable properties.
Proximal gradient methods for learning
Proximal gradient (forward-backward splitting) methods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is ℓ₁ regularization (also known as Lasso) of the form

min over w ∈ ℝᵈ of (1/2n) Σᵢ (yᵢ − ⟨w, xᵢ⟩)² + λ‖w‖₁.

Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application.[1][2] Such customized penalties can help to induce certain structure in problem solutions, such as sparsity (in the case of lasso) or group structure (in the case of group lasso).

Contents

 1 Relevant background
o 1.1 Moreau decomposition
 2 Lasso regularization
o 2.1 Solving for proximity operator
o 2.2 Fixed point iterative schemes
 3 Practical considerations
o 3.1 Adaptive step size
o 3.2 Elastic net (mixed norm regularization)
 4 Exploiting group structure
o 4.1 Group lasso
o 4.2 Other group structures

Relevant background
Proximal gradient methods are applicable in a wide variety of scenarios for solving convex optimization problems of the form

min over x ∈ H of F(x) + R(x),

where F is convex and differentiable with Lipschitz continuous gradient, R is a convex, lower semicontinuous function which is possibly nondifferentiable, and H is some set, typically a Hilbert space. The usual criterion, that x minimizes F(x) + R(x) if and only if ∇(F + R)(x) = 0 in the convex, differentiable setting, is now replaced by

0 ∈ ∂(F + R)(x),

where ∂φ denotes the subdifferential of a real-valued, convex function φ.

Given a convex function R : H → ℝ ∪ {+∞}, an important operator to consider is its proximity operator prox_R : H → H defined by

prox_R(x) = argmin over u ∈ H of R(u) + (1/2)‖u − x‖²₂,

which is well-defined because of the strict convexity of the ℓ₂ norm. The proximity operator can be seen as a generalization of a projection.[1][3][4] We see that the proximity operator is important because x* is a minimizer to the problem min F(x) + R(x) if and only if

x* = prox_{γR}(x* − γ∇F(x*)), where γ > 0 is any positive real number.[1]


Moreau decomposition
One important technique related to proximal gradient methods is the Moreau decomposition, which decomposes the identity operator as the sum of two proximity operators.[1] Namely, let φ : H → ℝ be a lower semicontinuous, convex function on a vector space H. We define its Fenchel conjugate φ* : H → ℝ to be the function

φ*(u) = sup over x of (⟨u, x⟩ − φ(x)).

The general form of Moreau's decomposition states that for any x ∈ H and any γ > 0,

x = prox_{γφ}(x) + γ prox_{φ*/γ}(x/γ),

which for γ = 1 implies that x = prox_φ(x) + prox_{φ*}(x).[1][3] The Moreau decomposition can be seen to be a generalization of the usual orthogonal decomposition of a vector space, analogous with the fact that proximity operators are generalizations of projections.[1]

In certain situations it may be easier to compute the proximity operator for the conjugate φ* instead of the function φ, and therefore the Moreau decomposition can be applied. This is the case for group lasso.
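A concrete instance (an illustration added in this rewrite): take φ = ‖·‖₁, whose Fenchel conjugate is the indicator function of the unit ℓ∞ ball; the proximity operator of an indicator function is the projection onto its set, so Moreau's decomposition with γ = 1 gives

    \operatorname{prox}_{\|\cdot\|_1}(x) = x - \operatorname{proj}_{\{u : \|u\|_\infty \le 1\}}(x),

i.e. each entry of x is pulled toward zero by at most 1, which is exactly entrywise soft thresholding with threshold 1.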

Lasso regularization
Consider the regularized empirical risk minimization problem with square loss and with the ℓ₁ norm as the regularization penalty:

min over w ∈ ℝᵈ of (1/2n) Σᵢ (yᵢ − ⟨w, xᵢ⟩)² + λ‖w‖₁,

where xᵢ ∈ ℝᵈ and yᵢ ∈ ℝ. The ℓ₁ regularization problem is sometimes referred to as lasso (least absolute shrinkage and selection operator).[5] Such ℓ₁ regularization problems are interesting because they induce sparse solutions, that is, solutions w to the minimization problem have relatively few nonzero components. Lasso can be seen to be a convex relaxation of the non-convex problem

min over w ∈ ℝᵈ of (1/2n) Σᵢ (yᵢ − ⟨w, xᵢ⟩)² + λ‖w‖₀,

where ‖w‖₀ denotes the ℓ₀ "norm", which is the number of nonzero entries of the vector w. Sparse solutions are of particular interest in learning theory for interpretability of results: a sparse solution can identify a small number of important factors.[5]

Solving for the ℓ₁ proximity operator

For simplicity we restrict our attention to the problem where λ = 1. To solve the problem

min over w ∈ ℝᵈ of (1/2n) Σᵢ (yᵢ − ⟨w, xᵢ⟩)² + ‖w‖₁,

we consider our objective function in two parts: a convex, differentiable term F(w) = (1/2n) Σᵢ (yᵢ − ⟨w, xᵢ⟩)² and a convex function R(w) = ‖w‖₁. Note that R is not strictly convex.

Let us compute the proximity operator for R(w) = ‖w‖₁. First we find an alternative characterization of the proximity operator prox_R(x) as follows:

u = prox_R(x)  ⇔  0 ∈ ∂(R(u) + (1/2)‖u − x‖²₂)  ⇔  x − u ∈ ∂R(u).

For R(w) = ‖w‖₁ it is easy to compute prox_R(x): the ith entry of prox_R(x) is precisely

xᵢ − 1 if xᵢ > 1,  0 if |xᵢ| ≤ 1,  xᵢ + 1 if xᵢ < −1.

Using the recharacterization of the proximity operator given above, for the choice of R(w) = λ‖w‖₁ and γ > 0, prox_{γR}(x) is defined entrywise by

(prox_{γλ‖·‖₁}(x))ᵢ = xᵢ − γλ if xᵢ > γλ,  0 if |xᵢ| ≤ γλ,  xᵢ + γλ if xᵢ < −γλ,

which is known as the soft thresholding operator S_{γλ}(x).[1][6]
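In code, soft thresholding is a one-liner; the following helper (an illustrative addition, assuming NumPy) is reused in the sketches below:

    import numpy as np

    def soft_threshold(x, t):
        """Entrywise soft thresholding S_t(x): shrink each entry toward 0 by t."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)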

Fixed point iterative schemes

To finally solve the lasso problem we consider the fixed point equation shown earlier:

x* = prox_{γR}(x* − γ∇F(x*)).

Given that we have computed the form of the proximity operator explicitly, we can define a standard fixed point iteration procedure. Namely, fix some initial w⁰ ∈ ℝᵈ, and for k = 0, 1, 2, … define

wᵏ⁺¹ = S_{γλ}(wᵏ − γ∇F(wᵏ)).

Note here the effective trade-off between the empirical error term F(w) and the regularization penalty R(w). This fixed point method has decoupled the effect of the two different convex functions which comprise the objective function into a gradient descent step (wᵏ − γ∇F(wᵏ)) and a soft thresholding step (via S_{γλ}).
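Assembled into code, this is the classical iterative soft-thresholding scheme. The sketch below (an illustration of this rewrite, with assumed data shapes X ∈ ℝ^{n×d} and y ∈ ℝⁿ) uses the least-squares gradient ∇F(w) = Xᵀ(Xw − y)/n and the soft_threshold helper above:

    import numpy as np

    def ista(X, y, lam, gamma, iters=500):
        """Fixed point (proximal gradient) iteration for the lasso."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(iters):
            grad = X.T @ (X @ w - y) / n                       # gradient step
            w = soft_threshold(w - gamma * grad, gamma * lam)  # prox step
        return w

A safe constant step size under these assumptions is gamma = n / ‖X‖₂², the reciprocal of the Lipschitz constant of ∇F.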

Convergence of this fixed point scheme is well-studied in the literature[1][6] and is guaranteed under appropriate choice of step size γ and loss function (such as the square loss taken here). Accelerated methods were introduced by Nesterov in 1983 which improve the rate of convergence under certain regularity assumptions on F.[7] Such methods have been studied extensively in recent years.[8] For more general learning problems where the proximity operator cannot be computed explicitly for some regularization term R, such fixed point schemes can still be carried out using approximations to both the gradient and the proximity operator.[4][9]

Practical considerations
There have been numerous developments within the past decade in convex optimization techniques
which have influenced the application of proximal gradient methods in statistical learning theory.
Here we survey a few important topics which can greatly improve practical algorithmic performance
of these methods.[2][10]

Adaptive step size

In the fixed point iteration scheme

wᵏ⁺¹ = prox_{γₖR}(wᵏ − γₖ∇F(wᵏ)),

one can allow a variable step size γₖ instead of a constant γ. Numerous adaptive step size schemes have been proposed throughout the literature.[1][4][11][12] Applications of these schemes[2][13] suggest that they can offer substantial improvement in the number of iterations required for fixed point convergence.

Elastic net (mixed norm regularization)

Elastic net regularization offers an alternative to pure ℓ₁ regularization. The problem of lasso (ℓ₁) regularization involves the penalty term λ‖w‖₁, which is not strictly convex. Hence, solutions to min over w of F(w) + λ‖w‖₁, where F is some empirical loss function, need not be unique. This is often avoided by the inclusion of an additional strictly convex term, such as an ℓ₂ norm regularization penalty. For example, one can consider the problem

min over w ∈ ℝᵈ of (1/2n) Σᵢ (yᵢ − ⟨w, xᵢ⟩)² + λ‖w‖₁ + (ε/2)‖w‖²₂,

where ε > 0. For ε > 0 the penalty term λ‖w‖₁ + (ε/2)‖w‖²₂ is now strictly convex, and hence the minimization problem now admits a unique solution. It has been observed that for sufficiently small ε > 0, the additional penalty term (ε/2)‖w‖²₂ acts as a preconditioner and can substantially improve convergence while not adversely affecting the sparsity of solutions.[2][14]
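The elastic-net penalty still has a closed-form proximity operator, so the same fixed point scheme applies. Under the parametrization above (ℓ₁ weight λ, ℓ₂ weight ε), a short sketch reusing the soft_threshold helper:

    def elastic_net_prox(x, t, lam, eps):
        """Proximity operator of t*(lam*||.||_1 + (eps/2)*||.||_2^2):
        soft thresholding followed by a uniform shrinkage."""
        return soft_threshold(x, t * lam) / (1.0 + t * eps)

This follows from the prox optimality condition: each entry solves a one-dimensional strictly convex problem whose solution is the soft-thresholded value scaled by 1/(1 + tε).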

Exploiting group structure

Proximal gradient methods provide a general framework which is applicable to a wide variety of problems in statistical learning theory. Certain problems in learning can often involve data which has additional structure that is known a priori. In the past several years there have been new developments which incorporate information about group structure to provide methods which are tailored to different applications. Here we survey a few such methods.

Group lasso
Group lasso is a generalization of the lasso method when features are grouped into disjoint blocks.[15] Suppose the features are grouped into blocks {w₁, …, w_G}. Here we take as a regularization penalty

R(w) = Σ_{g=1}^{G} ‖w_g‖₂,

which is the sum of the ℓ₂ norm on corresponding feature vectors for the different groups. A similar proximity operator analysis as above can be used to compute the proximity operator for this penalty. Where the lasso penalty has a proximity operator which is soft thresholding on each individual component, the proximity operator for the group lasso is soft thresholding on each group. For the group w_g, the proximity operator of λ Σ_{g=1}^{G} ‖w_g‖₂ is given by

S̃_λ(w_g) = (1 − λ/‖w_g‖₂)₊ w_g,

where w_g is the gth group.

In contrast to lasso, the derivation of the proximity operator for group lasso relies on the Moreau decomposition. Here the proximity operator of the conjugate of the group lasso penalty becomes a projection onto the ball of a dual norm.
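A sketch of the resulting block operator (an illustrative addition, assuming NumPy; groups is a list of index arrays, one per block):

    import numpy as np

    def group_soft_threshold(w, groups, t):
        """Block soft thresholding: shrink each group's l2 norm by t,
        zeroing the group when its norm is at most t."""
        out = np.asarray(w, dtype=float).copy()
        for idx in groups:
            norm = np.linalg.norm(out[idx])
            out[idx] *= max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        return out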

Nonlinear conjugate gradient method

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function

f(x) = ‖Ax − b‖²,

the minimum of f is obtained when the gradient is 0:

∇ₓf = 2Aᵀ(Ax − b) = 0.

Whereas linear conjugate gradient seeks a solution to the linear equation AᵀAx = Aᵀb, the nonlinear conjugate gradient method is generally used to find the local minimum of a nonlinear function using its gradient ∇ₓf alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum.

Given a function f(x) of N variables to minimize, its gradient ∇ₓf indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:

Δx₀ = −∇ₓf(x₀),

with an adjustable step length α, and performs a line search in this direction until it reaches the minimum of f:

α₀ := argmin over α of f(x₀ + αΔx₀),   x₁ = x₀ + α₀Δx₀.

After this first iteration in the steepest direction Δx₀, the following steps constitute one iteration of moving along a subsequent conjugate direction sₙ, where s₀ = Δx₀:

1. Calculate the steepest direction: Δxₙ = −∇ₓf(xₙ),
2. Compute βₙ according to one of the formulas below,
3. Update the conjugate direction: sₙ = Δxₙ + βₙsₙ₋₁,
4. Perform a line search: optimize αₙ = argmin over α of f(xₙ + αsₙ),
5. Update the position: xₙ₊₁ = xₙ + αₙsₙ.

With a pure quadratic function the minimum is reached within N iterations (excepting
roundoff error), but a non-quadratic function will make slower progress. Subsequent
search directions lose conjugacy requiring the search direction to be reset to the
steepest descent direction at least every N iterations, or sooner if progress stops.
However, resetting every iteration turns the method into steepest descent. The
algorithm stops when it finds the minimum, determined when no progress is made
after a direction reset (i.e. in the steepest descent direction), or when some
tolerance criterion is reached.

Within a linear approximation, the parameters α and β are the same as in the linear conjugate gradient method but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys, where the steepest descent method slows down and follows a criss-cross pattern.

Four of the best known formulas for βₙ are named after their developers:

 Fletcher–Reeves: βₙ^{FR} = (ΔxₙᵀΔxₙ) / (Δxₙ₋₁ᵀΔxₙ₋₁),
 Polak–Ribière: βₙ^{PR} = (Δxₙᵀ(Δxₙ − Δxₙ₋₁)) / (Δxₙ₋₁ᵀΔxₙ₋₁),
 Hestenes–Stiefel: βₙ^{HS} = (Δxₙᵀ(Δxₙ − Δxₙ₋₁)) / (−sₙ₋₁ᵀ(Δxₙ − Δxₙ₋₁)),
 Dai–Yuan: βₙ^{DY} = (ΔxₙᵀΔxₙ) / (−sₙ₋₁ᵀ(Δxₙ − Δxₙ₋₁)).

These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is βₙ = max{0, βₙ^{PR}}, which provides a direction reset automatically.
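A compact sketch of the whole scheme (an addition of this rewrite, not reference code) with the PR+ choice βₙ = max{0, βₙ^{PR}} and a simple backtracking line search:

    import numpy as np

    def ncg(f, grad, x0, iters=100, tol=1e-8):
        """Nonlinear conjugate gradient with the Polak-Ribiere(+) beta."""
        x = np.asarray(x0, dtype=float)
        d = -grad(x)          # steepest-descent direction
        s = d.copy()          # first search direction
        for _ in range(iters):
            if np.linalg.norm(d) < tol:
                break
            if d @ s <= 0:    # reset if s stopped being a descent direction
                s = d.copy()
            alpha, fx, slope = 1.0, f(x), -(d @ s)   # slope = grad . s < 0
            while f(x + alpha * s) > fx + 1e-4 * alpha * slope:
                alpha *= 0.5                         # backtracking line search
            x = x + alpha * s
            d_new = -grad(x)
            beta = max(0.0, d_new @ (d_new - d) / (d @ d))  # PR+, auto-reset
            s = d_new + beta * s
            d = d_new
        return x

    print(ncg(lambda x: x @ x, lambda x: 2 * x, [3.0, -4.0]))  # -> near origin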

Newton-based methods (the Newton–Raphson algorithm and quasi-Newton methods such as BFGS) tend to converge in fewer iterations, although each iteration typically requires more computation than a conjugate gradient iteration, as Newton-like methods require computing the Hessian (matrix of second derivatives) in addition to the gradient. Quasi-Newton methods also require more memory to operate (see also the limited-memory L-BFGS method).

Vehicle routing problem

The vehicle routing problem (VRP) is a combinatorial optimization and integer programming problem which asks "What is the optimal set of routes for a fleet of vehicles to traverse in order to deliver to a given set of customers?". It generalises the well-known Travelling Salesman Problem (TSP). It first appeared in a paper by George Dantzig and John Ramser in 1959,[1] in which the first algorithmic approach was described and applied to petrol deliveries. Often, the context is that of delivering goods located at a central depot to customers who have placed orders for such goods. The objective of the VRP is to minimize the total route cost. In 1964, Clarke and Wright improved on Dantzig and Ramser's approach using an effective greedy approach called the savings algorithm.

Determining the optimal solution is an NP-hard[2] problem in combinatorial optimization, so the size of problems that can be solved optimally is limited. Commercial solvers therefore tend to use heuristics, due to the size of real-world VRPs and the frequency with which they may have to be solved.

The VRP has many obvious applications in industry. In fact, the use of computer optimisation programs can give savings of 5% to a company,[3] as transportation is usually a significant component of the cost of a product (10%);[4] indeed, the transportation sector makes up 10% of the EU's GDP. Consequently, any savings created by the VRP, even less than 5%, are significant.[3]

Contents

 1 Setting up the problem
 2 VRP flavours
 3 Exact solution methods
   o 3.1 Vehicle flow formulations
 4 Free software for solving VRP
 5 See also
 6 References
 7 Further reading
 8 External links
Setting up the problem
The VRP concerns the service of a delivery company: goods are delivered from one or more depots, each of which has a given set of home vehicles operated by a set of drivers who can move on a given road network, to a set of customers. It asks for a determination of a set of routes, S (one route for each vehicle, which must start and finish at its own depot), such that all customers' requirements and operational constraints are satisfied and the global transportation cost is minimised. This cost may be monetary, distance or otherwise.[2]

The road network can be described using a graph where the arcs are roads and the vertices are junctions between them. The arcs may be directed or undirected due to the possible presence of one-way streets or different costs in each direction. Each arc has an associated cost, which is generally its length or travel time and may be dependent on vehicle type.[2]

To know the global cost of each route, the travel cost and the travel time between each customer and the depot must be known. To do this, our original graph is transformed into one where the vertices are the customers and the depot, and the arcs are the roads between them. The cost on each arc is the lowest cost between the two points on the original road network. This is easy to do, as shortest path problems are relatively easy to solve. This transforms the sparse original graph into a complete graph. For each pair of vertices i and j, there exists an arc (i, j) of the complete graph whose cost is written as C_ij and is defined to be the cost of the shortest path from i to j. The travel time t_ij is the sum of the travel times of the arcs on the shortest path from i to j on the original road graph.
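As a sketch of this preprocessing (an illustration of this rewrite, with assumed data layout: vertices 0..n−1, undirected road edges (u, v, cost), and keep listing the depot and customer vertices), Floyd–Warshall yields the complete-graph costs directly:

    import math

    def complete_costs(n, road_edges, keep):
        """All-pairs shortest-path costs via Floyd-Warshall, restricted to
        the depot/customer vertices in `keep`."""
        c = [[math.inf] * n for _ in range(n)]
        for i in range(n):
            c[i][i] = 0.0
        for u, v, w in road_edges:          # undirected road network
            c[u][v] = min(c[u][v], w)
            c[v][u] = min(c[v][u], w)
        for k in range(n):                  # relax paths through junction k
            for i in range(n):
                for j in range(n):
                    if c[i][k] + c[k][j] < c[i][j]:
                        c[i][j] = c[i][k] + c[k][j]
        return {(i, j): c[i][j] for i in keep for j in keep if i != j}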

Sometimes it is impossible to satisfy all of a customer's demands, and in such cases solvers may reduce some customers' demands or leave some customers unserved. To deal with these situations, a priority variable can be introduced for each customer, or penalties can be associated with the partial or complete lack of service for each customer.[2]

The objective function of a VRP can be very different depending on the particular application of the
result but a few of the more common objectives are:[2]

 Minimise the global transportation cost based on the global distance travelled as well as the fixed costs associated with the used vehicles and drivers
 Minimise the number of vehicles needed to serve all customers
 Minimise the variation in travel time and vehicle load
 Minimise penalties for low-quality service

VRP flavours
A map showing the relationship between common VRP subproblems.

Several variations and specializations of the vehicle routing problem exist:

 Vehicle Routing Problem with Pickup and Delivery (VRPPD): A number of goods need to be
moved from certain pickup locations to other delivery locations. The goal is to find optimal routes
for a fleet of vehicles to visit the pickup and drop-off locations.
 Vehicle Routing Problem with LIFO: Similar to the VRPPD, except an additional restriction is
placed on the loading of the vehicles: at any delivery location, the item being delivered must be
the item most recently picked up. This scheme reduces the loading and unloading times at
delivery locations because there is no need to temporarily unload items other than the ones that
should be dropped off.
 Vehicle Routing Problem with Time Windows (VRPTW): The delivery locations have time
windows within which the deliveries (or visits) must be made.
 Capacitated Vehicle Routing Problem: CVRP or CVRPTW. The vehicles have limited carrying
capacity of the goods that must be delivered.
 Vehicle Routing Problem with Multiple Trips (VRPMT): The vehicles can do more than one
route.
 Open Vehicle Routing Problem (OVRP): Vehicles are not required to return to the depot.

Several software vendors have built software products to solve the various VRP problems.
Numerous articles are available for more detail on their research and results.

Although VRP is related to the Job Shop Scheduling Problem, the two problems are typically solved
using different techniques.[5]

Exact solution methods

There are three main approaches to modelling the VRP:

1. Vehicle flow formulations - this uses integer variables associated with each arc that count the number of times the edge is traversed by a vehicle. It is generally used for basic VRPs. This is good for cases where the solution cost can be expressed as the sum of the costs associated with the arcs. However, it can't be used to handle many practical applications.[2]

2. Commodity flow formulations - additional integer variables are associated with the arcs or edges which represent the flow of commodities along the paths travelled by the vehicles. This has only recently been used to find an exact solution.[2]

3. Set partitioning formulations - these have an exponential number of binary variables which are each associated with a different feasible circuit. The VRP is then instead formulated as a set partitioning problem which asks for the collection of circuits with minimum cost that satisfies the VRP constraints. This allows for very general route costs.[2]
Vehicle flow formulations
The formulation of the TSP by Dantzig, Fulkerson and Johnson was extended to create the two-index vehicle flow formulation for the VRP:

minimise Σ_{i∈V} Σ_{j∈V} c_ij x_ij

subject to

  Σ_{i∈V} x_ij = 1 for all j ∈ V \ {0};   (1)
  Σ_{j∈V} x_ij = 1 for all i ∈ V \ {0};   (2)
  Σ_{i∈V} x_i0 = K;   (3)
  Σ_{j∈V} x_0j = K;   (4)
  Σ_{i∉S} Σ_{j∈S} x_ij ≥ r(S) for all S ⊆ V \ {0}, S ≠ ∅;   (5)
  x_ij ∈ {0, 1},

where 0 is the depot and K is the number of vehicles. Constraints 1 and 2 state that exactly one arc enters and exactly one leaves each vertex associated with a customer, respectively. Constraints 3 and 4 state that the number of vehicles leaving the depot is the same as the number entering. Constraints 3, 4 and 5 are the capacity cut constraints; these impose that the routes must be connected and that the demand on each route must not exceed the vehicle capacity.[2]

One arbitrary constraint among the 2|V| constraints is actually implied by the remaining 2|V|−1 ones, so it can be removed. Each cut defined by a customer set S is crossed by a number of arcs not smaller than r(S), the minimum number of vehicles needed to serve the set S.[2]

An alternative formulation may be obtained by transforming the capacity cut constraints into generalised subtour elimination constraints (GSECs), which impose that at least r(S) arcs leave each customer set S.[2]

GSECs and CCCs (capacity cut constraints) have an exponential number of constraints, so it is practically impossible to solve the linear relaxation directly. A possible way around this is to consider a limited subset of these constraints and add the rest if needed.

A different method again is to use a family of constraints of polynomial cardinality, known as the MTZ constraints; they were first proposed for the TSP[6] and subsequently extended by Christofides, Mingozzi and Toth:[7]

u_j − u_i ≥ d_j − C(1 − x_ij) for all i ≠ j,

where u_i is an additional continuous variable which represents the load of the vehicle after visiting customer i, d_i is the demand of customer i, and C is the vehicle capacity. These impose both the connectivity and the capacity requirements. When x_ij = 0 the constraint is not binding, since u_i ≤ C and u_j ≥ d_j, whereas when x_ij = 1 it imposes that u_j ≥ u_i + d_j.

These have been used extensively to model the basic VRP (CVRP) and the VRPB. However, their power is limited to these simple problems. They can only be used when the cost of the solution can be expressed as the sum of the arc costs. Also, we cannot know which vehicle traverses each arc. Hence we cannot use this for more complex models where the cost and/or feasibility is dependent on the order of the customers or the vehicles used.[2]

Travelling salesman problem


The travelling salesman problem (TSP) asks the following question: Given a list of cities and the
distances between each pair of cities, what is the shortest possible route that visits each city exactly
once and returns to the origin city? It is an NP-hard problem in combinatorial optimization, important
in operations research and theoretical computer science.
Solution of a travelling salesman problem

TSP is a special case of the travelling purchaser problem and the Vehicle routing problem.

In the theory of computational complexity, the decision version of the TSP (where, given a length L,
the task is to decide whether the graph has any tour shorter than L) belongs to the class of NP-
complete problems. Thus, it is possible that the worst-case running time for any algorithm for the
TSP increases superpolynomially (perhaps, specifically, exponentially) with the number of cities.

The problem was first formulated in 1930 and is one of the most intensively studied problems in
optimization. It is used as a benchmark for many optimization methods. Even though the problem is
computationally difficult, a large number of heuristics and exact methods are known, so that some
instances with tens of thousands of cities can be solved completely and even problems with millions
of cities can be approximated within a small fraction of 1%.[1]

The TSP has several applications even in its purest formulation, such as planning, logistics, and the
manufacture of microchips. Slightly modified, it appears as a sub-problem in many areas, such
as DNA sequencing. In these applications, the concept city represents, for example, customers,
soldering points, or DNA fragments, and the concept distance represents travelling times or cost, or
a similarity measure between DNA fragments. The TSP also appears in astronomy, as astronomers
observing many sources will want to minimise the time spent slewing the telescope between the
sources. In many applications, additional constraints such as limited resources or time windows may
be imposed.

Contents

 1 History
 2 Description
   o 2.1 As a graph problem
   o 2.2 Asymmetric and symmetric
   o 2.3 Related problems
 3 Integer linear programming formulation
 4 Computing a solution
   o 4.1 Exact algorithms
   o 4.2 Heuristic and approximation algorithms
     4.2.1 Constructive heuristics
     4.2.2 Christofides' algorithm for the TSP
     4.2.3 Iterative improvement
     4.2.4 Randomised improvement
       4.2.4.1 Ant colony optimization
 5 Special cases of the TSP
   o 5.1 Metric TSP
   o 5.2 Euclidean TSP
   o 5.3 Asymmetric TSP
     5.3.1 Solving by conversion to symmetric TSP
   o 5.4 Analyst's travelling salesman problem
   o 5.5 TSP path length for random sets of points in a square
     5.5.1 Upper bound
     5.5.2 Lower bound
 6 Computational complexity
   o 6.1 Complexity of approximation
 7 Human performance on TSP
 8 Benchmarks
 9 Popular culture
 10 See also
 11 Notes
 12 References
 13 Further reading

Description
As a graph problem

Symmetric TSP with four cities

TSP can be modelled as an undirected weighted graph, such that cities are the graph's vertices, paths are the graph's edges, and a path's distance is the edge's length. It is a minimization problem starting and finishing at a specified vertex after having visited each other vertex exactly once. Often, the model is a complete graph (i.e. each pair of vertices is connected by an edge). If no path exists between two cities, adding an arbitrarily long edge will complete the graph without affecting the optimal tour.

Asymmetric and symmetric


In the symmetric TSP, the distance between two cities is the same in each opposite direction,
forming an undirected graph. This symmetry halves the number of possible solutions. In
the asymmetric TSP, paths may not exist in both directions or the distances might be different,
forming a directed graph. Traffic collisions, one-way streets, and airfares for cities with different
departure and arrival fees are examples of how this symmetry could break down.

Related problems

 An equivalent formulation in terms of graph theory is: Given a complete weighted graph (where the vertices would represent the cities, the edges would represent the roads, and the weights would be the cost or distance of that road), find a Hamiltonian cycle with the least weight.

 The requirement of returning to the starting city does not change the computational
complexity of the problem, see Hamiltonian path problem.

 Another related problem is the bottleneck travelling salesman problem (bottleneck TSP): Find a Hamiltonian cycle in a weighted graph with the minimal weight of the weightiest edge. The problem is of considerable practical importance, apart from evident transportation and logistics areas. A classic example is in printed circuit manufacturing: scheduling of a route of the drill machine to drill holes in a PCB. In robotic machining or drilling applications, the "cities" are parts to machine or holes (of different sizes) to drill, and the "cost of travel" includes time for retooling the robot (single machine job sequencing problem).[9]

 The generalized travelling salesman problem, also known as the "travelling politician problem",
deals with "states" that have (one or more) "cities" and the salesman has to visit exactly one
"city" from each "state". One application is encountered in ordering a solution to the cutting stock
problem in order to minimise knife changes. Another is concerned with drilling
in semiconductor manufacturing, see e.g., U.S. Patent 7,054,798. Surprisingly, Behzad and
Modarres demonstrated that the generalised travelling salesman problem can be transformed
into a standard travelling salesman problem with the same number of cities, but a
modified distance matrix.

 The sequential ordering problem deals with the problem of visiting a set of cities where
precedence relations between the cities exist.

 The travelling purchaser problem deals with a purchaser who is charged with purchasing a set of
products. He can purchase these products in several cities, but at different prices and not all
cities offer the same products. The objective is to find a route between a subset of the cities,
which minimizes total cost (travel cost + purchasing cost) and which enables the purchase of all
required products.

Integer linear programming formulation

TSP can be formulated as an integer linear program.[10][11][12] Label the cities with the numbers 0, …, n and define:

x_ij = 1 if the path goes from city i to city j, and 0 otherwise.

For i = 0, …, n, let u_i be an artificial variable, and finally take c_ij to be the distance from city i to city j. Then TSP can be written as the following integer linear programming problem:

min Σ_i Σ_{j≠i} c_ij x_ij
subject to
  Σ_{i : i≠j} x_ij = 1 for j = 0, …, n;
  Σ_{j : j≠i} x_ij = 1 for i = 0, …, n;
  u_i − u_j + n·x_ij ≤ n − 1 for 1 ≤ i ≠ j ≤ n;
  x_ij ∈ {0, 1}, u_i ∈ ℤ.

The first set of equalities requires that each city be arrived at from exactly one other city, and the second set of equalities requires that from each city there is a departure to exactly one other city. The last constraints enforce that there is only a single tour covering all cities, and not two or more disjointed tours that only collectively cover all cities. To prove this, it is shown below (1) that every feasible solution contains only one closed sequence of cities, and (2) that for every single tour covering all cities, there are values for the dummy variables u_i that satisfy the constraints.

To prove that every feasible solution contains only one closed sequence of cities, it suffices to show that every subtour in a feasible solution passes through city 0 (noting that the equalities ensure there can only be one such tour). For if we sum all the inequalities corresponding to x_ij = 1 for any subtour of k steps not passing through city 0, we obtain

nk ≤ (n − 1)k,

which is a contradiction.

It now must be shown that for every single tour covering all cities, there are values for the dummy variables u_i that satisfy the constraints.

Without loss of generality, define the tour as originating (and ending) at city 0. Choose u_i = t if city i is visited in step t (i, t = 1, 2, …, n). Then

u_i − u_j ≤ n − 1,

since u_i can be no greater than n and u_j can be no less than 1; hence the constraints are satisfied whenever x_ij = 0. For x_ij = 1, we have

u_i − u_j + n·x_ij = (t) − (t + 1) + n = n − 1,

satisfying the constraint.
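This formulation translates almost verbatim into a modelling library. The sketch below (an addition of this rewrite; it assumes the PuLP package and uses a made-up 4-city distance matrix, with city 0 as the fixed origin) writes the degree constraints and the u-variable subtour elimination constraints:

    import pulp

    dist = [[0, 1, 2, 4], [1, 0, 3, 2], [2, 3, 0, 5], [4, 2, 5, 0]]
    n = len(dist)
    V = range(n)

    prob = pulp.LpProblem("tsp", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (V, V), cat="Binary")       # arc variables
    u = pulp.LpVariable.dicts("u", V, lowBound=1, upBound=n - 1)

    prob += pulp.lpSum(dist[i][j] * x[i][j] for i in V for j in V if i != j)
    for j in V:
        prob += pulp.lpSum(x[i][j] for i in V if i != j) == 1  # one arrival
    for i in V:
        prob += pulp.lpSum(x[i][j] for j in V if j != i) == 1  # one departure
    for i in V:
        for j in V:
            if i != j and i >= 1 and j >= 1:                   # MTZ constraints
                prob += u[i] - u[j] + (n - 1) * x[i][j] <= n - 2

    prob.solve()
    print(pulp.value(prob.objective))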


Computing a solution
The traditional lines of attack for the NP-hard problems are the following:

 Devising algorithms for finding exact solutions (they will work reasonably
fast only for small problem sizes).
 Devising "suboptimal" or heuristic algorithms, i.e., algorithms that deliver
either seemingly or probably good solutions, but which could not be proved
to be optimal.
 Finding special cases for the problem ("subproblems") for which either
better or exact heuristics are possible.
Exact algorithms
The most direct solution would be to try all permutations (ordered combinations) and see which one is cheapest (using brute-force search). The running time for this approach lies within a polynomial factor of O(n!), the factorial of the number of cities, so this solution becomes impractical even for only 20 cities.
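For tiny instances the brute-force search is a few lines (an illustrative addition; dist is any distance matrix):

    import itertools

    def brute_force_tsp(dist):
        """Try all (n-1)! tours that fix city 0 as the start; only for tiny n."""
        n = len(dist)
        best_tour, best_len = None, float("inf")
        for perm in itertools.permutations(range(1, n)):
            tour = (0,) + perm + (0,)
            length = sum(dist[tour[k]][tour[k + 1]] for k in range(n))
            if length < best_len:
                best_tour, best_len = tour, length
        return best_tour, best_len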

One of the earliest applications of dynamic programming is the Held–Karp algorithm, which solves the problem in time O(n²2ⁿ).[13]
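A sketch of the Held–Karp dynamic program over subsets (an addition of this rewrite; cities are 0..n−1 with city 0 as the start, and subsets are encoded as bitmasks):

    def held_karp(dist):
        """Held-Karp: O(n^2 * 2^n) time, O(n * 2^n) space."""
        n = len(dist)
        # C[(S, j)]: shortest path from 0 through the vertex set S (a bitmask
        # containing 0 and j), ending at j.
        C = {(1 | (1 << j), j): dist[0][j] for j in range(1, n)}
        for size in range(3, n + 1):
            for S in range(1 << n):
                if bin(S).count("1") != size or not S & 1:
                    continue
                for j in range(1, n):
                    if not (S >> j) & 1:
                        continue
                    prev = S ^ (1 << j)
                    C[(S, j)] = min(C[(prev, k)] + dist[k][j]
                                    for k in range(1, n) if (prev >> k) & 1)
        full = (1 << n) - 1
        return min(C[(full, j)] + dist[j][0] for j in range(1, n))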

Solution to a symmetric TSP with 7 cities using brute force search. Note: Number of permutations: (7−1)!/2 = 360

Improving these time bounds seems to be difficult. For example, it has not been determined whether an exact algorithm for TSP that runs in time O(1.9999ⁿ) exists.[14]

Other approaches include:

 Various branch-and-bound algorithms, which can be used to process TSPs containing 40–60 cities.

Solution of a TSP with 7 cities using a simple branch-and-bound algorithm. Note: The number of permutations is much less than for brute force search

 Progressive improvement algorithms, which use techniques reminiscent of linear programming. These work well for up to 200 cities.
 Implementations of branch-and-bound and problem-specific cut generation (branch-and-cut[15]); this is the method of choice for solving large instances. This approach holds the current record, solving an instance with 85,900 cities, see Applegate et al. (2006).

An exact solution for 15,112 German towns from TSPLIB was found in 2001 using the cutting-plane method proposed by George Dantzig, Ray Fulkerson, and Selmer M. Johnson in 1954, based on linear programming. The computations were performed on a network of 110 processors located at Rice University and Princeton University (see the Princeton external link). The total computation time was equivalent to 22.6 years on a single 500 MHz Alpha processor. In May 2004, the travelling salesman problem of visiting all 24,978 towns in Sweden was solved: a tour of length approximately 72,500 kilometers was found and it was proven that no shorter tour exists.[16] In March 2005, the travelling salesman problem of visiting all 33,810 points in a circuit board was solved using Concorde TSP Solver: a tour of length 66,048,945 units was found and it was proven that no shorter tour exists. The computation took approximately 15.7 CPU-years (Cook et al. 2006). In April 2006 an instance with 85,900 points was solved using Concorde TSP Solver, taking over 136 CPU-years, see Applegate et al. (2006).

Heuristic and approximation algorithms

Various heuristics and approximation algorithms, which quickly yield good solutions, have been devised. Modern methods can find solutions for extremely large problems (millions of cities) within a reasonable time which are, with a high probability, just 2–3% away from the optimal solution.[8]

Several categories of heuristics are recognized.

Constructive heuristics

Nearest Neighbor algorithm for a TSP with 7 cities. The solution changes as the starting
point is changed

The nearest neighbor (NN) algorithm (or so-called greedy algorithm) lets the salesman choose the nearest unvisited city as his next move. This algorithm quickly yields an effectively short route. For N cities randomly distributed on a plane, the algorithm on average yields a path 25% longer than the shortest possible path.[17] However, there exist many specially arranged city distributions which make the NN algorithm give the worst route (Gutin, Yeo, and Zverovich, 2002). This is true for both asymmetric and symmetric TSPs (Gutin and Yeo, 2007). Rosenkrantz et al. [1977] showed that the NN algorithm has the approximation factor Θ(log |V|) for instances satisfying the triangle inequality. A variation of the NN algorithm, called the Nearest Fragment (NF) operator, which connects a group (fragment) of nearest unvisited cities, can find a shorter route with successive iterations.[18] The NF operator can also be applied on an initial solution obtained by the NN algorithm for further improvement in an elitist model, where only better solutions are accepted.
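A sketch of the NN heuristic (an illustrative addition; dist is a distance matrix):

    def nearest_neighbor_tour(dist, start=0):
        """Greedy nearest-neighbor: repeatedly visit the closest unvisited city."""
        n = len(dist)
        tour, unvisited = [start], set(range(n)) - {start}
        while unvisited:
            last = tour[-1]
            nxt = min(unvisited, key=lambda j: dist[last][j])
            tour.append(nxt)
            unvisited.remove(nxt)
        tour.append(start)      # return to the origin city
        return tour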

Constructions based on a minimum spanning tree have an approximation ratio of 2. The Christofides algorithm achieves a ratio of 1.5.

The bitonic tour of a set of points is the minimum-perimeter monotone polygon that has the points as its vertices; it can be computed efficiently by dynamic programming.

Another constructive heuristic, Match Twice and Stitch (MTS) (Kahng, Reda 2004[19]), performs two sequential matchings, where the second matching is executed after deleting all the edges of the first matching, to yield a set of cycles. The cycles are then stitched to produce the final tour.

Christofides' algorithm for the TSP

The Christofides algorithm follows a similar outline but combines the minimum spanning tree with a solution of another problem, minimum-weight perfect matching. This gives a TSP tour which is at most 1.5 times the optimal. The Christofides algorithm was one of the first approximation algorithms, and was in part responsible for drawing attention to approximation algorithms as a practical approach to intractable problems. As a matter of fact, the term "algorithm" was not commonly extended to approximation algorithms until later; the Christofides algorithm was initially referred to as the Christofides heuristic.

This algorithm looks at things differently by using a result from graph theory which helps improve on the lower bound (LB) of the TSP which originated from doubling the cost of the minimum spanning tree. Given an Eulerian graph we can find an Eulerian tour in O(n) time.[5] So if we had an Eulerian graph with cities from a TSP as vertices, then we can easily see that we could use such a method for finding an Eulerian tour to find a TSP solution. By the triangle inequality we know that the TSP tour can be no longer than the Eulerian tour, and as such we have an LB for the TSP. Such a method is described below.
Using a shortcut heuristic on the graph created by the matching below:

1. Find a minimum spanning tree for the problem.
2. Create duplicates for every edge to create an Eulerian graph.
3. Find an Eulerian tour for this graph.
4. Convert to TSP: if a city is visited twice, create a shortcut from the city before this in the tour to the one after this.

To improve our lower bound, we therefore need a better way of creating an Eulerian graph. But by the triangle inequality, the best Eulerian graph must have the same cost as the best travelling salesman tour; hence finding optimal Eulerian graphs is at least as hard as TSP. One way of doing this that has been proposed is by the concept of minimum-weight matching, for which there exist algorithms of O(n³) complexity.[5]
Creating a matching

To make a graph into an Eulerian graph, one starts with the minimum spanning
tree. Then all the vertices of odd order must be made even. So a matching for
the odd degree vertices must be added which increases the order of every odd
degree vertex by one.[5] This leaves us with a graph where every vertex is of
even order, which is thus Eulerian. Now we can adapt the above method to give Christofides' algorithm:

1. Find a minimum spanning tree for the problem.
2. Create a matching for the problem with the set of cities of odd order.
3. Find an Eulerian tour for this graph.
4. Convert to TSP using shortcuts.
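An illustrative sketch of these four steps on top of the networkx library (an assumption of this rewrite, not code from the original; it relies on min_weight_matching returning a perfect matching on the complete subgraph of odd-degree vertices, as it does in networkx 3.x):

    import networkx as nx

    def christofides(G):
        """Christofides sketch for a complete graph G with 'weight' attributes
        satisfying the triangle inequality."""
        T = nx.minimum_spanning_tree(G)                   # step 1
        odd = [v for v, deg in T.degree() if deg % 2 == 1]
        M = nx.min_weight_matching(G.subgraph(odd))       # step 2
        H = nx.MultiGraph(T)
        H.add_edges_from(M)                               # all degrees now even
        tour, seen = [], set()
        for u, _ in nx.eulerian_circuit(H):               # step 3
            if u not in seen:                             # step 4: shortcutting
                seen.add(u)
                tour.append(u)
        tour.append(tour[0])
        return tour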
Iterative improvement
An example of a 2-opt iteration

Pairwise exchange
The pairwise exchange or 2-opt technique involves iteratively removing two edges and replacing them with two different edges that reconnect the fragments created by the edge removal into a new and shorter tour. This is a special case of the k-opt method. Note that the label Lin–Kernighan is an often-heard misnomer for 2-opt; Lin–Kernighan is actually the more general k-opt method.

For Euclidean instances, 2-opt heuristics give on average solutions that are about 5% better than Christofides' algorithm. If we start with an initial solution made with a greedy algorithm, the average number of moves greatly decreases again and is O(n). For random starts, however, the average number of moves is O(n log n). While in order terms this is a small increase, the initial number of moves for small problems is 10 times as big for a random start compared to one made from a greedy heuristic. This is because such 2-opt heuristics exploit 'bad' parts of a solution such as crossings. These types of heuristics are often used within vehicle routing problem heuristics to reoptimise route solutions.[20]
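A sketch of the basic 2-opt pass (an illustrative addition; tour is a closed tour, i.e. tour[0] == tour[-1]):

    def two_opt(tour, dist):
        """Keep reversing segments while doing so shortens the closed tour."""
        improved = True
        while improved:
            improved = False
            for i in range(1, len(tour) - 2):
                for j in range(i + 1, len(tour) - 1):
                    # change from replacing edges (i-1, i) and (j, j+1)
                    delta = (dist[tour[i - 1]][tour[j]] + dist[tour[i]][tour[j + 1]]
                             - dist[tour[i - 1]][tour[i]] - dist[tour[j]][tour[j + 1]])
                    if delta < -1e-12:
                        tour[i:j + 1] = tour[i:j + 1][::-1]   # reverse the segment
                        improved = True
        return tour

Started from a greedy tour, e.g. two_opt(nearest_neighbor_tour(dist), dist), this reproduces the greedy-start behaviour discussed above.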

k-opt heuristic, or Lin–Kernighan heuristics


Take a given tour and delete k mutually disjoint edges. Reassemble the remaining fragments into a
tour, leaving no disjoint subtours (that is, don't connect a fragment's endpoints together). This in
effect simplifies the TSP under consideration into a much simpler problem. Each fragment endpoint
can be connected to 2k − 2 other possibilities: of 2k total fragment endpoints available, the two
endpoints of the fragment under consideration are disallowed. Such a constrained 2k-city TSP can
then be solved with brute force methods to find the least-cost recombination of the original
fragments. The k-opt technique is a special case of the V-opt or variable-opt technique. The most
popular of the k-opt methods is 3-opt, which was introduced by Shen Lin of Bell Labs in 1965.
There is a special case of 3-opt where the edges are not disjoint (two of the edges are adjacent to
one another). In practice, it is often possible to achieve substantial improvement over 2-opt without
the combinatorial cost of the general 3-opt by restricting the 3-changes to this special subset where
two of the removed edges are adjacent. This so-called two-and-a-half-opt typically falls roughly
midway between 2-opt and 3-opt, both in terms of the quality of tours achieved and the time required
to achieve those tours.
V-opt heuristic
The variable-opt method is related to, and a generalization of the k-opt method. Whereas
the k-opt methods remove a fixed number (k) of edges from the original tour, the variable-opt
methods do not fix the size of the edge set to remove. Instead they grow the set as the
search process continues. The best known method in this family is the Lin–Kernighan
method (mentioned above as a misnomer for 2-opt). Shen Lin and Brian Kernighan first
published their method in 1972, and it was the most reliable heuristic for solving travelling
salesman problems for nearly two decades. More advanced variable-opt methods were
developed at Bell Labs in the late 1980s by David Johnson and his research team. These
methods (sometimes called Lin–Kernighan–Johnson) build on the Lin–Kernighan method,
adding ideas from tabu search and evolutionary computing. The basic Lin–Kernighan
technique gives results that are guaranteed to be at least 3-opt. The Lin–Kernighan–Johnson
methods compute a Lin–Kernighan tour, and then perturb the tour by what has been
described as a mutation that removes at least four edges and reconnects the tour in a
different way, then V-opting the new tour. The mutation is often enough to move the tour
from the local minimum identified by Lin–Kernighan. V-opt methods are widely considered
the most powerful heuristics for the problem, and are able to address special cases, such as
the Hamilton Cycle Problem and other non-metric TSPs that other heuristics fail on. For
many years Lin–Kernighan–Johnson had identified optimal solutions for all TSPs where an
optimal solution was known and had identified the best known solutions for all other TSPs on
which the method had been tried.
Randomised improvement

Optimized Markov chain algorithms which use local searching heuristic sub-algorithms can find a
route extremely close to the optimal route for 700 to 800 cities.

TSP is a touchstone for many general heuristics devised for combinatorial optimization such as genetic algorithms, simulated annealing, tabu search, ant colony optimization, river formation dynamics (see swarm intelligence) and the cross-entropy method.

Ant colony optimization

Main article: Ant colony optimization algorithms

Artificial intelligence researcher Marco Dorigo described in 1997 a method of heuristically generating "good solutions" to the TSP using a simulation of an ant colony called ACS (Ant Colony System).[21] It models behavior observed in real ants to find short paths between food sources and their nest, an emergent behaviour resulting from each ant's preference to follow trail pheromones deposited by other ants.

ACS sends out a large number of virtual ant agents to explore many possible routes on the map.
Each ant probabilistically chooses the next city to visit based on a heuristic combining the distance
to the city and the amount of virtual pheromone deposited on the edge to the city. The ants explore,
depositing pheromone on each edge that they cross, until they have all completed a tour. At this
point the ant which completed the shortest tour deposits virtual pheromone along its complete tour
route (global trail updating). The amount of pheromone deposited is inversely proportional to the tour
length: the shorter the tour, the more it deposits.

Ant Colony Optimization Algorithm for a TSP with 7 cities: Red and thick lines in the pheromone map indicate presence of more pheromone
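A toy sketch of the scheme just described (an addition of this rewrite, simplified relative to the full ACS rules; all parameter values are made up):

    import random

    def ant_colony_tsp(dist, n_ants=20, iters=100, alpha=1.0, beta=3.0, rho=0.1):
        """Probabilistic tour construction plus a global pheromone update."""
        n = len(dist)
        tau = [[1.0] * n for _ in range(n)]              # pheromone levels
        best_tour, best_len = None, float("inf")
        for _ in range(iters):
            for _ in range(n_ants):
                tour = [random.randrange(n)]
                while len(tour) < n:
                    i = tour[-1]
                    cand = [j for j in range(n) if j not in tour]
                    wts = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]                # pheromone x closeness
                    tour.append(random.choices(cand, weights=wts)[0])
                tour.append(tour[0])
                length = sum(dist[tour[k]][tour[k + 1]] for k in range(n))
                if length < best_len:
                    best_tour, best_len = tour, length
            tau = [[(1 - rho) * t for t in row] for row in tau]  # evaporation
            for k in range(n):                           # global trail update:
                i, j = best_tour[k], best_tour[k + 1]    # deposit inversely
                tau[i][j] += 1.0 / best_len              # proportional to the
                tau[j][i] += 1.0 / best_len              # best tour length
        return best_tour, best_len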
Special cases of the TSP
Metric TSP
In the metric TSP, also known as delta-TSP or Δ-TSP, the intercity distances satisfy the triangle
inequality.

A very natural restriction of the TSP is to require that the distances between cities form a metric satisfying the triangle inequality; that is, the direct connection from A to B is never farther than the route via an intermediate C:

d(A, B) ≤ d(A, C) + d(C, B).
The edge spans then build a metric on the set of vertices. When the cities are viewed as points in
the plane, many natural distance functions are metrics, and so many natural instances of TSP satisfy
this constraint.

The following are some examples of metric TSPs for various metrics.

 In the Euclidean TSP (see below) the distance between two cities is the Euclidean distance between the corresponding points.
 In the rectilinear TSP the distance between two cities is the sum of the differences of their x- and y-coordinates. This metric is often called the Manhattan distance or city-block metric.
 In the maximum metric, the distance between two points is the maximum of the absolute values of differences of their x- and y-coordinates.

The last two metrics appear for example in routing a machine that drills a given set of holes in
a printed circuit board. The Manhattan metric corresponds to a machine that adjusts first one co-
ordinate, and then the other, so the time to move to a new point is the sum of both movements. The
maximum metric corresponds to a machine that adjusts both co-ordinates simultaneously, so the
time to move to a new point is the slower of the two movements.

In its definition, the TSP does not allow cities to be visited twice, but many applications do not need this constraint. In such cases, a symmetric, non-metric instance can be reduced to a metric one. This replaces the original graph with a complete graph in which the inter-city distance d(A, B) is replaced by the length of the shortest path between A and B in the original graph.
Euclidean TSP
The Euclidean TSP, or planar TSP, is the TSP with the distance being the ordinary Euclidean
distance.

The Euclidean TSP is a particular case of the metric TSP, since distances in a plane obey the
triangle inequality.

Like the general TSP, the Euclidean TSP is NP-hard. With a discretized metric (distances rounded up to an integer), the problem is NP-complete.[22] However, in some respects it seems to be easier than the general metric TSP. For example, the minimum spanning tree of the graph associated with an instance of the Euclidean TSP is a Euclidean minimum spanning tree, and so can be computed in expected O(n log n) time for n points (considerably less than the number of edges). This enables the simple 2-approximation algorithm for TSP with triangle inequality above to operate more quickly.

In general, for any c > 0, where d is the number of dimensions in the Euclidean space, there is a polynomial-time algorithm that finds a tour of length at most (1 + 1/c) times the optimal for geometric instances of TSP, in time polynomial in n for fixed c and d; this is called a polynomial-time approximation scheme (PTAS).[23] Sanjeev Arora and Joseph S. B. Mitchell were awarded the Gödel Prize in 2010 for their concurrent discovery of a PTAS for the Euclidean TSP.

In practice, simpler heuristics with weaker guarantees continue to be used.

Asymmetric TSP
In most cases, the distance between two nodes in the TSP network is the same in both directions.
The case where the distance from A to B is not equal to the distance from B to A is called the
asymmetric TSP. A practical application of an asymmetric TSP is route optimisation using street-level
routing (which is made asymmetric by one-way streets, slip-roads, motorways, etc.).

Solving by conversion to symmetric TSP

Solving an asymmetric TSP graph can be somewhat complex. The following is a 3×3 matrix
containing all possible path weights between the nodes A, B and C. One option is to turn an
asymmetric matrix of size N into a symmetric matrix of size 2N.[24]

Asymmetric path weights

    A   B   C
A       1   2
B   6       3
C   5   4

To double the size, each of the nodes in the graph is duplicated, creating a second ghost node.
Using duplicate points with very low weights, such as −∞, provides a cheap route "linking" back to
the real node and allowing symmetric evaluation to continue. The original 3×3 matrix shown above is
visible in the bottom left and the transpose of the original in the top-right. Both copies of the matrix
have had their diagonals replaced by the low-cost hop paths, represented by −∞.

Symmetric path weights

     A    B    C    A′   B′   C′
A                   −∞   6    5
B                   1    −∞   4
C                   2    3    −∞
A′   −∞   1    2
B′   6    −∞   3
C′   5    4    −∞

The original 3×3 matrix would produce two Hamiltonian cycles (a path that visits every node once),
namely A-B-C-A [score 9] and A-C-B-A [score 12]. Evaluating the 6×6 symmetric version of the
same problem now produces many paths, including A-A′-B-B′-C-C′-A, A-B′-C-A′-A, A-A′-B-C′-A [all
score 9 – ∞].

The important thing about each new sequence is that there will be an alternation between dashed
(A′, B′, C′) and un-dashed nodes (A, B, C) and that the link to "jump" between any related pair (A-A′)
is effectively free. A version of the algorithm could use any weight for the A-A′ path, as long as that
weight is lower than all other path weights present in the graph. As the path weight to "jump" must
effectively be "free", the value zero (0) could be used to represent this cost, if zero is not being used
for another purpose already (such as designating invalid paths). In the two examples above, non-
existent paths between nodes are shown as a blank square.
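
The doubling construction above is mechanical enough to sketch in a few lines of Python (our own illustration; None marks a non-existent path, and float('-inf') stands in for the −∞ ghost links):

def asymmetric_to_symmetric(w, ninf=float('-inf')):
    # Turn an NxN asymmetric weight matrix into a 2Nx2N symmetric one.
    # w[i][j] is the cost i -> j (the diagonal is ignored); None marks
    # a non-existent path in the result.
    n = len(w)
    s = [[None] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # cheap "linking" hop between a node and its ghost
                s[i][n + i] = s[n + i][i] = ninf
            else:
                s[i][n + j] = w[j][i]   # top-right block: transpose
                s[n + i][j] = w[i][j]   # bottom-left block: original
    return s

weights = [[None, 1, 2], [6, None, 3], [5, 4, None]]
for row in asymmetric_to_symmetric(weights):
    print(row)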

Analyst's travelling salesman problem


There is an analogous problem in geometric measure theory which asks the following: under what
conditions may a subset E of Euclidean space be contained in a rectifiable curve (that is, when is
there a curve with finite length that visits every point in E)? This problem is known as the analyst's
travelling salesman problem.

TSP path length for random sets of points in a square


Suppose X_1, …, X_n are n independent random variables with uniform distribution in the
square [0, 1]², and let L*_n be the shortest path length (i.e. TSP solution) for this set of points,
according to the usual Euclidean distance. It is known[25] that, almost surely,

L*_n / √n → β as n → ∞,

where β is a positive constant that is not known explicitly. Since L*_n ≤ 2√n + 2 (see below), it
follows from the bounded convergence theorem that β = lim E[L*_n] / √n, hence lower and upper
bounds on β follow from bounds on E[L*_n].

Upper bound

One has L*_n ≤ 2√n + 2, and therefore β ≤ 2, by using a naive path which visits monotonically
the points inside each of √n slices of width 1/√n in the square.

Few[26] proved L*_n ≤ √(2n) + 1.75, hence β ≤ √2, later improved by Karloff (1987): β < 0.984·√2.

A slightly better upper bound is currently known.[27]


Lower bound

By observing that E[L*_n] is greater than n times the distance between X_0 and the closest
point X_i ≠ X_0, one gets (after a short computation)

E[L*_n] ≥ (1/2)·√n.

A better lower bound is obtained[25] by observing that E[L*_n] is greater than (1/2)·n times the sum
of the distances between X_0 and the closest and second closest points X_i, X_j ≠ X_0, which gives

E[L*_n] ≥ (1/4 + 3/8)·√n = (5/8)·√n.

The currently[27] best lower bound is

E[L*_n] ≥ (5/8 + 19/5184)·√n.

Held and Karp[28] gave a polynomial-time algorithm that provides numerical lower bounds
for E[L*_n], and thus for β, which seem to be good up to more or less 1%.[29] In particular, David S.
Johnson[30] obtained a lower bound by computer experiment:

L*_n ≳ 0.7080·√n + 0.522,

where 0.522 comes from the points near the square boundary, which have fewer neighbours, and
Christine L. Valenzuela and Antonia J. Jones[31] obtained the following other numerical lower bound:

L*_n ≳ 0.7078·√n + 0.551.
Computational complexity

The problem has been shown to be NP-hard (more precisely, it is complete for the complexity
class FP^NP; see function problem), and the decision problem version ("given the costs and a
number x, decide whether there is a round-trip route cheaper than x") is NP-complete.
The bottleneck travelling salesman problem is also NP-hard. The problem remains NP-hard even for
the case when the cities are in the plane with Euclidean distances, as well as in a number of other
restrictive cases. Removing the condition of visiting each city "only once" does not remove the NP-
hardness, since it is easily seen that in the planar case there is an optimal tour that visits each city
only once (otherwise, by the triangle inequality, a shortcut that skips a repeated visit would not
increase the tour length).

Complexity of approximation
In the general case, finding a shortest travelling salesman tour is NPO-complete.[32] If the distance
measure is a metric and symmetric, the problem becomes APX-complete[33] and Christofides's
algorithm approximates it within 1.5.[34]
If the distances are restricted to 1 and 2 (but still are a metric), the approximation ratio becomes
8/7.[35] In the asymmetric, metric case, only logarithmic performance guarantees are known; the
best current algorithm achieves performance ratio 0.814 log(n);[36] it is an open question whether a
constant-factor approximation exists.

The corresponding maximization problem of finding the longest travelling salesman tour is
approximable within 63/38.[37] If the distance function is symmetric, the longest tour can be
approximated within 4/3 by a deterministic algorithm[38] and within (33 + ε)/25 by a randomised
algorithm.

Human performance on TSP


The TSP, in particular the Euclidean variant of the problem, has attracted the attention of
researchers in cognitive psychology. It has been observed that humans are able to produce good
quality solutions quickly.[40] These results suggest that computer performance on the TSP may be
improved by understanding and emulating the methods used by humans for these problems, and
have also led to new insights into the mechanisms of human thought.[41] The first issue of the Journal
of Problem Solving was devoted to the topic of human performance on TSP,[42] and a 2011 review
listed dozens of papers on the subject.

Fundamentals of Transportation/Timetabling
and Scheduling
In the previous unit on service planning, the strategic decisions of network and route design, stop
layout, and frequency determination were described. In this unit, the tactical decisions associated
with creating a service schedule (timetabling), creating a schedule for vehicles to operate the
service (vehicle scheduling), and creating work shifts for operators (crew scheduling) are
presented. A practical guidebook and learning tool has been published recently.[1]

The motivation for good solutions to these tactical decisions is to minimize the net operating costs to
the agency. Once the timetable is determined, the number of vehicles required to be in revenue
service can also be identified. When the vehicle schedule is determined, the total mileage and hours
for the vehicle fleet are defined. Finally, when the crew schedule is determined, the total cost of labor
(operators) is defined. Since these factors are the primary determinants of operating costs, finding
efficient solutions has a direct effect on the bottom line.

In many cases, these tactical activities are assisted by software tools that can generate high quality
solutions in a short period of time, often with direct interaction with the planner. As a result, the
interested student may wish to consult other sources to identify and to investigate the specific
software tools that might be available, such as those described in a recent publication.[2]
Contents

 1 Timetabling
 2 Vehicle scheduling
 3 Crew scheduling
 4 Glossary
 5 Related books
 6 References

Timetabling
The general idea behind timetabling is to create a schedule for service. As inputs, one would
consider the frequency of service for the given route (see previous unit) and the expected travel
times between stops on the route. The latter could be determined either by historical experience or
through estimates based on traffic conditions, vehicle acceleration and deceleration characteristics,
expected dwell times, etc.

Let h be the selected headway for a route, perhaps for a specific time period of the day.

Let tij be the time between stop i and stop j along the route, where i and j are adjacent stops. The
travel times between stops, tij, can vary by time of day, particularly as they may be affected by traffic
conditions. They may also reflect any slack time built into the schedule between stops, to allow for
possible variability in travel times.

Finally, let t0 be the dispatch time (departure time) of the first vehicle from a terminal.

Then, the timetable can be created simply using the following structure, with n stops on the route
and k+1 vehicles to dispatch:
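The structure itself (rendered as a table in the original) amounts to dispatching vehicle k at t0 + k·h and propagating the running times along the stops. A minimal Python sketch of it (our own illustration, with hypothetical numbers in the demo lines):

def build_timetable(t0, h, run_times, vehicles):
    # Departure times at each stop for each vehicle.
    # t0: first dispatch (minutes after midnight), h: headway (min),
    # run_times: [t_12, t_23, ...] between adjacent stops.
    timetable = []
    for k in range(vehicles):
        t = t0 + k * h          # dispatch of vehicle k from the terminal
        row = [t]
        for tij in run_times:
            t += tij            # arrival/departure at the next stop
            row.append(t)
        timetable.append(row)
    return timetable

def hhmm(m):
    return "%02d:%02d" % divmod(m, 60)

# 6:00 first dispatch, 15-minute headway, 3 inter-stop times, 4 vehicles
for row in build_timetable(360, 15, [7, 5, 9], 4):
    print(" ".join(hhmm(t) for t in row))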
The primary decision variable here is the initial dispatch time, t0. Different operating conditions might
lead to a number of possible choices for t0:

 “Clockface” values. Passengers may remember the schedule more clearly if the dispatch
times fall at easily recognized times on the clock. For example, with 15-minute headways, there
may be value to passengers in dispatching a vehicle on the :00, :15, :30, and :45 of each hour.

 Coordination for improved vehicle scheduling. When a vehicle finishes its trip at a terminal, it
will often be turned around to continue onto the next trip in the opposite direction on the route. In
this case, there is a need for sufficient layover time at the terminal. If the vehicle finishes a trip at
time t, then completes the layover after an additional time tL, then the vehicle may start its return
trip after t + tL. Choosing the dispatch time to occur at or slightly after t + tL allows for higher
vehicle utilization.
One way of visualizing such a system uses a so-called “string diagram,” shown in this figure.
The blue lines indicate the trajectory of a vehicle from the terminal at stop 1 to the terminal at
stop n, with short dwell times at each stop. Vehicles arriving at stop n then can return along
the route in the opposite direction (the red lines), after a layover (indicated by the black
arrows). These diagrams can be useful in visualizing vehicle movements and crosses along
the route.

 Coordination of passenger transfers. In some cases, it may be desirable to choose the


dispatch time so that passengers may connect to other routes in the network without unduly
long waiting times for the transfer route. To do this consistently across the timetable, the two
(or more) routes that connect for the transfer must have the same headway h. If this is the
case, then the initial dispatch time t0 can be chosen so that the vehicle’s arrival time at the
transfer point matches that of the vehicle on the connecting route.

 Reduction of vehicle requirements. The timetable will dictate how many vehicles are in
operation at any time of the day. In some cases, minor adjustments in the dispatch times,
coupled with changes in layovers and/or dead-heads of vehicles between terminals can lead
to a reduction in the number of vehicles needed for service.

Vehicle scheduling
Vehicle scheduling, also called “blocking”, involves assigning vehicles to cover the trips
associated with the timetable. A vehicle “block” is the schedule of travel of a vehicle for a given
day, including: (1) a pull-out from the depot, (2) a sequence of trips from the timetable, (3) any
dead-head trips, and (4) a pull-in back to the depot (recall the vehicle cycle from the unit on
vehicle operations).

Generally, once the timetable is created, the time and mileage that vehicles spend in revenue
service (i.e., completing the trips in the timetable) is fixed. So, the usual goal in vehicle
scheduling is to minimize the time and/or distance that vehicles spend outside of revenue
service: e.g., pull-ins, pull-outs, dead-heads, and layovers. These all represent time or mileage
that are “unproductive”, and hence should be minimized.

Constraints on this process include the following:

 Each trip in the timetable must be made by a vehicle.


 A vehicle cannot be assigned more than one trip at any point in time.
 If a vehicle must be re-positioned for a trip, the associated travel time and distance from its
current position to the new position must be observed.

To solve for the vehicle schedule, one might consider a simple “first-in-first-out” rule. In this
case, a vehicle stays on the same route throughout the whole period, and is always assigned to
the next trip after a layover. The string diagram above gives just such an arrangement.

As a simple example, suppose we have a route that runs from terminal A to terminal B and then
back to terminal A. Travel time from A to B and from B to A, including running and dwell time, is
30 minutes, and a minimum 5-minute layover is needed at each terminal. Headways are 15
minutes.
Below is a timetable for this situation, for trips between 6:00 am and after 9:00 am. The left-hand
side of the timetable shows vehicle trips from A to B, while the right-hand side shows vehicle
trips from B to A.

The colors correspond to different vehicles used on the route. The gray color corresponds to the
first vehicle of the day, leaving A at 6:00 am and continuing with the trip from B at 6:40, the trip
from A at 7:15, etc. A total of five vehicles are required to cover all the trips in this timetable.

In addition to the trips from the timetable, the vehicle block also includes a pull-out and pull-in,
so that the final block for the first vehicle (gray) could look like the following.
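
The first-in-first-out assignment itself is easy to simulate. The sketch below (our own, using the numbers from the example above and a plausible 10-minute offset for the B departures) covers each departure with the earliest vehicle ready at that terminal, pulling out a new vehicle otherwise; it reproduces the fleet of five vehicles:

import heapq

def vehicles_needed(deps_a, deps_b, trip, layover):
    # FIFO blocking for a two-terminal route A<->B: each departure is
    # covered by the earliest vehicle ready at that terminal, else a
    # new vehicle is pulled out. Returns the fleet size.
    ready = {'A': [], 'B': []}          # min-heaps of ready times
    trips = sorted([(t, 'A') for t in deps_a] + [(t, 'B') for t in deps_b])
    fleet = 0
    for dep, term in trips:
        if ready[term] and ready[term][0] <= dep:
            heapq.heappop(ready[term])  # reuse a waiting vehicle
        else:
            fleet += 1                  # pull-out from the depot
        other = 'B' if term == 'A' else 'A'
        # ready at the other terminal after the trip plus minimum layover
        heapq.heappush(ready[other], dep + trip + layover)
    return fleet

# 6:00-9:00, 15-minute headways; B departures offset by 10 minutes
deps_a = list(range(360, 541, 15))      # 6:00, 6:15, ... from A
deps_b = list(range(370, 551, 15))      # 6:10, 6:25, ... from B
print(vehicles_needed(deps_a, deps_b, trip=30, layover=5))   # -> 5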
For networks with longer policy headways (e.g., 30- or 60-minute headways), longer layovers at
terminals may be necessary if vehicles serve the same route throughout the block. As a result,
other options can be considered, particularly in terms of shifting vehicles from one route to
another. The timetable may allow vehicles to shift from one route to another, in order to reduce
layover time and/or to avoid pull-outs or pull-ins. Specific activities in the block can include:

 Interlining: the process of switching a vehicle from one route to another at a terminal, when
the routes share that common terminal.

 Deadheading: the process of switching a vehicle from one route to another, also requiring a
re-location of the vehicle (traveling empty) to another terminal.

These methods can be quite effective under different timetabling conditions.

Crew scheduling
Crew scheduling (also called “run-cutting” in the transit industry) is the task of determining work
shifts (so-called “duties” or “runs”) for operators. Generally, the primary interest in crew
scheduling is to minimize the total cost of labor that meets the service requirements.

A significant fraction, typically 60-70%, of the total operating costs at a transit agency involves
the cost of operators, including wages, benefits, and other premiums. With this in mind, small
reductions in the number of operators, or in the total work hours, can result in more substantive
reductions in the total operating cost. For this reason, the task of scheduling crew to vehicles is
one area where many large transit agencies can achieve some efficiencies and potential cost
savings.

Crew scheduling is complicated because operators often cannot simply be assigned to a vehicle
for the entire vehicle block. First, the shift would often be much longer than a typical 8-hour work
period; and, second, the operator may not get sufficient break time during vehicle layovers (e.g.,
for lunch). Instead, the duties have to consider more practical concerns of the operators.

In this regard, transit agencies have rules that dictate the kind of work shifts the operators may
perform. In most cases in the US, the types of work shifts are governed by collective bargaining
agreements (union work rules) that specify work conditions for transit operators. Possible
examples of work rules could include restrictions like the following:

 A duty should start and end at the same terminal


 Crew needs at least 2 breaks during the day: a normal (15-min) break and a (30-min) lunch
break
 A break is required after no more than 3 hours of work
 Each crew must have at least 8 hours off before resuming duties on the next day
 Only 20% of duties can be longer than 9 hours
 Only 25% of duties can be split into intervals with an unpaid break (e.g. a duty that only
covers the AM and PM peak periods)
 Only 30% of duties can be covered by part-time operators

The general approach to creating a crew schedule begins by cutting each vehicle block into
“pieces of work.” Each piece of work is a subset of trips in the block, forming the elemental unit
of work (driving) for the operator. Then, according to the constraints from the work rules, these
pieces of work are assembled into feasible duties. The hope is to assemble a full set of duties
such that all pieces of work are covered and that the total cost is minimal. The cost of a duty can
depend on more than the traditional hourly rate of pay for the hours worked. If the
operator has a straight shift (no unpaid break), they are paid a certain amount, usually at a given
hourly rate. Other costs can include:

 A minimum guarantee of hours of pay, if the guarantee exceeds the number of hours worked
(e.g., 8 hours of pay, even if the operator works only 7 hours);
 Premiums for overtime (e.g., time in the duty over 8 hours);
 Premiums for spread time. Spread is the total time between the start and end of a duty. If this
exceeds a certain maximum (e.g., 9 hours), the operator is entitled to extra pay;
 Premiums for swing. Swing occurs when the duty starts and ends at different locations
(terminals, depots);
 Premiums for split duties, where the duty has an unpaid break. This can occur when an
operator works only the AM and PM peak periods, without working in the mid-day;

These rules on pay suggest that the crew schedule should contain as many straight duties as
possible. Small pieces of work that remain after generating these straight duties can be
allocated to part-time operators (if they are available), to avoid other premiums, or covered using
split duties with associated split and/or spread penalties.
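
As an illustration of how such pay rules interact, the sketch below prices one duty under a hypothetical rule set (8-hour guarantee, time-and-a-half overtime after 8 hours, a half-rate spread premium after 9 hours of spread); all rates and thresholds are ours, not an agency's:

def duty_cost(work_hours, spread_hours, rate=30.0,
              guarantee=8.0, ot_threshold=8.0, ot_factor=1.5,
              max_spread=9.0, spread_factor=0.5):
    # Pay for one duty under illustrative (assumed) work rules.
    paid = max(work_hours, guarantee)          # minimum guarantee
    cost = paid * rate
    if work_hours > ot_threshold:              # overtime premium
        cost += (work_hours - ot_threshold) * rate * (ot_factor - 1.0)
    if spread_hours > max_spread:              # spread premium
        cost += (spread_hours - max_spread) * rate * spread_factor
    return cost

print(duty_cost(7.0, 8.0))     # short straight duty: paid the guarantee
print(duty_cost(8.5, 12.0))    # split duty: overtime and spread premiums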

A second problem in crew scheduling is rostering, in which duties are assembled into a group of
duties (the “roster”) for each operator, by week. For example, one roster could include the same
8-hour duty for 5 weekdays. However, many possible combinations of duties could be
considered, especially if weekend or evening service is provided. Once the rosters are created,
operators choose from among these duty rosters.

Glossary
 Block: the sequence of trips made by a vehicle in the course of one day of operations,
including both revenue and non-revenue trips.
 Duty (or, Run): a work shift for an operator for one day.

 Guarantee: the minimum pay hours for an operator, regardless of the number of hours
worked.

 Piece of work: an operator work assignment extracted from a vehicle block.

 Relief: the change of operators during a vehicle block.

 Roster (also, rostering): the set of duties for a single operator in a week.

 Split: a duty covering at least two intervals of time with an unpaid break.

 Spread: the time between when an operator reports for duty and when they end their duty.

 Straight: a duty covering a single interval of time.

 Swing: a duty in which the operator begins and ends at different locations.

 Tripper: a short work assignment (e.g., 2-4 hours); generally much shorter than a typical
straight.

References
1. Transportation Research Board (2009a). Controlling System Costs: Basic and Advanced
Scheduling Manuals and Contemporary Issues in Transit Scheduling, Transit Cooperative Research
Program, Report 135.
2. Transportation Research Board (2009b). Controlling System Costs: Basic and Advanced
Scheduling Manuals and Contemporary Issues in Transit Scheduling, Appendix, Transit Cooperative
Research Program, Report 135 Appendix.

Flow network
In graph theory, a flow network (also known as a transportation network) is a directed
graph where each edge has a capacity and each edge receives a flow. The amount of flow on an
edge cannot exceed the capacity of the edge. Often in operations research, a directed graph is
called a network. The vertices are called nodes and the edges are called arcs. A flow must satisfy
the restriction that the amount of flow into a node equals the amount of flow out of it, unless it is
a source, which has only outgoing flow, or a sink, which has only incoming flow. A network can be
used to model traffic in a road system, circulation with demands, fluids in pipes, currents in an
electrical circuit, or anything similar in which something travels through a network of nodes.

Contents

 1 Definition
 2 Example
 3 Applications
 4 See also
 5 References
 6 Further reading
 7 External links

Definition
Let G(V, E) be a finite directed graph in which every edge (u, v) ∈ E has a non-negative,
real-valued capacity c(u, v). If (u, v) ∉ E, we assume that c(u, v) = 0. We distinguish two
vertices: a source s and a sink t. A flow in a flow network is a real function f : V × V → R with
the following three properties for all nodes u and v:
Capacity constraints: f(u, v) ≤ c(u, v). The flow along an edge cannot exceed its capacity.

Skew symmetry: f(u, v) = −f(v, u). The net flow from u to v must be the opposite of the
net flow from v to u (see example).

Flow conservation: ∑_{w ∈ V} f(u, w) = 0, unless u = s or u = t. The net flow to a node is zero,
except for the source, which "produces" flow, and the sink, which "consumes" flow.

That is, flow conservation implies: ∑_{(u,v) ∈ E} f(u, v) = ∑_{(v,z) ∈ E} f(v, z) for each
vertex v ∈ V \ {s, t}.

Notice that f(u, v) is the net flow from u to v. If the graph represents a physical network, and
if there is a real flow of, for example, 4 units from u to v, and a real flow of 3 units from v to u,
we have f(u, v) = 1 and f(v, u) = −1.

Basically we can say that the flow of a physical network is the flow leaving the source:
| f | = ∑_{(s,v) ∈ E} f(s, v).
The residual capacity of an edge is c_f(u, v) = c(u, v) − f(u, v). This defines
a residual network denoted G_f(V, E_f), giving the amount of available capacity. See that
there can be a path from u to v in the residual network, even though there is no path from u
to v in the original network. Since flows in opposite directions cancel out, decreasing the flow
from v to u is the same as increasing the flow from u to v. An augmenting path is a
path (u_1, u_2, …, u_k) in the residual network, where u_1 = s, u_k = t,
and c_f(u_i, u_{i+1}) > 0. A network is at maximum flow if and only if there is no augmenting
path in the residual network G_f.

So G_f is constructed using graph G as follows:

1. Vertices of G_f = V.

2. Edges of G_f = E_f, defined as follows. For each edge (x, y) ∈ E:

(i) if f(x, y) < c(x, y), make a forward edge (x, y) with capacity c_f(x, y) = c(x, y) − f(x, y);
(ii) if f(x, y) > 0, make a backward edge (y, x) with capacity c_f(y, x) = f(x, y).
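
For concreteness, a small Python sketch of this construction (our own helper; capacities and flow are given as dictionaries keyed by edge):

def residual_network(capacity, flow):
    # capacity, flow: dicts mapping (u, v) -> number for each edge of G.
    # Returns the residual capacities as a dict over E_f.
    r = {}
    for (u, v), c in capacity.items():
        f = flow.get((u, v), 0)
        if c - f > 0:
            r[(u, v)] = c - f                  # forward edge: unused capacity
        if f > 0:
            r[(v, u)] = r.get((v, u), 0) + f   # backward edge: cancellable flow
    return r

cap = {('s', 'a'): 4, ('a', 't'): 3}
flw = {('s', 'a'): 3, ('a', 't'): 3}
print(residual_network(cap, flw))   # {('s','a'): 1, ('a','s'): 3, ('t','a'): 3}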

This concept is used in the Ford–Fulkerson algorithm, which computes the maximum flow in a flow
network.

Sometimes, when one needs to model a network with more than one source, a supersource is
introduced to the graph.[1] This consists of a vertex connected to each of the sources with edges
of infinite capacity, so as to act as a global source. A similar construct for sinks is called
a supersink.[2]

Example

A flow network showing flow and capacity

To the right you see a flow network with source labeled s, sink t, and four additional nodes. The
flow and capacity are denoted f/c. Notice how the network upholds skew symmetry, capacity
constraints and flow conservation. The total amount of flow from s to t is 5, which can be easily
seen from the fact that the total outgoing flow from s is 5, which is also the incoming flow to t.
We know that no flow appears or disappears in any of the other nodes.

Residual network for the above flow network, showing residual capacities.

Below you see the residual network for the given flow. Notice how there is positive residual
capacity on some edges where the original capacity is zero, namely on the reverse edges of edges
that carry flow. This flow is not a maximum flow: there is available capacity along several paths
from s to t (shown in the figure), which are then the augmenting paths. The residual capacity
of a path is the minimum residual capacity of all edges in that path. Notice that as long as there
exists some path with a positive residual capacity, the flow will not be maximum.

Applications

Picture a series of water pipes, fitting into a network. Each pipe is of a certain diameter, so it can
only maintain a flow of a certain amount of water. Anywhere that pipes meet, the total amount of
water coming into that junction must be equal to the amount going out, otherwise we would
quickly run out of water, or we would have a buildup of water. We have a water inlet, which is
the source, and an outlet, the sink. A flow would then be one possible way for water to get from
source to sink so that the total amount of water coming out of the outlet is consistent. Intuitively,
the total flow of a network is the rate at which water comes out of the outlet.

Flows can pertain to people or material over transportation networks, or to electricity
over electrical distribution systems. For any such physical network, the flow coming into any
intermediate node needs to equal the flow going out of that node. This conservation constraint
was formalized as Kirchhoff's current law.

Flow networks also find applications in ecology: flow networks arise naturally when considering
the flow of nutrients and energy between different organisms in a food web. The
mathematical problems associated with such networks are quite different from those that arise in
networks of fluid or traffic flow. The field of ecosystem network analysis, developed by Robert
Ulanowicz and others, involves using concepts from information theory and thermodynamics to
study the evolution of these networks over time.

The simplest and most common problem using flow networks is to find what is called
the maximum flow, which provides the largest possible total flow from the source to the sink in a
given graph. There are many other problems which can be solved using max flow algorithms, if
they are appropriately modeled as flow networks, such as bipartite matching, the assignment
problem and the transportation problem. Maximum flow problems can be solved efficiently with
the relabel-to-front algorithm. The max-flow min-cut theorem states that finding a maximal
network flow is equivalent to finding a cut of minimum capacity that separates the source and
the sink, where a cut is a division of the vertices such that the source is in one part and the
sink is in the other.

In a multi-commodity flow problem, you have multiple sources and sinks, and various
"commodities" which are to flow from a given source to a given sink. This could be for example
various goods that are produced at various factories, and are to be delivered to various given
customers through the same transportation network.

In a minimum cost flow problem, each edge (u, v) has a given cost k(u, v), and the cost of
sending the flow f(u, v) across the edge is f(u, v) · k(u, v). The objective is to send a
given amount of flow from the source to the sink, at the lowest possible price.

In a circulation problem, you have a lower bound ℓ(u, v) on the edges, in addition to the upper
bound c(u, v). Each edge also has a cost. Often, flow conservation holds for all nodes in a
circulation problem, and there is a connection from the sink back to the source. In this way, you
can dictate the total flow with ℓ(t, s) and c(t, s). The flow circulates through the network,
hence the name of the problem.

In a network with gains or generalized network each edge has a gain, a real number (not
zero) such that, if the edge has gain g, and an amount x flows into the edge at its tail, then an
amount gx flows out at the head.

In a source localization problem, an algorithm tries to identify the most likely source node of
information diffusion through a partially observed network. This can be done in linear time for
trees and cubic time for arbitrary networks and has applications ranging from tracking mobile
phone users to identifying the originating village of disease outbreaks.[3]

See also
 Braess' paradox
 Centrality
 Constructal theory
 Ford–Fulkerson algorithm
 Dinic's algorithm
 Flow (computer networking)
 Flow graph
 Max-flow min-cut theorem
 Oriented matroid
 Shortest path problem

References
1. Black, Paul E. "Supersource". Dictionary of Algorithms and Data Structures. NIST.
2. Black, Paul E. "Supersink". Dictionary of Algorithms and Data Structures. NIST.
3. http://www.pedropinto.org.s3.amazonaws.com/publications/locating_source_diffusion_networks.pdf
Max-flow min-cut theorem
In optimization theory, the max-flow min-cut theorem states that in a flow network, the maximum
amount of flow passing from the source to the sink is equal to the minimum total capacity of edges
which, when removed from the network, leave no path along which flow can pass from the
source to the sink.

The max-flow min-cut theorem is a special case of the duality theorem for linear programs and can


be used to derive Menger's theorem and the König–Egerváry theorem.

Definitions and statement


Let N = (V, E) be a network (directed graph) with s and t being the source and the sink
of N respectively.

Maximum flow
Definition. The capacity of an edge is a mapping c : E → R+, denoted by cuv or c(u, v). It
represents the maximum amount of flow that can pass through an edge.

Definition. A flow is a mapping f : E → R⁺, denoted by f_uv or f(u, v), subject to the following two
constraints:

1. Capacity Constraint: f_uv ≤ c_uv for each (u, v) ∈ E.

2. Conservation of Flows: ∑_{u: (u,v) ∈ E} f_uv = ∑_{w: (v,w) ∈ E} f_vw for each v ∈ V \ {s, t}.

Definition. The value of flow is defined by | f | = ∑_{v: (s,v) ∈ E} f_sv, where s is the source of N.
It represents the amount of flow passing from the source to the sink.

Maximum Flow Problem. Maximize | f |, that is, route as much flow as possible from s to t.

Minimum cut
Definition. An s-t cut C = (S, T) is a partition of V such that s ∈ S and t ∈ T.
The cut-set of C is the set {(u, v) ∈ E : u ∈ S, v ∈ T}.
Note that if the edges in the cut-set of C are removed, | f | = 0.

Definition. The capacity of an s-t cut is defined by

c(S, T) = ∑_{(u,v) ∈ E} c_uv · d_uv,

where d_uv = 1 if u ∈ S and v ∈ T, and 0 otherwise.

Minimum s-t Cut Problem. Minimize c(S, T), that is, determine S and T such that the capacity
of the s-t cut is minimal.
Statement
Max-flow min-cut theorem. The maximum value of an s-t flow is equal to the minimum
capacity over all s-t cuts.

Linear program formulation


Max-flow (Primal):

maximize ∑_{(s,v) ∈ E} f_sv
subject to f_uv ≤ c_uv for every (u, v) ∈ E;
∑_{u: (u,v) ∈ E} f_uv − ∑_{w: (v,w) ∈ E} f_vw = 0 for every v ∈ V \ {s, t};
f_uv ≥ 0.

Min-cut (Dual):

minimize ∑_{(u,v) ∈ E} c_uv · d_uv
subject to d_uv − z_u + z_v ≥ 0 for every (u, v) ∈ E;
z_s = 1, z_t = 0;
d_uv ≥ 0, z_v ≥ 0.

The max-flow problem and min-cut problem can be formulated as two primal-dual linear programs.

Note that for the given s-t cut (S, T), z_u = 1 if u ∈ S and 0 otherwise. Therefore d_uv
should be 1 when u ∈ S and v ∈ T, and should be zero otherwise. The equality in the max-flow
min-cut theorem follows from the strong duality theorem in linear programming, which states that
if the primal program has an optimal solution, x*, then the dual program also has an
optimal solution, y*, such that the optimal values formed by the two solutions are equal.

Example

A network with the value of flow equal to the capacity of an s-t cut
The figure on the right is a network having a value of flow of 7. The vertex in white and the vertices
in grey form the subsets S and T of an s-t cut, whose cut-set contains the dashed edges. Since the
capacity of the s-t cut is 7, which equals the value of the flow, the max-flow min-cut theorem tells us
that the value of the flow and the capacity of the s-t cut are both optimal in this network.

Application
Generalized max-flow min-cut theorem
In addition to edge capacities, consider there is a capacity at each vertex, that is, a
mapping c : V → R⁺, denoted by c(v), such that the flow f has to satisfy not only the capacity
constraint and the conservation of flows, but also the vertex capacity constraint

∑_{i ∈ V} f(i, v) ≤ c(v) for each v ∈ V \ {s, t}.

In other words, the amount of flow passing through a vertex cannot exceed its capacity. Define an s-
t cut to be the set of vertices and edges such that for any path from s to t, the path contains a
member of the cut. In this case, the capacity of the cut is the sum of the capacities of each edge
and vertex in it.

In this new definition, the generalized max-flow min-cut theorem states that the maximum value of
an s-t flow is equal to the minimum capacity of an s-t cut in the new sense.
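
One standard way to reduce this setting to the ordinary edge-capacitated theorem is node splitting (the same construction reappears in the Ford–Fulkerson section below). A minimal sketch, with helper names of our own choosing:

def split_vertex_capacities(edges, vcap, source, sink):
    # edges: dict (u, v) -> capacity; vcap: dict v -> vertex capacity.
    # Returns an edge-capacitated network in which each capacitated
    # vertex v is split into (v, 'in') and (v, 'out').
    inn = lambda v: (v, 'in') if v in vcap else v
    out = lambda v: (v, 'out') if v in vcap else v
    new_edges = {(out(u), inn(v)): c for (u, v), c in edges.items()}
    for v, c in vcap.items():
        new_edges[(inn(v), out(v))] = c   # the vertex capacity becomes an edge
    return new_edges, out(source), inn(sink)

edges = {('s', 'a'): 5, ('a', 't'): 5}
print(split_vertex_capacities(edges, {'a': 2}, 's', 't'))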

Menger's theorem
In the undirected edge-disjoint paths problem, we are given an undirected graph G = (V, E) and two
vertices s and t, and we have to find the maximum number of edge-disjoint s-t paths in G.

Menger's theorem states that the maximum number of edge-disjoint s-t paths in an undirected
graph is equal to the minimum number of edges in an s-t cut-set.
Project selection problem

A network formulation of the project selection problem with the optimal solution

In the project selection problem, there are n projects and m equipments. Each project p_i yields
revenue r(p_i) and each equipment q_j costs c(q_j) to purchase. Each project requires a number of
equipments and each equipment can be shared by several projects. The problem is to determine
which projects and equipments should be selected and purchased respectively, so that the profit is
maximized.

Let P be the set of projects not selected and Q be the set of equipments purchased; then the
problem can be formulated as

maximize g = ∑_i r(p_i) − ∑_{p_i ∈ P} r(p_i) − ∑_{q_j ∈ Q} c(q_j).

Since the first term does not depend on the choice of P and Q, this maximization problem can be
formulated as a minimization problem instead, that is,

minimize ∑_{p_i ∈ P} r(p_i) + ∑_{q_j ∈ Q} c(q_j).

The above minimization problem can then be formulated as a minimum-cut problem by constructing
a network, where the source is connected to the projects with capacity r(p_i), and the equipments
are connected to the sink with capacity c(q_j). An edge (p_i, q_j) with infinite capacity is added if
project p_i requires equipment q_j. The s-t cut-set represents the projects and equipments
in P and Q respectively. By the max-flow min-cut theorem, one can solve the problem as
a maximum flow problem.

The figure on the right gives a network formulation of the following project selection problem:

Project  r(p_i)   Equipment  c(q_j)
1        100      1          200      Project 1 requires equipments 1 and 2.
2        200      2          100      Project 2 requires equipment 2.
3        150      3          50       Project 3 requires equipment 3.

The minimum capacity of an s-t cut is 250 and the sum of the revenue of each project is 450;
therefore the maximum profit g is 450 − 250 = 200, by selecting projects p_2 and p_3.

The idea here is to 'flow' the project profits through the 'pipes' of the equipment. If we cannot fill the
pipe, the equipment's return is less than its cost, and the min cut algorithm will find it cheaper to cut
the project's profit edge instead of the equipment's cost edge.
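
A sketch of this whole reduction on the example above (our own code; the max-flow routine is a compact Edmonds–Karp, written for brevity rather than speed):

from collections import deque

def max_flow(cap, s, t):
    # Edmonds-Karp on a dict-of-dicts capacity matrix (modified in place).
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:          # BFS for an augmenting path
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        v, delta = t, float('inf')            # bottleneck along the path
        while parent[v] is not None:
            delta = min(delta, cap[parent[v]][v])
            v = parent[v]
        v = t                                 # augment along the path
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= delta
            cap[v][u] = cap[v].get(u, 0) + delta
            v = u
        flow += delta

INF = float('inf')
r = {'p1': 100, 'p2': 200, 'p3': 150}       # project revenues
c = {'q1': 200, 'q2': 100, 'q3': 50}        # equipment costs
needs = {'p1': ['q1', 'q2'], 'p2': ['q2'], 'p3': ['q3']}

cap = {'s': dict(r), 't': {}}
for p in r:
    cap[p] = {q: INF for q in needs[p]}     # infinite-capacity requirement edges
for q in c:
    cap[q] = {'t': c[q]}

min_cut = max_flow(cap, 's', 't')           # = 250
print(sum(r.values()) - min_cut)            # profit = 200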

Image segmentation problem

Each black node denotes a pixel.

In the image segmentation problem, there are n pixels. Each pixel i can be assigned a foreground
value  fi or a background value bi. There is a penalty of pij if pixels i, j are adjacent and have
different assignments. The problem is to assign pixels to foreground or background such that the
sum of their values minus the penalties is maximum.
Let P be the set of pixels assigned to foreground and Q be the set of points assigned to background;
then the problem can be formulated as

maximize g = ∑_{i ∈ P} f_i + ∑_{i ∈ Q} b_i − ∑_{i ∈ P, j ∈ Q, i and j adjacent} p_ij.

This maximization problem can be formulated as a minimization problem instead, that is,

minimize ∑_{i ∈ Q} f_i + ∑_{i ∈ P} b_i + ∑_{i ∈ P, j ∈ Q, i and j adjacent} p_ij.
The above minimization problem can be formulated as a minimum-cut problem by constructing a
network where the source (orange node) is connected to all the pixels with capacity f_i, and the sink
(purple node) is connected by all the pixels with capacity b_i. Two edges (i, j) and (j, i)
with capacity p_ij are added between two adjacent pixels. The s-t cut-set then represents the pixels
assigned to the foreground in P and the pixels assigned to the background in Q.

History
The max-flow min-cut theorem was proven by P. Elias, A. Feinstein, and C.E. Shannon in 1956[1],
and independently also by L.R. Ford, Jr. and D.R. Fulkerson in the same year[2].

Proof
Let G = (V, E) be a network (directed graph) with s and t being the source and the sink
of G respectively.

Consider the flow f computed for G by the Ford–Fulkerson algorithm. In the residual
graph (G_f) obtained for G (after the final flow assignment by the Ford–Fulkerson algorithm), define
two subsets of vertices as follows:

1. A: the set of vertices reachable from s in G_f
2. A^c: the set of the remaining vertices, i.e. V − A

Claim. value( f ) = c(A, A^c), where the capacity of an s-t cut is defined by

c(S, T) = ∑_{(u,v) ∈ E, u ∈ S, v ∈ T} c_uv.

Now, we know that value( f ) = f_out(A) − f_in(A) for any subset of vertices A containing s and
not containing t. Therefore, for value( f ) = c(A, A^c) we need:

All outgoing edges from the cut must be fully saturated.

All incoming edges to the cut must have zero flow.

To prove the above claim, we consider two cases:

In G, there exists an outgoing edge (x, y), x ∈ A, y ∉ A, such that it is not saturated, i.e.,
f(x, y) < c_xy. This implies that there exists a forward edge from x to y in G_f, therefore there exists
a path from s to y in G_f, which is a contradiction. Hence, any outgoing edge (x, y) is fully saturated.

In G, there exists an incoming edge (y, x), y ∉ A, x ∈ A, such that it carries some non-zero
flow, i.e., f(y, x) > 0. This implies that there exists a backward edge from x to y in G_f, therefore
there exists a path from s to y in G_f, which is again a contradiction. Hence, any incoming
edge (y, x) must have zero flow.

Both of the above statements prove that the capacity of the cut obtained in the manner described
above is equal to the flow obtained in the network. Also, the flow was obtained by the Ford–Fulkerson
algorithm, so it is the max-flow of the network as well.
Also, since any flow in the network is always less than or equal to the capacity of every possible
cut in the network, the cut described above is also the min-cut which attains the max-flow.

References
Eugene Lawler (2001). "4.5. Combinatorial Implications of Max-Flow Min-Cut Theorem, 4.6. Linear
Programming Interpretation of Max-Flow Min-Cut Theorem". Combinatorial Optimization: Networks
and Matroids. Dover. pp. 117–120. ISBN 0-486-41453-1.

Christos H. Papadimitriou, Kenneth Steiglitz (1998). "6.1 The Max-Flow, Min-Cut
Theorem". Combinatorial Optimization: Algorithms and Complexity. Dover. pp. 120–128. ISBN 0-486-
40258-4.

Vijay V. Vazirani (2004). "12. Introduction to LP-Duality". Approximation Algorithms. Springer.
pp. 93–100. ISBN 3-540-65367-8.
Ford–Fulkerson algorithm
The Ford–Fulkerson method or Ford–Fulkerson algorithm (FFA) is an algorithm which computes
the maximum flow in a flow network. It is called a "method" instead of an "algorithm" because the
approach to finding augmenting paths in a residual graph is not fully specified,[1] or it is specified in
several implementations with different running times.[2] It was published in 1956 by L. R. Ford, Jr.
and D. R. Fulkerson.[3] The name "Ford–Fulkerson" is often also used for the Edmonds–Karp
algorithm, which is a specialization of Ford–Fulkerson.

The idea behind the algorithm is as follows: As long as there is a path from the source (start node) to
the sink (end node), with available capacity on all edges in the path, we send flow along one of the
paths. Then we find another path, and so on. A path with available capacity is called an augmenting
path.

Contents
 

 1 Algorithm
 2 Complexity
 3 Integral example
 4 Non-terminating example
 5 Python implementation
o 5.1 Usage example
 6 Notes
 7 References
 8 See also
 9 External links

Algorithm
Let   be a graph, and for each edge from   to  , let   be the capacity and   
be the flow. We want to find the maximum flow from the source   to the sink  . After every step in
the algorithm the following is maintained:
Capacity constraints: f(u, v) ≤ c(u, v). The flow along an edge cannot exceed its capacity.

Skew symmetry: f(u, v) = −f(v, u). The net flow from u to v must be the opposite of the net flow
from v to u (see example).

Flow conservation: ∑_{w ∈ V} f(u, w) = 0, unless u = s or u = t. The net flow to a node is zero,
except for the source, which "produces" flow, and the sink, which "consumes" flow.

Value(f): ∑_{(s,u) ∈ E} f(s, u) = ∑_{(v,t) ∈ E} f(v, t). The flow leaving from s must be equal to
the flow arriving at t.

This means that the flow through the network is a legal flow after each round in the algorithm.
We define the residual network G_f(V, E_f) to be the network with
capacity c_f(u, v) = c(u, v) − f(u, v) and no flow. Notice that it can happen that a flow
from v to u is allowed in the residual network, though disallowed in the original network:
if f(u, v) > 0 and c(v, u) = 0, then c_f(v, u) = c(v, u) − f(v, u) = f(u, v) > 0.

Algorithm Ford–Fulkerson

Inputs Given a network G(V, E) with flow capacity c, a source node s, and a sink node t
Output Compute a flow f from s to t of maximum value

1. f(u, v) ← 0 for all edges (u, v)
2. While there is a path p from s to t in G_f, such that c_f(u, v) > 0 for all
edges (u, v) ∈ p:
   1. Find c_f(p) = min { c_f(u, v) : (u, v) ∈ p }
   2. For each edge (u, v) ∈ p:
      1. f(u, v) ← f(u, v) + c_f(p) (Send flow along the path)
      2. f(v, u) ← f(v, u) − c_f(p) (The flow might be "returned" later)

The path in step 2 can be found with, for example, a breadth-first search or a depth-first
search in G_f(V, E_f). If you use the former, the algorithm is called Edmonds–Karp.

When no more paths in step 2 can be found, s will not be able to reach t in the residual
network. If S is the set of nodes reachable by s in the residual network, then the total
capacity in the original network of edges from S to the remainder of V is on the one
hand equal to the total flow we found from s to t, and on the other hand serves as an
upper bound for all such flows. This proves that the flow we found is maximal. See
also the max-flow min-cut theorem.

If the graph G(V, E) has multiple sources and sinks, we act as follows: Suppose
that the sources are {s_1, …, s_n} and the sinks are {t_1, …, t_m}. Add a new source s*
with an edge (s*, s_i) from s* to every node s_i, with
capacity c(s*, s_i) = ∑_{(s_i,u) ∈ E} c(s_i, u). And add a new sink t* with an
edge (t_j, t*) from every node t_j to t*, with
capacity c(t_j, t*) = ∑_{(v,t_j) ∈ E} c(v, t_j). Then apply the Ford–Fulkerson
algorithm.

Also, if a node u has a capacity constraint d_u, we replace this node with two
nodes u_in, u_out, and an edge (u_in, u_out) with capacity d_u. Then
apply the Ford–Fulkerson algorithm.

Complexity
By adding the flow augmenting path to the flow already established in the graph, the
maximum flow will be reached when no more flow augmenting paths can be found in the
graph. However, there is no certainty that this situation will ever be reached, so the best
that can be guaranteed is that the answer will be correct if the algorithm terminates. In
the case that the algorithm runs forever, the flow might not even converge towards the
maximum flow. However, this situation only occurs with irrational flow values. When the
capacities are integers, the runtime of Ford–Fulkerson is bounded by O(E·f) (see big
O notation), where E is the number of edges in the graph and f is the maximum flow in
the graph. This is because each augmenting path can be found in O(E) time and
increases the flow by an integer amount which is at least 1.

A variation of the Ford–Fulkerson algorithm with guaranteed termination and a runtime
independent of the maximum flow value is the Edmonds–Karp algorithm, which runs
in O(V·E²) time.

Integral example
The following example shows the first steps of Ford–Fulkerson in a flow network with 4
nodes, source A and sink D. This example shows the worst-case behaviour of the
algorithm. In each step, only a flow of 1 is sent across the network. If breadth-first
search were used instead, only two steps would be needed.

[The step-by-step table of augmenting paths and capacities, together with the figures of
the initial flow network, the network after 1998 more steps, and the final flow network, is
not reproduced here.]

Notice how flow is "pushed back" from C to B when finding the path A, C, B, D.

Non-terminating example
Consider the flow network shown on the right, with source s, sink t, capacities of
edges e_1, e_2 and e_3 respectively 1, 1 and r = (√5 − 1)/2, and the capacity of all
other edges some integer M ≥ 2. The constant r was chosen so that r² = 1 − r.
We use augmenting paths according to a table of steps (the figure and the step table,
with augmenting paths p_1, p_2 and p_3 and the flow sent in each step, are not
reproduced here).

Note that after step 1 as well as after step 5, the residual capacities of edges e_1, e_2
and e_3 are of the form r^k, r^{k+1} and 0, respectively, for some k ∈ N. This means
that we can use the augmenting paths p_1, p_2, p_1 and p_3 infinitely many times, and the
residual capacities of these edges will always be of the same form. The total flow in the
network after step 5 is 1 + 2(r + r²). If we continue to use augmenting paths as above, the total
flow converges to 1 + 2·∑_{i=1}^∞ r^i = 3 + 2r, while the maximum flow is 2M + 1.
In this case, the algorithm never terminates and the flow does not even converge to the
maximum flow.

Python implementation
class Edge(object):
    # A directed edge with a capacity; redge points to its reverse edge.
    def __init__(self, u, v, w):
        self.source = u
        self.sink = v
        self.capacity = w

    def __repr__(self):
        return "%s->%s:%s" % (self.source, self.sink, self.capacity)

class FlowNetwork(object):
    def __init__(self):
        self.adj = {}    # vertex -> list of incident edges
        self.flow = {}   # edge -> current flow on that edge

    def add_vertex(self, vertex):
        self.adj[vertex] = []

    def get_edges(self, v):
        return self.adj[v]

    def add_edge(self, u, v, w=0):
        if u == v:
            raise ValueError("u == v")
        edge = Edge(u, v, w)
        redge = Edge(v, u, 0)    # reverse edge with zero capacity
        edge.redge = redge
        redge.redge = edge
        self.adj[u].append(edge)
        self.adj[v].append(redge)
        self.flow[edge] = 0
        self.flow[redge] = 0

    def find_path(self, source, sink, path):
        # depth-first search for an augmenting path in the residual network
        if source == sink:
            return path
        for edge in self.get_edges(source):
            residual = edge.capacity - self.flow[edge]
            if residual > 0 and edge not in path:
                result = self.find_path(edge.sink, sink, path + [edge])
                if result is not None:
                    return result
        return None

    def max_flow(self, source, sink):
        path = self.find_path(source, sink, [])
        while path is not None:
            # augment by the bottleneck residual capacity of the path
            residuals = [edge.capacity - self.flow[edge] for edge in path]
            flow = min(residuals)
            for edge in path:
                self.flow[edge] += flow
                self.flow[edge.redge] -= flow
            path = self.find_path(source, sink, [])
        return sum(self.flow[edge] for edge in self.get_edges(source))
Usage example
For the example flow network in maximum flow problem we do the following:
>>> g = FlowNetwork()
>>> [g.add_vertex(v) for v in "sopqrt"]
[None, None, None, None, None, None]
>>>
>>> g.add_edge('s','o',3)
>>> g.add_edge('s','p',3)
>>> g.add_edge('o','p',2)
>>> g.add_edge('o','q',3)
>>> g.add_edge('p','r',2)
>>> g.add_edge('r','t',3)
>>> g.add_edge('q','r',4)
>>> g.add_edge('q','t',2)
>>> print (g.max_flow('s','t'))
5

References
 Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Section
26.2: The Ford–Fulkerson method". Introduction to Algorithms (Second ed.). MIT Press and
McGraw–Hill. pp. 651–664. ISBN 0-262-03293-7.
 George T. Heineman, Gary Pollice, and Stanley Selkow (2008). "Chapter 8: Network Flow
Algorithms". Algorithms in a Nutshell. O'Reilly Media. pp. 226–250. ISBN 978-0-596-51624-6.
 Jon Kleinberg and Éva Tardos (2006). "Chapter 7: Extensions to the Maximum-Flow
Problem". Algorithm Design. Pearson Education. pp. 378–384. ISBN 0-321-29535-8.

Maximum Flow: Augmenting Path Algorithms Comparison
In the first section we recall some necessary definitions and statements from maximum flow
theory. The other sections discuss the augmenting path algorithms themselves. The last section
shows the results of a practical analysis and highlights the best algorithm in practice. We also give a
simple implementation of one of the algorithms.

Statement of the Problem


Suppose we have a directed network G = (V, E) defined by a set V of nodes (or vertexes) and a
set E of arcs (or edges). Each arc (i, j) in E has an associated nonnegative capacity u_ij. Also we
distinguish two special nodes in G: a source node s and a sink node t. For each i in V we denote
by E(i) all the arcs emanating from node i. Let U = max u_ij over (i, j) in E. Let us also denote the
number of vertexes by n and the number of edges by m.


We wish to find the maximum flow from the source node s to the sink node t that satisfies the arc
capacities and the mass balance constraints at all nodes. Representing the flow on
arc (i, j) in E by x_ij, we can obtain the optimization model for the maximum flow problem:

maximize v
subject to
∑_{j: (i,j) ∈ E} x_ij − ∑_{j: (j,i) ∈ E} x_ji = v for i = s, 0 for i ≠ s, t, and −v for i = t;
0 ≤ x_ij ≤ u_ij for every (i, j) in E.
A vector (x_ij) which satisfies all constraints is called a feasible solution or a flow (it is not
necessarily maximal). Given a flow x, we are able to construct the residual network with respect to
this flow according to the following intuitive idea. Suppose that an edge (i, j) in E carries x_ij units
of flow. We define the residual capacity of the edge (i, j) as r_ij = u_ij − x_ij. This means that we
can send an additional r_ij units of flow from vertex i to vertex j. We can also cancel the existing
flow x_ij on the arc if we send up to x_ij units of flow from j to i over the arc (i, j).

So, given a feasible flow x, we define the residual network with respect to the flow x as follows.
Suppose we have a network G = (V, E). A feasible solution x engenders a new (residual) network,
which we define by G_x = (V, E_x), where E_x is a set of residual edges corresponding to the
feasible solution x.
What is E_x? We replace each arc (i, j) in E by two arcs (i, j) and (j, i): the arc (i, j) has (residual)
capacity r_ij = u_ij − x_ij, and the arc (j, i) has (residual) capacity r_ji = x_ij. Then we construct the
set E_x from the new edges with a positive residual capacity.


Augmenting Path Algorithms as a whole
In this section we describe the method on which all augmenting path algorithms are based.
This method was developed by Ford and Fulkerson in 1956 [3]. We start with some important
definitions.
An augmenting path is a directed path from the source node s to the sink node t in the residual
network. The residual capacity of an augmenting path is the minimum residual capacity of any arc
in the path. Obviously, we can send additional flow from the source to the sink along an
augmenting path.
All augmenting path algorithms are constructed on the following basic idea, known as the
augmenting path theorem:

Theorem 1 (Augmenting Path Theorem). A flow x* is a maximum flow if and only if the residual
network G_{x*} contains no augmenting path.
According to the theorem we obtain a method of finding a maximal flow. The method proceeds
by identifying augmenting paths and augmenting flows on these paths until the network contains
no such path. All algorithms that we wish to discuss differ only in the way of finding augmenting
paths.

We consider the maximum flow problem subject to the following assumptions.

Assumption 1.  The network is directed.


Assumption 2.  All capacities are nonnegative integers.
This assumption is not necessary for some algorithms, but the algorithms whose complexity
bounds involve U assume the integrality of the data.
Assumption 3.  The problem has a bounded optimal solution.
This assumption in particular means that there are no uncapacitated paths from the source to the
sink.

Assumption 4.  The network does not contain parallel arcs.


This assumption imposes no loss of generality, because one can summarize capacities of all
parallel arcs.

As to why these assumptions are correct we leave the proof to the reader.

It is easy to determine that the method described above works correctly. Under assumption 2, on
each augmenting step we increase the flow value by at least one unit. We (usually) start with zero
flow. The maximum flow value is bounded from above, according to assumption 3. This reasoning
implies the finiteness of the method.

With those preparations behind us, we are ready to begin discussing the algorithms.

Shortest Augmenting Path Algorithm, O(n²m)

In 1972 Edmonds and Karp (and, independently, Dinic in 1970) proved that if each augmenting
path is a shortest one, the algorithm will perform O(nm) augmentation steps. The shortest path
(where the length of each edge is equal to one) can be found with the help of the breadth-first
search (BFS) algorithm. The shortest augmenting path algorithm is well known and widely
discussed in many books and articles, which is why we will not describe it in great detail. Let's
review the idea using a kind of pseudo-code:
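
The pseudo-code figure is not reproduced in this copy; the following reconstruction of the idea (ours) keeps the flow augmentation on line 5, to which the remark below refers:

1   x := 0                                        (start from the zero flow)
2   while the residual network G_x contains an augmenting path:
3       find a shortest augmenting path P in G_x using BFS
4       delta := min { r_ij : (i, j) in P }
5       x_ij := x_ij + delta for every (i, j) in P (augment the flow)
6   return x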

In line 5, current flow x is being increased by some positive amount.


The algorithm was said to perform O(nm) steps of finding an augmenting path. Using BFS, which
requires O(m) operations in the worst case, one can obtain an O(nm²) complexity for the algorithm
itself. If m ~ n², then one must use the BFS procedure O(n³) times in the worst case. There are
some networks on which this number of augmentation steps is actually achieved. We will show one
simple example below.
Improved Shortest Augmenting Path Algorithm, O(n²m)

As mentioned earlier, the natural approach for finding any shortest augmenting path would be to
look for paths by performing a breadth-first search in the residual network. It
requires O(m) operations in the worst case and imposes O(nm²) complexity on the maximum flow
algorithm. Ahuja and Orlin improved the shortest augmenting path algorithm in 1987. They
exploited the fact that the minimum distance from any node i to the sink node t is monotonically
nondecreasing over all augmentations and reduced the average time per augmentation to O(n).
The improved version of the augmenting path algorithm, then, runs in O(n²m) time. We can now
start discussing it according to [1].


Definition 1. A distance function d: V → Z⁺ with respect to the residual capacities r_ij is a function
from the set of nodes to the nonnegative integers. Let's say that a distance function is valid if it
satisfies the following conditions:

 d(t) = 0;
 d(i) ≤ d(j) + 1, for every (i, j) in E with r_ij > 0.
Informally (and it is easy to prove), a valid distance label of node i, represented by d(i), is a lower
bound on the length of the shortest path from i to t in the residual network G_x. We call a distance
function exact if for each i in V, d(i) equals the length of the shortest path from i to t in the residual
network. It is also easy to prove that if d(s) ≥ n, then the residual network contains no path from
the source to the sink.
An arc (i, j) in E is called admissible if d(i) = d(j) + 1. We call other arcs inadmissible. If a path
from s to t consists of admissible arcs, then the path is admissible. Evidently, an admissible path is
a shortest path from the source to the sink. Since every arc in an admissible path satisfies the
condition r_ij > 0, the path is augmenting.

So, the improved shortest augmenting path algorithm consists of four steps (procedures): the main
cycle, advance, retreat and augment. The algorithm maintains a partial admissible path, i.e., a
path from s to some node i, consisting of admissible arcs. It performs advance or retreat steps
from the last node of the partial admissible path (such a node is called the current node). If there is
some admissible arc (i, j) from the current node i, then the algorithm performs the advance step
and adds the arc to the partial admissible path. Otherwise, it performs the retreat step, which
increases the distance label of i and backtracks by one arc.
If the partial admissible path reaches the sink, we perform an augmentation. The algorithm stops
when d(s) ≥ n. Let's describe these steps in pseudo-code. We denote the residual (with respect to
flow x) arcs emanating from node i by E_x(i). More formally, E_x(i) = { (i, j) in E(i): r_ij > 0 }.
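
The pseudo-code figures for these procedures are likewise missing from this copy; the reconstruction below (ours, with line numbering chosen to be consistent with the remarks around it) sketches the main cycle and the retreat procedure:

main cycle:
 1   compute exact distance labels d(i) by a backward BFS from t
 2   x := 0; i := s; path := [s]
 3   while d(s) < n:
 4       look at the current arc (i, j) of node i
 5       if (i, j) is admissible (r_ij > 0 and d(i) = d(j) + 1):
 6           advance: add (i, j) to path; i := j
 7           if i = t: augment along path by its bottleneck; i := s; path := [s]
 8       else if (i, j) is not the last arc in E(i):
 9           designate the next arc in E(i) as the current arc of i
10       else: retreat(i)

retreat(i):
 1   d(i) := 1 + min { d(j) : (i, j) in E_x(i) }   (relabel; reset the current arc of i)
 2   if i ≠ s: delete the last arc (k, i) from path and set i := k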
In line 1 of the retreat procedure, if E_x(i) is empty, then suppose d(i) equals n.

Ahuja and Orlin suggest the following data structure for this algorithm [1]. We maintain the arc
list E(i) which contains all the arcs emanating from node i. We arrange the arcs in this list in any
fixed order. Each node i has a current arc, which is an arc in E(i) and is the next candidate for
admissibility testing. Initially, the current arc of node i is the first arc in E(i). In line 5 the algorithm
tests whether the node’s current arc is admissible. If not, it designates the next arc in the list as
the current arc. The algorithm repeats this process until either it finds an admissible arc or
reaches the end of the arc list. In the latter case the algorithm declares that E(i) contains no
admissible arc; it again designates the first arc in E(i) as the current arc of node i and performs
the relabel operation by calling the retreat procedure (line 10).
Now we outline a proof that the algorithm runs in O(n²m) time.

Lemma 1.  The algorithm maintains distance labels at each step. Moreover, each relabel (or,
retreat) step strictly increases the distance label of a node.
Sketch of proof. Perform induction on the number of relabel operations and augmentations.
Lemma 2. The distance label of each node increases at most n times. Consequently, the relabel
operation is performed at most n² times.

Proof. This lemma is a consequence of Lemma 1 and the fact that if d(s) ≥ n, then the residual
network contains no augmenting path.
Since the improved shortest augmenting path algorithm makes augmentations along shortest
paths (like the unimproved one), the total number of augmentations is the same, O(nm). Each
retreat step relabels a node, which is why the number of retreat steps is O(n²) (according to
lemma 2). The time to perform the retreat/relabel steps is O( n · Σ(i in V) |E(i)| ) = O(nm). Since
one augmentation requires O(n) time, the total augmentation time is O(n²m). The total time of the
advance steps is bounded by the augmentation time plus the retreat/relabel time, and is again
O(n²m). We obtain the following result:
Theorem 2.  The improved shortest augmenting path algorithm runs in O(n²m) time.

Ahuja and Orlin suggest one very useful practical improvement of the algorithm. Since the
algorithm performs many useless relabel operations after the maximum flow has already been
found, it is better to add an additional termination criterion. Let's introduce an additional
(n+1)-element array, numbs, whose indices vary from 0 to n. The value numbs(k) is the number of
nodes whose distance label equals k. The algorithm initializes this array while computing the
initial distance labels using BFS. At this point, the positive entries in the array numbs are
consecutive (i.e., numbs(0), numbs(1), …, numbs(l) will be positive up to some index l and the
remaining entries will all be zero).
When the algorithm increases the distance label of a node from x to y, it subtracts 1 from numbs(x),
adds 1 to numbs(y) and checks whether numbs(x) = 0. If it does equal 0, the algorithm terminates.
This approach is a kind of heuristic, but it is really good in practice. As to why this approach
works, we leave the proof to the reader (hint: show that the nodes i with d(i) > x and the
nodes j with d(j) < x engender a cut and use the max-flow min-cut theorem).
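A minimal sketch of this bookkeeping, assuming the global arrays d[] and numbs[] of the implementation given at the end of the article:

    // Hypothetical helper, not part of the final listing: relabel node i
    // to the new label y while maintaining numbs[], and report whether a
    // "gap" appeared at the old level (in which case we may terminate).
    int d[2007], numbs[2007];

    bool relabel_with_gap_check(int i, int y) {
        int x = d[i];
        numbs[x]--;              // one node fewer at level x
        d[i] = y;
        numbs[y]++;              // one node more at level y
        return numbs[x] == 0;    // true => no node left at level x: stop
    }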
Comparison of Improved and Unimproved versions
In this section we identify the worst case for both shortest augmenting path algorithms with the
purpose of comparing their running times.
In the worst case both the improved and unimproved algorithms will perform O(n³) augmentations
if m ~ n². Norman Zadeh developed examples on which this running time is achieved. Using
his ideas we compose a somewhat simpler network on which the algorithms have to
perform O(n³) augmentations and which does not depend on the choice of the next path.

Figure 1. Worst case example for the shortest augmenting path algorithm.
All vertices except s and t are divided into four subsets: S={s1,…,sk}, T={t1,…,tk}, U={u1,
…,u2p} and V={v1,…,v2p}. Both sets S and T contain k nodes, while both
sets U and V contain 2p nodes; k and p are fixed integers. Each bold arc (connecting S and T) has
unit capacity. Each dotted arc has an infinite capacity. The remaining arcs (solid and not
straight) have capacity k.
First, the shortest augmenting path algorithm has to augment flow k² times along paths (s, S, T, t),
which have length 3 and unit capacity. After that, the residual network will contain reverse
arcs (T, S), and the algorithm will choose another k² augmenting paths
(s, u1, u2, T, S, v2, v1, t) of length 7. Then the algorithm will have to choose paths (s, u1, u2, u3, u4,
S, T, v4, v3, v2, v1, t) of length 11, and so on…
Now let's calculate the parameters of our network. The number of vertices is n = 2k + 4p + 2. The
number of edges is m = k² + 2pk + 2k + 4p. As is easy to see, the number of augmentations is
a = k²(p+1).
Let p = k – 1. In this case n = 6k – 2 and a = k³, so one can verify that a ~ n³ / 216. In [4]
Zadeh presents examples of networks that require n³ / 27 and n³ / 12 augmentations, but those
examples depend on the choice of the shortest path.
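For completeness, the arithmetic behind the n³/216 estimate is a one-line calculation:

\[
p = k - 1 \;\Rightarrow\; n = 2k + 4(k-1) + 2 = 6k - 2, \qquad
a = k^2(p+1) = k^3 = \left(\frac{n+2}{6}\right)^3 \approx \frac{n^3}{216}.
\]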


We ran 5 worst-case tests with 100, 148, 202, 250 and 298 vertices and compared the running
times of the improved version of the algorithm against the unimproved one. As you can see in
figure 2, the improved algorithm is much faster; on the network with 298 nodes it works 23 times
faster. Our practical analysis shows that, in general, the improved algorithm works about n / 14 times
faster.

Figure 2. X-axis is the number of nodes. Y-axis is working time in milliseconds.


Blue colour indicates the shortest augmenting path algorithm and red its improved version.
However, our comparison is not definitive, because we used only one kind of network. We just
wanted to demonstrate that the O(n²m) algorithm can work O(n) times faster than the O(nm²) one
on a dense network. A more revealing comparison is waiting for us at the end of the article.
Maximum Capacity Path Algorithm, O(n²m log(nU)) / O(m² log(nU) log n) / O(m² log(nU) log U)
In 1972 Edmonds and Karp developed another way to find an augmenting path: at each step they
tried to increase the flow by the maximum possible amount. Another name of this algorithm is
"gradient modification of the Ford-Fulkerson method." Instead of using BFS to identify a shortest
path, this modification uses Dijkstra's algorithm to establish a path with the maximal possible
capacity. After augmentation, the algorithm finds another such path in the residual network,
augments flow along it, and repeats these steps until the flow is maximal.
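A minimal sketch of this widest-path (maximum bottleneck) variant of Dijkstra's algorithm follows. It assumes the residual capacities are kept in an adjacency matrix G[][] and predecessors in pi[], as in the full implementation at the end of the article; here they are declared standalone so the sketch compiles on its own.

    #include <vector>
    #include <algorithm>

    const int MAXN = 2007;
    const int oo = 1000000000;
    int G[MAXN][MAXN], pi[MAXN];   // residual capacities and predecessors

    int widest_path(int n, int s, int t) {
        std::vector<int> width(n + 1, 0);    // best bottleneck found so far
        std::vector<char> done(n + 1, 0);
        width[s] = oo;
        for (;;) {
            int i = 0;
            for (int v = 1; v <= n; v++)     // O(n) selection => O(n^2) total
                if (!done[v] && (i == 0 || width[v] > width[i])) i = v;
            if (i == 0 || width[i] == 0 || i == t) break;
            done[i] = 1;
            for (int j = 1; j <= n; j++) {   // "relax" with min(bottleneck, arc)
                int w = std::min(width[i], G[i][j]);
                if (!done[j] && w > width[j]) { width[j] = w; pi[j] = i; }
            }
        }
        return width[t];                     // 0 means no augmenting path left
    }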

There is no doubt that the algorithm is correct in the case of integral capacities. However, there
are instances with non-integral arc capacities on which the algorithm may fail to terminate.
Let's derive the algorithm's running time bound, starting with one lemma. To understand the
proof, recall that the value of any flow is less than or equal to the capacity of any cut in a
network. We denote the capacity of a cut (S,T) by c(S,T).
Lemma 3.  Let F be the maximum flow's value; then G contains an augmenting path with capacity not
less than F/m.
Proof. Suppose G contains no such path. Construct the set E' = { (i,j) in E: uij ≥ F/m } and consider
the network G' = (V, E'), which contains no path from s to t. Let S be the set of nodes reachable
from s in G' and let T = V \ S. Evidently, (S, T) is a cut and c(S, T) ≥ F. But the cut (S, T) intersects
only those edges (i,j) in E which have uij < F/m. So it is clear that

c(S,T) < (F/m) · m = F,

and we get a contradiction with the fact that c(S,T) ≥ F.
Theorem 3.  The maximum capacity path algorithm performs O(m log(nU)) augmentations.
Proof sketch. Suppose that the algorithm terminates after k augmentations. Denote by f1 the
capacity of the first augmenting path found, by f2 the capacity of the second one, and so on;
fk is the capacity of the last, k-th augmenting path.
Consider Fi = f1 + f2 + … + fi and let F* be the maximum flow's value. By lemma 3 one can justify that

fi ≥ (F* − Fi−1) / m.

Now we can estimate the difference between the value of the maximum flow and the flow
after i consecutive augmentations:

F* − Fi = F* − Fi−1 − fi ≤ F* − Fi−1 − (F* − Fi−1) / m = (1 − 1/m)(F* − Fi−1) ≤ … ≤ (1 − 1/m)^i · F*.

We have to find an integer i for which (1 − 1/m)^i · F* < 1. One can check that

i ≥ log base m/(m−1) of F* = O(m · log F*) = O(m · log(nU))

suffices, and the latter equality proves the theorem.

To find a path with the maximal capacity we use Dijkstra's algorithm, which incurs additional
expense at every iteration. Since a simple realization of Dijkstra's algorithm [2]
has O(n²) complexity, the total running time of the maximum capacity path algorithm
is O(n²m log(nU)).
Using a heap implementation of Dijkstra's algorithm for sparse networks [7], with running
time O(m log n), one can obtain an O(m² log n log(nU)) algorithm for finding the maximum flow. It
seems to be better than the improved Edmonds-Karp algorithm; however, this estimate is very
deceptive.
There is another way to find the maximum capacity path: one can use binary search to
establish such a path. Start by searching for the maximum capacity path on the interval [0,U]. If
there is some path with capacity at least U/2, continue searching on the interval [U/2, U];
otherwise, search on [0, U/2−1]. This approach incurs an additional O(m log U) expense and
gives an O(m² log(nU) log U) time bound for the maximum flow algorithm. However, it works really
poorly in practice.
Capacity Scaling Algorithm, O(m² log U)
In 1985 Gabow described the so-called "bit-scaling" algorithm. The similar capacity scaling
algorithm described in this section is due to Ahuja and Orlin [1].
Informally, the main idea of the algorithm is to augment the flow along paths with sufficiently
large capacities, instead of augmenting along paths with maximal capacity. More formally, let's
introduce a parameter Delta. At first, Delta is quite a large number; for instance, it equals U. The
algorithm tries to find an augmenting path with capacity not less than Delta, then augments flow
along this path and repeats this procedure while any such Delta-path exists in the residual network.
The algorithm either establishes a maximum flow or reduces Delta by a factor of 2 and continues
finding paths and augmenting flow with the new Delta. The phase of the algorithm that augments
flow along paths with capacities at least Delta is called the Delta-scaling phase or, briefly, the
Delta-phase. Delta is an integral value and, evidently, the algorithm performs O(log U) Delta-phases.
When Delta is equal to 1 there is no difference between the capacity scaling algorithm and the
Edmonds-Karp algorithm, which is why the algorithm works correctly.

We can obtain a path with capacity at least Delta fairly easily, in O(m) time, by using BFS.
At the first phase we can set Delta equal to either U or the largest power of 2 that doesn't
exceed U.
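A minimal sketch of the Delta-scaling driver follows; bfs_path and augment are hypothetical helper names, not routines defined elsewhere in this article.

    // bfs_path(Delta) finds (by BFS) an augmenting s-t path using only
    // residual arcs with capacity >= Delta and records predecessors;
    // augment() pushes the bottleneck along that path. Both are only
    // declared here, as placeholders.
    int bfs_path(int Delta);   // returns 1 if a Delta-path was found
    int augment();             // returns the amount of flow pushed

    int max_flow_scaling(int U) {
        int flow = 0, Delta = 1;
        while (2 * Delta <= U) Delta *= 2;    // largest power of 2 not above U
        for (; Delta >= 1; Delta /= 2)        // O(log U) Delta-phases
            while (bfs_path(Delta))           // augment while a Delta-path exists
                flow += augment();
        return flow;
    }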
The proof of the following lemma is left to the reader.

Lemma 4.  At every Delta-phase the algorithm performs O(m) augmentations in the worst case.
Hint: use induction on Delta to show that the value of the minimum cut at the start of each
Delta-scaling phase is less than 2m·Delta.
Applying lemma 4 yields the following result:

Theorem 4.  The running time of the capacity scaling algorithm is O(m² log U).
Keep in mind that there is no difference between using breadth-first search and depth-first search
when finding an augmenting path. However, in practice, there is a big difference, and we will see
it.

Improved Capacity Scaling Algorithm, O(nmlogU)


In the previous section we described an O(m² log U) algorithm for finding the maximum flow. We
are going to improve the running time of this algorithm to O(nm log U) [1].


Now let's look at each Delta-phase independently. Recall from the preceding section that a Delta-
scaling phase contains O(m) augmentations. We now apply, within each Delta-phase, the same
technique we used when describing the improved variant of the shortest augmenting path
algorithm. At every phase we have to find the "maximum" flow using only paths with
capacities of at least Delta. The complexity analysis of the improved shortest augmenting
path algorithm implies that if the algorithm is guaranteed to perform O(m) augmentations, it
runs in O(nm) time, because the time for augmentations reduces from O(n²m) to O(nm) and
all other operations, as before, require O(nm) time. This reasoning instantly yields a bound
of O(nm log U) on the running time of the improved capacity scaling algorithm.
Unfortunately, this improvement hardly decreases the running time of the algorithm in practice.

Practical Analysis and Comparison


Now let's have some fun. In this section we compare all the described algorithms from a practical
point of view. For this purpose I made test cases with the help of [8] and divided them into
three groups by density. The first group consists of networks with m ≤ n^1.4 – some kind of
sparse networks. The second one consists of middle-density tests with n^1.6 ≤ m ≤ n^1.7. And the
last group represents almost-full graphs (including full acyclic networks) with m ≥ n^1.85.
I wrote simple implementations of all the algorithms described above, without any kind of tricks
and with no preference towards any of them. All implementations use an adjacency list for
representing a network.

Let's start with the first group of tests. These are 564 sparse networks with the number of vertices
limited to 2000 (otherwise, all algorithms work too fast). All working times are given in
milliseconds.
Figure 3. Comparison on sparse networks. 564 test cases. m ≤ n^1.4.
As you can see, it was a big mistake to try the no-heap Dijkstra implementation of the maximum
capacity path algorithm on sparse networks (and that's not surprising); however, its heap
implementation works rather faster than expected. Both capacity scaling algorithms (using DFS
and BFS) work in approximately the same time, while the improved implementation is almost
2 times faster. Surprisingly, the improved shortest path algorithm turned out to be the fastest on
sparse networks.

Now let's look at the second group of test cases. It is made of 184 tests with middle density. All
networks are limited to 400 nodes.

Figure 4. Comparison on middle density networks. 184 test cases. n^1.6 ≤ m ≤ n^1.7.


On the middle density networks the binary search implementation of the maximum capacity path
algorithm leaves much to be desired, but the heap implementation still works faster than the
simple (without heap) one. The BFS realization of the capacity scaling algorithm is faster than the
DFS one. The improved scaling algorithm and the improved shortest augmenting path algorithm
are both very good in this case.

It is very interesting to see how these algorithms run on dense networks. Let's take a look: the
third group is made up of 200 dense networks limited to 400 vertices.

Figure 5. Comparison on dense networks. 200 test cases. m ≥ n^1.85.


Now we see the difference between the BFS and DFS versions of the capacity scaling algorithm. It
is interesting that the improved realization works in approximately the same time as the
unimproved one. Unexpectedly, the heap-based Dijkstra implementation of the maximum
capacity path algorithm turned out to be faster than the one without a heap.

Without any doubt, the improved implementation of the Edmonds-Karp algorithm wins the game.
Second place is taken by the improved capacity scaling algorithm, and the capacity scaling
algorithm with BFS takes the bronze.

As for the maximum capacity path algorithm, it is better to use the heap variant; on sparse networks
it gives very good results. The other algorithms are really only of theoretical interest.
As you can see, the O(nm log U) algorithm isn't that fast; it is even slower than the O(n²m)
algorithm. The O(nm²) algorithm (the most popular one) has a worse time bound, but it works much
faster than most of the other algorithms with better time bounds.


My recommendation: always use the capacity scaling algorithm with BFS, because it is very
easy to implement. The improved shortest augmenting path algorithm is rather easy too, but you
need to be very careful to write the program correctly; during a challenge it is very easy to miss
a bug.

I would like to finish the article with the full implementation of the improved shortest augmenting
path algorithm. To maintain the network I use an adjacency matrix, for the sake of clarity; it is
not the same realization that was used in our practical analysis. With the "help" of the matrix it
works a little slower than the adjacency-list version, although it is faster on dense networks.
Which data structure is best is up to the reader.

#include <stdio.h>

#define N 2007          // Maximum number of nodes
#define oo 1000000000   // Infinity

// Nodes, arcs, the source node and the sink node
int n, m, source, sink;

// Matrices for maintaining the graph and the flow
int G[N][N], F[N][N];

int pi[N];           // Predecessor list
int CurrentNode[N];  // Current edge for each node

int queue[N];        // Queue for reverse BFS

int d[N];            // Distance function
int numbs[N];        // numbs[k] is the number of nodes i with d[i] == k

// Reverse breadth-first search to establish the distance function d
int rev_BFS() {
    int i, j, head(0), tail(0);

    // Initially, all d[i] = n
    for(i = 1; i <= n; i++)
        numbs[ d[i] = n ]++;

    // Start from the sink
    numbs[n]--;
    d[sink] = 0;
    numbs[0]++;

    queue[ ++tail ] = sink;

    // While the queue is not empty
    while( head != tail ) {
        i = queue[ ++head ];    // Get the next node

        // Check all adjacent nodes
        for(j = 1; j <= n; j++) {
            // If it was reached before, or there is no edge, then continue
            if(d[j] < n || G[j][i] == 0) continue;

            // j is reached for the first time; put it into the queue
            queue[ ++tail ] = j;

            // Update the distance function
            numbs[n]--;
            d[j] = d[i] + 1;
            numbs[ d[j] ]++;
        }
    }

    return 0;
}

// Augmenting the flow using the predecessor list pi[]
int Augment() {
    int i, j, tmp, width(oo);

    // Find the capacity of the path
    for(i = sink, j = pi[i]; i != source; i = j, j = pi[j]) {
        tmp = G[j][i];
        if(tmp < width) width = tmp;
    }

    // Augmentation itself
    for(i = sink, j = pi[i]; i != source; i = j, j = pi[j]) {
        G[j][i] -= width; F[j][i] += width;
        G[i][j] += width; F[i][j] -= width;
    }

    return width;
}

// Relabel and backtrack
int Retreat(int &i) {
    int tmp;
    int j, mind(n-1);

    // Check all adjacent edges to find the nearest one
    for(j = 1; j <= n; j++)
        // If there is an arc and j is "nearer"
        if(G[i][j] > 0 && d[j] < mind)
            mind = d[j];

    tmp = d[i];    // Save the previous distance

    // Relabel procedure itself
    numbs[ d[i] ]--;
    d[i] = 1 + mind;
    numbs[ d[i] ]++;

    // Backtrack, if possible (i is not a local variable!)
    if( i != source ) i = pi[i];

    // If numbs[tmp] is zero, the algorithm will stop
    return numbs[ tmp ];
}

// Main procedure
int find_max_flow() {
    int flow(0), i, j;

    rev_BFS();    // Establish the exact distance function

    // For each node the current arc is the first arc
    for(i = 1; i <= n; i++) CurrentNode[i] = 1;

    // Begin searching from the source
    i = source;

    // The main cycle (while the source is not "far" from the sink)
    for( ; d[source] < n ; ) {

        // Search for an admissible arc, starting from the current arc
        for(j = CurrentNode[i]; j <= n; j++)
            // If the arc exists in the residual network and is admissible,
            // then finish searching
            if( G[i][j] > 0 && d[i] == d[j] + 1 )
                break;

        // If an admissible arc is found
        if( j <= n ) {
            CurrentNode[i] = j;    // Mark the arc as "current"
            pi[j] = i;             // j is reachable from i
            i = j;                 // Go forward

            // If we found an augmenting path
            if( i == sink ) {
                flow += Augment();    // Augment the flow
                i = source;           // Begin from the source again
            }
        }
        // If no admissible arc is found
        else {
            CurrentNode[i] = 1;    // The current arc is the first arc again

            // If numbs[ d[i] ] == 0 then the flow is maximal
            if( Retreat(i) == 0 )
                break;
        }
    }    // End of the main cycle

    // Return the flow value
    return flow;
}

// The main function
// The graph is given in the input as m triples (p, q, r):
// an arc from p to q with capacity r
int main() {
    int i, p, q, r;

    scanf("%d %d %d %d", &n, &m, &source, &sink);

    for(i = 0; i < m; i++) {
        scanf("%d %d %d", &p, &q, &r);
        G[p][q] += r;
    }

    printf("%d", find_max_flow());

    return 0;
}

References
[1] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows: Theory, Algorithms, and Applications.
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest. Introduction to Algorithms.
[3] Ford, L. R., and D. R. Fulkerson. Maximal Flow Through a Network.
[4] Norman Zadeh. Theoretical Efficiency of the Edmonds-Karp Algorithm for Computing Maximal Flows.
[5] _efer_. Algorithm Tutorial: Maximum Flow.
[6] gladius. Algorithm Tutorial: Introduction to graphs and their data structures: Section 1.
[7] gladius. Algorithm Tutorial: Introduction to graphs and their data structures: Section 3.
[8] http://elib.zib.de/pub/mp-testdata/generators/index.html – A number of generators for network flow problems.

Multi-commodity flow problem


The multi-commodity flow problem is a network flow problem with multiple commodities (flow
demands) between different source and sink nodes.

Contents

 1 Definition
 2 Relation to other problems
 3 Usage
 4 Solutions
 5 External resources
 6 References

Definition
Given a flow network G(V,E), where edge (u,v) ∈ E has capacity c(u,v), there are k
commodities K1, K2, …, Kk, defined by Ki = (si, ti, di), where si and ti are
the source and sink of commodity i, and di is its demand. The flow of commodity i along
edge (u,v) is fi(u,v). Find an assignment of flows which satisfies the constraints:

Capacity constraints: Σ(i=1..k) fi(u,v) ≤ c(u,v)

Flow conservation: Σ(w∈V) fi(u,w) = 0 when u ≠ si, ti

Demand satisfaction: Σ(w∈V) fi(si,w) = Σ(w∈V) fi(w,ti) = di

In the minimum cost multi-commodity flow problem, there is a cost a(u,v) for
sending flow on edge (u,v). You then need to minimize

Σ((u,v)∈E) ( a(u,v) · Σ(i=1..k) fi(u,v) )

In the maximum multi-commodity flow problem, there are no hard demands on each
commodity, and the total throughput is maximised:

maximise Σ(i=1..k) Σ(w∈V) fi(si,w)

In the maximum concurrent flow problem, the task is to maximise the minimal fraction
of the flow of each commodity to its demand:

maximise min(i=1..k) ( Σ(w∈V) fi(si,w) / di )

Relation to other problems


The minimum cost variant is a generalisation of the minimum cost flow problem.
Variants of the circulation problem are generalisations of all flow problems.

Usage
Routing and wavelength assignment (RWA) in optical burst switching of optical
networks can be approached via multi-commodity flow formulations.

Solutions
In the decision version of problems, the problem of producing an integer flow
satisfying all demands is NP-complete,[1] even for only two commodities and unit
capacities (making the problem strongly NP-complete in this case).
If fractional flows are allowed, the problem can be solved in polynomial time
through linear programming.[2] Or through (typically much faster) fully polynomial
time approximation schemes.[3]

Maximum flow problem


In optimization theory, maximum flow problems involve finding a feasible flow through a single-
source, single-sink flow network that is maximum.

The maximum flow problem can be seen as a special case of more complex network flow problems,
such as the circulation problem. The maximum value of an s-t flow (i.e., flow from source s to sink t)
is equal to the minimum capacity of an s-t cut (i.e., a cut severing s from t) in the network, as stated in
the max-flow min-cut theorem.

Contents

 1 History
 2 Definition
 3 Solutions
 4 Integral flow theorem
 5 Application
o 5.1 Multi-source multi-sink maximum flow problem
o 5.2 Minimum path cover in directed acyclic graph
o 5.3 Maximum cardinality bipartite matching
o 5.4 Maximum flow problem with vertex capacities
o 5.5 Maximum edge-disjoint path
o 5.6 Maximum independent (vertex-disjoint) path
 6 Real world applications
o 6.1 Baseball Elimination
o 6.2 Airline scheduling
o 6.3 Circulation-demand problem
o 6.4 Fairness in car sharing (carpool)
 7 See also
 8 References
 9 Further reading

History
The maximum flow problem was first formulated in 1954 by T. E. Harris and F. S. Ross as a
simplified model of Soviet railway traffic flow.[1][2][3] In 1955, Lester R. Ford, Jr. and Delbert R.
Fulkerson created the first known algorithm, the Ford–Fulkerson algorithm.[4][5]

Over the years, various improved solutions to the maximum flow problem were discovered, notably
the shortest augmenting path algorithm of Edmonds and Karp and independently Dinitz; the blocking
flow algorithm of Dinitz; the push-relabel algorithm of Goldberg and Tarjan; and the binary blocking
flow algorithm of Goldberg and Rao. The electrical flow algorithm of Christiano, Kelner, Madry, and
Spielman finds an approximately optimal maximum flow but only works in undirected graphs.[6][7]

Definition
A flow network, with source s and sink t. The numbers next to the edges are the capacities.

Let N = (V, E) be a network with s, t ∈ V being the source and the sink of N respectively.

The capacity of an edge is a mapping c: E → R+, denoted by c(u,v) or c(uv); it represents the
maximum amount of flow that can pass through an edge.

A flow is a mapping f: E → R+, denoted by f(u,v) or f(uv), subject to the following two
constraints:

1. f(u,v) ≤ c(u,v) for each (u,v) ∈ E (capacity constraint: the flow of an edge cannot exceed its
capacity)
2. Σ(u:(u,v)∈E) f(u,v) = Σ(u:(v,u)∈E) f(v,u) for each v ∈ V \ {s,t} (conservation of flows: the sum of
the flows entering a node must equal the sum of the flows exiting a node, except for the
source and the sink nodes)

The value of flow is defined by |f| = Σ(v:(s,v)∈E) f(s,v), where s is the source of N. It represents
the amount of flow passing from the source to the sink.

The maximum flow problem is to maximize |f|, that is, to route as much flow as
possible from s to t.

Solutions
We can define the residual graph, which provides a systematic way to search for
forward-backward operations in order to find the maximum flow.

Given a flow network G and a flow f on G, we define the residual graph Gf of G with
respect to f as follows.

1. The node set of Gf is the same as that of G.

2. Each edge e = (u,v) of Gf has a capacity of c(e) − f(e).

3. Each edge e' = (v,u) of Gf has a capacity of f(e).
The following table lists algorithms for solving the maximum flow problem (method, complexity, description).

Linear programming. Constraints given by the definition of a legal flow; see the linear program here.

Ford–Fulkerson algorithm. Complexity: O(E max|f|). As long as there is an open path through the residual graph, send the minimum of the residual capacities on the path. The algorithm works only if all weights are integers; otherwise it is possible that the Ford–Fulkerson algorithm will not converge to the maximum value.

Edmonds–Karp algorithm. Complexity: O(VE²). A specialization of Ford–Fulkerson, finding augmenting paths with breadth-first search.

Dinic's blocking flow algorithm. Complexity: O(V²E). In each phase the algorithm builds a layered graph with breadth-first search on the residual graph. The maximum flow in a layered graph can be calculated in O(VE) time, and the maximum number of phases is n − 1. In networks with unit capacities, Dinic's algorithm terminates in O(E√V) time.

General push-relabel maximum flow algorithm. Complexity: O(V²E). The push-relabel algorithm maintains a preflow, i.e. a flow function with the possibility of excess in the vertices. The algorithm runs while there is a vertex with positive excess, i.e. an active vertex in the graph. The push operation increases the flow on a residual edge, and a height function on the vertices controls which residual edges can be pushed. The height function is changed with a relabel operation. The proper definitions of these operations guarantee that the resulting flow function is a maximum flow.

Push-relabel algorithm with FIFO vertex selection rule. Complexity: O(V³). Push-relabel algorithm variant which always selects the most recently active vertex, and performs push operations while the excess is positive and there are admissible residual edges from this vertex.

Dinic's algorithm with dynamic trees. Complexity: O(VE log V). The dynamic trees data structure speeds up the maximum flow computation in the layered graph to O(E log V) per phase.

Push-relabel algorithm with dynamic trees. Complexity: O(VE log(V²/E)). The algorithm builds limited-size trees on the residual graph with respect to the height function. These trees provide multilevel push operations.

Binary blocking flow algorithm.[8] Complexity: O(E min(V^(2/3), E^(1/2)) log(V²/E) log U). The value U corresponds to the maximum capacity of the network.

MPM (Malhotra, Pramodh-Kumar and Maheshwari) algorithm. Complexity: O(V³). Refer to the original paper.

Jim Orlin's + KRT (King, Rao, Tarjan)'s algorithm. Complexity: O(VE). Orlin's algorithm solves max-flow in O(VE) time for m ≤ n^(16/15 − ε), while KRT solves it in O(VE) for m > n^(1+ε).

For a more extensive list, see [9].

Integral flow theorem


The integral flow theorem states that

If each edge in a flow network has integral capacity, then there exists an integral
maximal flow.

Application
Multi-source multi-sink maximum flow problem

Fig. 4.1.1. Transformation of a multi-source multi-sink maximum flow problem into a


single-source single-sink maximum flow problem

Given a network N = (V,E) with a set of sources S = {s1, ..., sn} and a set of sinks T =
{t1, ..., tm} instead of only one source and one sink, we are to find the maximum flow
across N. We can transform the multi-source multi-sink problem into a maximum
flow problem by adding a consolidated source connecting to each vertex in S and
a consolidated sink connected from each vertex in T (also known
as supersource and supersink), with infinite capacity on each new edge (see Fig. 4.1.1).
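A sketch of this transformation on an adjacency-matrix network (cap and INF are illustrative names; nodes 0..n-1 are the original vertices, node n becomes the supersource and node n+1 the supersink):

    #include <vector>

    const int INF = 1000000000;

    int add_super_nodes(std::vector<std::vector<int>>& cap,
                        const std::vector<int>& sources,
                        const std::vector<int>& sinks) {
        int n = cap.size();
        for (auto& row : cap) row.resize(n + 2, 0);
        cap.resize(n + 2, std::vector<int>(n + 2, 0));
        for (int s : sources) cap[n][s] = INF;       // supersource -> each source
        for (int t : sinks)   cap[t][n + 1] = INF;   // each sink -> supersink
        return n;    // run any single-source max-flow from n to n+1
    }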

Minimum path cover in directed acyclic graph


Given a directed acyclic graph G = (V, E), we are to find the minimum number
of vertex-disjoint paths to cover each vertex in V. We can construct a bipartite
graph G' = (Vout∪Vin, E' ) from G, where

1. Vout = {v∈V: v has positive out-degree}.


2. Vin = {v∈V: v has positive in-degree}.
3. E' = {(u,v)∈Vout×Vin: (u,v)∈E}.

Then it can be shown, via König's theorem, that G' has a matching of size m if and
only if there exist n − m vertex-disjoint paths that cover each vertex in G, where n is
the number of vertices in G. Therefore, the problem can be solved by finding the
maximum cardinality matching in G' instead.

Maximum cardinality bipartite matching

Fig. 4.3.1. Transformation of a maximum bipartite matching problem into a maximum


flow problem

Given a bipartite graph G = (X∪Y, E), we are to find a maximum cardinality
matching in G, that is, a matching that contains the largest possible number of
edges. This problem can be transformed into a maximum flow problem by
constructing a network N = (X∪Y∪{s,t}, E'), where

1. E' contains the edges in G directed from X to Y.


2. (s,x)∈E' for each x∈X and (y,t)∈E' for each y∈Y.
3. c(e) = 1 for each e∈E' (See Fig. 4.3.1).

Then the value of the maximum flow in N is equal to the size of the maximum
matching in G.
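A sketch of the construction in edge-list form (the names are illustrative). Every edge gets capacity 1, so the max-flow value equals the matching size:

    #include <vector>
    #include <utility>

    // X-vertices are 0..nx-1, Y-vertices are nx..nx+ny-1,
    // the source is nx+ny and the sink is nx+ny+1.
    struct Edge { int u, v, cap; };

    std::vector<Edge> matching_network(int nx, int ny,
            const std::vector<std::pair<int,int>>& edges) {
        int s = nx + ny, t = nx + ny + 1;
        std::vector<Edge> net;
        for (auto& e : edges)
            net.push_back({e.first, nx + e.second, 1});   // X -> Y, directed
        for (int x = 0; x < nx; x++) net.push_back({s, x, 1});       // s -> X
        for (int y = 0; y < ny; y++) net.push_back({nx + y, t, 1});  // Y -> t
        return net;
    }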

Maximum flow problem with vertex capacities


Fig. 4.4.1. Transformation of a maximum flow problem with vertex capacities constraint
into the original maximum flow problem by node splitting

Given a network N = (V, E) in which there is a capacity at each node in addition
to the edge capacities, that is, a mapping c: V → R+, denoted by c(v), the
flow f has to satisfy not only the capacity constraint and the conservation of flows,
but also the vertex capacity constraint

Σ(u:(u,v)∈E) f(u,v) ≤ c(v) for each v ∈ V \ {s,t}.

In other words, the amount of flow passing through a vertex cannot exceed its
capacity. To find the maximum flow across N, we can transform the problem into
the maximum flow problem in the original sense by expanding N. First,
each v ∈ V is replaced by v_in and v_out, where v_in is connected to the edges going
into v and v_out is connected to the edges coming out from v; then assign
capacity c(v) to the edge connecting v_in and v_out (see Fig. 4.4.1, but note that it
has incorrectly swapped v_in and v_out). In this expanded network, the vertex
capacity constraint is removed and therefore the problem can be treated as the
original maximum flow problem.
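A sketch of the node-splitting step (illustrative names): vertex v becomes v_in = 2v and v_out = 2v+1, joined by an edge of capacity c(v), and every original edge (u,v) becomes (u_out, v_in):

    #include <vector>

    struct Arc { int u, v, cap; };

    std::vector<Arc> split_vertices(int n,
            const std::vector<Arc>& edges,
            const std::vector<int>& vcap) {
        std::vector<Arc> out;
        for (int v = 0; v < n; v++)
            out.push_back({2 * v, 2 * v + 1, vcap[v]});   // v_in -> v_out, capacity c(v)
        for (const Arc& e : edges)
            out.push_back({2 * e.u + 1, 2 * e.v, e.cap}); // u_out -> v_in
        return out;
    }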

Maximum edge-disjoint path


Given a directed graph G = (V, E) and two vertices s and t, we are to find the
maximum number of edge-disjoint paths from s to t. This problem can be
transformed into a maximum flow problem by constructing a network N = (V, E)
from G, with s and t being the source and the sink of N respectively, and assigning
each edge unit capacity.

Maximum independent (vertex-disjoint) path


Given a directed graph G = (V, E) and two vertices s and t, we are to find the
maximum number of independent paths from s to t. Two paths are said to be
independent if they do not have a vertex in common apart from s and t. We can
construct a network N = (V, E) from G with vertex capacities, where
1. s and t are the source and the sink of N respectively.
2. c(v) = 1 for each v∈V.
3. c(e) = 1 for each e∈E.

Then the value of the maximum flow is equal to the maximum number of
independent paths from s to t.

Real world applications


Baseball Elimination

Construction of network flow for baseball elimination problem

In the Baseball Elimination Problem there are n teams competing in a league. At a
specific stage of the league season, wi is the number of wins and ri is the number of
games left to play for team i, and rij is the number of games left against team j. A
team is eliminated if it has no chance to finish the season in first place. The task
of the Baseball Elimination Problem is to determine which teams are eliminated at each
point during the season. Schwartz proposed a method which reduces this problem
to maximum network flow. In this method a network is created to determine whether
team k is eliminated.

Let G = (V, E) be a network with s, t ∈ V being the source and the sink respectively.
We add a game node {i,j} with i < j to V, and connect each of them from s by an
edge with capacity rij, which represents the number of games left between these two
teams. We also add a team node for each team and connect each game node {i,j}
to the team nodes i and j, to ensure that one of them wins; we do not need to restrict
the flow value on these edges. Finally, we make an edge from each team node i to the sink
node t and set its capacity to wk + rk − wi, to prevent team i from winning more
than wk + rk games. Let S be the set of all teams participating in the league and
let r(S − {k}) = Σ(i,j ∈ S−{k}, i<j) rij. In this method it is claimed that team k is not
eliminated if and only if a flow of value r(S − {k}) exists in network G. In the
mentioned article it is proved that this flow value is the maximum flow value
from s to t.

Airline scheduling
In the airline industry a major problem is the scheduling of flight crews. The airline
scheduling problem can be considered an application of extended maximum
network flow. The input of this problem is a set of flights F which contains
information about where and when each flight departs and arrives. In one version of
airline scheduling the goal is to produce a feasible schedule with at most k crews.

In order to solve this problem we use a variation of the circulation problem called
bounded circulation, which is the generalization of network flow problems with the
added constraint of a lower bound on edge flows.

Let G = (V, E) be a network with s, t ∈ V as the source and the sink nodes. For the
source and destination of every flight i we add two nodes to V: node si as the source
and node di as the destination node of flight i. We also add the following edges to E:

1. An edge with capacity [0, 1] between s and each si.


2. An edge with capacity [0, 1] between each di and t.
3. An edge with capacity [1, 1] between each pair of si and di.
4. An edge with capacity [0, 1] between each di and sj, if source sj is reachable
with a reasonable amount of time and cost from the destination of flight i.
5. An edge with capacity [0, ∞] between s and t.

In the mentioned method, it is claimed and proved that finding a flow of value k
in G between s and t is equivalent to finding a feasible schedule for flight set F with
at most k crews.[11]

Another version of airline scheduling is finding the minimum number of crews needed to
perform all the flights. In order to solve this problem we create a
bipartite graph G' = (A∪B, E) where each flight has a copy in set A and set B. If the
same plane can perform flight j after flight i, connect i∈A to j∈B. A matching
in G' induces a schedule for F, and obviously a maximum bipartite matching in this
graph produces an airline schedule with the minimum number of crews.[11] As
mentioned in the Application part of this article, maximum cardinality bipartite
matching is an application of the maximum flow problem.

Circulation-demand problem
There are some factories that produce goods and some villages where the goods
have to be delivered. They are connected by a network of roads, each road
having a capacity c for the maximum amount of goods that can flow through it. The
problem is to find whether there is a circulation that satisfies the demand. This
problem can be transformed into a max-flow problem.

1. Add a source node s and add edges from it to every factory node fi with
capacity pi, where pi is the production rate of factory fi.
2. Add a sink node t and add edges from all villages vj to t with capacity dj,
where dj is the demand rate of village vj.

Let G = (V, E) be this new network. There exists a circulation that satisfies the
demand if and only if

maximum flow value(G) = Σ(j) dj.

If a circulation exists, looking at the max-flow solution gives the answer as to
how much goods have to be sent on each particular road to satisfy the demands.
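A sketch of this reduction (illustrative names; maxflow() stands for any max-flow routine on the edge list and is only declared here):

    #include <vector>
    #include <utility>

    struct Road { int u, v, cap; };
    long long maxflow(std::vector<Road>& net, int s, int t);  // placeholder

    bool circulation_exists(std::vector<Road>& net, int s, int t,
            const std::vector<std::pair<int,int>>& factories,  // (node, production)
            const std::vector<std::pair<int,int>>& villages) { // (node, demand)
        long long total_demand = 0;
        for (auto& f : factories) net.push_back({s, f.first, f.second});
        for (auto& v : villages) {
            net.push_back({v.first, t, v.second});
            total_demand += v.second;
        }
        // The circulation exists iff the max flow saturates every demand edge.
        return maxflow(net, s, t) == total_demand;
    }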

Fairness in car sharing (carpool)


The problem is that n people are pooling a car for d days. Each
participant can choose whether he participates on each day. We aim to fairly decide
who will be driving on a given day.
The solution is the following:
we can decide this on the basis of the number of people using the car, i.e., if k
people use the car on a given day, each of them incurs a driving responsibility of 1/k for
that day. Summing these fractions over the days person i participates gives his driving
obligation Di; person i is then required to drive only ⌈Di⌉ times in d days.
Our aim is to find whether such a setting is possible. For this we model the
problem as a network, as we can see in the figure.
Now, it can be proved that if a suitable flow exists then such a fair setting exists,
and that such a flow always exists.

References
1. Schrijver, A. (2002). "On the history of the transportation and
maximum flow problems". Mathematical Programming 91 (3): 437–
445. doi:10.1007/s101070100259.

2. Gass, Saul I.; Assad, Arjang A. (2005). "Mathematical, algorithmic
and professional developments of operations research from 1951 to 1956". An

Minimum-cost flow problem


The minimum-cost flow problem (MCFP) is to find the cheapest possible way of sending a certain
amount of flow through a flow network. A typical application of this problem involves finding the best
delivery route from a factory to a warehouse, where the road network has some capacity and cost
associated. The minimum cost flow problem is one of the most fundamental among all flow and
circulation problems, because most other such problems can be cast as a minimum cost flow
problem and also because it can be solved very efficiently using the network simplex algorithm.

Contents

 1 Definition
 2 Relation to other problems
 3 Solutions
 4 Application
o 4.1 Minimum weight bipartite matching
 5 See also
 6 References
 7 External links

Definition
Given a flow network, that is, a directed graph G = (V, E) with source s ∈ V and sink t ∈ V,
where edge (u,v) ∈ E has capacity c(u,v), flow f(u,v) and cost a(u,v) (most
minimum-cost flow algorithms support edges with negative costs), the cost of sending this flow
along an edge is f(u,v) · a(u,v). You are required to send an amount of flow d from s to t.

The definition of the problem is to minimize the total cost of the flow:

minimize Σ((u,v)∈E) a(u,v) · f(u,v)

with the constraints

Capacity constraints: f(u,v) ≤ c(u,v)

Skew symmetry: f(u,v) = −f(v,u)

Flow conservation: Σ(w∈V) f(u,w) = 0 for all u ≠ s, t

Required flow: Σ(w∈V) f(s,w) = d and Σ(w∈V) f(w,t) = d

Relation to other problems


A variation of this problem is to find a flow which is maximum, but has the lowest cost
among the maximums. This could be called a minimum-cost maximum-flow problem. This is
useful for finding minimum cost maximum matchings.

With some solutions, finding the minimum cost maximum flow instead is straightforward. If
not, you can do a binary search on d.

A related problem is the minimum cost circulation problem, which can be used for solving
minimum cost flow. You do this by setting the lower bound on all edges to zero, and then
making an extra edge from the sink t to the source s, with capacity c(t,s) = d and lower
bound l(t,s) = d, forcing the total flow from s to t to also be d.

The problem can be specialized into two other problems:

 if the capacity constraint is removed, the problem is reduced to the shortest path


problem,
 if the costs are all set equal to zero, the problem is reduced to the maximum flow
problem.

Solutions
The minimum cost flow problem can be solved by linear programming, since we optimize a
linear function, and all constraints are linear.

Apart from that, many combinatorial algorithms exist, for a comprehensive survey, see [1].
Some of them are generalizations of maximum flow algorithms, others use entirely different
approaches.

Well-known fundamental algorithms (they have many variations):

 Cycle canceling: a general primal method.[2]


 Minimum mean cycle canceling: a simple strongly polynomial algorithm.[3]
 Successive shortest path and capacity scaling: dual methods, which can be viewed as
the generalizations of the Ford–Fulkerson algorithm.[4]
 Cost scaling: a primal-dual approach, which can be viewed as the generalization of
the push-relabel algorithm.[5]
 Network simplex: a specialized version of the linear programming simplex method,
which runs in polynomial time.[6]
 Out-of-kilter algorithm by D. R. Fulkerson

Application
Minimum weight bipartite matching

Reducing Minimum weight bipartite matching to minimum cost max flow problem

Given a bipartite graph G = (A ∪ B, E), we would like to find the maximum cardinality
matching in G that has minimum cost. Let w: E → R be a weight function on the edges of E.
The minimum weight bipartite matching problem, or assignment problem, is to find a perfect
matching M ⊆ E whose total weight is minimized. The idea is to reduce this problem to a
network flow problem.

Let G' = (V' = A ∪ B, E' = E). Assign a capacity of 1 to all the edges in E'. Add a source
vertex s and connect it to all the vertices in A, and add a sink vertex t and connect all
vertices inside group B to this vertex. The capacity of all the new edges is 1 and their cost
is 0. It is proved that there is a minimum weight perfect bipartite matching in G if and only if
there is a minimum cost flow in G'. [7]
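A sketch of this construction (illustrative names): original edges keep their weight as cost, all capacities are 1, and the new source and sink edges have cost 0.

    #include <vector>
    #include <utility>

    struct CostEdge { int u, v, cap, cost; };

    // A-vertices are 0..na-1, B-vertices na..na+nb-1,
    // the source is na+nb and the sink na+nb+1.
    std::vector<CostEdge> assignment_network(int na, int nb,
            const std::vector<std::pair<int,int>>& edges,  // (a, b) pairs
            const std::vector<int>& w) {                   // w[i] = weight of edges[i]
        int s = na + nb, t = na + nb + 1;
        std::vector<CostEdge> net;
        for (std::size_t i = 0; i < edges.size(); i++)
            net.push_back({edges[i].first, na + edges[i].second, 1, w[i]});
        for (int a = 0; a < na; a++) net.push_back({s, a, 1, 0});
        for (int b = 0; b < nb; b++) net.push_back({na + b, t, 1, 0});
        return net;    // run a min-cost max-flow from s to t on this network
    }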

References
1. Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin (1993). Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc. ISBN 0-13-617549-X.
2. Morton Klein (1967). "A primal method for minimal cost flows with applications to the assignment and transportation problems". Management Science 14: 205–220. doi:10.1287/mnsc.14.3.205.
3. Andrew V. Goldberg and Robert E. Tarjan (1989). "Finding minimum-cost circulations by canceling negative cycles". Journal of the ACM 36 (4): 873–886. doi:10.1145/76359.76368.
4. Jack Edmonds and Richard M. Karp (1972). "Theoretical improvements in algorithmic efficiency for network flow problems". Journal of the ACM 19 (2): 248–264. doi:10.1145/321694.321699.
5. Andrew V. Goldberg and Robert E. Tarjan (1990). "Finding minimum-cost circulations by successive approximation". Math. Oper. Res. 15 (3): 430–466. doi:10.1287/moor.15.3.430.
6. James B. Orlin (1997). "A polynomial time primal network simplex algorithm for minimum cost flows". Mathematical Programming 78: 109–129. doi:10.1007/bf02614365.
Semi-infinite programming
In optimization theory, semi-infinite programming (SIP) is an optimization problem with a finite
number of variables and an infinite number of constraints, or an infinite number of variables and a
finite number of constraints. In the former case the constraints are typically parameterized.[1]

Contents

 1 Mathematical formulation of the problem


 2 Methods for solving the problem
 3 Examples
 4 See also
 5 References
 6 External links

Mathematical formulation of the problem

The problem can be stated simply as:

min(x∈X) f(x) subject to g(x,y) ≤ 0 for all y ∈ Y,

where f: R^n → R, g: R^n × R^m → R, X ⊆ R^n, and Y ⊆ R^m; Y is typically an infinite index set.
SIP can be seen as a special case of bilevel programs (multilevel programming) in which the lower-
level variables do not participate in the objective function.
