holds; this means that, in some ball around that point, all values of the function are
greater than or equal to the value at that point. A local maximum is defined similarly. In
general, a local minimum is easy to find; additional information about the problem
(for example, that the function is convex) is needed to be sure that the solution
found is the global minimum.
One example is that of blending problems. The question frequently arises of maximizing or
minimizing an attribute of the blend, usually its cost, with respect to the composition of the blend.
One can also pose questions about the extreme content of a particular chemical element
in a blend.
Linear programming
1 History
2 Uses
3 Standard form
o 3.1 Example
4 Augmented form (slack form)
o 4.1 Example
5 Duality
o 5.1 Example
o 5.2 Another example
6 Covering/packing dualities
o 6.1 Examples
7 Complementary slackness
8 Theory
o 8.1 Existence of optimal solutions
o 8.2 Optimal vertices (and rays) of polyhedra
9 Algorithms
o 9.1 Basis exchange algorithms
9.1.1 Simplex algorithm of Dantzig
9.1.2 Criss-cross algorithm
o 9.2 Interior point
9.2.1 Ellipsoid algorithm, following Khachiyan
9.2.2 Projective algorithm of Karmarkar
9.2.3 Path-following algorithms
o 9.3 Comparison of interior-point methods versus simplex algorithms
o 9.4 Approximate Algorithms for Covering/Packing LPs
10 Open problems and recent work
11 Integer unknowns
12 Integral linear programs
13 Solvers and scripting (programming) languages
14 See also
15 Notes
16 References
History
Figure: Leonid Kantorovich.
The problem of solving a system of linear inequalities dates back at least as far as Fourier, who in
1827 published a method for solving them,[1] and after whom the method of Fourier–Motzkin
elimination is named.
The first linear programming formulation of a problem that is equivalent to the general linear
programming problem was given by Leonid Kantorovich in 1939, who also proposed a method for
solving it.[2] He developed it during World War II as a way to plan expenditures and returns so as to
reduce costs to the army and increase losses incurred by the enemy. About the same time as
Kantorovich, the Dutch-American economist T. C. Koopmans formulated classical economic
problems as linear programs. Kantorovich and Koopmans later shared the 1975 Nobel prize in
economics.[1] In 1941, Frank Lauren Hitchcock also formulated transportation problems as linear
programs and gave a solution very similar to the later Simplex method;[2] Hitchcock had died in 1957
and the Nobel prize is not awarded posthumously.
The linear-programming problem was first shown to be solvable in polynomial time by Leonid
Khachiyan in 1979, but a larger theoretical and practical breakthrough in the field came in 1984
when Narendra Karmarkar introduced a new interior-point method for solving linear-programming
problems.
Uses
Linear programming is a considerable field of optimization for several reasons. Many practical
problems in operations research can be expressed as linear programming problems. Certain special
cases of linear programming, such as network flow problems and multicommodity flow problems are
considered important enough to have generated much research on specialized algorithms for their
solution. A number of algorithms for other types of optimization problems work by solving LP
problems as sub-problems. Historically, ideas from linear programming have inspired many of the
central concepts of optimization theory, such as duality, decomposition, and the importance
of convexity and its generalizations. Likewise, linear programming is heavily used
in microeconomics and company management, such as planning, production, transportation,
technology and other issues. Although the modern management issues are ever-changing, most
companies would like to maximize profits or minimize costs with limited resources. Therefore, many
issues can be characterized as linear programming problems.
Standard form
Standard form is the usual and most intuitive form of describing a linear programming problem. It
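consists of the following three parts:
A linear function to be maximized, e.g. $f(x_1, x_2) = c_1 x_1 + c_2 x_2$
Problem constraints of the form $a_{11} x_1 + a_{12} x_2 \le b_1$, $a_{21} x_1 + a_{22} x_2 \le b_2$, $a_{31} x_1 + a_{32} x_2 \le b_3$
Non-negative variables, e.g. $x_1 \ge 0$, $x_2 \ge 0$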
Other forms, such as minimization problems, problems with constraints on alternative forms, as well
as problems involving negative variables can always be rewritten into an equivalent problem in
standard form.
Example
Suppose that a farmer has a piece of farm land, say L km², to be planted with either wheat or barley
or some combination of the two. The farmer has a limited amount of fertilizer, F kilograms, and
insecticide, P kilograms. Every square kilometer of wheat requires F1 kilograms of fertilizer
and P1 kilograms of insecticide, while every square kilometer of barley requires F2 kilograms of
fertilizer and P2 kilograms of insecticide. Let S1 be the selling price of wheat per square kilometer,
and S2 be the selling price of barley. If we denote the area of land planted with wheat and barley
by x1 and x2 respectively, then profit can be maximized by choosing optimal values for x1 and x2. This
problem can be expressed with the following linear programming problem in the standard form:
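maximize $S_1 x_1 + S_2 x_2$ (maximize the revenue)
subject to $x_1 + x_2 \le L$ (limit on total area)
$F_1 x_1 + F_2 x_2 \le F$ (limit on fertilizer)
$P_1 x_1 + P_2 x_2 \le P$ (limit on insecticide)
$x_1 \ge 0,\ x_2 \ge 0$ (cannot plant a negative area)
Augmented form (slack form)
Introducing non-negative slack variables replaces the inequality constraints with equalities. The
problem can then be written in the following block matrix form:
Maximize $Z$:
$\begin{bmatrix} 1 & -c^T & 0 \\ 0 & A & I \end{bmatrix} \begin{bmatrix} Z \\ x \\ x_s \end{bmatrix} = \begin{bmatrix} 0 \\ b \end{bmatrix}$
$x, x_s \ge 0$
where $x_s$ are the newly introduced slack variables, and $Z$ is the variable to be maximized.
Example
The farmer example above is converted into the following augmented form:
maximize $S_1 x_1 + S_2 x_2$ (objective function)
subject to $x_1 + x_2 + x_3 = L$ (augmented constraint)
$F_1 x_1 + F_2 x_2 + x_4 = F$ (augmented constraint)
$P_1 x_1 + P_2 x_2 + x_5 = P$ (augmented constraint)
$x_1, x_2, x_3, x_4, x_5 \ge 0$
where $x_3, x_4, x_5$ are (non-negative) slack variables, representing in this example the unused area,
the amount of unused fertilizer, and the amount of unused insecticide.
In matrix form this becomes:
Maximize $Z$:
$\begin{bmatrix} 1 & -S_1 & -S_2 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & F_1 & F_2 & 0 & 1 & 0 \\ 0 & P_1 & P_2 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} Z \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 0 \\ L \\ F \\ P \end{bmatrix}$, $x_1, \dots, x_5 \ge 0$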
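As a rough illustration of the farmer example, the sketch below plugs in made-up numbers (L = 10, F = 24, P = 18, F1 = 3, F2 = 2, P1 = 1, P2 = 2, S1 = 4, S2 = 3; none of these come from the text) and solves the two-variable problem by checking every intersection of two constraint boundaries. The function name solve_2d_lp is hypothetical, and the approach assumes a bounded, feasible problem in two variables; it is a brute-force check, not the simplex method.

```python
from fractions import Fraction
from itertools import combinations


def solve_2d_lp(c, constraints):
    """Maximize c[0]*x1 + c[1]*x2 subject to a1*x1 + a2*x2 <= b
    for every (a1, a2, b) in constraints (integer coefficients assumed).

    Returns (best value, x1, x2), or None if no feasible vertex is found.
    """
    best = None
    for (a1, a2, b1), (a3, a4, b2) in combinations(constraints, 2):
        det = a1 * a4 - a2 * a3
        if det == 0:
            continue  # parallel boundary lines do not meet in a single point
        # Cramer's rule for the intersection of the two boundary lines.
        x1 = Fraction(b1 * a4 - a2 * b2, det)
        x2 = Fraction(a1 * b2 - a3 * b1, det)
        # Keep the point only if it satisfies every constraint.
        if all(p * x1 + q * x2 <= r for p, q, r in constraints):
            value = c[0] * x1 + c[1] * x2
            if best is None or value > best[0]:
                best = (value, x1, x2)
    return best


# Hypothetical data: land, fertilizer, insecticide, then x1 >= 0 and x2 >= 0
# rewritten as -x1 <= 0 and -x2 <= 0.
farm_constraints = [(1, 1, 10), (3, 2, 24), (1, 2, 18), (-1, 0, 0), (0, -1, 0)]
farm_optimum = solve_2d_lp((4, 3), farm_constraints)
```

With these assumed numbers the best vertex is x1 = 4, x2 = 6, where the land and fertilizer limits are both tight and 2 kilograms of insecticide are left over.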
Duality
Every linear programming problem, referred to as a primal problem, can be converted into a dual
problem, which provides an upper bound to the optimal value of the primal problem. In matrix form,
we can express the primal problem as:
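Maximize $c^T x$ subject to $Ax \le b$, $x \ge 0$;
with the corresponding symmetric dual problem,
Minimize $b^T y$ subject to $A^T y \ge c$, $y \ge 0$.
An alternative primal formulation is:
Maximize $c^T x$ subject to $Ax \le b$;
with the corresponding asymmetric dual problem,
Minimize $b^T y$ subject to $A^T y = c$, $y \ge 0$.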
There are two ideas fundamental to duality theory. One is the fact that (for the symmetric dual) the
dual of a dual linear program is the original primal linear program. Additionally, every feasible
solution for a linear program gives a bound on the optimal value of the objective function of its dual.
The weak duality theorem states that the objective function value of the dual at any feasible solution
is always greater than or equal to the objective function value of the primal at any feasible solution.
The strong duality theorem states that if the primal has an optimal solution, $x^*$, then the dual also has
an optimal solution, $y^*$, and $c^T x^* = b^T y^*$.
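The two theorems can be checked numerically on a small instance. The sketch below uses invented data (the matrix A and the vectors b and c are assumptions, not taken from the text) for a primal of the form maximize c^T x subject to Ax ≤ b, x ≥ 0 and its symmetric dual; the helper names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def primal_feasible(A, b, x):
    """Check Ax <= b and x >= 0."""
    return all(xj >= 0 for xj in x) and all(
        dot(row, x) <= bi for row, bi in zip(A, b)
    )


def dual_feasible(A, c, y):
    """Check A^T y >= c and y >= 0 (columns of A are rows of A^T)."""
    columns = list(zip(*A))
    return all(yi >= 0 for yi in y) and all(
        dot(col, y) >= cj for col, cj in zip(columns, c)
    )


# Assumed example data.
A = [[1, 1], [3, 2], [1, 2]]
b = [10, 24, 18]
c = [4, 3]

# Weak duality: any feasible pair satisfies c^T x <= b^T y.
x_any, y_any = [1, 1], [4, 0, 0]
weak_gap = dot(b, y_any) - dot(c, x_any)

# Strong duality: at an optimal pair the two objective values coincide.
x_opt, y_opt = [4, 6], [1, 1, 0]
primal_opt, dual_opt = dot(c, x_opt), dot(b, y_opt)
```

Because x_opt and y_opt are both feasible and reach the same objective value, weak duality alone certifies that each is optimal for its own problem.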
A linear program can also be unbounded or infeasible. Duality theory tells us that if the primal is
unbounded then the dual is infeasible by the weak duality theorem. Likewise, if the dual is
unbounded, then the primal must be infeasible. However, it is possible for both the dual and the
primal to be infeasible. As an example, consider the linear program:
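Maximize: $2x_1 - x_2$
Subject to: $x_1 - x_2 \le 1$, $-x_1 + x_2 \le -2$, $x_1, x_2 \ge 0$
Here the two constraints require $x_1 - x_2 \le 1$ and $x_1 - x_2 \ge 2$ at once, so the primal is infeasible; the
dual constraints $y_1 - y_2 \ge 2$ and $-y_1 + y_2 \ge -1$ contradict each other in the same way.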
Example
Revisit the above example of the farmer who may grow wheat and barley with the set provision of
some L land, F fertilizer and P pesticide. Assume now that y unit prices for each of these means of
production (inputs) are set by a planning board. The planning board's job is to minimize the total cost
of procuring the set amounts of inputs while providing the farmer with a floor on the unit price of each
of his crops (outputs), S1 for wheat and S2 for barley. This corresponds to the following linear
programming problem:
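Minimize: $L y_L + F y_F + P y_P$ (minimize the total cost of the means of production)
Subject to: $y_L + F_1 y_F + P_1 y_P \ge S_1$ (the farmer must receive no less than $S_1$ for his wheat)
$y_L + F_2 y_F + P_2 y_P \ge S_2$ (the farmer must receive no less than $S_2$ for his barley)
$y_L, y_F, y_P \ge 0$ (prices cannot be negative)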
The primal problem deals with physical quantities. With all inputs available in limited quantities, and
assuming the unit prices of all outputs are known, what quantities of outputs to produce so as to
maximize total revenue? The dual problem deals with economic values. With floor guarantees on all
output unit prices, and assuming the available quantity of all inputs is known, what input unit pricing
scheme to set so as to minimize total expenditure?
To each variable in the primal space corresponds an inequality to satisfy in the dual space, both
indexed by output type. To each inequality to satisfy in the primal space corresponds a variable in
the dual space, both indexed by input type.
The coefficients that bound the inequalities in the primal space are used to compute the objective in
the dual space, input quantities in this example. The coefficients used to compute the objective in
the primal space bound the inequalities in the dual space, output unit prices in this example.
Both the primal and the dual problems make use of the same matrix. In the primal space, this matrix
expresses the consumption of physical quantities of inputs necessary to produce set quantities of
outputs. In the dual space, it expresses the creation of the economic values associated with the
outputs from set input unit prices.
Since each inequality can be replaced by an equality and a slack variable, this means each primal
variable corresponds to a dual slack variable, and each dual variable corresponds to a primal slack variable.
This relation allows us to speak about complementary slackness.
Another example
Sometimes, one may find it more intuitive to obtain the dual program without looking at the program
matrix. Consider the following linear program:
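minimize $\sum_{i=1}^{m} c_i x_i + \sum_{j=1}^{n} d_j t_j$
subject to $\sum_{i=1}^{m} a_{ij} x_i + e_j t_j \ge g_j$, $1 \le j \le n$
$f_i x_i + \sum_{j=1}^{n} b_{ij} t_j \ge h_i$, $1 \le i \le m$
$x_i \ge 0$, $t_j \ge 0$, $1 \le i \le m$, $1 \le j \le n$
There are $m + n$ constraints, and all variables are non-negative. Attach a dual variable $y_j$ to the
$j$-th constraint of the first family and a dual variable $s_i$ to the $i$-th constraint of the second.
Since this is a minimization problem, we would like to obtain a dual program that is a lower bound of
the primal. In other words, we would like the sum of the right-hand sides of the constraints, weighted by
the dual variables, to be maximal, under the condition that for each primal variable the weighted sum of
its coefficients does not exceed its coefficient in the objective function. For example, $x_1$ appears in
$n + 1$ constraints. If we sum its constraints' coefficients we get
$a_{1,1} y_1 + a_{1,2} y_2 + \dots + a_{1,n} y_n + f_1 s_1$. This sum must be at most $c_1$. As a
result we get: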
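maximize $\sum_{j=1}^{n} g_j y_j + \sum_{i=1}^{m} h_i s_i$
subject to $\sum_{j=1}^{n} a_{ij} y_j + f_i s_i \le c_i$, $1 \le i \le m$
$e_j y_j + \sum_{i=1}^{m} b_{ij} s_i \le d_j$, $1 \le j \le n$
$y_j \ge 0$, $s_i \ge 0$, $1 \le i \le m$, $1 \le j \le n$
Note that we assume in our calculation steps that the program is in standard form. However, any linear
program may be transformed to standard form, so this is not a limiting factor.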
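Covering/packing dualities
A covering LP is a linear program of the form:
Minimize: $b^T y$,
Subject to: $A^T y \ge c$, $y \ge 0$,
such that the matrix $A$ and the vectors $b$ and $c$ are non-negative.
The dual of a covering LP is a packing LP, a linear program of the form:
Maximize: $c^T x$,
Subject to: $Ax \le b$, $x \ge 0$,
such that the matrix $A$ and the vectors $b$ and $c$ are non-negative.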
Examples
Covering and packing LPs commonly arise as a linear programming relaxation of a combinatorial
problem and are important in the study of approximation algorithms.[4] For example, the LP
relaxations of the set packing problem, the independent set problem, and the matching problem are
packing LPs. The LP relaxations of the set cover problem, the vertex cover problem, and
the dominating set problem are covering LPs.
Finding a fractional coloring of a graph is another example of a covering LP. In this case, there is
one constraint for each vertex of the graph and one variable for each independent set of the graph.
Complementary slackness
It is possible to obtain an optimal solution to the dual when only an optimal solution to the primal is
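known using the complementary slackness theorem. The theorem states:
Suppose that $x = (x_1, x_2, \dots, x_n)$ is primal feasible and that $y = (y_1, y_2, \dots, y_m)$ is dual feasible.
Let $(w_1, w_2, \dots, w_m)$ denote the corresponding primal slack variables, and let $(z_1, z_2, \dots, z_n)$ denote
the corresponding dual slack variables. Then $x$ and $y$ are optimal for their respective problems if and only if
$x_j z_j = 0$, for $j = 1, 2, \dots, n$, and
$w_i y_i = 0$, for $i = 1, 2, \dots, m$.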
So if the i-th slack variable of the primal is not zero, then the i-th variable of the dual is equal to zero.
Likewise, if the j-th slack variable of the dual is not zero, then the j-th variable of the primal is equal
to zero.
This necessary condition for optimality conveys a fairly simple economic principle. In standard form
(when maximizing), if there is slack in a constrained primal resource (i.e., there are "leftovers"), then
additional quantities of that resource must have no value. Likewise, if there is slack in the dual
(shadow) price non-negativity constraint requirement, i.e., the price is not zero, then there must be
scarce supplies (no "leftovers").
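The condition is easy to test mechanically. The following sketch assumes a primal of the form maximize c^T x subject to Ax ≤ b, x ≥ 0 with invented data; w denotes the primal slacks b − Ax and z the dual slacks A^T y − c (the name z and both function names are assumptions made for this illustration).

```python
def slacks(A, b, c, x, y):
    """Return (w, z): primal slacks b - Ax and dual slacks A^T y - c."""
    w = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
    z = [
        sum(a * yi for a, yi in zip(col, y)) - cj
        for col, cj in zip(zip(*A), c)
    ]
    return w, z


def complementary(A, b, c, x, y):
    """True when w_i * y_i = 0 for all i and x_j * z_j = 0 for all j."""
    w, z = slacks(A, b, c, x, y)
    return all(wi * yi == 0 for wi, yi in zip(w, y)) and all(
        xj * zj == 0 for xj, zj in zip(x, z)
    )


# Assumed example data: three resources, two products.
A = [[1, 1], [3, 2], [1, 2]]
b = [10, 24, 18]
c = [4, 3]

w_opt, z_opt = slacks(A, b, c, [4, 6], [1, 1, 0])
```

In this instance the third resource has 2 units left over and its dual price is 0, which is the "leftovers have no value" principle described above.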
Theory
Existence of optimal solutions
Geometrically, the linear constraints define the feasible region, which is a convex polyhedron.
A linear function is a convex function, which implies that every local minimum is a global minimum;
similarly, a linear function is a concave function, which implies that every local maximum is a global
maximum. An optimal solution need not exist, for two reasons. First, if two constraints are
inconsistent, then no feasible solution exists: For instance, the constraints x ≥ 2 and x ≤ 1 cannot be
satisfied jointly; in this case, we say that the LP is infeasible. Second, when the polytope is
unbounded in the direction of the gradient of the objective function (where the gradient of the
objective function is the vector of the coefficients of the objective function), then no optimal value is
attained.
The vertices of the polytope are also called basic feasible solutions. The reason for this choice of
name is as follows. Let d denote the number of variables. Then the fundamental theorem of linear
inequalities implies (for feasible problems) that for every vertex x* of the LP feasible region, there
exists a set of d (or fewer) inequality constraints from the LP such that, when we treat
those d constraints as equalities, the unique solution is x*. Thereby we can study these vertices by
means of looking at certain subsets of the set of all constraints (a discrete set), rather than the
continuum of LP solutions. This principle underlies the simplex algorithm for solving linear programs.
Algorithms
In a linear programming problem, a series of linear constraints produces a convex feasible region of possible
values for those variables. In the two-variable case this region is in the shape of a convex simple polygon.
Simplex algorithm of Dantzig
In practice, the simplex algorithm is quite efficient and can be guaranteed to find the global optimum
if certain precautions against cycling are taken. The simplex algorithm has been proved to solve
"random" problems efficiently, i.e. in a cubic number of steps,[11] which is similar to its behavior on
practical problems.[5][12] However, the simplex algorithm has poor worst-case behavior: Klee and Minty
constructed a family of linear programming problems for which the simplex method takes a number
of steps exponential in the problem size.[5][8][9] In fact, for some time it was not known whether the
linear programming problem was solvable in polynomial time, i.e. of complexity class P.
Criss-cross algorithm
Like the simplex algorithm of Dantzig, the criss-cross algorithm is a basis-exchange algorithm that
pivots between bases. However, the criss-cross algorithm need not maintain feasibility, but can pivot
rather from a feasible basis to an infeasible basis. The criss-cross algorithm does not
have polynomial time-complexity for linear programming. Both algorithms visit all $2^D$ corners of a
(perturbed) cube in dimension D, the Klee–Minty cube, in the worst case.[10][13]
Interior point
Ellipsoid algorithm, following Khachiyan
Khachiyan's algorithm was of landmark importance for establishing the polynomial-time solvability of
linear programs. The algorithm was not a computational breakthrough, as the simplex method is
more efficient for all but specially constructed families of linear programs. However, Khachiyan's
algorithm inspired new lines of research in linear programming.
Projective algorithm of Karmarkar
In 1984, N. Karmarkar proposed a projective method for linear programming. Karmarkar's algorithm
improved on Khachiyan's worst-case polynomial bound (giving $O(n^{3.5} L)$, where $n$ is the number of
variables and $L$ is the number of bits of input). Karmarkar claimed that his algorithm was much faster
in practical LP than the simplex method, a claim that created great interest in interior-point methods.[14]
Path-following algorithms
In contrast to the simplex algorithm, which finds an optimal solution by traversing the edges between
vertices on a polyhedral set, interior-point methods move through the interior of the feasible region.
Since Karmarkar's work, many interior-point methods have been proposed and analyzed. Early successful
implementations were based on affine scaling variants of the method. For both theoretical and practical
purposes, barrier function or path-following methods have been the most popular since the 1990s.[15]
Comparison of interior-point methods versus simplex algorithms
The current opinion is that the efficiency of good implementations of simplex-based methods and
interior point methods is similar for routine applications of linear programming.[15] However, for
specific types of LP problems, one type of solver may be better than another (sometimes
much better), and the structure of the solutions generated by interior point methods and by
simplex-based methods differs significantly, with the support set of active variables
typically being smaller for the latter.[16]
LP solvers are in widespread use for optimization of various problems in industry, such as
optimization of flow in transportation networks.[17]
Approximate Algorithms for Covering/Packing LPs
Covering and packing LPs can be solved approximately in nearly-linear time. That is, if matrix $A$ is of
dimension $n \times m$ and has $N$ non-zero entries, then there exist algorithms that run in time
$O(N \cdot (\log N)^{O(1)} / \varepsilon^{O(1)})$ and produce $(1 \pm \varepsilon)$-approximate solutions to given covering and
packing LPs. The best known sequential algorithm of this kind runs in time $O(N + (\log N) \cdot (n + m) / \varepsilon^2)$,[18]
and the best known parallel algorithm of this kind runs in $O((\log N)^2 / \varepsilon^3)$ iterations, each requiring only a
matrix-vector multiplication, which is highly parallelizable.[19]
Open problems and recent work
Does linear programming admit a strongly polynomial-time algorithm?
Does LP admit a polynomial algorithm in the real number (unit cost) model of computation?
This closely related set of problems has been cited by Stephen Smale as among the 18 greatest
unsolved problems of the 21st century. In Smale's words, the third version of the problem "is the
main unsolved problem of linear programming theory." While algorithms exist to solve linear
programming in weakly polynomial time, such as the ellipsoid methods and interior-point techniques,
no algorithms have yet been found that allow strongly polynomial-time performance in the number of
constraints and the number of variables. The development of such algorithms would be of great
theoretical interest, and perhaps allow practical gains in solving large LPs as well.
Although the Hirsch conjecture was recently disproved for higher dimensions, it still leaves the
following questions open.
Do all polytopal graphs have polynomially bounded diameter?
These questions relate to the performance analysis and development of Simplex-like methods. The
immense efficiency of the Simplex algorithm in practice despite its exponential-time theoretical
performance hints that there may be variations of Simplex that run in polynomial or even strongly
polynomial time. The Simplex
algorithm and its variants fall in the family of edge-following algorithms, so named because they
solve linear programming problems by moving from vertex to vertex along edges of a polytope. This
means that their theoretical performance is limited by the maximum number of edges between any
two vertices on the LP polytope. As a result, we are interested in knowing the maximum graph-
theoretical diameter of polytopal graphs. It has been proved that all polytopes have subexponential
diameter. The recent disproof of the Hirsch conjecture is the first step to prove whether any polytope
has superpolynomial diameter. If any such polytopes exist, then no edge-following variant can run in
polynomial time. Questions about polytope diameter are of independent mathematical interest.
Simplex pivot methods preserve primal (or dual) feasibility. On the other hand, criss-cross pivot
methods do not preserve (primal or dual) feasibility—they may visit primal feasible, dual feasible or
primal-and-dual infeasible bases in any order. Pivot methods of this type have been studied since
the 1970s. Essentially, these methods attempt to find the shortest pivot path on the arrangement
polytope under the linear programming problem. In contrast to polytopal graphs, graphs of
arrangement polytopes are known to have small diameter, allowing the possibility of strongly
polynomial-time criss-cross pivot algorithm without resolving questions about the diameter of general
polytopes.[10]
Integer unknowns
If all of the unknown variables are required to be integers, then the problem is called an integer
programming (IP) or integer linear programming (ILP) problem. In contrast to linear programming,
which can be solved efficiently in the worst case, integer programming problems are in many
practical situations (those with bounded variables) NP-hard. 0-1 integer programming or binary
integer programming (BIP) is the special case of integer programming where variables are
required to be 0 or 1 (rather than arbitrary integers). This problem is also classified as NP-hard, and
in fact the decision version was one of Karp's 21 NP-complete problems.
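To make the 0-1 case concrete, the sketch below solves a tiny binary program by trying every assignment. The function name and the knapsack-style data are invented for illustration. The loop visits all 2^n assignments, so it is only usable for very small n, which is consistent with the hardness statement above but is not how practical solvers work.

```python
from itertools import product


def solve_binary_program(c, A, b):
    """Maximize c.x subject to A x <= b with every x_j in {0, 1}.

    Exhaustive search over all 2**n assignments; returns (value, x),
    or (None, None) if no assignment is feasible.
    """
    best_value, best_x = None, None
    for x in product((0, 1), repeat=len(c)):
        feasible = all(
            sum(a * xj for a, xj in zip(row, x)) <= bi
            for row, bi in zip(A, b)
        )
        if not feasible:
            continue
        value = sum(cj * xj for cj, xj in zip(c, x))
        if best_value is None or value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


# Hypothetical knapsack: item values, one weight constraint, capacity 7.
values = (10, 13, 7, 8)
weights = [(3, 4, 2, 3)]
capacity = [7]
knapsack_result = solve_binary_program(values, weights, capacity)
```

Here the search picks the first two items, which together use the full capacity of 7.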
If only some of the unknown variables are required to be integers, then the problem is called
a mixed integer programming (MIP) problem. These are generally also NP-hard because they are
even more general than ILP programs.
There are however some important subclasses of IP and MIP problems that are efficiently solvable,
most notably problems where the constraint matrix is totally unimodular and the right-hand sides of
the constraints are integers or - more general - where the system has the total dual integrality (TDI)
property.
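Advanced algorithms for solving integer linear programs include the cutting-plane method; if the
problem has some extra structure, it may be possible to apply delayed column generation. Such
integer-programming algorithms are discussed by Padberg and in Beasley.
Integral linear programs
A linear program in real variables is said to be integral if it has at least one optimal solution which is
integral. Likewise, a polyhedron is said to be integral if every linear program over it that has a
bounded optimum has an optimal solution with integer coordinates.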
Integral linear programs are of central importance in the polyhedral aspect of combinatorial
optimization since they provide an alternate characterization of a problem. Specifically, for any
problem, the convex hull of the solutions is an integral polyhedron; if this polyhedron has a
nice/compact description, then we can efficiently find the optimal feasible solution under any linear
objective. Conversely, if we can prove that a linear programming relaxation is integral, then it is the
desired description of the convex hull of feasible (integral) solutions.
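Note that terminology is not consistent throughout the literature, so one should be careful to
distinguish the following two concepts:
in an integer linear program, described in the previous section, variables are forcibly constrained to
be integers, and this problem is NP-hard in general;
in an integral linear program, described in this section, variables are not constrained to be integers,
but the continuous problem is known to have an optimal solution that is integral.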
MINTO (Mixed Integer Optimizer, an integer programming solver which uses branch and bound
algorithm) has publicly available source code[20] but is not open source.
Nonlinear programming
Contents
1 Applicability
2 The general non-linear optimization problem (NLP)
3 Possible types of constraint set
4 Methods for solving the problem
5 Examples
o 5.1 2-dimensional example
o 5.2 3-dimensional example
6 Applications
7 See also
8 References
9 Further reading
10 External links
Applicability
A typical nonconvex problem is that of optimising transportation costs by selection from a set of
transportation methods, one or more of which exhibit economies of scale, with various connectivities
and capacity constraints. An example would be petroleum product transport given a selection or
combination of pipeline, rail tanker, road tanker, river barge, or coastal tankship. Owing to economic
batch size the cost functions may have discontinuities in addition to smooth changes.
Modern engineering practice involves much numerical optimization. Except in certain narrow but
important cases such as passive electronic circuits, engineering problems are non-linear, and they
are usually very complicated.
In experimental science, some simple data analysis (such as fitting a spectrum with a sum of peaks
of known location and shape but unknown magnitude) can be done with linear methods, but in
general these problems, also, are non-linear. Typically, one has a theoretical model of the system
under study with variable parameters in it and a model of the experiment or experiments, which may
also have unknown parameters. One tries to find a best fit numerically. In this case one often wants
a measure of the precision of the result, as well as the best fit itself.
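The general non-linear optimization problem (NLP)
The problem can be stated simply as
$\max_{x \in X} f(x)$ (to maximize some quantity, such as product throughput)
or
$\min_{x \in X} f(x)$ (to minimize a cost function)
where $f: \mathbb{R}^n \to \mathbb{R}$ and $X \subseteq \mathbb{R}^n$.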
An infeasible problem is one for which no set of values for the choice variables satisfies all the
constraints. That is, the constraints are mutually contradictory, and no solution exists.
A feasible problem is one for which there exists at least one set of values for the choice variables
satisfying all the constraints.
An unbounded problem is a feasible problem for which the objective function can be made to exceed
any given finite value. Thus there is no optimal solution, because there is always a feasible solution
that gives a better objective function value than does any given proposed solution.
Several methods are available for solving nonconvex problems. One approach is to use special
formulations of linear programming problems. Another method involves the use of branch and
bound techniques, where the program is divided into subclasses to be solved with convex
(minimization problem) or linear approximations that form a lower bound on the overall cost within
the subdivision. With subsequent divisions, at some point an actual solution will be obtained whose
cost is equal to the best lower bound obtained for any of the approximate solutions. This solution is
optimal, although possibly not unique. The algorithm may also be stopped early, with the assurance
that the best possible solution is within a tolerance from the best point found; such points are called
ε-optimal. Terminating to ε-optimal points is typically necessary to ensure finite termination. This is
especially useful for large, difficult problems and problems with uncertain costs or values where the
uncertainty can be estimated with an appropriate reliability estimation.
Examples
2-dimensional example
The intersection of the line with the constrained space represents the solution. The line is the best
achievable contour line (locus with a given value of the objective function).
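A simple problem can be defined by the constraints
$x_1 \ge 0$
$x_2 \ge 0$
$x_1^2 + x_2^2 \ge 1$
$x_1^2 + x_2^2 \le 2$
with an objective function to be maximized
$f(x) = x_1 + x_2$
where $x = (x_1, x_2)$.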
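A crude way to sanity-check this small problem is to evaluate the objective on a grid of candidate points, keeping only those that satisfy the constraints. The sketch below assumes the objective x1 + x2 is to be maximized; the function name, grid spacing of 0.01, and search range [0, 1.5] are arbitrary choices for illustration. Grid search is not one of the methods discussed in the text and does not scale beyond toy problems.

```python
def maximize_on_grid(points_per_unit=100, max_index=150):
    """Grid search for max x1 + x2 subject to
    x1 >= 0, x2 >= 0, 1 <= x1^2 + x2^2 <= 2.

    Candidates are (i / points_per_unit, j / points_per_unit) for
    0 <= i, j <= max_index. Returns (value, x1, x2) or None.
    """
    best = None
    for i in range(max_index + 1):
        for j in range(max_index + 1):
            x1 = i / points_per_unit
            x2 = j / points_per_unit
            radius_sq = x1 * x1 + x2 * x2
            if not (1 <= radius_sq <= 2):
                continue  # outside the ring between the two circles
            value = x1 + x2
            if best is None or value > best[0]:
                best = (value, x1, x2)
    return best


grid_optimum = maximize_on_grid()
```

The search settles on the point (1, 1), where the contour line x1 + x2 = 2 touches the outer circle.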
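3-dimensional example
The intersection of the top surface with the constrained space in the center represents
the solution.
Another simple problem can be defined by the constraints
$x_1^2 - x_2^2 + x_3^2 \le 2$
$x_1^2 + x_2^2 + x_3^2 \le 10$
with an objective function to be maximized
$f(x) = x_1 x_2 + x_2 x_3$
where $x = (x_1, x_2, x_3)$.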
Applications
Nonlinear optimization methods are used in engineering, for example to construct computational
models of oil reservoirs.[3]
Dynamic programming
Contents
1 Overview
o 1.1 Dynamic programming in mathematical optimization
o 1.2 Dynamic programming in bioinformatics
o 1.3 Dynamic programming in computer programming
2 Example: Economic optimization
o 2.1 Optimal consumption and saving
3 Examples: Computer algorithms
o 3.1 Dijkstra's algorithm for the shortest path problem
o 3.2 Fibonacci sequence
o 3.3 A type of balanced 0–1 matrix
o 3.4 Checkerboard
o 3.5 Sequence alignment
o 3.6 Tower of Hanoi puzzle
o 3.7 Egg dropping puzzle
3.7.1 Faster DP solution using a different parametrization
o 3.8 Matrix chain multiplication
4 History
5 Algorithms that use dynamic programming
6 See also
7 References
Overview
Figure 1. Finding the shortest path in a graph using optimal substructure; a straight line indicates a single
edge; a wavy line indicates a shortest path between the two vertices it connects (other nodes on these paths
are not shown); the bold line is the overall shortest path from start to goal.
If subproblems can be nested recursively inside larger problems, so that dynamic programming
methods are applicable, then there is a relation between the value of the larger problem and the
values of the subproblems.[2] In the optimization literature this relationship is called the Bellman
equation.
There are two key attributes that a problem must have in order for dynamic programming to be
applicable: optimal substructure and overlapping subproblems. If a problem can be solved by
combining optimal solutions to non-overlapping subproblems, the strategy is called "divide and
conquer" instead. This is why mergesort and quicksort are not classified as dynamic programming
problems.
Optimal substructure means that the solution to a given optimization problem can be obtained by the
combination of optimal solutions to its subproblems. Consequently, the first step towards devising a
dynamic programming solution is to check whether the problem exhibits such optimal substructure.
Such optimal substructures are usually described by means of recursion. For example, given a
graph G=(V,E), the shortest path p from a vertex u to a vertex v exhibits optimal substructure: take
any intermediate vertex w on this shortest path p. If p is truly the shortest path, then it can be split
into subpaths p1 from u to w and p2 from w to v such that these, in turn, are indeed the shortest paths
between the corresponding vertices (by the simple cut-and-paste argument described in Introduction
to Algorithms). Hence, one can easily formulate the solution for finding shortest paths in a recursive
manner, which is what the Bellman–Ford algorithm or the Floyd–Warshall algorithm does.
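The recursive structure described above can be sketched with a compact version of the Bellman–Ford relaxation loop. The graph, the vertex numbering, and the function name are invented for illustration, and the sketch assumes the graph has no negative-weight cycles (it does not detect them).

```python
def bellman_ford(num_vertices, edges, source):
    """Shortest-path distances from source to every vertex.

    edges is a list of (u, v, weight) tuples for directed edges u -> v.
    Assumes there is no negative-weight cycle reachable from source.
    """
    infinity = float("inf")
    dist = [infinity] * num_vertices
    dist[source] = 0
    # A shortest path uses at most num_vertices - 1 edges, so that many
    # rounds of relaxing every edge are enough.
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, weight in edges:
            # Optimal substructure: a shortest path to v through u
            # extends a shortest path to u by the edge (u, v).
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                changed = True
        if not changed:
            break  # no distance improved, so all are final
    return dist


# Hypothetical 4-vertex graph.
example_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
example_distances = bellman_ford(4, example_edges, 0)
```

In this example the shortest path from vertex 0 to vertex 3 goes through vertices 2 and 1, and each of its prefixes is itself a shortest path.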
Overlapping subproblems means that the space of subproblems must be small, that is, any recursive
algorithm solving the problem should solve the same subproblems over and over, rather than
generating new subproblems. For example, consider the recursive formulation for generating the
Fibonacci series: $F_i = F_{i-1} + F_{i-2}$, with base case $F_1 = F_2 = 1$. Then $F_{43} = F_{42} + F_{41}$, and $F_{42} = F_{41} + F_{40}$.
Now $F_{41}$ is being solved in the recursive subtrees of both $F_{43}$ as well as $F_{42}$. Even though the total
number of subproblems is actually small (only 43 of them), we end up solving the same problems
over and over if we adopt a naive recursive solution such as this. Dynamic programming takes
account of this fact and solves each subproblem only once.
Figure 2. The subproblem graph for the Fibonacci sequence. The fact that it is not a tree indicates overlapping
subproblems.
This can be achieved in either of two ways.
Top-down approach: This is the direct fall-out of the recursive formulation of any problem. If the
solution to any problem can be formulated recursively using the solution to its subproblems, and
if its subproblems are overlapping, then one can easily memoize or store the solutions to the
subproblems in a table. Whenever we attempt to solve a new subproblem, we first check the
table to see if it is already solved. If a solution has been recorded, we can use it directly,
otherwise we solve the subproblem and add its solution to the table.
Bottom-up approach: Once we formulate the solution to a problem recursively in terms of its
subproblems, we can try reformulating the problem in a bottom-up fashion: try solving the
subproblems first and use their solutions to build-on and arrive at solutions to bigger
subproblems. This is also usually done in a tabular form by iteratively generating solutions to
bigger and bigger subproblems by using the solutions to small subproblems. For example, if we
already know the values of $F_{41}$ and $F_{40}$, we can directly calculate the value of $F_{42}$.
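The two approaches can be compared side by side on the Fibonacci recurrence with base case F1 = F2 = 1. This is a minimal sketch; the function names are made up, and the top-down version uses Python's functools.lru_cache as the memo table rather than an explicit dictionary.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_top_down(n):
    """Top-down: plain recursion, with each result cached after first use."""
    if n <= 2:
        return 1  # base case F1 = F2 = 1
    return fib_top_down(n - 1) + fib_top_down(n - 2)


def fib_bottom_up(n):
    """Bottom-up: build F3, F4, ... in order, keeping only the last two values."""
    previous, current = 1, 1  # F1 and F2
    for _ in range(n - 2):
        previous, current = current, previous + current
    return current
```

Both versions solve each subproblem once, so computing F43 takes on the order of 43 additions instead of the hundreds of millions of calls made by the naive recursion. The bottom-up version also avoids deep recursion, which matters in Python for large n.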
Optimal consumption and saving
Consider a consumer who lives over the periods $t = 0, 1, 2, \dots, T$ and must decide how much to consume
and how much to save in each period.
Let $c_t$ be consumption in period $t$, and assume consumption yields utility $u(c_t) = \ln(c_t)$ as long
as the consumer lives. Assume the consumer is impatient, so that he discounts future utility by a
factor $b$ each period, where $0 < b < 1$. Let $k_t$ be capital in period $t$. Assume initial capital is a
given amount $k_0 > 0$, and suppose that this period's capital and consumption determine next
period's capital as $k_{t+1} = A k_t^a - c_t$, where $A$ is a positive constant and $0 < a < 1$. Assume
capital cannot be negative. Then the consumer's decision problem can be written as follows:
$\max \sum_{t=0}^{T} b^t \ln(c_t)$ subject to $k_{t+1} = A k_t^a - c_t \ge 0$ for all $t = 0, 1, 2, \dots, T$
Written this way, the problem looks complicated, because it involves solving for all the choice
variables $c_0, c_1, c_2, \dots, c_T$. (Note that $k_0$ is not a choice variable; the consumer's initial
capital is taken as given.)
The dynamic programming approach to solving this problem involves breaking it apart into a
sequence of smaller decisions. To do so, we define a sequence of value functions $V_t(k)$,
for $t = 0, 1, 2, \dots, T, T+1$, which represent the value of having any amount of capital $k$
at each time $t$. Note that $V_{T+1}(k) = 0$, that is, there is (by assumption) no utility from having
capital after death.
The value of any quantity of capital at any previous time can be calculated by backward
induction using the Bellman equation. In this problem, for each $t = 0, 1, 2, \dots, T$, the
Bellman equation is
$V_t(k_t) = \max \left( \ln(c_t) + b V_{t+1}(k_{t+1}) \right)$ subject to $k_{t+1} = A k_t^a - c_t \ge 0$
This problem is much simpler than the one we wrote down before, because it involves only
two decision variables, $c_t$ and $k_{t+1}$. Intuitively, instead of choosing his whole lifetime plan at
birth, the consumer can take things one step at a time. At time $t$, his current capital $k_t$ is
given, and he only needs to choose current consumption $c_t$ and saving $k_{t+1}$.
To actually solve this problem, we work backwards. For simplicity, the current level of capital
is denoted as $k$. $V_{T+1}(k)$ is already known, so using the Bellman equation once we can
calculate $V_T(k)$, and so on until we get to $V_0(k)$, which is the value of the initial decision
problem for the whole lifetime. In other words, once we know $V_{T-j+1}(k)$, we can
calculate $V_{T-j}(k)$, which is the maximum of $\ln(c_{T-j}) + b V_{T-j+1}(A k^a - c_{T-j})$,
where $c_{T-j}$ is the choice variable and $A k^a - c_{T-j} \ge 0$.
Working backwards, it can be shown that the value function at time $t = T - j$ is
$V_{T-j}(k) = a \sum_{i=0}^{j} a^i b^i \ln k + v_{T-j}$
where each $v_{T-j}$ is a constant, and the optimal amount to consume at time $t = T - j$ is
$c_{T-j}(k) = \frac{1}{\sum_{i=0}^{j} a^i b^i} A k^a$
We see that it is optimal to consume a larger fraction of current wealth as one gets older, finally
consuming all remaining wealth in period $T$, the last period of life.
Problem 2. Find the path of minimum total length between two given nodes P and Q.
We use the fact that, if R is a node on the minimal path from P to Q, knowledge of the latter
implies the knowledge of the minimal path from P to R.
Fibonacci sequence
Here is a naïve implementation of a function finding the nth member of the Fibonacci sequence,
based directly on the mathematical definition:
function fib(n)
    if n <= 1 return n
    return fib(n − 1) + fib(n − 2)
Notice that if we call, say, fib(5) , we produce a call tree that calls the function on the same value
many different times:
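fib(5)
fib(4) + fib(3)
(fib(3) + fib(2)) + (fib(2) + fib(1))
((fib(2) + fib(1)) + fib(2)) + (fib(2) + fib(1))
(((fib(1) + fib(0)) + fib(1)) + (fib(1) + fib(0))) + ((fib(1) + fib(0)) + fib(1))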
In particular, fib(2) was calculated three times from scratch. In larger examples, many more
values of fib , or subproblems, are recalculated, leading to an exponential time algorithm.
Now, suppose we have a simple map object, m, which maps each value of fib that has already
been calculated to its result, and we modify our function to use it and update it. The resulting
function requires only O(n) time instead of exponential time (but requires O(n) space):
var m := map(0 → 0, 1 → 1)
function fib(n)
    if key n is not in map m
        m[n] := fib(n − 1) + fib(n − 2)
    return m[n]
This technique of saving values that have already been calculated is called memoization; this is the
top-down approach, since we first break the problem into subproblems and then calculate and store
values.
In the bottom-up approach, we calculate the smaller values of fib first, then build larger values
from them. This method also uses O(n) time since it contains a loop that repeats n − 1 times, but it
only takes constant (O(1)) space, in contrast to the top-down approach which requires O(n) space to
store the map.
function fib(n)
    if n = 0
        return 0
    else
        var previousFib := 0, currentFib := 1
        repeat n − 1 times // loop is skipped if n = 1
            var newFib := previousFib + currentFib
            previousFib := currentFib
            currentFib := newFib
        return currentFib
In both examples, we only calculate fib(2) one time, and then use it to calculate
both fib(4) and fib(3) , instead of computing it every time either of them is evaluated.
Note that the above method actually takes $\Omega(n^2)$ time for large n because addition of two integers
with $\Omega(n)$ bits each takes $\Omega(n)$ time. (The nth Fibonacci number has $\Omega(n)$ bits.) Also, there is a
closed form for the Fibonacci sequence, known as Binet's formula, from which the n-th term can
be computed in approximately $O(n (\log n)^2)$ time, which is more efficient than the above dynamic
programming technique. However, the simple recurrence directly gives the matrix form that leads to
an approximately $O(n \log n)$ algorithm by fast matrix exponentiation.
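The matrix form mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard identity that the n-th power of the matrix [[1, 1], [1, 0]] contains F(n) in its off-diagonal entries; the function name `fib_matrix` and the flat 4-tuple layout are choices made here, not part of the original text. It uses repeated squaring, so only O(log n) matrix products are needed (each product still works on large integers).

```python
def fib_matrix(n):
    """Return the n-th Fibonacci number using O(log n) 2x2 matrix products."""

    def mat_mul(a, b):
        # 2x2 matrices stored row-major as (a00, a01, a10, a11)
        return (
            a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
            a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3],
        )

    result = (1, 0, 0, 1)  # identity matrix
    base = (1, 1, 1, 0)    # [[1, 1], [1, 0]]
    while n > 0:
        if n & 1:          # multiply in the current power when the bit is set
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[1]       # off-diagonal entry of the n-th power is F(n)


print(fib_matrix(42))
```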
There are at least three possible approaches: brute force, backtracking, and dynamic programming.
Brute force consists of checking all assignments of zeros and ones and counting those that have
balanced rows and columns (n / 2 zeros and n / 2 ones). As there are $\binom{n}{n/2}^n$ possible assignments,
this strategy is not practical except maybe up to n = 6.
Backtracking for this problem consists of choosing some order of the matrix elements and
recursively placing ones or zeros, while checking that in every row and column the number of
elements that have not been assigned plus the number of ones or zeros are both at least n / 2.
While more sophisticated than brute force, this approach will visit every solution once, making it
impractical for n larger than six, since the number of solutions is already 116,963,796,250 for n = 8,
as we shall see.
Dynamic programming makes it possible to count the number of solutions without visiting them all.
Imagine backtracking values for the first row – what information would we require about the
remaining rows, in order to be able to accurately count the solutions obtained for each first row
value? We consider k × n boards, where 1 ≤ k ≤ n, whose rows contain n / 2 zeros and n / 2
ones. The function f to which memoization is applied maps vectors of n pairs of integers to the
number of admissible boards (solutions). There is one pair for each column, and its two components
indicate respectively the number of zeros and ones that have yet to be placed in that column. We
seek the value of f((n / 2, n / 2), (n / 2, n / 2), ..., (n / 2, n / 2)) (n arguments or one vector
of n elements). The process of subproblem creation involves iterating over every one of the $\binom{n}{n/2}$
possible assignments for the top row of the board, and going through every column, subtracting one
from the appropriate element of the pair for that column, depending on whether the assignment for
the top row contained a zero or a one at that position. If any one of the results is negative, then the
assignment is invalid and does not contribute to the set of solutions (recursion stops). Otherwise, we
have an assignment for the top row of the k × n board and recursively compute the number of
solutions to the remaining (k − 1) × n board, adding the numbers of solutions for every admissible
assignment of the top row and returning the sum, which is being memoized. The base case is the
trivial subproblem, which occurs for a 1 × n board. The number of solutions for this board is either
zero or one, depending on whether the vector is a permutation of n / 2 (0, 1) and n / 2 (1, 0) pairs
or not. For example, in the first two boards shown above the sequences of vectors would be
((2, 2) (2, 2) (2, 2) (2, 2)) ((2, 2) (2, 2) (2, 2) (2, 2))
k = 4
0 1 0 1 0 0 1 1
((1, 2) (2, 1) (1, 2) (2, 1)) ((1, 2) (1, 2) (2, 1) (2, 1))
k = 3
1 0 1 0 0 0 1 1
((1, 1) (1, 1) (1, 1) (1, 1)) ((0, 2) (0, 2) (2, 0) (2, 0))
k = 2
0 1 0 1 1 1 0 0
((0, 1) (1, 0) (0, 1) (1, 0)) ((0, 1) (0, 1) (1, 0) (1, 0))
k = 1
1 0 1 0 1 1 0 0
((0, 0) (0, 0) (0, 0) (0, 0)) ((0, 0) (0, 0), (0, 0) (0, 0))
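The counting scheme described above can be sketched in Python. This is an illustrative sketch rather than a reference implementation: the name `count_balanced` is hypothetical, the memoization table is supplied by `functools.lru_cache`, and for brevity the recursion bottoms out at an empty board (k = 0) instead of the 1 × n base case described in the text, which gives the same counts.

```python
from functools import lru_cache
from itertools import combinations


def count_balanced(n):
    """Count n x n 0-1 matrices with n/2 ones in every row and every column."""
    if n % 2:
        return 0
    half = n // 2
    # Every admissible row, given as the set of column positions holding a one.
    rows = [frozenset(c) for c in combinations(range(n), half)]

    @lru_cache(maxsize=None)
    def f(k, remaining):
        # remaining[j] = (zeros, ones) still to be placed in column j
        if k == 0:
            return 1 if all(pair == (0, 0) for pair in remaining) else 0
        total = 0
        for ones in rows:
            nxt = []
            for j, (z, o) in enumerate(remaining):
                z, o = (z, o - 1) if j in ones else (z - 1, o)
                if z < 0 or o < 0:
                    break  # invalid assignment: this row contributes nothing
                nxt.append((z, o))
            else:
                total += f(k - 1, tuple(nxt))
        return total

    return f(n, tuple((half, half) for _ in range(n)))


print(count_balanced(4))
```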
Checkerboard
Consider a checkerboard with n × n squares and a cost-function c(i, j) which returns a cost
associated with square i, j (i being the row, j being the column). For instance (on a 5 × 5
checkerboard),
5 | 6 7 4 7 8
4 | 7 6 1 1 4
3 | 3 5 7 8 2
2 | – 6 7 0 –
1 | – – *5* – –
  | 1 2 3 4 5
Thus c(1, 3) = 5
Let us say you had a checker that could start at any square on the first rank (i.e., row) and you
wanted to know the shortest path (the one for which the sum of the costs of the visited squares is at
a minimum) to get to the last rank, assuming the checker could move only diagonally left forward,
diagonally right forward, or straight forward. That is, a checker on (1,3) can move to (2,2), (2,3) or (2,4).
5 |
4 |
3 |
2 |   x x x
1 |     o
  | 1 2 3 4 5
This problem exhibits optimal substructure. That is, the solution to the entire problem relies on
solutions to subproblems. Let us define a function q(i, j) as
q(i, j) = the minimum cost to reach square (i, j).
If we can find the values of this function for all the squares at rank n, we pick the minimum and
follow that path backwards to get the shortest path.
Note that q(i, j) is equal to the minimum cost to get to any of the three squares below it (since
those are the only squares that can reach it) plus c(i, j). For instance:
5 |
4 |     A
3 |   B C D
2 |
1 |
  | 1 2 3 4 5
q(A) = min(q(B), q(C), q(D)) + c(A)
Now, let us define q(i, j) in somewhat more general terms:
$$q(i,j) = \begin{cases} \infty & j < 1 \text{ or } j > n \\ c(i,j) & i = 1 \\ \min(q(i-1,j-1),\, q(i-1,j),\, q(i-1,j+1)) + c(i,j) & \text{otherwise} \end{cases}$$
The first line of this equation is there to make the recursive property simpler (when
dealing with the edges, so we need only one recursion). The second line says what
happens in the first rank, to provide a base case. The third line, the recursion, is the
important part. It is similar to the A,B,C,D example. From this definition we can make a
straightforward recursive code for q(i, j). In the following pseudocode, n is the size of the
board, c(i, j) is the cost-function, and min() returns the minimum of a number of
values:
function minCost(i, j)
    if j < 1 or j > n
        return infinity
    else if i = 1
        return c(i, j)
    else
        return min( minCost(i-1, j-1), minCost(i-1, j), minCost(i-1, j+1) ) + c(i, j)
It should be noted that this function only computes the path-cost, not the actual path.
We will get to the path soon. This, like the Fibonacci-numbers example, is horribly slow
since it wastes time recomputing the same shortest paths over and over. However, we
can compute it much faster in a bottom-up fashion if we store path-costs in a two-
dimensional array q[i, j] rather than using a function. This avoids recomputation;
before computing the cost of a path, we check the array q[i, j] to see if the path cost
is already there.
We also need to know what the actual shortest path is. To do this, we use another
array p[i, j] , a predecessor array. This array implicitly stores the path to any
square s by storing the previous node on the shortest path to s, i.e. the predecessor. To
reconstruct the path, we lookup the predecessor of s, then the predecessor of that
square, then the predecessor of that square, and so on, until we reach the starting
square. Consider the following code:
function computeShortestPathArrays()
    for x from 1 to n
        q[1, x] := c(1, x)
    for y from 1 to n
        q[y, 0] := infinity
        q[y, n + 1] := infinity
    for y from 2 to n
        for x from 1 to n
            m := min(q[y-1, x-1], q[y-1, x], q[y-1, x+1])
            q[y, x] := m + c(y, x)
            if m = q[y-1, x-1]
                p[y, x] := -1
            else if m = q[y-1, x]
                p[y, x] := 0
            else
                p[y, x] := 1
Now the rest is a simple matter of finding the minimum and printing it.
function computeShortestPath()
    computeShortestPathArrays()
    minIndex := 1
    min := q[n, 1]
    for i from 2 to n
        if q[n, i] < min
            minIndex := i
            min := q[n, i]
    printPath(n, minIndex)

function printPath(y, x)
    print(x)
    print("<-")
    if y = 2
        print(x + p[y, x])
    else
        printPath(y-1, x + p[y, x])
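For readers who want to run the checkerboard procedure, here is a compact Python sketch of the same idea: a cost table q filled rank by rank, plus a predecessor table p used to walk the path back. The function name `shortest_checker_path`, the 0-based indexing, and the 3 × 3 cost board in the usage lines are illustrative assumptions; the board from the text is not reused because some of its entries are not given.

```python
def shortest_checker_path(cost):
    """cost[i][j] is the cost of rank i, column j (0-based, rank 0 is the first rank).

    Returns (total_cost, columns), where columns[i] is the column visited in rank i.
    """
    n = len(cost)
    width = len(cost[0])
    q = [list(cost[0])]   # q[i][j]: minimum cost to reach square (i, j)
    p = [[0] * width]     # p[i][j]: column offset (-1, 0 or +1) of the predecessor
    for i in range(1, n):
        q_row, p_row = [], []
        for j in range(width):
            # Look at the (up to) three squares in the previous rank that can reach (i, j).
            best_cost, best_offset = min(
                (q[i - 1][j + d], d) for d in (-1, 0, 1) if 0 <= j + d < width
            )
            q_row.append(best_cost + cost[i][j])
            p_row.append(best_offset)
        q.append(q_row)
        p.append(p_row)
    # Pick the cheapest square on the last rank, then follow predecessors backwards.
    j = min(range(width), key=lambda col: q[n - 1][col])
    total = q[n - 1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j += p[i][j]
        path.append(j)
    path.reverse()
    return total, path


board = [[5, 1, 9], [9, 9, 1], [9, 9, 2]]  # hypothetical costs, rank 0 first
print(shortest_checker_path(board))
```

Skipping off-board columns in the generator plays the role of the infinity sentinels q[y, 0] and q[y, n + 1] in the pseudocode.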
Sequence alignment
The problem can be stated naturally as a recursion: a sequence A is optimally edited into a
sequence B by either:
1. inserting the first character of B, and performing an optimal alignment of A and the tail of B
2. deleting the first character of A, and performing the optimal alignment of the tail of A and B
3. replacing the first character of A with the first character of B, and performing optimal
alignments of the tails of A and B.
The partial alignments can be tabulated in a matrix, where cell (i,j) contains the cost of the optimal
alignment of A[1..i] to B[1..j]. The cost in cell (i,j) can be calculated by adding the cost of the relevant
operations to the cost of its neighboring cells, and selecting the optimum.
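The tabulation just described can be sketched as below. This is a minimal example that assumes unit cost for each insertion, deletion and replacement (the text does not fix the costs) and uses a hypothetical function name `edit_distance`; cell d[i][j] holds the cost of optimally aligning the first i characters of A with the first j characters of B.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and replacements turning a into b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all i characters of a
    for j in range(n + 1):
        d[0][j] = j          # insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = d[i - 1][j - 1] + (a[i - 1] != b[j - 1])  # free if characters match
            insert = d[i][j - 1] + 1
            delete = d[i - 1][j] + 1
            d[i][j] = min(insert, delete, replace)
    return d[m][n]


print(edit_distance("kitten", "sitting"))
```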
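The objective of the puzzle is to move the entire stack to another rod, obeying the following rules:
Only one disk may be moved at a time.
Each move consists of taking the upper disk from one of the rods and sliding it onto another rod,
on top of the other disks that may already be present on that rod.
No disk may be placed on top of a smaller disk.
The dynamic programming solution consists of solving the functional equation
S(n, h, t) = S(n − 1, h, not(h, t)) ; S(1, h, t) ; S(n − 1, not(h, t), t)
where n denotes the number of disks to be moved, h denotes the home rod, t denotes the target
rod, not(h,t) denotes the third rod (neither h nor t), ";" denotes concatenation, and
S(n, h, t) := solution to a problem consisting of n disks that are to be moved from rod h to rod
t.
Note that for n=1 the problem is trivial, namely S(1,h,t) = "move a disk from rod h to rod t"
(there is only one disk left).
The number of moves required by this solution is 2^n − 1. If the objective is to maximize the
number of moves (without cycling) then the dynamic programming functional equation is
slightly more complicated and 3^n − 1 moves are required.[12]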
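A short Python sketch of the recursive solution S(n, h, t): move n − 1 disks from h to the third rod, move one disk from h to t, then move the n − 1 disks from the third rod to t. The rod labels "A", "B", "C" and the function name `hanoi` are assumptions made for this illustration.

```python
def hanoi(n, home, target):
    """Return S(n, home, target) as a list of (from_rod, to_rod) moves."""
    if n == 0:
        return []
    if n == 1:
        return [(home, target)]
    # not(h, t): the rod that is neither the home rod nor the target rod
    spare = ({"A", "B", "C"} - {home, target}).pop()
    return hanoi(n - 1, home, spare) + [(home, target)] + hanoi(n - 1, spare, target)


moves = hanoi(3, "A", "C")
print(len(moves), moves)
```

The length of the returned list can be compared with the move count 2^n − 1 for small n.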
The following is a description of the instance of this famous puzzle involving n=2 eggs and a building
with H=36 floors:[13]
Suppose that we wish to know which stories in a 36-story building are safe to drop eggs
from, and which will cause the eggs to break on landing (using U.S. English terminology, in
which the first floor is at ground level). We make a few assumptions:
For instance, s = (2,6) indicates that two test eggs are available and 6 (consecutive)
floors are yet to be tested. The initial state of the process is s = (N,H)
where N denotes the number of test eggs available at the commencement of the
experiment. The process terminates either when there are no more test eggs (n = 0)
or when k = 0, whichever occurs first. If termination occurs at state s = (0,k)
and k > 0, then the test failed.
Now, let
W(n,k) = minimum number of trials required to identify the value of the critical floor under the
worst-case scenario given that the process is in state s = (n,k).
Then it can be shown that
W(n,k) = 1 + min{ max(W(n − 1, x − 1), W(n, k − x)) : x = 1, 2, ..., k }
with W(n,1) = 1 for all n > 0 and W(1,k) = k for all k. It is easy to solve this equation iteratively by
systematically increasing the values of n and k.
An interactive online facility is available for experimentation with this model as well as with other
versions of this puzzle (e.g. when the objective is to minimize the expected value of the number of
trials.)[14]
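Notice that the above solution takes $O(nk^2)$ time with a DP solution. This can be improved
to $O(nk \log k)$ time by binary searching on the optimal $x$ in the above recurrence,
since $W(n-1, x-1)$ is increasing in $x$ while $W(n, k-x)$ is decreasing in $x$, thus a local
minimum of $\max(W(n-1, x-1), W(n, k-x))$ is a global minimum. Also, by storing
the optimal $x$ for each cell in the DP table and referring to its value for the previous cell, the
optimal $x$ for each cell can be found in constant time, improving it to $O(nk)$ time. However, there
is an even faster solution that involves a different parametrization of the problem:
Let $k$ be the total number of floors such that the eggs break when dropped from the $k$th floor (the
example above is equivalent to taking $k = 37$).
Let $m$ be the minimum floor from which the egg must be dropped to be broken.
Let $f(t, n)$ be the maximum number of values of $m$ that are distinguishable using $t$ tries and
$n$ eggs. Then $f(t, 0) = f(0, n) = 1$ for all $t, n \ge 0$.
Let $a$ be the floor from which the first egg is dropped in the optimal strategy.
If the first egg broke, $m$ is from $1$ to $a$ and distinguishable using at most $t - 1$ tries and
$n - 1$ eggs.
If the first egg did not break, $m$ is from $a + 1$ to $k$ and distinguishable using $t - 1$ tries and
$n$ eggs.
Therefore $f(t, n) = f(t-1, n-1) + f(t-1, n)$.
The problem is then equivalent to finding the minimum $x$ such that $f(x, n) \ge k$. Computing
$f(t, i)$ for $0 \le i \le n$ in order of increasing $t$ takes $O(nx)$ time.
Thus, if we separately handle the case of $n = 1$, the algorithm would take $O(n \sqrt{k})$ time.
But the recurrence relation can in fact be solved, giving $f(t, n) = \sum_{i=0}^{n} \binom{t}{i}$, which can be
computed in $O(n)$ time using the identity $\binom{t}{i+1} = \binom{t}{i} \frac{t-i}{i+1}$ for all $i \ge 0$.
Since $f(t, n) \le f(t+1, n)$ for all $t \ge 0$, we can binary search on $t$ to find $x$, giving
an $O(n \log k)$ algorithm.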
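Both parametrizations can be compared with a small Python sketch. It assumes the usual recurrence W(n, k) = 1 + min over x of max(W(n − 1, x − 1), W(n, k − x)) for the tabular solution, and the closed form f(t, n) = sum of binomial coefficients C(t, i) for i = 0..n for the second one; the function names are hypothetical, at least one egg is assumed, and a plain linear scan over t is used instead of binary search to keep the code short.

```python
def worst_case_trials(eggs, floors):
    """W(eggs, floors) by direct tabulation; assumes eggs >= 1."""
    W = [[0] * (floors + 1) for _ in range(eggs + 1)]
    for k in range(1, floors + 1):
        W[1][k] = k  # with one egg, floors must be tried one by one
    for n in range(2, eggs + 1):
        for k in range(1, floors + 1):
            W[n][k] = 1 + min(
                max(W[n - 1][x - 1], W[n][k - x]) for x in range(1, k + 1)
            )
    return W[eggs][floors]


def min_trials(eggs, floors):
    """Smallest t with f(t, eggs) >= floors + 1; assumes eggs >= 1."""

    def f(t):
        total, term = 0, 1  # term runs through C(t, 0), C(t, 1), ...
        for i in range(eggs + 1):
            total += term
            term = term * (t - i) // (i + 1)
        return total

    t = 0
    while f(t) < floors + 1:  # floors + 1 outcomes: each floor, or "never breaks"
        t += 1
    return t


print(worst_case_trials(2, 36), min_trials(2, 36))
```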
and so on. There are numerous ways to multiply this chain of matrices. They will all
produce the same final result, however they will take more or less time to compute,
based on which particular matrices are multiplied. If matrix A has dimensions m×n and
matrix B has dimensions n×q, then matrix C=A×B will have dimensions m×q, and will
require m*n*q scalar multiplications (using a simplistic matrix multiplication algorithm for
purposes of illustration).
For example, let us multiply matrices A, B and C. Let us assume that their dimensions
are m×n, n×p, and p×s, respectively. Matrix A×B×C will be of size m×s and can be
calculated in two ways shown below:
1. A×(B×C) This order of matrix multiplication will require nps + mns scalar
multiplications.
2. (A×B)×C This order of matrix multiplication will require mnp + mps scalar
multiplications.
Let us assume that m = 10, n = 100, p = 10 and s = 1000. So, the first way to multiply
the chain will require 1,000,000 + 1,000,000 calculations. The second way will require
only 10,000 + 100,000 calculations. Obviously, the second way is faster, and we should
multiply the matrices using that arrangement of parentheses.
Therefore, our conclusion is that the order of parentheses matters, and that our task is to
find the optimal order of parentheses.
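Let's call m[i,j] the minimum number of scalar multiplications needed to multiply a chain
of matrices from matrix i to matrix j (i.e. Ai × .... × Aj, where i <= j). We split the chain at some
matrix k, such that i <= k < j, and try to find out which combination produces minimum
m[i,j].
The formula is:
if i = j, m[i,j] = 0
if i < j, m[i,j] = min over all possible values of k (m[i,k] + m[k+1,j] + p[i-1] * p[k] * p[j])
where k ranges from i to j − 1, p[i-1] is the row dimension of matrix i, p[k] is the column
dimension of matrix k, and p[j] is the column dimension of matrix j.
This formula can be coded as shown below, where input parameter "chain" is the chain
of matrices, i.e. A1, A2, ... An: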
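function OptimalMatrixChainParenthesis(chain)
    n = length(chain)
    // matrix i has dimensions p[i-1] × p[i]
    for i = 1, n
        m[i,i] = 0    // since it takes no calculations to multiply one matrix
    for len = 2, n
        for i = 1, n - len + 1
            j = i + len - 1
            m[i,j] = infinity    // so that the first calculation updates
            for k = i, j-1
                q = m[i, k] + m[k+1, j] + p[i-1] * p[k] * p[j]
                if q < m[i, j]    // the new split is better than what we had
                    m[i, j] = q    // update the cost
                    s[i, j] = k    // record the split point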
So far, we have calculated values for all possible m[i, j], the minimum number of
calculations to multiply a chain from matrix i to matrix j, and we have recorded the
corresponding "split point" s[i, j]. For example, if we are multiplying chain A1×A2×A3×A4,
and it turns out that m[1, 3] = 100 and s[1, 3] = 2, that means that the optimal
placement of parentheses for matrices 1 to 3 is (A1×A2)×A3 and to multiply those
matrices will require 100 scalar calculations.
This algorithm will produce "tables" m[, ] and s[, ] that will have entries for all possible
values of i and j. The final solution for the entire chain is m[1, n], with corresponding split
at s[1, n]. Unraveling the solution will be recursive, starting from the top and continuing
until we reach the base case, i.e. multiplication of single matrices.
Therefore, the next step is to actually split the chain, i.e. to place the parentheses where
they (optimally) belong. For this purpose we could use the following algorithm:
function PrintOptimalParenthesis(s, i, j)
    if i = j
        print "A"i
    else
        print "("
        PrintOptimalParenthesis(s, i, s[i, j])
        PrintOptimalParenthesis(s, s[i, j] + 1, j)
        print ")"
Of course, this algorithm is not useful for actual multiplication. This algorithm is just a
user-friendly way to see what the result looks like.
To actually multiply the matrices using the proper splits, we need the following
algorithm:
function MatrixChainMultiply(chain from 1 to n)    // returns the final matrix, i.e. A1×A2×... ×An
    OptimalMatrixChainParenthesis(chain from 1 to n)    // this will produce s[ . ] and m[ . ] "tables"
    OptimalMatrixMultiplication(s, chain from 1 to n)    // actually multiply

function OptimalMatrixMultiplication(s, i, j)    // returns the result of multiplying a chain of matrices from Ai to Aj in optimal way
    if i < j
        // keep on splitting the chain and multiplying the matrices in left and right sides
        LeftSide = OptimalMatrixMultiplication(s, i, s[i, j])
        RightSide = OptimalMatrixMultiplication(s, s[i, j] + 1, j)
        return MatrixMultiply(LeftSide, RightSide)
    else if i = j
        return Ai    // matrix at position i
    else
        print "error, i <= j must hold"
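The table-filling and parenthesization steps can be tried out with the following Python sketch. It assumes the chain is described by a list `dims` in which matrix i has dimensions dims[i-1] × dims[i]; the names `matrix_chain_order`, `parenthesize` and `dims` are illustrative, and the dimensions in the usage lines are the m = 10, n = 100, p = 10, s = 1000 example from the text.

```python
def matrix_chain_order(dims):
    """Return tables (m, s), 1-indexed, for a chain where A_i is dims[i-1] x dims[i]."""
    n = len(dims) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]  # m[i][j]: minimum scalar multiplications
    s = [[0] * (n + 1) for _ in range(n + 1)]  # s[i][j]: best split point k
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                if q < m[i][j]:
                    m[i][j], s[i][j] = q, k
    return m, s


def parenthesize(s, i, j):
    """Render the optimal order recorded in s as a string."""
    if i == j:
        return f"A{i}"
    return "(" + parenthesize(s, i, s[i][j]) + parenthesize(s, s[i][j] + 1, j) + ")"


m, s = matrix_chain_order([10, 100, 10, 1000])
print(m[1][3], parenthesize(s, 1, 3))
```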
References
1. S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani, 'Algorithms', p 173, available
at http://www.cs.berkeley.edu/~vazirani/algorithms.html
2. Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C. (2001), Introduction to Algorithms
(2nd ed.), MIT Press & McGraw–Hill, ISBN 0-262-03293-7. pp. 327–8.
3. DeLisi, Biopolymers, 1974, Volume 13, Issue 7, pages 1511–1512, July 1974
4. Gurskiĭ GV, Zasedatelev AS, Biofizika, 1978 Sep-Oct;23(5):932-46
5. "M. Memo". J Vocabulary. J Software. Retrieved 28 October 2011.
6. Stokey et al., 1989, Chap. 1
7. Sniedovich, M. (2006), "Dijkstra’s algorithm revisited: the dynamic programming
connexion" (PDF), Journal of Control and Cybernetics 35 (3): 599–620. Online version of the paper
with interactive computational modules.
8. Denardo, E.V. (2003), Dynamic Programming: Models and Applications, Mineola,
NY: Dover Publications, ISBN 978-0-486-42810-9
9. Sniedovich, M. (2010), Dynamic Programming: Foundations and Principles, Taylor &
Francis, ISBN 978-0-8247-4099-3
10. Dijkstra 1959, p. 270
11. Eddy, S. R., What is dynamic programming?, Nature Biotechnology, 22, 909–910
(2004).
12. Moshe Sniedovich (2002), "OR/MS Games: 2. The Towers of Hanoi Problem", INFORMS
Transactions on Education 3 (1): 34–51.
13. Konhauser J.D.E., Velleman, D., and Wagon, S. (1996). Which way did the Bicycle
Go? Dolciani Mathematical Expositions – No 18. The Mathematical Association of America.
14. Sniedovich, M. (2003). The joy of egg-dropping in Braunschweig and Hong Kong.
INFORMS Transactions on Education, 4(1) 48–64.
15. Dean Connable Wills, Connections between combinatorics of permutations and algorithms
and geometry
16. http://www.wu-wien.ac.at/usr/h99c/h9951826/bellman_dynprog.pdf
17. Nocedal, J.; Wright, S. J.: Numerical Optimization, page 9, Springer, 2006.
Inventory theory
In operations research, inventory theory uses mathematical models to study the economic
processes of stock-keeping, with the aim of reaching decisions of maximum economic efficiency.
Although it is not a mathematical theory in the strict sense (it has no specific axiomatization and
no theorems of its own), inventory theory studies the most representative models of this kind.
It also aims to formulate research methods applicable to all particular inventory
models.
For twice-differentiable functions, unconstrained problems can be solved by
finding the points where the gradient of the objective function is zero (the stationary points)
and using the Hessian matrix to classify each point. If the Hessian is positive
definite, the point is a local minimum; if it is negative definite, a local maximum; and if it is
indefinite, a saddle point.
A stationary point can be found by starting from an initial guess and then approaching
it with one of the following methods:
gradient descent
Newton's method
conjugate gradient
line search
If the function is convex in the region of interest, then any local minimum is also a global minimum.
Fast methods exist for optimizing twice-differentiable convex functions.
Constrained problems can be transformed into unconstrained problems by means of
Lagrange multipliers.
genetic algorithms
evolution strategy
differential evolution
SQP methods solve a sequence of optimization subproblems, each of which optimizes a quadratic
model of the objective subject to a linearization of the constraints. If the problem is unconstrained,
then the method reduces to Newton's method for finding a point where the gradient of the objective
vanishes. If the problem has only equality constraints, then the method is equivalent to
applying Newton's method to the first-order optimality conditions, or Karush–Kuhn–Tucker
conditions, of the problem. SQP methods have been implemented in many packages,
including NPSOL, SNOPT, NLPQL, OPSYC, OPTIMA, MATLAB, GNU Octave and SQP.
Algorithm basics
Consider a nonlinear programming problem of the form:
Note that the term in the expression above may be left out for the minimization
problem, since it is constant.
Gradient descent
For the analytical method called "steepest descent", see Method of steepest descent.
contents
1 Description
o 1.1 Examples
o 1.2 Limitations
2 Solution of a linear system
3 Solution of a non-linear system
4 Comments
5 A computational example
6 Extensions
o 6.1 Fast proximal gradient method
o 6.2 The momentum method
7 See also
8 References
Description
Illustration of gradient descent.
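If $\mathbf{b} = \mathbf{a} - \gamma \nabla F(\mathbf{a})$ for a differentiable function $F$ and $\gamma$ small enough, then
$F(\mathbf{a}) \ge F(\mathbf{b})$. With this observation in mind, one starts with a
guess $\mathbf{x}_0$ for a local minimum of $F$, and considers the sequence $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots$ such that
$$\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma_n \nabla F(\mathbf{x}_n), \quad n \ge 0.$$
We have
$$F(\mathbf{x}_0) \ge F(\mathbf{x}_1) \ge F(\mathbf{x}_2) \ge \cdots,$$
so hopefully the sequence $(\mathbf{x}_n)$ converges to the desired local minimum. Note that the value of
the step size $\gamma$ is allowed to change at every iteration. With certain assumptions on the function
$F$ (for example, $F$ convex and $\nabla F$ Lipschitz) and particular choices of $\gamma$ (e.g., chosen via a line
search that satisfies the Wolfe conditions), convergence to a local minimum can be guaranteed.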
When the function is convex, all local minima are also global minima, so in this case gradient
descent can converge to the global solution.
This process is illustrated in the picture to the right. Here $F$ is assumed to be defined on the plane,
and its graph is assumed to have a bowl shape. The blue curves are the contour lines, that is, the curves on
which the value of $F$ is constant. A red arrow originating at a point shows the direction of the
negative gradient at that point. Note that the (negative) gradient at a point is orthogonal to the
contour line going through that point. We see that gradient descent leads us to the bottom of the
bowl, that is, to the point where the value of the function $F$ is minimal.
Examples
Gradient descent has problems with pathological functions such as the Rosenbrock function shown
here.
The Rosenbrock function has a narrow curved valley which contains the minimum. The bottom
of the valley is very flat. Because of the curved flat valley the optimization is zig-zagging slowly
with small stepsizes towards the minimum.
The "Zig-Zagging" nature of the method is also evident below, where the gradient ascent
In traditional linear least squares for real $A$ and $\mathbf{b}$, the Euclidean norm is used, so that
$F(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2$, in which case
$$\nabla F(\mathbf{x}) = 2A^T(A\mathbf{x} - \mathbf{b}).$$
In this case, the line search minimization, finding the locally optimal step size $\gamma$ on every iteration,
can be performed analytically, and explicit formulas for the locally optimal $\gamma$ are known.[2]
For solving linear equations, gradient descent is rarely used, with the conjugate gradient
method being one of the most popular alternatives. The speed of convergence of gradient descent
depends on the maximal and minimal eigenvalues of $A^TA$, while the speed of convergence
of conjugate gradients has a more complex dependence on the eigenvalues, and can benefit
from preconditioning. Gradient descent also benefits from preconditioning, but this is not done as
commonly.
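As an illustration of the linear case, the sketch below runs gradient descent on F(x) = ||Ax − b||² with the gradient 2Aᵀ(Ax − b) and an exact line search. The step formula γ = (gᵀg) / (2 ||Ag||²), obtained by minimizing F(x − γg) over γ, is derived here for the example and is only one of the known choices; the function name and the small 2 × 2 system are assumptions, and plain Python lists are used to keep the block dependency-free.

```python
def solve_least_squares(A, b, iterations=200):
    """Minimize ||Ax - b||^2 by gradient descent with an exact line search."""
    rows, cols = len(A), len(A[0])
    x = [0.0] * cols
    for _ in range(iterations):
        # residual r = Ax - b and gradient g = 2 A^T r
        r = [sum(A[i][j] * x[j] for j in range(cols)) - b[i] for i in range(rows)]
        g = [2.0 * sum(A[i][j] * r[i] for i in range(rows)) for j in range(cols)]
        Ag = [sum(A[i][j] * g[j] for j in range(cols)) for i in range(rows)]
        denom = 2.0 * sum(v * v for v in Ag)
        if denom == 0.0:
            break  # gradient vanished (or underflowed): stationary point reached
        gamma = sum(v * v for v in g) / denom  # exact minimizer of F along -g
        x = [x[j] - gamma * g[j] for j in range(cols)]
    return x


print(solve_least_squares([[2.0, 0.0], [0.0, 1.0]], [2.0, 3.0]))
```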
where
We know that
where
The Jacobian matrix
and
So that
and
An animation showing the first 83 iterations of gradient descent applied to this example. Surfaces
are isosurfaces of at current guess , and arrows show the direction of descent. Due to a small
and constant step size, the convergence is slow.
Now a suitable must be found such that . This can be done with any of a
variety of line search algorithms. One might also simply guess which gives
Comments
Gradient descent works in spaces of any number of dimensions, even in infinite-dimensional ones.
In the latter case the search space is typically a function space, and one calculates the Gâteaux
derivative of the functional to be minimized to determine the descent direction.[3]
Gradient descent can take many iterations to compute a local minimum with a
required accuracy if the curvature in different directions is very different for the given function. For
such functions, preconditioning, which changes the geometry of the space to shape the function
level sets like concentric circles, cures the slow convergence. Constructing and applying
preconditioning can be computationally expensive, however.
Gradient descent can be combined with a line search, finding the locally optimal step size $\gamma$ on
every iteration. Performing the line search can be time-consuming. Conversely, using a fixed small
$\gamma$ can yield poor convergence.
A computational example
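The gradient descent algorithm is applied to find a local minimum of the function f(x) = x^4 − 3x^3 + 2, with
derivative f'(x) = 4x^3 − 9x^2. Here is an implementation in the Python programming language.
# From calculation, we expect that the local minimum occurs at x=9/4
x_old = 0
x_new = 6  # The algorithm starts at x=6
gamma = 0.01  # step size
precision = 0.00001

def f_derivative(x):
    return 4 * x**3 - 9 * x**2

# Step against the derivative until successive iterates differ by less than `precision`
while abs(x_new - x_old) > precision:
    x_old = x_new
    x_new = x_old - gamma * f_derivative(x_old)

print("Local minimum occurs at", x_new)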
The above piece of code has to be modified with regard to step size according to the system at hand
and convergence can be made faster by using an adaptive step size. In the above case the step size
is not adaptive. It stays at 0.01 in all the directions which can sometimes cause the method to fail by
diverging from the minimum.
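One way to make the step size adaptive is a backtracking line search: start each iteration with a generous step and halve it until the function value decreases enough (the Armijo sufficient-decrease condition). The sketch below applies this to the same function f(x) = x^4 − 3x^3 + 2. The function name, the constants `shrink` and `c`, and the starting point 3.0 are assumptions chosen for illustration. Note that x = 0 is also a stationary point of this function, so other starting points may stall there.

```python
def f(x):
    return x**4 - 3 * x**3 + 2


def f_derivative(x):
    return 4 * x**3 - 9 * x**2


def descend_with_backtracking(x, initial_step=1.0, shrink=0.5, c=1e-4,
                              precision=1e-5, max_iterations=10_000):
    """Gradient descent where each step size is found by backtracking."""
    for _ in range(max_iterations):
        slope = f_derivative(x)
        step = initial_step
        # Armijo condition: shrink the step until f decreases sufficiently.
        while f(x - step * slope) > f(x) - c * step * slope * slope:
            step *= shrink
        x_next = x - step * slope
        if abs(x_next - x) < precision:
            return x_next
        x = x_next
    return x


print(descend_with_backtracking(3.0))  # expected to approach x = 9/4
```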
Extensions
Gradient descent can be extended to handle constraints by including a projection onto the set of
constraints. This method is only feasible when the projection is efficiently computable on a computer.
Under suitable assumptions, this method converges. This method is a specific case of the forward-
backward algorithm for monotone inclusions (which includes convex programming and variational
inequalities).[6]
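A minimal sketch of the projected variant: after every gradient step the iterate is projected back onto the constraint set. The example assumes a box constraint, for which the projection is a simple clip; the function names, the quadratic objective and the fixed step size 0.1 are all hypothetical choices for the illustration.

```python
def projected_gradient_descent(gradient, project, x, step=0.1, iterations=500):
    """Repeat: take a gradient step, then project back onto the feasible set."""
    for _ in range(iterations):
        x = project([xi - step * gi for xi, gi in zip(x, gradient(x))])
    return x


def gradient(x):
    # gradient of (x0 - 2)^2 + (x1 + 1)^2, whose unconstrained minimum is (2, -1)
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)]


def project_onto_unit_box(x):
    # projection onto [0, 1] x [0, 1] is a coordinate-wise clip
    return [min(max(v, 0.0), 1.0) for v in x]


print(projected_gradient_descent(gradient, project_onto_unit_box, [0.5, 0.5]))
```

The unconstrained minimizer lies outside the box, so the iterates settle on the nearest feasible corner.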
References
Mordecai Avriel (2003). Nonlinear Programming: Analysis and Methods. Dover Publishing. ISBN
0-486-43227-0.
Jan A. Snyman (2005). Practical Mathematical Optimization: An Introduction to Basic
Optimization Theory and Classical and New Gradient-Based Algorithms. Springer
Publishing. ISBN 0-387-24348-8
Cauchy, Augustin (1847). Méthode générale pour la résolution des systèmes d'équations
simultanées. pp. 536–538.
1. Kiwiel, Krzysztof C. (2001). "Convergence and efficiency of subgradient methods for
quasiconvex minimization". Mathematical Programming (Series A) 90 (1) (Berlin, Heidelberg:
Springer). pp. 1–25. doi:10.1007/PL00011414. ISSN 0025-5610. MR 1819784.
2. Yuan, Ya-xiang (1999). "Step-sizes for the gradient method" (PDF). AMS/IP Studies in
Advanced Mathematics (Providence, RI: American Mathematical Society) 42 (2): 785.
3. G. P. Akilov, L. V. Kantorovich, Functional Analysis, Pergamon Pr; 2 Sub edition, ISBN
0-08-023036-9, 1982
4. W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical Recipes in C:
The Art of Scientific Computing, 2nd Ed., Cambridge University Press, New York, 1992
5. T. Strutz: Data Fitting and Uncertainty (A practical introduction to weighted least squares
and beyond). Vieweg+Teubner, Wiesbaden 2011, ISBN 978-3-8348-1022-9.
6. P. L. Combettes and J.-C. Pesquet, "Proximal splitting methods in signal processing",
in: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, (H. H. Bauschke, R. S.
Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Editors), pp. 185-212. Springer,
New York, 2011.
7. Yu. Nesterov, "Introductory Lectures on Convex Optimization. A Basic Course" (Springer,
2004, ISBN 1-4020-7553-7)
8. Fast Gradient Methods, lecture notes by Prof. Lieven Vandenberghe for EE236C at UCLA
9. Qian, Ning (January 1999). "On the momentum term in gradient descent learning
algorithms" (PDF). Neural Networks 12 (1): 145–151. Retrieved 17 October 2014.
10. "Momentum and Learning Rate Adaptation". Willamette University. Retrieved 17
October 2014.
11. Geoffrey Hinton; Nitish Srivastava; Kevin Swersky. "6 - 3 - The momentum
method". YouTube. Retrieved 18 October 2014. Part of a lecture series for the Coursera online
course Neural Networks for Machine Learning.
-norm is defined as ( )
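However, beyond such problems projection operators are not appropriate, and more general
operators are required to tackle them. Among the various generalizations of the notion of a convex
projection operator that exist, proximity operators are best suited for other purposes.
Definition
The proximity operator of a convex function $\varphi$ at $x$ is defined as the unique minimizer
$$\operatorname{prox}_\varphi(x) = \operatorname{argmin}_{y} \left( \varphi(y) + \tfrac{1}{2} \|x - y\|_2^2 \right).$$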
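A concrete instance may help. For the scaled absolute-value penalty φ(y) = λ Σ |y_i|, minimizing φ(y) + ½‖x − y‖² coordinate by coordinate gives the well-known soft-thresholding rule: shrink each coordinate towards zero by λ and set it to zero if it would cross. The sketch below assumes this choice of φ; the function name `prox_l1` and the parameter name `lam` are illustrative.

```python
import math


def prox_l1(x, lam):
    """Proximity operator of lam * ||.||_1: coordinate-wise soft thresholding."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]


print(prox_l1([3.0, -0.5, -2.0], 1.0))
```

When φ is instead the indicator function of a convex set, the same definition reduces to the ordinary projection onto that set.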
Examples
Special instances of Proximal Gradient Methods are
Projected Landweber
Alternating projection
Alternating-direction method of multipliers
Landweber iteration
The Landweber iteration or Landweber algorithm is an algorithm to solve ill-posed linear inverse
problems, and it has been extended to solve non-linear problems that involve constraints. The
method was first proposed in the 1950s,[1] and it can be now viewed as a special case of many other
more general methods.[2]
Contents
1 Basic algorithm
2 Nonlinear extension
3 Extension to constrained problems
4 Applications
5 References
Basic algorithm
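The original Landweber algorithm [1] attempts to recover a signal x from measurements y. The linear
version assumes that $y = Ax$ for a linear operator A. When the problem is in
finite dimensions, A is just a matrix. The algorithm updates the estimate by
$$x_{k+1} = x_k - \omega A^*(A x_k - y),$$
where the relaxation factor $\omega$ satisfies $0 < \omega < 2/\sigma_1^2$. Here $\sigma_1$ is the largest singular value of
A. If we write $f(x) = \tfrac{1}{2}\|Ax - y\|_2^2$, then the update can be written in terms of the gradient,
$$x_{k+1} = x_k - \omega \nabla f(x_k),$$
and hence the algorithm is a special case of gradient descent.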
Discussion of the Landweber iteration as a regularization algorithm can be found in [3][4].
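As an illustration of the basic algorithm, the following is a minimal pure-Python sketch of the linear Landweber update for a small matrix A. The function name `landweber`, the example data, and the choice of relaxation factor (the reciprocal of the squared Frobenius norm, which never exceeds 1/σ₁² and so stays below the 2/σ₁² bound) are assumptions made for this example, not part of the original description.

```python
def landweber(A, y, iterations=500):
    """Sketch of the linear Landweber iteration x <- x - omega * A^T (A x - y)."""
    m, n = len(A), len(A[0])
    # Assumed step choice: the squared Frobenius norm bounds sigma_1^2 from above,
    # so omega = 1 / ||A||_F^2 satisfies 0 < omega < 2 / sigma_1^2.
    omega = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iterations):
        # Residual A x - y
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient step: subtract omega * A^T residual
        x = [x[j] - omega * sum(A[i][j] * residual[i] for i in range(m))
             for j in range(n)]
    return x


A = [[2.0, 1.0], [1.0, 3.0]]
y = [0.0, -5.0]          # measurements generated from the signal (1, -2)
x_rec = landweber(A, y)
```

For this well-conditioned example the iterates approach the signal (1, -2); for genuinely ill-posed problems the iteration is usually stopped early, which is where its regularizing effect comes from.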
Nonlinear extension
In general, the updates generated by $x_{k+1} = x_k - \tau \nabla f(x_k)$ will generate a sequence $x_k$
that converges to a minimizer of f whenever f is convex and the stepsize $\tau$ is chosen such
that $0 < \tau < 2/\|A\|^2$, where $\|\cdot\|$ is the spectral norm.
Since this is a special type of gradient descent, there currently is not much benefit to analyzing it on its
own as the nonlinear Landweber, but such analysis was performed historically by many communities
not aware of unifying frameworks.
The nonlinear Landweber problem has been studied in many papers in many communities; see, for
example, [5].
Applications
Since the method has been around since the 1950s, it has been adopted and rediscovered by many
scientific communities, especially those studying ill-posed problems. In X-ray computed
tomography it is called SIRT - simultaneous iterative reconstruction technique. It has also been used
in the computer vision community[7] and the signal restoration community.[8] It is also used in image
processing, since many image problems, such as deconvolution, are ill-posed. Variants of this
method have been used also in sparse approximation problems and compressed sensing settings.
Contents
1 Algorithm
2 Related algorithms
3 Further reading
4 References
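Algorithm
The POCS algorithm solves the following problem:
$$\text{find } x \in \mathbb{R}^n \text{ such that } x \in C \cap D,$$
where C and D are closed convex sets.
To use the POCS algorithm, one must know how to project onto the sets C and D separately.
The algorithm starts with an arbitrary value for $x_0$ and then generates the sequence
$$x_{k+1} = \mathcal{P}_C\left( \mathcal{P}_D(x_k) \right).$$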
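The alternating scheme can be sketched in a few lines of Python. The two sets used here (the closed unit disc as C and the line x1 + x2 = 1 as D), the helper names, and the starting point are assumptions chosen only because their projections have simple closed forms.

```python
import math


def project_disc(p):
    """Projection onto C: the closed unit disc centred at the origin."""
    norm = math.hypot(p[0], p[1])
    return p if norm <= 1.0 else (p[0] / norm, p[1] / norm)


def project_line(p):
    """Projection onto D: the line x1 + x2 = 1."""
    shift = (p[0] + p[1] - 1.0) / 2.0
    return (p[0] - shift, p[1] - shift)


def pocs(x0, iterations=200):
    """Alternate the two projections: x <- P_C(P_D(x))."""
    x = x0
    for _ in range(iterations):
        x = project_disc(project_line(x))
    return x


point = pocs((3.0, 0.5))
```

Since the two sets intersect, the iterates settle on a point that lies in both; which point is reached depends on the starting value.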
Related algorithms
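[Figure: example of the averaged projections variant]
The method of averaged projections is quite similar. For the case of two closed convex
sets C and D, it proceeds by
$$x_{k+1} = \tfrac{1}{2}\left( \mathcal{P}_C(x_k) + \mathcal{P}_D(x_k) \right),$$
where $\mathcal{P}_C$ and $\mathcal{P}_D$ denote the projections onto C and D.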
Augmented Lagrangian method
Augmented Lagrangian methods are a certain class of algorithms for
solving constrained optimization problems. They have similarities to penalty methods in that they
replace a constrained optimization problem by a series of unconstrained problems and add a penalty
term to the objective; the difference is that the augmented Lagrangian method adds yet another
term, designed to mimic a Lagrange multiplier. The augmented Lagrangian is not the same as
the method of Lagrange multipliers.
Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an
additional penalty term (the augmentation).
The method was originally known as the method of multipliers, and was studied much in the 1970s
and 1980s as a good alternative to penalty methods. It was first discussed by Magnus Hestenes in
1969[1] and by Powell in 1969.[2] The method was studied by R. Tyrrell Rockafellar in relation
to Fenchel duality, particularly in relation to proximal-point methods, Moreau–Yosida regularization,
and maximal monotone operators. These methods were used in structural optimization. The method
was also studied by Dimitri Bertsekas, notably in his 1982 book,[3] together with extensions involving
nonquadratic regularization functions, such as entropic regularization, which gives rise to the
"exponential method of multipliers," a method that handles inequality constraints with a twice
differentiable augmented Lagrangian function.
Contents
1 General method
2 Comparison with penalty methods
3 Alternating direction method of multipliers
4 Stochastic Optimization
5 Software
6 See also
7 References
8 Bibliography
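General method
Let us say we are solving the following constrained problem:
$$\min f(\mathbf{x})$$
subject to
$$c_i(\mathbf{x}) = 0 \quad \forall i \in I.$$
This problem can be solved as a series of unconstrained minimization problems. For reference, we
first list the penalty method approach:
$$\min \Phi_k(\mathbf{x}) = f(\mathbf{x}) + \mu_k \sum_{i \in I} c_i(\mathbf{x})^2.$$
The penalty method solves this problem, then at the next iteration it re-solves the problem using a
larger value of $\mu_k$ (and using the old solution as the initial guess or "warm-start").
The augmented Lagrangian method uses the following unconstrained objective:
$$\min \Phi_k(\mathbf{x}) = f(\mathbf{x}) + \frac{\mu_k}{2} \sum_{i \in I} c_i(\mathbf{x})^2 - \sum_{i \in I} \lambda_i c_i(\mathbf{x}),$$
and after each iteration, in addition to updating $\mu_k$, the variable $\lambda$ is also updated according to the
rule
$$\lambda_i \leftarrow \lambda_i - \mu_k c_i(\mathbf{x}_k),$$
where $\mathbf{x}_k$ is the solution to the unconstrained problem at the kth step, i.e.
$\mathbf{x}_k = \arg\min \Phi_k(\mathbf{x})$.
The variable $\lambda$ is an estimate of the Lagrange multiplier, and the accuracy of this estimate improves
at every step. The major advantage of the method is that unlike the penalty method, it is not
necessary to take $\mu \to \infty$ in order to solve the original constrained problem. Instead, because of
the presence of the Lagrange multiplier term, $\mu$ can stay much smaller.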
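To make the multiplier update concrete, here is a small sketch on a toy problem: minimize x1² + x2² subject to x1 + x2 − 1 = 0, whose solution is (0.5, 0.5) with multiplier 1. The problem, the function name `augmented_lagrangian`, and the fixed penalty value are assumptions for illustration; the inner unconstrained minimization is done in closed form because the objective is quadratic, and the sign convention assumed is objective plus (μ/2)·c² minus λ·c.

```python
def augmented_lagrangian(mu=10.0, iterations=20):
    """Toy problem: minimize x1^2 + x2^2 subject to c(x) = x1 + x2 - 1 = 0."""
    lam = 0.0                      # multiplier estimate
    x = (0.0, 0.0)
    for _ in range(iterations):
        # Unconstrained step: minimize
        #   x1^2 + x2^2 + (mu / 2) * c(x)^2 - lam * c(x).
        # By symmetry the minimizer has x1 = x2 = t, and stationarity gives
        #   2 t + mu (2 t - 1) - lam = 0.
        t = (mu + lam) / (2.0 + 2.0 * mu)
        x = (t, t)
        # Multiplier update: lam <- lam - mu * c(x_k)
        c = x[0] + x[1] - 1.0
        lam = lam - mu * c
    return x, lam


x_opt, lam_opt = augmented_lagrangian()
```

Note that the penalty parameter stays fixed at a moderate value throughout; the constraint violation shrinks because the multiplier estimate improves, not because the penalty grows.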
The method can be extended to handle inequality constraints. For a discussion of practical
improvements, see [4].
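Alternating direction method of multipliers
The alternating direction method of multipliers (ADMM) is a variant of the augmented Lagrangian
scheme that uses partial updates for the dual variables. This method is often applied to solve
problems such as $\min_x f(x) + g(x)$, which is equivalent to the constrained problem
$\min_{x, y} f(x) + g(y)$ subject to $x = y$.
Though this change may seem trivial, the problem can now be attacked using methods of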
constrained optimization (in particular, the augmented Lagrangian method), and the objective
function is separable in x and y. The dual update requires solving a proximity function in x and y at
the same time; the ADMM technique allows this problem to be solved approximately by first solving
for x with y fixed, and then solving for y with x fixed. Rather than iterate until convergence (like
the Jacobi method), the algorithm proceeds directly to updating the dual variable and then repeating
the process. This is not equivalent to the exact minimization, but surprisingly, it can still be shown
that this method converges to the right answer (under some assumptions). Because of this
approximation, the algorithm is distinct from the pure augmented Lagrangian method.
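The alternating x-update, y-update and dual update can be sketched on a simple separable problem: minimize ½‖x − b‖² + λ‖y‖₁ subject to x = y. The problem choice, the scaled dual variable `u`, the penalty parameter `rho`, and all function names are assumptions for this example; the known solution (entrywise soft thresholding of b by λ) makes the result easy to check.

```python
def soft_threshold(v, t):
    """Scalar soft thresholding: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_l1_denoise(b, lam, rho=1.0, iterations=200):
    """ADMM sketch (scaled dual form) for min 0.5*||x-b||^2 + lam*||y||_1 s.t. x = y."""
    n = len(b)
    x = [0.0] * n
    y = [0.0] * n
    u = [0.0] * n                  # scaled dual variable
    for _ in range(iterations):
        # x-update with y fixed: quadratic, solved in closed form
        x = [(b[i] + rho * (y[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # y-update with x fixed: proximity operator of the l1 term
        y = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # dual update on the constraint residual x - y
        u = [u[i] + x[i] - y[i] for i in range(n)]
    return y


result = admm_l1_denoise([3.0, 0.5, -2.0], lam=1.0)
```

Each pass performs only one x-step and one y-step before the dual update, rather than iterating the pair to convergence, which is the distinguishing feature described above.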
The ADMM can be viewed as an application of the Douglas-Rachford splitting algorithm, and the
Douglas-Rachford algorithm is in turn an instance of the proximal point algorithm; details can be
found in [5]. There are several modern software packages that solve basis pursuit and variants and
use the ADMM; such packages include YALL1 (2009), SpaRSA (2009) and SALSA (2009).
Stochastic Optimization
Stochastic optimization considers the problem of minimizing a loss function with access to noisy
samples of (gradient of) the function. The goal is to have an estimate of the optimal parameter
(minimizer) per new sample. ADMM was originally a batch method. However, with some modifications
it can also be used for stochastic optimization. Since in the stochastic setting we only have access to
noisy samples of the gradient, we use an inexact approximation of the Lagrangian.
The alternating direction method of multipliers (ADMM) is a popular method for online and distributed
optimization on a large scale,[7] and is employed in many applications, e.g.[8][9][10] ADMM is often
applied to solve regularized problems, where the function optimization and regularization can be
carried out locally, and then coordinated globally via constraints. Regularized optimization problems
are especially relevant in the high dimensional regime since regularization is a natural mechanism to
overcome ill-posedness and to encourage parsimony in the optimal solution, e.g., sparsity and low
rank. Due to the efficiency of ADMM in solving regularized problems, it has a good potential for
stochastic optimization in high dimensions. However, conventional stochastic ADMM methods suffer
from the curse of dimensionality: their convergence rate is proportional to the square of the dimension, and
in practice they scale poorly (see the figure comparing REASON with stochastic ADMM).
Recently, a general framework has been proposed for stochastic optimization in high dimensions
that solves this bottleneck by adding simple and cheap modifications to ADMM.[11][12] The method is
called REASON (Regularized Epoch-based Admm for Stochastic Optimization in high-dimensioN).
The modifications take the form of an added projection step, which results in logarithmic
dimension dependency. REASON can be performed on any regularized optimization with any
number of regularizers. The specific cases of the sparse optimization framework and the noisy
decomposition framework are discussed further. In both cases, REASON obtains the minimax-optimal
convergence rate. REASON provides the first online guarantees for noisy matrix decomposition.
Experimental results show that in the aforementioned cases, REASON outperforms the state of the art.
Convex optimization
Convex minimization, a subfield of optimization, studies the problem of minimizing convex
functions over convex sets. The convexity property can make optimization in some sense "easier"
than the general case - for example, any local minimum must be a global minimum.
Given a real vector space $X$ together with a convex, real-valued function $f : \mathcal{X} \to \mathbb{R}$
defined on a convex subset $\mathcal{X}$ of $X$, the problem is to find any point $x^*$ in $\mathcal{X}$ for which the
number $f(x)$ is smallest, i.e., a point $x^*$ such that
$f(x^*) \le f(x)$ for all $x \in \mathcal{X}$.
The convexity of $f$ makes the powerful tools of convex analysis applicable. In finite-dimensional
normed spaces, the Hahn–Banach theorem and the existence
of subgradients lead to a particularly satisfying theory of necessary and sufficient
conditions for optimality, a duality theory generalizing that for linear programming, and
effective computational methods.
Theory
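The following statements are true about the convex minimization problem:
if a local minimum exists, then it is a global minimum;
the set of all (global) minima is convex;
for each strictly convex function, if the function has a minimum, then the minimum is unique.
These results are used by the theory of convex minimization along with geometric notions
from functional analysis (in Hilbert spaces) such as the Hilbert projection theorem, the separating
hyperplane theorem, and Farkas' lemma.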
Standard form
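Standard form is the usual and most intuitive form of describing a convex minimization problem. It
consists of the following three parts:
A convex function $f(x) : \mathbb{R}^n \to \mathbb{R}$ to be minimized over the variable $x$
Inequality constraints of the form $g_i(x) \le 0$, where the functions $g_i$ are convex
Equality constraints of the form $h_i(x) = 0$, where the functions $h_i$ are affine
Note that every equality constraint $h(x) = 0$ can be equivalently replaced by a pair of inequality
constraints $h(x) \le 0$ and $-h(x) \le 0$. Therefore, for theoretical purposes, equality constraints
are redundant; however, it can be beneficial to treat them specially in practice.
Following from this fact, it is easy to understand why $h_i(x) = 0$ has to be affine as opposed to
merely being convex. If $h_i$ is convex, the constraint $h_i(x) \le 0$ is convex, but $-h_i(x) \le 0$ involves the concave function $-h_i$.
Therefore, the only way for $h_i(x) = 0$ to be a convex constraint is for $h_i$ to be affine.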
Examples
The following problems are all convex minimization problems, or can be transformed into convex
minimization problems via a change of variables:
Least squares
Linear programming
Convex quadratic minimization with linear constraints
Quadratic minimization with convex quadratic constraints
Conic optimization
Geometric programming
Second order cone programming
Semidefinite programming
Entropy maximization with appropriate constraints
Lagrange multipliers
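Consider a convex minimization problem given in standard form by a cost function $f(x)$ and
inequality constraints $g_i(x) \le 0$, where $i = 1, \ldots, m$. Then the domain $\mathcal{X}$ is:
$$\mathcal{X} = \{ x \in X \mid g_1(x) \le 0, \ldots, g_m(x) \le 0 \}.$$
The Lagrangian function for the problem is
$$L(x, \lambda_0, \ldots, \lambda_m) = \lambda_0 f(x) + \lambda_1 g_1(x) + \cdots + \lambda_m g_m(x).$$
For each point x in X that minimizes f over X, there exist real numbers λ0, ..., λm, called Lagrange
multipliers, that satisfy these conditions simultaneously:
1. x minimizes L(y, λ0, λ1, ..., λm) over all y in X,
2. λ0 ≥ 0, λ1 ≥ 0, ..., λm ≥ 0, with at least one λk > 0,
3. λ1 g1(x) = 0, ..., λm gm(x) = 0 (complementary slackness).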
Methods
Convex minimization problems can be solved by the following contemporary methods:[4]
Subgradient methods can be implemented simply and so are widely used.[5] Dual subgradient
methods are subgradient methods applied to a dual problem. The drift-plus-penalty method is similar
to the dual subgradient method, but takes a time average of the primal variables.
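To show how simple a subgradient method is to implement, here is a minimal sketch for a one-dimensional nonsmooth convex function. The objective |x − 1| + 2|x − 4| (minimized at x = 4 with value 3), the 1/k step sizes, and the function names are assumptions for illustration. Because the objective value need not decrease at every step, the best point seen so far is tracked.

```python
def subgradient_method(f, subgrad, x0, iterations=5000):
    """Subgradient sketch with diminishing steps 1/k, keeping the best iterate."""
    x = x0
    best_x, best_f = x0, f(x0)
    for k in range(1, iterations + 1):
        x = x - (1.0 / k) * subgrad(x)
        fx = f(x)
        if fx < best_f:            # the method is not a descent method
            best_x, best_f = x, fx
    return best_x, best_f


def objective(x):
    return abs(x - 1.0) + 2.0 * abs(x - 4.0)


def objective_subgrad(x):
    """Return one valid subgradient (sign convention: sign(0) = 0)."""
    sign = lambda v: (v > 0) - (v < 0)
    return sign(x - 1.0) + 2.0 * sign(x - 4.0)


best_x, best_f = subgradient_method(objective, objective_subgrad, x0=0.0)
```

The iterates oscillate around the kink at x = 4 with shrinking amplitude, which illustrates both the simplicity and the slow convergence typical of these methods.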
Quasiconvex minimization
Problems with convex level sets can be efficiently minimized, in theory. Yurii Nesterov proved that
quasi-convex minimization problems could be solved efficiently, and his results were extended by
Kiwiel.[7] However, such theoretically "efficient" methods use "divergent-series" stepsize rules, which
were first developed for classical subgradient methods. Classical subgradient methods using
divergent-series rules are much slower than modern methods of convex minimization, such as
subgradient projection methods, bundle methods of descent, and nonsmooth filter methods.
Convex maximization
Conventionally, the definition of the convex optimization problem (we recall) requires that the
objective function f to be minimized and the feasible set be convex. In the special case of linear
programming (LP), the objective function is both concave and convex, and so LP can also consider
the problem of maximizing an objective function without confusion. However, for most convex
minimization problems, the objective function is not concave, and therefore such
problems are formulated in the standard form of convex optimization problems, that is, minimizing
the convex objective function.
For nonlinear convex minimization, the associated maximization problem obtained by substituting
the supremum operator for the infimum operator is not a problem of convex optimization, as
conventionally defined. However, it is studied in the larger field of convex optimization as a problem
of convex maximization.[9]
The convex maximization problem is especially important for studying the existence of maxima.
Consider the restriction of a convex function to a compact convex set. Then, on that set, the function
attains its constrained maximum only on the boundary.[10] Such results, called "maximum principles",
are useful in the theory of harmonic functions, potential theory, and partial differential equations.
Extensions
Advanced treatments also consider convex functions that can attain positive infinity; the indicator
function of convex analysis is zero for every $x \in \mathcal{X}$ and positive infinity otherwise.
Optimization problem
In mathematics and computer science, an optimization problem is the problem of finding
the best solution from all feasible solutions. Optimization problems can be divided into two
categories depending on whether the variables are continuous or discrete. An optimization problem
with discrete variables is known as a combinatorial optimization problem. In a combinatorial
optimization problem, we are looking for an object such as an integer, permutation or graph from a
finite (or possibly countably infinite) set.
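Formally, a combinatorial optimization problem $A$ is a quadruple $(I, f, m, g)$, where
$I$ is a set of instances;
given an instance $x \in I$, $f(x)$ is the set of feasible solutions;
given an instance $x$ and a feasible solution $y$ of $x$, $m(x, y)$ denotes the measure of $y$, which is usually a positive real;
$g$ is the goal function, and is either $\min$ or $\max$.
The goal is then to find for some instance $x$ an optimal solution, that is, a feasible solution $y$ with
$$m(x, y) = g\{ m(x, y') \mid y' \in f(x) \}.$$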
In the field of approximation algorithms, algorithms are designed to find near-optimal solutions to
hard problems. The usual decision version is then an inadequate definition of the problem since
it only specifies acceptable solutions. Even though we could introduce suitable decision
problems, the problem is more naturally characterized as an optimization problem.[2]
NP optimization problem
An NP-optimization problem (NPO) is a combinatorial optimization problem with the following
additional conditions.[3] Note that the below referred polynomials are functions of the size of the
respective functions' inputs, not the size of some implicit set of input instances.
the size of every feasible solution $y \in f(x)$ is polynomially bounded in the size of the given
instance $x$,
the languages $\{ x \mid x \in I \}$ and $\{ (x, y) \mid y \in f(x) \}$ can be recognized in polynomial time, and
m is polynomial-time computable.
This implies that the corresponding decision problem is in NP. In computer science, interesting
optimization problems usually have the above properties and are therefore NPO problems. A
problem is additionally called a P-optimization (PO) problem, if there exists an algorithm which finds
optimal solutions in polynomial time. Often, when dealing with the class NPO, one is interested in
optimization problems for which the decision versions are NP-complete. Note that hardness relations
are always with respect to some reduction. Due to the connection between approximation algorithms
and computational optimization problems, reductions which preserve approximation in some respect
are preferred for this subject over the usual Turing and Karp reductions. An example of such a
reduction would be the L-reduction. For this reason, optimization problems with NP-complete
decision versions are not necessarily called NPO-complete.[4]
Another class of interest is NPOPB, NPO with polynomially bounded cost functions. Problems with
this condition have many desirable properties.
Proximal gradient methods for learning
Proximal gradient (forward backward splitting) methods for learning is an area of research
in optimization and statistical learning theory which studies algorithms for a general class
of convex regularization problems where the regularization penalty may not be differentiable. One
such example is $\ell_1$ regularization (also known as Lasso) of the form
$$\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n (y_i - \langle w, x_i \rangle)^2 + \lambda \|w\|_1, \quad \text{where } x_i \in \mathbb{R}^d \text{ and } y_i \in \mathbb{R}.$$
Proximal gradient methods offer a general framework for solving regularization problems from
statistical learning theory with penalties that are tailored to a specific problem application.[1][2]
Such customized penalties can help to induce certain structure in problem solutions, such
as sparsity (in the case of lasso) or group structure (in the case of group lasso).
Contents
1 Relevant background
o 1.1 Moreau decomposition
2 Lasso regularization
o 2.1 Solving for proximity operator
o 2.2 Fixed point iterative schemes
3 Practical considerations
o 3.1 Adaptive step size
o 3.2 Elastic net (mixed norm regularization)
4 Exploiting group structure
o 4.1 Group lasso
o 4.2 Other group structures
Relevant background
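Proximal gradient methods are applicable in a wide variety of scenarios for solving convex
optimization problems of the form
$$\min_{x \in \mathcal{H}} F(x) + R(x),$$
where $F$ is convex and differentiable with Lipschitz continuous gradient, $R$ is a convex, lower
semicontinuous function which is possibly nondifferentiable, and $\mathcal{H}$ is some set, typically a
Hilbert space. The proximity operator of a convex function $\varphi$ is defined as
$$\operatorname{prox}_{\varphi}(u) = \arg\min_{x \in \mathcal{H}} \varphi(x) + \tfrac{1}{2}\|u - x\|_2^2,$$
which is well-defined because of the strict convexity of the squared $\ell_2$ norm. The proximity operator can be
seen as a generalization of a projection.[1][3][4] We see that the proximity operator is important
because $x^*$ is a minimizer of $F(x) + R(x)$ if and only if
$x^* = \operatorname{prox}_{\gamma R}\left( x^* - \gamma \nabla F(x^*) \right)$, where $\gamma > 0$ is any positive real number.
In certain situations it may be easier to compute the proximity operator for the conjugate $\varphi^*$ instead
of the function $\varphi$, and therefore the Moreau decomposition
$x = \operatorname{prox}_{\varphi}(x) + \operatorname{prox}_{\varphi^*}(x)$ can be applied. This is the case
for group lasso.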
Lasso regularization
Consider the regularized empirical risk minimization problem with square loss and with the $\ell_1$
norm as the regularization penalty:
$$\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n (y_i - \langle w, x_i \rangle)^2 + \lambda \|w\|_1,$$
where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. The $\ell_1$ regularization problem is sometimes referred to
as lasso (least absolute shrinkage and selection operator).[5] Such $\ell_1$ regularization problems are
interesting because they induce sparse solutions, that is, solutions $w$ to the minimization problem
have relatively few nonzero components. Lasso can be seen to be a convex relaxation of the non-convex
problem
$$\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n (y_i - \langle w, x_i \rangle)^2 + \lambda \|w\|_0,$$
where $\|w\|_0$ denotes the $\ell_0$ "norm", which is the number of nonzero entries of the vector $w$. Sparse
solutions are of particular interest in learning theory for interpretability of results: a sparse solution
can identify a small number of important factors.[5]
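Let us compute the proximity operator for $R(w) = \|w\|_1$. First we find an alternative characterization of the
proximity operator $\operatorname{prox}_R(x)$ as follows:
$$u = \operatorname{prox}_R(x) \iff 0 \in \partial\left( R(u) + \tfrac{1}{2}\|u - x\|_2^2 \right) \iff 0 \in \partial R(u) + u - x \iff x - u \in \partial R(u).$$
For $R(w) = \|w\|_1$, the subdifferential $\partial R(w)$ is computed entrywise: its $i$th entry is $\{1\}$ if $w_i > 0$, $\{-1\}$ if $w_i < 0$, and $[-1, 1]$ if $w_i = 0$.
Using the recharacterization of the proximity operator given above, for the choice
of $R(w) = \|w\|_1$ and $\gamma > 0$ we have that $\operatorname{prox}_{\gamma R}(x)$ is defined entrywise by
$$\left( \operatorname{prox}_{\gamma R}(x) \right)_i = \begin{cases} x_i - \gamma, & x_i > \gamma, \\ 0, & |x_i| \le \gamma, \\ x_i + \gamma, & x_i < -\gamma, \end{cases}$$
which is known as the soft thresholding operator $S_\gamma(x) = \operatorname{prox}_{\gamma \|\cdot\|_1}(x)$.[1][6]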
Given that we have computed the form of the proximity operator explicitly, we can define a
standard fixed point iteration procedure. Namely, fix some initial $w^0 \in \mathbb{R}^d$, and for $k = 0, 1, 2, \ldots$
define
$$w^{k+1} = S_\gamma\left( w^k - \gamma \nabla F(w^k) \right),$$
where $S_\gamma$ is the soft thresholding operator with threshold $\gamma$.
Note here the effective trade-off between the empirical error term $F(w)$ and the regularization
penalty $R(w)$. This fixed point method has decoupled the effect of the two different convex
functions which comprise the objective function into a gradient descent step ($w^k - \gamma \nabla F(w^k)$)
and a soft thresholding step (via $S_\gamma$).
Convergence of this fixed point scheme is well-studied in the literature[1][6] and is guaranteed under
appropriate choice of step size $\gamma$ and loss function (such as the square loss taken
here). Accelerated methods were introduced by Nesterov in 1983 which improve the rate of
convergence under certain regularity assumptions on $F$.[7] Such methods have been studied
extensively in previous years.[8] For more general learning problems where the proximity operator
cannot be computed explicitly for some regularization term $R$, such fixed point schemes can still be
carried out using approximations to both the gradient and the proximity operator.[4][9]
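The fixed point scheme for lasso (a gradient step on the square loss followed by soft thresholding) can be sketched as follows. This is an illustrative version for a general regularization weight, so the threshold used is the step size times `lam`. The function names, the step size rule (reciprocal of a Frobenius-norm bound on the Lipschitz constant of the gradient), and the test data are assumptions. The design matrix in the usage lines is the identity so that the exact minimizer, soft thresholding of y at n·lam/2, is known.

```python
def soft_threshold(v, t):
    """Scalar soft thresholding: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(X, y, lam, iterations=500):
    """Proximal gradient sketch for (1/n) * sum (y_i - <w, x_i>)^2 + lam * ||w||_1."""
    n, d = len(X), len(X[0])
    # Assumed step size: 1 / L_bound with L_bound = (2/n) * ||X||_F^2,
    # an upper bound on the Lipschitz constant of the gradient of the loss.
    frob_sq = sum(v * v for row in X for v in row)
    gamma = n / (2.0 * frob_sq)
    w = [0.0] * d
    for _ in range(iterations):
        residual = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [(2.0 / n) * sum(X[i][j] * residual[i] for i in range(n))
                for j in range(d)]
        # Gradient step followed by soft thresholding with threshold gamma * lam
        w = [soft_threshold(w[j] - gamma * grad[j], gamma * lam) for j in range(d)]
    return w


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w_hat = ista(identity, [3.0, -0.5, 1.0], lam=0.5)
```

The middle coefficient is driven exactly to zero, which is the sparsity-inducing behaviour that motivates the penalty.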
Practical considerations
There have been numerous developments within the past decade in convex optimization techniques
which have influenced the application of proximal gradient methods in statistical learning theory.
Here we survey a few important topics which can greatly improve practical algorithmic performance
of these methods.[2][10]
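Elastic net regularization offers an alternative to pure $\ell_1$ regularization. The $\ell_1$ penalty $\|w\|_1$ is not strictly convex, so
solutions to $\min_w F(w) + \lambda \|w\|_1$, where $F$ is some empirical loss function, need not be unique.
This is often avoided by the inclusion of an additional strictly convex term, such as an $\ell_2$ norm
regularization penalty. For example, one can consider the problem
$$\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n (y_i - \langle w, x_i \rangle)^2 + \lambda \left( (1 - \mu)\|w\|_1 + \mu \|w\|_2^2 \right),$$
where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. For $0 < \mu \le 1$ the penalty
term $\lambda \left( (1 - \mu)\|w\|_1 + \mu \|w\|_2^2 \right)$ is now strictly convex, and hence the minimization problem
now admits a unique solution. It has been observed that for sufficiently small $\mu > 0$, the additional
penalty term $\mu \|w\|_2^2$ acts as a preconditioner and can substantially improve convergence while not
adversely affecting the sparsity of solutions.[2][14]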
Group lasso
Group lasso is a generalization of the lasso method when features are grouped into disjoint blocks.[15]
Suppose the features are grouped into blocks $\{w_1, \ldots, w_G\}$. Here we take as a regularization
penalty
$$R(w) = \sum_{g=1}^{G} \|w_g\|_2,$$
which is the sum of the $\ell_2$ norm on corresponding feature vectors for the different groups. A similar
proximity operator analysis as above can be used to compute the proximity operator for this penalty.
Where the lasso penalty has a proximity operator which is soft thresholding on each individual
component, the proximity operator for the group lasso is soft thresholding on each group. For the
group $w_g$, the proximity operator of $\lambda \gamma \sum_{g=1}^{G} \|w_g\|_2$ is given by
$$\widetilde{S}_{\lambda\gamma}(w_g) = \begin{cases} w_g - \lambda\gamma \dfrac{w_g}{\|w_g\|_2}, & \|w_g\|_2 > \lambda\gamma, \\ 0, & \|w_g\|_2 \le \lambda\gamma. \end{cases}$$
In contrast to lasso, the derivation of the proximity operator for group lasso relies on the Moreau
decomposition. Here the proximity operator of the conjugate of the group lasso penalty becomes a
projection onto the ball of a dual norm.
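Soft thresholding on each group can be sketched as below: every block is either shrunk towards zero along its own direction or set to zero entirely, depending on its Euclidean norm. The function name `group_soft_threshold`, the representation of groups as lists of indices, and the single threshold parameter `t` (standing in for the product of step size and regularization weight) are assumptions for this example.

```python
import math


def group_soft_threshold(w, groups, t):
    """Blockwise soft thresholding: prox of t * sum_g ||w_g||_2 for disjoint groups."""
    out = list(w)
    for indices in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in indices))
        # Shrink the whole block by t along its direction, or zero it out
        scale = 1.0 - t / norm if norm > t else 0.0
        for i in indices:
            out[i] = scale * w[i]
    return out


shrunk = group_soft_threshold([3.0, 4.0, 0.3, 0.4], groups=[[0, 1], [2, 3]], t=1.0)
```

In the usage line the first block has norm 5 and is scaled by 0.8, while the second block has norm 0.5, below the threshold, and is removed as a whole.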
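Given a function $f(x)$ of $N$ variables to minimize, its gradient $\nabla_x f$ indicates the direction
of maximum increase. One simply starts in the opposite (steepest descent) direction:
$$\Delta x_0 = -\nabla_x f(x_0)$$
with an adjustable step length $\alpha$ and performs a line search in this direction until it
reaches the minimum of $f$:
$$\alpha_0 := \arg\min_\alpha f(x_0 + \alpha \Delta x_0), \qquad x_1 = x_0 + \alpha_0 \Delta x_0.$$
After this first iteration in the steepest direction $\Delta x_0$, the following steps constitute
one iteration of moving along a subsequent conjugate direction $s_n$,
where $s_0 = \Delta x_0$:
1. Calculate the steepest direction: $\Delta x_n = -\nabla_x f(x_n)$,
2. Compute $\beta_n$ according to one of the formulas below,
3. Update the conjugate direction: $s_n = \Delta x_n + \beta_n s_{n-1}$,
4. Perform a line search: optimize $\alpha_n = \arg\min_\alpha f(x_n + \alpha s_n)$,
5. Update the position: $x_{n+1} = x_n + \alpha_n s_n$.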
With a pure quadratic function the minimum is reached within N iterations (excepting
roundoff error), but a non-quadratic function will make slower progress. Subsequent
search directions lose conjugacy requiring the search direction to be reset to the
steepest descent direction at least every N iterations, or sooner if progress stops.
However, resetting every iteration turns the method into steepest descent. The
algorithm stops when it finds the minimum, determined when no progress is made
after a direction reset (i.e. in the steepest descent direction), or when some
tolerance criterion is reached.
Within a linear approximation, the parameters $\alpha$ and $\beta$ are the same as in the
linear conjugate gradient method but have been obtained with line searches. The
conjugate gradient method can follow narrow (ill-conditioned) valleys where
the steepest descent method slows down and follows a criss-cross pattern.
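Four of the best known formulas for $\beta_n$ are named after their developers:
Fletcher–Reeves:
$$\beta_n^{FR} = \frac{\Delta x_n^\top \Delta x_n}{\Delta x_{n-1}^\top \Delta x_{n-1}}$$
Polak–Ribière:
$$\beta_n^{PR} = \frac{\Delta x_n^\top (\Delta x_n - \Delta x_{n-1})}{\Delta x_{n-1}^\top \Delta x_{n-1}}$$
Hestenes–Stiefel:
$$\beta_n^{HS} = -\frac{\Delta x_n^\top (\Delta x_n - \Delta x_{n-1})}{s_{n-1}^\top (\Delta x_n - \Delta x_{n-1})}$$
Dai–Yuan:
$$\beta_n^{DY} = -\frac{\Delta x_n^\top \Delta x_n}{s_{n-1}^\top (\Delta x_n - \Delta x_{n-1})}$$
These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred
formula is a matter of heuristics or taste. A popular choice is $\beta = \max\{0, \beta^{PR}\}$, which provides
a direction reset automatically.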
The vehicle routing problem (VRP) has many obvious applications in industry. In fact, the use of computer optimisation
programs can give savings of 5% to a company,[3] as transportation is usually a significant
component of the cost of a product (10%);[4] indeed, the transportation sector makes up 10% of
the EU's GDP. Consequently, any savings created by the VRP, even less than 5%, are significant.[3]
The road network can be described using a graph where the arcs are roads and vertices are
junctions between them. The arcs may be directed or undirected due to the possible presence of
one way streets or different costs in each direction. Each arc has an associated cost which is
generally its length or travel time which may be dependent on vehicle type.[2]
To know the global cost of each route, the travel cost and the travel time between each customer
and the depot must be known. To do this our original graph is transformed into one where the
vertices are the customers and depot and the arcs are the roads between them. The cost on each
arc is the lowest cost between the two points on the original road network. This is easy to do
as shortest path problems are relatively easy to solve. This transforms the sparse original graph into
a complete graph. For each pair of vertices i and j, there exists an arc (i,j) of the complete graph
whose cost is written as $c_{ij}$ and is defined to be the cost of the shortest path from i to j. The travel
time $t_{ij}$ is the sum of the travel times of the arcs on the shortest path from i to j on the original road
graph.
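The transformation from a sparse road graph to a complete cost matrix can be sketched with the Floyd–Warshall all-pairs shortest path algorithm. The function name `complete_cost_matrix`, the arc-list input format, and the small example network are assumptions for illustration; any shortest path method would serve.

```python
def complete_cost_matrix(n, arcs):
    """All-pairs shortest path costs (Floyd-Warshall) on a directed road graph.

    arcs is a list of (i, j, cost) tuples; list both directions for two-way roads.
    """
    inf = float("inf")
    c = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i, j, cost in arcs:
        c[i][j] = min(c[i][j], cost)
    for k in range(n):                      # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if c[i][k] + c[k][j] < c[i][j]:
                    c[i][j] = c[i][k] + c[k][j]
    return c


# Vertex 0 plays the role of the depot; all roads here are two-way.
roads = [(0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 1),
         (0, 2, 7), (2, 0, 7), (2, 3, 2), (3, 2, 2)]
costs = complete_cost_matrix(4, roads)
```

In this example the direct road from 0 to 2 costs 7, but the route through vertex 1 costs 5, so the complete graph records 5 for that pair.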
Sometimes it is impossible to satisfy all of a customer's demands and in such cases solvers may
reduce some customers' demands or leave some customers unserved. To deal with these situations,
a priority variable can be introduced for each customer, or penalties can be associated with partial or
missing service for each customer.[2]
The objective function of a VRP can be very different depending on the particular application of the
result but a few of the more common objectives are:[2]
Minimise the global transportation cost based on the global distance travelled as well as the
fixed costs associated with the used vehicles and drivers
Minimise the number of vehicles needed to serve all customers
Minimise the variation in travel time and vehicle load
Minimise penalties for low quality service
VRP flavours
A map showing the relationship between common VRP subproblems.
Vehicle Routing Problem with Pickup and Delivery (VRPPD): A number of goods need to be
moved from certain pickup locations to other delivery locations. The goal is to find optimal routes
for a fleet of vehicles to visit the pickup and drop-off locations.
Vehicle Routing Problem with LIFO: Similar to the VRPPD, except an additional restriction is
placed on the loading of the vehicles: at any delivery location, the item being delivered must be
the item most recently picked up. This scheme reduces the loading and unloading times at
delivery locations because there is no need to temporarily unload items other than the ones that
should be dropped off.
Vehicle Routing Problem with Time Windows (VRPTW): The delivery locations have time
windows within which the deliveries (or visits) must be made.
Capacitated Vehicle Routing Problem: CVRP or CVRPTW. The vehicles have limited carrying
capacity of the goods that must be delivered.
Vehicle Routing Problem with Multiple Trips (VRPMT): The vehicles can do more than one
route.
Open Vehicle Routing Problem (OVRP): Vehicles are not required to return to the depot.
Several software vendors have built software products to solve the various VRP problems.
Numerous articles are available for more detail on their research and results.
Although VRP is related to the Job Shop Scheduling Problem, the two problems are typically solved
using different techniques.[5]
1. Vehicle flow formulations - this uses integer variables associated with each arc that count
the number of times that the edge is traversed by a vehicle. It is generally used for basic
VRPs. This is good for cases where the solution cost can be expressed as the sum of any
costs associated with the arcs. However it can't be used to handle many practical
applications [2].
2. Commodity flow formulations - additional integer variables are associated with the arcs or
edges which represent the flow of commodities along the paths travelled by the vehicles.
This has only recently been used to find an exact solution [2].
3. Set partitioning problem - These have an exponential number of binary variables which are
each associated with a different feasible circuit. The VRP is then instead formulated as a set
partitioning problem which asks what is the collection of circuits with minimum cost that
satisfy the VRP constraints. This allows for very general route costs [2].
Vehicle flow formulations
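The formulation of the TSP by Dantzig, Fulkerson and Johnson was extended to create the two
index vehicle flow formulations for the VRP
$$\min \sum_{i \in V} \sum_{j \in V} c_{ij} x_{ij}$$
subject to
$$\sum_{i \in V} x_{ij} = 1 \quad \forall j \in V \setminus \{0\} \qquad (1)$$
$$\sum_{j \in V} x_{ij} = 1 \quad \forall i \in V \setminus \{0\} \qquad (2)$$
$$\sum_{i \in V} x_{i0} = K \qquad (3)$$
$$\sum_{j \in V} x_{0j} = K \qquad (4)$$
$$\sum_{i \notin S} \sum_{j \in S} x_{ij} \ge r(S) \quad \forall S \subseteq V \setminus \{0\},\ S \ne \emptyset \qquad (5)$$
$$x_{ij} \in \{0, 1\} \quad \forall i, j \in V \qquad (6)$$
Here $c_{ij}$ is the cost of going from vertex i to vertex j, $x_{ij}$ is a binary variable equal to 1 if the arc
from i to j is part of the solution and 0 otherwise, K is the number of available vehicles, r(S) is the
minimum number of vehicles needed to serve the customer set S, and vertex 0 is the depot.
Constraints 1 and 2 say that exactly one arc enters and exactly one leaves each vertex associated
with a customer, respectively. Constraints 3 and 4 say that the number of vehicles leaving the depot
is the same as the number entering. Constraints 5 are the capacity cut constraints, which
impose that the routes must be connected and that the demand on each route must not exceed the
vehicle capacity [2].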
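An alternative formulation may be obtained by transforming the capacity cut constraints (CCCs) into
generalised subtour elimination constraints (GSECs),
$$\sum_{i \in S} \sum_{j \in S} x_{ij} \le |S| - r(S),$$
which impose that at least r(S) arcs leave each customer set S, where $x_{ij}$ indicates whether arc (i,j)
is used and r(S) is the minimum number of vehicles needed to serve S.
GSECs and CCCs have an exponential number of constraints so it is practically impossible to solve
the linear relaxation. A possible way to solve this is to consider a limited subset of these constraints
and add the rest if needed.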
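A different method again is to use a family of constraints of polynomial cardinality,
known as the MTZ constraints; they were first proposed for the TSP [6] and subsequently
extended by Christofides, Mingozzi and Toth [7]:
$$u_j - u_i \ge d_j - C(1 - x_{ij}) \quad \forall i, j \in V \setminus \{0\},\ i \ne j,\ \text{such that } d_i + d_j \le C,$$
$$d_i \le u_i \le C \quad \forall i \in V \setminus \{0\},$$
where $u_i$, $i \in V \setminus \{0\}$, is an additional continuous variable which represents the load of the
vehicle after visiting customer i, $d_i$ is the demand of customer i, and C is the vehicle capacity. These impose both the
connectivity and the capacity requirements. When $x_{ij} = 0$ the constraint is not binding
since $u_i \le C$ and $u_j \ge d_j$, whereas when $x_{ij} = 1$ they impose that $u_j \ge u_i + d_j$.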
These have been used extensively to model the basic VRP (CVRP) and the VRPB. However, their
power is limited to these simple problems. They can only be used when the cost of the solution can
be expressed as the sum of the arc costs. We also cannot know which vehicle traverses
each arc. Hence we cannot use this for more complex models where the cost and/or feasibility is
dependent on the order of the customers or the vehicles used [2].
In the theory of computational complexity, the decision version of the TSP (where, given a length L,
the task is to decide whether the graph has any tour shorter than L) belongs to the class of NP-
complete problems. Thus, it is possible that the worst-case running time for any algorithm for the
TSP increases superpolynomially (perhaps, specifically, exponentially) with the number of cities.
The problem was first formulated in 1930 and is one of the most intensively studied problems in
optimization. It is used as a benchmark for many optimization methods. Even though the problem is
computationally difficult, a large number of heuristics and exact methods are known, so that some
instances with tens of thousands of cities can be solved completely and even problems with millions
of cities can be approximated within a small fraction of 1%.[1]
The TSP has several applications even in its purest formulation, such as planning, logistics, and the
manufacture of microchips. Slightly modified, it appears as a sub-problem in many areas, such
as DNA sequencing. In these applications, the concept city represents, for example, customers,
soldering points, or DNA fragments, and the concept distance represents travelling times or cost, or
a similarity measure between DNA fragments. The TSP also appears in astronomy, as astronomers
observing many sources will want to minimise the time spent slewing the telescope between the
sources. In many applications, additional constraints such as limited resources or time windows may
be imposed.
Contents
1 History
2 Description
o 2.1 As a graph problem
o 2.2 Asymmetric and symmetric
o 2.3 Related problems
3 Integer linear programming formulation
4 Computing a solution
o 4.1 Exact algorithms
o 4.2 Heuristic and approximation algorithms
4.2.1 Constructive heuristics
4.2.2 Christofides' algorithm for the TSP
4.2.3 Iterative improvement
4.2.4 Randomised improvement
4.2.4.1 Ant colony optimization
5 Special cases of the TSP
o 5.1 Metric TSP
o 5.2 Euclidean TSP
o 5.3 Asymmetric TSP
5.3.1 Solving by conversion to symmetric TSP
o 5.4 Analyst's travelling salesman problem
o 5.5 TSP path length for random sets of points in a square
5.5.1 Upper bound
5.5.2 Lower bound
6 Computational complexity
o 6.1 Complexity of approximation
7 Human performance on TSP
8 Benchmarks
9 Popular culture
10 See also
11 Notes
12 References
13 Further reading
Description
As a graph problem
TSP can be modelled as an undirected weighted graph, such that cities are the graph's vertices,
paths are the graph's edges, and a path's distance is the edge's length. It is a minimization problem
starting and finishing at a specified vertex after having visited each other vertex exactly once. Often,
the model is a complete graph (i.e. each pair of vertices is connected by an edge). If no path exists
between two cities, adding an arbitrarily long edge will complete the graph without affecting the
optimal tour.
Related problems
The requirement of returning to the starting city does not change the computational
complexity of the problem; see the Hamiltonian path problem.
The generalized travelling salesman problem, also known as the "travelling politician problem",
deals with "states" that have (one or more) "cities" and the salesman has to visit exactly one
"city" from each "state". One application is encountered in ordering a solution to the cutting stock
problem in order to minimise knife changes. Another is concerned with drilling
in semiconductor manufacturing, see e.g., U.S. Patent 7,054,798. Surprisingly, Behzad and
Modarres demonstrated that the generalised travelling salesman problem can be transformed
into a standard travelling salesman problem with the same number of cities, but a
modified distance matrix.
The sequential ordering problem deals with the problem of visiting a set of cities where
precedence relations between the cities exist.
The travelling purchaser problem deals with a purchaser who is charged with purchasing a set of
products. He can purchase these products in several cities, but at different prices and not all
cities offer the same products. The objective is to find a route between a subset of the cities,
which minimizes total cost (travel cost + purchasing cost) and which enables the purchase of all
required products.
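Integer linear programming formulation
TSP can be formulated as an integer linear program. Label the cities with the numbers 0, ..., n and
let $x_{ij} = 1$ if the tour goes directly from city i to city j, and $x_{ij} = 0$ otherwise.
For i = 0, ..., n, let $u_i$ be an artificial variable, and finally take $c_{ij}$ to be the distance from city i to
city j. Then TSP can be written as the following integer linear programming problem:
$$
\begin{aligned}
\min \quad & \sum_{i=0}^{n} \sum_{j=0,\, j \neq i}^{n} c_{ij} x_{ij} \\
\text{subject to} \quad & x_{ij} \in \{0, 1\}, && i, j = 0, \ldots, n, \\
& u_i \in \mathbb{Z}, && i = 0, \ldots, n, \\
& \sum_{i=0,\, i \neq j}^{n} x_{ij} = 1, && j = 0, \ldots, n, \\
& \sum_{j=0,\, j \neq i}^{n} x_{ij} = 1, && i = 0, \ldots, n, \\
& u_i - u_j + n x_{ij} \le n - 1, && 1 \le i \neq j \le n.
\end{aligned}
$$
The first set of equalities requires that each city be arrived at from exactly one other city,
and the second set of equalities requires that from each city there is a departure to exactly
one other city. The last constraints enforce that there is only a single tour covering all cities,
and not two or more disjoint tours that only collectively cover all cities. To prove this, it is
shown below (1) that every feasible solution contains only one closed sequence of cities,
and (2) that for every single tour covering all cities, there are values for the dummy
variables $u_i$ that satisfy the constraints.
To prove that every feasible solution contains only one closed sequence of cities, it suffices
to show that every subtour in a feasible solution passes through city 0 (noting that the
equalities ensure there can only be one such tour). For if we sum all the inequalities
corresponding to $x_{ij} = 1$ for any subtour of k steps not passing through city 0, the $u_i$ terms cancel and we obtain
$$nk \le (n - 1)k,$$
which is a contradiction.
It now must be shown that for every single tour covering all cities, there are values for
the dummy variables that satisfy the constraints.
Without loss of generality, define the tour as originating (and ending) at city 0.
Choose $u_i = t$ if city i is visited in step t (i, t = 1, 2, ..., n). Then
$$u_i - u_j \le n - 1,$$
since $u_i$ can be no greater than n and $u_j$ can be no less than 1; hence the constraints are satisfied
whenever $x_{ij} = 0$. For $x_{ij} = 1$, city j is visited one step after city i, so
$$u_i - u_j + n x_{ij} = t - (t + 1) + n = n - 1,$$
satisfying the constraint.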
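The role of the dummy variables can be checked on small instances. The following is a minimal sketch, assuming the subtour constraints have the usual Miller-Tucker-Zemlin form $u_i - u_j + n x_{ij} \le n - 1$ for $1 \le i \ne j \le n$, with $x_{ij} = 1$ when city j directly follows city i. The names `mtz_feasible`, `order_labels`, `tour` and `split` are illustrative only.

```python
from itertools import product


def mtz_feasible(succ, u):
    """Check u_i - u_j + n*x_ij <= n - 1 for all 1 <= i != j <= n.

    succ maps each city 0..n to the city visited next (so x_ij = 1 exactly
    when succ[i] == j); u maps the cities 1..n to integer dummy values.
    """
    n = len(succ) - 1  # cities are labelled 0..n
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i == j:
                continue
            x_ij = 1 if succ[i] == j else 0
            if u[i] - u[j] + n * x_ij > n - 1:
                return False
    return True


def order_labels(succ):
    """Set u_i = t when city i is visited at step t of the tour from city 0."""
    u, city, t = {}, succ[0], 1
    while city != 0:
        u[city] = t
        city, t = succ[city], t + 1
    return u


# A single tour 0 -> 2 -> 1 -> 3 -> 0 on cities 0..3 (n = 3).
tour = {0: 2, 2: 1, 1: 3, 3: 0}
u = order_labels(tour)
single_tour_ok = mtz_feasible(tour, u)

# Two disjoint subtours on cities 0..4: 0 -> 1 -> 0 and 2 -> 3 -> 4 -> 2.
split = {0: 1, 1: 0, 2: 3, 3: 4, 4: 2}
split_ok = any(
    mtz_feasible(split, dict(zip(range(1, 5), vals)))
    for vals in product(range(1, 5), repeat=4)
)
```

The search over labels for `split` only tries values 1..4, which is enough for illustration; the summation argument shows that no integer labels can work for a subtour that avoids city 0.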
Computing a solution
The traditional lines of attack for NP-hard problems are the following:
Devising algorithms for finding exact solutions (they will work reasonably
fast only for small problem sizes).
Devising "suboptimal" or heuristic algorithms, i.e., algorithms that deliver
either seemingly or probably good solutions, but which could not be proved
to be optimal.
Finding special cases for the problem ("subproblems") for which either
better or exact heuristics are possible.
Exact algorithms
The most direct solution would be to try all permutations (ordered combinations)
and see which one is cheapest (using brute force search). The running time for
this approach lies within a polynomial factor of $O(n!)$, the factorial of the
number of cities, so this solution becomes impractical even for only 20 cities.
Solution to a symmetric TSP with 7 cities using brute force search. Note: Number of
permutations: (7-1)!/2 = 360
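Brute-force search is short enough to write out in full. The following is a minimal sketch on a hypothetical 4-city distance matrix; `tour_length` and `brute_force_tsp` are illustrative names. Fixing city 0 as the start leaves (n-1)! orderings, and in the symmetric case each tour appears twice (once per direction), which gives the (7-1)!/2 = 360 distinct tours quoted for 7 cities.

```python
from itertools import permutations


def tour_length(tour, dist):
    """Length of the closed tour that returns to its first city."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def brute_force_tsp(dist):
    """Try every ordering of cities 1..n-1 after fixing city 0 as the start."""
    n = len(dist)
    candidates = ((0,) + p for p in permutations(range(1, n)))
    best = min(candidates, key=lambda t: tour_length(t, dist))
    return list(best), tour_length(best, dist)


# Hypothetical symmetric instance with four cities.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
tour, length = brute_force_tsp(dist)
```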
Improving these time bounds seems to be difficult. For example, it has not been
determined whether an exact algorithm for TSP that runs in time $O(1.9999^n)$
exists.[14]
Solution of a TSP with 7 cities using a simple branch and bound algorithm. Note: the
number of permutations examined is much smaller than with brute force search.
An exact solution for 15,112 German towns from TSPLIB was found in 2001
using the cutting-plane method proposed by George Dantzig, Ray Fulkerson,
and Selmer M. Johnson in 1954, based on linear programming. The
computations were performed on a network of 110 processors located at Rice
University and Princeton University (see the Princeton external link). The total
computation time was equivalent to 22.6 years on a single 500 MHz Alpha
processor. In May 2004, the travelling salesman problem of visiting all 24,978
towns in Sweden was solved: a tour of length approximately 72,500 kilometers
was found and it was proven that no shorter tour exists.[16] In March 2005, the
travelling salesman problem of visiting all 33,810 points in a circuit board was
solved using Concorde TSP Solver: a tour of length 66,048,945 units was found
and it was proven that no shorter tour exists. The computation took
approximately 15.7 CPU-years (Cook et al. 2006). In April 2006 an instance
with 85,900 points was solved using Concorde TSP Solver, taking over 136
CPU-years; see Applegate et al. (2006).
Constructive heuristics
Nearest Neighbor algorithm for a TSP with 7 cities. The solution changes as the starting
point is changed
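The dependence on the starting point is easy to reproduce. The following is a minimal sketch of a nearest-neighbour construction on hypothetical planar points with Euclidean distances; `nearest_neighbour_tour` is an illustrative name, and ties are broken by the lowest city index.

```python
import math


def nearest_neighbour_tour(points, start=0):
    """Repeatedly move to the closest city that has not been visited yet."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        here = points[tour[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = min(sorted(unvisited), key=lambda j: math.dist(here, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


points = [(0, 0), (1, 0), (3, 0), (0, 5)]
from_city_0 = nearest_neighbour_tour(points, start=0)
from_city_2 = nearest_neighbour_tour(points, start=2)
```

Starting from city 0 or from city 2 yields different visiting orders on the same four points.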
Another constructive heuristic, Match Twice and Stitch (MTS) (Kahng, Reda
2004 [19]), performs two sequential matchings, where the second matching is
executed after deleting all the edges of the first matching, to yield a set of
cycles. The cycles are then stitched to produce the final tour.
Christofides' algorithm for the TSP
This algorithm uses a result from graph theory to improve on the upper bound for the TSP that
comes from doubling the cost of the minimum spanning tree. Given an Eulerian graph, we can find
an Eulerian tour in O(n) time.[5] So if we had an Eulerian graph with the cities from a
TSP as vertices, we could use such a method for finding an Eulerian tour to find a TSP solution:
visit the cities in the order of the Eulerian tour, skipping any city already visited. By the
triangle inequality, the resulting TSP tour can be no longer than the Eulerian tour, so the length
of the Eulerian tour is an upper bound for the TSP. Such a method is described below.
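The simpler construction that this paragraph starts from, doubling the minimum spanning tree and shortcutting its Eulerian walk, can be sketched as follows. This is not Christofides' algorithm itself, which additionally needs a matching on the odd-degree vertices. The sketch assumes Euclidean points; `mst_parent`, `double_tree_tour` and `closed_length` are illustrative names.

```python
import math


def mst_parent(points):
    """Prim's algorithm on the complete Euclidean graph, rooted at city 0."""
    n = len(points)
    parent = [None] * n
    best = [math.inf] * n
    best[0] = 0.0
    in_tree = [False] * n
    for _ in range(n):
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        for w in range(n):
            d = math.dist(points[v], points[w])
            if not in_tree[w] and d < best[w]:
                best[w], parent[w] = d, v
    return parent


def double_tree_tour(points):
    """Walk the doubled spanning tree and skip cities already visited.

    A depth-first preorder of the tree lists the cities in the order in
    which the shortcut Eulerian walk first reaches them.
    """
    parent = mst_parent(points)
    children = {v: [] for v in range(len(points))}
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour


def closed_length(tour, points):
    n = len(tour)
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % n]])
               for k in range(n))


points = [(0, 0), (2, 0), (2, 1), (0, 1), (1, 3)]
parent = mst_parent(points)
tree_weight = sum(math.dist(points[v], points[p])
                  for v, p in enumerate(parent) if p is not None)
tour = double_tree_tour(points)
```

By the triangle inequality the closed tour is never longer than twice the weight of the spanning tree, which is the bound that the matching step of Christofides' algorithm improves on.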
Using a shortcut heuristic on the graph created by the matching below
To make a graph into an Eulerian graph, one starts with the minimum spanning
tree. Then all the vertices of odd degree must be made even. So a matching for
the odd-degree vertices must be added, which increases the degree of every
odd-degree vertex by one.[5] This leaves us with a graph where every vertex is of
even degree, which is thus Eulerian. Adapting the above method gives
Christofides' algorithm: build the minimum spanning tree, add a minimum-weight matching on its
odd-degree vertices, find an Eulerian tour of the resulting graph, and shortcut repeated cities.
Pairwise exchange
The pairwise exchange or 2-opt technique involves iteratively removing two edges and replacing
these with two different edges that reconnect the fragments created by edge removal into a new and
shorter tour. This is a special case of the k-opt method. Note that the label Lin–Kernighan is an
often-heard misnomer for 2-opt. Lin–Kernighan is actually the more general k-opt method.
For Euclidean instances, 2-opt heuristics give on average solutions that are about 5% better than
Christofides' algorithm. If we start with an initial solution made with a greedy algorithm, the average
number of moves decreases greatly and is O(n). For random starts, however, the average
number of moves is O(n log(n)). Although this is a small increase in order, the
number of moves for small problems is 10 times as large for a random start as for one made
from a greedy heuristic. This is because such 2-opt heuristics exploit "bad" parts of a solution such
as crossings. These types of heuristics are often used within vehicle routing problem heuristics to
reoptimise route solutions.[20]
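The 2-opt move itself is small enough to show directly. The following is a minimal sketch, assuming planar points with Euclidean distances and a tour given as a list of city indices; `tour_length` and `two_opt` are illustrative names. Removing edges (a, b) and (c, d) and reconnecting with (a, c) and (b, d) amounts to reversing the tour segment between b and c.

```python
import math


def tour_length(tour, pts):
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def two_opt(tour, pts):
    """Apply improving 2-opt moves until none is left (a local optimum)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share city tour[0]
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                # Change in length when (a,b),(c,d) become (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


# Unit square visited in an order that makes the tour cross itself.
pts = [(0, 0), (1, 1), (1, 0), (0, 1)]
start = [0, 1, 2, 3]
result = two_opt(start, pts)
```

On this example the single crossing is removed by one move, and the tour shrinks from 2 + 2√2 to the perimeter of the square.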
Optimized Markov chain algorithms which use local searching heuristic sub-algorithms can find a
route extremely close to the optimal route for 700 to 800 cities.
TSP is a touchstone for many general heuristics devised for combinatorial optimization such
as genetic algorithms, simulated annealing, Tabu search, ant colony optimization, river formation
dynamics (see swarm intelligence) and the cross entropy method.
The ant colony system (ACS) sends out a large number of virtual ant agents to explore many possible routes on the map.
Each ant probabilistically chooses the next city to visit based on a heuristic combining the distance
to the city and the amount of virtual pheromone deposited on the edge to the city. The ants explore,
depositing pheromone on each edge that they cross, until they have all completed a tour. At this
point the ant which completed the shortest tour deposits virtual pheromone along its complete tour
route (global trail updating). The amount of pheromone deposited is inversely proportional to the tour
length: the shorter the tour, the more it deposits.
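The loop described above can be sketched in a few dozen lines. This is a simplified illustration rather than the published ACS: it keeps the probabilistic choice of the next city and the global trail update by the best ant, but leaves out the per-step local deposit. It assumes distinct planar points; the parameters `alpha`, `beta`, `rho`, `n_ants` and `n_iter` and their default values are assumptions chosen for illustration.

```python
import math
import random


def ant_colony_tsp(pts, n_ants=20, n_iter=50, alpha=1.0, beta=2.0, rho=0.1, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    n = len(pts)
    dist = [[math.dist(p, q) for q in pts] for p in pts]
    tau = [[1.0] * n for _ in range(n)]  # virtual pheromone on each edge

    def length(tour):
        return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

    best = list(range(n))  # baseline tour: cities in input order
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            todo = set(range(n)) - {tour[0]}
            while todo:
                i = tour[-1]
                cand = sorted(todo)
                # Prefer short edges that carry a lot of pheromone.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                todo.remove(j)
            tours.append(tour)
        it_best = min(tours, key=length)
        if length(it_best) < length(best):
            best = it_best
        # Evaporate everywhere, then let the best ant deposit along its tour;
        # the deposit is inversely proportional to the tour length.
        for row in tau:
            for j in range(n):
                row[j] *= 1.0 - rho
        deposit = 1.0 / length(it_best)
        for k in range(n):
            a, b = it_best[k], it_best[(k + 1) % n]
            tau[a][b] += deposit
            tau[b][a] += deposit
    return best, length(best)


pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
best_tour, best_length = ant_colony_tsp(pts)
```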
Ant Colony Optimization Algorithm for a TSP with 7 cities: Red and thick lines in the pheromone map indicate
presence of more pheromone
Special cases of the TSP
Metric TSP
In the metric TSP, also known as delta-TSP or Δ-TSP, the intercity distances satisfy the triangle
inequality.
A very natural restriction of the TSP is to require that the distances between cities form a metric, that is, that they
satisfy the triangle inequality: the direct connection from A to B is never farther than the route
via intermediate C:
$$d_{AB} \le d_{AC} + d_{CB}$$
The edge spans then build a metric on the set of vertices. When the cities are viewed as points in
the plane, many natural distance functions are metrics, and so many natural instances of TSP satisfy
this constraint.
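The following are some examples of metric TSPs for various metrics.
In the Euclidean TSP (see below), the distance between two cities is the Euclidean distance between the corresponding points.
In the rectilinear TSP, the distance between two cities is the sum of the absolute differences of their x- and y-coordinates. This metric is often called the Manhattan distance or city-block metric.
In the maximum metric, the distance between two points is the maximum of the absolute values of the differences of their x- and y-coordinates.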
The last two metrics appear for example in routing a machine that drills a given set of holes in
a printed circuit board. The Manhattan metric corresponds to a machine that adjusts first one co-
ordinate, and then the other, so the time to move to a new point is the sum of both movements. The
maximum metric corresponds to a machine that adjusts both co-ordinates simultaneously, so the
time to move to a new point is the slower of the two movements.
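The three distance functions differ only in how the two coordinate differences are combined. A minimal sketch for points given as (x, y) pairs; the function names are illustrative.

```python
import math


def euclidean(p, q):
    """Straight-line distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Adjust one coordinate, then the other: the movements add up."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def maximum(p, q):
    """Adjust both coordinates at once: the slower movement decides."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


p, q = (0, 0), (3, 4)
distances = (euclidean(p, q), manhattan(p, q), maximum(p, q))
```

For the pair (0, 0) and (3, 4) the drilling machine of the example needs 7 time units under the Manhattan metric and 4 under the maximum metric, against a straight-line distance of 5.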
In its definition, the TSP does not allow cities to be visited twice, but many applications do not need
this constraint. In such cases, a symmetric, non-metric instance can be reduced to a metric one.
This replaces the original graph with a complete graph in which the inter-city distance $d_{AB}$ is
replaced by the length of the shortest path between A and B in the original graph.
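One way to carry out this reduction is an all-pairs shortest-path computation. The following is a minimal sketch using the Floyd-Warshall algorithm on a hypothetical 3-city matrix; `metric_closure` is an illustrative name.

```python
def metric_closure(dist):
    """Replace each distance d[a][b] by the shortest-path length from a to b."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for a in range(n):
            for b in range(n):
                if d[a][k] + d[k][b] < d[a][b]:
                    d[a][b] = d[a][k] + d[k][b]
    return d


# Hypothetical symmetric instance that violates the triangle inequality:
# going 0 -> 1 -> 2 costs 2, but the direct edge 0 -> 2 costs 10.
dist = [
    [0, 1, 10],
    [1, 0, 1],
    [10, 1, 0],
]
closed = metric_closure(dist)
```

After the closure the distance between cities 0 and 2 is 2, and every triple of cities satisfies the triangle inequality.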
Euclidean TSP
The Euclidean TSP, or planar TSP, is the TSP with the distance being the ordinary Euclidean
distance.
The Euclidean TSP is a particular case of the metric TSP, since distances in a plane obey the
triangle inequality.
Like the general TSP, the Euclidean TSP is NP-hard. With discretized metric (distances rounded up
to an integer), the problem is NP-complete.[22] However, in some respects it seems to be easier than
the general metric TSP. For example, the minimum spanning tree of the graph associated with an
instance of the Euclidean TSP is a Euclidean minimum spanning tree, and so can be computed in
expected O(n log n) time for n points (considerably less than the number of edges). This enables
the simple 2-approximation algorithm for TSP with triangle inequality above to operate more quickly.
In general, for any c > 0, where d is the number of dimensions in the Euclidean space, there is a
polynomial-time algorithm that finds a tour of length at most (1 + 1/c) times the optimal for geometric
instances of TSP in $O\left(n (\log n)^{(O(c\sqrt{d}))^{d-1}}\right)$ time.
Asymmetric TSP
In most cases, the distance between two nodes in the TSP network is the same in both directions.
The case where the distance from A to B is not equal to the distance from B to A is called asymmetric
TSP. A practical application of an asymmetric TSP is route optimisation using street-level routing
(which is made asymmetric by one-way streets, slip-roads, motorways, etc.).
Solving an asymmetric TSP graph can be somewhat complex. The following is a 3×3 matrix
containing all possible path weights between the nodes A, B and C. One option is to turn an
asymmetric matrix of size N into a symmetric matrix of size 2N.[24]
      A     B     C
A           1     2
B     6           3
C     5     4
To double the size, each of the nodes in the graph is duplicated, creating a second ghost node.
Using duplicate points with very low weights, such as −∞, provides a cheap route "linking" back to
the real node and allowing symmetric evaluation to continue. The original 3×3 matrix shown above is
visible in the bottom left and the transpose of the original in the top-right. Both copies of the matrix have
had their diagonals replaced by the low-cost hop paths, represented by −∞.
      A     B     C     A′    B′    C′
A                       −∞    6     5
B                       1     −∞    4
C                       2     3     −∞
A′    −∞    1     2
B′    6     −∞    3
C′    5     4     −∞
The original 3×3 matrix would produce two Hamiltonian cycles (closed paths that visit every node exactly once),
namely A-B-C-A [score 9] and A-C-B-A [score 12]. Evaluating the 6×6 symmetric version of the
same problem now produces many paths, including A-A′-B-B′-C-C′-A, A-B′-C-A′-A, A-A′-B-C′-A [all
score 9 − ∞].
The important thing about each new sequence is that there will be an alternation between dashed
(A′, B′, C′) and un-dashed nodes (A, B, C) and that the link to "jump" between any related pair (A-A′) is
effectively free. A version of the algorithm could use any weight for the A-A′ path, as long as that
weight is lower than all other path weights present in the graph. As the path weight to "jump" must
effectively be "free", the value zero (0) could be used to represent this cost, if zero is not being used
for another purpose already (such as designating invalid paths). In the two examples above,
non-existent paths between nodes are shown as a blank square.
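The conversion can be checked on the 3×3 example by brute force. The following is a minimal sketch, assuming a large finite penalty `big` in place of −∞ for the ghost links and `math.inf` for the blank (non-existent) entries; `symmetrize` and `best_tour_cost` are illustrative names. Nodes 0..2 stand for A, B, C and nodes 3..5 for A′, B′, C′.

```python
import math
from itertools import permutations


def symmetrize(w, big):
    """Build the 2N x 2N symmetric matrix from an N x N asymmetric one."""
    n = len(w)
    sym = [[math.inf] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        # Cheap link between each real node and its ghost node.
        sym[n + i][i] = sym[i][n + i] = -big
        for j in range(n):
            if i != j:
                # Original matrix in the bottom left, its transpose top right.
                sym[n + i][j] = sym[j][n + i] = w[i][j]
    return sym


def best_tour_cost(d):
    """Cheapest closed tour by exhaustive search (fine for tiny instances)."""
    n = len(d)
    return min(sum(d[t[k]][t[(k + 1) % n]] for k in range(n))
               for t in ((0,) + p for p in permutations(range(1, n))))


w = [[0, 1, 2],
     [6, 0, 3],
     [5, 4, 0]]
big = 100  # assumed to exceed the sum of all real weights
asym_cost = best_tour_cost(w)
sym_cost = best_tour_cost(symmetrize(w, big))
```

The best symmetric tour uses all three ghost links, so adding back 3 × `big` recovers the asymmetric optimum of 9.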
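TSP path length for random sets of points in a square
Suppose $X_1, \ldots, X_n$ are n independent random variables with uniform distribution in the square $[0,1]^2$,
and let $L_n^*$ be the shortest path length (i.e. the TSP solution) for this set of points, according to the
usual Euclidean distance. It is known that, almost surely, $L_n^*/\sqrt{n} \to \beta$ as $n \to \infty$,
where $\beta$ is a positive constant that is not known explicitly. Since $L_n^* \le 2\sqrt{n} + 2$ (see below), it
follows from the bounded convergence theorem that $\beta = \lim_{n \to \infty} E[L_n^*]/\sqrt{n}$, hence lower and
upper bounds on $\beta$ follow from bounds on $E[L_n^*]$.
Upper bound
One has $L_n^* \le 2\sqrt{n} + 2$, and therefore $\beta \le 2$, by using a naive path which visits monotonically
the points inside each of $\sqrt{n}$ slices of width $1/\sqrt{n}$ in the square.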
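The slice construction behind the upper bound can be tried numerically. The following is a minimal sketch, assuming n is a perfect square so that the unit square splits into exactly √n vertical slices of width 1/√n; the path sweeps upward through one slice and downward through the next. `strip_path` and `path_length` are illustrative names.

```python
import math
import random


def strip_path(points):
    """Visit the points slice by slice, alternating upward and downward sweeps."""
    m = math.isqrt(len(points)) or 1  # number of vertical slices
    slices = [[] for _ in range(m)]
    for x, y in points:
        slices[min(int(x * m), m - 1)].append((x, y))
    path = []
    for k, pts in enumerate(slices):
        path.extend(sorted(pts, key=lambda p: p[1], reverse=(k % 2 == 1)))
    return path


def path_length(path):
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 400
points = [(rng.random(), rng.random()) for _ in range(n)]
length = path_length(strip_path(points))
bound = 2 * math.sqrt(n) + 2
```

The vertical movement is at most 1 per slice and the horizontal movement at most about 1/√n per step, which is where the two √n terms in the bound $2\sqrt{n} + 2$ come from.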
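Lower bound
By observing that the expected optimal length $E[L_n^*]$ is greater than $n$ times the expected distance between a point $X_0$ and the closest
point $X_i \ne X_0$, one gets (after a short computation) $E[L_n^*] \ge \tfrac{1}{2}\sqrt{n}$.
A better lower bound is obtained[25] by observing that $E[L_n^*]$ is greater than $\tfrac{1}{2} n$ times the sum of the
distances between $X_0$ and the closest and second closest points $X_i, X_j \ne X_0$, which gives
$E[L_n^*] \ge \left(\tfrac{1}{4} + \tfrac{3}{8}\right)\sqrt{n} = \tfrac{5}{8}\sqrt{n}$.
Held and Karp[28] gave a polynomial-time algorithm that provides numerical lower bounds for $L_n^*$, and
thus for $\beta$ ($\simeq L_n^*/\sqrt{n}$), which seem to be good up to more or less 1%.[29] In particular, David S.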
Johnson[30] obtained a lower bound by computer experiment:
where 0.522 comes from the points near square boundary which have fewer neighbors, and
Christine L. Valenzuela and Antonia J. Jones [31] obtained the following other numerical lower bound:
.
Computational complexity
The problem has been shown to be NP-hard (more precisely, it is complete for the complexity
class $\mathrm{FP}^{\mathrm{NP}}$; see function problem), and the decision problem version ("given the costs and a
number x, decide whether there is a round-trip route cheaper than x") is NP-complete.
The bottleneck travelling salesman problem is also NP-hard. The problem remains NP-hard even for
the case when the cities are in the plane with Euclidean distances, as well as in a number of other
restrictive cases. Removing the condition of visiting each city "only once" does not remove the
NP-hardness, since it is easily seen that in the planar case there is an optimal tour that visits each city
only once (otherwise, by the triangle inequality, a shortcut that skips a repeated visit would not
increase the tour length).
Complexity of approximation
In the general case, finding a shortest travelling salesman tour is NPO-complete.[32] If the distance
measure is a metric and symmetric, the problem becomes APX-complete[33] and Christofides'
algorithm approximates it within 1.5.[34]
If the distances are restricted to 1 and 2 (but still are a metric) the approximation ratio becomes 8/7.[35]
In the asymmetric, metric case, only logarithmic performance guarantees are known; the best
current algorithm achieves performance ratio 0.814 log(n);[36] it is an open question if a constant
factor approximation exists.
Fundamentals of Transportation/Timetabling and Scheduling
In the previous unit on service planning, the strategic decisions of network and route design, stop
layout, and frequency determination were described. In this unit, the tactical decisions associated
with creating a service schedule (timetabling), creating a schedule for vehicles to operate the
service (vehicle scheduling), and creating work shifts for operators (crew scheduling) are
presented. A practical guidebook and learning tool has been published recently.[1]
The motivation for good solutions to these tactical decisions is to minimize the net operating costs to
the agency. Once the timetable is determined, the number of vehicles required to be in revenue
service can also be identified. When the vehicle schedule is determined, the total mileage and hours
for the vehicle fleet are defined. Finally, when the crew schedule is determined, the total cost of labor
(operators) is defined. Since these factors are the primary determinants of operating costs, finding
efficient solutions has a direct effect on the bottom line.
In many cases, these tactical activities are assisted by software tools that can generate high quality
solutions in a short period of time, often with direct interaction with the planner. As a result, the
interested student may wish to consult other sources to identify and to investigate the specific
software tools that might be available, such as those described in a recent publication.
Contents
1 Timetabling
2 Vehicle scheduling
3 Crew scheduling
4 Glossary
5 Related books
6 References
Timetabling
The general idea behind timetabling is to create a schedule for service. As inputs, one would
consider the frequency of service for the given route (see previous unit) and the expected travel
times between stops on the route. The latter could be determined either by historical experience or
through estimates based on traffic conditions, vehicle acceleration and deceleration characteristics,
expected dwell times, etc.
Let h be the selected headway for a route, perhaps for a specific time period of the day.
Let $t_{ij}$ be the time between stop i and stop j along the route, where i and j are adjacent stops. The
travel times between stops, $t_{ij}$, can vary by time of day, particularly as they may be affected by traffic
conditions. They may also reflect any slack time built into the schedule between stops, to allow for
possible variability in travel times.
Finally, let $t_0$ be the dispatch time (departure time) of the first vehicle from a terminal.
Then, the timetable can be created simply using the following structure, with n stops on the route
and k+1 vehicles to dispatch:
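Under these definitions, vehicle m (m = 0, ..., k) leaves the terminal at t_0 + m·h and reaches each later stop after the accumulated travel times. The following is a minimal sketch of that structure, assuming times in minutes after midnight and travel times that do not change over the period; `build_timetable`, `hhmm` and the numbers in the example are illustrative.

```python
def build_timetable(t0, headway, link_times, n_vehicles):
    """Return one row per vehicle with its scheduled time at each stop.

    t0          dispatch time of the first vehicle (minutes after midnight)
    headway     h, minutes between successive dispatches
    link_times  [t_12, t_23, ...], travel times between adjacent stops
    n_vehicles  k + 1, the number of dispatches
    """
    rows = []
    for m in range(n_vehicles):
        t = t0 + m * headway
        row = [t]
        for link in link_times:
            t += link
            row.append(t)
        rows.append(row)
    return rows


def hhmm(minutes):
    """Format minutes after midnight as h:mm."""
    return f"{minutes // 60}:{minutes % 60:02d}"


# Hypothetical route with 4 stops, first dispatch at 6:00, 15-minute headway.
timetable = build_timetable(t0=360, headway=15, link_times=[10, 12, 8], n_vehicles=3)
printable = [[hhmm(t) for t in row] for row in timetable]
```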
The primary decision variable here is the initial dispatch time, $t_0$. Different operating conditions might
lead to a number of possible choices for $t_0$:
“Clockface” values. Passengers may remember the schedule more clearly if the dispatch
times fall at easily recognized times on the clock. For example, with 15-minute headways, there
may be value to passengers in dispatching a vehicle on the :00, :15, :30, and :45 of each hour.
Coordination for improved vehicle scheduling. When a vehicle finishes its trip at a terminal, it
will often be turned around to continue onto the next trip in the opposite direction on the route. In
this case, there is a need for sufficient layover time at the terminal. If the vehicle finishes a trip at
time t, then completes the layover after an additional time $t_L$, then the vehicle may start its return
trip after $t + t_L$. Choosing the dispatch time to occur at or slightly after $t + t_L$ allows for higher
vehicle utilization.
One way of visualizing such a system uses a so-called “string diagram,” shown in this figure.
The blue lines indicate the trajectory of a vehicle from the terminal at stop 1 to the terminal at
stop n, with short dwell times at each stop. Vehicles arriving at stop n then can return along
the route in the opposite direction (the red lines), after a layover (indicated by the black
arrows). These diagrams can be useful in visualizing vehicle movements and crosses along
the route.
Reduction of vehicle requirements. The timetable will dictate how many vehicles are in
operation at any time of the day. In some cases, minor adjustments in the dispatch times,
coupled with changes in layovers and/or dead-heads of vehicles between terminals can lead
to a reduction in the number of vehicles needed for service.
Vehicle scheduling
Vehicle scheduling, also called “blocking”, involves assigning vehicles to cover the trips
associated with the timetable. A vehicle “block” is the schedule of travel of a vehicle for a given
day, including: (1) a pull-out from the depot, (2) a sequence of trips from the timetable, (3) any
dead-head trips, and (4) a pull-in back to the depot (recall the vehicle cycle from the unit on
vehicle operations).
Generally, once the timetable is created, the time and mileage that vehicles spend in revenue
service (i.e., completing the trips in the timetable) is fixed. So, the usual goal in vehicle
scheduling is to minimize the time and/or distance that vehicles spend outside of revenue
service: e.g., pull-ins, pull-outs, dead-heads, and layovers. These all represent time or mileage
that are “unproductive”, and hence should be minimized.
To solve for the vehicle schedule, one might consider a simple “first-in-first-out” rule. In this
case, a vehicle stays on the same route throughout the whole period, and is always assigned to
the next trip after a layover. The string diagram above gives just such an arrangement.
As a simple example, suppose we have a route that runs from terminal A to terminal B and then
back to terminal A. Travel time from A to B and from B to A, including running and dwell time, is
30 minutes, and a minimum 5 minute layover is needed at each terminal. Headways are 15
minutes.
Below is a timetable for this situation, for trips from 6:00 am until shortly after 9:00 am. The left-hand
side of the timetable shows vehicle trips from A to B, while the right-hand side shows vehicle
trips from B to A.
The colors correspond to different vehicles used on the route. The gray color corresponds to the
first vehicle of the day, leaving A at 6:00 am and continuing with the trip from B at 6:40, the trip
from A at 7:15, etc. A total of five vehicles are required to cover all the trips in this timetable.
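The count of five vehicles can be reproduced with a small first-in-first-out simulation. The following is a minimal sketch, assuming departures from A every 15 minutes from 6:00 to 9:00 and departures from B every 15 minutes starting at 6:40 (the first return trip mentioned above); pull-outs and pull-ins are ignored, and `fifo_blocks` is an illustrative name. Times are minutes after midnight.

```python
def fifo_blocks(trips, run_time, layover):
    """Assign each trip to the vehicle that has waited longest at its origin.

    trips is a list of (departure, origin) pairs with origin "A" or "B".
    A new vehicle is put into service when none is ready at the origin.
    """
    other = {"A": "B", "B": "A"}
    idle = {"A": [], "B": []}  # (time ready, vehicle id) at each terminal
    blocks = []                # blocks[v] lists the trips of vehicle v
    for dep, origin in sorted(trips):
        ready = sorted(v for v in idle[origin] if v[0] <= dep)
        if ready:
            vid = ready[0][1]
            idle[origin].remove(ready[0])
        else:
            vid = len(blocks)
            blocks.append([])
        blocks[vid].append((dep, origin))
        # The vehicle is available again after the run and the layover.
        idle[other[origin]].append((dep + run_time + layover, vid))
    return blocks


trips = ([(360 + 15 * k, "A") for k in range(13)]
         + [(400 + 15 * k, "B") for k in range(11)])
blocks = fifo_blocks(trips, run_time=30, layover=5)
```

The first vehicle's block starts with the trips at 6:00 from A, 6:40 from B and 7:15 from A, matching the description above.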
In addition to the trips from the timetable, the vehicle block also includes a pull-out and pull-in,
so that the final block for the first vehicle (gray) could look like the following.
For networks with longer policy headways (e.g., 30- or 60-minute headways), longer layovers at
terminals may be necessary if vehicles serve the same route throughout the block. As a result,
other options can be considered, particularly in terms of shifting vehicles from one route to
another. The timetable may allow vehicles to shift from one route to another, in order to reduce
layover time and/or to avoid pull-outs or pull-ins. Specific activities in the block can include:
Interlining: the process of switching a vehicle from one route to another at a terminal, when
the routes share that common terminal.
Deadheading: the process of switching a vehicle from one route to another, also requiring a
re-location of the vehicle (traveling empty) to another terminal.
Crew scheduling
Crew scheduling (also called “run-cutting” in the transit industry) is the task of determining work
shifts (so-called “duties” or “runs”) for operators. Generally, the primary interest in crew
scheduling is to minimize the total cost of labor that meets the service requirements.
A significant fraction, typically 60-70%, of the total operating costs at a transit agency involves
the cost of operators, including wages, benefits, and other premiums. With this in mind, small
reductions in the number of operators, or in the total work hours, can result in more substantive
reductions in the total operating cost. For this reason, the task of scheduling crew to vehicles is
one area where many large transit agencies can achieve some efficiencies and potential cost
savings.
Crew scheduling is complicated because operators often cannot simply be assigned to a vehicle
for the entire vehicle block. First, the shift would often be much longer than a typical 8-hour work
period; and, second, the operator may not get sufficient break time during vehicle layovers (e.g.,
for lunch). Instead, the duties have to consider more practical concerns of the operators.
In this regard, transit agencies have rules that dictate the kind of work shifts the operators may
perform. In most cases in the US, the types of work shifts are governed by collective bargaining
agreements (union work rules) that specify work conditions for transit operators. Possible
examples of work rules could include restrictions like the following:
The general approach to creating a crew schedule begins by cutting each vehicle block into
“pieces of work.” Each piece of work is a subset of trips in the block, forming the elemental unit
of work (driving) for the operator. Then, according to the constraints from the work rules, these
pieces of work are assembled into feasible duties. The hope is to assemble a full set of duties
such that all pieces of work are covered and that the total cost is minimal. The cost of a duty
depends first on the traditional hourly rate of pay for the operator for hours worked. If the
operator has a straight shift (no unpaid break), they are paid a certain amount, usually at a given
hourly rate. Other costs can include:
A minimum guarantee of hours of pay, if the guarantee exceeds the number of hours worked
(e.g., 8 hours of pay, even if the operator works only 7 hours);
Premiums for overtime (e.g., time in the duty over 8 hours);
Premiums for spread time. Spread is the total time between the start and end of a duty. If this
exceeds a certain maximum (e.g., 9 hours), the operator is entitled to extra pay;
Premiums for swing. Swing occurs when the duty starts and ends at different locations
(terminals, depots);
Premiums for split duties, where the duty has an unpaid break. This can occur when an
operator works only the AM and PM peak periods, without working in the mid-day;
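How these components combine into the cost of one duty can be sketched as a small function. Every rate, threshold and premium below is a hypothetical value chosen for illustration; actual figures come from the agency's work rules. `duty_cost` is an illustrative name.

```python
def duty_cost(worked_hours, spread_hours, rate=20.0, guarantee=8.0,
              overtime_after=8.0, overtime_factor=1.5,
              max_spread=9.0, spread_factor=0.5,
              swing_premium=0.0, split_premium=0.0):
    """Cost of a single duty under hypothetical pay rules.

    worked_hours   paid driving and layover time in the duty
    spread_hours   time between the start and the end of the duty
    swing_premium  flat amount if the duty ends at a different location
    split_premium  flat amount if the duty contains an unpaid break
    """
    regular = min(worked_hours, overtime_after)
    overtime = max(0.0, worked_hours - overtime_after)
    pay_hours = regular + overtime_factor * overtime
    pay_hours = max(pay_hours, guarantee)            # minimum guarantee
    excess_spread = max(0.0, spread_hours - max_spread)
    return (rate * pay_hours
            + rate * spread_factor * excess_spread   # spread premium
            + swing_premium + split_premium)


short_straight = duty_cost(worked_hours=7, spread_hours=7)
long_spread = duty_cost(worked_hours=9, spread_hours=10)
```

With these assumed values, a 7-hour straight duty is paid as 8 hours because of the guarantee, while a 9-hour duty with a 10-hour spread collects both an overtime and a spread premium.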
These rules on pay suggest that the crew schedule should contain as many straight duties as
possible. Small pieces of work that remain after generating these straight duties can be
allocated to part-time operators (if they are available), to avoid other premiums, or covered using
split duties with associated split and/or spread penalties.
A second problem in crew scheduling is rostering, in which duties are assembled into a group of
duties (the “roster”) for each operator, by week. For example, one roster could include the same
8-hour duty for 5 weekdays. However, many possible combinations of duties could be
considered, especially if weekend or evening service is provided. Once the rosters are created,
operators choose from among these duty rosters.
Glossary
Block: the sequence of trips made by a vehicle in the course of one day of operations,
including both revenue and non-revenue trips.
Duty (or, Run): a work shift for an operator for one day.
Guarantee: the minimum pay hours for an operator, regardless of the number of hours
worked.
Roster (also, rostering): the set of duties for a single operator in a week.
Split: a duty covering at least two intervals of time with an unpaid break.
Spread: the time between when an operator reports for duty and when they end their duty.
Swing: a duty in which the operator begins and ends at different locations.
Tripper: a short work assignment (e.g., 2-4 hours); generally much shorter than a typical
straight.
References
1. Transportation Research Board (2009a). Controlling System Costs: Basic and Advanced
Scheduling Manuals and Contemporary Issues in Transit Scheduling, Transit Cooperative Research
Program, Report 135.
2. Transportation Research Board (2009b). Controlling System Costs: Basic and Advanced
Scheduling Manuals and Contemporary Issues in Transit Scheduling, Appendix, Transit Cooperative
Research Program, Report 135 Appendix.
Flow network
In graph theory, a flow network (also known as a transportation network) is a directed
graph where each edge has a capacity and each edge receives a flow. The amount of flow on an
edge cannot exceed the capacity of the edge. Often in operations research, a directed graph is
called a network. The vertices are called nodes and the edges are called arcs. A flow must satisfy
the restriction that the amount of flow into a node equals the amount of flow out of it, unless it is
a source, which has only outgoing flow, or sink, which has only incoming flow. A network can be
used to model traffic in a road system, circulation with demands, fluids in pipes, currents in an
electrical circuit, or anything similar in which something travels through a network of nodes.
Contents
1 Definition
2 Example
3 Applications
4 See also
5 References
6 Further reading
7 External links
Definition
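Let $G = (V, E)$ be a finite directed graph in which every edge $(u, v) \in E$ has a non-negative,
real-valued capacity $c(u, v)$. If $(u, v) \notin E$, we assume that $c(u, v) = 0$. We distinguish two
vertices: a source $s$ and a sink $t$. A flow in a flow network is a real function $f : V \times V \to \mathbb{R}$ with
the following three properties for all nodes $u$ and $v$:
Capacity constraints: $f(u, v) \le c(u, v)$. The flow along an edge cannot exceed its capacity.
Skew symmetry: $f(u, v) = -f(v, u)$. The net flow from $u$ to $v$ must be the opposite of the
net flow from $v$ to $u$ (see example).
Flow conservation: $\sum_{w \in V} f(u, w) = 0$, unless $u = s$ or $u = t$. The net flow to a node is zero,
except for the source, which "produces" flow, and the sink, which "consumes" flow.
Notice that $f(u, v)$ is the net flow from $u$ to $v$. If the graph represents a physical network, and
if there is a real flow of, for example, 4 units from $u$ to $v$, and a real flow of 3 units from $v$ to $u$,
we have $f(u, v) = 1$ and $f(v, u) = -1$.
Basically we can say that the flow for a physical network is the flow leaving at $s$: $\sum_{(s, v) \in E} f(s, v)$.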
When a network with more than one source has to be modelled, a supersource is
introduced to the graph. This consists of a vertex connected to each of the sources with edges
of infinite capacity, so as to act as a global source. A similar construct for sinks is called
a supersink.
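The defining properties of a flow translate directly into a checker. The following is a minimal sketch, assuming capacities and net flows are stored in dictionaries keyed by ordered node pairs, with missing pairs counting as 0, and assuming flow conservation (net flow out of every node other than source and sink is zero) as the third property. `is_valid_flow`, `flow_value` and the small network are illustrative.

```python
def is_valid_flow(nodes, cap, flow, s, t):
    """Check capacity constraints, skew symmetry and flow conservation."""
    def c(u, v):
        return cap.get((u, v), 0)

    def f(u, v):
        return flow.get((u, v), 0)

    for u in nodes:
        for v in nodes:
            if f(u, v) > c(u, v):
                return False  # capacity constraint violated
            if f(u, v) != -f(v, u):
                return False  # skew symmetry violated
        if u not in (s, t) and sum(f(u, w) for w in nodes) != 0:
            return False      # flow conservation violated
    return True


def flow_value(nodes, flow, s):
    """Total net flow leaving the source."""
    return sum(flow.get((s, v), 0) for v in nodes)


nodes = ["s", "a", "t"]
cap = {("s", "a"): 4, ("a", "t"): 3}
flow = {("s", "a"): 3, ("a", "s"): -3, ("a", "t"): 3, ("t", "a"): -3}
too_much = {("s", "a"): 5, ("a", "s"): -5, ("a", "t"): 5, ("t", "a"): -5}
```

Because the flow is stored as a net flow, the reverse of every used edge carries the negated value, as skew symmetry requires.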
Example
To the right you see a flow network with source labeled $s$, sink $t$, and four additional nodes. The
flow and capacity is denoted $f/c$. Notice how the network upholds skew symmetry, capacity
constraints and flow conservation. The total amount of flow from $s$ to $t$ is 5, which can be easily
seen from the fact that the total outgoing flow from $s$ is 5, which is also the incoming flow to $t$.
We know that no flow appears or disappears in any of the other nodes.
Residual network for the above flow network, showing residual capacities.
Below you see the residual network for the given flow. Notice how there is positive residual
capacity on some edges where the original capacity is zero.
This flow is not a maximum flow: there is still available capacity along some paths from $s$ to $t$,
which are then the augmenting paths. The residual
capacity of a path is the minimum residual capacity of all edges in that path. As long as there exists
some path with a positive residual capacity, the flow will not be maximum.
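Residual capacities follow from the capacities and the net flow by subtraction. The following is a minimal sketch, assuming the usual definition c_f(u, v) = c(u, v) − f(u, v) and the dictionary representation keyed by ordered node pairs; the small network is hypothetical and is not the one from the figure. `residual_capacities` and `path_residual_capacity` are illustrative names.

```python
def residual_capacities(nodes, cap, flow):
    """Return c_f(u, v) = c(u, v) - f(u, v) for all pairs where it is positive."""
    res = {}
    for u in nodes:
        for v in nodes:
            r = cap.get((u, v), 0) - flow.get((u, v), 0)
            if u != v and r > 0:
                res[(u, v)] = r
    return res


def path_residual_capacity(path, res):
    """Minimum residual capacity over the consecutive edges of the path."""
    return min(res.get((u, v), 0) for u, v in zip(path, path[1:]))


nodes = ["s", "a", "b", "t"]
cap = {("s", "a"): 3, ("a", "t"): 2, ("s", "b"): 2, ("b", "t"): 3, ("a", "b"): 1}
# Two units are currently sent along s -> a -> t.
flow = {("s", "a"): 2, ("a", "s"): -2, ("a", "t"): 2, ("t", "a"): -2}
res = residual_capacities(nodes, cap, flow)
```

The pair (a, s) has no capacity in the original network but a residual capacity of 2, because the two units sent from s to a could be cancelled. The paths s, b, t and s, a, b, t still have positive residual capacity, so this flow is not maximum.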
Applications
Picture a series of water pipes, fitting into a network. Each pipe is of a certain diameter, so it can
only maintain a flow of a certain amount of water. Anywhere that pipes meet, the total amount of
water coming into that junction must be equal to the amount going out, otherwise we would
quickly run out of water, or we would have a buildup of water. We have a water inlet, which is
the source, and an outlet, the sink. A flow would then be one possible way for water to get from
source to sink so that the total amount of water coming out of the outlet is consistent. Intuitively,
the total flow of a network is the rate at which water comes out of the outlet.
Flow networks also find applications in ecology: flow networks arise naturally when considering
the flow of nutrients and energy between different organisms in a food web. The
mathematical problems associated with such networks are quite different from those that arise in
networks of fluid or traffic flow. The field of ecosystem network analysis, developed by Robert
Ulanowicz and others, involves using concepts from information theory and thermodynamics to
study the evolution of these networks over time.
The simplest and most common problem using flow networks is to find what is called
the maximum flow, which provides the largest possible total flow from the source to the sink in a
given graph. There are many other problems which can be solved using max flow algorithms, if
they are appropriately modeled as flow networks, such as bipartite matching, the assignment
problem and the transportation problem. Maximum flow problems can be solved efficiently with
the relabel-to-front algorithm. The max-flow min-cut theorem states that finding a maximal
network flow is equivalent to finding a cut of minimum capacity that separates the source and
the sink, where a cut is a division of the vertices into two parts such that the source is in one
part and the sink is in the other.
In a multi-commodity flow problem, you have multiple sources and sinks, and various
"commodities" which are to flow from a given source to a given sink. This could be for example
various goods that are produced at various factories, and are to be delivered to various given
customers through the same transportation network.
In a minimum cost flow problem, each edge (u, v) has a given cost k(u, v), and the cost of
sending the flow f(u, v) across the edge is f(u, v) · k(u, v). The objective is to send a
given amount of flow from the source to the sink, at the lowest possible price.
In a circulation problem, you have a lower bound l(u, v) on the edges, in addition to the upper
bound c(u, v). Each edge also has a cost. Often, flow conservation holds for all nodes in a
circulation problem, and there is a connection from the sink back to the source. In this way, you
can dictate the total flow with l(t, s) and c(t, s). The flow circulates through the network,
hence the name of the problem.
In a network with gains or generalized network each edge has a gain, a real number (not
zero) such that, if the edge has gain g, and an amount x flows into the edge at its tail, then an
amount gx flows out at the head.
In a source localization problem, an algorithm tries to identify the most likely source node of
information diffusion through a partially observed network. This can be done in linear time for
trees and cubic time for arbitrary networks and has applications ranging from tracking mobile
phone users to identifying the originating village of disease outbreaks.[3]
See also
Braess' paradox
Centrality
Constructal theory
Ford–Fulkerson algorithm
Dinic's algorithm
Flow (computer networking)
Flow graph
Max-flow min-cut theorem
Oriented matroid
Shortest path problem
References
1. Black, Paul E. "Supersource". Dictionary of Algorithms and Data Structures. NIST.
2. Black, Paul E. "Supersink". Dictionary of Algorithms and Data Structures. NIST.
3. http://www.pedropinto.org.s3.amazonaws.com/publications/locating_source_diffusion_networks.pdf
Max-flow min-cut theorem
In optimization theory, the max-flow min-cut theorem states that in a flow network, the maximum
amount of flow passing from the source to the sink is equal to the minimum total capacity of a set
of edges whose removal from the network leaves no path along which flow can pass from the
source to the sink.
Maximum flow
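Let N = (V, E) be a network (directed graph) with s and t being the source and the sink of N
respectively.
Definition. The capacity of an edge is a mapping c : E → R+, denoted by cuv or c(u, v). It
represents the maximum amount of flow that can pass through an edge.
Definition. A flow is a mapping f : E → R+, denoted by fuv or f(u, v), subject to the following two
constraints:
1. Capacity Constraint: f(u, v) ≤ c(u, v) for each edge (u, v) in E.
2. Conservation of Flows: for each vertex v other than s and t, the sum of f(u, v) over all edges
(u, v) entering v equals the sum of f(v, w) over all edges (v, w) leaving v.
Definition. The value of a flow is |f| = Σ f(s, v), summed over all vertices v in V,
where s is the source of N. It represents the amount of flow passing from the source to
the sink.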
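The max-flow problem and min-cut problem can be formulated as two primal-dual linear programs.
Max-flow (primal): maximize |f| subject to the capacity constraint and the conservation of flows.
Min-cut (dual): minimize Σ c(i, j) d(i, j), summed over all edges (i, j) in E, subject to
d(i, j) − p(i) + p(j) ≥ 0 for each edge (i, j) in E, p(s) − p(t) ≥ 1, p(i) ≥ 0 and d(i, j) ≥ 0.
Note that for a given s-t cut (S, T), p(i) = 1 if i is in S and 0 otherwise. Therefore p(s)
should be 1 and p(t) should be zero. The equality in the max-flow min-cut theorem follows from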
the strong duality theorem in linear programming, which states that if the primal program has an
optimal solution, x*, then the dual program also has an optimal solution, y*, such that the optimal
values formed by the two solutions are equal.
Example
A network with the value of flow equal to the capacity of an s-t cut
The figure on the right is a network having a value of flow of 7. The vertex in white and the vertices
in grey form the subsets S and T of an s-t cut, whose cut-set contains the dashed edges. Since the
capacity of the s-t cut is 7, which equals the value of flow, the max-flow min-cut theorem tells us
that the value of flow and the capacity of the s-t cut are both optimal in this network.
Application
Generalized max-flow min-cut theorem
In addition to edge capacity, suppose there is a capacity at each vertex, that is, a
mapping c : V → R+, denoted by c(v), such that the flow f has to satisfy not only the capacity
constraint and the conservation of flows, but also the vertex capacity constraint: for each
vertex v other than s and t, the sum of f(u, v) over all edges (u, v) entering v is at most c(v).
In other words, the amount of flow passing through a vertex cannot exceed its capacity. Define an
s-t cut to be a set of vertices and edges such that any path from s to t contains a
member of the cut. In this case, the capacity of the cut is the sum of the capacities of the edges and
vertices in it.
In this new definition, the generalized max-flow min-cut theorem states that the maximum value of
an s-t flow is equal to the minimum capacity of an s-t cut in the new sense.
Menger's theorem
In the undirected edge-disjoint paths problem, we are given an undirected graph G = (V, E) and two
vertices s and t, and we have to find the maximum number of edge-disjoint s-t paths in G.
Menger's theorem states that the maximum number of edge-disjoint s-t paths in an undirected
graph is equal to the minimum number of edges in an s-t cut-set.
Project selection problem
A network formulation of the project selection problem with the optimal solution
Let P be the set of projects not selected and Q be the set of equipment purchased; then the
problem can be formulated as
max{g} = Σ r(pi) − Σ_{pi in P} r(pi) − Σ_{qj in Q} c(qj),
where the first sum runs over all projects.
Since the first term does not depend on the choice of P and Q, this maximization problem can be
formulated as a minimization problem instead, that is,
min{g'} = Σ_{pi in P} r(pi) + Σ_{qj in Q} c(qj).
The above minimization problem can then be formulated as a minimum-cut problem by constructing
a network, where the source is connected to the projects with capacity r(pi), and the equipment is
connected to the sink with capacity c(qj). An edge (pi, qj) with infinite capacity is added if
project pi requires equipment qj. The s-t cut-set represents the projects and equipment
in P and Q respectively. By the max-flow min-cut theorem, one can solve the problem as
a maximum flow problem.
The figure on the right gives a network formulation of the following project selection problem:
Project r(pi) Equipment c(qj)
The minimum capacity of an s-t cut is 250 and the sum of the revenues of all projects is 450;
therefore the maximum profit g is 450 − 250 = 200, by selecting projects p2 and p3.
The idea here is to 'flow' the project profits through the 'pipes' of the equipment. If we cannot fill the
pipe, the equipment's return is less than its cost, and the min cut algorithm will find it cheaper to cut
the project's profit edge instead of the equipment's cost edge.
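In the image segmentation problem, there are n pixels. Each pixel i can be assigned a foreground
value fi or a background value bi. There is a penalty of pij if pixels i, j are adjacent and have
different assignments. The problem is to assign pixels to foreground or background such that the
sum of their values minus the penalties is maximum.
Let P be the set of pixels assigned to foreground and Q be the set of pixels assigned to background;
then the problem can be formulated as
max{g} = Σ_{i in P} fi + Σ_{i in Q} bi − Σ pij,
where the last sum runs over all adjacent pairs i, j with one pixel in P and the other in Q.
Since the sum of fi + bi over all pixels does not depend on the choice of P and Q, this maximization
problem can be formulated as a minimization problem instead, that is,
min{g'} = Σ_{i in P} bi + Σ_{i in Q} fi + Σ pij,
with the last sum taken over the same pairs.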
History
The max-flow min-cut theorem was proven by P. Elias, A. Feinstein, and C.E. Shannon in 1956[1],
and independently also by L.R. Ford, Jr. and D.R. Fulkerson in the same year[2].
Proof
Let G = (V, E) be a network (directed graph) with s and t being the source and the sink
of G respectively.
Consider the flow f computed for G by the Ford–Fulkerson algorithm, and let Gf be the resulting
residual graph. Let A be the set of vertices reachable from s in Gf; the cut in question separates A
from the remaining vertices. We claim that every edge leaving A is saturated and every edge entering
A carries no flow, and prove each part by contradiction.
Suppose that in G there exists an outgoing edge (x, y), with x in A and y outside A, that is not
saturated, i.e., f(x, y) < cxy. This implies that there exists a forward edge from x to y in Gf,
therefore there exists a path from s to y in Gf, which is a contradiction. Hence, any outgoing
edge (x, y) is fully saturated.
Suppose that in G there exists an incoming edge (x, y), with x outside A and y in A, that carries some
non-zero flow, i.e., f(x, y) > 0. This implies that there exists a backward edge from y to x in Gf,
therefore there exists a path from s to x in Gf, which is again a contradiction. Hence, any incoming
edge (x, y) must have zero flow.
Together, these two statements prove that the capacity of the cut obtained in the manner described
above is equal to the value of the flow obtained in the network. Also, the flow was obtained by the
Ford–Fulkerson algorithm, so it is the max-flow of the network as well.
Also, since the value of any flow in the network is always less than or equal to the capacity of every
cut in the network, the cut described above is also the min-cut, whose capacity equals the max-flow.
References
Eugene Lawler (2001). "4.5. Combinatorial Implications of Max-Flow Min-Cut Theorem, 4.6. Linear
Programming Interpretation of Max-Flow Min-Cut Theorem". Combinatorial Optimization: Networks
and Matroids. Dover. pp. 117–120. ISBN 0-486-41453-1.
The idea behind the Ford–Fulkerson algorithm is as follows: as long as there is a path from the
source (start node) to the sink (end node), with available capacity on all edges in the path, we send
flow along one of these paths. Then we find another path, and so on. A path with available capacity
is called an augmenting path.
Contents
1 Algorithm
2 Complexity
3 Integral example
4 Non-terminating example
5 Python implementation
o 5.1 Usage example
6 Notes
7 References
8 See also
9 External links
Algorithm
Let G(V, E) be a graph, and for each edge from u to v, let c(u, v) be the capacity and
f(u, v) be the flow. We want to find the maximum flow from the source s to the sink t. After every
step in the algorithm the following is maintained:
Capacity constraints: f(u, v) ≤ c(u, v) for every edge (u, v). The flow along an edge cannot exceed
its capacity.
Skew symmetry: f(u, v) = −f(v, u). The net flow from u to v must be the opposite of the net flow
from v to u (see example).
Flow conservation: Σ f(u, w) = 0, summed over all w in V, unless u is s or t. The net flow to a node
is zero, except for the source, which "produces" flow, and the sink, which "consumes" flow.
Value(f): Σ f(s, u) over all edges (s, u) equals Σ f(v, t) over all edges (v, t). That is, the flow
leaving from s must be equal to the flow arriving at t.
This means that the flow through the network is a legal flow after each round in the algorithm.
We define the residual network Gf(V, Ef) to be the network with
capacity cf(u, v) = c(u, v) − f(u, v) and no flow. Notice that it can happen that a flow
from v to u is allowed in the residual network, though disallowed in the original network:
if f(u, v) > 0 and c(v, u) = 0 then cf(v, u) = c(v, u) − f(v, u) = f(u, v) > 0.
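Algorithm Ford–Fulkerson
Inputs: a graph G with flow capacity c, a source node s, and a sink node t.
Output: a maximum flow f from s to t.
1. Set f(u, v) to 0 for all edges (u, v).
2. While there is a path p from s to t in the residual network such that every edge (u, v) in p has
positive residual capacity:
   a. Find the residual capacity of p, the minimum residual capacity of the edges in p.
   b. For each edge (u, v) in p, increase f(u, v) by that amount (send flow along the path) and
   decrease f(v, u) by the same amount (the flow might be "returned" later).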
The path in step 2 can be found with, for example, a breadth-first search or a depth-first
search in Gf(V, Ef). If you use the former, the algorithm is called Edmonds–Karp.
When no more paths in step 2 can be found, s will not be able to reach t in the residual
network. If S is the set of nodes reachable by s in the residual network, then the total
capacity in the original network of edges from S to the remainder of V is on the one
hand equal to the total flow we found from s to t, and on the other hand serves as an
upper bound for all such flows. This proves that the flow we found is maximal. See
also Max-flow Min-cut theorem.
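The procedure described above, with breadth-first search used to find each path, can be sketched as follows. This is a minimal illustration rather than the article's own listing: it assumes that capacities are given as a dictionary keyed by edge and that the source and sink are distinct, and the names max_flow, residual and neighbours are hypothetical. Besides the flow value it returns the set S of nodes still reachable from the source, so that the cut argument can be checked directly.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Ford-Fulkerson with BFS path search; returns (flow value, reachable set S)."""
    residual = defaultdict(int)          # residual capacity of each arc
    neighbours = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        neighbours[u].add(v)
        neighbours[v].add(u)             # reverse arc, residual capacity 0 at first

    value = 0
    while True:
        # Breadth-first search over arcs with positive residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break                        # no augmenting path is left
        # Walk back from the sink to collect the path and its bottleneck.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[edge] for edge in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck   # send flow along the path
            residual[(v, u)] += bottleneck   # the flow may be "returned" later
        value += bottleneck
    # After the last search, `parent` holds exactly the nodes reachable from the source.
    return value, set(parent)


# Hypothetical 4-node network.
capacity = {("A", "B"): 1000, ("A", "C"): 1000, ("B", "C"): 1,
            ("B", "D"): 1000, ("C", "D"): 1000}
value, reachable = max_flow(capacity, "A", "D")
cut = sum(c for (u, v), c in capacity.items()
          if u in reachable and v not in reachable)
```

In this example the capacity of the edges leaving the reachable set equals the flow value, which is the equality the paragraph above relies on.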
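If the graph G(V, E) has multiple sources and sinks, we act as follows. Suppose
that T = {t | t is a sink} and S = {s | s is a source}. Add a new source s*
with an edge (s*, s) from s* to every node s in S, and a new sink t* with an edge (t, t*) from
every node t in T to t*. Give each new edge a capacity large enough that it is never the
bottleneck, for example the total capacity of the original edges leaving s or entering t, respectively.
Also, if a node u has a capacity constraint d_u, we replace this node with two
nodes u_in and u_out, and an edge (u_in, u_out) with capacity c(u_in, u_out) = d_u. Then
apply the Ford–Fulkerson algorithm.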
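The node-splitting step can be illustrated with a short sketch. It assumes the same dictionary-of-edges representation as before; the function name split_vertex and the tuple names (vertex, "in") and (vertex, "out") for the two new nodes are hypothetical choices made for this example.

```python
def split_vertex(capacity, vertex, limit):
    """Replace `vertex` by an in-node and an out-node joined by an arc of capacity `limit`."""
    v_in, v_out = (vertex, "in"), (vertex, "out")
    result = {(v_in, v_out): limit}
    for (u, v), c in capacity.items():
        tail = v_out if u == vertex else u   # arcs leaving the vertex now start at v_out
        head = v_in if v == vertex else v    # arcs entering the vertex now end at v_in
        result[(tail, head)] = c
    return result


# Hypothetical network in which vertex "a" may carry at most 2 units of flow.
network = {("s", "a"): 5, ("a", "t"): 5}
split = split_vertex(network, "a", 2)
```

Any flow through the original vertex must now cross the new arc, so an ordinary edge-capacity algorithm enforces the vertex limit.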
Complexity
By adding the flow augmenting path to the flow already established in the graph, the
maximum flow will be reached when no more flow augmenting paths can be found in the
graph. However, there is no certainty that this situation will ever be reached, so the best
that can be guaranteed is that the answer will be correct if the algorithm terminates. In
the case that the algorithm runs forever, the flow might not even converge towards the
maximum flow. However, this situation only occurs with irrational flow values. When the
capacities are integers, the runtime of Ford–Fulkerson is bounded by O(E f) (see big
O notation), where E is the number of edges in the graph and f is the maximum flow in
the graph. This is because each augmenting path can be found in O(E) time and
increases the flow by an integer amount which is at least 1.
Integral example
The following example shows the first steps of Ford–Fulkerson in a flow network with 4
nodes, including the source and the sink. This example shows the worst-case behaviour of the
algorithm. In each step, only a flow of 1 is sent across the network. If breadth-first
search were used instead, only two steps would be needed.
[Table: Path | Capacity | Resulting flow network]
Notice how flow is "pushed back" along an edge when a later augmenting path traverses that edge
in the opposite direction.
Non-terminating example
Consider the flow network shown on the right, with source s, sink t, capacities of
edges e1, e2 and e3 respectively 1, r = (√5 − 1)/2 and 1, and the capacity of all
other edges some integer M ≥ 2. The constant r was chosen so that r^2 = 1 − r.
We use augmenting paths according to the following table,
where p1, p2 and p3 denote three fixed augmenting paths through these edges.
[Table: Step | Augmenting path | Sent flow | Residual capacities of e1, e2, e3]
Note that after step 1 as well as after step 5, the residual capacities of edges e1, e2
and e3 are in the form r^n, r^(n+1) and 0, respectively, for some natural number n. This means
that we can use augmenting paths p1, p2, p1 and p3 infinitely many times and the residual
capacities of these edges will always be in the same form. The total flow in the network after
step 5 is 1 + 2(r^1 + r^2). If we continue to use augmenting paths as above, the total
flow converges to 1 + 2 Σ r^i (summed over i ≥ 1) = 3 + 2r, while the maximum flow is 2M + 1.
In this case, the algorithm never terminates and the flow doesn't even converge to the
maximum flow.
Python implementation
class Edge(object):
    def __init__(self, u, v, w):
        self.source = u
        self.sink = v
        self.capacity = w

    def __repr__(self):
        return "%s->%s:%s" % (self.source, self.sink, self.capacity)


class FlowNetwork(object):
    def __init__(self):
        self.adj = {}
        self.flow = {}
References
Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein,
Clifford (2001). "Section 26.2: The Ford–Fulkerson method". Introduction to
Algorithms (Second ed.). MIT Press and McGraw–Hill. pp. 651–664. ISBN 0-262-03293-7.
George T. Heineman, Gary Pollice, and Stanley Selkow (2008). "Chapter 8: Network
Flow Algorithms". Algorithms in a Nutshell. O'Reilly Media. pp. 226–250. ISBN 978-0-596-51624-6.
Jon Kleinberg and Éva Tardos (2006). "Chapter 7: Extensions to the Maximum-Flow
Problem". Algorithm Design. Pearson Education. pp. 378–384. ISBN 0-321-29535-
We distinguish two special nodes in G: a source node s and a sink node t. For each i in V we denote
by E(i) all the arcs emanating from node i. Let U = max u_ij over (i,j) in E.
Given a flow x we are able to construct the residual network with respect to this flow
according to the following intuitive idea. Suppose that an edge (i,j) in E carries x_ij units of flow. We
define the residual capacity of the edge (i,j) as r_ij = u_ij – x_ij. This means that we can send an
additional r_ij units of flow from vertex i to vertex j. We can also cancel the existing flow x_ij on the
arc by sending up to x_ij units of flow from j to i.
So, given a feasible flow x we define the residual network with respect to the flow x as follows.
Suppose we have a network G = (V, E). A feasible solution x engenders a new (residual) network,
which we define by G_x = (V, E_x), where E_x is a set of residual edges corresponding to the feasible
solution x.
What is E_x? We replace each arc (i,j) in E by two arcs (i,j), (j,i): the arc (i,j) has (residual) capacity
r_ij = u_ij – x_ij, and the arc (j,i) has (residual) capacity r_ji = x_ij. Then we construct the set E_x
from the new arcs with positive residual capacity.
Theorem 1 (Augmenting Path Theorem). A flow x* is a maximum flow if and only if the residual
network Gx* contains no augmenting path.
According to the theorem we obtain a method of finding a maximal flow. The method proceeds
by identifying augmenting paths and augmenting flows on these paths until the network contains
no such path. All algorithms that we wish to discuss differ only in the way of finding augmenting
paths.
As to why these assumptions are correct we leave the proof to the reader.
It is easy to determine that the method described above works correctly. Under assumption 2, on
each augmenting step we increase the flow value by at least one unit. We (usually) start with zero
flow. The maximum flow value is bounded from above, according to assumption 3. This reasoning
implies the finiteness of the method.
With those preparations behind us, we are ready to begin discussing the algorithms.
In 1972 Edmonds and Karp, and in 1970 Dinic, independently proved that if each augmenting
path is a shortest one, the algorithm will perform O(nm) augmentation steps. A shortest path
(the length of each edge being equal to one) can be found with the help of the breadth-first search
(BFS) algorithm. The Shortest Augmenting Path Algorithm is well known and widely discussed in many
books and articles, which is why we will not describe it in great detail. The idea is to start with
zero flow and, while the residual network contains a path from s to t, find a shortest such path
with BFS and augment the flow along it.
Since BFS requires O(m) operations in the worst case, one obtains O(nm^2) time complexity for the
algorithm itself. If m ~ n^2 then one must use the BFS procedure O(n^3) times in the worst case.
There are some networks on which this number of augmentation steps is actually achieved. We will
show one simple example below.
Improved Shortest Augmenting Path Algorithm, O(n^2 m)
As mentioned earlier, the natural approach for finding any shortest augmenting path would be to
look for paths by performing a breadth-first search in the residual network. It
requires O(m) operations in the worst case and imposes O(nm^2) complexity on the maximum flow
algorithm. Ahuja and Orlin improved the shortest augmenting path algorithm in 1987. They
exploited the fact that the minimum distance from any node i to the sink node t is monotonically
nondecreasing over all augmentations and reduced the average time per augmentation to O(n).
The improved version of the augmenting path algorithm, then, runs in O(n^2 m) time.
So, the improved shortest augmenting path algorithm consists of four steps (procedures): main
cycle, advance, retreat and augment. The algorithm maintains a partial admissible path, i.e., a
path from s to some node i, consisting of admissible arcs. It performs advance or retreat steps
from the last node of the partial admissible path (such node is called current node). If there is
some admissible arc (i,j) from current node i, then the algorithm performs the advance step and
adds the arc to the partial admissible path. Otherwise, it performs the retreat step, which
increases distance label of i and backtracks by one arc.
If the partial admissible path reaches the sink, we perform an augmentation. The algorithm stops
when d(s) ≥ n. Let’s describe these steps in pseudo-code. We denote the residual (with respect to
flow x) arcs emanating from node i by E_x(i). More formally, E_x(i) = { (i,j) in E(i): r_ij > 0 }.
In line 1 of the retreat procedure, if E_x(i) is empty, then suppose d(i) equals n.
Ahuja and Orlin suggest the following data structure for this algorithm [1]. We maintain the arc
list E(i) which contains all the arcs emanating from node i. We arrange the arcs in this list in any
fixed order. Each node i has a current arc, which is an arc in E(i) and is the next candidate for
admissibility testing. Initially, the current arc of node i is the first arc in E(i). In line 5 the algorithm
tests whether the node’s current arc is admissible. If not, it designates the next arc in the list as
the current arc. The algorithm repeats this process until either it finds an admissible arc or
reaches the end of the arc list. In the latter case the algorithm declares that E(i) contains no
admissible arc; it again designates the first arc in E(i) as the current arc of node i and performs
the relabel operation by calling the retreat procedure (line 10).
Now we outline a proof that the algorithm runs in O(n^2 m) time.
Lemma 1. The algorithm maintains valid distance labels at each step. Moreover, each relabel (or
retreat) step strictly increases the distance label of a node.
Sketch of proof. Perform induction on the number of relabel operations and augmentations.
Lemma 2. The distance label of each node increases at most n times. Consequently, the relabel
operation is performed at most n^2 times.
Proof. This lemma is a consequence of lemma 1 and the fact that if d(s) ≥ n then the residual
network contains no augmenting path.
Since the improved shortest augmenting path algorithm makes augmentations along
shortest paths (like the unimproved one), the total number of augmentations is the same, O(nm).
Each retreat step relabels a node, which is why the number of retreat steps is O(n^2) (according to
lemma 2). The time to perform the retreat/relabel steps is O( n ∑_{i in V} |E(i)| ) = O(nm). Since one
augmentation requires O(n) time, the total augmentation time is O(n^2 m). The total time of the
advance steps is bounded by the augmentation time plus the retreat/relabel time, and it is
again O(n^2 m). We obtain the following result:
Theorem 2. The improved shortest augmenting path algorithm runs in O(n^2 m) time.
Ahuja and Orlin suggest one very useful practical improvement of the algorithm. Since the
algorithm performs many useless relabel operations after the maximum flow has been found, it
is better to add an additional termination criterion. Let’s introduce an additional
(n+1)-dimensional array, numbs, whose indices vary from 0 to n. The value numbs(k) is the number of
nodes whose distance label equals k. The algorithm initializes this array while computing the
initial distance labels using BFS. At this point, the positive entries in the array numbs are
consecutive (i.e., numbs(0), numbs(1), …, numbs(l) will be positive up to some index l and the
remaining entries will all be zero).
When the algorithm increases the distance label of a node from x to y, it subtracts 1 from numbs(x),
adds 1 to numbs(y) and checks whether numbs(x) = 0. If it does equal 0, the algorithm terminates.
This approach is a kind of heuristic, but it is really good in practice. As to why this approach
works we leave the proof to the reader (hint: show that the nodes i with d(i) > x and
nodes j with d(j) < x engender a cut and use the maximum-flow-minimum-cut theorem).
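The bookkeeping behind this termination test is small enough to show in isolation. The sketch below is only an illustration: the function name relabel and the label values are hypothetical, and it covers just the update of numbs and the gap check, not the surrounding advance and retreat logic.

```python
def relabel(d, numbs, node, new_label):
    """Move `node` from label d[node] to `new_label`; return True if a gap appeared."""
    old_label = d[node]
    numbs[old_label] -= 1
    numbs[new_label] += 1
    d[node] = new_label
    # No node is left at the old distance: the algorithm may stop.
    return numbs[old_label] == 0


# Hypothetical distance labels for a 5-node network (node 4 is the sink).
n = 5
d = [3, 2, 2, 1, 0]
numbs = [0] * (n + 1)
for label in d:
    numbs[label] += 1

first = relabel(d, numbs, 1, 3)    # node 2 still carries label 2, so no gap yet
second = relabel(d, numbs, 2, 4)   # label 2 is now unused, so a gap appears
```

Each relabel costs constant extra time, which is why the check is cheap compared with the relabel operations it saves.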
Comparison of Improved and Unimproved versions
In this section we identify the worst case for both shortest augmenting path algorithms with the
purpose of comparing their running times.
In the worst case both the improved and unimproved algorithms will perform O(n^3) augmentations
if m ~ n^2. Norman Zadeh developed some examples on which this running time is based. Using
his ideas we compose a somewhat simpler network on which the algorithms have to
perform O(n^3) augmentations and which does not depend on the choice of the next path.
Figure 1. Worst case example for the shortest augmenting path algorithm.
All vertices except s and t are divided into four subsets: S = {s1, …, sk}, T = {t1, …, tk},
U = {u1, …, u2p} and V = {v1, …, v2p}. Both sets S and T contain k nodes while both
sets U and V contain 2p nodes; k and p are fixed integers. Each bold arc (connecting S and T) has
unit capacity. Each dotted arc has infinite capacity. The other arcs (which are solid and not
straight) have capacity k.
First, the shortest augmenting path algorithm has to augment flow k^2 times along paths (s, S, T, t),
which have length 3. The capacities of these paths are unit. After that the residual
network will contain reversal arcs (T, S) and the algorithm will choose another k^2 augmenting paths
(s, u1, u2, T, S, v2, v1, t) of length 7. Then the algorithm will have to choose paths (s, u1, u2, u3, u4,
S, T, v4, v3, v2, v1, t) of length 11, and so on.
Now let’s calculate the parameters of our network. The number of vertices is n = 2k + 4p + 2. The
number of edges is m = k^2 + 2pk + 2k + 4p. As is easy to see, the number of augmentations is
a = k^2 (p+1).
Suppose that p = k – 1. In this case n = 6k – 2 and a = k^3. So, one can verify that a ~ n^3 / 216. In [4]
Zadeh presents examples of networks that require n^3 / 27 and n^3 / 12 augmentations, but these
examples depend on the choice of the next path. A more revealing comparison is waiting for us at
the end of the article.
Maximum Capacity Path Algorithm, O(n^2 m log nU) / O(m^2 log nU log n) / O(m^2 log nU log U)
In 1972 Edmonds and Karp developed another way to find an augmenting path. At each step they
tried to increase the flow with the maximum possible amount. Another name of this algorithm is
“gradient modification of the Ford-Fulkerson method.” Instead of using BFS to identify a shortest
path, this modification uses Dijkstra’s algorithm to establish a path with the maximal possible
capacity. After augmentation, the algorithm finds another such path in the residual network,
augments flow along it, and repeats these steps until the flow is maximal.
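The path search used by this modification can be sketched as a Dijkstra-style search that maximizes the bottleneck instead of minimizing the length. The code below is an illustration under stated assumptions: residual capacities are given as a dictionary keyed by arc, node names are mutually comparable (for example all strings), and the name widest_path is hypothetical.

```python
import heapq


def widest_path(residual, source, sink):
    """Return (capacity, path) of a maximum-capacity source-sink path, or (0, [])."""
    adjacency = {}
    for (u, v), r in residual.items():
        if r > 0:
            adjacency.setdefault(u, []).append((v, r))
    best = {source: float("inf")}      # best bottleneck known for each node
    parent = {source: None}
    heap = [(-best[source], source)]   # max-heap via negated keys
    done = set()
    while heap:
        neg_width, u = heapq.heappop(heap)
        if u in done:
            continue                   # stale heap entry
        done.add(u)
        if u == sink:
            break
        for v, r in adjacency.get(u, []):
            width = min(-neg_width, r)
            if width > best.get(v, 0):
                best[v] = width
                parent[v] = u
                heapq.heappush(heap, (-width, v))
    if sink not in done:
        return 0, []
    path = [sink]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


# Hypothetical residual network.
residual = {("s", "a"): 4, ("a", "t"): 3, ("s", "b"): 10,
            ("b", "t"): 2, ("a", "b"): 5}
width, path = widest_path(residual, "s", "t")
```

In the full algorithm this search would be repeated on the updated residual network after every augmentation.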
There’s no doubt that the algorithm is correct when the capacities are integral. However, there are
tests with non-integral arc capacities on which the algorithm may fail to terminate.
Let’s get the algorithm’s running time bound by starting with one lemma. To understand the
proof one should remember that the value of any flow is less than or equal to the capacity of any
cut in a network . Let’s denote capacity of a cut (S,T) by c(S,T).
Lemma 3. Let F be the maximum flow’s value, then G contains augmenting path with capacity not
less than F/m.
Proof. Suppose G contains no such path. Let’s construct the set E’ = { (i,j) in E: u_ij ≥ F/m }. Consider
the network G’ = (V, E’), which has no path from s to t. Let S be the set of nodes reachable
from s in G’ and T = V \ S. Evidently, (S, T) is a cut and c(S, T) ≥ F. But the cut (S, T) intersects only
those edges (i,j) in E which have u_ij < F/m, and there are at most m of them. So, it is clear that
c(S, T) < m · (F/m) = F, which contradicts c(S, T) ≥ F.
Let f_i denote the capacity of the i-th augmenting path found by the algorithm, and let
F_i = f_1 + f_2 + … + f_i. Let F* be the maximum flow’s value. By lemma 3, applied to the residual
network, one can justify that f_i ≥ (F* – F_(i-1)) / m.
Now we can estimate the difference between the value of the maximal flow and the flow
after i consecutive augmentations:
F* – F_i = F* – F_(i-1) – f_i ≤ F* – F_(i-1) – (F* – F_(i-1)) / m = (1 – 1/m) (F* – F_(i-1)) ≤ … ≤ (1 – 1/m)^i · F*
We have to find an integer i which gives (1 – 1/m)^i · F* < 1. One can check that
i = O(m log F*) = O(m log(nU)) suffices, because F* ≤ nU.
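The size of the required i can be checked numerically. The sketch below simply iterates the recurrence until the remaining amount drops below 1 and compares the result with m ln F*; the function name and the sample values m = 10 and F* = 1000 are hypothetical.

```python
import math


def augmentations_needed(m, max_flow_value):
    """Smallest i with (1 - 1/m)**i * F* < 1, found by direct iteration."""
    remaining = float(max_flow_value)
    i = 0
    while remaining >= 1:
        remaining *= 1 - 1 / m
        i += 1
    return i


m, f_star = 10, 1000
steps = augmentations_needed(m, f_star)
# Since -ln(1 - 1/m) > 1/m, the smallest such i is at most m * ln(F*) + 1.
bound = m * math.log(f_star)
```

With F* ≤ nU this gives the O(m log(nU)) bound on the number of augmentations used in the running time estimates below.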
To find a path with the maximal capacity we use Dijkstra’s algorithm, which incurs additional
expense at every iteration. Since a simple realization of Dijkstra’s algorithm [2]
incurs O(n^2) complexity, the total running time of the maximum capacity path algorithm
is O(n^2 m log(nU)).
Using a heap implementation of Dijkstra’s algorithm for sparse networks [7] with running
time O(m log n), one can obtain an O(m^2 log n log(nU)) algorithm for finding the maximum flow. It
seems to be better than the improved Edmonds-Karp algorithm. However, this estimate is very
deceptive.
There is another way to find the maximum capacity path. One can use binary search to
establish such a path. Let’s start by looking for the maximum capacity path on the interval [0, U].
If there is some path with capacity U/2, then we continue looking for the path on the interval
[U/2, U]; otherwise, we try to find the path on [0, U/2-1]. This approach incurs additional
O(m log U) expense and gives an O(m^2 log(nU) log U) time bound for the maximum flow algorithm.
However, it works really poorly in practice.
Capacity Scaling Algorithm, O(m^2 log U)
In 1985 Gabow described the so-called “bit-scaling” algorithm. The similar capacity scaling
algorithm described in this section is due to Ahuja and Orlin [1].
Informally, the main idea of the algorithm is to augment the flow along paths with sufficiently large
capacities, instead of augmenting along paths with maximal capacities. More formally, let’s introduce
a parameter Delta. At first, Delta is quite a large number that, for instance, equals U. The algorithm
tries to find an augmenting path with capacity not less than Delta, then augments flow along this
path and repeats this procedure while any such Delta-path exists in the residual network.
The algorithm either establishes a maximum flow or reduces Delta by a factor of 2 and continues
finding paths and augmenting flow with the new Delta. The phase of the algorithm that augments
flow along paths with capacities at least Delta is called Delta-scaling phase or, Delta-phase. Delta
is an integral value and, evidently, algorithm performs O(logU) Delta-phases. When Delta is equal
to 1 there is no difference between the capacity scaling algorithm and the Edmonds-Karp
algorithm, which is why the algorithm works correctly.
We can obtain a path with the capacity of at least Delta fairly easily - in O(m) time (by using BFS).
At the first phase we can set Delta to equal either U or the largest power of 2 that doesn’t exceed
U.
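The Delta-scaling phases can be sketched as follows. This is a minimal illustration, not the implementation used in the tests discussed later: it assumes non-negative integer capacities stored in a dictionary keyed by arc, starts with the largest power of 2 that does not exceed U, and finds each Delta-path with a depth-first search. The names capacity_scaling_max_flow and find_path are hypothetical.

```python
from collections import defaultdict


def capacity_scaling_max_flow(capacity, source, sink):
    """Maximum flow for non-negative integer capacities via Delta-scaling phases."""
    residual = defaultdict(int)
    neighbours = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        neighbours[u].add(v)
        neighbours[v].add(u)

    def find_path(delta):
        """Depth-first search using only arcs with residual capacity >= delta."""
        parent = {source: None}
        stack = [source]
        while stack:
            u = stack.pop()
            if u == sink:
                path = []
                while parent[u] is not None:
                    path.append((parent[u], u))
                    u = parent[u]
                return path
            for v in neighbours[u]:
                if v not in parent and residual[(u, v)] >= delta:
                    parent[v] = u
                    stack.append(v)
        return None

    largest = max(capacity.values(), default=0)
    delta = 1
    while delta * 2 <= largest:
        delta *= 2                     # largest power of 2 not exceeding U
    flow = 0
    while delta >= 1:
        path = find_path(delta)
        while path:                    # one Delta-phase
            bottleneck = min(residual[edge] for edge in path)
            for u, v in path:
                residual[(u, v)] -= bottleneck
                residual[(v, u)] += bottleneck
            flow += bottleneck
            path = find_path(delta)
        delta //= 2
    return flow


# Hypothetical 4-node network.
capacity = {("A", "B"): 1000, ("A", "C"): 1000, ("B", "C"): 1,
            ("B", "D"): 1000, ("C", "D"): 1000}
result = capacity_scaling_max_flow(capacity, "A", "D")
```

In the last phase Delta equals 1, so every remaining augmenting path is found and the result is a maximum flow.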
The proof of the following lemma is left to the reader.
Lemma 4. At every Delta-phase the algorithm performs O(m) augmentations in worst case.
Sketch of proof. Use induction on Delta to show that the minimum cut at each Delta-scaling
phase is less than 2m·Delta.
Applying lemma 4 yields the following result:
Theorem 4. The running time of the capacity scaling algorithm is O(m^2 log U).
Keep in mind that in theory there is no difference between using breadth-first search and
depth-first search when finding an augmenting path. However, in practice, there is a big difference,
and we will see it.
All other operations, as before, require O(nm) time. This reasoning instantly yields a bound
of O(nm log U) on the running time of the improved capacity scaling algorithm.
Unfortunately, this improvement hardly decreases the running time of the algorithm in practice.
Let’s start with the first group of tests. These are 564 sparse networks with the number of vertices
limited to 2000 (otherwise, all algorithms work too fast). All working times are given in
milliseconds.
Figure 3. Comparison on sparse networks. 564 test cases. m ≤ n^1.4.
As you can see, it was a big mistake to try the maximum capacity path algorithm with Dijkstra’s
algorithm without a heap on sparse networks (and it’s not surprising); however, its heap
implementation works rather faster than expected. Both capacity scaling algorithms (using
DFS and BFS) take approximately the same time, while the improved implementation is
almost 2 times faster. Surprisingly, the improved shortest path algorithm turned out to be the
fastest on sparse networks.
Now let’s look at the second group of test cases. It is made of 184 tests with middle density. All
networks are limited to 400 nodes.
It is very interesting to see how these algorithms run on dense networks. Let’s take a look — the
third group is made up of 200 dense networks limited by 400 vertexes.
Without any doubt, the improved implementation of Edmonds-Karp algorithm wins the game.
Second place is taken by the improved scaling capacity algorithm. And the scaling capacity with
BFS got bronze.
As to maximum capacity path, it is better to use one variant with heap; on sparse networks it
gives very good results. Other algorithms are really only good for theoretical interest.
As you can see, the O(nm log U) algorithm isn’t so fast. It is even slower than the O(n^2 m) algorithm.
The O(nm^2) algorithm (it is the most popular) has worse time bounds, but it works much faster
I would like to finish the article with the full implementation of the improved shortest augmenting
path algorithm. To store the network I use an adjacency matrix, because it makes the algorithm
easiest to follow. This is not the same implementation that was used in our practical
analysis: with the matrix it works a little slower than the version that uses adjacency lists.
However, it works faster on dense networks, and it is up to the reader which data structure is best
for them.
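#include <cstdio>
// Parts of this listing were lost; the gaps are a best-effort reconstruction.
const int N = 2007, oo = 1000000000;  // node limit (nodes are 1..n), infinity
int n, m, source, sink;
int G[N][N];                   // residual capacities (adjacency matrix)
int pi[N], CurrentNode[N];     // predecessor list; current arc of each node
int queue[N], d[N], numbs[N];  // BFS queue; distances; numbs[k] = #nodes with d == k
// Reverse BFS from the sink to establish the exact distance function d
int rev_BFS() {
  int i, j, head(0), tail(0);
  for (i = 1; i <= n; i++) numbs[d[i] = n]++;  // initially all d[i] = n
  numbs[n]--;
  d[sink] = 0;
  numbs[0]++;
  queue[ ++tail ] = sink;
  while (head != tail) {
    i = queue[ ++head ];
    for (j = 1; j <= n; j++) {
      // If j was reached before or there is no arc (j, i), then continue
      if (d[j] < n || G[j][i] == 0) continue;
      queue[ ++tail ] = j;
      numbs[n]--;
      d[j] = d[i] + 1;
      numbs[d[j]]++;
    }
  }
  return 0;
}
// Augment the flow along the path stored in pi[]
int Augment() {
  int i, j, tmp, width(oo);
  for (i = sink, j = pi[i]; i != source; i = j, j = pi[j]) {
    tmp = G[j][i];
    if (tmp < width) width = tmp;  // bottleneck capacity of the path
  }
  // Augmentation itself
  for (i = sink, j = pi[i]; i != source; i = j, j = pi[j]) {
    G[j][i] -= width;
    G[i][j] += width;
  }
  return width;
}
// Relabel node i, then backtrack (i is passed by reference)
int Retreat(int &i) {
  int tmp, j, mind(n-1);
  // Check all arcs leaving i to find nearest
  for (j = 1; j <= n; j++)
    // If there is an arc and j is "nearer"
    if (G[i][j] > 0 && d[j] < mind)
      mind = d[j];
  tmp = d[i];                  // save the previous distance
  numbs[d[i]]--;
  d[i] = 1 + mind;
  numbs[d[i]]++;
  if (i != source) i = pi[i];  // backtrack, if possible
  return numbs[tmp];           // zero means a gap: the flow is maximal
}
// Main procedure
int find_max_flow() {
  int flow(0), i, j;
  rev_BFS();
  for (i = 1; i <= n; i++) CurrentNode[i] = 1;
  i = source;
  // The main cycle (while the source is not "far" from the sink)
  while (d[source] < n) {
    for (j = CurrentNode[i]; j <= n; j++)
      // If the arc exists in the residual network and if it is admissible
      if (G[i][j] > 0 && d[i] == d[j] + 1)
        break;
    if( j <= n ) {
      CurrentNode[i] = j;
      pi[j] = i;
      i = j; // Go forward
      if (i == sink) { flow += Augment(); i = source; }
    }
    else {
      CurrentNode[i] = 1;
      if( Retreat(i) == 0 ) break;
    }
  }
  return flow;
}
// The main function
// Input: "n m source sink", then m arcs "p q r" (p -> q with capacity r)
int main() {
  int i, p, q, r;
  if (scanf("%d %d %d %d", &n, &m, &source, &sink) != 4) return 1;
  for (i = 0; i < m; i++) {
    if (scanf("%d %d %d", &p, &q, &r) != 3) return 1;
    G[p][q] += r;
  }
  printf("%d", find_max_flow());
  return 0;
}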
References
[1] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows: Theory,
Algorithms, and Applications.
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest. Introduction to Algorithms.
[3] Ford, L. R., and D. R. Fulkerson. Maximal flow through a network.
[4] Norman Zadeh. Theoretical Efficiency of the Edmonds-Karp Algorithm for Computing Maximal
Flows.
[5] _efer_. Algorithm Tutorial: MaximumFlow.
[6] gladius. Algorithm Tutorial: Introduction to graphs and their data structures: Section 1.
[7] gladius. Algorithm Tutorial: Introduction to graphs and their data structures: Section 3.
[8] http://elib.zib.de/pub/mp-testdata/generators/index.html -- A number of generators for
network flow problems.
Contents
1 Definition
2 Relation to other problems
3 Usage
4 Solutions
5 External resources
6 References
Definition
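Consider a flow network $G(V, E)$, where edge $(u, v) \in E$ has capacity $c(u, v)$. There are
$k$ commodities $K_1, K_2, \dots, K_k$, defined by $K_i = (s_i, t_i, d_i)$, where $s_i$ and $t_i$ are
the source and sink of commodity $i$, and $d_i$ is the demand. The flow of commodity $i$ along
edge $(u, v)$ is $f_i(u, v)$. Find an assignment of flow which satisfies the constraints:
Capacity constraints: $\sum_{i=1}^{k} f_i(u, v) \le c(u, v)$ for every edge $(u, v) \in E$
Flow conservation: $\sum_{w \in V} f_i(u, w) - \sum_{w \in V} f_i(w, u) = 0$ for every vertex $u \ne s_i, t_i$
Demand satisfaction: $\sum_{w \in V} f_i(s_i, w) - \sum_{w \in V} f_i(w, s_i) = d_i$
In the maximum concurrent flow problem, the task is to maximise the minimal fraction
of the flow of each commodity to its demand: $\min_{1 \le i \le k} \frac{1}{d_i} \left( \sum_{w \in V} f_i(s_i, w) - \sum_{w \in V} f_i(w, s_i) \right)$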
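The three constraints can be checked mechanically for a candidate assignment. The sketch below is an illustration only: `Commodity`, `Matrix`, and `is_feasible` are hypothetical names, flows are assumed to be non-negative per-edge values stored in dense matrices, and conservation is written as outflow minus inflow at each vertex.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only.
struct Commodity { int s, t; double demand; };
using Matrix = std::vector<std::vector<double>>;

// flow[i][u][v] is the flow of commodity i on edge (u, v); cap[u][v] is the capacity.
bool is_feasible(const Matrix& cap, const std::vector<Commodity>& ks,
                 const std::vector<Matrix>& flow, double eps = 1e-9) {
    int n = (int)cap.size();
    // Capacity constraints: all commodities share the capacity of each edge.
    for (int u = 0; u < n; ++u)
        for (int v = 0; v < n; ++v) {
            double total = 0;
            for (const Matrix& f : flow) {
                if (f[u][v] < -eps) return false;  // flows assumed non-negative
                total += f[u][v];
            }
            if (total > cap[u][v] + eps) return false;
        }
    // Flow conservation and demand satisfaction, checked per commodity.
    for (std::size_t i = 0; i < ks.size(); ++i)
        for (int u = 0; u < n; ++u) {
            double net = 0;  // outflow minus inflow of commodity i at u
            for (int w = 0; w < n; ++w) net += flow[i][u][w] - flow[i][w][u];
            double want = (u == ks[i].s) ? ks[i].demand
                        : (u == ks[i].t) ? -ks[i].demand : 0.0;
            if (net > want + eps || net < want - eps) return false;
        }
    return true;
}
```

A checker like this only verifies a given assignment; finding one is the hard part and is what the linear programming and approximation approaches address.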
Usage
Routing and wavelength assignment (RWA) in optical burst switching of optical
networks can be approached via multi-commodity flow formulations.
Solutions
In the decision version of the problem, producing an integer flow
satisfying all demands is NP-complete,[1] even for only two commodities and unit
capacities (making the problem strongly NP-complete in this case).
If fractional flows are allowed, the problem can be solved in polynomial time
through linear programming,[2] or through (typically much faster) fully polynomial
time approximation schemes.[3]
The maximum flow problem can be seen as a special case of more complex network flow problems,
such as the circulation problem. The maximum value of an s-t flow (i.e., flow from source s to sink t)
is equal to the minimum capacity of an s-t cut (i.e., cut severing s from t) in the network, as stated in
the max-flow min-cut theorem.
Contents
1 History
2 Definition
3 Solutions
4 Integral flow theorem
5 Application
o 5.1 Multi-source multi-sink maximum flow problem
o 5.2 Minimum path cover in directed acyclic graph
o 5.3 Maximum cardinality bipartite matching
o 5.4 Maximum flow problem with vertex capacities
o 5.5 Maximum edge-disjoint path
o 5.6 Maximum independent (vertex-disjoint) path
6 Real world applications
o 6.1 Baseball Elimination
o 6.2 Airline scheduling
o 6.3 Circulation-demand problem
o 6.4 Fairness in car sharing (carpool)
7 See also
8 References
9 Further reading
History
The maximum flow problem was first formulated in 1954 by T. E. Harris and F. S. Ross as a
simplified model of Soviet railway traffic flow.[1][2][3] In 1955, Lester R. Ford, Jr. and Delbert R.
Fulkerson created the first known algorithm, the Ford–Fulkerson algorithm.[4][5]
Over the years, various improved solutions to the maximum flow problem were discovered, notably
the shortest augmenting path algorithm of Edmonds and Karp and independently Dinitz; the blocking
flow algorithm of Dinitz; the push-relabel algorithm of Goldberg and Tarjan; and the binary blocking
flow algorithm of Goldberg and Rao. The electrical flow algorithm of Christiano, Kelner, Madry, and
Spielman finds an approximately optimal maximum flow but only works in undirected graphs.[6][7]
Definition
A flow network, with source s and sink t. The numbers next to the edge are the capacities.
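Let $N = (V, E)$ be a network with $s, t \in V$ being the source and the sink of $N$ respectively.
Each edge $(u, v) \in E$ has a capacity $c_{uv} \ge 0$, and a flow assigns a value $f_{uv} \ge 0$ to
each edge, subject to the following two constraints:
1. $f_{uv} \le c_{uv}$, for each $(u, v) \in E$ (capacity constraint: the flow of an edge cannot exceed its
capacity)
2. $\sum_{u : (u, v) \in E} f_{uv} = \sum_{u : (v, u) \in E} f_{vu}$, for each $v \in V \setminus \{s, t\}$ (conservation of flows: the sum of
the flows entering a node must equal the sum of the flows exiting a node, except for the
source and the sink nodes)
The value of flow is defined by $|f| = \sum_{v : (s, v) \in E} f_{sv}$, where $s$ is the source of $N$. It represents
the amount of flow passing from the source to the sink.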
Solutions
We can define the Residual Graph, which provides a systematic way to search for
forward-backward operations in order to find the maximum flow.
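Given a flow network $G$, and a flow $f$ on $G$, we define the residual graph $G_f$ of $G$ with
respect to $f$ as follows.
1. The node set of $G_f$ is the same as that of $G$.
2. Each edge $e = (u, v)$ of $G$ appears in $G_f$ with capacity $c_e - f_e$.
3. For each edge $e = (u, v)$ of $G$, the reverse edge $e' = (v, u)$ appears in $G_f$ with capacity $f_e$.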
The following table lists algorithms for solving the maximum flow problem.
Method                    Complexity    Description
Edmonds–Karp algorithm    O(VE^2)       A specialization of Ford–Fulkerson, finding augmenting paths with breadth-first search.
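As an illustration of the Edmonds–Karp entry, here is a minimal sketch that repeatedly finds a shortest augmenting path in the residual graph with breadth-first search. The names `Matrix`, `edmonds_karp`, and `min_cut_capacity` are assumptions made for this example, the network is a dense capacity matrix, and the source and sink are assumed to be distinct. The second function computes the capacity of the cut formed by the vertices still reachable from the source, which by the max-flow min-cut theorem should equal the flow value.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Edmonds-Karp on a residual-capacity matrix, which is modified in place.
int edmonds_karp(Matrix& res, int s, int t) {
    int n = (int)res.size(), flow = 0;
    while (true) {
        std::vector<int> parent(n, -1);
        parent[s] = s;
        std::queue<int> q;
        q.push(s);
        // BFS finds an augmenting path with the fewest edges.
        while (!q.empty() && parent[t] == -1) {
            int u = q.front();
            q.pop();
            for (int v = 0; v < n; ++v)
                if (parent[v] == -1 && res[u][v] > 0) { parent[v] = u; q.push(v); }
        }
        if (parent[t] == -1) return flow;  // no augmenting path is left
        int width = res[parent[t]][t];
        for (int v = t; v != s; v = parent[v])
            width = std::min(width, res[parent[v]][v]);
        for (int v = t; v != s; v = parent[v]) {
            res[parent[v]][v] -= width;  // forward residual capacity shrinks
            res[v][parent[v]] += width;  // reverse residual capacity grows
        }
        flow += width;
    }
}

// Capacity of the cut (S, V \ S), where S is the set of vertices reachable
// from s in the final residual graph.
int min_cut_capacity(const Matrix& cap, const Matrix& res, int s) {
    int n = (int)cap.size();
    std::vector<char> inS(n, 0);
    std::vector<int> stack{s};
    inS[s] = 1;
    while (!stack.empty()) {
        int u = stack.back();
        stack.pop_back();
        for (int v = 0; v < n; ++v)
            if (!inS[v] && res[u][v] > 0) { inS[v] = 1; stack.push_back(v); }
    }
    int total = 0;
    for (int u = 0; u < n; ++u)
        for (int v = 0; v < n; ++v)
            if (inS[u] && !inS[v]) total += cap[u][v];
    return total;
}
```

To use it, copy the capacity matrix into a residual matrix, run `edmonds_karp` on the copy, and pass both matrices to `min_cut_capacity`.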
If each edge in a flow network has integral capacity, then there exists an integral
maximal flow.
Application
Multi-source multi-sink maximum flow problem
Given a network N = (V,E) with a set of sources S = {s1, ..., sn} and a set of sinks T =
{t1, ..., tm} instead of only one source and one sink, we are to find the maximum flow
across N. We can transform the multi-source multi-sink problem into a maximum
flow problem by adding a consolidated source connecting to each vertex in S and
a consolidated sink connected by each vertex in T (also known
as supersource and supersink) with infinite capacity on each edge (See Fig. 4.1.1.).
Then it can be shown, via König's theorem, that G' has a matching of size m if and
only if there exist n − m vertex-disjoint paths that cover each vertex in G, where n is
the number of vertices in G. Therefore, the problem can be solved by finding the
maximum cardinality matching in G' instead.
Then the value of the maximum flow in N is equal to the size of the maximum
matching in G.
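Because every edge in the matching network has unit capacity, each augmenting path adds exactly one matched edge, so the flow computation can be specialized to a direct augmenting-path matching routine. The sketch below is one such specialization (often attributed to Kuhn) and is an assumption of this example rather than the construction used in the text; `try_augment` and `max_bipartite_matching` are hypothetical names.

```cpp
#include <vector>

// adj[a] lists the vertices of side B adjacent to vertex a of side A.
// match_b[b] is the A-vertex currently matched to b, or -1 if b is free.
bool try_augment(int a, const std::vector<std::vector<int>>& adj,
                 std::vector<int>& match_b, std::vector<char>& seen) {
    for (int b : adj[a]) {
        if (seen[b]) continue;
        seen[b] = 1;
        // Either b is free, or its current partner can be re-matched elsewhere.
        if (match_b[b] == -1 || try_augment(match_b[b], adj, match_b, seen)) {
            match_b[b] = a;
            return true;
        }
    }
    return false;
}

// Returns the size of a maximum matching, which equals the maximum flow value
// in the corresponding unit-capacity network.
int max_bipartite_matching(const std::vector<std::vector<int>>& adj, int size_b) {
    std::vector<int> match_b(size_b, -1);
    int matched = 0;
    for (int a = 0; a < (int)adj.size(); ++a) {
        std::vector<char> seen(size_b, 0);
        if (try_augment(a, adj, match_b, seen)) ++matched;
    }
    return matched;
}
```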
In other words, the amount of flow passing through a vertex cannot exceed its
capacity. To find the maximum flow across $N$, we can transform the problem into
the maximum flow problem in the original sense by expanding $N$. First,
each $v \in V$ is replaced by $v_{\text{in}}$ and $v_{\text{out}}$, where $v_{\text{in}}$ is connected by edges going
into $v$ and $v_{\text{out}}$ is connected to edges coming out from $v$, then assign
capacity $c(v)$ to the edge connecting $v_{\text{in}}$ and $v_{\text{out}}$ (see Fig. 4.4.1, but note that it
has incorrectly swapped $v_{\text{in}}$ and $v_{\text{out}}$). In this expanded network, the vertex
capacity constraint is removed and therefore the problem can be treated as the
original maximum flow problem.
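The vertex-splitting transformation can be sketched as below. The indexing convention (vertex v becomes v for its "in" copy and v + n for its "out" copy) and the names `split_vertices`, `push`, and `max_flow` are assumptions of this example; the small DFS-based Ford–Fulkerson routine is included only to check the result on integer capacities.

```cpp
#include <algorithm>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Vertex v becomes v_in = v and v_out = v + n (a convention chosen for this sketch).
Matrix split_vertices(const Matrix& cap, const std::vector<int>& vertex_cap) {
    int n = (int)cap.size();
    Matrix g(2 * n, std::vector<int>(2 * n, 0));
    for (int v = 0; v < n; ++v) {
        g[v][v + n] = vertex_cap[v];   // v_in -> v_out carries the vertex capacity
        for (int w = 0; w < n; ++w)
            g[v + n][w] = cap[v][w];   // edge (v, w) becomes v_out -> w_in
    }
    return g;
}

// Small DFS-based Ford-Fulkerson, used only to check the transformation.
int push(Matrix& g, std::vector<char>& seen, int u, int t, int limit) {
    if (u == t) return limit;
    seen[u] = 1;
    for (int v = 0; v < (int)g.size(); ++v)
        if (!seen[v] && g[u][v] > 0) {
            int got = push(g, seen, v, t, std::min(limit, g[u][v]));
            if (got > 0) { g[u][v] -= got; g[v][u] += got; return got; }
        }
    return 0;
}

int max_flow(Matrix g, int s, int t) {
    int flow = 0;
    while (true) {
        std::vector<char> seen(g.size(), 0);
        int got = push(g, seen, s, t, 1 << 30);
        if (got == 0) return flow;
        flow += got;
    }
}
```

With n original vertices, the flow in the expanded network is computed from the "in" copy of the source to the "out" copy of the sink, for example `max_flow(split_vertices(cap, vcap), s, t + n)`.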
Then the value of the maximum flow is equal to the maximum number of
independent paths from s to t.
Let G = (V, E) be a network with s,t ∈ V being the source and the sink respectively.
We add a game node {i,j} with i < j to V for each pair of teams, and connect each of them from s by an
edge with capacity $r_{ij}$, which represents the number of plays between these two
teams. We also add a team node for each team and connect each game node {i,j}
to team nodes i and j to ensure one of them wins. We do not need to restrict
the flow value on these edges. Finally, we make edges from team node i to the sink
node t and set the capacity of $w_k + r_k - w_i$ to prevent team i from winning more
than $w_k + r_k$. Let S be the set of all teams participating in the league and
let $r(S - \{k\}) = \sum_{i, j \in S - \{k\},\, i < j} r_{ij}$. In this method it is claimed team k is not
eliminated if and only if a flow value of size $r(S - \{k\})$ exists in network G. In the
mentioned article it is proved that this flow value is the maximum flow value
from s to t.
Airline scheduling
In the airline industry a major problem is the scheduling of the flight crews. Airline
scheduling problem could be considered as an application of extended maximum
network flow. The input of this problem is a set of flights F which contains the
information about where and when each flight departs and arrives. In one version of
Airline Scheduling the goal is to produce a feasible schedule with at most k crews.
Let G = (V, E) be a network with s,t ∈ V as the source and the sink nodes. For the
source and destination of every flight i we add two nodes to V, node $s_i$ as the source
and node $d_i$ as the destination node of flight i. We also add the following edges to E:
In the mentioned method, it is claimed and proved that finding a flow value
of k in G between s and t is equivalent to finding a feasible schedule for flight set F with
at most k crews.[11]
Circulation-demand problem
There are some factories that produce goods and some villages where the goods
have to be delivered. They are connected by a network of roads with each road
having a capacity for maximum goods that can flow through it. The problem is to
find if there is a circulation that satisfies the demand. This problem can be
transformed into a max-flow problem.
1. Add a source node $s$ and add edges from it to every factory node $f_i$ with
capacity $p_i$, where $p_i$ is the production rate of factory $f_i$.
2. Add a sink node $t$ and add edges from all villages $v_i$ to $t$ with capacity $d_i$,
where $d_i$ is the demand rate of village $v_i$.
Let G = (V, E) be this new network. There exists a circulation that satisfies the
demand if and only if:
Maximum flow value(G) = $\sum_{i} d_i$
If there exists a circulation, looking at the max-flow solution would give us the
answer as to how many goods have to be sent on a particular road to
satisfy the demands.
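The two construction steps and the final test can be sketched as follows. This is an illustration under assumptions, not the referenced method's code: `demand_satisfiable`, `push`, and `max_flow` are hypothetical names, production and demand are given as per-node arrays (zero for nodes that are neither factory nor village), and a simple DFS-based Ford–Fulkerson stands in for whichever max-flow algorithm is preferred.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Small DFS-based Ford-Fulkerson for integer capacities.
int push(Matrix& g, std::vector<char>& seen, int u, int t, int limit) {
    if (u == t) return limit;
    seen[u] = 1;
    for (int v = 0; v < (int)g.size(); ++v)
        if (!seen[v] && g[u][v] > 0) {
            int got = push(g, seen, v, t, std::min(limit, g[u][v]));
            if (got > 0) { g[u][v] -= got; g[v][u] += got; return got; }
        }
    return 0;
}

int max_flow(Matrix g, int s, int t) {
    int flow = 0;
    while (true) {
        std::vector<char> seen(g.size(), 0);
        int got = push(g, seen, s, t, 1 << 30);
        if (got == 0) return flow;
        flow += got;
    }
}

// road[u][v] is the road capacity; production[v] > 0 marks a factory and
// demand[v] > 0 marks a village.
bool demand_satisfiable(const Matrix& road, const std::vector<int>& production,
                        const std::vector<int>& demand) {
    int n = (int)road.size(), s = n, t = n + 1;
    Matrix g(n + 2, std::vector<int>(n + 2, 0));
    for (int u = 0; u < n; ++u) {
        for (int v = 0; v < n; ++v) g[u][v] = road[u][v];
        g[s][u] = production[u];  // step 1: source -> factory, capacity = production rate
        g[u][t] = demand[u];      // step 2: village -> sink, capacity = demand rate
    }
    int total_demand = std::accumulate(demand.begin(), demand.end(), 0);
    return max_flow(g, s, t) == total_demand;
}
```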
References
1. Schrijver, A. (2002). "On the history of the transportation and
maximum flow problems". Mathematical Programming 91 (3): 437–445.
doi:10.1007/s101070100259.
Contents
1 Definition
2 Relation to other problems
3 Solutions
4 Application
o 4.1 Minimum weight bipartite matching
5 See also
6 References
7 External links
Definition
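Consider a flow network, that is, a directed graph $G = (V, E)$ with source $s \in V$ and sink $t \in V$,
where edge $(u, v) \in E$ has capacity $c(u, v) > 0$, flow $f(u, v)$ and cost $a(u, v)$ (most
minimum-cost flow algorithms support edges with negative costs). The cost of sending this flow
is $f(u, v) \cdot a(u, v)$. You are required to send an amount of flow $d$ from $s$ to $t$.
The task is to minimize the total cost of the flow, $\sum_{(u, v) \in E} a(u, v) \cdot f(u, v)$, subject to the constraints:
Capacity constraints: $f(u, v) \le c(u, v)$
Skew symmetry: $f(u, v) = -f(v, u)$
Flow conservation: $\sum_{w \in V} f(u, w) = 0$ for all $u \ne s, t$
Required flow: $\sum_{w \in V} f(s, w) = d$ and $\sum_{w \in V} f(w, t) = d$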
With some solutions, finding the minimum cost maximum flow instead is straightforward. If
not, you can do a binary search on $d$.
A related problem is the minimum cost circulation problem, which can be used for solving
minimum cost flow. You do this by setting the lower bound on all edges to zero, and then
make an extra edge from the sink $t$ to the source $s$, with capacity $c(t, s) = d$ and lower
bound $l(t, s) = d$, forcing the total flow from $s$ to $t$ to also be $d$.
Solutions
The minimum cost flow problem can be solved by linear programming, since we optimize a
linear function, and all constraints are linear.
Apart from that, many combinatorial algorithms exist; for a comprehensive survey, see [1].
Some of them are generalizations of maximum flow algorithms, others use entirely different
approaches.
Application
Minimum weight bipartite matching
Reducing Minimum weight bipartite matching to minimum cost max flow problem
Given a bipartite graph G = (A ∪ B, E), we would like to find the maximum cardinality
matching in G that has minimum cost. Let w: E → R be a weight function on the edges of E.
The minimum weight bipartite matching problem or assignment problem is to find a perfect
matching M ⊆ E whose total weight is minimized. The idea is to reduce this problem to a
network flow problem.
Let G’ = (V’ = A ∪ B, E’ = E). Assign the capacity of all the edges in E’ to 1. Add a source
vertex s and connect it to all the vertices in A’ and add a sink vertex t and connect all
vertices inside group B’ to this vertex. The capacity of all the new edges is 1 and their cost
is 0. It is proved that there is a minimum weight perfect bipartite matching in G if and only if
there is a minimum cost flow in G’. [7]
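The reduction can be tried out with a small minimum cost flow routine. The sketch below uses successive shortest augmenting paths with Bellman–Ford, which is one of several possible algorithms and is an assumption of this example; the names `Edge`, `MinCostFlow`, and `min_weight_matching` are hypothetical, and the bipartite graph is assumed to be complete with weights given as a matrix `w[i][j]`.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Edge { int to, cap, cost; };

struct MinCostFlow {
    std::vector<Edge> edges;            // edges i and i^1 form a forward/reverse pair
    std::vector<std::vector<int>> adj;  // adj[u] holds indices into edges
    explicit MinCostFlow(int n) : adj(n) {}

    void add_edge(int u, int v, int cap, int cost) {
        adj[u].push_back((int)edges.size()); edges.push_back({v, cap, cost});
        adj[v].push_back((int)edges.size()); edges.push_back({u, 0, -cost});
    }

    // Sends as much flow as possible from s to t at minimum cost; returns {flow, cost}.
    std::pair<int, long long> run(int s, int t) {
        const long long INF = std::numeric_limits<long long>::max() / 4;
        int flow = 0;
        long long cost = 0;
        while (true) {
            std::vector<long long> dist(adj.size(), INF);
            std::vector<int> via(adj.size(), -1);
            dist[s] = 0;
            // Bellman-Ford, because reverse residual edges have negative cost.
            for (std::size_t round = 0; round + 1 < adj.size(); ++round) {
                bool changed = false;
                for (int u = 0; u < (int)adj.size(); ++u) {
                    if (dist[u] == INF) continue;
                    for (int id : adj[u]) {
                        const Edge& e = edges[id];
                        if (e.cap > 0 && dist[u] + e.cost < dist[e.to]) {
                            dist[e.to] = dist[u] + e.cost;
                            via[e.to] = id;
                            changed = true;
                        }
                    }
                }
                if (!changed) break;
            }
            if (dist[t] == INF) return {flow, cost};  // no augmenting path is left
            int width = std::numeric_limits<int>::max();
            for (int v = t; v != s; v = edges[via[v] ^ 1].to)
                width = std::min(width, edges[via[v]].cap);
            for (int v = t; v != s; v = edges[via[v] ^ 1].to) {
                edges[via[v]].cap -= width;
                edges[via[v] ^ 1].cap += width;
            }
            flow += width;
            cost += (long long)width * dist[t];
        }
    }
};

// Assignment problem: w[i][j] is the weight of the edge between a_i and b_j.
std::pair<int, long long> min_weight_matching(const std::vector<std::vector<int>>& w) {
    int n = (int)w.size(), m = (int)w[0].size();
    int s = n + m, t = n + m + 1;
    MinCostFlow g(n + m + 2);
    for (int i = 0; i < n; ++i) g.add_edge(s, i, 1, 0);      // source -> A, cost 0
    for (int j = 0; j < m; ++j) g.add_edge(n + j, t, 1, 0);  // B -> sink, cost 0
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) g.add_edge(i, n + j, 1, w[i][j]);
    return g.run(s, t);
}
```

The returned pair holds the matching size and its total weight; a perfect matching exists exactly when the size equals the number of vertices on each side.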
References
1. Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin (1993). Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc. ISBN 0-13-617549-X.
2. Morton Klein (1967). "A primal method for minimal cost flows with applications to the assignment and transportation problems". Management Science 14: 205–220. doi:10.1287/mnsc.14.3.205.
3. Andrew V. Goldberg and Robert E. Tarjan (1989). "Finding minimum-cost circulations by canceling negative cycles". Journal of the ACM 36 (4): 873–886. doi:10.1145/76359.76368.
4. Jack Edmonds and Richard M. Karp (1972). "Theoretical improvements in algorithmic efficiency for network flow problems". Journal of the ACM 19 (2): 248–264. doi:10.1145/321694.321699.
5. Andrew V. Goldberg and Robert E. Tarjan (1990). "Finding minimum-cost circulations by successive approximation". Math. Oper. Res. 15 (3): 430–466. doi:10.1287/moor.15.3.430.
6. James B. Orlin (1997). "A polynomial time primal network simplex algorithm for minimum cost flows". Mathematical Programming 78: 109–129. doi:10.1007/bf02614365.
Semi-infinite programming
In optimization theory, semi-infinite programming (SIP) is an optimization problem with a finite
number of variables and an infinite number of constraints, or an infinite number of variables and a
finite number of constraints. In the former case the constraints are typically parameterized.[1]
SIP can be seen as a special case of bilevel programs (multilevel programming) in which the lower-
level variables do not participate in the objective function.