Algorithms for Data Science
CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Thursday, September 24, 2015
Outline
1 Recap
2 Applications of DFS
Cycle detection
Topological sorting
Strongly connected components
Review of the last lecture
1. Applications of BFS
   - Connected components in undirected graphs
   - Testing bipartiteness
2. DFS
   - Classification of graph edges in directed graphs: back, forward, cross
   - Time intervals of vertices; identifying the type of an edge from the time intervals of its endpoints
Finding your way in a maze
Depth-first search (DFS): starting from a vertex s, explore the graph as deeply as possible, then backtrack.
1. Try the first edge out of s, towards some node v.
2. Continue from v until you reach a dead end, that is, a node whose neighbors have all been explored.
3. Backtrack to the most recent node with an unexplored neighbor and repeat step 2.
Remark: DFS answers s-t connectivity: there is a path from s to t if and only if DFS started at s discovers t.
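A minimal sketch of the search described above, using an explicit stack. It assumes the graph is a dict mapping each vertex to a list of its out-neighbors; the names `dfs_reachable` and `st_connected` are hypothetical helpers, not taken from the lecture.

```python
from typing import Dict, Hashable, List, Set


def dfs_reachable(adj: Dict[Hashable, List[Hashable]], s: Hashable) -> Set[Hashable]:
    """Hypothetical helper: return all vertices DFS discovers starting at s."""
    visited = {s}
    # Each stack entry holds a vertex and an iterator over its unexplored neighbors.
    stack = [(s, iter(adj.get(s, [])))]
    while stack:
        u, neighbors = stack[-1]
        for v in neighbors:
            if v not in visited:
                visited.add(v)
                stack.append((v, iter(adj.get(v, []))))  # go deeper from v
                break
        else:
            stack.pop()  # dead end: all neighbors of u explored, so backtrack
    return visited


def st_connected(adj: Dict[Hashable, List[Hashable]], s: Hashable, t: Hashable) -> bool:
    """Hypothetical helper: is there a path from s to t?"""
    return t in dfs_reachable(adj, s)
```

Because each stack entry keeps its own neighbor iterator, returning to a vertex resumes exactly where it left off, which is the backtracking of step 3.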
Directed graphs: classification of edges
Graph edges that do not belong to the DFS tree(s) may be
1. forward: from a vertex to a descendant (other than a child)
2. back: from a vertex to an ancestor
3. cross: from right to left (no ancestral relation), that is
   - from tree to tree, or
   - between nodes in the same tree but on different branches
On the time intervals of vertices u, v
If we use an explicit stack, then
- start(u) is the time when u is pushed onto the stack;
- finish(u) is the time when u is popped from the stack (that is, when all of its neighbors have been explored).
For any two vertices u, v, the intervals [start(u), finish(u)] and [start(v), finish(v)] either
- are nested, one containing the other (u is an ancestor of v or vice versa); or
- are disjoint.
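A sketch of how the intervals can be computed with an explicit stack. It assumes a single clock that advances on every push and every pop, starting at 1 (the convention consistent with intervals such as (1,8) and (9,14) in the example later in this lecture), and that every vertex appears as a key of the adjacency dict. `dfs_intervals` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Tuple


def dfs_intervals(adj: Dict[Hashable, List[Hashable]]) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Hypothetical helper: run DFS(G) and return (start, finish) times."""
    start: Dict[Hashable, int] = {}
    finish: Dict[Hashable, int] = {}
    clock = 0
    for root in adj:  # restart DFS from every undiscovered vertex (DFS forest)
        if root in start:
            continue
        clock += 1
        start[root] = clock  # root pushed onto the stack
        stack = [(root, iter(adj[root]))]
        while stack:
            u, neighbors = stack[-1]
            for v in neighbors:
                if v not in start:
                    clock += 1
                    start[v] = clock  # v pushed onto the stack
                    stack.append((v, iter(adj[v])))
                    break
            else:
                stack.pop()
                clock += 1
                finish[u] = clock  # u popped: all its neighbors explored
    return start, finish
```

For the hypothetical graph `{1: [2, 4], 2: [3], 3: [], 4: [], 5: [6], 6: [7], 7: []}` this yields the intervals (1,8), (2,5), (3,4), (6,7) for vertices 1 to 4 and (9,14), (10,13), (11,12) for vertices 5 to 7; every pair of intervals is nested or disjoint.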
Classifying edges using time
1. Edge (u, v) ∈ E is a back edge in a DFS tree if and only if
   start(v) < start(u) < finish(u) < finish(v).
2. Edge (u, v) ∈ E is a forward (or tree) edge if
   start(u) < start(v) < finish(v) < finish(u).
3. Edge (u, v) ∈ E is a cross edge if
   start(v) < finish(v) < start(u) < finish(u).
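The three rules translate directly into a classifier that only looks at the intervals of the two endpoints. This is a sketch; `classify_edge` is a hypothetical name, and the start/finish dicts are assumed to come from a DFS of a graph that contains the edge.

```python
from typing import Dict, Hashable


def classify_edge(u: Hashable, v: Hashable,
                  start: Dict[Hashable, int], finish: Dict[Hashable, int]) -> str:
    """Hypothetical helper: classify edge (u, v) from the time intervals of u and v."""
    su, fu = start[u], finish[u]
    sv, fv = start[v], finish[v]
    if sv < su < fu < fv:  # interval of u inside interval of v: v is an ancestor
        return "back"
    if su < sv < fv < fu:  # interval of v inside interval of u: v is a descendant
        return "forward/tree"
    if sv < fv < su < fu:  # v finished before u was discovered
        return "cross"
    # Not covered by the strict rules, e.g. a self-loop (u, u).
    raise ValueError(f"edge ({u}, {v}) fits none of the three cases")
```

Intervals alone cannot separate tree edges from forward edges; that would additionally require the parent pointers of the DFS tree.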
Application I: Cycle detection

Claim 1.
G = (V, E) has a cycle if and only if DFS(G) yields a back edge.
Proof.
If (u, v) is a back edge, then v is an ancestor of u, so the path on the DFS tree from v to u, together with (u, v), forms a cycle.
Conversely, suppose G has a cycle. Let v be the first vertex of the cycle discovered by DFS(G), and let (u, v) be the edge of the cycle that enters v. Since there is a path from v to every vertex in the cycle, and all of them are still undiscovered when v is discovered, all vertices in the cycle will be discovered and fully explored before v is popped from the stack. Hence the interval of u is contained in the interval of v. By the classification of edges using time intervals, (u, v) is a back edge.
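Claim 1 suggests detecting a back edge during the search itself. A common way to do so (sketched below, not necessarily the lecture's own method) marks vertices white (undiscovered), gray (on the stack) or black (popped). An edge into a gray vertex goes to an ancestor of the current vertex, i.e. it is a back edge. The dict-of-lists representation with every vertex as a key, and the name `has_cycle`, are assumptions.

```python
from typing import Dict, Hashable, List

WHITE, GRAY, BLACK = 0, 1, 2  # undiscovered, on the stack, popped


def has_cycle(adj: Dict[Hashable, List[Hashable]]) -> bool:
    """Hypothetical helper: True iff DFS(G) finds a back edge (Claim 1)."""
    color = {u: WHITE for u in adj}
    for root in adj:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(adj[root]))]
        while stack:
            u, neighbors = stack[-1]
            for v in neighbors:
                if color[v] == GRAY:
                    return True  # back edge (u, v): v is an ancestor of u
                if color[v] == WHITE:
                    color[v] = GRAY
                    stack.append((v, iter(adj[v])))
                    break
            else:
                color[u] = BLACK  # all neighbors explored
                stack.pop()
    return False
```

Edges into black vertices are forward or cross edges and are simply skipped, so the search still runs in O(n + m) time.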
Application II: Topological sorting in DAGs
An undirected acyclic graph has an extremely simple structure: it is a forest (a tree, if connected), hence a sparse graph (at most n - 1 edges, that is, O(n)).
A directed acyclic graph (DAG) may be dense (Θ(n²) edges): e.g., V = {1, . . . , n}, E = {(i, j) : i < j}.
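A quick count shows why this example is dense. Every pair i < j contributes exactly one edge, and since every edge goes from a smaller label to a larger one, no cycle can return to a smaller label:

```latex
|E| = \bigl|\{(i, j) : 1 \le i < j \le n\}\bigr| = \binom{n}{2} = \frac{n(n-1)}{2} = \Theta(n^2)
```

For n = 4 this gives 6 edges, whereas an acyclic undirected graph on 4 vertices has at most 3.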
Topological sorting: motivation

Input:
- a set of tasks {1, 2, . . . , n} that need to be performed;
- a set of dependencies, each of the form (i, j), indicating that task i must be performed before task j.
Output: a valid order in which the tasks may be performed, so that all dependencies are respected.
Example: tasks are courses and certain courses must be taken
before others.
How can we model this problem using a graph? What kind of
graph must arise and why?
Topological ordering: definition
Definition 1.
A topological ordering of G is an ordering of its nodes as
1, 2, . . . , n such that for every edge (i, j), we have i < j.
All edges point forward in the topological ordering.
It provides an order in which all tasks can be safely performed: when we try to perform task j, all tasks required to precede it have already been done.
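Definition 1 gives a direct test: an order is topological exactly when every edge points forward in it. A minimal sketch, assuming `order` lists every vertex exactly once; `is_topological_order` is a hypothetical name.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def is_topological_order(order: Sequence[Hashable],
                         edges: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """Hypothetical helper: True iff every edge (i, j) has i placed before j."""
    position = {v: k for k, v in enumerate(order)}  # index of each task in the order
    return all(position[i] < position[j] for i, j in edges)
```

For the dependencies (1, 2), (1, 3), (2, 3), the order 1, 2, 3 passes, while 2, 1, 3 fails because of the edge (1, 2).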
Example of DAG and its topological sorting

[Figure: A DAG (top left), its topological sort (top right), and a drawing emphasizing the topological sort (bottom).]
Topological sorting in DAGs
Claim 2.
If G has a topological ordering, then G is a DAG.
Proof: By contradiction (exercise).
A visualization of the proof is provided by the linearized drawing in the example above: vertices appear in increasing order and all edges go from left to right, hence there can be no cycle.
Is the converse true: does every DAG have a topological
ordering? And how can we find it?
Structural properties of DAGs

In a DAG, can every vertex have
- an outgoing edge?
- an incoming edge?
Definition 2 (source and sink).

A source is a node with no incoming edges.
A sink is a node with no outgoing edges.
Fact 3.
Every DAG has at least one source and at least one sink.
How can we use Fact 3 to find a topological order?

The node that we label first in the topological sorting must have
no incoming edges. Fact 3 guarantees that such a node exists.
Fact 4.
Let G′ be the graph obtained after a source node and its adjacent edges have been removed. Then G′ is a DAG.
Proof: removing vertices and edges from G cannot create a cycle!
This gives rise to a recursive algorithm for finding the topological order of a DAG. Its correctness can be shown by induction (use Facts 3 and 4 to show the induction step).
Algorithm for topological sorting
TopologicalOrder(G)
1. Find a source vertex s and order it first.
2. Delete s and its adjacent edges from G; let G′ be the new graph.
3. TopologicalOrder(G′)
4. Append the order found after s.
Running time: O(n²). Can be improved to O(n + m).
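One standard way to reach the O(n + m) bound (often called Kahn's algorithm) avoids searching the whole graph for a source in every round: keep the in-degree of each vertex and a queue of current sources, so deleting a source only decrements the in-degrees of its out-neighbors. A sketch assuming a dict of adjacency lists with every vertex as a key; `topological_order_by_sources` is a hypothetical name.

```python
from collections import deque
from typing import Dict, Hashable, List


def topological_order_by_sources(adj: Dict[Hashable, List[Hashable]]) -> List[Hashable]:
    """Hypothetical helper: repeatedly output a source and delete it."""
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    sources = deque(u for u in adj if indegree[u] == 0)  # Fact 3: nonempty for a DAG
    order: List[Hashable] = []
    while sources:
        s = sources.popleft()
        order.append(s)
        for v in adj[s]:  # "delete" s: its out-edges no longer count
            indegree[v] -= 1
            if indegree[v] == 0:
                sources.append(v)  # v became a source of the remaining DAG (Fact 4)
    if len(order) != len(adj):
        raise ValueError("the graph has a cycle, so no topological ordering exists")
    return order
```

Each vertex is enqueued once and each edge is examined once, giving O(n + m) time. If some vertices are never output, the remaining graph has no source and therefore contains a cycle.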
Topological sorting via DFS
Let G = (V, E) be a DAG.

- Run DFS(G); compute finish times.
- Process the tasks in decreasing order of finish times.
Running time: O(m + n)
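A sketch of the DFS-based method: record each vertex when it is popped (i.e., in increasing order of finish time), then reverse the list. It assumes the input is a DAG given as a dict of adjacency lists with every vertex as a key; `topological_order_by_dfs` is a hypothetical name. An explicit stack is used so that deep graphs do not hit Python's recursion limit.

```python
from typing import Dict, Hashable, List


def topological_order_by_dfs(adj: Dict[Hashable, List[Hashable]]) -> List[Hashable]:
    """Hypothetical helper: vertices of a DAG in decreasing order of finish time."""
    visited = set()
    finished: List[Hashable] = []  # vertices in increasing order of finish time
    for root in adj:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            u, neighbors = stack[-1]
            for v in neighbors:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:
                stack.pop()
                finished.append(u)  # u receives its finish time now
    finished.reverse()  # decreasing finish time
    return finished
```

On `{1: [2, 3], 2: [4], 3: [4], 4: []}` the vertices finish in the order 4, 2, 3, 1, so the returned order is 1, 3, 2, 4.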
Intuition behind this algorithm
The task v with the largest finish time has no incoming edges (if it had an incoming edge from some task u, then u would have an even larger finish time). Hence v does not depend on any other task, and it is safe to perform it first.
The same reasoning shows that the task w with the second largest finish time has no incoming edges from any other task except (possibly) task v. Hence it is safe to perform w second.
And so on.
Formal proof of correctness

By Claim 1, there are no back edges in the DFS forest of a DAG. Thus every edge (u, v) ∈ E is either
1. a forward/tree edge: start(u) < start(v) < finish(v) < finish(u), or
2. a cross edge: finish(v) < start(u) < finish(u).
Proof of correctness (contd)
Hence for every (u, v) ∈ E, finish(v) < finish(u).
Consider a task v. All tasks u upon which v depends, that is, all tasks u such that there is an edge (u, v) ∈ E, satisfy finish(v) < finish(u).
Since we are processing tasks in decreasing order of finish times, all tasks u upon which v depends have already been processed before we start processing v.
Exploring the connectivity of a graph
- Undirected graphs: find all connected components
- Directed graphs: find all strongly connected components (SCCs)
  - SCC(u) = set of nodes that are reachable from u and have a path back to u
- SCCs provide a hierarchical view of the connectivity of the graph:
  - on a top level, the meta-graph of SCCs has a useful and simple structure (coming up);
  - each meta-vertex of this graph is a strongly connected subgraph that we can further explore.
How can we find SCC(u) using BFS?
1. Run BFS(u); the resulting tree T consists of the set of nodes to which there is a path from u.
2. Define Gr as the reverse graph, where edge (i, j) becomes edge (j, i).
3. Run BFS(u) in Gr; the resulting BFS tree T′ consists of the set of nodes that have a path to u.
4. The common vertices of T and T′ form the strongly connected component of u.
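The four steps above, sketched in Python under the assumption that the graph is a dict of adjacency lists with every vertex as a key. `bfs_reach`, `reverse_graph` and `scc_of` are hypothetical helper names.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def bfs_reach(adj: Dict[Hashable, List[Hashable]], s: Hashable) -> Set[Hashable]:
    """Hypothetical helper: vertices of the BFS tree of s (all vertices reachable from s)."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def reverse_graph(adj: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, List[Hashable]]:
    """Hypothetical helper: Gr, where each edge (i, j) becomes (j, i)."""
    rev: Dict[Hashable, List[Hashable]] = {u: [] for u in adj}
    for i in adj:
        for j in adj[i]:
            rev[j].append(i)
    return rev


def scc_of(adj: Dict[Hashable, List[Hashable]], u: Hashable) -> Set[Hashable]:
    """Hypothetical helper: SCC(u) = (reachable from u) ∩ (can reach u)."""
    return bfs_reach(adj, u) & bfs_reach(reverse_graph(adj), u)
```

Each call costs O(n + m), so computing SCC(u) separately for every u this way would take O(n(n + m)) time in the worst case.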
What if we want all the SCCs of the graph?
The meta-graph of SCCs of a directed graph
Consider the meta-graph of all SCCs of G.

- Make a (super)vertex for every SCC.
- Add a (super)edge from SCC Ci to SCC Cj if there is an edge from some vertex u of Ci to some vertex v of Cj.
What kind of graph is the meta-graph of SCCs?
[Figure: an example directed graph whose SCCs are C1, C2, C3.]
This graph is a DAG: a cycle of SCCs would make all of them mutually reachable, so they would form a single SCC.
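The construction of the meta-graph can be sketched as follows, assuming the SCC of every vertex is already known as a label in a dict `comp` (e.g., from SCC(u) computed earlier). `meta_graph` is a hypothetical name; edges inside one SCC are ignored, and parallel super-edges are merged by using sets.

```python
from typing import Dict, Hashable, List, Set


def meta_graph(adj: Dict[Hashable, List[Hashable]],
               comp: Dict[Hashable, Hashable]) -> Dict[Hashable, Set[Hashable]]:
    """Hypothetical helper: super-edge Ci -> Cj whenever some edge goes from Ci to Cj."""
    meta: Dict[Hashable, Set[Hashable]] = {c: set() for c in comp.values()}
    for u in adj:
        for v in adj[u]:
            if comp[u] != comp[v]:  # skip edges inside a single SCC
                meta[comp[u]].add(comp[v])
    return meta
```

For the hypothetical graph `{1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}` with SCCs A = {1, 2, 3} and B = {4, 5}, the meta-graph is the single super-edge A -> B, which is indeed acyclic.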
Is there an SCC we could process first?
Suppose we had a sink SCC of G, that is, an SCC with no outgoing edges.
1. What will DFS discover starting at a node of a sink SCC?
2. How do we find a node that for sure lies in a sink SCC?
3. How do we continue to find all other SCCs?
Easier to find a node in a source SCC!

Fact 5.
The node assigned the largest finish time when we run DFS(G) belongs to a source SCC in G.
Example: v5 belongs to source SCC C2.
Proof.
We will use Lemma 6 below. Let G be a directed graph. The meta-graph of its SCCs is a DAG. For an SCC C, let
finish(C) = max_{v ∈ C} finish(v).
Example: finish(C1) = finish(v1) = 8.
Lemma 6.
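Let Ci, Cj be SCCs in G. Suppose there is an edge (u, v) ∈ E such that u ∈ Ci and v ∈ Cj. Then finish(Ci) > finish(Cj).
By Lemma 6, the SCC containing the node with the largest finish time has no incoming edges in the meta-graph; hence it is a source SCC.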
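Lemma 6 can be checked on a small example before it is proved. The sketch below computes finish(C) as the maximum finish time over C and tests every edge between different SCCs. The graph, its SCC labels and its finish times are hypothetical: the finish times were traced by hand for a DFS started at vertex 1, visiting neighbors in list order, with a clock that advances on every push and pop. `component_finish` and `lemma6_holds` are hypothetical names.

```python
from typing import Dict, Hashable, List


def component_finish(finish: Dict[Hashable, int],
                     comp: Dict[Hashable, Hashable]) -> Dict[Hashable, int]:
    """Hypothetical helper: finish(C) = max over v in C of finish(v)."""
    result: Dict[Hashable, int] = {}
    for v, c in comp.items():
        result[c] = max(result.get(c, finish[v]), finish[v])
    return result


def lemma6_holds(adj: Dict[Hashable, List[Hashable]], finish: Dict[Hashable, int],
                 comp: Dict[Hashable, Hashable]) -> bool:
    """Hypothetical helper: finish(Ci) > finish(Cj) for every edge from Ci to Cj != Ci?"""
    fc = component_finish(finish, comp)
    return all(fc[comp[u]] > fc[comp[v]]
               for u in adj for v in adj[u] if comp[u] != comp[v])


# Hypothetical example: SCCs A = {1, 2, 3} and B = {4, 5}, with the edge (3, 4) from A to B.
adj = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
comp = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
finish = {1: 10, 2: 9, 3: 8, 4: 7, 5: 6}  # hand-traced DFS from vertex 1
```

Here finish(A) = 10 > finish(B) = 7, and vertex 1, which has the largest finish time, lies in A, the source SCC, as Fact 5 predicts.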
Gr is useful again
Fact 5 provides a direct way to find a node in a source SCC of G: pick the node with the largest finish time.
But we want a node in a sink SCC of G!
Consider Gr, the graph where the edges of G are reversed. How do the SCCs of G and Gr compare?
Run DFS on Gr: the node with the largest finish time comes from a source SCC of Gr (Fact 5). This is a sink SCC of G!
Using this observation to find all SCCs
We now know how to find a sink SCC in G:
1. Run DFS(Gr); compute finish times.
2. Run DFS(G) starting from the node with the largest finish time: the nodes in the resulting tree T form a sink SCC in G.
How do we find all remaining SCCs?
- Remove T from G; let G′ be the resulting graph.
- The meta-graph of SCCs of G′ is a DAG, hence it has at least one sink SCC.
- Apply the procedure above recursively on G′.
Algorithm for finding SCCs in directed graphs

SCC(G = (V, E))
1. Compute Gr.
2. Run DFS(Gr); compute finish(u) for all u.
3. Run DFS(G), considering vertices in decreasing order of finish(u).
4. Output the vertices of each tree in the DFS forest of step 3 as an SCC.
Remark 1.
1. Running time: O(n + m). Why?
2. Equivalently, we can (i) run DFS(G) and compute finish times; (ii) run DFS(Gr) in decreasing order of finish times. Why?
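A sketch of SCC(G) as listed above; this two-pass scheme is commonly known as Kosaraju's algorithm. It assumes a dict of adjacency lists with every vertex as a key, and uses an iterative DFS to avoid Python's recursion limit. `_dfs_from` and `strongly_connected_components` are hypothetical names.

```python
from typing import Dict, Hashable, List, Set


def _dfs_from(adj: Dict[Hashable, List[Hashable]], root: Hashable,
              visited: Set[Hashable], finished: List[Hashable]) -> List[Hashable]:
    """Hypothetical helper: DFS from root over unvisited vertices.
    Appends each vertex to `finished` when it is popped; returns the tree's vertices."""
    tree = [root]
    visited.add(root)
    stack = [(root, iter(adj[root]))]
    while stack:
        u, neighbors = stack[-1]
        for v in neighbors:
            if v not in visited:
                visited.add(v)
                tree.append(v)
                stack.append((v, iter(adj[v])))
                break
        else:
            stack.pop()
            finished.append(u)
    return tree


def strongly_connected_components(adj: Dict[Hashable, List[Hashable]]) -> List[Set[Hashable]]:
    # Step 1: compute the reverse graph Gr.
    rev: Dict[Hashable, List[Hashable]] = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    # Step 2: DFS(Gr); `finished` lists vertices in increasing order of finish time.
    visited: Set[Hashable] = set()
    finished: List[Hashable] = []
    for u in rev:
        if u not in visited:
            _dfs_from(rev, u, visited, finished)
    # Steps 3-4: DFS(G) in decreasing order of finish time; each DFS tree is one SCC.
    visited = set()
    components: List[Set[Hashable]] = []
    for u in reversed(finished):
        if u not in visited:
            components.append(set(_dfs_from(adj, u, visited, [])))
    return components
```

On the hypothetical graph `{1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}` the first tree found is the sink SCC {4, 5}, followed by {1, 2, 3}. Each pass, like building Gr, touches every vertex and edge a constant number of times, which is where the O(n + m) bound comes from.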
A directed graph and its DFS forest with time intervals
[Figure: a directed graph and its DFS forest with time intervals v1 (1,8), v2 (2,5), v3 (3,4), v4 (6,7), v5 (9,14), v6 (10,13), v7 (11,12); a second panel shows the DFS forest of Gr, where nodes are considered by decreasing finish times.]
Still need to prove Lemma 6
Let G be a directed graph. The meta-graph of its SCCs is a DAG.
For an SCC C, let
finish(C) = max_{v ∈ C} finish(v).
Lemma 6.
Let Ci, Cj be SCCs in G. Suppose there is an edge (u, v) ∈ E such that u ∈ Ci and v ∈ Cj. Then finish(Ci) > finish(Cj).
Proof of Lemma 6
There are two cases to consider:
1. start(u) < start(v) (DFS reaches Ci before Cj)
   - Before leaving u, DFS will explore edge (u, v).
   - Since v ∈ Cj, all of Cj will now be explored.
   - Since there is no path from Cj back to Ci (the meta-graph is a DAG!), all vertices in Cj will be assigned finish times before DFS backtracks to u and assigns a finish time to u. Thus
     finish(Cj) < finish(u) ≤ finish(Ci).
Proof of Lemma 6 (contd)
2. start(u) > start(v) (DFS discovers v before u)
   - Since there is no path from Cj to Ci, u is not reachable from any vertex of Cj, so DFS will finish exploring Cj before it discovers u. Thus
     finish(Cj) < start(u) < finish(u) ≤ finish(Ci), hence
     finish(Cj) < finish(Ci).
