Segmentation
C. V. Jawahar and Girish Varma
IIIT Hyderabad
The 3 core problems
Reconstruction
Machine Learning
Goal:
Figure 7: Some examples of RCF. From top to bottom: BSDS500 [2], NYUD [49], Multicue-Boundary [41], and Multicue-Edge [41]. From left to right: origin image, ground truth, RCF edge map, origin image, ground truth, and RCF edge map.
Richer convolutional features for edge detection, CVPR 2017
Another View Point: Finer Understanding
American Bulldog
Motorcycle
No objects, just pixels Single Object Multiple Object This image is CC0 public domain
Semantic Segmentation
• Semantic Segmentation
  • Labelling every pixel in an image
  • A key part of Scene Understanding
• Applications
  • Autonomous navigation
  • Assisting the partially sighted
  • Medical diagnosis
  • Image editing
(Clockwise from top) [1] Cityscapes Dataset. [2] ISBI Challenge 2015, dental x-ray images. [3] Royal National Institute of Blind People
A quick tour of
“Segmentation”
Age Old Methods
Thresholding can be derived as a statistical decision: the likelihood ratio test
$r_p := \log \frac{P_1(I_p)}{P_2(I_p)}$, where $P_1$ and $P_2$ are known color models for object and background
$r_p \ge 0 \Rightarrow$ pixel $p$ is object
$r_p < 0 \Rightarrow$ pixel $p$ is background
Example: assume known probability distributions
$P_1 = \mathcal{N}(\mu_1, \sigma)$, $P_2 = \mathcal{N}(\mu_2, \sigma)$, with $\mu_2 < \mu_1$
$r_p \ge 0 \Leftrightarrow I_p \ge T$, where $T = \frac{\mu_1 + \mu_2}{2}$
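The likelihood ratio test above can be sketched in a few lines. This is a minimal illustration, assuming two Gaussian intensity models with a shared standard deviation (object model P1, background model P2); the function names and the numbers in the usage note are hypothetical, not from the slides.

```python
import math


def log_likelihood_ratio(intensity, mu1, mu2, sigma):
    """r_p = log P1(I_p) - log P2(I_p) for Gaussians N(mu1, sigma), N(mu2, sigma)."""
    def log_normal(x, mu):
        return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return log_normal(intensity, mu1) - log_normal(intensity, mu2)


def segment_by_ratio(pixels, mu1, mu2, sigma):
    """Label a pixel as object (True) when r_p >= 0, background (False) otherwise."""
    return [log_likelihood_ratio(i, mu1, mu2, sigma) >= 0 for i in pixels]
```

With mu1 = 200, mu2 = 50 and any shared sigma, the decision reduces to comparing each intensity with T = (200 + 50) / 2 = 125, which is the thresholding rule stated above.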
Segmentation as clustering (Unsupervised Learning)
Distance based
on color and
position
Source: K. Grauman
Segmentation Unsupervised
K Means
K Means Algorithm
• There are K clusters {1,2, …, K} with means µ1,…, µK
k=4 k=4
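A rough sketch of K-means segmentation on color and position features follows. It assumes NumPy, squared Euclidean distance, and a deterministic initialisation from evenly spaced pixels (a simplification chosen for reproducibility); `pixel_features`, `kmeans` and `pos_weight` are illustrative names, not from the slides.

```python
import numpy as np


def pixel_features(image, pos_weight=1.0):
    """Stack color channels with weighted (row, col) position for every pixel."""
    H, W, _ = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pos = np.stack([ys, xs], axis=-1) * pos_weight
    feats = np.concatenate([image.astype(float), pos], axis=-1)
    return feats.reshape(H * W, -1)


def kmeans(X, k, n_iter=20):
    """Plain K-means: alternate nearest-mean assignment and mean update."""
    # Assumption: initialise the means from k evenly spaced samples of X.
    means = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # leave empty clusters unchanged
                means[j] = X[labels == j].mean(axis=0)
    return labels, means
```

Calling `kmeans(pixel_features(image), k=4)` and reshaping the labels to `(H, W)` gives a label map; raising `pos_weight` makes clusters more spatially compact.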
K Means
Normalized Cut
[Figure: pixels i and j as graph nodes joined by an edge of weight w_ij; the graph is partitioned into segments A, B, C]
Source: S. Seitz
Normalized cut algorithm
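The slides name the normalized cut algorithm without spelling it out. As a hedged sketch of the usual spectral relaxation (an assumption about what the slide refers to), one can solve the generalised eigenproblem (D - W) y = λ D y for the second-smallest eigenvector and split the nodes by its sign. The function name and the sign threshold are illustrative choices.

```python
import numpy as np


def normalized_cut_bipartition(W):
    """Two-way split of a graph given a symmetric affinity matrix W (n x n).

    Assumes every node has positive total affinity (row sum > 0).
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalised Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    # Map back to the generalised eigenvector of (D - W) y = lambda D y
    y = d_inv_sqrt * eigvecs[:, 1]
    return y > 0                                # split by sign (one common choice)
```

For an image, W would hold the pairwise affinities w_ij between pixels i and j; the sign of the eigenvector is arbitrary, so only the grouping is meaningful, not which side is True.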
GraphCut
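[Figure: graph with terminals s and t. Each pixel p is joined to the terminals by t-links with costs D_p(s) and D_p(t); neighbouring pixels p, q are joined by n-links with weights w_pq. A cut separates s from t.]
segmentation $\Leftrightarrow$ cut, with $S_p \in \{0, 1\}$
$E(S) = \text{cost(cut)} = \sum_{p} D_p(S_p) + \sum_{pq \in N} w_{pq} \cdot [S_p \ne S_q]$
• First sum: cost of severed t-links (unary terms, regional properties of S)
• Second sum: cost of severed n-links (pair-wise terms, boundary smoothness for S)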
Segmentation as Graph Cut
Boykov (2001)
• Cut: separating source and sink
• Energy: total cost of the edges severed by the cut
• Min cut: globally minimal energy in polynomial time, under regularity conditions on the potentials
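To make the cut/energy correspondence concrete, here is a small self-contained sketch on a 1-D chain of pixels. It assumes non-negative unary costs D_p(0), D_p(1) and non-negative pairwise weights, and uses a plain Edmonds-Karp max flow rather than the specialised solvers used in practice; all names are illustrative.

```python
from collections import deque
from itertools import product


def energy(S, D0, D1, w):
    """E(S) = sum_p D_p(S_p) + sum over neighbours of w_pq * [S_p != S_q]."""
    unary = sum(D1[p] if S[p] else D0[p] for p in range(len(S)))
    pairwise = sum(w[p] for p in range(len(S) - 1) if S[p] != S[p + 1])
    return unary + pairwise


def min_cut_segment(D0, D1, w):
    """Minimise E(S) exactly via an s-t min cut; returns (labels, cut cost)."""
    n = len(D0)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for p in range(n):
        cap[s][p] = D0[p]   # t-link severed when p ends on the sink side (S_p = 0)
        cap[p][t] = D1[p]   # t-link severed when p stays on the source side (S_p = 1)
    for p in range(n - 1):
        cap[p][p + 1] = cap[p + 1][p] = w[p]   # n-link between neighbours
    flow = 0.0
    while True:
        parent = {s: None}                      # BFS over the residual graph
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                               # no augmenting path left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push
    # Pixels still reachable from s in the residual graph get label 1.
    return [1 if p in parent else 0 for p in range(n)], flow


def brute_force_min(D0, D1, w):
    """Exhaustive minimum of E(S), only feasible for tiny problems."""
    return min(energy(list(S), D0, D1, w) for S in product([0, 1], repeat=len(D0)))
```

On small inputs the cut cost returned by `min_cut_segment` can be compared against `brute_force_min` as a sanity check that the minimum cut and the minimum energy coincide.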
Example
Associative Potentials:
pay a cost when neighbouring
pixels are different
GraphCut
Cons:
• Not always globally optimal (the optimality theorem needs appropriate potentials)
• Approximate for multi-label (example: α-expansion)
Learning from Humans
Grab Cut [Rother et al. 2004]
Details: Unary/pairwise
Grab Cut using Iterated Graph Cuts
Guaranteed to converge?
[Figure: energy after each iteration, from user initialisation through iterations 1 to 4]
[Figure: initial rectangle, initial result, and the result of iterated graph cut]
Input image → Training of potentials (learning) → MAP (inference) → Final segmentation
Semantic Segmentation
Summary
• Supervision: none, human interaction, annotated examples of pixels and superpixels, weaker supervision from multiple images, use of priors
Overview of CNNs
Data = 3D tensors
There is a vector of feature channels (e.g. RGB) at each spatial location (pixel).
[Figure: a 3D tensor of size H × W × channels]
Example: convolution layer
[Figure: input 3D tensor ✱ filter = output 3D tensor]
Linear / non-linear chains
[Figure: a chain of linear (Σ) and non-linear (S) operations mapping input x to output y]
Input: Image
Output: P(c) (A vector/distribution of probabilities)
Semantic Segmentation Idea: Fully Convolutional
Input: 3 x H x W
Convolutions: D x H x W
Scores: C x H x W
Predictions: H x W
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - 22 May 10, 2017
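The shape flow of the fully convolutional idea (3 x H x W input, D x H x W features, C x H x W scores, H x W predictions) can be traced with a toy NumPy sketch. It assumes 1 x 1 convolutions only and random, untrained weights, so it illustrates shapes and the per-pixel argmax rather than a real network; the function and parameter names are hypothetical.

```python
import numpy as np


def fcn_forward(image, w_feat, w_cls):
    """image: (3, H, W); w_feat: (D, 3); w_cls: (C, D). Uses 1x1 convolutions."""
    # A 1x1 convolution is a per-pixel linear map over channels, followed by ReLU.
    feats = np.maximum(np.einsum('dc,chw->dhw', w_feat, image), 0.0)   # D x H x W
    scores = np.einsum('kd,dhw->khw', w_cls, feats)                     # C x H x W
    preds = scores.argmax(axis=0)                                       # H x W
    return feats, scores, preds
```

Because every layer keeps the spatial size H x W, the output assigns one class index to each input pixel.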
In general,
Convolutional Encoder: Conv1 → Max pooling → Conv2 → Max pooling → Conv3 → Max pooling → Conv4 → Max pooling → Conv5 → Max pooling → Conv6
Convolutional Decoder: Deconv6 → Unpooling → Deconv5 → Unpooling → Deconv4 → Unpooling → Deconv3 → Unpooling → Deconv2 → Unpooling → Deconv1 → Pred
motivated by efficient object detection. One of their drawbacks is that bounding boxes usually cannot provide accurate object localization. More related to our work is generating segmented object proposals [4, 10, 14, 24, 26, 29, 43]. At the core of segmented object proposal algorithms is con-

and introduce a light-weighted decoder. The first layer of decoder “deconv6” is designed for dimension reduction that projects 4096-d “conv6” to 512-d with 1×1 kernel so that we can re-use the pooling switches from “conv5” to upscale the feature maps by twice in the following “deconv5” layer.
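The "pooling switches" mentioned in the excerpt can be illustrated with a small NumPy sketch: max pooling records where each maximum came from, and unpooling writes the value back to that position, leaving zeros elsewhere. This assumes a single-channel map with even height and width and 2 x 2 windows; the function names are illustrative.

```python
import numpy as np


def max_pool_2x2(x):
    """x: (H, W) with even H and W. Returns the pooled map and the switches."""
    H, W = x.shape
    # windows[i, j, k] holds x[2i + a, 2j + b] with k = 2a + b
    windows = x.reshape(H // 2, 2, W // 2, 2).transpose(0, 2, 1, 3).reshape(H // 2, W // 2, 4)
    switches = windows.argmax(axis=2)      # position of the max inside each window
    pooled = windows.max(axis=2)
    return pooled, switches


def unpool_2x2(pooled, switches):
    """Place each pooled value back at its recorded switch position."""
    h, w = pooled.shape
    windows = np.zeros((h, w, 4), dtype=pooled.dtype)
    rows, cols = np.indices((h, w))
    windows[rows, cols, switches] = pooled
    return windows.reshape(h, w, 2, 2).transpose(0, 2, 1, 3).reshape(2 * h, 2 * w)
```

Unpooling doubles the spatial resolution while keeping the maxima at their original locations, which is why a decoder can reuse the encoder's switches to upscale feature maps.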
More (wait for GANs)
Summary/History/Relationships
• Initial Methods
• Simple unsupervised learning/Clustering/Partitioning
• Introduction of spatial relationship and formalisms
• Graphs, Graph Cuts and Energy Minimization Frameworks
• Learning from “one” image
• Learn unary potentials; human interaction; iterate
• Learn from many labelled examples
• Popular semantic/instance segmentations.
• Finer understanding of the visual content
• Input: Image. Output: Image
• Many low-level vision tasks in the same framework
• Low-level vision, Segmentation, Generation, ??
More on Newer
Methods/Trends