CVPR 2011
June 24, 2011
Matti Pietikäinen
Property
Transformation   Pattern     Contrast
Gray scale       no effect   affects
Thus,
1) contrast is of no interest in gray scale invariant analysis
2) often we need a gray scale and rotation invariant pattern measure
[Figure: example neighborhood with center value 70, the differences g_p - g_c, and the resulting signs s(g_p - g_c); circular neighborhoods of radius R with neighbors g0 ... g7 around the center gc]
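$$T \approx t(g_c)\, t(g_0 - g_c, \ldots, g_{P-1} - g_c)$$
$$T \approx t(g_0 - g_c, \ldots, g_{P-1} - g_c)$$
$$T \approx t(s(g_0 - g_c), \ldots, s(g_{P-1} - g_c))$$
where
$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$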
The above is transformed into a unique P-bit pattern code by assigning the
binomial factor $2^p$ to each sign $s(g_p - g_c)$:
$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p$$
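The pattern code can be illustrated with a minimal sketch, assuming P = 8 and R = 1 on the pixel grid (no interpolation). The function name `lbp_8_1`, the starting neighbor g0, and the direction of traversal are choices made for this example, not taken from the slides.

```python
import numpy as np

def lbp_8_1(image):
    """LBP codes for the 8 nearest neighbors (P = 8, R = 1, no interpolation)."""
    img = np.asarray(image, dtype=np.int64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbor offsets (dy, dx) in circular order; the start point g0 and the
    # direction are conventions assumed for this sketch.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    codes = np.zeros_like(center)
    for p, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # s(g_p - g_c) * 2^p
        codes += (neighbor >= center).astype(np.int64) << p
    return codes

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_8_1(patch))  # a single code for the center pixel
```

Because only the signs of the differences are used, any monotonically increasing gray scale transformation (for example 2g + 10) leaves the codes unchanged.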
'Uniform' patterns
'Uniform' patterns (figure labels: U = 0, U = 2)
Examples of 'nonuniform' patterns
[Figure: rotation mapping of an edge pattern (15); 1 = black, 0 = white]
Uniform patterns
Bit patterns with 0 or 2 transitions (0→1 or 1→0) when the pattern is considered circular
All non-uniform patterns are assigned to a single bin
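The uniformity rule can be sketched as follows, assuming P = 8. The helper names `count_transitions` and `riu2_label` are hypothetical; the labeling (number of 1-bits for uniform patterns, P + 1 for everything else) is the usual rotation-invariant uniform mapping and is stated here as an assumption.

```python
def count_transitions(code, P=8):
    """Number of 0/1 transitions when the P-bit pattern is read circularly."""
    bits = [(code >> p) & 1 for p in range(P)]
    return sum(bits[p] != bits[(p + 1) % P] for p in range(P))

def riu2_label(code, P=8):
    """Rotation-invariant uniform label: the count of 1-bits for uniform
    patterns, and the single extra bin P + 1 for all non-uniform patterns."""
    if count_transitions(code, P) <= 2:
        return bin(code).count("1")
    return P + 1

print(count_transitions(0b00001111))  # 2, so the pattern is uniform
print(riu2_label(0b00001111), riu2_label(0b11110000))  # same label for rotated copies
print(riu2_label(0b01010101))  # non-uniform, mapped to bin P + 1
```

For P = 8 this leaves 58 uniform patterns out of 256, and the mapping produces P + 2 = 10 distinct labels.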
$$VAR_{P,R} = \frac{1}{P} \sum_{p=0}^{P-1} (g_p - m)^2$$
where
$$m = \frac{1}{P} \sum_{p=0}^{P-1} g_p$$
$VAR_{P,R}$ is
- invariant with respect to gray scale shifts (but, unlike LBP, not to arbitrary monotonic transformations)
- invariant with respect to rotation along the circular neighborhood
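A small sketch of the local variance measure, assuming P = 8 and R = 1 on the pixel grid; the function name `var_8_1` is made up for this example. It also shows the two invariance properties numerically.

```python
import numpy as np

def var_8_1(image):
    """Variance of the 8 nearest neighbors around every interior pixel."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    neighbors = np.stack([img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
                          for dy, dx in offsets])
    m = neighbors.mean(axis=0)                     # m = (1/P) * sum of g_p
    return ((neighbors - m) ** 2).mean(axis=0)     # (1/P) * sum of (g_p - m)^2

patch = np.arange(1, 10).reshape(3, 3)
print(var_8_1(patch))                 # variance of the 8 neighbors
print(var_8_1(patch + 50))            # unchanged by a gray scale shift
print(var_8_1(np.rot90(patch)))       # unchanged by rotating the neighborhood
print(var_8_1(2 * patch))             # changes under gray scale scaling
```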
Nonuniform quantization
Every bin has the same amount of total data
The highest resolution of the quantization is used where the number of entries is largest
[Figure: total distribution divided into equal-area bins by the cut values]
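One way to obtain such cut values is to take quantiles of the training data, so that every bin receives the same share of the samples. This is a sketch under that assumption; the function names and the use of `np.quantile` are choices made for this example and are not taken from the slides.

```python
import numpy as np

def equal_area_cuts(values, n_bins):
    """Cut values that split the total distribution into n_bins equal-area bins."""
    inner = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(values, inner)

def quantize(values, cuts):
    """Bin index (0 .. len(cuts)) of each value with respect to the cut values."""
    return np.digitize(values, cuts)

# Skewed training data: many small values, few large ones.
train = np.arange(1000.0) ** 2
cuts = equal_area_cuts(train, 4)
counts = np.bincount(quantize(train, cuts), minlength=4)
print(cuts)    # cut values are closely spaced where the data are dense
print(counts)  # every bin holds the same number of samples
```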
Joint histogram of two operators: $LBP_{P,R}^{riu2}$ (bins 0, 1, ..., P+1) and $VAR_{P,R}$ (bins 0, 1, ..., B-1), giving $LBP_{P,R}^{riu2} / VAR_{P,R}$
MACHINE VISION GROUP
Multiscale analysis
$$L_N = \sum_{n=1}^{N} L^n$$
e.g. $LBP_{8,1}^{riu2} + LBP_{8,3}^{riu2} + LBP_{8,5}^{riu2}$
The bins of the LBP feature distribution can also be used directly as
features e.g. for SVM classifiers.
Rotation revisited
Rotation of an image by α degrees
Translates each local neighborhood to a new location
Rotates each neighborhood by α degrees
$$H(n, u) = \sum_{r=0}^{P-1} h_I(U_P(n, r))\, e^{-i 2\pi u r / P}$$
Fourier magnitudes
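The reason for taking Fourier magnitudes can be checked numerically. In this sketch, `row` stands for the histogram bins of the uniform patterns with a fixed number n of 1-bits, ordered by rotation r = 0 ... P-1; the values are invented for illustration. Rotating the image by a multiple of 360/P degrees shifts this row cyclically, which changes only the phase of H(n, u).

```python
import numpy as np

def fourier_magnitudes(row):
    """|H(n, u)| for one row of the uniform-pattern histogram (DFT over r)."""
    return np.abs(np.fft.fft(row))

# Hypothetical bin counts for the P = 8 rotations of one uniform pattern.
row = np.array([12.0, 3.0, 0.0, 7.0, 5.0, 1.0, 9.0, 4.0])
rotated = np.roll(row, 3)  # the same texture after rotating the image

print(fourier_magnitudes(row))
print(fourier_magnitudes(rotated))  # same magnitudes, although the bins moved
```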
[Figure: CS-LBP region descriptor: input region, CS-LBP features over the region (x, y), and the resulting region descriptor]
Dynamic textures
An extension of texture to the temporal domain
Encompass the class of video sequences that
exhibit some stationary properties in time
Sampling in volume
Thresholding
Multiply
Pattern
[Plot: concatenated LBP versus VLBP as a function of P, the number of neighboring points; vertical axis in units of 10^4]
[Figure: neighboring points in the XY, XT, and YT planes; axes X, Y, and T each range from -3 to 3]
LBP-TOP
Matti Pietikäinen
Edge detection
- used to enhance the gradient information
Neighborhood topology
Local ternary patterns - encoding by 3 values [Tan & Triggs, AMFG 2007]
Ternary code: 1100(-1)(-1)00
Binary code: 00001100
[Figure: 3 x 3 ternary pattern and the binary patterns derived from it]
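The ternary encoding can be sketched as below. The tolerance `t`, the clockwise reading order starting at the top-left pixel, and the split into an "upper" and a "lower" binary pattern are assumptions made for this example; the patch values are invented so that the result matches the ternary code shown in the slide.

```python
import numpy as np

def ltp_codes(patch, t):
    """Ternary code of a 3 x 3 patch and its two binary halves.

    Returns (ternary, upper, lower): ternary is a list of 1, 0, -1 values,
    upper marks the +1 positions and lower marks the -1 positions.
    """
    p = np.asarray(patch, dtype=float)
    center = p[1, 1]
    # Clockwise from the top-left pixel (an ordering assumed for this sketch).
    ring = [p[0, 0], p[0, 1], p[0, 2], p[1, 2],
            p[2, 2], p[2, 1], p[2, 0], p[1, 0]]
    ternary = [1 if g >= center + t else -1 if g <= center - t else 0
               for g in ring]
    upper = "".join("1" if v == 1 else "0" for v in ternary)
    lower = "".join("1" if v == -1 else "0" for v in ternary)
    return ternary, upper, lower

patch = [[70, 90, 52],
         [55, 54, 50],
         [57, 12, 13]]
ternary, upper, lower = ltp_codes(patch, t=5)
print(ternary)  # [1, 1, 0, 0, -1, -1, 0, 0]
print(upper)    # 11000000
print(lower)    # 00001100
```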
Elongated quinary patterns (EQP) [Nanni et al., Artif. Intell. Med. 2010]
Feature selection e.g. with AdaBoost to reduce the number of bins [Zhang
et al., LNCS 2005]
Learning the most dominant and discriminative patterns [Liao et al., IEEE
TIP 2009], [Guo et al., ACCV 2010]
Uniform
Input image
LBP histogram
LBPV puts the local contrast into a one-dimensional histogram [Guo et al.,
Pattern Recogn. 2010]
a) 3 x 3 sample block
b) The local differences
c) The sign component
d) The magnitude component
Composed of excitation and orientation components
Janne Heikkilä
Two approaches:
- restoration (deblurring)
- using blur-insensitive features
[Plot: classification accuracy for the compared methods (LBP8,3, Gabor); LBP: Ojala et al., PAMI]
We notice that
[Figure: original image and blurred image (circular blur, d = 4), Gabor filter; annotation: phase remains unchanged]
STFT is separable
- It can be implemented with 1-D convolutions.
where a = 1/m.
The coefficients are quantized using:
[Figure: quadrants of the complex plane (Re, Im) labeled 0, 1, 2, 3]
that results in a two-bit code.
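A basic LPQ computation can be sketched as follows. The sketch assumes a uniform m x m window, four frequency points (a, 0), (0, a), (a, a) and (a, -a) with a = 1/m, and sign quantization of the real and imaginary parts, which gives two bits per coefficient and 8 bits in total. The function name, the choice of frequencies, and the bit order are assumptions for illustration; the separable STFT is implemented with 1-D convolutions along rows and columns.

```python
import numpy as np

def lpq_codes(image, m=7):
    """8-bit LPQ-style codes from four low-frequency STFT coefficients."""
    img = np.asarray(image, dtype=float)
    a = 1.0 / m
    x = np.arange(m) - (m - 1) / 2.0
    w0 = np.ones(m, dtype=complex)          # frequency 0
    w1 = np.exp(-2j * np.pi * a * x)        # frequency a
    w2 = np.conj(w1)                        # frequency -a

    def separable(row_filter, col_filter):
        # 1-D convolution along each row, then along each column.
        tmp = np.apply_along_axis(np.convolve, 1, img, row_filter, mode="valid")
        return np.apply_along_axis(np.convolve, 0, tmp, col_filter, mode="valid")

    coeffs = [separable(w1, w0),   # (a, 0)
              separable(w0, w1),   # (0, a)
              separable(w1, w1),   # (a, a)
              separable(w1, w2)]   # (a, -a)
    parts = []
    for c in coeffs:
        parts.extend([c.real, c.imag])
    # One bit per real or imaginary part: 1 if non-negative, else 0.
    codes = np.zeros(parts[0].shape, dtype=np.int64)
    for k, part in enumerate(parts):
        codes += (part >= 0).astype(np.int64) << k
    return codes

rng = np.random.default_rng(0)
image = rng.random((20, 24))
codes = lpq_codes(image, m=7)
histogram = np.bincount(codes.ravel(), minlength=256)  # 256-bin descriptor
print(codes.shape, histogram.sum())
```

Only the signs of the coefficients are kept, so the codes do not change when the image is multiplied by a positive constant.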
[Plot legend: LPQ (uniform), LBP, Gabor filter bank; m = 7]
Image model
Neighboring pixels are strongly correlated in natural images.
Let denote the pixel positions
in an m by m image patch
[Equations lost in extraction.]
If L = 4, D is an 8 by 8 matrix.
We can employ a whitening transform
The 8-bit integers obtained are used in the same way as in the basic
LPQ.
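A generic whitening step can be sketched as below, assuming the 8 by 8 covariance matrix D of the real-valued coefficient vector is already available and positive definite. How D is built from the image model is not recoverable from the slide, so the sketch takes D as input; the function names and the eigendecomposition route are choices made for this example.

```python
import numpy as np

def whitening_matrix(D):
    """Matrix W such that W @ D @ W.T is the identity (D positive definite)."""
    eigvals, eigvecs = np.linalg.eigh(D)
    return (eigvecs / np.sqrt(eigvals)).T

def quantize_whitened(coeffs, W):
    """Sign-quantize the whitened coefficient vector into one 8-bit integer."""
    whitened = W @ np.asarray(coeffs, dtype=float)
    return int(sum(int(v >= 0) << k for k, v in enumerate(whitened)))

# Hypothetical covariance matrix, used only to exercise the code.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
D = A @ A.T + np.eye(8)
W = whitening_matrix(D)
print(np.round(W @ D @ W.T, 6))             # approximately the identity
print(quantize_whitened(rng.standard_normal(8), W))
```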
Reference: Ojansivu V & Heikkilä J (2008) Blur insensitive texture classification using local
phase quantization. Image and Signal Processing, ICISP 2008 Proceedings, Lecture Notes
in Computer Science 5099:236-243.
[Plot legend: Gabor filter bank; m = 7, ρ = 0.9]
Reference: Ahonen T, Rahtu E, Ojansivu V & Heikkilä J (2008) Recognition of blurred faces
using local phase quantization. Proc. 19th International Conference on Pattern Recognition
(ICPR 2008), Tampa, FL, 4p.
Rotation invariant LPQ
We define a blur-insensitive local characteristic orientation by
Spatio-temporal model
The covariance between the pixel values in a 3-D volume is
assumed to follow the exponential model:
[Equations lost in extraction.]
LPQ-TOP
First proposed by Jiang et al. [FG 2011].
Similar to LBP-TOP
LBP has been replaced with LPQ.
LPQ is computed in three planes
XY, XT, and YT.
Histograms are concatenated.
Experimental results
Correlation coefficients ρs = 0.1, ρt = 0.1, uniform weighting,
window size 5 x 5 x 5.
Spatially and temporally varying blur:
Matti Pietikäinen
Ahonen et al. proposed (ECCV 2004, PAMI 2006) a face descriptor based
on LBPs
This method has already been adopted by many leading scientists and
groups
Computationally very simple, excellent results in face recognition and
authentication, face detection, facial expression recognition, gender
classification
Ahonen T, Hadid A & Pietikäinen M (2006) Face description with local binary
patterns: application to face recognition. IEEE Transactions on Pattern Analysis
and Machine Intelligence 28(12):2037-2041. (an early version published at
ECCV 2004)
A facial description for face recognition:
130 * 150
Feature vector length → 2891
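A feature vector of length 2891 is consistent with 49 regions times 59 histogram bins. The sketch below assumes a 7 x 7 grid of regions and a 59-bin uniform-pattern histogram (58 uniform patterns plus one bin for the rest) per region. The grid size, the radius-1 neighborhood, and all function names are assumptions made for illustration, not details read from the slide.

```python
import numpy as np

def lbp_codes(image):
    """8-neighbor LBP codes (P = 8, R = 1, no interpolation)."""
    img = np.asarray(image, dtype=np.int64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    codes = np.zeros_like(center)
    for p, (dy, dx) in enumerate(offsets):
        codes += (img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= center).astype(np.int64) << p
    return codes

def uniform_lut(P=8):
    """Map each code to 0..57 if uniform, or to 58 for all other patterns."""
    lut = np.full(2 ** P, -1, dtype=np.int64)
    label = 0
    for code in range(2 ** P):
        bits = [(code >> p) & 1 for p in range(P)]
        if sum(bits[p] != bits[(p + 1) % P] for p in range(P)) <= 2:
            lut[code] = label
            label += 1
    lut[lut == -1] = label
    return lut

def face_descriptor(image, grid=(7, 7)):
    """Concatenate normalized region histograms into one feature vector."""
    labels = uniform_lut()[lbp_codes(image)]
    features = []
    for band in np.array_split(labels, grid[0], axis=0):
        for block in np.array_split(band, grid[1], axis=1):
            hist = np.bincount(block.ravel(), minlength=59).astype(float)
            features.append(hist / hist.sum())
    return np.concatenate(features)

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(150, 130))  # stand-in for a 130 x 150 face image
print(face_descriptor(face).shape)  # (2891,)
```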
Data set
The images used for training and testing were downloaded from the
PASCAL Visual Object Classes (VOC2007) database and the Flickr
site
- All the images were manually labeled and resized to QVGA (320x240).
Training: 1115 landscape images and
2617 non-landscape images
Testing: 912 and 2140, respectively
Classification results
Landscape as landscape (TP)
Non-landscape as non-landscape (TN)
Landscape as non-landscape (FN)
Non-landscape as landscape (FP)
Demo videos
Reference: Huttunen S, Rahtu E, Kunttu I, Gren J & Heikkilä J (2011) Real-time detection of
landscape scenes. Proc. Scandinavian Conference on Image Analysis (SCIA 2011), LNCS,
6688:338-347.
Background modeling
The goal is to construct and maintain a statistical representation of the
scene that the camera sees.
Foreground Detection
The comparison of the input frame with the current background model.
The areas of the input frame that do not fit to the background model are
considered as foreground.
Psychological studies [Bassili 1979] have demonstrated that humans recognize
expressions better from dynamic images than from a mug shot.
(a) Block volumes (b) LBP features from three orthogonal planes (c) Concatenated
features for one block volume with the appearance and motion
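The three-plane idea can be sketched for a single block volume as follows. This is a simplified illustration: it uses an 8-neighbor LBP with radius 1 in every plane and accumulates histograms over all slices of the volume, and the array layout (T, Y, X) and function names are assumptions made for this example.

```python
import numpy as np

def lbp_histogram(plane):
    """256-bin histogram of 8-neighbor LBP codes for one 2-D slice."""
    img = np.asarray(plane, dtype=np.int64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    codes = np.zeros_like(center)
    for p, (dy, dx) in enumerate(offsets):
        codes += (img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= center).astype(np.int64) << p
    return np.bincount(codes.ravel(), minlength=256)

def lbp_top(volume):
    """Concatenated XY, XT and YT histograms of a volume indexed as (T, Y, X)."""
    vol = np.asarray(volume)
    n_t, n_y, n_x = vol.shape
    planes = {
        "XY": [vol[t, :, :] for t in range(n_t)],   # appearance
        "XT": [vol[:, y, :] for y in range(n_y)],   # horizontal motion
        "YT": [vol[:, :, x] for x in range(n_x)],   # vertical motion
    }
    parts = []
    for name in ("XY", "XT", "YT"):
        hist = sum(lbp_histogram(s) for s in planes[name]).astype(float)
        parts.append(hist / hist.sum())
    return np.concatenate(parts)

rng = np.random.default_rng(0)
block_volume = rng.integers(0, 256, size=(6, 8, 10))
print(lbp_top(block_volume).shape)  # (768,)
```

The XY histogram describes appearance, while the XT and YT histograms describe motion; the descriptor length grows linearly (3 x 256 here) rather than exponentially with the number of neighbors.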
Database
Cohn-Kanade database:
97 subjects
374 sequences
Age from 18 to 30 years
Sixty-five percent were female, 15 percent were African-American,
and three percent were Asian or Latino.
Low resolution
No eye detection
Translation, in-plane and out-of-plane rotation, scale
Illumination change
Robust with respect to errors in face alignment
2) Use of different radii which can catch the occurrences in different space and
time scales
System overview
LBP-XT images
[Plot: speaker-independent recognition results (%)]
The phrases "excuse me" and "I am sorry" are
different throughout the whole utterance, and the
selected features also come from the whole
pronunciation.
Selected slices for phrases "Excuse me" and "I am sorry"
Activity recognition
Kellokumpu V, Zhao G & Pietikäinen M (2009) Recognition of human actions
using texture. Machine Vision and Applications (available online).
Silhouette representation
MHI MEI
HMM modeling
[Figure labels: w1, w2, w3, w4; transition probabilities a12, a23]
- State priors
Observation probability is taken as the intersection of the observation and model
histograms:
$$P(h_{obs} \mid s_t = q_i) = \sum \min(h_{obs}, h_i)$$
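The histogram intersection used as the observation probability can be written in a few lines. The sketch assumes normalized histograms; the function name and the example values are invented for illustration.

```python
import numpy as np

def histogram_intersection(h_obs, h_model):
    """Sum of bin-wise minima of two histograms."""
    return float(np.minimum(h_obs, h_model).sum())

h_obs = np.array([0.5, 0.3, 0.2])
models = [np.array([0.2, 0.3, 0.5]),   # model histogram of state q1
          np.array([0.5, 0.25, 0.25])]  # model histogram of state q2
scores = [histogram_intersection(h_obs, h) for h in models]
print(scores)                  # higher means a better match
print(int(np.argmax(scores)))  # index of the best-matching state
```

For normalized histograms the score lies between 0 and 1 and equals 1 only when the two histograms are identical.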
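[Figure labels: xt, yt; example actions: Walk, Jack, Side, Skip, Run]
Confusion matrix diagonal:
Bend    1.00
Jack    1.00
Pjump   1.00
Run     1.00
Side    1.00
Walk    1.00
Wave1   1.00
Wave2   1.00
[Figure labels: xt, yt, xy]
$$\text{Similarity} = \sum \min(h_i, h_j)$$
[Figure labels: B, F, S]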
CMU database
25 subjects
4 different conditions (ball, slow, fast, incline)
Introduction
- parametric approaches
physics-based method and image-based method
- video games
- movie stunt
- virtual reality
Synthesis of dynamic textures using a new representation
- The basic idea is to create transitions from frame i to frame j anytime the
successor of i is similar to j, that is, whenever $D_{i+1,j}$ is small.
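The transition rule can be sketched as follows, assuming every frame is summarized by a feature vector and that D is the Euclidean distance between these vectors. The function name, the distance measure, and the fixed threshold are assumptions made for this example; the frame features could equally be texture histograms.

```python
import numpy as np

def find_transitions(features, threshold):
    """List (i, j) pairs where frame j can follow frame i.

    A transition i -> j is allowed when the successor of i resembles j,
    that is, when D[i + 1, j] is below the threshold. The trivial case
    j == i + 1 (normal playback) is skipped.
    """
    f = np.asarray(features, dtype=float)
    # D[a, b] = distance between the descriptors of frames a and b
    D = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=2)
    n = len(f)
    return [(i, j) for i in range(n - 1) for j in range(n)
            if j != i + 1 and D[i + 1, j] < threshold]

# Five frames described by one hypothetical feature value each.
frames = [[0.0], [1.0], [2.0], [1.05], [5.0]]
print(find_transitions(frames, threshold=0.1))  # [(0, 3), (2, 1)]
```

Here frame 3 looks like frame 1, so playback may jump from frame 0 to frame 3 or from frame 2 back to frame 1 without a visible discontinuity.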
An example:
Experiments
Biometrics
Biomedical applications
- Cell phenotype classification [Nanni & Lumini, Artif. Intell. Med. 2008]
- Diagnosis of renal cell carcinoma [Fuchs et al., MICCAI 2008]
- Ulcer detection in capsule endoscope images [Li & Meng, IVC 2009]
- Mass false positive reduction in mammographic images [Llado et al., CMIG 2009]
- Lung texture analysis in CT images [Sorensen et al., IEEE TMI 2010]
- Retrieval of brain MR images [Unay et al., IEEE TITM 2010]
- Quantitative analysis of facial paralysis [He et al., IEEE TBE 2009]
Matti Pietikäinen
Summary
Modern texture operators form a generic tool for computer vision
LBP and its variants are very effective for various tasks in computer
vision
The choice between LBP and LPQ depends on the problem at hand