v → v / √(‖v‖₂² + ε²)    (1)
where v is the unnormalized descriptor vector, ‖v‖_k is its k-norm and ε is a small constant. The
overlapping of the blocks permits each cell response to contribute to the final descriptor vector more
than once, each time normalized with respect to a different block. Specifically, the corner
cells appear once, the other edge cells appear twice each, and the interior cells appear four times each.
The rationale behind the block normalization approach is that changes in contrast are more likely to occur
over smaller regions within the image. So rather than normalizing over the entire image, we normalize
within a small region around the cell.
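The normalization in Eq. (1) can be sketched in a few lines of NumPy (the function name and the toy 36-value block below are illustrative, not part of the original implementation):

```python
import numpy as np

def l2_normalize_block(v, eps=1e-3):
    """Normalize a block descriptor v as in Eq. (1):
    v <- v / sqrt(||v||_2^2 + eps^2)."""
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

# A toy 36-value block (4 cells x 9 bins), hypothetical values:
block = np.arange(36, dtype=float)
normalized = l2_normalize_block(block)
print(np.linalg.norm(normalized))  # close to 1 whenever ||v|| >> eps
```

The small constant eps only matters for near-empty blocks, where it prevents division by (almost) zero.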
The final descriptor is in fact a normalized vector over a 64x128 pixel detection window which is divided
into 7 blocks across and 15 blocks vertically, for a total of 105 blocks. Each block contains 4 cells with
a 9-bin histogram per cell, for a total of 36 values per block. This brings the final vector size to
3,780 values.
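The size arithmetic can be checked directly; this sketch assumes the standard Dalal–Triggs layout implied above (8x8-pixel cells, 2x2-cell blocks, one-cell block stride):

```python
# Descriptor-size arithmetic for the 64x128 detection window.
cells_x, cells_y = 64 // 8, 128 // 8           # 8 x 16 cells of 8x8 pixels
blocks_x, blocks_y = cells_x - 1, cells_y - 1  # 7 x 15 overlapping 2x2-cell blocks
values_per_block = 2 * 2 * 9                   # 4 cells x 9-bin histogram = 36
descriptor_len = blocks_x * blocks_y * values_per_block
print(blocks_x * blocks_y, descriptor_len)     # 105 blocks, 3780 values
```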
Figure 5: Gradient magnitudes in brightened and contrast-enhanced images. The first row represents an
image with its surrounding pixel values and the value of its gradient magnitude. In the second row we
added +50 to the pixel values of the first image (increased brightness), while in the third row we
multiplied them by 1.5 (increased contrast).
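The effect shown in Figure 5 can be reproduced numerically: an additive brightness change leaves the gradient magnitude untouched, while a multiplicative contrast change scales it by the same factor. The 3x3 patch values below are hypothetical:

```python
import numpy as np

def central_gradient_magnitude(patch):
    """Gradient magnitude at the centre of a 3x3 patch
    using central differences [-1, 0, 1]."""
    gx = float(patch[1, 2]) - float(patch[1, 0])
    gy = float(patch[2, 1]) - float(patch[0, 1])
    return np.hypot(gx, gy)

patch = np.array([[50, 60, 70],
                  [55, 90, 95],
                  [52, 120, 98]], dtype=float)   # hypothetical pixel values

m0 = central_gradient_magnitude(patch)
m_bright = central_gradient_magnitude(patch + 50)     # +50 brightness
m_contrast = central_gradient_magnitude(patch * 1.5)  # 1.5x contrast
print(m0, m_bright, m_contrast)  # m_bright == m0, m_contrast == 1.5 * m0
```

This is exactly why the block normalization of Eq. (1) is needed: it removes the residual sensitivity to multiplicative contrast changes.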
2.4 Methodology
Dalal and Triggs used the INRIA (French Institute for Research in Informatics and Automation) data
set of images, which can be found online at http://lear.inrialpes.fr/data. They selected 1239 of them
as positive examples and added their left-right reflections (2478 positive examples in total). A fixed
set of 12180 image patches sampled randomly from person-free images served as negative examples.
For each detector and parameter combination they performed a two-level training. First, a preliminary
detector is trained on the whole set and used to identify the false positives, which are denoted as hard
examples. Second, they combine these hard examples with the negative examples and repeat the
training in order to obtain the final detector. This retraining process significantly improves the
performance on their data set.
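The two-level training can be sketched as follows. This is a toy illustration only: a 1-D threshold "classifier" stands in for the real HoG + SVM detector, and all data and names are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
positives = rng.normal(2.0, 1.0, 200)    # toy scores for person windows
negatives = rng.normal(-2.0, 1.5, 2000)  # toy scores for person-free windows

def train_threshold(pos, neg):
    """'Train' by placing the threshold midway between class means."""
    return (pos.mean() + neg.mean()) / 2.0

# Level 1: preliminary detector on the whole negative set.
t1 = train_threshold(positives, negatives)

# Hard examples: negatives the preliminary detector misclassifies.
hard = negatives[negatives > t1]

# Level 2: retrain with the hard examples added to the negative set,
# effectively re-weighting the difficult background patches.
t2 = train_threshold(positives, np.concatenate([negatives, hard]))
print(t1, t2, len(hard))
```

The retrained threshold shifts toward the hard examples, which is the point of the scheme: the final detector concentrates on the backgrounds that actually cause false positives.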
3 HoG by other researchers
In [4] the authors compare Haar features (famous from face detection) to HoG features in vehicle detection.
They conclude that HoG outperforms the Haar cascade, and they make an interesting observation:
when the HoG feature space is increased, the detection rate increases while the number of false
positives fluctuates around a stable level, whereas with Haar an increase in the feature space decreases
the number of false positives but at a lower corresponding detection rate. Training is done in
multiple stages (11 and 12) as in the Viola-Jones boosting algorithm [8].
A fused HoG-HCT (Histograms of Census Transform) feature set is created in [3]. First, HoG and
HCT features are extracted from different vehicle viewpoints and then fused together into a single
block of matrix elements by the PCA (Principal Component Analysis) algorithm. Afterwards they are
built into a deformable parts model (front view, rear view, side view). Astonishing
results based on car object class images from VOC 2007 show that this approach outperforms HoG.
However, they only present results on images of cars on roadways (no urban traffic). The quality of
a detector is always tied to the specificity of the training set (data set).
One interesting study aimed to train HoG-based classifiers specific to vehicle orientation versus
a coarse HoG-based classifier [5]. The goal was to detect vehicle orientation and with it to further
infer some relevant information. The conclusion was that, counter-intuitively, the coarse classifier
outperformed the specific ones.
There is also a study that unsuccessfully tried to modify the HoG descriptor to separate edges
(edge gradients) from shades (diffuse gradients), because in its current implementation the
descriptor cannot make a classifier (SVM) sensitive to, e.g., a black object on a white background without
becoming sensitive to texture [6]. HoG also exhibits aliasing problems, where two image patches that
are perceptually very different may end up with very similar HoG descriptors.
Harzallah [7] proposed a two-stage cascade for object detection. First, he uses HoG with a linear SVM
to quickly classify positive and negative detections on a sliding detection window. Afterwards
he uses BoW with a non-linear classifier to attribute confidence scores to each remaining example, and
finally performs suppression of non-maxima. This first-stage-speed / second-stage-performance tradeoff proved
to outperform every other contestant on Pascal VOC 2008. The class "car" was well within the scope of
the high-scoring detections of his algorithm (the data set was Pascal VOC 2007).
The purpose of this section was to give an insight into how HoG is used, what its weaknesses are and how
to exploit its strengths. The descriptor itself is a proven tool, but it should be used in conjunction with
other algorithms, whether in cascades or in boosting.
4 My own HoG
In the scope of my thesis I make use of the KITTI (Karlsruhe Institute of Technology and Toyota Tech-
nological Institute at Chicago) image data set [9]. This data set comprises 7480 labeled training
images and 7517 testing images with an average resolution of 1224x370 px. However, for reasons of speed,
I used 362 positive image patches with their left-right reflections and 751 negative image patches as
the training set, and 631 positive with 587 negative image patches as the test set. All patches are in RGB
color space. The balanced number of positive/negative examples is due to the evaluation after the testing
phase, which is done by calculating the EER (equal error rate) metric on the ROC (Receiver Operat-
ing Characteristic) curves of different HoG-based classifiers. I used the EER metric instead of AUC (Area
Under Curve) because of the high accuracy of the classifiers: in that regime the EER is the more
discriminative metric, since it takes into account false negatives (the high accuracy is due to a small
number of false positives). The smaller the EER, the better the classifier.
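The EER computation can be sketched as follows (a minimal NumPy version; the score distributions are synthetic stand-ins for real classifier outputs):

```python
import numpy as np

def equal_error_rate(pos_scores, neg_scores):
    """Sweep a decision threshold over all observed scores and return
    the error rate at the point where the false positive rate (FPR)
    and false negative rate (FNR) are (nearly) equal."""
    thresholds = np.sort(np.concatenate([pos_scores, neg_scores]))
    fprs = np.array([np.mean(neg_scores >= t) for t in thresholds])
    fnrs = np.array([np.mean(pos_scores < t) for t in thresholds])
    i = np.argmin(np.abs(fprs - fnrs))      # closest FPR/FNR crossing
    return (fprs[i] + fnrs[i]) / 2.0

# Synthetic scores for a hypothetical classifier:
rng = np.random.default_rng(1)
pos = rng.normal(1.5, 1.0, 631)   # 631 positive test patches
neg = rng.normal(-1.5, 1.0, 587)  # 587 negative test patches
print(equal_error_rate(pos, neg))  # small value = good separation
```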
The goal of the simulation was to determine which HoG descriptor (in combination with a linear SVM),
with which parameters regarding cell size, block size, number of orientation bins and HoG detection
window size, would fit best on the KITTI data set. Thus I evaluated (in nested loops) the following
parameters:
1. HoG window size: [120x60], [128x64], [136x68], [144x72], [152x76];
2. block size: [1x1], [2x2], [3x3], [4x4];
3. cell size: [4x4], [6x6], [8x8], [10x10], [12x12];
4. bin size: 9, 10, 11, 12;
5. orientation: 1 (oriented), 0 (non-oriented);
resulting in a total of 800 different HoG-based classifiers. It took approximately 26 hours to run a
single simulation under Matlab on an Intel Core i3 CPU platform clocked at max 1.8 GHz and having
6 GB of RAM.
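The nested loops above multiply out to 800 combinations, which can be checked with a short sketch (variable names are descriptive only, not from the original code):

```python
from itertools import product

window_sizes = [(120, 60), (128, 64), (136, 68), (144, 72), (152, 76)]
block_sizes  = [(1, 1), (2, 2), (3, 3), (4, 4)]
cell_sizes   = [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12)]
bin_counts   = [9, 10, 11, 12]
oriented     = [1, 0]  # oriented vs non-oriented histograms

grid = list(product(window_sizes, block_sizes, cell_sizes, bin_counts, oriented))
print(len(grid))  # 5 * 4 * 5 * 4 * 2 = 800 classifier configurations
```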
4.1 Results
Figure 6 shows the comparative results.
Figure 6: The whole graphic may be divided into 5 columns; each column represents an n-by-n block-size
group of units. Within each block-size group there are m-by-m cell-size groups of units. In each cell-size
group there are 8 units (9-, 10-, 11- and 12-bin non-oriented, and the same bin counts oriented). Each
unit corresponds to the equal error rate of an n-by-n block, m-by-m cell, K-bin oriented/non-oriented
classifier.
4.2 Discussion
From Figure 6 we may conclude:
1. Increasing the HoG window size by 4 pixels in height and by 8 pixels in width, we gain in
performance: at least a 10% smaller EER;
2. For block sizes, decreasing them from 4x4 = 16 cells to 3x3 = 9 cells and consequently down to
1x1 = 1 cell, we gain in performance on the classifiers whose cells contain fewer than 64
(8x8) pixels;
3. Smaller cell sizes (more local normalization) tend to increase performance significantly, by rates
of over 20% in EER;
4. Bin size: the best bin size is between 22 and 24 (both oriented), and all the non-oriented
histogram-based classifiers perform more poorly than those with oriented histograms.
The best HoG-based classifier on the KITTI data set is the 152x76, 1x1-block, 4x4-cell, 22-bin oriented
classifier. Its EER is 0.0376, which automatically means an AUC of over 97%. It is equivalent to dividing
the 152x76 imagette into 4x4-pixel squares without overlapping and using a 22-bin oriented histogram.
Overall, for vehicle detection, we may say that HoG works best on larger image patches (than those used
for pedestrian detection) that are divided into a grid of preferably non-overlapping small blocks
containing a small number of cells. These cells also map a small number of neighbouring pixels,
and invariance to illumination and contrast is very important, as HoG works best with larger
histograms.
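Under the assumption that the 1x1-cell blocks tile the 152x76 window without overlap (as described above), the length of this best descriptor works out as:

```python
# Descriptor-length arithmetic for the best configuration:
# 152x76 window, non-overlapping 4x4-pixel cells, 22-bin oriented histograms.
cells = (152 // 4) * (76 // 4)   # 38 x 19 = 722 cells
descriptor_len = cells * 22      # 22-bin oriented histogram per cell
print(cells, descriptor_len)     # 722 cells, 15884 values
```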
5 Future Work
The next step is to wire this descriptor with a detection window, nd condence scores and perform
suppression of non maximas. Histogram of Oriented gradients descriptor with linear SVM should be
used in conjuction with others algorithms. In the scope of my thesis I plan to try using it with Adaboost
and, if the time permits, neural networks. One possibility of cooperation would be incorporating fuzzy-
based detection algorithms develloped by Chair of Informatics at FTN, Novi Sad.
References
[1] Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection. IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego,
CA, USA, Volume 1, pages 886-893, 2005.
[2] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91-110,
2004.
[3] Sun Li, Do Wang, ZHniui Zheng, Hailuo Wang. Multi-view vehicle detection in traffic
surveillance combining HOG-HCT and deformable part models. Proceedings of the 2012
International Conference on Wavelet Analysis and Pattern Recognition, Xi'an, 15-17 July 2012.
[4] Pablo Negri, Xavier Clady, Lionel Prevost. Benchmarking Haar and Histograms of Oriented
Gradients features applied to vehicle detection. Université Pierre et Marie Curie - Paris 6,
ISIR, CNRS FRE 2507.
[5] Paul E. Rybski, Daniel Huber, Daniel D. Morris and Regis Hoffman. Visual Classification
of Coarse Vehicle Orientation using Histogram of Oriented Gradients Features.
[6] Carl Doersch and Alexei Efros. Improving the HoG descriptor.
[7] H. Harzallah, C. Schmid, F. Jurie and A. Gaidon. Classification aided two stage
localization. PASCAL Visual Object Classes Challenge Workshop, in conjunction with ECCV,
October 2008.
[8] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features.
In CVPR, pages 511-518, 2001.
[9] Andreas Geiger, Philipp Lenz, Raquel Urtasun. Are we ready for Autonomous Driving? The
KITTI Vision Benchmark Suite. Conference on Computer Vision and Pattern Recognition
(CVPR), 2012.