Self-calibration of an On-Board Stereo-vision System for Driver Assistance Systems

Juan M. Collado, Cristina Hilario, Arturo de la Escalera and Jose M. Armingol


Intelligent Systems Lab
Department of Systems Engineering and Automation
Universidad Carlos III de Madrid, 28911, Leganes, Madrid, SPAIN
E-mail: {Cristina.Hilario,JuanManuel.Collado,JoseMaria.Armingol,Arturo.delaEscalera}@uc3m.es

Abstract

Vision-based Driver Assistance Systems need to establish a correspondence between the position of objects on the road and their projection in the image. Although intrinsic parameters can be calibrated before installation, calibration of extrinsic parameters can only be done with the cameras mounted in the vehicle.
In this paper the self-calibration system of the IVVI (Intelligent Vehicle based on Visual Information) project is presented. It aims to ease the process of installation in commercial vehicles. The system is able to self-calibrate a stereo-vision system using only basic road infrastructure. Specifically, only a frame captured on a straight and planar stretch of road with marked lanes is required. Road lines are extracted with the Hough Transform and used as a calibration pattern. Then, a genetic algorithm finds the height, pitch and roll parameters of the vision system. The user must run this algorithm only once, unless the vision system is replaced or reinstalled in a different position.

1 Introduction

Companies are showing an increasing interest in Driver Assistance Systems (DAS). Many of them have declared their interest in commercializing vision-based DAS in the short term. These vision-based DAS need to establish a correspondence between the position of objects on the road and their projection in the image. Thus, a calibration process is essential in order to obtain the parameters of the vision system.

A vision system has intrinsic and extrinsic parameters. Intrinsic parameters are those related to the camera-optics assembly, and may be pre-calibrated in the factory or before being installed on-board the vehicle. However, extrinsic parameters (position and orientation of the stereo system, as shown in figure 2(a)) can only be calibrated after installation. Calibration of extrinsic parameters can be done in the factory, by technical operators, or afterwards, by the user. The algorithm proposed in this paper is useful in both cases.

Obtaining the extrinsic parameters poses several problems. On one hand, due to the variability of vehicles, the space of possible positions and orientations is too large to allow the use of initial guesses. On the other hand, any calibration process needs a calibration pattern, and when the stereo system is already installed on-board, the pattern must be visible from the cameras. Most current systems use a calibration pattern which is painted on the road [1], painted on the hood of the vehicle [2], or placed on a moving object [3].

The algorithm presented here intends to ease the installation of stereo-vision systems by taking advantage of road structure, i.e., using the structured, and partially known, road environment as a calibration pattern.

In short, the main goal of this work is to design and implement a self-calibration algorithm for stereo-vision systems mounted on vehicles, with the following requirements:

1. allowing to obtain the main extrinsic parameters needed by the Driver Assistance System of the IvvI, namely height, pitch and roll;

2. using exclusively objects of the basic road infrastructure;

3. requiring minimum user participation.

The designed algorithm uses the road lane boundaries as a calibration pattern, and a genetic algorithm (GA) [4] as an optimization technique. The user only needs to drive the vehicle along a straight and planar stretch of road with marked lanes, and instruct the system to initiate the auto-calibration process. This needs to be done only once, unless the vision system is changed or reinstalled in a different position.

This algorithm has been designed to support the other capabilities of a DAS.


1.1 Previous work

Calibrating monocular systems usually requires simplifying the problem, or certain information about the environment. In [5] the calibration problem is reduced to detecting the height and the pitch of the vision system. For the pitch, it is enough to detect the horizon line. However, in order to make distance measurements in the image, the user must provide the algorithm with the true distance between two points.

Similarly, in [6] a supervised calibration method is described. It requires the user to supply the width of the lane, and to select, among the lines detected in the image, which of them are lane boundaries. As in [5], only the pitch angle is considered.

In [7], the orientation of a moving camera is calculated by analyzing sequences of 2 or 3 images. The camera is assumed to keep the same orientation during the movement. The algorithm needs to know the height of the camera and the displacement of the vehicle between two frames.

Other systems use a pattern painted on the floor to calibrate a binocular system. In [1] the pattern is used to calibrate each camera separately. The equations are linearized and solved with a recursive least-squares technique, and the algorithm outputs the displacement and rotation of the stereo system with respect to the world. Other works have a supervised stage where, once the grid pattern has been captured by the binocular system, the user selects the intersections of the grid lines [2]. It is followed by an unsupervised stage, where the parameters are refined with an iterative process, assuming that the images captured by both cameras come from a flat textured surface. This technique is also used to correct small drifts of the extrinsic parameters due to the movement of the vehicle but, instead of a flat textured surface, it uses several markers placed on the hood of the vehicle that can be easily detected by the cameras.

In [8], no special calibration pattern is needed, and the v-disparity algorithm is used to sequentially estimate the roll, pitch and yaw parameters of the vehicle.

This paper presents a new approach, able to estimate simultaneously the roll, pitch and height of a stereo-vision system, without the need of an artificial calibration pattern.

2 Algorithm Description

Fig. 1. Algorithm Description

The main idea of this algorithm is to use the road lane boundaries as a calibration pattern. First, we start from a straight and planar stretch of road (figure 1(1)). Then, we capture an image with both cameras (figure 1(2)). Next, the pixels belonging to road lanes are detected, and lines are extracted with the Hough Transform. After that, assuming a possible set of extrinsic parameters, the perspective transformation is reversed (figure 1(3)). If the extrinsic parameters coincide with the true ones, the images projected from both cameras onto the road plane should be identical but, if the parameters are wrong, the road lines will neither match nor be parallel. A genetic algorithm tries to find a set of extrinsic parameters that makes the bird's-eye views obtained from the right and left cameras coherent. In other words:

1. all the road lines appear completely parallel,

2. the road lines projected from the right camera match the lines projected from the other.

Therefore, the genetic algorithm evaluates the fitness of the possible solutions following the two criteria mentioned above. The fitness function will be detailed in subsection 2.3.

2.1 Stereo-vision System Model

The perspective transformation requires a mathematical model of the complete vision system. This model consists of two parallel cameras joined together by a solid rod, so that they are separated by a distance b. Thanks to the rectification of the images [9], both cameras can be considered identical and perfectly aligned. The origin of the reference system is located at the middle point of the rod (figure 2(b)).

The projection of any point of the road onto each camera can be calculated using homogeneous matrices, which allow advanced transformations to be represented with a single matrix.

In order to obtain the matrix that represents the perspective transformation between the road plane and the camera plane, several steps are followed:


Fig. 2. Vision system model: (a) extrinsic parameters (height, pitch, roll, yaw); (b) stereo-vision model, with baseline b between the left and right cameras.

First, the coordinate system is raised a height H, and rotated an angle φ (roll) and θ (pitch) about the OY and OX axes, respectively. Second, the coordinate system is displaced b/2 or −b/2 (for the right or left camera, respectively) along the OX axis. Next, the point is projected along the OY axis, according to the pin-hole model. Finally, the coordinates are converted from millimeters to pixels, and the origin is translated to the optical center of the CCD. Multiplying these matrices, the global transformation matrix is obtained:

T =
\begin{pmatrix}
K_u f & C_u & 0 & -K_u f\,b/2 + C_u \\
0 & 1 & 0 & 0 \\
0 & C_v & -K_v f & C_v \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
C_\varphi & S_\varphi S_\theta & S_\varphi C_\theta & -S_\varphi C_\theta H \\
0 & C_\theta & -S_\theta & S_\theta H \\
-S_\varphi & C_\varphi S_\theta & C_\varphi C_\theta & -C_\varphi C_\theta H \\
0 & 0 & 0 & 1
\end{pmatrix}    (1)

where S and C stand for the sine and cosine functions (with subscripts φ for roll and θ for pitch), f is the focal distance, Ku and Kv refer to the pixel size, expressed in pixel/mm, and Cu and Cv are the coordinates of the optical center in pixels.

The matrix dimensions can be reduced to 3×3 if the second row (which corresponds to the depth coordinate, irrelevant in images) and the third column (which corresponds to the height coordinate, always zero if the world is flat) are eliminated.

The resultant matrix M_persp stores all the necessary information to perform the perspective transformation:

\begin{pmatrix} u \\ v \\ s' \end{pmatrix} = M_{persp} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}    (2)

where s' is a scale factor.
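As an illustration, the following is a minimal numpy sketch (not the authors' implementation; the function names and the sign conventions of the rotations are assumptions) that composes these homogeneous matrices, reduces the result to the 3×3 matrix M_persp of Eq. (2), and reverses the transformation to project image points onto the road plane:

```python
import numpy as np

def perspective_matrix(H, pitch, roll, b_half, f, Ku, Kv, Cu, Cv):
    """Road plane -> image transformation of Eq. (1).

    b_half is +b/2 for the right camera and -b/2 for the left one.
    The sign conventions of the rotation matrices are an assumption.
    """
    St, Ct = np.sin(pitch), np.cos(pitch)
    Sp, Cp = np.sin(roll), np.cos(roll)
    # Extrinsic part: raise the frame a height H, rotate by roll
    # (about OY) and pitch (about OX).
    E = np.array([[Cp,   Sp * St, Sp * Ct, -Sp * Ct * H],
                  [0.0,  Ct,      -St,      St * H],
                  [-Sp,  Cp * St, Cp * Ct, -Cp * Ct * H],
                  [0.0,  0.0,      0.0,     1.0]])
    # Intrinsic part: pin-hole projection along OY, mm -> pixel
    # conversion, and origin shifted to the optical center.
    P = np.array([[Ku * f, Cu,  0.0,     -Ku * f * b_half + Cu],
                  [0.0,    1.0, 0.0,      0.0],
                  [0.0,    Cv, -Kv * f,   Cv],
                  [0.0,    0.0, 0.0,      1.0]])
    T = P @ E
    # Drop the depth row and the height column (flat-world assumption)
    # to obtain the 3x3 matrix M_persp of Eq. (2).
    return np.delete(np.delete(T, 1, axis=0), 2, axis=1)

def image_to_road(M_persp, u, v):
    """Reverse the perspective transformation for one pixel (u, v)."""
    x, y, w = np.linalg.inv(M_persp) @ np.array([u, v, 1.0])
    return x / w, y / w
```

With the intrinsic values of Table I, applying image_to_road to the sampled line points would yield the bird's-eye pattern that is compared in subsection 2.3.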
2.2 Extraction of the calibration pattern

Extraction of the calibration pattern is done in three steps. First, in order to obtain the horizontal gradient, the image is correlated with the kernel [−1 0 1].

As road lines are white stripes over a darker background, they can be recognized because, if the gradient image is scanned row by row, two opposite peaks must appear at a distance equal to the line width. So, when a positive peak can be matched with a corresponding negative peak, the positive peak is marked as a pixel that belongs to a road line.

Several trials were carried out marking the middle point (equidistant from the positive and negative peaks), but the results were not satisfactory. This is because, due to perspective effects, the middle point of the horizontal section does not coincide exactly with the center of the line.
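The following is a minimal sketch of this marking step, assuming a grayscale image; the gradient threshold and the admissible line width are hypothetical tuning parameters, since the paper only requires that a positive peak be matched with a negative one at a distance equal to the line width:

```python
import numpy as np

def mark_lane_pixels(gray, peak_thresh=30, max_width=40):
    """Mark pixels belonging to road lines (step 1 of section 2.2).

    peak_thresh and max_width are hypothetical tuning parameters.
    """
    g = gray.astype(np.int32)
    grad = np.zeros_like(g)
    grad[:, 1:-1] = g[:, 2:] - g[:, :-2]   # correlation with [-1 0 1]
    marked = np.zeros(gray.shape, dtype=bool)
    for r, row in enumerate(grad):
        pos = np.flatnonzero(row > peak_thresh)    # dark -> bright edge
        neg = np.flatnonzero(row < -peak_thresh)   # bright -> dark edge
        for c in pos:
            # A white stripe yields a matching negative peak within
            # the admissible line width.
            if np.any((neg > c) & (neg <= c + max_width)):
                marked[r, c] = True   # mark the peak, not the middle
    return marked
```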
In the second step, the marked pixels are used by the Hough Transform to detect the right and left road lines. The Hough Transform used here has been improved as explained in [10]. The usual ρ-θ parameterization is used, and the range of the angle parameter is chosen so that ρ is allowed to be negative, and the left and right lane borders have different signs in θ (figure 3(a)). Then, the Hough Transform selects the best line in the region of negative angles, and the best one in the region of positive angles.

Fig. 3. Calibration pattern: (a) Hough parameterization, with regions of positive and negative angle; (b) sets of points extracted from both cameras, where (+) represents the pattern from the right camera and (○) the pattern from the left camera.

But these detected lines are not yet valid as a calibration pattern. Some experiments were carried out trying to compare two lines directly from their parameters, without success. For this reason, what the algorithm compares is two sets of points extracted from the road lines, obtained by evaluating the detected lines at different heights in the image. The first height is at 90% of the vanishing point height. The subsequent heights are spaced ten pixels apart, until the bottom of the image is reached.
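A short sketch of this sampling step; representing the detected Hough line as a function u(v) from image row to column is an assumption made for illustration:

```python
import numpy as np

def sample_line_points(line_u_of_v, v_first, v_bottom, step=10):
    """Sample a detected road line at fixed image rows.

    line_u_of_v maps an image row v to the column u of the line
    (hypothetical representation of the Hough line). v_first is the
    first sampled row, at 90% of the vanishing point height, and the
    following rows are spaced `step` pixels until the image bottom.
    """
    rows = np.arange(v_first, v_bottom + 1, step)
    cols = np.array([line_u_of_v(v) for v in rows])
    return np.column_stack([cols, rows])   # one (u, v) point per row
```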

2.3 Fitness function

Each individual of the population of the genetic algorithm represents a position and orientation of the stereo-vision system. The GA should evaluate that:

- the projections of the pattern from both cameras onto the road are identical,

- the road lines are parallel.

Accordingly, these requirements will be evaluated by an error function composed of two terms.


The first term evaluates how well the two pairs of lines are matched. The algorithm compares the two sets of points extracted from the road lines, as already explained. Each point projected onto the road (through the inverse perspective transformation) from the right camera should coincide with the projection from the left camera. The discrepancy between the two sets of lines can be evaluated by the sum of the squares of the distances between each point and its corresponding point from the other camera:

E_1 = \sum_i \left\| \vec{x}_{i,\mathrm{left}} - \vec{x}_{i,\mathrm{right}} \right\|^2    (3)

where x_{i,left} and x_{i,right} are the sets of points extracted from the left and right cameras, respectively.

The second term evaluates the parallelism between the right and left borders of the lane. In Eq. (4) and Eq. (5), θ is the angle between the line and the horizontal axis. Subscripts 1 and 2 represent the first and second lines in the image (i.e. the right and left lane borders), while the subscripts right and left represent the right and left cameras, respectively.

E_2 = \left| \theta_{\mathrm{right},1} - \theta_{\mathrm{right},2} \right| + \left| \theta_{\mathrm{left},1} - \theta_{\mathrm{left},2} \right|    (4)

E_2 = \min\left( \left| \theta_{\mathrm{right},1} - \theta_{\mathrm{right},2} \right|, \left| \theta_{\mathrm{left},1} - \theta_{\mathrm{left},2} \right| \right)    (5)

Equation (4) does not work: experiments using it show a very high error even when the parameters are close to the true ones. In contrast, Eq. (5) gives better results, and it expresses that at least one pair of lines is well aligned.

Both terms E1 and E2 should be zero (or nearly zero) in the perfect case. Now, a fitness function must be created from them. Experience has proved that it is indispensable for the fitness function to give both terms the same importance, and to have a high resolution close to the true solution.

Next, E1 and E2 are normalized and transformed into F1 and F2, as in Eq. (6). K is a constant that helps to define an upper limit (1/K), which is the same for both terms, so that they have the same importance. The use of the inverse function increases the resolution when we are close to the global maximum.

F_1 = \frac{1}{E_1 + K}; \quad F_2 = \frac{1}{E_2 + K}    (6)

The global fitness function cannot be a simple sum of F1 and F2, because the genetic algorithm tends to maximize only one of them at the expense of the other. Thus, the fitness function has been defined as the minimum of the two terms:

Fitness = \min(F_1, F_2)    (7)

In the experiments, the constant K has been empirically set to 10^{-4}, so that the fitness is constrained to the interval [0, 10^4].
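Eqs. (3)-(7) translate directly into a few lines of code. Below is a hedged sketch, assuming the point sets are given as N x 2 arrays of road-plane coordinates and the angles are those of the projected lines; names are illustrative, not from the paper:

```python
import numpy as np

K = 1e-4   # empirical constant: bounds the fitness to [0, 1/K] = [0, 1e4]

def fitness(pts_left, pts_right, ang_left, ang_right):
    """Fitness of one candidate (height, pitch, roll), Eqs. (3)-(7).

    pts_left, pts_right: corresponding points of both road lines
    projected onto the road plane from each camera (N x 2 arrays).
    ang_left, ang_right: pairs (line 1, line 2) of projected line
    angles with respect to the horizontal axis, one pair per camera.
    """
    E1 = np.sum((pts_left - pts_right) ** 2)        # Eq. (3)
    E2 = min(abs(ang_right[0] - ang_right[1]),      # Eq. (5)
             abs(ang_left[0] - ang_left[1]))
    F1 = 1.0 / (E1 + K)                             # Eq. (6)
    F2 = 1.0 / (E2 + K)
    return min(F1, F2)                              # Eq. (7)
```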
3 Results

Tests have been carried out in both synthetic and real environments. The behaviour of the algorithm in a real environment has been tested with the IvvI platform (figure 4). IvvI is a research platform for the implementation of systems based on computer vision, with the goal of building an Advanced Driver Assistance System (ADAS). However, a real environment does not allow evaluating the estimation, because it is difficult to measure the true parameters. Thus, synthetic images (figure 5) have been used in order to evaluate the performance of the algorithm. These images have been generated through a perspective transformation with the same intrinsic parameters as the cameras mounted on the IvvI (Table I).

Fig. 4. (left) IVVI vehicle; (top-right) detail of the trinocular vision system; (bottom-right) detail of the processing system.

Table II shows the parameters of the genetic algorithm (GA), and Table III shows the output of the GA and the comparison with the true parameters.


TABLE I
INTRINSIC PARAMETERS

Image width                    640 pixels
Image height                   480 pixels
Focal distance                 6.28 mm
CCD width                      7.780 mm
CCD height                     3.589 mm
CCD X center                   342.58 pixels
CCD Y center                   261.43 pixels
Distance between cameras (b)   148.91 mm

Fig. 5. Road projected onto the cameras of the stereo-vision system: (a) road model; (b) left camera; (c) right camera.

TABLE II
GA PARAMETERS

Population representation      Real-valued
Population                     1000
Number of parents              90% of the population
Number of children             90% of the population
Crossover probability          70%
Mutation probability           1%
Max. number of generations     50
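To make the optimization loop concrete, below is a minimal real-valued GA using the parameters of Table II. It is only a sketch under assumptions: the paper follows [4] but does not specify its selection, crossover or mutation operators, and the search bounds are invented for illustration. evaluate() stands for the fitness function of section 2.3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Search bounds for (height [mm], pitch [rad], roll [rad]) -- assumed values.
LOW = np.array([500.0, -0.5, -0.5])
HIGH = np.array([2500.0, 0.5, 0.5])

def run_ga(evaluate, pop_size=1000, generations=50,
           crossover_p=0.7, mutation_p=0.01):
    pop = rng.uniform(LOW, HIGH, size=(pop_size, 3))
    n_child = int(0.9 * pop_size)          # 90% of the population
    for _ in range(generations):
        fit = np.array([evaluate(ind) for ind in pop])
        # Fitness-proportional selection of parent pairs.
        idx = rng.choice(pop_size, size=(n_child, 2), p=fit / fit.sum())
        parents = pop[idx]                 # shape (n_child, 2, 3)
        # Arithmetic crossover with probability 70%.
        alpha = rng.uniform(size=(n_child, 1))
        do_cross = rng.uniform(size=(n_child, 1)) < crossover_p
        children = np.where(do_cross,
                            alpha * parents[:, 0] + (1 - alpha) * parents[:, 1],
                            parents[:, 0])
        # Uniform mutation of 1% of the genes.
        mut = rng.uniform(size=children.shape) < mutation_p
        low = np.broadcast_to(LOW, children.shape)
        high = np.broadcast_to(HIGH, children.shape)
        children[mut] = rng.uniform(low[mut], high[mut])
        # Keep the best 10% (elitism) and replace the rest by the children.
        elite = pop[np.argsort(fit)[-(pop_size - n_child):]]
        pop = np.vstack([elite, children])
    fit = np.array([evaluate(ind) for ind in pop])
    return pop[np.argmax(fit)]             # best (height, pitch, roll)
```

In this setup, evaluate() would build the matrices of section 2.1 from the candidate (height, pitch, roll), project the pattern points from both cameras onto the road, and return the fitness of Eq. (7).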
For the first set of parameters, Table III also includes the error of the estimation and the variance of the estimated parameters, calculated from ten executions of the algorithm. The variance of the result is about 10^-2 in height, 10^-8 in pitch, and 10^-10 in roll; that is, the convergence of the algorithm is quite robust.

Fig. 6 shows an example of execution. The figure on the left represents the error of the solution, and the convergence. The figure on the right shows the correspondence between the points projected from the right camera (crosses) and from the left camera (circles). It also shows (with dots) the points projected onto the road using the true parameters. In other words, the dots represent the true position of the lane on the road, while the crosses and circles represent the estimated position using the estimated parameters. The error is about 1 meter at 35 meters ahead (3%), and about 5 centimeters at 4 meters (1.3%).

TABLE III
RESULTS OF THE GA

Parameters   Height (mm)   Pitch (rad)   Pitch (deg)   Roll (rad)   Roll (deg)
True         1500          0.170         9.74          -0.090       -5.15
Estimated    1458          0.169         9.69          -0.093       -5.35
Variance     1.34          9 x 10^-11                  2 x 10^-7
Error        42.0          0.001         0.05          0.003        0.20
True         1500          0.170         9.74          0.090        5.16
Estimated    1484          0.168         9.62          0.081        4.65
True         1100          0.170         9.74          -0.090       -5.16
Estimated    1065          0.168         9.64          -0.100       -5.74

Fig. 6. GA execution example: (+) points projected from the right camera; (○) points projected from the left camera; (·) points projected using the true parameters.

TABLE IV
RESULTS WITH GAUSSIAN NOISE ADDED TO THE ORIGINAL IMAGES

Parameters   Height (mm)   Pitch (rad)   Pitch (deg)   Roll (rad)   Roll (deg)
True         1500          0.170         9.74          -0.090       -5.16
5% noise     1455          0.169         9.65          -0.090       -5.16
10% noise    1677          0.170         9.72          -0.042       -2.42
20% noise    1663          0.170         9.71          -0.040       -2.28


In order to recreate a real environment, and to study the sensitivity of the algorithm to errors in the pattern extracted from the image, Gaussian noise has been added to the synthetic images. Table IV and figure 7 show the results of an experiment with different amounts of noise. The experiments show that noise affects mainly the height and roll estimation, while the pitch remains nearly unaffected. With 5% noise, the algorithm remains practically unaffected: the error is about 1 meter at 35 meters ahead (3%), and about 5 centimeters at 4 meters (1.3%). With over 10% noise, the error is about 40 centimeters at 4 meters ahead (10%), and 5 meters at 30 meters ahead (16%). It can also be seen that the algorithm always converges in fewer than 20 generations.

Fig. 7. Results with noise added to the original images, for (a) 5%, (b) 10% and (c) 20% noise: (left) convergence of the GA and error of the solution; (right) projection of the pattern in world coordinates, where (+) are points projected from the right camera, (○) are points projected from the left camera, and (·) are points projected using the true parameters.

Table V shows the results obtained from a sequence in a real environment, while figure 8 shows the first frame of the sequence and the output of the preprocessing steps. In this sequence the vehicle has just finished a curve to the left, so it is difficult to separate the variability that comes from the algorithm from the variability that comes from the inertial movements of the vehicle. The results are coherent with the location and orientation of the stereo system on-board the vehicle.

TABLE V
RESULTS FROM A SEQUENCE OF TEN CONSECUTIVE FRAMES CAPTURED FROM A CAR DRIVING AT 80 KM/H

Frame   Height (m)   Pitch (rad)   Pitch (deg)   Roll (rad)   Roll (deg)
1       0.951        -0.0020       -0.11         -0.0392      -2.24
2       0.999        -0.0032       -0.19         -0.0367      -2.10
3       0.926        -0.0072       -0.41         -0.0294      -1.69
4       0.970        -0.0074       -0.43         -0.0312      -1.79
5       1.020        -0.0131       -0.75         -0.0273      -1.57
6       1.047        -0.0131       -0.75         -0.0279      -1.60
7       1.028        -0.0091       -0.52         -0.0296      -1.70
8       1.006        -0.0055       -0.31         -0.0272      -1.56
9       1.034        -0.0078       -0.45         -0.0345      -1.98
10      0.986        -0.0078       -0.45         -0.0243      -1.39
Avg.    0.997        -0.0076       -0.44         -0.0307      -1.76
Std.    0.039        0.0036        0.2074        0.0047       0.2688

Fig. 8. Lines detected and used to generate the calibration pattern: (a) left camera image; (b) right camera image; (c), (d) markings detected; (e), (f) lines detected.

4 Conclusion and Perspectives

A self-calibration algorithm to estimate the height, pitch and roll parameters of stereo-vision systems has been presented. Precision is very good with noise below 10%, and the location of objects is estimated with less than 3% error 30 meters ahead. With noise over 10% the error increases, because the road line extraction is not accurate enough. Thus, if the image is highly degraded, only the pitch is estimated accurately, and the position error of objects is about 16%. The tests in a real environment have proved the consistency and convergence of the algorithm.

Future work involves the improvement of the road line extraction, in order to increase the fitting precision so that the algorithm can deal with highly degraded images. Also, fitting a curved road model will make the algorithm work not only in straight sections of road, but in curved sections too.


Acknowledgments

This work was supported in part by the Spanish Government through the CICYT project ASISTENTUR (Grant TRA2004-07441-C03-01).

References

[1] S. Ernst, C. Stiller, J. Goldbeck, and C. Roessig, "Camera calibration for lane and obstacle detection," in Proceedings 1999 IEEE/IEEJ/JSAI International Conference on Intelligent Transportation Systems, Tokyo, Japan, 5-8 Oct. 1999, pp. 356-361.

[2] A. Broggi, M. Bertozzi, and A. Fascioli, "Self-calibration of a stereo vision system for automotive applications," in Proceedings 2001 IEEE International Conference on Robotics and Automation (ICRA), vol. 4, Seoul, South Korea, 21-26 May 2001, pp. 3698-3703.

[3] T. Dang and C. Hoffmann, "Stereo calibration in vehicles," in IEEE Intelligent Vehicles Symposium, Parma, Italy, 14-17 June 2004, pp. 268-273.

[4] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub. Co., 1989.

[5] T. Bucher, "Measurement of distance and height in images based on easy attainable calibration parameters," in Proceedings IEEE Intelligent Vehicles Symposium, Detroit, USA, 2000, pp. 314-319.

[6] B. Southall and C. Taylor, "Stochastic road shape estimation," in 8th IEEE International Conference on Computer Vision (ICCV), vol. 1, 7-14 July 2001, pp. 205-212.

[7] H.-J. Lee and C.-T. Deng, "Determining camera models from multiple frames," Journal of Information Science and Engineering, vol. 12, no. 2, pp. 193-214, June 1996.

[8] R. Labayrade and D. Aubert, "A single framework for vehicle roll, pitch, yaw estimation and obstacles detection by stereovision," in Proceedings IEEE Intelligent Vehicles Symposium, 9-11 June 2003, pp. 31-36.

[9] J.-Y. Bouguet, "Camera calibration toolbox for Matlab," last visited April 2006. [Online]. Available: http://www.vision.caltech.edu/bouguetj/calib_doc

[10] J. M. Collado, C. Hilario, A. de la Escalera, and J. M. Armingol, "Detection and classification of road lanes with a frequency analysis," in IEEE Intelligent Vehicle Symposium, Las Vegas, Nevada, U.S.A., June 6-8 2005, pp. 78-83.
