Abstract—Monitoring traffic intersections in real time and predicting possible collisions is an important first step towards building an early collision-warning system. We present a vision-based system addressing this problem and describe the practical adaptations necessary to achieve real-time performance. Innovative low-overhead collision-prediction algorithms (such as the one using the time-as-axis paradigm) are presented. The proposed system was able to perform successfully in real time on videos of quarter-video graphics array (VGA) (320 × 240) resolution under various weather conditions. The errors in target position and dimension estimates in a test video sequence are quantified and several experimental results are presented.

Index Terms—Collision prediction, machine vision, real-time systems, tracking, traffic control (transportation).

I. INTRODUCTION

Studies have shown that collisions between vehicles at traffic intersections account for nearly a third of all reported crashes in the United States [8], [18], [21]. This has led to considerable interest at the federal level in developing an intelligent low-cost system that can detect and prevent potential collisions in real time. This paper presents the various components of a vision-based system that monitors a traffic intersection and uses the tracking results to predict collisions over a short time interval extending into the future. Our goal is to establish the feasibility of this approach; the specific way in which these predictions can be used to prevent possible traffic accidents, however, is not addressed at this time. As far as we know, this is the first time that computer vision in conjunction with computational geometry tries to supply a solution to this complex problem.

The rest of this paper is organized as follows: We initially
mask provide us with good object outlines, allow for automatic target identification, and are well-suited for real-time systems. In order to classify foreground pixels, we need a background model of the observed scene. Due to the gradual changes in scene appearance over extended periods of time, we cannot use a static background model. Instead, we use an adaptive background model based on the mixtures-of-Gaussians segmentation method in [24]. The resulting background/foreground classifier adapts well to gradual changes in the monitored outdoor environment and allows for the detection of targets even if they are not moving, a common occurrence at traffic intersections. Occasionally, the background extraction can fail, either because of sudden illumination changes in the scene caused by passing clouds or a camera's gain control circuitry, or due to minor camera shakes resulting from road vibrations or wind load. We have developed efficient methods for compensating for sudden illumination changes and camera shakes. We have also devised a fast implementation of the method presented in [5] for cleaning up the foreground masks.

Vehicles that move throughout the scene will sometimes occlude other moving objects, or be themselves occluded by static objects such as road-sign poles, traffic lights, etc. Such occlusions cause blobs to merge, split into smaller regions, or disappear completely. These interactions between connected regions can either cause a single target to be visible as more than one blob in the foreground, or cause several targets to be represented as a single blob. The problem can be alleviated by making use of multiple cameras to observe the same intersection; proper camera placement can ensure that vehicles are rarely occluded in the views of all cameras. However, even when multiple cameras are used, it is advantageous to have some means of handling occlusions in a single view. For that reason, we represent targets as sets of regions and introduce a second-level tracker that is capable of handling blob merges and splits. The second-level tracker also makes use of camera-calibration data in order to estimate the position and velocity of vehicles in world coordinates. The calibration method we use is presented in [16] and allows us to accurately map points from the image planes of all cameras to positions on a single world-coordinate ground plane.

It is common practice to use the centroids of connected regions to represent a target's position. Such an approach is suboptimal, because the position of a centroid relative to the ground plane depends on the size and orientation of vehicles and also on the particular camera placement. The last fact complicates multicamera tracking since the centroids tracked in different camera views do not correspond to the same real-world point. We introduce a method that can identify the centers of vehicular bases on the ground plane given the outlines of the vehicles and the camera-calibration data. The base centers identified by our method correspond to the same real-world point, which allows for the sequential incorporation of the measurements in a target's state vector. The method also produces estimates of the width, length, and height of vehicles that are critical for the collision-prediction system.

The last component of our system is the collision-prediction module. Given the visual measurements of all targets in the scene, the module reports all target pairs that will collide within
418 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 6, NO. 4, DECEMBER 2005
Fig. 2. Low-level vision-system components and data flow for a single frame. Lighter lines indicate the use of data from the processing of the previous frame.
Fig. 3. Collision-prediction-system components and data flow. Size and position estimates are provided to the ground-plane tracking component of the
tracking module.
a time interval of length L in the immediate future, assuming their velocities stay constant. Optionally, the actual time of impact for each target pair that collides within the specified L time units can be reported. We present several methods for predicting pairwise collisions. The most effective method is based on the idea of extruding the two-dimensional (2-D) vehicular bases along a time axis to obtain three-dimensional polytopes that can be tested for overlap efficiently by making use of the Separating Axis Theorem [9].

III. LOW-LEVEL VISION SYSTEM

In this section, we summarize the methods used. Our initial attempts were based on bounding rectangles; they worked acceptably, but required the assumption of a known vehicle width, which distorted the vehicle-size estimates. We then developed more advanced methods for background-model maintenance, illumination filtering, camera-shake compensation, and noise removal in the foreground mask. Fig. 2 shows the components of the low-level vision system and the data interactions between them. More information about these methods can be found in [2].

IV. COLLISION PREDICTION

Early versions of the collision-prediction module measured only bounding rectangles; the module now measures the base center and dimensions of a target given its outline, and predicts potential collisions between vehicles (Fig. 3). The specific way in which we use object outlines and calibration data for the first task is presented, followed by a description of the collision-detection test.

A. Dimension and Position Measurement

To measure a target's position and dimensions, we fit a three-dimensional box to its outline. The outline is the union of all contour points of regions in the target's blob collection. Assuming for the moment that we know the box's position, dimensions, and orientation, we can extend its edges into lines. Those lines intersect at three distinct vanishing points in the image plane (we also handle lines parallel to the image plane as a special case). The importance of this result is that we can reverse the process: starting with the three vanishing points and the object's outline, we can find some of the edges of the box. The vanishing point of lines in a given direction d ∈ R^3 can also be determined. The relevant directions are found by making two assumptions: that the target's orientation coincides with its direction of motion and that the bases of the targets are parallel to the ground plane. The first assumption determines the vanishing points wx and wy in directions parallel to the ground plane: the direction of motion dx and the perpendicular direction dy. The second assumption fixes the third vanishing point wz in the direction dz = [0, 0, 1]^T.

For each vanishing point, we find the two tangent lines to the convex hull of the target's outline. If a vanishing point is inside the outline's hull, it is ignored in subsequent steps, since all box edges that vanish to that point would be contained in the outline and are thus irrecoverable. There is no need to compute the convex hull, since its vertices are a subset of the
ATEV et al.: A VISION-BASED APPROACH TO COLLISION PREDICTION AT TRAFFIC INTERSECTIONS 419
Fig. 5. Three major cases for the tangent-line intersections. Thicker line segments indicate edges whose length can be retrieved. The location of the wy vanishing
point and the direction of motion dx are indicated. In (a) and (c), all three dimensions can be recovered, while in (b), only two are recoverable.
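The tangent-line construction illustrated in Fig. 5 can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function name and the brute-force search are our own choices. A point of the outline is a tangent point for a given vanishing point exactly when every other outline point lies on one side of the line joining it to the vanishing point:

```python
import numpy as np

def tangent_points(vanishing_pt, outline):
    """Indices of outline points whose line through vanishing_pt is tangent
    to the outline (all other points lie on one side of that line).
    outline: (n, 2) array-like of image-plane points; vanishing_pt: (2,)."""
    v = np.asarray(vanishing_pt, dtype=float)
    pts = np.asarray(outline, dtype=float)
    tangents = []
    for i, p in enumerate(pts):
        d = p - v
        # signed cross product of (p - v) with (q - v) for every point q;
        # a consistent sign means all points are on one side of the line
        cross = d[0] * (pts[:, 1] - v[1]) - d[1] * (pts[:, 0] - v[0])
        if np.all(cross >= -1e-9) or np.all(cross <= 1e-9):
            tangents.append(i)
    return tangents
```

Joining each returned point to the vanishing point yields the two tangent lines whose intersections (Fig. 5) are used to recover box edges.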
Fig. 6. Rectangles extruded in time. (a) A w × l rectangle at position p0, moving for L time units with velocity v (reaching point pL); and (b) two overlapping parallelepipeds with labeled edges; the vector c connects the centroids of the polytopes.
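The extrusion of Fig. 6(a) amounts to stacking the rectangle's corners at t = 0 on top of the same corners translated by Lv at t = L. A sketch under that reading, with illustrative parameter names (the paper's actual data layout is not specified here):

```python
import numpy as np

def extrude_rectangle(p0, w, l, theta, vel, L):
    """Vertices of the spacetime parallelepiped swept by a w-by-l rectangle
    centered at p0 with orientation theta, moving at constant velocity vel
    for L time units.  Returns an (8, 3) array of (x, y, t) vertices."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    # rectangle corners in the body frame, rotated into the world frame
    half = np.array([[ w/2,  l/2], [ w/2, -l/2], [-w/2, -l/2], [-w/2,  l/2]])
    base = np.asarray(p0, dtype=float) + half @ R.T
    bottom = np.hstack([base, np.zeros((4, 1))])                    # t = 0 face
    shifted = base + L * np.asarray(vel, dtype=float)               # base at t = L
    top = np.hstack([shifted, np.full((4, 1), float(L))])           # t = L face
    return np.vstack([bottom, top])
```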
should be that the vehicles get distributed uniformly across the cells. This depends on knowledge of the vehicle-flow pattern, the density, and their variation with the time of day. One simple way out is to allow the end user, who has knowledge of the traffic-flow patterns, to draw these grids for a specific traffic intersection.

A natural extension of the above fixed grid-based approach is to make the grid adaptive to the conditions of vehicular traffic flow. The goal is to maintain the invariant that the maximum occupancy of any grid cell is given by a predetermined number p. A grid cell is recursively divided into four subcells whenever this invariant is violated. Vehicles can be added to a grid cell so long as its occupancy remains less than p. When the (p + 1)st vehicle is added to a grid cell, that cell is recursively divided into four subcells and all p + 1 vehicles are reassigned to the new child cells until the invariant is satisfied.

The above design can be represented as a quadtree, with the root of the tree representing a bounding rectangular plane that will contain any given rectangle (vehicle). Each leaf of the tree will have a maximum of p vehicles. In addition to the data required to represent a tree, each node v of the quadtree will hold the rectangular quadrant R(v) that it represents. Each leaf of the quadtree will additionally hold pointers to the list of rectangles it contains (at most p).

We also experimented with the popular interval-based approach. The intersection of convex polygons is a well-studied problem in computational geometry, and the intersection of oriented rectangles is a special case of it. It has been shown that the intersection of simple polygons is linear-time transformable to the line-segment intersection-testing problem; to find pairs of intersecting rectangles in a plane, the problem is converted into a line-segment overlap problem.

All of the rectangle-intersection algorithms described above are used to compute collisions at a particular time instance. Collision prediction, however, requires the computation of all vehicle pairs that could possibly collide within the next f time steps, provided the predicted vehicle positions are available. While these brute-force or space-based algorithms would suffice for computing collisions in the current time frame, they do not scale readily for solving the following problem: Given the position, orientation, and size, at each time step, of n oriented rectangles in 2-D, find all possible pairs of rectangles that could intersect in f future time steps. In the existing implementation, this is done by obtaining the predicted positions from a Kalman filter and repeating the collision-detection process once for each of the f time steps, thus increasing the running time by a factor of f. Algorithms like the interval-based approach can be used instead to exploit the spatial proximity of a vehicle's predicted positions across the time frame and to update the data structure incrementally, instead of recomputing the entire data structure.

We then moved to more complex representations and revisited the collision problem. The number of vehicles that can be present at a traffic intersection is fairly limited, so the lower time complexity of advanced collision-detection methods, such as those outlined in [12], does not translate to better run-time performance because of the overhead imposed by preprocessing steps and the use of advanced data structures. Thus, we opted to test all vehicle pairs in a scene for possible collision over a time interval and focused on maximizing the performance of the individual tests.

Our method is based on the idea of extruding the bases of vehicles along a time axis. The extrusion of a rectangle moving from a point p0 to the point pL over L time units is a parallelepiped like the one shown in Fig. 6(a). A collision occurs if and only if the parallelepipeds representing two vehicles overlap. Fig. 6(b) illustrates this and introduces our notation for the following discussion.

Two convex polytopes are disjoint if there exists an axis on which their projections are disjoint. The Separating Axis
Fig. 7. Example illustrating (1). In this particular case, the objects are separated by the axis.
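The disjointness criterion illustrated in Fig. 7 can be sketched directly: project both vertex sets onto a candidate axis and compare the resulting intervals. The helper below is a generic illustration of the separating-axis test, not the paper's implementation; the specific candidate axes for the extruded parallelepipeds (face normals and edge cross products, per [9]) are assumed to be supplied by the caller:

```python
import numpy as np

def projections_overlap(verts_a, verts_b, axis):
    """Project both vertex sets onto axis; disjoint projection intervals
    prove the two convex polytopes are disjoint (Separating Axis Theorem)."""
    pa = np.asarray(verts_a, dtype=float) @ np.asarray(axis, dtype=float)
    pb = np.asarray(verts_b, dtype=float) @ np.asarray(axis, dtype=float)
    return pa.min() <= pb.max() and pb.min() <= pa.max()

def sat_disjoint(verts_a, verts_b, axes):
    """True if some axis in `axes` separates the two convex vertex sets.
    For convex polyhedra, it suffices to test the face normals of both
    bodies plus the cross products of their edge directions [9]."""
    return any(not projections_overlap(verts_a, verts_b, ax) for ax in axes)
```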
TABLE I
SIMILAR TERMS IN THE SEPARATING-AXIS TEST

TABLE II
COMPUTING THE NUMBER OF NEAR-MISSES IN A SPARSE TRAFFIC INTERSECTION, MONITORING 60 VEHICLES OVER 4000 FRAMES, FOR DIFFERENT VALUES OF INTERVEHICULAR DISTANCE d
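The adaptive grid described earlier (subdivide a cell once it holds more than p vehicles, reassigning its contents to the four child cells) can be sketched as a quadtree. The class layout and the reduction of vehicles to single reference points are our simplifications for illustration:

```python
class QuadTree:
    """Quadtree cell covering rect = (x, y, w, h); splits into four subcells
    once it holds more than `p` items, reassigning them to the children."""
    def __init__(self, rect, p=4):
        self.rect, self.p = rect, p
        self.items, self.children = [], None

    def insert(self, pt):
        if self.children is not None:
            self._child_for(pt).insert(pt)
            return
        self.items.append(pt)
        if len(self.items) > self.p:          # occupancy invariant violated
            x, y, w, h = self.rect
            self.children = [QuadTree((x + dx * w/2, y + dy * h/2, w/2, h/2), self.p)
                             for dy in (0, 1) for dx in (0, 1)]
            items, self.items = self.items, []
            for q in items:                   # reassign all p + 1 items
                self._child_for(q).insert(q)

    def _child_for(self, pt):
        x, y, w, h = self.rect
        ix = 1 if pt[0] >= x + w/2 else 0
        iy = 1 if pt[1] >= y + h/2 else 0
        return self.children[2 * iy + ix]

    def leaf_count(self):
        if self.children is None:
            return 1
        return sum(c.leaf_count() for c in self.children)
```

Each leaf holds at most p items, and subdivision recurses automatically when all p + 1 items fall into the same child quadrant.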
[4] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association. New York: Academic, 1987.
[5] A. Bevilacqua, "Effective object segmentation in a traffic monitoring application," in Proc. 3rd Int. Association Pattern Recognition (IAPR) Indian Conf. Computer Vision, Graphics and Image Processing, Ahmedabad, India, 2002, pp. 125–130.
[6] B. Coifman, D. Beymer, P. McLauchlan, and J. Malik, "A real-time computer vision system for vehicle tracking and traffic surveillance," J. Transp. Res., Part C, vol. 6, no. 4, pp. 271–288, 1998.
[7] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pp. 564–577, May 2003.
[8] B. Ferlis, "Analysis of infrastructure-based system concepts—Intersection collision avoidance problem area," 1999. Unpublished FHWA document.
[9] S. Gottschalk, M. C. Lin, and D. Manocha, "OBB-Tree: A hierarchical structure for rapid interference detection," Comput. Graph., vol. 30, no. 3, pp. 171–180, 1996.
[10] J. Kato, T. Watanabe, S. Joga, J. Rittscher, and A. Blake, "An HMM-based segmentation method for traffic monitoring movies," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 9, pp. 1291–1296, Sep. 2002.
[11] D. Koller et al., "Towards robust automatic traffic scene analysis in real-time," in Proc. 12th Int. Conf. Pattern Recognition (ICPR), Jerusalem, Israel, 1994, pp. 126–131.
[12] M. Lin and S. Gottschalk, "Collision detection between geometric models: A survey," in Proc. Institute Mathematics and Applications (IMA) Conf. Mathematics of Surfaces, Birmingham, U.K., 1998, pp. 37–56.
[13] S. Lu, D. Metaxas, D. Samaras, and J. Oliensis, "Using multiple cues for hand tracking and model refinement," in Proc. Computer Vision and Pattern Recognition Conf., Madison, WI, 2003, pp. 443–450.
[14] S. Kamijo, Y. Matsushita, K. Ikeuchi, and M. Sakauchi, "Occlusion robust tracking utilizing spatio-temporal Markov random field model," in Int. Conf. Pattern Recognition, Barcelona, Spain, 2000, vol. 1, pp. 140–144.
[15] P. Mahalanobis, "On the generalized distance in statistics," in Proc. Nat. Institute Science India, Bhavnagar, India, 1936, vol. 12, pp. 49–55.
[16] O. Masoud and N. Papanikolopoulos, "Using geometric primitives to calibrate traffic scenes," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS), Sendai, Japan, 2004, pp. 1878–1883.
[17] R. Van der Merwe, J. de Freitas, A. Doucet, and E. Wan, "The unscented particle filter," in Advances in Neural Information Processing Systems, Denver, CO, 2000, pp. 584–590.
[18] T. Penney, Intersection Collision Warning System, 1999. Pub. No. FHWA-RD-99-103.
[19] P. Pérez, J. Vermaak, and A. Blake, "Data fusion for visual tracking with particles," Proc. IEEE, vol. 92, no. 3, pp. 495–513, Mar. 2004.
[20] W. Power and J. Schoones, "Understanding background mixture models for foreground segmentation," in Proc. Imaging and Vision Computing New Zealand (IVCNZ), Auckland, New Zealand, 2002, pp. 267–271.
[21] H. Preston, R. Storm, M. Donath, and C. Shankwitz, "Review of Minnesota's rural intersection crashes: Methodology for identifying intersections for Intersection Decision Support (IDS)," Minnesota Dept. Transp., St. Paul, MN, Tech. Rep. MN/RC-2004-31, 2004.
[22] C. Rasmussen and G. Hager, "Probabilistic data association methods for tracking complex visual objects," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 560–576, Jun. 2001.
[23] J. Sherrah and S. Gong, "Fusion of perceptual cues using covariance estimation," in Proc. British Machine Vision Conf., Nottingham, U.K., 1999, pp. 564–573.
[24] C. Stauffer and W. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. Computer Vision and Pattern Recognition (CVPR), Fort Collins, CO, 1999, p. 252.
[25] Y. Wu and T. Huang, "Robust visual tracking by integrating multiple cues based on co-inference learning," Int. J. Comput. Vis., vol. 58, no. 1, pp. 55–71, 2004.

Hemanth Arumugam received the M.S. degree in computer science from the Department of Computer Science at the University of Minnesota, Minneapolis, in 2004. He has research interests in computational geometry and its applications to computer vision and bioinformatics.

Osama Masoud received the B.S. and M.Sc. degrees in computer science from King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia, in 1992 and 1994, respectively, and the Ph.D. degree in computer science from the University of Minnesota, Minneapolis, in 2000. He is currently a Research Associate at the Department of Computer Science and Engineering at the University of Minnesota. In the past, he was a Postdoctoral Associate at the same department and served as the Director of Research and Development at Point Cloud Inc., Plymouth, MN. His research interests include computer vision, robotics, transportation applications, and computer graphics. Mr. Masoud is the recipient of a Research Contribution Award from the University of Minnesota, the Rosemount Instrumentation Award from Rosemount Inc., and the Matt Huber Award for Excellence in Transportation Research. One of his papers (coauthored by N. P. Papanikolopoulos) was awarded the IEEE VTS 2001 Best Land Transportation Paper Award.

Ravi Janardan (M'00–SM'01) received the Ph.D. degree in computer science from Purdue University, West Lafayette, IN, in 1987. He is Professor of Computer Science and Engineering at the University of Minnesota–Twin Cities. His research interests are in the design and analysis of geometric algorithms and data structures, and their application to problems in a variety of areas, including computer-aided design and manufacturing, transportation, very-large-scale-integration (VLSI) design, bioinformatics, and computer graphics. He has published extensively in these areas.

Nikolaos P. Papanikolopoulos (S'88–M'93–SM'01) was born in Piraeus, Greece, in 1964. He received the Dipl.-Ing. degree in electrical and computer engineering from the National Technical University of Athens, Athens, Greece, in 1987, the M.S.E.E. degree in electrical engineering from Carnegie Mellon University (CMU), Pittsburgh, PA, in 1988, and the Ph.D. in electrical and computer engineering from CMU in 1992. Currently, he is a Professor in the Department of Computer Science at the University of Minnesota, Minneapolis, and the Director of the Center for Distributed Robotics. He was a McKnight Land-Grant Professor at the University of Minnesota for the period 1995–1997. He was the recipient of the Kritski Fellowship in 1986 and 1987. His research interests include robotics, sensors for transportation applications, control, and computer vision. He has authored or coauthored more than 190 journal and conference papers in the abovementioned areas (47 refereed journal papers).
Dr. Papanikolopoulos was a finalist for the Anton Philips Award for Best Student Paper in the 1991 IEEE International Conference on Robotics and Automation, the recipient of the NSF Research Initiation and Early Career Development Awards and the Faculty Creativity Award at the University of Minnesota in 1995–1997, and the recipient of the Best Video Award in the 2000 IEEE International Conference on Robotics and Automation. One of his papers (coauthored by O. Masoud) was awarded the IEEE VTS 2001 Best Land Transportation Paper Award. Finally, he received grants from DARPA, DHS, Sandia National Laboratories, NSF, Microsoft, INEEL, U.S. Army, U.S. Air Force, USDOT, MN/DOT, Honeywell, and 3M.

Stefan Atev received the B.A. degree in computer science and mathematics from Luther College, Decorah, IA, in 2003. He is working towards the Ph.D. degree in computer science at the University of Minnesota, Minneapolis. He has research interests in real-time computer-vision systems, video surveillance, and image processing.