
International Institute of Information Technology
B. Tech Project Report, 2007

Dynamic Event Analysis From Continuous Video

Srikanth Cherla (200301103)
Nikhil Walecha (200301066)

Project Advisor: C. V. Jawahar

Abstract

This project report describes the work done as a part of our B. Tech Project, titled "Dynamic Event Analysis From Continuous Video". The project started as a tool for analyzing different aspects of a given video, such as the foreground objects and the motion of these objects. In this tool, it was possible to select a particular object in the scene, remove it from the scene, add it to another scene, and track a selected object for as long as it remains in the video. Later, the project was applied to detecting violations that occur on roads. Violations involving driving on the wrong side of the road were successfully detected by the system. The other two types of violations, signal jumping and speeding, have been implemented but there is still scope for improvement. In completing this project, we made use of several interesting concepts that are explained and cited as references. Our system is given an input video, which is a continuous sequence of images displayed successively at a certain frame rate. In each frame, we first apply background subtraction using an adaptive Gaussian mixture model to obtain foreground pixels corresponding to objects such as cars, pedestrians and motorcycles. We then use connected components to cluster these pixels and obtain foreground objects. These objects are validated as per our requirements and wrong detections are rejected. The motion details of each detected and validated object are obtained with the help of the Lucas-Kanade optical flow algorithm. Based on these motion details, violations are detected. High resolution images of the violators' number plates are automatically taken using high-zoom cameras. Currently, we are working on extracting the license plate from these images and reading the number on it. A few other enhancements were made which are also explained.

Declaration of Originality

This report is our own unaided work and was not copied from nor written in collaboration with any other person.

Signed:

Acknowledgments

We take this opportunity to express our profound gratitude and respect to the various people who gave us incredible support, in terms of resources and knowledge, to complete our project smoothly over the past two semesters.

We would like to thank Dr. C. V. Jawahar (Project Advisor) for giving us an opportunity to work on this interesting project through which we learned many new things. We would also like to thank Dr. Vishal Garg for his incessant support and encouragement throughout the course of this project.

Table of Contents

1. Introduction
2. Motivation
3. Traffic Surveillance System
   3.1 Wrong Side Violations
   3.2 Vehicle Speeding
   3.3 Signal Jumping
   3.4 Object Removal
   3.5 Object Insertion
4. Concepts Used
   4.1 Background Subtraction using Adaptive Gaussian Mixture Model
   4.2 Scale Invariant Feature Transform
   4.3 Lucas-Kanade Optical Flow Algorithm
   4.4 A Comparison between Optical Flow and SIFT
   4.5 License Plate Detection
   4.6 Enhancement of Results using Shadow Detection
5. Problems Faced While Testing
6. Results
7. Conclusions
8. References

1. Introduction

As a scientific discipline, computer vision is concerned with the theory and technology for building artificial systems that obtain information from images or multi-dimensional data. Simulating human vision in a computer has always been the primary goal for computer vision researchers. Over the years, there have been several advances in various fields of computer/machine vision such as Face Recognition, Palm Print Recognition and Optical Character Recognition. All these advances take us closer to achieving the aforementioned goal. Computer vision concepts have been immensely useful in surveillance, organizing information and modeling objects or environments. Interdisciplinary exchange between biological and computer vision has proven increasingly fruitful for both fields.

Figure: Various applications of image processing, computer vision and machine vision.

2. Motivation

Crime prevention and detection in places like banks, stores, museums and other public places is very important. For this reason, continuous surveillance of a scene is needed in such places. The first step towards fulfilling the need for surveillance was to employ guards and police personnel to keep watch over a certain place over a period of time. A guard present in a room to keep an eye on whatever is happening helped achieve this requirement to some extent. The next step was to have cameras installed in different parts of a place so as to cover more ground. Whatever these cameras recorded was, once again, monitored manually by security personnel. Today, it is possible to automate the surveillance of a particular scene. The motivation of this project is to develop a system that can dynamically analyze events in a scene over a period of time. This will help in automated observation of events taking place in a scene and extraction of information about these events. The system developed by us is for traffic surveillance. It uses the visual information obtained from the video of a road and checks for wrong-side driving violations.

3. Traffic Surveillance System

A system that monitors and detects violations on the road was implemented. Three kinds of violations were mainly targeted:

Driving on the wrong side of the road.

Jumping the stop signal.

Speeding.

The results for the wrong side violation detection system were satisfactory. After testing, it appeared that there was good scope for improving the other two components of the system (i.e. speeding and signal jumping).

3.1 Wrong Side Violations

The first goal in developing a traffic surveillance system was to identify violations that involved driving on the wrong side of the road. The idea was to use a wide-angle camera to detect violations and then take a photograph of the number plate of the violator's vehicle using a high-zoom camera. The equipment that we used for this was:

IP camera (D-Link DCE 900)

Two high-zoom cameras (Canon Powershot A60)

A server (to control the IP camera and store images recorded by it)

Two laptops (to control the two high-zoom cameras)

A hub (to connect the above components)

The area we chose is a T-junction near IIIT. We observed that many traffic violations (driving on the wrong side of the road) take place here throughout the day, which is why we chose this location. The IP camera was mounted on a pole on one side of the road, about 15 feet above the ground, so that it had a clear view of the road and the junction. The camera was placed such that vehicles moving away from the camera were moving in the right direction and those moving towards the camera were moving in the wrong direction. A view of the road through the IP camera is shown in the figure below.

3.1.1 Placing the Camera

The IP camera was mounted on an electricity pole as shown in the picture.

Figure: A view of the pole on which the IP camera was mounted.

Setup

Figure: Our setup - the IP camera and the two high-zoom cameras, showing the detection region at the junction (roads leading towards ISB, CMC, HCU and Indiranagar, near IIIT Hyderabad).

The first high-zoom camera was placed below the IP camera, on a platform about 5 feet above the ground, so that it had a good view of the number plates of violators. The second one was placed at about the same height from the ground but around 10 feet in front of the first camera. The reason for doing this was that the first camera would be able to capture images of the number plates of violators who move fast and whom the second camera would miss, while the second camera would get pictures of the slower violators.

The IP camera, the main server (connected to the IP camera) and the two laptops were given the following IPs:

IP Camera – 192.168.0.20

Server – 192.168.0.10

High-zoom camera 1 laptop – 192.168.0.15

High-zoom camera 2 laptop – 192.168.0.16

Figure: View from the IP camera.

Figure: View from one of the high-zoom cameras.

The above four IPs were connected to each other with the help of the hub. The IP camera transferred images to 192.168.0.10 in the MJPEG format. The server 192.168.0.10 ran a program which did the background subtraction and tracked the objects in the MJPEG video. The two laptops (192.168.0.15 and 192.168.0.16) waited for a signal from the server whenever there was a violation and then captured images of the number plates of the violators. We wrote a simple program to enable communication between the server and the laptops. The number plate capturing function was run in a separate thread so as not to halt the violation detection program. The image processing algorithms used by us were able to detect foreground objects in the video and determine their paths of motion. For each image that the IP camera captured, our core violation detection algorithm was as follows:

for (each object in the scene)
{
    check if object satisfies violation conditions from its motion;
    if (violation)
    {
        set camera 1 status to "busy";
        run thread to capture with camera 1;
        set camera 2 status to "busy";
        run thread to capture with camera 2;
    }
}

Each of the threads worked as follows:

function capture()
{
    capture image of the violator;
    set camera status to "free";
    exit thread;
}

The details of the above described algorithm will be presented in the following sections.
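As a concrete illustration of this control flow, here is a minimal Python sketch of the threaded capture trigger. It is only a sketch under assumptions: the is_violation check and the grab_image callables stand in for our actual motion test and camera-control code, and the names are illustrative.

import threading

camera_status = {1: "free", 2: "free"}      # shared status flags for the two high-zoom cameras
status_lock = threading.Lock()

def capture(cam_id, grab_image):
    # Runs in its own thread so the violation detection loop is never blocked.
    grab_image()                            # hypothetical call into the camera-control code
    with status_lock:
        camera_status[cam_id] = "free"

def on_new_frame(objects, is_violation, grab_image_1, grab_image_2):
    for obj in objects:                     # objects detected in the current frame
        if not is_violation(obj):           # check the object's motion against the violation conditions
            continue
        with status_lock:
            for cam_id, grab in ((1, grab_image_1), (2, grab_image_2)):
                if camera_status[cam_id] == "free":
                    camera_status[cam_id] = "busy"
                    threading.Thread(target=capture, args=(cam_id, grab)).start()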

3.2 Vehicle Speeding

3.2.1 Method

Here we used a camera that can capture frames at 30 fps. For each vehicle, we stored the following parameters:

The number of frames in which the object occurred

Starting point

Current point

Speed of the car passing by

Our camera is placed next to the road such that the direction of motion of the vehicles is parallel to the lens of the camera. Since we know the frame rate of the video (30 fps), the speed can be computed as

speed = distance/time.

We measure the distance traveled by a point on the vehicle from the left of the video to the right. Knowing the frame rate, we can measure the time for which the vehicle was in the video. These values can be applied in the above equation to get an initial value of speed, v1. In order to obtain the actual velocities of vehicles, we multiply v1 by a constant which gives us their approximate real-world speed.
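A small sketch of this computation follows, assuming the tracked point's x-coordinates and frame count are already available; FPS matches the camera above, while PX_TO_METRES is an assumed calibration constant that plays the role of the multiplier mentioned in the text and would have to be measured for the actual camera placement.

FPS = 30.0                       # frame rate of the camera
PX_TO_METRES = 0.05              # assumed calibration: metres of road per image pixel

def estimate_speed_kmph(start_x, end_x, n_frames):
    # Distance travelled by the tracked point (in pixels) over the frames it was visible.
    distance_px = abs(end_x - start_x)
    time_s = n_frames / FPS
    if time_s == 0:
        return 0.0
    v1 = distance_px / time_s                # initial speed estimate in pixels per second
    return v1 * PX_TO_METRES * 3.6           # scale to an approximate speed in km/h

# Example: a point moving 400 px in 20 frames gives 600 px/s, i.e. about 108 km/h with this constant.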

3.3 Signal Jumping

3.3.1 Method

Here we were supposed to detect the violators who jump the traffic signal. Our system for detecting objects works here as well. We keep a check on the state of the traffic light and then determine in which cases vehicles are allowed to move and in which cases vehicles are supposed to stop at the signal.

Ideally, our camera should have been mounted on top of a pole to detect the violation line automatically, but here we took the video manually and marked the line by hand, which gave reasonably close results.

This is the view of the traffic light near Gachibowli.

Both red denotes (for the opposite traffic) that traffic can move

One green and one red denotes (for the opposite traffic) that traffic has to stop

One red and one green denotes that traffic has to stop

One red denotes (for the opposite traffic) that traffic has to stop
3.3.2 Violations

Figure: One violator detected - the bounding box crossed the line when the signal is not GO.

Figure: One red light and one orange light - traffic should stop.

Our approach detects a violator when a vehicle's bounding box crosses the marked line while the signal is not GO. However, after some time, when an object begins to merge into the background, its bounding box starts diminishing; also, when a bike and an auto overlap, a single merged bounding box is formed for both.
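A sketch of the line-crossing test we describe, under the assumption that the manually marked stop line is a horizontal image row and that traffic approaches it from above; the row value is illustrative.

STOP_LINE_Y = 300                # image row of the manually marked stop line (assumed value)

def is_signal_violation(bbox, signal_is_go):
    # bbox = (x, y, w, h) of a tracked vehicle, with the image origin at the top-left
    # and vehicles approaching the stop line from smaller y values.
    x, y, w, h = bbox
    crossed = (y + h) >= STOP_LINE_Y         # bottom edge of the bounding box has reached the line
    return crossed and not signal_is_go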

3.4 Object Removal

In the tool that we developed, the following functionalities were implemented

Removal of foreground objects from the video by clicking on them.

Addition of foreground objects removed from a different scene into the current scene.

Tracking objects in a scene by clicking on them.


Original Frame

Background Subtracted

Object Removed

Here the green dot denotes that we have selected the object in the first frame.

In the second frame, only the object we have selected remains and we have removed the remaining area.

In the third frame, we have removed that object from the frame and replaced it with the buffer frame, a frame containing only the background. We have marked a red bounding box around it to show that we have removed this object.

3.5 Object Insertion


Original Frame

Background Subtracted

Object Inserted

Similarly, an object could be inserted into some other frame.
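Both operations reduce to copying pixels under the object's foreground mask. A minimal NumPy sketch is shown below, assuming the mask from background subtraction and a background-only buffer frame are available; the function names are illustrative.

import numpy as np

def remove_object(frame, object_mask, background_frame):
    # Replace the selected object's pixels with the corresponding background (buffer frame) pixels.
    out = frame.copy()
    out[object_mask > 0] = background_frame[object_mask > 0]
    return out

def insert_object(target_frame, source_frame, object_mask):
    # Paste the object's pixels from the source frame into the target frame at the same location.
    out = target_frame.copy()
    out[object_mask > 0] = source_frame[object_mask > 0]
    return out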

4. Concepts Used

In completing this project, several interesting concepts were used. They are explained below.

4.1 Background Subtraction Using Adaptive Gaussian Mixture Model

4.1.1 Method

A system used for video surveillance should be robust to whatever is in its visual field and to whatever lighting effects occur. It should be capable of dealing with movement through cluttered areas, objects overlapping in the visual field, shadows, lighting changes, effects of moving elements of the scene (e.g. swaying trees), slow-moving objects, and objects being introduced into or removed from the scene. Traditional approaches based on backgrounding methods typically fail in these general situations. We need a robust, adaptive tracking system that is flexible enough to handle variations in lighting, moving scene clutter, multiple moving objects and other arbitrary changes to the observed scene.

A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. A common approach to this problem is to directly use the pixel values obtained from the image. These pixel values are compared across successive frames and foreground objects are obtained. In this method, each pixel is modeled as a mixture of Gaussians and an on-line approximation is used to update the model. The Gaussian distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model.

Rather than explicitly modeling the values of all the pixels as one particular type of distribution, the values of a particular pixel are modeled as a mixture of Gaussians. Based on the persistence and the variance of each of the Gaussians of the mixture, Gaussians corresponding to background colors are determined. Pixel values that do not fit the background distributions are considered foreground until there is a Gaussian that includes them with sufficient, consistent evidence supporting it. Also, repetitive variations are learned, and a model for the background distribution is generally maintained even if it is temporarily replaced by another distribution which leads to faster recovery when objects are removed. Our method contains two significant parameters – α , the learning constant and T, the proportion of the data that should be accounted for by the background.

Due to lighting changes and multiple surfaces in the view frustum of a particular pixel, multiple, adaptive Gaussians are necessary. We use a mixture of adaptive Gaussians to approximate this process. Each time the parameters of the Gaussians are updated, the Gaussians are evaluated using a simple heuristic to hypothesize which are most likely to be part of the “background process.” Pixel values that do not match one of the pixel’s “background” Gaussians are grouped using connected components. Finally, the connected components are tracked from frame to frame and foreground object identities are obtained.

We consider the values of a particular pixel over time as a “pixel process”. The “pixel process” is a time series of pixel values, e.g. scalars for gray values or vectors for color images. At any time, t, what is known about a particular pixel, {x0 , y0 }, is its history

{X_1, ..., X_t} = { I(x_0, y_0, i) : 1 ≤ i ≤ t }

where I is the image sequence.

More recent observations are given more importance in determining the Gaussian parameter estimates. An additional aspect of variation occurs if moving objects are present in the scene. Even a relatively consistently colored moving object is generally expected to produce more variance than a “static” object. Also, in general, there should be more data supporting the background distributions because they are repeated, whereas pixel values for different objects are often not the same color.

The recent history of each pixel, {X_1, ..., X_t}, is modeled by a mixture of K Gaussian distributions. The probability of observing the current pixel value is

P(X_t) = Σ_{i=1..K} ω_{i,t} · η(X_t, μ_{i,t}, Σ_{i,t})

where K is the number of distributions, ω_{i,t} is an estimate of the weight (what portion of the data is accounted for by this Gaussian) of the i-th Gaussian in the mixture at time t, μ_{i,t} is the mean value of the i-th Gaussian in the mixture at time t, Σ_{i,t} is the covariance matrix of the i-th Gaussian in the mixture at time t, and η is a Gaussian probability density function

η(X_t, μ, Σ) = (2π)^(−n/2) |Σ|^(−1/2) exp( −(1/2) (X_t − μ)^T Σ^(−1) (X_t − μ) )

K is determined by the available memory and computational power. Currently, from 3 to 5 are used. Also, for computational reasons, the covariance matrix is assumed to be of the form

Σ_{k,t} = σ_k² I

Thus, the distribution of recently observed values of each pixel in the scene is characterized by a mixture of Gaussians. A new pixel value will, in general, be represented by one of the major components of the mixture model and used to update the model.

Every new pixel value, X_t, is checked against the existing K Gaussian distributions until a match is found. A match is defined as a pixel value within 2.5 standard deviations of a distribution. This is effectively a per-pixel/per-distribution threshold; a uniform threshold often results in objects disappearing when they enter shaded regions. If none of the K distributions match the current pixel value, the least probable distribution is replaced with a distribution with the current value as its mean value, an initially high variance, and a low prior weight. The prior weights of the K distributions at time t, ω_{k,t}, are adjusted as follows

ω_{k,t} = (1 − α) ω_{k,t−1} + α M_{k,t}

where α is the learning rate and M_{k,t} is 1 for the model which matched and 0 for the remaining models. After this approximation, the weights are renormalized. 1/α defines the time constant which determines the speed at which the distribution's parameters change. The μ and σ parameters for unmatched distributions remain the same. The parameters of the distribution which matches the new observation are updated as follows

μ_t = (1 − ρ) μ_{t−1} + ρ X_t

σ_t² = (1 − ρ) σ_{t−1}² + ρ (X_t − μ_t)^T (X_t − μ_t)

where the second learning rate, ρ, is

ρ = α η(X_t | μ_k, σ_k)

One of the significant advantages of this method is that when something is allowed to become part of the background, it does not destroy the existing model of the background. The original background color remains in the mixture until it becomes the K-th most probable and a new color is observed. Therefore, if an object is stationary just long enough to become part of the background and then it moves, the distribution describing the previous background still exists with the same μ and σ², but a lower ω, and will be quickly re-incorporated into the background.

First, the Gaussians are ordered by the value of ω/σ. This value increases both as a distribution gains more evidence and as the variance decreases. After re-estimating the parameters of the mixture, it is sufficient to sort from the matched distribution towards the most probable background distribution, because only the matched model's relative value will have changed. This ordering of the model is effectively an ordered, open-ended list, where the most likely background distributions remain on top and the less probable transient background distributions gravitate towards the bottom and are eventually replaced by new distributions. Then the first B distributions are chosen as the background model, where

B = argmin_b ( Σ_{k=1..b} ω_k > T )

where T is a measure of the minimum portion of the data that should be accounted for by the background. This takes the "best" distributions until a certain portion, T, of the recent data has been accounted for.

The method described above allows us to identify foreground pixels in each new frame while updating the description of each pixel's process. These labeled foreground pixels can then be segmented into regions by a two-pass, connected components algorithm. Because this procedure is effective in determining the whole moving object, moving regions can be characterized not only by their position, but size, moments, and other shape information. Not only can these characteristics be useful for later processing and classification, but they can aid in the tracking process.
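As an illustration of this pipeline, the sketch below uses OpenCV's built-in MOG2 adaptive Gaussian mixture model followed by connected components; this is not our own implementation, so the parameter names differ (history roughly plays the role of 1/α), and the area threshold is an assumed value.

import cv2
import numpy as np

# detectShadows=True marks shadow pixels with a separate (lower) label in the mask.
bg_model = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16, detectShadows=True)
kernel = np.ones((3, 3), np.uint8)

def detect_foreground_objects(frame, min_area=500):
    fg_mask = bg_model.apply(frame)                        # per-pixel match against the K Gaussians
    fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)[1]   # drop the shadow label (127)
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)        # remove isolated noise pixels
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(fg_mask)
    objects = []
    for i in range(1, n):                                  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:                               # reject components that are too small
            objects.append(((x, y, w, h), tuple(centroids[i])))
    return fg_mask, objects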

4.1.2 Results

We obtained satisfactory results with background subtraction. The results are shown below.

Figure: Background subtraction works well in rain as well.

Figure: Removing the effect of swaying trees - a swaying tree in the first frame becomes part of the background, giving some improvement in the result.

Improvements in background subtraction (for 640 x 480 frames):

After GMM: ~13-14 fps

After removing noise: ~9-10 fps

After shadow removal: ~7-8 fps

4.2 Scale Invariant Feature Transform

4.2.1 Method

This algorithm is based on SIFT Features proposed by David Lowe. The features are invariant to image scaling, translation and rotation. The features are partially invariant to illumination changes and 3D affine transformations.

To achieve rotation invariance and a high level of efficiency, we have chosen to select key locations at maxima and minima of a difference of Gaussian function applied in scale space. This can be computed very efficiently by building an image pyramid with re­sampling between each level. Furthermore, it locates keypoints at regions and scales of high variation, making these locations particularly stable for characterizing the image. As the 2D Gaussian function is separable, its convolution with the input image can be efficiently computed by applying two passes of the 1D Gaussian function in the horizontal and vertical directions.

Thus the input image is first convolved with sigma = 2 to give an image A1 and then convolved again to give A2. The difference of Gaussian function is obtained by subtracting image A2 from A1. To generate the next level, A2 is re-sampled using bi-linear interpolation with a pixel spacing of 1.5. Maxima and minima are determined by comparing each pixel with its 8 neighbors at the same level and the 9 neighbors in each of the levels above and below.

To characterize the image at each key location, the smoothed image A at each level of the pyramid is processed to extract image gradients and orientations. At each pixel the image gradient magnitude, M ij and orientation, R ij , are computed using pixel differences.

Object recognition approaches based on local invariant descriptors (features) have become increasingly popular and have experienced an impressive development in the past few years.

Recognition Process

Invariance against: scale, in-plane rotation, partial occlusion, partial distortion, partial change of point of view.

The recognition process consists of two stages:

1. Scale-invariant local descriptors (features) of the frame are computed.

2. These descriptors are matched against the descriptors of object prototypes; these objects are the ones obtained from background subtraction when the user clicks on an object.

But there are some problems in using this approach:

I. Dimensionality problems

A given image can produce ~100-1,000 descriptors of 128 components (real values).

The model database can contain up to 1,000-10,000 objects in some special applications.

=> large number of comparisons => large processing time

II. Unwanted noise

This could be anything such as a sharp edge, a sharp corner or a dark shadow which has some resemblance to the object being matched. It is difficult to control this kind of noise, but one or two spurious matches do not make much difference compared to the number of correct matches for the object we are trying to find in the frame.

In the matching generation stage, an input image gives another set of keypoints and vectors.

For each input descriptor, the first and second nearest descriptors are found.

Then, a pair of nearest descriptors (d, dFIRST) gives a pair of matched keypoints (p, pFIRST).

The match is accepted if the ratio between the distance to the first nearest descriptor and the distance to the second nearest descriptor is lower than a given threshold.

This indicates that there is no possible confusion in the search results.

Matching & Storage of Local Descriptors

Accepted if: distance(d, dFIRST) < t · distance(d, dSEC), for a threshold t < 1
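A sketch of this nearest/second-nearest acceptance test with NumPy is given below; des1 and des2 are assumed to be arrays of descriptors (one row per keypoint), and the ratio value is only a typical choice, not necessarily the one we used.

import numpy as np

def match_descriptors(des1, des2, ratio=0.8):
    # For each descriptor in des1, find its first and second nearest neighbours in des2
    # and accept the match only when the first is clearly closer than the second.
    matches = []
    for i, d in enumerate(des1):
        dists = np.linalg.norm(des2 - d, axis=1)
        order = np.argsort(dists)
        first, second = order[0], order[1]
        if dists[first] < ratio * dists[second]:
            matches.append((i, int(first)))               # (index in des1, index in des2)
    return matches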

4.2.2 PCA-SIFT

Matching using plain SIFT is not so good because noisy points disturb the matching; PCA-SIFT improves matching by reducing the dimensionality of the descriptors. PCA is used to lower the dimensionality of a dataset with minimal information loss. We choose a new coordinate system with the first axis pointing in the direction of the greatest variance in the dataset, and accordingly for the second, third, ... axes. By eliminating axes with low variance, the dimensionality is reduced but only little information is lost. Mathematically this is done by an eigenvector decomposition of the covariance matrix of the dataset.

PCA-SIFT is a dimensionality-reduced descriptor. Because of the high dimensionality of SIFT and the computational cost this causes, PCA can greatly improve SIFT. PCA-SIFT replaces the original SIFT descriptor. The matrix used to project into the PCA-based n-dimensional space is called the projection matrix. The steps to create a PCA-SIFT descriptor are:

1. Compute or load a projection matrix

2. Detect keypoints

3. Project the image patch around the keypoint by multiplying it with the projection matrix

Computing the projection matrix

Select a representative set of pictures and detect all keypoints in these pictures

For each keypoint:

• Extract an image patch around it with size 41 x 41 pixels

• Calculate horizontal and vertical gradients, resulting in a vector of size 39 x 39 x 2 = 3042

Put all these vectors into a k x 3042 matrix A where k is the number of keypoints detected

Calculate the covariance matrix of A: first subtract the mean of each column (A = A − mean(A)), then compute covA = A^T A

Building the descriptor

Input: a keypoint location in scale­space and an orientation

Extract a 41 x 41 patch around the keypoint at the given scale, rotated to its orientation

Calculate 39 x 39 horizontal and vertical gradients, resulting in a vector of size 3042

Multiply this vector using the precomputed n x 3042 projection matrix

This results in a PCA­SIFT descriptor of size n
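The computation of the projection matrix and the projection itself can be sketched with NumPy as follows; the gradient-patch extraction is assumed to have been done already (giving a k x 3042 array), and n_components = 36 is an assumed choice of reduced dimensionality.

import numpy as np

def compute_projection_matrix(patches, n_components=36):
    # patches: k x 3042 matrix of concatenated horizontal/vertical gradients, one row per keypoint.
    A = patches - patches.mean(axis=0)            # subtract the mean of each dimension
    cov = A.T @ A                                 # 3042 x 3042 covariance (up to a constant factor)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigendecomposition; eigh because cov is symmetric
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return top.T                                  # n x 3042 projection matrix

def pca_sift_descriptor(gradient_vector, projection_matrix):
    # Project one 3042-dimensional gradient vector into the reduced n-dimensional space.
    return projection_matrix @ gradient_vector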

4.2.3 Matching Correspondences

Our Algorithm

Our algorithm is divided into two steps:

1) We first calculate the background subtracted frames and also the PCA-SIFT keypoints of those frames.

2) When the user clicks on some object in a frame, we find the connected component of that object using the previously calculated background subtracted frames, and obtain the corresponding object from it. Our next task is to match that object in the frames that come after that frame. We match the object against the background subtracted frames and look for the region that has the maximum number of matches with the object; for the next frame, the object found in the previous frame becomes the new test object image. In this way, as the object moves through the video, object images are stored for the particular object selected at the beginning, and matching in upcoming frames becomes more reliable because we match using previously stored views of the object at different orientations.

Input: a video containing different objects.
Intermediate output: background subtracted frames, and the PCA-SIFT keypoints of those frames.
Final output: a frame with the object removed, and a frame with only the object present.

4.2.4 Some problems we faced while using this algorithm

1) The calculation of background subtraction and PCA-SIFT keypoints is a heavy task that also depends on the size of the image, so the larger the frame, the slower the preprocessing.

2) We are using the SIFT executable of David Lowe's algorithm, but we have implemented PCA-SIFT ourselves. PCA-SIFT requires SIFT keys, and both involve one common task that is the heaviest, namely building the Gaussian pyramid, which we are currently doing twice. If we also implemented SIFT ourselves, the processing time would be roughly halved. At present it takes around 2.5 seconds to calculate the keys of a 320 x 240 frame.

3) If an object is smaller than about 35 x 35 pixels, fewer keypoints are found for such a low-resolution object and the matching is not as good.

4.3 Object Tracking using the Lucas-Kanade Optical Flow Algorithm

As our system was to be used for real-time purposes, we decided to use the Lucas-Kanade optical flow algorithm instead of SIFT to track the objects identified using background subtraction.

In order to track the object, we choose the object's centroid and track that point as long as the object is in the scene. Optical flow involves estimating the motion vectors of the point in the X and Y directions. We estimate the motion vectors (v_x, v_y) at each pixel location by solving the Optical Flow Equation (OFE):

I_x v_x + I_y v_y + I_t = 0

where I_x, I_y and I_t are the spatio-temporal derivatives of intensity that we can compute for each frame in the image sequence.

The OFE is under-constrained (one equation with two unknowns), but we can solve it by imposing additional constraints, for example by assuming that neighboring pixels have the same motion. We used a 5 x 5 window and least squares to solve for the motion vectors, where the summation is taken over a 5 x 5 pixel Gaussian window with σ of 1 pixel. Summing over a Gaussian window gives more weight to the center pixels and less to those at the periphery, and the result is a weighted least squares solution [8]. The validity of the OFE rests upon two assumptions: constant intensity along the motion trajectory and 'small' motion. The small motion assumption is often violated; for example, in video sequences even the slower moving cars sometimes have velocities of 3-4 pixels/frame. This leads us to a multi-resolution or hierarchical approach.

In a hierarchical framework, we first generate a Gaussian pyramid of images by iteratively filtering and sub-sampling by a factor of 2. We used a 5-by-5 separable filter (or generating kernel).

At each level of the pyramid, starting from the highest (lowest resolution), we (i) project and interpolate estimates from the previous level, and (ii) pre-warp the neighborhood window according to the motion estimates computed from the previous level before performing least squares.
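A minimal sketch of tracking an object's centroid with OpenCV's pyramidal Lucas-Kanade implementation is shown below; the window size and pyramid depth mirror the values described above, and the termination criteria are assumed defaults.

import cv2
import numpy as np

lk_params = dict(winSize=(5, 5),                  # 5 x 5 neighbourhood window
                 maxLevel=3,                      # pyramid levels to cope with larger motions
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

def track_centroid(prev_gray, curr_gray, centroid):
    # centroid: (x, y) position of the object in the previous grayscale frame.
    p0 = np.array([[centroid]], dtype=np.float32)           # shape (1, 1, 2) as required by OpenCV
    p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None, **lk_params)
    if status[0][0] == 1:
        return tuple(p1[0][0])                              # new (x, y) position of the point
    return None                                             # tracking failed for this point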

4.4 A Comparison Between SIFT and the Lucas-Kanade Optical Flow Algorithm

A comparison between SIFT and the Lucas-Kanade optical flow algorithm revealed the following:

SIFT matched correspondences between objects in successive frames, whereas optical flow searched for the next position of a point on the object in successive frames.

The Optical flow algorithm was faster than SIFT in tracking objects in each frame.

SIFT had better results than the Optical Flow algorithm.

As our system had to achieve real­time performance, we used optical flow for object tracking.

4.5 License Plate Detection System

4.5.1 Rectangle Detection

Our objective is to store the 4 vertices of the rectangle. The steps are listed below; a code sketch of these steps follows the list.

Select the maximum ROI in the image with width and height divisible by 2

Down-scale and up-scale the image to filter out noise

Find rectangles in every color plane of the image

Use the Canny edge detector for thresholding; Canny helps to find rectangles with gradient shading

Dilate the Canny output to remove holes

Find the contours of the image

Filter out small contours

Keep contours that have 4 vertices after polygonal approximation and are convex, since rectangle contours will have 4 vertices and will be convex

Take the angles at the vertices and apply a threshold on these angles to remove shapes which are not close to rectangles
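A sketch of these steps using OpenCV is given below (it assumes the OpenCV 4 findContours signature); the area limit, Canny thresholds and angle tolerance are illustrative values rather than the ones we tuned.

import cv2
import numpy as np

def angle_cos(p0, p1, p2):
    # Cosine of the angle at vertex p1; values near 0 correspond to right angles.
    d1, d2 = (p0 - p1).astype(float), (p2 - p1).astype(float)
    return abs(np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-9))

def find_rectangles(image, min_area=1000, max_cos=0.3):
    img = cv2.pyrUp(cv2.pyrDown(image))                      # down-scale and up-scale to filter noise
    rectangles = []
    for plane in cv2.split(img):                             # search every colour plane
        edges = cv2.dilate(cv2.Canny(plane, 50, 150), None)  # Canny edges, dilated to close small holes
        contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
            if len(approx) == 4 and cv2.contourArea(approx) > min_area and cv2.isContourConvex(approx):
                pts = approx.reshape(4, 2)
                cosines = [angle_cos(pts[i - 1], pts[i], pts[(i + 1) % 4]) for i in range(4)]
                if max(cosines) < max_cos:                   # all four corners are close to 90 degrees
                    rectangles.append(pts)                   # store the 4 vertices of the candidate
    return rectangles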

4.5.2 Rectangle Detection Results

The results we obtained for license plate extraction are shown below. The correct results are shown first.

Figure: Correct results.

Figure: Wrong results.

Figure: Shape of the license plate.

Small number plates on bikes are often not detected; alignment is also one of the problems in these cases. The results were not very satisfactory when we applied this process to real data, i.e. the violator images that our system captured from the camera.

4.5.3 License Plate Segmentation

This is to ensure that the rectangle detected is a license plate. This technique is based on the fact that the lines where the number plate is located in the image have a clear "signature" which makes it usually possible to distinguish them from other lines in the image, or at least to pre­select some positions where to look further. The figure below shows two such lines. The top image shows the positions of the cross sections (the white lines). The "signature" of the number plate can be observed in the bottom cross section.

It corresponds to strong grey level variations at somewhat "regular" intervals.

An algorithm which analyses the maxima and minima of the cross section is used. The algorithm searches for a set of consecutive maxima and minima that have some predefined characteristics (number, relative distances, amplitude, etc.). These characteristics are dynamically chosen from a set of predefined values, using statistical information. Once a horizontal line that crosses the number plate has been located, this information is used to define an area which should contain the number plate image.
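A simplified sketch of this cross-section test is given below; it only counts strong grey-level transitions along a row, whereas the actual algorithm also checks relative distances and amplitudes, and the thresholds shown are assumed values.

import numpy as np

def row_has_plate_signature(gray_row, grad_thresh=40, min_transitions=8, max_transitions=40):
    # A horizontal line through a number plate shows strong grey-level variations
    # (character strokes against the plate background) at fairly regular intervals.
    diffs = np.abs(np.diff(gray_row.astype(int)))
    transitions = int(np.sum(diffs > grad_thresh))           # count strong intensity changes
    return min_transitions <= transitions <= max_transitions

def candidate_plate_rows(gray_image, step=4):
    # Scan every few rows and keep the ones whose cross section looks like a plate.
    return [y for y in range(0, gray_image.shape[0], step)
            if row_has_plate_signature(gray_image[y, :])]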

4.6 Enhancement of results using shadow detection

It was observed that the object validation algorithm, which determined whether a detected object is a pedestrian or a vehicle based on the shape of the bounding box created around the object, was not working well because of shadows cast by objects in the scene. In order to overcome this difficulty, it was decided to implement a shadow detection/removal algorithm. The algorithm that we used was proposed by Horprasert et al. (cited in the "References" section) and uses a novel approach to identifying shadows based on the color and intensity distortion of the pixel values. The approach is explained below.

4.6.1 Method

In the proposed method, a color model that separates the brightness from the chromaticity component is proposed. Consider a pixel, i, in the image; let E i = [ER(i); EG(i); EB(i)] represent the pixel's expected RGB color in the reference or background image. The line OE i passing through the origin and the point Ei is called expected chromaticity line. Next, let I i = [IR(i); IG(i); IB(i)] denote the pixel's RGB color value in a current image that we want to subtract from the background. Basically, we want to measure the distortion of I i from E i . This is done by decomposing the distortion measurement into two components, brightness distortion and chromaticity distortion, defined below.

The brightness distortion (α) is a scalar value that brings the observed color close to the expected chromaticity line. It is obtained by minimizing

φ(α_i) = (I_i − α_i E_i)²

α_i represents the pixel's strength of brightness with respect to the expected value: it is 1 if the brightness of the given pixel in the current image is the same as in the reference image, less than 1 if it is darker, and greater than 1 if it is brighter than the expected brightness.

Color distortion is defined as the orthogonal distance between the observed color and the expected chromaticity line. The color distortion of a pixel i is given by

CD_i = || I_i − α_i E_i ||

The basic scheme is to subtract the image from a reference image that models the background scene. Typically, the basic steps of the algorithm are as follows:

Background modeling constructs a reference image representing the background.

Threshold selection determines appropriate threshold values used in the subtraction operation to obtain a desired detection rate.

Subtraction operation or pixel classification classifies the type of a given pixel, i.e., the pixel is the part of background (including ordinary background and shaded background), or it is a moving object.

In the background training process, the reference background image and some parameters associated with normalization are computed over a number of static background frames. The background is modeled statistically on a pixel by pixel basis. A pixel is modeled by a 4-tuple <E_i, s_i, a_i, b_i>, where E_i is the expected color value, s_i = [σ_R(i), σ_G(i), σ_B(i)] is the standard deviation of the color value, a_i is the variation of the brightness distortion, and b_i is the variation of the chromaticity distortion. The expected color value of pixel i is given by

E_i = [μ_R(i), μ_G(i), μ_B(i)]

where the three values in the vector are the arithmetic means of the i-th pixel's red, green and blue values computed over N background frames. We obtain the brightness and chromaticity distortions as follows

α_i = ( I_R(i)μ_R(i)/σ_R²(i) + I_G(i)μ_G(i)/σ_G²(i) + I_B(i)μ_B(i)/σ_B²(i) ) / ( (μ_R(i)/σ_R(i))² + (μ_G(i)/σ_G(i))² + (μ_B(i)/σ_B(i))² )

CD_i = sqrt( ((I_R(i) − α_i μ_R(i))/σ_R(i))² + ((I_G(i) − α_i μ_G(i))/σ_G(i))² + ((I_B(i) − α_i μ_B(i))/σ_B(i))² )

And the components a_i and b_i are obtained as the root mean square of the brightness distortion and of the chromaticity distortion over the N background frames

a_i = sqrt( Σ (α_i − 1)² / N ),   b_i = sqrt( Σ CD_i² / N )

We classify the pixels using these values into the following categories:

Original background (B)

Shadowed background or shadow (S)

Highlighted background (H)

Moving foreground object (F)

As different pixels yield different distributions of the brightness and chromaticity distortions, we rescale these values for each pixel:

α_i' = (α_i − 1) / a_i,   CD_i' = CD_i / b_i

A pixel is then classified into one of the above categories by applying thresholds to these normalized distortion values; the detailed decision rule is given in [6].
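A per-pixel sketch of this classification is shown below, using the quantities defined above; the threshold values are illustrative and the full decision rule of Horprasert et al. [6] has a few more cases than this simplified version.

import numpy as np

def classify_pixel(I, E, s, a, b, tau_cd=10.0, tau_lo=-4.0, tau_a=4.0):
    # I: observed RGB value, E: expected (mean) RGB, s: per-channel standard deviation,
    # a, b: normalisation factors for the brightness and chromaticity distortions (all float arrays/scalars).
    alpha = np.sum(I * E / s**2) / np.sum((E / s)**2)        # brightness distortion
    cd = np.sqrt(np.sum(((I - alpha * E) / s)**2))           # chromaticity distortion
    alpha_n = (alpha - 1.0) / a                              # normalised brightness distortion
    cd_n = cd / b                                            # normalised chromaticity distortion
    if cd_n > tau_cd or alpha_n < tau_lo:
        return "foreground"                                  # colour too far off, or far too dark
    if abs(alpha_n) <= tau_a:
        return "background"                                  # brightness close to the expected value
    return "shadow" if alpha_n < 0 else "highlight"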

5. Problems Faced While Testing

We faced several problems while testing our system on real data

Detection of pedestrians: Our system, along with detecting vehicles, also detected pedestrians, which resulted in false alarms. To tackle this problem, we set a ratio threshold between the height and width of the detected object. If the height/width ratio of the object was above the threshold, it was assumed to be a pedestrian. This helped reduce several, but not all, of these false alarms.

Positioning the high­zoom cameras : We had to experiment a little with the positions of the two high­zoom cameras so that they were able to capture violator number plates accurately.

Objects detected too far away : Some vehicles were detected too far away for both the high zoom cameras to get a clear picture of their number plates. To tackle this, we blocked the cameras from taking pictures till the detected vehicle reached a certain point in the image. The cameras were then signaled to take images.

Readability of number­plates : The high­zoom camera parameters such as zoom, shutter speed, aperture, image quality had to be experimented with in order to get images in which the number plates were readable.

Achieving real­time performance : Initially, our system failed to achieve real­time performance when tested. Several code­level and algorithm optimizations were made in order to reduce computation time.

6. Results

6.1 Violation Detection

We obtained satisfactory results with the wrong side violation detection system. But it appeared that our speeding and signal violation detection systems required improvement. The results shown below are images of wrong side violations that were detected by our system.

Figure: Correct detections.

Figure: Wrong detection due to pedestrians.

Images of license plates taken by the high-zoom cameras are shown below. We were able to read the number on the license plate by zooming into the image manually, owing to the high resolution of the cameras used.

Violator: License Plate Number AP13 W 4222

Violator: License Plate Number AP9 AR 9945

Violator: License Plate Number AP28 T 9084

7. Conclusions

The objective of this project was, initially, to develop a tool to analyze different aspects of a video and to extract information from it, and later, to implement a traffic monitoring system that detects violations occurring on the road. The procedure involves extracting the necessary information from an input video and using this information to monitor and detect violations. In accomplishing this, several important concepts such as background subtraction, SIFT, optical flow and license plate recognition are used. The wrong side violation detection system has given satisfactory results. There is still scope for improvement in the signal jumping and speeding violation systems. Such a system could be extremely useful in real life to check traffic violations and, with certain modifications, even in indoor environments for surveillance.

8. References

[1] C. Stauffer, W. E. L. Grimson. "Adaptive Background Mixture Models for Real-Time Tracking". In Proc. CVPR '99, Vol. 2, pp. 246-252, June 1999.

[2] Z. Zivkovic. "Improved Adaptive Gaussian Mixture Model for Background Subtraction". In Proc. ICPR, 2004.

[3] D. G. Lowe. "Distinctive Image Features from Scale-Invariant Keypoints". International Journal of Computer Vision, 2004.

[4] The Lucas-Kanade algorithm for optical flow.

[5] J. Barroso, A. Rafael, E. L. Dagless, J. Bulas-Cruz. "A Temporal Smoothing Technique for Motion Detection".

[6] T. Horprasert, D. Harwood, L. S. Davis. "A Statistical Approach for Real-Time Robust Background Subtraction and Shadow Detection".