
Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 61 (2015) 436 – 441

Complex Adaptive Systems, Publication 5


Cihan H. Dagli, Editor in Chief
Conference Organized by Missouri University of Science and Technology
2015, San Jose, CA

Application of Object Detection and Tracking Techniques for Unmanned Aerial Vehicles
Shreyamsh Kamate, Nuri Yilmazer
Texas A&M University – Kingsville, Kingsville, TX 78363, USA

Abstract

In this research, the information captured by Unmanned Aerial Vehicles (UAVs) is utilized to detect and track moving objects that pose a primary security threat along the United States southern border. Illegal trespassing and border encroachment are a major challenge for the United States border security forces and the Department of Homeland Security, and it is impractical for human operators to monitor for suspicious behaviour over long periods of time, due to the massive amount of data involved. The main objective of this research is to assist the human operators by implementing intelligent visual surveillance systems that help detect and track suspicious or unusual events in the video sequence. Such a visual surveillance system requires fast and robust methods of detecting and tracking moving objects. In this research, we have investigated methods for detecting and tracking objects from UAVs. Moving objects were detected successfully using an adaptive background subtraction technique, and these detected objects were tracked using Lucas-Kanade optical flow tracking and Continuously Adaptive Mean-Shift (CAMShift) tracking based techniques. The simulation results show the efficacy of these techniques in detecting and tracking moving objects in the video sequences acquired by the UAV.
© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of scientific committee of Missouri University of Science and Technology.
doi:10.1016/j.procs.2015.09.183

Keywords: Visual Surveillance; Background Subtraction; Mean-Shift; Lucas-Kanade.

1. Introduction

With its rapidly growing importance in military and security applications, visual surveillance [1, 2] has become a necessary area of research, as it is cumbersome for human operators to monitor video feeds for long durations. In order to identify important events in real time, an intelligent visual surveillance system is proposed to assist the human operators.
The goal of this research is to detect and track moving objects from Unmanned Aerial Vehicles (UAVs) in order to protect the United States southern border from illegal border crossings. UAVs have played an important role in modern wars and industries. Aerial surveillance offers higher mobility and a larger surveillance scope than fixed cameras. However, it suffers from imperfections such as an unstable background, low resolution and illumination changes. Detection of moving objects [3, 4] can be a daunting task; objects can be detected using several methods, such as Histogram of Oriented Gradients (HOG) descriptors [5-7], background subtraction [8-10], and adaptive background subtraction using Gaussian mixture models [11-13].
The images were captured by a UAV called the "Phantom", a multi-rotor aerial filming system with a remote-control range of 300 meters and an airspeed of 6 mph. The camera is a GoPro Hero 3 Black Edition mounted on the base of the Phantom, capable of producing video at up to 30 frames per second and up to 4K resolution. The acquired images are pre-processed to remove noise and stabilize the images before proceeding to the object detection stage.
There is a myriad of techniques for detecting and tracking objects of interest from a stationary camera; a dynamic camera, however, makes it far more onerous to detect objects. Object tracking is the subsequent step in the process and is one of the important components of many vision systems. It has numerous applications in traffic control, human-computer interaction, digital forensics, gesture recognition, augmented reality and visual surveillance. For object tracking, the focus will be on Continuously Adaptive Mean-Shift tracking [14, 15] and Lucas-Kanade optical flow tracking [16, 17].

2. Methods

2.1 Background Subtraction

Background subtraction is the most widely used technique for object detection. The goal is to leave only the foreground objects of interest by subtracting the background pixels in the scene. There are three stages in the process. Background initialization: during the initial phase, until the background pixels are stable, the background model relies on simple temporal frame differencing [18]. The temporal frame difference $FD_t(x, y)$ at time $t$ is

$$FD_t(x, y) = \left| I_t(x, y) - I_{t-1}(x, y) \right| \quad (2.1)$$

where $I_t(x, y)$ is the intensity of pixel $(x, y)$ in the frame at time $t$.


The foreground binary mask is obtained by comparing $FD_t(x, y)$ to a threshold $T_1$, which is decided empirically. A pixel is considered as having significant motion, and is termed foreground, if the difference is greater than the threshold:

$$FG_t(x, y) = \begin{cases} 1, & FD_t(x, y) > T_1 \\ 0, & \text{otherwise} \end{cases} \quad (2.2)$$

A pixel is regarded as "stable", i.e. a background pixel, if no significant motion is identified ($FD_t(x, y) < T_1$) for a specific number of frames, denoted $T_{fr}$. Consider a frame count $C_{fr}$ that is incremented by 1 every time $FD_t(x, y) < T_1$; once $C_{fr} > T_{fr}$, we can utilize this pixel of the current frame to develop a model for the background:

$$BM_t(x, y) = \begin{cases} I_t(x, y), & C_{fr}(x, y) > T_{fr} \\ BM_{t-1}(x, y), & \text{otherwise} \end{cases} \quad (2.3)$$

Once the background data is available for a pixel, i.e. $BM_t(x, y)$ has been developed using equation (2.3), the foreground mask is determined using background differencing.
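
As an illustration, the initialization stage described by equations (2.1)-(2.3) can be sketched in Python with OpenCV and NumPy. This is a minimal sketch rather than the authors' implementation: the input file name, the values of $T_1$ and $T_{fr}$, and the choice to reset the stability counter when motion reappears are illustrative assumptions.

```python
import cv2
import numpy as np

T1 = 25     # illustrative frame-difference threshold (eq. 2.2)
T_FR = 30   # illustrative stability count T_fr (eq. 2.3)

cap = cv2.VideoCapture("uav_video.mp4")  # hypothetical input file
ok, frame = cap.read()
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.int16)

c_fr = np.zeros(prev.shape, dtype=np.int32)   # per-pixel stability counter C_fr
bm = np.zeros(prev.shape, dtype=np.float32)   # background model BM_t
has_bg = np.zeros(prev.shape, dtype=bool)     # pixels with background data

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cur = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.int16)

    fd = np.abs(cur - prev)             # temporal frame difference, eq. (2.1)
    fg = (fd > T1).astype(np.uint8)     # foreground binary mask, eq. (2.2)

    # Count consecutive stable frames; reset on motion (an assumption).
    c_fr = np.where(fd < T1, c_fr + 1, 0)

    ready = c_fr > T_FR                 # eq. (2.3): adopt the current pixel
    bm[ready] = cur[ready]              # as part of the background model
    has_bg |= ready

    prev = cur
```

Once `has_bg` is true everywhere, the pipeline can switch from frame differencing to the background differencing of equation (2.4).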
Background maintenance: the background difference frame is defined as

$$D_t(x, y) = \left| I_t(x, y) - BM_{t-1}(x, y) \right| \quad (2.4)$$



To compute the adaptive threshold [18], an iterative process is proposed. The difference image $D_t(x, y)$ is divided into two sections. The initial threshold is set to the middle value of the intensity range, and the image is thresholded with this value. At each iteration, the sample means of the intensities associated with the foreground and the background pixels are computed, and the average of these two sample means becomes the new threshold. The iteration concludes when the threshold stops changing. This threshold is denoted $Th_{Ad,t}$ and is used in equation (2.5).
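
A compact sketch of this iterative threshold selection, under the assumption that the midpoint of the intensity range seeds the iteration and that convergence is declared when the threshold moves by less than half a gray level:

```python
import numpy as np

def adaptive_threshold(diff):
    """Iterative threshold on a difference image D_t, as described above."""
    th = (float(diff.min()) + float(diff.max())) / 2.0  # mid-range seed
    while True:
        fg = diff[diff > th]                 # provisional foreground pixels
        bg = diff[diff <= th]                # provisional background pixels
        m_fg = fg.mean() if fg.size else th  # guard against an empty class
        m_bg = bg.mean() if bg.size else th
        new_th = (m_fg + m_bg) / 2.0         # average of the two sample means
        if abs(new_th - th) < 0.5:           # threshold stopped changing
            return new_th
        th = new_th
```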
Foreground/background pixel classification: the next step is to classify the pixels as foreground or background using the current frame and the background model. The regions of significant motion are represented by a foreground mask produced with the threshold $Th_{Ad,t}$, which is resolved adaptively for each frame by applying the technique above to the present difference image $D_t(x, y)$. If the difference between the two pixels is greater than the threshold, the motion of the pixel is considered significant, and a foreground binary mask marking the positions of the moving pixels is generated according to

$$FG_t(x, y) = \begin{cases} 1, & D_t(x, y) > Th_{Ad,t} \\ 0, & \text{otherwise} \end{cases} \quad (2.5)$$
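
Combining equations (2.4) and (2.5) with the `adaptive_threshold` sketch above, the per-frame classification step reduces to a few lines (again a sketch, reusing the names defined earlier):

```python
import numpy as np

def classify(frame, bm):
    """Foreground/background classification, eqs. (2.4)-(2.5)."""
    d = np.abs(frame.astype(np.float32) - bm)  # background difference, eq. (2.4)
    th = adaptive_threshold(d)                 # adaptive threshold Th_Ad,t
    return (d > th).astype(np.uint8)           # foreground mask, eq. (2.5)
```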

2.2 Continuously Adaptive Mean Shift Tracking (CAMShift)

Object tracking [19] is initiated when an object enters the scene, at which point moving object detection is terminated. Object tracking is applied when the position and dimensions of an object in a video sequence are required; objects are represented by one or many points, by rectangles or ellipses, or by contours in the case of amorphous objects.
CAMShift is based on the mean-shift tracking technique, which was initially proposed to track human faces in a user interface system [20]. Its advantage over plain mean-shift tracking is that it adapts the search window. Mean-shift is a stepwise technique: a search window is chosen, which fixes the initial location, type (Gaussian or uniform), shape (symmetric, rectangular or rounded) and size of the object. The window's center of mass is computed, and the window is re-centered on it. These steps are repeated until the window stops moving.
In a $d$-dimensional space $\mathbb{R}^d$, consider a set $\{x_i\}_{i=1 \dots n}$ of $n$ points. The multivariate kernel density estimate with kernel $k(\|\cdot\|^2)$ and window radius $h$, computed at the point $x$, is given as

$$\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} k\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \quad (2.6)$$

The focus is on the gradient

$$\nabla \hat{f}(x) = \frac{2}{n h^{d+2}} \sum_{i=1}^{n} (x - x_i)\, k'\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \quad (2.7)$$

Let $g(x) = -k'(x)$, the negative derivative of the kernel, and write $g_i = g\left( \left\| (x - x_i)/h \right\|^2 \right)$. Then

$$\nabla \hat{f}(x) = \frac{2}{n h^{d+2}} \left[ \sum_{i=1}^{n} g_i \right] \left[ \frac{\sum_{i=1}^{n} x_i g_i}{\sum_{i=1}^{n} g_i} - x \right] \quad (2.8)$$

where $h$ is the window radius and the second bracketed factor is the mean-shift vector.
The mean-shift vector re-centers the mean-shift window on the center of mass computed within that window. The process iterates until it converges to a mean-shift vector of zero. Moments can be used to find the object's new size, which gives the algorithm the ability to update the object's scale during tracking. Although the mean-shift algorithm is robust and efficient, it has certain drawbacks that make it less than ideal for tracking an object from a UAV.
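
The mean-shift iteration of equation (2.8) can be sketched in NumPy. For simplicity the sketch assumes a flat (uniform) kernel, for which $g_i$ is 1 inside the window and 0 outside, so the update reduces to re-centering on the centroid of the in-window points:

```python
import numpy as np

def mean_shift(points, x, h, max_iter=100, eps=1e-3):
    """Iterate the mean-shift vector of eq. (2.8) until it is ~zero.

    points: (n, d) array of samples; x: initial window center; h: radius.
    """
    for _ in range(max_iter):
        in_win = np.linalg.norm(points - x, axis=1) <= h  # flat-kernel g_i
        if not in_win.any():
            break
        shift = points[in_win].mean(axis=0) - x  # mean-shift vector m(x)
        x = x + shift                            # re-center the window
        if np.linalg.norm(shift) < eps:          # converged
            break
    return x
```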
CAMShift instead makes use of the color histogram and texture of the object as features for tracking. The video sequences are transformed to hue, saturation and intensity channels, which improves the accuracy of object tracking. After initializing the search window, the color histogram of the object is computed and saved as a reference. To find the objects of interest, or for image segmentation, histogram backprojection [21] is utilized: the histogram of an image comprising the object of interest is created, and this histogram is then back-projected onto the test image in which the object needs to be found, which amounts to calculating, for every pixel, the probability that it belongs to the object, and displaying it. Appropriate thresholding of the result outputs the object region. The final step is to calculate the object's new size and location using the mean-shift algorithm.
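
OpenCV exposes this whole pipeline directly (HSV conversion, histogram backprojection, and the CamShift iteration). Below is a minimal sketch, not the authors' code; the video file name, the initial window, and the histogram and mask parameters are illustrative assumptions:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("uav_video.mp4")   # hypothetical UAV video
ok, frame = cap.read()
x, y, w, h = 300, 200, 80, 60             # hypothetical initial search window

# Reference hue histogram of the object inside the initial window.
roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
mask = cv2.inRange(roi, (0, 60, 32), (180, 255, 255))  # drop unreliable hues
hist = cv2.calcHist([roi], [0], mask, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)  # backprojection
    box, window = cv2.CamShift(prob, window, term)  # adaptively sized window
    pts = cv2.boxPoints(box).astype(np.int32)       # rotated-box corners
    cv2.polylines(frame, [pts], True, (255, 0, 0), 2)
    cv2.imshow("CAMShift", frame)
    if cv2.waitKey(30) == 27:                       # Esc to quit
        break
```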

2.3 Lucas-Kanade Optical Flow Tracking

We often want to calculate the motion between two frames without any prior knowledge of their content. Associating some kind of velocity with every pixel is known as dense optical flow. Sparse optical flow instead depends on some means of specifying beforehand the subset of points to be tracked. These points usually possess characteristics, for instance "corners", which aid tracking and make it relatively dependable and robust. The computational cost of sparse tracking is low compared to dense tracking. A well-known technique for sparse tracking is Lucas-Kanade (LK) optical flow [22].
The LK method operates in the sparse setting. It relies exclusively on local information drawn from a small window around each point of interest. The limitation of using small windows is that large motions can move points outside the local window, making them difficult for the algorithm to find. The LK technique is grounded on three assumptions. The first, brightness constancy, requires that a tracked pixel look the same over time:

$$f(x, t) \equiv I(x(t), t) = I(x(t + dt), t + dt) \quad (2.9)$$

This means that the tracked pixel's intensity shows no change over time:

$$\frac{\partial f(x)}{\partial t} = 0 \quad (2.10)$$

The second assumption is temporal persistence: motions remain small from frame to frame. In this situation we may begin with the brightness constancy equation, substitute the definition of brightness $f(x, t)$ while taking into account the explicit dependence of $x$ on $t$, $I(x(t), t)$, and apply the chain rule for partial differentiation. This results in

$$I_x v + I_t = 0 \quad (2.11)$$

where $I_x$ is the spatial derivative across the first image, $v$ is the velocity, and $I_t$ is the derivative between images over time. Solving for the velocity gives

$$v = -\frac{I_t}{I_x} \quad (2.12)$$

The objective is to find the velocity of the motion at the edge. This technique has further practical drawbacks: image brightness is not perfectly stable, and the time step is not always small relative to the motion. Thus there is no exact solution; the closest thing to do is iterate toward an approximate answer. The spatial derivative in $x$ is calculated once on the first frame and reused, which gives important computational savings; the time derivative, however, must be recomputed at each iteration. If the initial estimate is close enough, these iterations converge quickly; this is essentially Newton's method. When we add the $y$ coordinate and generalize to images in two dimensions, we get

$$I_x u + I_y v + I_t = 0 \quad (2.13)$$

This equation contains two unknowns for any given pixel, so the measurements at a single pixel do not suffice. When motion is detected at a single pixel we only see an edge, not a corner; because optical flow suffers from this aperture problem, we cannot determine from one pixel how the object is moving. Surrounding pixels are therefore used to solve for the motion of the central pixel by setting up a system of equations. Ideally we choose a 5x5 window, because a 3x3 window might run into the aperture problem again, while a 7x7 window would be too large and would violate the coherent motion assumption. To solve this, 25 equations are set up as follows:

$$\begin{bmatrix} I_x(p_1) & I_y(p_1) \\ I_x(p_2) & I_y(p_2) \\ \vdots & \vdots \\ I_x(p_{25}) & I_y(p_{25}) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} I_t(p_1) \\ I_t(p_2) \\ \vdots \\ I_t(p_{25}) \end{bmatrix} \quad (2.14)$$

The first matrix is the 25x2 matrix $A$, the second is the 2x1 vector $d$ containing the unknowns $u$ and $v$, and the right-hand side is the 25x1 vector $b$. This over-determined system provides more information than a single edge. A least-squares minimization is formulated to solve this system, where $\min \| A d - b \|^2$ is determined in standard form as

$$(A^T A)\, d = A^T b \quad (2.15)$$

We obtain the $u$ and $v$ motion from this relation. Written out, it results in

$$\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix} \quad (2.16)$$

The solution to this equation is

$$d = (A^T A)^{-1} A^T b \quad (2.17)$$
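
A NumPy sketch of this least-squares solve at a single pixel, assuming grayscale float frames, central-difference spatial derivatives, and the 5x5 window discussed above:

```python
import numpy as np

def lk_flow_at(I0, I1, px, py, win=5):
    """Solve eqs. (2.14)-(2.17) for the flow (u, v) at pixel (px, py)."""
    r = win // 2
    Ix = (np.roll(I0, -1, axis=1) - np.roll(I0, 1, axis=1)) / 2.0  # spatial dI/dx
    Iy = (np.roll(I0, -1, axis=0) - np.roll(I0, 1, axis=0)) / 2.0  # spatial dI/dy
    It = I1 - I0                                                   # temporal dI/dt

    sl = (slice(py - r, py + r + 1), slice(px - r, px + r + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)  # 25x2 matrix A
    b = -It[sl].ravel()                                     # 25x1 vector b

    ATA = A.T @ A
    if np.linalg.matrix_rank(ATA) < 2:     # degenerate: an edge, not a corner
        return None
    return np.linalg.solve(ATA, A.T @ b)   # d = (A^T A)^{-1} A^T b, eq. (2.17)
```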

This can be solved when $(A^T A)$ is invertible, which occurs when it has full rank (2), i.e. when it has two large eigenvalues. $(A^T A)$ is best conditioned when a corner region of the image is centered in the tracking window. Corners can be found using the Harris corner detector, whose scoring function is

$$R = \lambda_1 \lambda_2 - k (\lambda_1 + \lambda_2)^2 \quad (2.18)$$

or, in the Shi-Tomasi variant,

$$R = \min(\lambda_1, \lambda_2) \quad (2.19)$$

If the score is greater than a threshold, the point is considered a corner. The results of the experimentation demonstrate that the number of corner points increases and their distribution becomes more uniform.
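
In practice, both steps (corner selection by the min-eigenvalue score of equation (2.19) and pyramidal LK tracking) are available in OpenCV. A minimal sketch under illustrative parameter choices:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("uav_video.mp4")  # hypothetical UAV video
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi corners, i.e. the min-eigenvalue score of eq. (2.19).
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=7)

lk_params = dict(winSize=(15, 15), maxLevel=2,  # pyramids tolerate larger motions
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None, **lk_params)
    good = nxt[status.ravel() == 1].reshape(-1, 1, 2)  # successfully tracked points
    for p in good.reshape(-1, 2):                      # draw tracked points
        cv2.circle(frame, (int(p[0]), int(p[1])), 3, (255, 0, 0), -1)
    cv2.imshow("LK tracking", frame)
    if cv2.waitKey(30) == 27:
        break
    prev_gray, pts = gray, good
```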

3. Experimental Results

Figure 1. (a) Original video sequence obtained over Texas A&M University – Kingsville from the Phantom UAV. (b) The computed mean background image. (c) (d) The detected object shown in the foreground image mask, and the contour drawn on the foreground mask. (e) A car being tracked using CAMShift tracking in a video sequence obtained from the Phantom UAV.

Figure 2. (a) (b) Cars and a cyclist being tracked using CAMShift tracking in a video sequence obtained from the Phantom UAV. The results also display the minimum and maximum intensity and saturation values, which can be adjusted by the user to obtain optimum results. (c) (d) (e) Cars and human beings tracked using Lucas-Kanade optical flow tracking in a video sequence obtained from the Phantom UAV. The red dots indicate the detection of corner points in a frame; these points are then tracked iteratively using Lucas-Kanade optical flow, indicated by the blue dots and borders.
In this section, we have used video sequences obtained from the Phantom UAV over Texas A&M University – Kingsville. The simulation results were obtained using the OpenCV® platform. The methods were also tested on several stabilized video sequences obtained from an online dataset, as well as on the webcam feed of a computer. The results were considerably more precise on the stable videos and the webcam feed than on the video sequences obtained from the Phantom. However, this research is oriented toward detecting and tracking moving objects from an aerial view with a dynamic camera.

4. Conclusion

In this research, we applied an adaptive background subtraction method for the detection of moving objects. Additionally, different algorithms for tracking moving objects, such as Lucas-Kanade optical flow tracking and CAMShift tracking, were applied. Computational cost, speed and efficiency were the major design goals. The main objective was to design and develop intelligent visual surveillance systems that assist human operators in detecting unusual events in a video sequence and responding to them rapidly. In this study, it was shown that objects in the images and videos taken by a UAV were effectively detected and successfully tracked using several techniques, and their performance was compared. The performance of these algorithms is excellent when applied to static camera views; video stabilization algorithms, such as circular block matching techniques, can be applied to dynamic camera views in order to obtain optimum results.

5. References

[1] Celik, T.; Kabakli, T.; Uyguroklu, M.; Ozkaramanli, H.; Demirel, H., "Automatic threshold selection for automated visual surveillance," Signal Processing and Communications Applications Conference, 2004. Proceedings of the IEEE 12th, pp. 478-480, 28-30 April 2004.
[2] Weiming Hu; Tieniu Tan; Liang Wang; Maybank, S., "A survey on visual surveillance of object motion and behaviors," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 34, no. 3, pp. 334-352, Aug. 2004.
[3] Yokoyama, T.; Iwasaki, T.; Watanabe, T., "Motion Vector Based Moving Object Detection and Tracking in the MPEG Compressed Domain," Content-Based Multimedia Indexing, 2009. CBMI '09. Seventh International Workshop on, pp. 201-206, 3-5 June 2009.
[4] Yiyan Wang; Yuexian Zou; Hang Shi; He Zhao, "Video Image Vehicle Detection System for Signaled Traffic Intersection," Hybrid Intelligent Systems, 2009. HIS '09. Ninth International Conference on, vol. 1, pp. 222-227, 12-14 Aug. 2009.
[5] Dalal, N.; Triggs, B., "Histograms of oriented gradients for human detection," Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, pp. 886-893, 25 June 2005.
[6] Hyung-Ji Lee; Jae-Ho Chung, "Hand gesture recognition using orientation histogram," TENCON 99. Proceedings of the IEEE Region 10 Conference, vol. 2, pp. 1355-1358, Dec. 1999.
[7] Del Bue, A.; Comaniciu, D.; Ramesh, V.; Regazzoni, C., "Smart cameras with real-time video object generation," Image Processing. 2002. Proceedings. 2002 International Conference on, vol. 3, pp. III-429-III-432, 2002.
[8] Ali, S.S.; Zafar, M.F., "A robust adaptive method for detection and tracking of moving objects," Emerging Technologies, 2009. ICET 2009. International Conference on, pp. 262-266, 19-20 Oct. 2009.
[9] Bouttefroy, P.L.M.; Bouzerdoum, A.; Phung, S.L.; Beghdadi, A., "On the analysis of background subtraction techniques using Gaussian Mixture Models," Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 4042-4045, 14-19 March 2010.
[10] Brutzer, S.; Hoferlin, B.; Heidemann, G., "Evaluation of background subtraction techniques for video surveillance," Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 1937-1944, 20-25 June 2011.
[11] Fazli, S.; Pour, H.M.; Bouzari, H., "Multiple object tracking using improved GMM-based motion segmentation," Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2009. ECTI-CON 2009. 6th International Conference on, vol. 2, pp. 1130-1133, 6-9 May 2009.
[12] Zhen Tang; Zhenjiang Miao, "Fast Background Subtraction and Shadow Elimination Using Improved Gaussian Mixture Model," Haptic, Audio and Visual Environments and Games, 2007. HAVE 2007. IEEE International Workshop on, pp. 38-41, 12-14 Oct. 2007.
[13] Han-Pang Huang; Chun-Ting Lin, "Multi-CAMSHIFT for Multi-View Faces Tracking and Recognition," Robotics and Biomimetics, 2006. ROBIO '06. IEEE International Conference on, pp. 1334-1339, 17-20 Dec. 2006.
[14] Yi-Bo Li; Xiao-ling Shen; Shan-shan Bei, "Real-time tracking method for moving target based on an improved Camshift algorithm," Mechatronic Science, Electric Engineering and Computer (MEC), 2011 International Conference on, pp. 978-981, 19-22 Aug. 2011.
[15] Chwan-Hsen Chen; Yung-Pyng Chan, "Real Time Multi-target Visual Tracking based on Velocity Segmentation Technique," Industrial Electronics Society, 2007. IECON 2007. 33rd Annual Conference of the IEEE, pp. 2813-2817, 5-8 Nov. 2007.
[16] Ishii, I.; Taniguchi, T.; Yamamoto, K.; Takaki, T., "High-Frame-Rate Optical Flow System," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 1, pp. 105-112, Jan. 2012.
[17] Yunyi Liu; Zhiyan Bin, "The Improved Moving Object Detection and Shadow Removing Algorithms for Video Surveillance," Computational Intelligence and Software Engineering (CiSE), 2010 International Conference on, pp. 1-5, 10-12 Dec. 2010.
[18] Rashed, D.M.; Sayed, M.S.; Abdalla, M.I., "Improved moving object detection algorithm based on adaptive background subtraction," Electronics, Communications and Computers (JEC-ECC), 2013 Japan-Egypt International Conference on, pp. 29-33, 17-19 Dec. 2013.
[19] Serby, D.; Meier, E.K.; Van Gool, L., "Probabilistic object tracking using multiple features," Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 2, pp. 184-187, 23-26 Aug. 2004.
[20] Brethes, L.; Menezes, P.; Lerasle, F.; Hayet, J., "Face tracking and hand gesture recognition for human-robot interaction," Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, vol. 2, pp. 1901-1906, April 26-May 1, 2004.
[21] Qiong Liu; Guang-zheng Peng, "A robust skin color based face detection algorithm," Informatics in Control, Automation and Robotics (CAR), 2010 2nd International Asia Conference on, vol. 2, pp. 525-528, 6-7 March 2010.
[22] Cheng-Ming Huang; Yi-Ru Chen; Li-Chen Fu, "Real-time object detection and tracking on a moving camera platform," ICCAS-SICE, 2009, pp. 717-722, 18-21 Aug. 2009.
