M_{i+1} = M_i - M_i/n + g_{i+1}.    (1)
In Eq. (1), M_{i+1} is the estimated moving average for pixel i + 1 having a gray-scale value of g_{i+1}, and M_i is the previous moving average value. If a pixel is less
than the moving average, it is set to black, otherwise it is set to white. As may
be seen in Fig. 2, the black and white image obtained from the moving average
algorithm provides an encouraging result. The next stage is to segment the objects
out of the now binary scene.
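A minimal sketch of this binarization step may help. It assumes the running value M of Eq. (1) behaves like a window sum, so each pixel is compared against the average M/n; the window length n = 8 and the raster-order traversal are illustrative choices, not values given in the paper.

```python
import numpy as np

def moving_average_binarize(gray, n=8):
    """Binarize a gray-scale image with the running-average rule of Eq. (1).

    M[i+1] = M[i] - M[i]/n + g[i+1]; a pixel is set to black (0) when its
    value falls below the average M/n, otherwise white (255).
    """
    flat = gray.astype(np.float64).ravel()
    out = np.empty(flat.shape, dtype=np.uint8)
    m = flat[0] * n  # seed the running sum so M/n starts at the first pixel value
    for i, g in enumerate(flat):
        m = m - m / n + g                   # Eq. (1)
        out[i] = 0 if g < m / n else 255    # below average -> black
    return out.reshape(gray.shape)
```

Because the average adapts to local intensity, a fixed global threshold is never needed, which is what makes the method robust on unevenly lit sand and water.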
Int. J. Comp. Intel. Appl. 2006.06:149-160. Downloaded from www.worldscientific.com by UNIVERSIDADE DE SAO PAULO on 05/21/13. For personal use only.
March 27, 2007 14:53 WSPC/157-IJCIA 00192
Detection of Persons in Cluttered Beach Scenes 153
Fig. 2. Object detection and segmentation process shown from left to right. The original image,
the image after applying the moving average algorithm, segmented images, image segment con-
taining the gray-scale image, and the edge detected image (not to scale).
Fig. 3. Object segmentation. The original image is shown on the left, and the segmented objects
on the right. The segmentation settings were r = 2 and n = 4.
Objects were segmented out of the binary image using a breadth-first search,
starting at the top left of the image. Pixels were grouped to form objects based on
their radius r from other pixels. In Fig. 3, the image contains seven distinct groups
of pixels, with some groups having only one pixel. If the above process is applied
with a radius r = 2 and a minimum number of pixels n = 4, then only the three
objects displayed on the right in Fig. 3 will be extracted.
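The grouping step above can be sketched as a breadth-first search over foreground pixels. Treating 0 as foreground and measuring the radius r as a Chebyshev (square-neighborhood) distance are assumptions for illustration; the paper does not specify the distance measure.

```python
from collections import deque

def segment_objects(binary, r=2, n=4):
    """Group foreground (0) pixels into objects via breadth-first search.

    Pixels within radius r of a group member join that group; groups with
    fewer than n pixels are discarded, mirroring the r = 2, n = 4 settings
    of Fig. 3.
    """
    h, w = len(binary), len(binary[0])
    seen = set()
    objects = []
    for y0 in range(h):            # scan from the top left of the image
        for x0 in range(w):
            if binary[y0][x0] != 0 or (y0, x0) in seen:
                continue
            group, queue = [], deque([(y0, x0)])
            seen.add((y0, x0))
            while queue:
                y, x = queue.popleft()
                group.append((y, x))
                for dy in range(-r, r + 1):        # all pixels within radius r
                    for dx in range(-r, r + 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 0
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            if len(group) >= n:    # drop groups below the minimum pixel count
                objects.append(group)
    return objects
```

With these settings, isolated specks of one or two pixels vanish while nearby fragments of the same person merge into one object.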
The segmented components are now used to locate the objects in the original (gray-scale) image scene. These objects are subsequently processed by the Canny edge detection algorithm^22 and a feature vector for each object (created using the modified direction feature (MDF) extractor) is presented to a multi-layer perceptron (MLP) for classification.
3.1.2. Exhaustive search
The background extraction technique described above provides a fast method
for removing background information such as sand, sea and sky. Unfortunately,
foreground objects (blobs) detected using this technique can be joined to other
objects by shadows, the proximity to other objects, and occlusion. An exhaustive
search method is subsequently put forward to try and overcome the aforementioned
154 S. Green et al.
Fig. 4. Exhaustive search window increasing in size to locate person objects.
Fig. 5. Exhaustive search technique locating persons based on foreground objects revealed by
applying background extraction.
problems. The exhaustive search technique described in this research applies a
search window of increasing size over ROIs to detect person and non-person objects
(see Figs. 4 and 5).
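The growing-window sweep can be sketched as a generator of candidate ROIs. The starting size, the 1.25 growth factor, and the quarter-window stride are illustrative assumptions; the paper gives none of these parameters.

```python
def sliding_windows(img_w, img_h, min_size=16, growth=1.25):
    """Yield (top, left, size) search windows of increasing size.

    Windows start at min_size pixels and grow by the given factor until
    they no longer fit in the image, stepping a quarter of the window
    size between positions.
    """
    size = min_size
    while size <= min(img_w, img_h):
        step = max(1, size // 4)
        for top in range(0, img_h - size + 1, step):
            for left in range(0, img_w - size + 1, step):
                yield (top, left, size)
        size = int(size * growth)
```

The number of windows grows roughly quadratically with image dimensions, which is why the SAT-based pruning described next is needed to keep the search tractable.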
As the name implies, the exhaustive search method is highly computationally intensive. To reduce the search time, regions that were not of interest needed to be excluded quickly. One method to exclude these regions quickly is to provide a hierarchy of classifiers, with the simplest and most efficient classifier utilized first to remove non-objects. The proposed object detection system incorporates a single classifier (MLP). Regions containing non-objects are removed from the search space by applying background extraction, and then building a summed-area table (SAT)^23 of the binarized image. The SAT can then be used to quickly look at an
ROI to see if it contains foreground pixels.^24 In order to provoke a response, an ROI was required to contain between 20% and 80% of foreground pixels compared with the maximum possible number of foreground pixels. This was an empirical measurement based on observation. If an ROI is classified as containing a person object, the region is removed from the search space by setting the foreground pixels to white, and then rebuilding the SAT.
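The SAT lookup behind this 20%-80% test can be sketched as follows, with foreground pixels encoded as 1. Four table reads give the foreground count of any rectangle in constant time, which is what makes the per-ROI test cheap.

```python
import numpy as np

def build_sat(binary):
    """Summed-area table: sat[y, x] is the count of foreground (1) pixels
    in the rectangle from (0, 0) to (y, x) inclusive."""
    return np.cumsum(np.cumsum(binary.astype(np.int64), axis=0), axis=1)

def foreground_ratio(sat, top, left, bottom, right):
    """Foreground fraction of an inclusive ROI via four SAT lookups."""
    total = sat[bottom, right]
    if top > 0:
        total -= sat[top - 1, right]
    if left > 0:
        total -= sat[bottom, left - 1]
    if top > 0 and left > 0:
        total += sat[top - 1, left - 1]
    area = (bottom - top + 1) * (right - left + 1)
    return total / area

def roi_responds(sat, top, left, bottom, right, lo=0.2, hi=0.8):
    """An ROI provokes a response when 20%-80% of its pixels are foreground."""
    return lo <= foreground_ratio(sat, top, left, bottom, right) <= hi
```

After a person is detected, setting the region's foreground pixels to background and rebuilding the table (as the text describes) prevents the same blob from triggering further responses.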
3.2. Modified direction feature
Modified direction feature (MDF) has been detailed elsewhere^25 and will only be briefly described here. MDF feature vector creation is based on the location of transitions from background to foreground pixels in the vertical and horizontal directions of an image represented by edges. When a transition is located, two values are stored: the location of the transition (LT) and the direction transition (DT) (see Fig. 6). An LT is calculated by taking the ratio between the position where a transition occurs and the distance across the entire image in a particular direction. The DT value at a particular location is also stored. The DT is calculated by examining the stroke direction of an object's boundary at the position where a transition occurs (as defined in Blumenstein et al.^25). Finally, a vector comprising the [LT, DT] values in each of the four possible traversal directions is created.
Fig. 6. In the figure, the MDF extraction technique examines the person object in the left-to-right direction to determine the LT and DT values.
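The LT computation for one left-to-right scan can be sketched as below. Recording at most three transitions per row and zero-padding the remainder are assumptions for illustration; the exact per-row limits and normalization follow Blumenstein et al.^25

```python
def location_transitions(edge_row, max_transitions=3):
    """Left-to-right LT values for one row of a binary edge image (1 = edge).

    An LT is the position of a background-to-foreground transition divided
    by the full row width, as described in the text.
    """
    width = len(edge_row)
    lts = []
    prev = 0
    for x, pixel in enumerate(edge_row):
        if prev == 0 and pixel == 1:   # background -> foreground transition
            lts.append(x / width)       # normalized location in [0, 1)
        prev = pixel
        if len(lts) == max_transitions:
            break
    lts += [0.0] * (max_transitions - len(lts))  # pad rows with few transitions
    return lts
```

Repeating this for all four traversal directions, and pairing each LT with its DT value, yields the [LT, DT] feature vector presented to the MLP.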
3.3. Neural-based classification
In this research, an interconnected, feed-forward MLP was adopted for the process
of classifying objects into person and non-person categories. The MLP was trained
using the backpropagation (BP) algorithm. In the experiments conducted, the BP-
MLP was trained with images processed by the MDF extraction technique.
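The classification stage amounts to a feed-forward pass through the trained network. A single hidden layer, sigmoid activations, and one output unit giving a person score in [0, 1] are assumptions for illustration; the weights would come from backpropagation training on the MDF vectors.

```python
import numpy as np

def mlp_classify(features, w_hidden, b_hidden, w_out, b_out):
    """Feed an MDF feature vector through a one-hidden-layer MLP and
    return the person / non-person decision."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(features @ w_hidden + b_hidden)  # hidden-layer activations
    score = sigmoid(hidden @ w_out + b_out)           # person score in [0, 1]
    return "person" if score >= 0.5 else "non-person"
```

The hidden-layer width is the parameter varied in the experiments of Sec. 4.2 (8 to 28 units).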
4. Results
4.1. Image database
To test the person detection system on beach scenes using the MDF feature extrac-
tion technique, a database was created including people and non-people objects.
The image database consisted of 450 images of varying size. These images were
extracted from DVD recordings taken from a Coastalwatch web camera at Surfers
Paradise. Frames from the beach recordings were extracted, and image regions were
classified by a human operator into two categories: person and non-person.
The image database was then broken into three sets to facilitate three-fold cross-
validation, whereby each set was composed of 350 training images and 100 test
images. Cross-validation was performed to verify the results obtained for the vari-
ous neural network experiments.
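The fold construction can be sketched as follows. Which 100 images form each held-out test block is an assumption; the paper gives only the 350/100 split sizes.

```python
def three_fold_splits(images):
    """Divide the 450-image database into three cross-validation sets,
    each holding out a distinct block of 100 test images and training
    on the remaining 350."""
    folds = []
    for k in range(3):
        test = images[k * 100:(k + 1) * 100]
        train = images[:k * 100] + images[(k + 1) * 100:]
        folds.append((train, test))
    return folds
```

Averaging the classification rates over the three folds, as done in Table 1, reduces the chance that a result reflects one lucky partition.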
4.2. Classifier settings
As previously mentioned, the image database was divided into separate sets to
perform three-fold cross-validation. On each set, training was conducted using 8,
12, 16, 20, 24, and 28 hidden units, and the RMS error rate was recorded. An RMS
error rate of 0.001 was used as a stopping criterion, but in some cases the MLP did
not train to an RMS of 0.001. The results from all three sets were averaged and
can be seen in Table 1.
4.3. Classification results
In this section, results for the classification of person objects are presented in tabular form. Table 1 presents the results obtained using MDF for processing the test image set. The MLP parameters for training were modified to optimize the training result. The test set classification rate was calculated based upon the number of successfully recognized person objects (true positives). Also listed below is the number of non-person items incorrectly labeled as person objects (false positives). The table shows
that, overall, the system provided a good result considering the complexity of the
beach scenes.
4.4. Discussion of classification results
As may be seen from Table 1, the best result was a classification rate of 85% for people with a false positive rate of 5% using the MDF technique and an MLP
Table 1. BP-MLP classification rates using the MDF extraction technique.
Person Object Classification Rate (%)
No. of Hidden Units RMS Error True Positive False Positive
8 0.01375233 87 11
12 0.01390490 88 9
16 0.01404126 85 7
20 0.01414976 85 6
24 0.03212585 86 5
28 0.01438113 85 5
Fig. 7. Person detection results showing false positives (top row) and true positives (bottom
row); both sets of images are not to scale.
with 28 hidden units. Figure 7 shows some results for cases of false positives and true positives. By observation of the data, people who were not identified correctly were either lying on the beach, occluded by other objects (both person and non-person), or were obscured by the surf. In the case of false positives, non-person objects misclassified as a person tended to have an upright outline, causing the MLP to give an incorrect classification.
4.5. Comparison between background extraction and exhaustive
search results
Experiments were carried out to compare both background extraction and exhaus-
tive search techniques. The experiments were conducted on 10 minutes of beach
footage recorded from a Coastalwatch web camera located at Surfers Paradise beach
in Queensland, Australia. The beach scene in all frames contained approximately
31 people. For both techniques, each frame was recorded with white rectangles representing objects classified as a person (true positive), and black rectangles representing objects classified as non-person (false positive). In the case of the exhaustive search, black regions were not shown due to the large number of these generated by the exhaustive search method (see Fig. 8). In Table 2, it can be seen that the background extraction technique did not deal with occlusion and shadowing effects very well; however, it did manage to locate most areas containing person objects. The background extraction technique had a very low false positive rate in that there were very few non-person objects classified incorrectly. The exhaustive search method gave a much better result for locating person objects that were occluded by other objects and were obscured by shadowing effects, with a best result of
Fig. 8. Persons located in beach imagery using background extraction and exhaustive search
methods.
Table 2. Results for both background extraction and exhaustive
search methods, showing the number of people detected (true
positive) in each frame (average of 31 people in each frame).
Frame No. True Positive False Positive
Background extraction
10010 5 1
10080 8 1
10110 5 2
Exhaustive Search
10010 19 4
10080 22 4
10110 20 10
22 persons correctly located. The disadvantage of the exhaustive search method
was the substantial increase in false positive classifications.
5. Conclusions and Future Research
In this paper, a person detection system for quantifying people in beach scenes has
been described. A human operator segmented actual beach data from a Surfers Par-
adise web camera into person and non-person categories. Tests were then carried out on images using two techniques: (1) background extraction and (2) the exhaustive
search technique using beach footage. The system incorporated a neural-based classification technique for the detection of person and non-person objects. Encouraging results were presented for automated person detection and quantification, which can be applied to beach scene analysis for surveillance and coastal management applications.
Currently, the features described in this paper are not invariant to object rotation. Future research will be directed to address this point. Further work will also examine different object segmentation techniques, and determine ways to further deal with images that are not correctly classified. Finally, frames retrieved from beach media are at present processed individually. Future research will look at generating a confidence value for a person object based on temporal data from past frames. Also, motion detection may provide a more effective way of detecting person objects in beach imagery.
Acknowledgments
The authors thank Chris Lane of CoastalWatch for assistance in providing video imagery for this research.
References
1. M. Browne, M. Blumenstein, R. Tomlinson and C. Lane, An intelligent system for remote monitoring and prediction of beach conditions, Proc. Int. Conf. Artif. Intell. Appl., Innsbruck (2005), pp. 533–537.
2. A. J. Schofield, T. J. Stonham and P. A. Mehta, Automated people counting to aid lift control, Automat. Constr. 6(5–6) (1997) 437–445.
3. A. J. Schofield, P. A. Mehta and T. J. Stonham, A system for counting people in video images using neural networks to identify the background scene, Pattern Recogn. 29(8) (1996) 1421–1428.
4. A. Iketani, A. Nagai, Y. Kuno and Y. Shirai, Real-time surveillance system detecting persons in complex scenes, Real-Time Imaging 7(5) (2001) 433–446.
5. C. Sacchi, G. Gera, L. Marcenaro and C. S. Regazzoni, Advanced image-processing tools for counting people in tourist site-monitoring applications, Signal Process. 81(5) (2001) 1017–1040.
6. F. Bartolini, V. Cappellini and A. Mecocci, Counting people getting in and out of a bus by real-time image sequence processing, Image Vision Comput. 12(1) (1994) 36–41.
7. C.-J. Pai, H.-R. Tyan, Y.-M. Liang, H.-Y. M. Liao and S.-W. Chen, Pedestrian detection and tracking at crossroads, Pattern Recogn. 37(5) (2004) 1025–1034.
8. G. L. Foresti, C. Micheloni and C. Piciarelli, Detecting moving people in video streams, Pattern Recogn. Lett. 26(14) (2005) 2232–2243.
9. Y. Ricquebourg and P. Bouthemy, Real-time tracking of moving persons by exploiting spatio-temporal image slices, IEEE Trans. Pattern Anal. Mach. Intell. 22(8) (2000) 797–808.
10. C. Lerdsudwichai, M. Abdel-Mottaleb and A.-N. Ansari, Tracking multiple people with recovery from partial and total occlusion, Pattern Recogn. 38(7) (2005) 1059–1070.
11. J.-C. Tai, S.-T. Tseng, C.-P. Lin and K.-T. Song, Real-time image tracking for automatic traffic monitoring and enforcement applications, Image Vision Comput. 22(6) (2004) 485–501.
12. D. M. Ha, J.-M. Lee and Y.-D. Kim, Neural-edge-based vehicle detection and traffic parameter extraction, Image Vision Comput. 22 (2004) 899–907.
13. C. Wöhler and J. K. Anlauf, Real-time object recognition on image sequences with the adaptable time delay neural network algorithm – applications for autonomous vehicles, Image Vision Comput. 19(9–10) (2001) 593–618.
14. J. Altmann, S. Linev and A. Weiß, Acoustic-seismic detection and classification of military vehicles – developing tools for disarmament and peace-keeping, Appl. Acoustics 63(10) (2002) 1085–1107.
15. H. Zhang, W. Gao and D. Zhao, Object detection using spatial histogram features, Image Vision Comput. 24(4) (2006) 327–341.
16. H. Schneiderman, Feature-centric evaluation for efficient cascaded object detection, Conf. Comput. Vision Pattern Recogn. (CVPR) 2 (2004) 29–36.
17. D. Hoiem, R. Sukthankar, H. Schneiderman and L. Huston, Object-based image retrieval using the statistical structure of images, Conf. Comput. Vision Pattern Recogn. (CVPR) 2 (2004) 490–497.
18. P. Figueroa, N. Leite and R. Barros, Background recovering in outdoor image sequences, Image Vision Comput. 24(4) (2006) 363–374.
19. B. Shoushtarian and H. E. Bez, A practical adaptive approach for dynamic background subtraction using an invariant colour model and object tracking, Pattern Recogn. Lett. 26(1) (2005) 5–26.
20. I. Haritaoglu, D. Harwood and L. S. Davis, W4: Real-time surveillance of people and their activities, IEEE Trans. Pattern Anal. Mach. Intell. 22(8) (2000) 809–830.
21. J. R. Parker, Algorithms for Image Processing and Computer Vision (John Wiley and Sons, New York, 1997).
22. J. F. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8(6) (1986) 679–698.
23. F. Crow, Summed-area tables for texture mapping, Comput. Graphics (SIGGRAPH 84 Proceedings) 18(3) (1984) 207–212.
24. P. Viola and M. Jones, Robust real-time object detection, in 2nd Int. Workshop on Statistical and Computational Theories of Vision – Modeling, Learning, Computing, and Sampling, Vancouver, Canada (2001).
25. M. Blumenstein, X. Y. Liu and B. Verma, A modified direction feature for cursive character recognition, in Proc. Int. Joint Conf. Neural Networks (IJCNN '04), Budapest, Hungary (2004), pp. 2983–2987.