Research Article
Self-Supervised Sensor Learning and Its Application:
Building 3D Semantic Maps Using Terrain Classification
Chuho Yi,1 Donghui Song,2 and Jungwon Cho3,4
1. Introduction
As seen in the Defense Advanced Research Projects Agency
(DARPA) Grand Challenge, USA, robotics has progressed
markedly from industrial robots that perform only given
tasks to autonomous mobile robots that determine how to
travel to a target. For robots to operate more intelligently,
research on mobile robots needs to consider the following:
(1) how to recognize certain objects, humans, or specific patterns; (2) simultaneous localization and mapping (SLAM)
[1]; and (3) navigation to a specific destination. When an
autonomous robot needs to reach a destination, the most
important basic process before moving is for it to assess the
safety of the surrounding environment and find a safe path
for movement [2]. This process could incorporate global
positioning system (GPS) and mapping technology, but a
GPS system is not accurate enough to identify an exact position, and maps do not include all objects, especially moving
objects. Therefore, a moving robot must be able to recognize
the surrounding environment. One recognition method is
Figure 1: (a) Example of an urban environment with a road, sidewalk, and buildings. (b) Example of a 3D semantic map (blue region: road; green region: sidewalk; red region: obstacles).
2. Related Works
2.1. Terrain Classification. In recent years, several studies have proposed methods for terrain classification. One study developed a method that involved searching for possible obstacles using a stereo camera, eliminating candidates based on texture and color cues, and then modeling terrain after obstacles had been defined [3]. Another study focused on avoiding trees in a forest; it used a stereo camera to recognize trees and classify terrain to find a safe pathway [4]. Other studies used a vibration sensor to classify terrain that had already been traveled, based on various vibration frequencies [5].
However, these techniques only work in specific environments. Robots need to be able to learn about unknown
terrain. Some studies have focused on supervised learning
that requires human intervention when a robot reaches an
unknown area. Due to the limitations of supervised learning,
many researchers are now working on self-supervised or
unsupervised techniques, in which a robot can learn about
an environment on its own, without any human supervision.
One recent study developed a technique in which a robot can calculate the depth of a ground plane using a depth map generated by a stereo camera and can classify and learn about the ground and obstacles within 12 m. Based on these data, it can recognize very distant regions, as far as 30–40 m [6]. Another study developed an unsupervised learning method that deletes incorrect detections across a wide variety of terrain types (e.g., trees, rocks, tall grass, bushes, and logs) while the robot navigates and collects data [7]. Yet another study involved self-supervised classification using two classifiers: an offline classifier that used vibration frequencies to provide the other classifier, an online visual classifier, with labels for
Figure 2: Detection, classification, and tracking of moving objects. (a) The result of background subtraction. (b) The result of object classification (blue square: human; green: vehicle) and tracking based on size and location.
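The detection stage in Figure 2(a) relies on background subtraction [14]. As a rough illustration only, a running-average background model can be sketched in a few lines of numpy; the paper does not specify which of the techniques surveyed in [14] it uses, so the model, threshold, and learning rate below are assumptions:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model: bg <- (1 - alpha) * bg + alpha * frame."""
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25.0):
    """Mark pixels that differ from the background by more than `thresh`."""
    return np.abs(frame.astype(float) - bg) > thresh

# Toy 8x8 grayscale scene: a static background of value 100 with a moving "object".
bg = np.full((8, 8), 100.0)
frame = bg.copy()
frame[2:4, 2:4] = 200.0          # a bright moving object enters the scene
mask = foreground_mask(bg, frame)
print(mask.sum())                # 4 foreground pixels
bg = update_background(bg, frame)
```

In practice, connected foreground pixels would then be grouped into blobs and passed to the SVM-based human/vehicle classifier shown in Figure 2(b).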
f(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b.    (1)
Figure 3: Path extraction for moving objects; lines show their paths (blue: humans, green: vehicles).
\max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)    (2)

subject to \sum_{i=1}^{n} \alpha_i y_i = 0,

0 \le \alpha_i \le C, \quad i = 1, \ldots, n.    (3)
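Equations (1)–(3) are the standard SVM decision function and its dual optimization problem [13, 22]. A toy numpy sketch of the decision function in (1), with a hand-set RBF kernel and made-up support vectors (the kernel choice and all coefficients below are illustrative assumptions, not the paper's trained model):

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    """SVM decision function: f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b

# Tiny hand-set model: one support vector per class.
svs = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
alphas = [1.0, 1.0]
labels = [+1, -1]
b = 0.0
print(decision(np.array([0.1, 0.1]), svs, alphas, labels, b) > 0)   # near class +1
print(decision(np.array([1.9, 1.9]), svs, alphas, labels, b) > 0)   # near class -1
```

The sign of f(x) gives the predicted class; in training, the dual coefficients would come from solving (2)–(3), e.g., with LIBSVM [17].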
(1) Observe the moving object o (o is the object detected at this frame).
(2) Confirm whether the presently detected object was seen before or not by comparing it with every object O_i that was already detected, based on the size s_i and position (x, y)_{i,k} data of the objects.
(2a) If the object was seen before (e.g., its size and position are similar to s_i and (x, y)_{i,n_i}), update the data of O_i: append its position as (x, y)_{i,n_i+1}, update s_i, and set n_i = n_i + 1 and c_i = c_i + 1.
(2b) If the object is seen for the first time, create new object data O_{N+1} from o. Thus, the total number of detected objects increases to N + 1.
(3) Check whether the frame index t is a multiple of T.
(3a) If it is, do path extraction.
(3b) If it is not, keep observing until t becomes the next multiple of T.
Algorithm 1: Data collection and association.
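Algorithm 1's size-and-position data association might be sketched as follows. The Track class, the distance and size-ratio thresholds, and the helper names are hypothetical illustrations, not taken from the paper:

```python
import math

class Track:
    """One previously detected object: its size, path of positions, and hit count."""
    def __init__(self, size, pos):
        self.size = size
        self.path = [pos]      # positions (x, y) over time
        self.count = 1         # number of frames in which the object was seen

def associate(tracks, size, pos, max_dist=30.0, max_size_ratio=1.5):
    """Match a new detection to an existing track by size and position (step 2)."""
    for t in tracks:
        px, py = t.path[-1]
        close = math.hypot(pos[0] - px, pos[1] - py) < max_dist
        similar = max(size, t.size) / min(size, t.size) < max_size_ratio
        if close and similar:                 # step (2a): update the old track
            t.path.append(pos)
            t.count += 1
            return t
    new = Track(size, pos)                    # step (2b): first sighting, new track
    tracks.append(new)
    return new

tracks = []
associate(tracks, size=40.0, pos=(100, 100))   # new object
associate(tracks, size=42.0, pos=(105, 102))   # same object, moved slightly
associate(tracks, size=40.0, pos=(400, 50))    # far away: a second object
print(len(tracks), tracks[0].count)            # 2 tracks; the first seen twice
```

Every T frames the accumulated `path` lists would then feed the path extraction of Algorithm 2.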
N = the total number of detected objects
θ = the threshold for defining a moving object
Sidewalk map: a map that includes the total paths of human movement
Road map: a map that includes the total paths of vehicle movement
Input: all detected objects O_i (i = 1, 2, ..., N)
Goal: find all paths on which humans and vehicles move (Sidewalk: P_S, Road: P_R)
(0) Initialize P_S, P_R, the Sidewalk map, and the Road map
(1) for i = 1 to N do
(2)   if c_i ≥ θ then
(3)     O_i is a moving object
(4)     for k = 1 to n_i do
(5)       compute the path point (x, y)_{i,k} of O_i
(6)     end for
(7)     if O_i is classified as a human then
(8)       for k = 1 to (n_i − 1) do
(9)         draw a line from (x, y)_{i,k} to (x, y)_{i,k+1} into the Sidewalk map
(10)      end for
(11)    else
(12)      for k = 1 to (n_i − 1) do
(13)        draw a line from (x, y)_{i,k} to (x, y)_{i,k+1} into the Road map
(14)      end for
(15)  else
(16)    O_i is noise
(17) end for
(18) for all pixels (x, y) of the Sidewalk and Road maps do
(19)   if Sidewalk(x, y) != 0 then P_S ← P_S ∪ {(x, y)}
(20)   if Road(x, y) != 0 then P_R ← P_R ∪ {(x, y)}
(21) end for
Algorithm 2: Path extraction.
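The line-drawing steps (8)–(14) and the pixel-collection steps (18)–(21) of Algorithm 2 can be illustrated with a small occupancy grid. The grid size, the example paths, and the rasterization routine below are illustrative assumptions:

```python
import numpy as np

def draw_line(grid, p0, p1):
    """Rasterize a segment between consecutive path points into an occupancy grid."""
    n = max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1])) + 1
    xs = np.linspace(p0[0], p1[0], n).round().astype(int)
    ys = np.linspace(p0[1], p1[1], n).round().astype(int)
    grid[ys, xs] = 1

H, W = 20, 20
sidewalk, road = np.zeros((H, W)), np.zeros((H, W))

human_path = [(2, 2), (6, 4), (10, 5)]           # classified as human -> Sidewalk map
vehicle_path = [(0, 15), (19, 15)]               # classified as vehicle -> Road map
for a, b in zip(human_path, human_path[1:]):
    draw_line(sidewalk, a, b)
for a, b in zip(vehicle_path, vehicle_path[1:]):
    draw_line(road, a, b)

# Steps (18)-(21): collect every marked pixel into the path sets.
P_S = {(x, y) for y, x in zip(*np.nonzero(sidewalk))}
P_R = {(x, y) for y, x in zip(*np.nonzero(road))}
print(len(P_S) > 0, (0, 15) in P_R)
```

Drawing lines between consecutive observations fills in the gaps between the sampled positions, so the resulting maps cover the whole traversed corridor rather than isolated points.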
p_i = (x, y)_i, \quad i = 1, 2, \ldots, M.    (4)

Here, M is the number of total objects that are detected as moving objects; our system defines the bottom-left of the detected region as its position (x, y)_i.

\mathrm{Gap}_{S,c} = \min_{p \in P_S} d(\mathrm{mode}_c, p),    (5)

\mathrm{Dist}_S = \sum_{i=1}^{|P_S|-1} d(p_i, p_{i+1}),    (6)

\mathrm{Dist}_R = \sum_{i=1}^{|P_R|-1} d(p_i, p_{i+1}).    (7)
Figure 5: (a) The result of clustering (small dots: candidates for the background; the same color represents the same cluster; the labels Gap_{S,1}, Gap_{R,1}, Dist_S, and Dist_R mark the gap and distance measures for cluster number 1). (b) The result of terrain data extraction (red dots: background; blue dots: sidewalk; green dots: road; the others: clusters of candidates excluded from the background class).
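The clustering of background candidates in Figure 5(a) uses mean shift [21]. A minimal flat-kernel mean-shift sketch, where the bandwidth and the toy 2D data are assumptions made for illustration:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=50):
    """Shift every point to the mean of its neighbours until it settles on a mode."""
    modes = points.astype(float).copy()
    for _ in range(iters):
        for i, m in enumerate(modes):
            near = points[np.linalg.norm(points - m, axis=1) < bandwidth]
            modes[i] = near.mean(axis=0)
    return modes

# Two well-separated 2D blobs of background candidates.
pts = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])
modes = mean_shift(pts, bandwidth=1.0)
print(len(np.unique(np.round(modes, 1), axis=0)))   # 2 distinct modes found
```

Points that converge to the same mode form one cluster; each cluster's mode can then be compared against the extracted sidewalk and road paths via the Gap and Dist measures of Figure 5.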
Figure 6: (a) A sidewalk with complex texture, such as fallen leaves. (b) The road plane estimated using a V-disparity map.
for scene classification [24], and we used a Global Edge Histogram for texture features. Equation (8) presents the visual feature vector used for the terrain classifier:

f = [\mu_R, \sigma_R, \mu_G, \sigma_G, \mu_B, \sigma_B, e_v, e_h, e_{45}, e_{135}, e_{\mathrm{non}}],    (8)

where \mu and \sigma are the per-channel color means and standard deviations and e_v, e_h, e_{45}, e_{135}, and e_{\mathrm{non}} are the vertical, horizontal, 45°, 135°, and nondirectional edge-histogram bins.
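A rough sketch of building such an 11-dimensional feature vector for a small RGB patch, combining per-channel color moments with a 5-bin gradient-orientation histogram. The exact color features and the binning boundaries here are assumptions; the paper's descriptor follows the edge histogram of [19] and the color moments of [20]:

```python
import numpy as np

def terrain_features(patch):
    """11-D feature for an RGB patch: per-channel mean and std (6 values)
    plus a normalized 5-bin gradient-orientation histogram (4 orientation
    bins and one bin for flat, non-directional pixels)."""
    color = []
    for c in range(3):
        color += [patch[:, :, c].mean(), patch[:, :, c].std()]
    gray = patch.mean(axis=2)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180
    edges = np.zeros(5)
    strong = mag > 1e-6
    for a in ang[strong]:
        if a < 22.5 or a >= 157.5:
            edges[0] += 1          # gradient near 0 degrees
        elif a < 67.5:
            edges[1] += 1          # near 45 degrees
        elif a < 112.5:
            edges[2] += 1          # near 90 degrees
        else:
            edges[3] += 1          # near 135 degrees
    edges[4] = strong.size - strong.sum()   # flat / non-directional pixels
    return np.array(color + list(edges / max(edges.sum(), 1)))

patch = np.zeros((8, 8, 3))
patch[:, 4:, :] = 255.0            # a hard vertical edge down the middle
f = terrain_features(patch)
print(f.shape)                     # (11,)
```

Each training patch sampled along the extracted paths would be converted into such a vector before being fed to the SVM terrain classifier.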
(5) \theta = \operatorname{argmin}_{\theta} \; \lambda \, ( y - f_{\theta}(x) )^2 + (1 - \lambda) \, ( \hat{y} - f_{\theta}(x) )^2
(6) until (ErrorChange(\theta) < \epsilon) or (maxIter reached)
(7) return \theta
Figure 8: Classification accuracy (78–94%) versus the number of training patches (pairs from 10, 20 up to 1000, 1000).
5. Experiments
5.1. Experimental Environment. We conducted experiments at four locations in Seongdong-gu, Seoul, Republic of Korea, and captured test datasets using a camera set at 640 × 360. The experiments involved a robot observing moving objects and learning about the surrounding terrain based on the paths of the moving objects; the robot was motionless, under the assumption that it was in an unknown location.
To extract a reasonable number of patches, a test was conducted to classify terrain with various amounts of training data. Figure 8 presents the results.
Figure 9: Experimental results. Columns present the results and processes of experiments over the whole dataset. (a) Experimental environments, (b) extraction of training patches based on paths after observing moving objects, (c) ground-truth images for terrain classification (red: background, green: road, blue: sidewalk), (d) supervised terrain classifications based on the ground-truth images, (e) classification results of the proposed method.
Figure 10: (a) 3D map of the university. (b) RGB images of the same scene.
Figure 11: (a) The 3D semantic map. (b) Magnification of part of the 3D semantic map (red: obstacles; blue: road; green: sidewalk).
6. Conclusions
We proposed a self-supervised terrain classification framework in which a robot observes moving objects and learns about its environment based on the paths of the moving objects captured using a monocular camera. The results were similar to those of supervised terrain classification methods (ca. 3–9% worse), which is sufficient for an autonomous robot to recognize its environment. In addition, we built a 3D dense map with an RGB-D sensor using the ICP algorithm. Finally, the 3D semantic map was created by adding the results of the terrain classification to the 3D map.
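Each ICP iteration aligns the current point cloud to the map by estimating a least-squares rigid transform between matched points. A minimal SVD-based sketch of that inner step, with correspondences assumed known (real ICP must re-estimate them every iteration, typically by nearest-neighbor search):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t aligning src to dst
    (the closed-form SVD step inside each ICP iteration)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance of centered clouds
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

# A small cloud and a rotated + shifted copy of it.
rng = np.random.default_rng(0)
src = rng.random((30, 3))
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
dst = src @ Rz.T + np.array([0.5, -0.2, 1.0])

R, t = best_rigid_transform(src, dst)
err = np.abs(src @ R.T + t - dst).max()
print(err < 1e-9)                          # exact recovery on noise-free data
```

With noisy RGB-D frames, this step is repeated after each correspondence update until the residual error converges, which is how consecutive depth scans are fused into the dense 3D map.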
In future research, we will focus on better ways to extract background data, using a vanishing point or depth data, and on a complete framework for navigation. We will also conduct tests in various environments, such as indoor and unstructured outdoor environments.
Conflict of Interests
The authors declare that there is no conflict of interests
regarding the publication of this paper.
Acknowledgment
This research was supported by the 2014 Scientific Promotion
Program funded by Jeju National University.
References
[1] C. Yi, S. Jeong, and J. Cho, "Map representation for robots," Smart Computing Review, vol. 2, no. 1, pp. 18–27, 2012.
[2] X. Li and B. J. Choi, "Design of obstacle avoidance system for mobile robot using fuzzy logic systems," International Journal of Smart Home, vol. 7, no. 3, pp. 321–328, 2013.
[3] A. Talukder, R. Manduchi, R. Castano et al., "Autonomous terrain characterisation and modelling for dynamic control of unmanned vehicles," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, pp. 708–713, October 2002.
[4] A. Huertas, L. Matthies, and A. Rankin, "Stereo-based tree traversability analysis for autonomous off-road navigation," in Proceedings of the 7th IEEE Workshop on Applications of Computer Vision (WACV '05), pp. 210–217, January 2005.
[5] C. A. Brooks and K. Iagnemma, "Vibration-based terrain classification for planetary exploration rovers," IEEE Transactions on Robotics, vol. 21, no. 6, pp. 1185–1191, 2005.
[6] M. J. Procopio, J. Mulligan, and G. Grudic, "Learning terrain segmentation with classifier ensembles for autonomous robot navigation in unstructured environments," Journal of Field Robotics, vol. 26, no. 2, pp. 145–175, 2009.
[7] D. Kim, J. Sun, S. M. Oh, J. M. Rehg, and A. F. Bobick, "Traversability classification using unsupervised on-line visual learning for outdoor robot navigation," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '06), pp. 518–525, May 2006.
[8] A. B. Christopher and K. Iagnemma, "Self-supervised terrain classification for planetary rovers," in Proceedings of the NASA Science Technology Conference, 2007.
[9] S. May, D. Droeschel, D. Holz et al., "Three-dimensional mapping with time-of-flight cameras," Journal of Field Robotics, vol. 26, no. 11-12, pp. 934–965, 2009.
[10] K. Konolige and M. Agrawal, "FrameSLAM: from bundle adjustment to real-time visual mapping," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 1066–1077, 2008.
[11] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, "RGB-D mapping: using depth cameras for dense 3D modeling of indoor environments," in Proceedings of the International Symposium on Experimental Robotics (ISER '10), 2010.
[12] Y. M. Kim, C. Theobalt, J. Diebel, J. Kosecka, B. Miscusik, and S. Thrun, "Multi-view image and ToF sensor fusion for dense 3D reconstruction," in Proceedings of the IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops '09), pp. 1542–1546, October 2009.
[13] V. Vapnik, Statistical Learning Theory, Wiley, New York, NY, USA, 1998.
[14] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3099–3104, October 2004.
[15] H. Sidenbladh, "Detecting human motion with support vector machines," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 188–191, August 2004.
[16] D. Mezghani, S. Boujelbene, and N. Ellouze, "Evaluation of SVM kernels and conventional machine learning algorithms for speaker identification," International Journal of Hybrid Information Technology, vol. 3, no. 3, pp. 23–34, 2010.
[17] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," 2013, http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[18] J.-C. Liu, C.-H. Lin, J.-L. Yu, W.-S. Lai, and C.-H. Ho, "Anomaly detection using LibSVM training tools," International Journal of Security and Its Applications, vol. 2, no. 4, pp. 89–98, 2008.
[19] C. S. Won, D. K. Park, and S.-J. Park, "Efficient use of MPEG-7 edge histogram descriptor," ETRI Journal, vol. 24, no. 1, pp. 23–30, 2002.
[20] I. Sarker and S. Iqbal, "Content-based image retrieval using haar wavelet transform and color moment," Smart Computing Review, vol. 3, no. 3, pp. 155–165, 2013.
[21] D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, 2002.
[22] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.