I. INTRODUCTION
provide a foundation for object detection and definition for
autonomous interaction with untrained objects.
III. SPATIO-TEMPORAL
Spatio-temporal refers to something belonging to both space
and time. This paper uses spatio-temporal features within
video sequences to define objects.
First, this paper outlines a method for obtaining these
spatio-temporal features, which describe how features within
an image change between frames over time. Next, we present a
candidate clustering algorithm for associating these features
into distinct objects.
There has been a substantial amount of research in the area
of spatio-temporal clustering. Rather than clustering on space
or time individually, this type of clustering shows how data
correlate across both time and space. Many papers focus on
the fact that events happen in certain areas at certain times
and draw conclusions about distinct events. Another analysis
of spatio-temporal features looks at how data points change
over time.
The latter type, which this paper focuses on, shows promising
results for unsupervised learning. We can draw conclusions
about data by observing how they change over time. Clusters of
similar behavior can be observed, and decisions can be made
about that behavior without strict prior training.
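As an illustration of this idea (a simplified sketch, not the paper's candidate algorithm), feature trajectories can be grouped greedily by how similar their motion histories are over time; the threshold `tol` and the greedy merging scheme are assumptions made for illustration:

```python
import numpy as np

def cluster_trajectories(trajectories, tol=1.0):
    """Greedy clustering of feature trajectories.

    `trajectories` is an (n_features, n_frames, 2) array of per-frame
    displacement vectors. A feature joins the first cluster whose mean
    trajectory stays within `tol` (mean Euclidean distance per frame);
    otherwise it starts a new cluster. Illustrative sketch only.
    """
    labels = -np.ones(len(trajectories), dtype=int)
    means = []  # one representative trajectory per cluster
    for i, traj in enumerate(trajectories):
        for k, mean in enumerate(means):
            # mean per-frame distance between this trajectory and the cluster
            if np.mean(np.linalg.norm(traj - mean, axis=1)) < tol:
                labels[i] = k
                break
        if labels[i] == -1:  # no close cluster: start a new one
            labels[i] = len(means)
            means.append(traj.astype(float))
    return labels
```

Features that move together across the sequence end up with the same label, which is the behavior-based grouping described above.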
IV. RELATED WORK
Much research in this area observes how image features change
within a video sequence, with the goal of improving image
segmentation. Properly segmenting an image provides the
boundaries of objects, a task vital to several applications
shown in Section IX.
A. Point Trajectories
A point trajectory describes how a point in an image has
changed from the previous image. To calculate a point
trajectory, one must correlate the desired pixel value to the
previous frame. Once the correlation is made between this
frame and the previous frame, the point trajectory is the vector
from the previous pixel location to the current pixel location.
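As a minimal sketch of this correlation step (a hypothetical helper, not the paper's implementation), assuming grayscale frames stored as NumPy arrays, a patch around the point can be matched against a search window in the next frame using sum of squared differences, and the trajectory is the offset of the best match:

```python
import numpy as np

def point_trajectory(prev_frame, curr_frame, point, patch=5, search=7):
    """Estimate the trajectory vector of `point` (row, col) between frames.

    Correlates a small patch from `prev_frame` against a search window in
    `curr_frame` using negative SSD as the correlation score. Returns the
    (d_row, d_col) displacement of the best match. Illustrative sketch.
    """
    r, c = point
    p = patch // 2
    template = prev_frame[r - p:r + p + 1, c - p:c + p + 1].astype(float)
    best, best_off = -np.inf, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            cand = curr_frame[rr - p:rr + p + 1, cc - p:cc + p + 1].astype(float)
            if cand.shape != template.shape:
                continue  # search window fell off the image edge
            score = -np.sum((cand - template) ** 2)  # negative SSD
            if score > best:
                best, best_off = score, (dr, dc)
    return best_off  # the point trajectory vector
```

The returned offset is exactly the vector from the previous pixel location to the current one, as described above.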
As addressed in [3], point trajectories inherently capture
translational change but lack higher-order dynamics models
(i.e., rotation and scale). [3] attempts to address this
issue by using hypergraphs. By executing a projection, the
hypergraph is transferred to an ordinary graph for use in
spectral clustering. Unfortunately, the computational
complexity is O(n^k) for k-point affinities. [3] mitigates
this problem by sampling hyperedges, reducing the number of
edges considered for clustering.
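To see why sampling hyperedges matters, note that a k-point affinity must in principle be evaluated for every k-tuple of trajectories, and the number of such tuples grows on the order of n^k for fixed k. A quick count (an illustration, not a figure from [3]):

```python
from math import comb

# Number of hyperedges (k-tuples) over n trajectories is C(n, k),
# which grows on the order of n^k for fixed k.
for n in (100, 1000):
    for k in (2, 3):
        print(f"n={n}, k={k}: {comb(n, k):,} affinities")
```

Even at n = 1000 trajectories, moving from pairwise (k = 2) to triple-wise (k = 3) affinities multiplies the work by hundreds, which is why exhaustive evaluation quickly becomes impractical.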
B. Region Trajectories
A region trajectory differs from a point trajectory by
tracking how regions, or segments, change from frame to frame,
and its calculation differs accordingly. [2] shows how to
generate region trajectories in Section 2.2 by using an
acyclic graph.
A Graphical User Interface (GUI) was created in MATLAB. This GUI allows
for users to load in a picture or video. Once loaded in, the GUI
will process the file with candidate algorithms. A simple video
player will allow the user to navigate between frames. The
user has the option to overlay data onto the image displayed in
the video player. Figure 3 shows the GUI in use.
When the user opens the GUI, the user selects a folder
using the Browse functionality. Once a folder is selected, a
navigation list is populated with all the videos and pictures
within the folder; this makes it quick and easy to change
between files. Figure 1 shows the navigation list.
Once the user selects a file and presses Load, the program
reads in the file and performs all possible processing. This
preprocessing aids analysis of the algorithms, as some of the
feature detectors run slower than real time. The user is
alerted to the progress of loading and processing the file.
Once the file is loaded and image features are detected, the
user may navigate between frames with a video player
interface. Figure 2 shows this video player. Users can skip to a
frame, go next or previous, and play forward or backward.
The GUI allows the user to select what type of data is
overlaid on the displayed image. This includes the image features
detected using different algorithms (i.e., SURF, FAST, color-based
blobs, etc.) as well as clustering output. Figure 3 shows
an image with overlaid features.
Figure 6. Test GUI in use. Features are overlaid onto an image within a video
sequence. Sobel edges can be seen in magenta, FAST corners in
green/red, and SURF features in blue.
Figure 7. marple2 frame 24. Spatio-temporal clusters are shown as colored
dots.
VIII. EVALUATION
A. Berkeley Motion Segmentation Benchmark
A benchmark suite [5] for training and testing video
segmentation is introduced in [4]. This benchmark suite
contains 100 videos with ground-truth labeling and error
metrics. The ground-truth videos were annotated by four
separate individuals, indicating how the video sequences
should be segmented. This provides a reliable
benchmark suite to compare algorithms ([1] - [4] all use this
dataset for performance evaluation). The benchmark even
provides software for evaluating algorithms and generating
error metrics.
Unfortunately, this benchmark suite was not used for
evaluating the presented algorithm: it appears to annotate
only a few frames of each video sequence, while the algorithm
presented here focuses on continuous video data that does not
skip several frames between samples. Testing against the
benchmark suite [5] was therefore not very applicable at the
time of writing. Further research will make the algorithm
robust enough to compete on the benchmark tests.
B. Results
Although a rigorous ground-truth analysis was not performed
to determine error metrics, results could still be observed
using the GUI. For certain translational scenes, such as that
of Figure 7, semi-accurate clustering was observed: the sleeve
of the arm grasping for the phone was clustered together,
separately from the clusters of the chair and the picture on
the wall.
IX. APPLICATIONS
Video segmentation has many applications. As mentioned
in Section II, machine learning algorithms can improve given
accurate segmentation. If range estimates of objects within
video were known, which could be achieved with binocular
sensors, machine learning algorithms could make better
predictions about the size and shape of unknown objects. This
could open new doors for object interaction (i.e., pushing,
grabbing, accurate avoidance, etc.) by robots using
unsupervised learning.
Anomaly detection can also be realized by looking for
behavior that is uncommon relative to the surrounding
behavior, such as large objects passing through, or a change
in or lack of flow, whether on a manufacturing line or in
street traffic. Anomaly detection can help businesses, first
responders, and other entities detect issues early and react
quickly.
X. LIMITATIONS
Due to time constraints, only limited research could be
conducted. The current algorithm focuses on a simple 2D
dynamics model, so performance suffers for rotating and
scaling objects. Since nearly all video exhibits these higher-
order motion dynamics and is almost never purely translational,
the clustering algorithm did not accurately portray objects.
Further research is necessary to incorporate higher order
motion dynamics as well as more advanced clustering
techniques such as spectral clustering. The framework
presented in this paper is a good foundation for further
development and will ease future research.
XI. CONCLUSIONS AND FUTURE WORK
There is plenty of research in the area of video
segmentation. My initial research when the proposal was
written turned up minimal prior work; it wasn't until a
rigorous literature search was conducted, once I found the
specific keywords for the field, that much relevant research
appeared. I was able to conduct sufficient research to gather
theories about a path forward for future investigation;
however, there was limited time to develop these theories.
[3] presents a strong algorithm for higher-order motion
dynamics. As a base, this algorithm shows promise, but its
computational complexity of O(n^k) is too large. Further
research will be conducted to reduce this computational
complexity and to see how higher-order motion dynamics can be
used efficiently.
The framework and GUI presented in this research paper
provide a core development environment for further
investigation in video segmentation. Further development will
add capability for quick benchmark testing using the Berkeley
Motion Segmentation Benchmark presented in [4].
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]