Keywords: self-organizing maps, particle swarm optimization, 4W1H, activity recognition
1. Introduction
Human Activity Recognition (HAR) systems are present in cities, buildings, and rooms, where they adapt to continuously changing environments. To do this, these systems process the data gathered from sensors in the environment and then modify the environment in consonance with the activities of the people present. Human activity recognition is a broad field that includes many challenges (e.g., HAR can focus on one person or several people, a single room, or even a whole city). Although the proposed approach could potentially be applied to any setting, this paper focuses on intelligent rooms, where the users are few and variables, such as objects and places, are known. In particular, we use the iSpace [1], which has an adequate set of sensors for recognition tasks (Fig. 1).
Intelligent room settings usually have three components: sensing, classification, and action. The sensing and classification problems are closely related, since the traits of the sensed data (images, video, sound, etc.) dictate which classification tools should be used. Nevertheless, most conventional HAR techniques have flaws. For example, camera-based systems recognize activities using only the human pose [2], often overlooking the multiple characteristics of each scene (e.g., time, place, and environment).
Vol.15 No.7, 2011
To address this issue, some groups focus on extracting activities using context detection. Works like [3] and [4]
showed that sensing extra variables can increase the accuracy of a recognition system.
In this work, we propose describing the actions in an environment using 4W1H. The 4W1H paradigm defines activities as a set of five variables (Who, When, What, Where, and How) deemed sufficient to describe any action. Furthermore, by defining each activity as a set of these variables, we can mix sensing techniques. For example, we can detect What and Who using object and subject identification algorithms; an RFID-tagged environment can sense the What variable. Then, we can use clustering and classification techniques to process the different activities given multiple sets of 4W1H.
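As an illustration (the class and field names below are our own sketch, not an interface defined in the paper), an activity under 4W1H can be stored as a simple record in which four variables come from discrete sensors and How is a feature vector:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity4W1H:
    """One observed action described by the 4W1H variables.

    who/what/where/when can come from discrete sensors (e.g., an
    RFID-tagged environment for What); how is a feature vector
    describing the manner in which the action was performed.
    """
    who: str      # subject identification
    what: str     # object identification
    where: str    # location in the room
    when: str     # coarse time slot
    how: tuple    # motion features (kept hashable as a tuple)

# Two readings of the same nominal activity can differ only in How,
# which is why How needs its own clustering stage.
a1 = Activity4W1H("user1", "pencil", "desk", "morning", (0.2, 0.9))
a2 = Activity4W1H("user1", "pencil", "desk", "morning", (0.8, 0.1))
same_context = (a1.who, a1.what, a1.where, a1.when) == \
               (a2.who, a2.what, a2.where, a2.when)
```

The record form makes it easy to feed the first four variables to conventional classifiers while routing How to a separate scheme.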
There are, however, also problems to solve in the 4W1H method. For example, the way a left-handed person uses a pencil differs from the way a right-handed person does, and different people have different ways of doing things. When we use 4W1H, therefore, the How variable has an intrinsic complexity and needs a special classification of its own. We need a scheme capable of performing on-line recognition of a growing number of possible Hows. To do this, we used a mix of wavelets and self-organizing maps, which showed good results [5] when performing a rough classification of unknown, high-variance inputs.
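As a minimal sketch of the wavelet side of this scheme (the Haar filter pair is chosen here for brevity; this excerpt does not specify which wavelet the system uses), one analysis level splits a sensor frame into a compressed low-pass part and a high-frequency residue:

```python
import numpy as np

def haar_step(x):
    """One level of the Haar filter bank: split a signal frame into
    approximation (low-pass) and detail (high-pass) coefficients."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # smoothed, half-length signal
    detail = (even - odd) / np.sqrt(2.0)   # high-frequency residue
    return approx, detail

frame = np.array([4.0, 4.0, 8.0, 8.0])    # toy sensor frame
a, d = haar_step(frame)
# Constant pairs put all the energy in the approximation,
# so the detail coefficients are zero for this frame.
```

Repeating the step on the approximation yields the tree-structured decomposition described in Section 2, and discarding small detail coefficients gives the compression and filtering used on the motion signal.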
2. Preliminaries
2.1. Related Work
Schilit and Theimer [6] first defined context sensing as the ability of a system to discover and react to changes in the environment it is located in. Using this definition, we may also say that a context sensing system is capable of sensing the variables that generate the changes in the environment. Work by Schmidt [7] described how context may help to infer human activities. Schmidt's work did not set up a set of activities or a pattern, but described context-aware applications. Robertson
and Reid [2] used position and velocity, in addition to local motion, to describe an activity.
This enhanced the recognition rate of their system. Li and
Fei-Fei [3] also used context and defined three variables
for static images (what, where, and who). Their approach
used a Dirichlet mixture model to define activities as a
mixture of variables found in the scene. Their work overall, however, did not investigate the implications of relying on one sensor to get all variables. Their results, while
compelling, were limited to static images. Huang et al. [4]
used a similar approach to context. Their work focused on the when, what, and where variables using an
arrangement of sensors. They also used a pattern matching algorithm to match sensed data to activities. This pattern matching approach, however, may suffer from a lack of
flexibility in situations with new objects. The presented
work differs from these contributions by increasing the
number of sensing variables. The presented approach also
uses a clustering system, which allows the system to recognize a wide range of unseen
activities.
2.2. Self-Organizing Maps
Kohonen [8] describes Self-Organizing Maps (SOM) as an algorithm that projects a Z-dimensional feature space onto a two-dimensional map. It places similar elements close to each other, thus preserving the topology of the space. Typically, a SOM is represented by an N × N matrix, where each element is a neuron that has Z weights.
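A minimal numpy sketch of this idea follows (the grid size, learning rate, and neighborhood radius are illustrative, not the paper's configuration): each training step finds the Best Matching Unit for an input and pulls it, together with its grid neighbors, toward that input.

```python
import numpy as np

rng = np.random.default_rng(0)

N, Z = 4, 3                          # N x N map, Z-dimensional inputs
weights = rng.random((N, N, Z))      # one Z-weight neuron per map cell

def bmu(x):
    """Best Matching Unit: grid coordinates of the neuron whose
    weight vector is closest (Euclidean) to the input x."""
    dist = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_step(x, lr=0.5, radius=1.0):
    """Pull the BMU and its neighbors toward x, weighted by a
    Gaussian of grid distance, so similar inputs end up on
    nearby cells (topology preservation)."""
    bi, bj = bmu(x)
    for i in range(N):
        for j in range(N):
            g = np.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * radius ** 2))
            weights[i, j] += lr * g * (x - weights[i, j])

x = np.array([1.0, 0.0, 0.0])
for _ in range(20):
    train_step(x)
# After repeated presentations, the BMU's weights approach x.
```

Because the neighborhood update never overshoots (lr ≤ 1), the BMU's distance to a repeated input at least halves per step, which is what lets the map settle into a stable topographic layout.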
In this work, we present K inputs from a Z-dimensional feature space to the SOM.
2.3. Wavelets

The approximation and detail coefficients, obtained recursively through a low-pass filter h and a high-pass filter g as

a_j[k] = \sum_n h[n - 2k] a_{j-1}[n],   d_j[k] = \sum_n g[n - 2k] a_{j-1}[n], . . . . . . . . (1)
comprise the time-frequency representation of the original signal. In digital signal processing (as in this work), the forward wavelet transform is typically implemented as a set of tree-structured filter banks. The input signal is divided into contiguous, non-overlapping blocks of samples called frames, and the forward transform is applied frame by frame. This work uses wavelets as a fast way to compress and filter the signal from the MTx sensor. It is expected that this