
CHAPTER 1

INTRODUCTION
The Xbox 360 is the second video game console developed and produced by
Microsoft, and the successor to the Xbox. Kinect is a "controller-free gaming and
entertainment experience" for the Xbox 360. It was first announced on June 1, 2009 at the
Electronic Entertainment Expo under the codename Project Natal. Built around a
webcam-style add-on peripheral, Kinect enables users to control and interact with the
Xbox 360 without touching a game controller, through a natural user interface using
gestures, spoken commands, and presented objects and images. The accessory is
compatible with all Xbox 360 models, connecting to newer models via a custom connector
and to older ones via a USB and mains power adapter. Kinect aims at broadening the
Xbox 360's audience beyond its typical gamer base. It holds the Guinness World Record
for the "fastest selling consumer electronics device" and is also regarded as an advanced
virtual reality controller.
1.1 MOTIVATION
Although initially developed by Microsoft for gaming, Kinect has scope for
application in a vast array of other areas. The purpose of this seminar is to identify such
areas, both those already recognized and those yet to be explored, and to become familiar
with these fields. The main motivation for this seminar is to provide an overview of the
techniques and applications of Kinect, along with sharing knowledge about its future
scope. The possible scenarios for virtual reality, and its wider scope, also remain open
to exploration.
1.2 LITERATURE REVIEW
Kinect is a relatively recent technology, no more than one and a half years old. But
being such an advanced and novel piece of hardware, a lot of research has already been
performed on it. That research includes studies in fields ranging from physical
rehabilitation [23] to robotics. A good example of its use in robotics is NASA's integration
of Kinect with their prototype robots [24]. The LIREC project (Living with Robots and
Interactive Characters) is another good example of Kinect's integration in robotics.
This project is a collaboration between several entities (universities, research institutes and
companies) from several different countries, Heriot-Watt being one of the partner
universities. Heriot-Watt researchers have been integrating Kinect with their prototype
robot and studying how it can be used to facilitate human-robot interaction.
Efficiently tracking and recognizing hand gestures with Kinect is one of the fields
receiving the most attention from researchers [25][26][27][28]. This is a complex problem,
but it has a number of diverse applications, gestures being one of the most natural and
intuitive ways for people and machines to communicate [28]. Regarding full-body pose
recognition, E. Suma et al. [29] developed a toolkit which allows customizable full-body
movements and actions to control, among others, games, PC applications, virtual avatars
and the onscreen mouse cursor. This is achieved by binding specific movements to virtual
keyboard and mouse commands that are sent to the currently active window.
M. Raptis et al. [30] proposed a system capable of recognizing, in real time
and with high accuracy, a dance gesture from a set of pre-defined dance gestures. This
system differs from games like Harmonix's Dance Central in allowing the user to perform
random sequences of dance gestures that are not imposed by the animated character. In
this paper, the authors also identified noise, originating from the player's physique and
clothing, from the sensor and from kinetic IQ, as one of the main disadvantages
of depth sensors like Kinect when compared to other motion capture systems.
Following the same idea, D. S. Alexiadis et al. [31] addressed the problem of real-time
evaluation and feedback of dance performances in the scenario of an online dance
class. Here, online users are able to see a dance teacher perform some choreography steps,
which they then try to imitate. After comparing the user's performance with the teacher's,
the user's performance is automatically evaluated and some visual feedback
is provided.
From a different perspective, F. Kistler et al. [32] adopted a game book and implemented an
interactive storytelling scenario, using full-body tracking with Kinect to provide different
types of interaction. In their scenario, two users listen to a virtual avatar narrating parts
of a game book. At specific points, the users need to perform certain gestures to influence
the story. Almost none of the Kinect games developed so far concentrate on a story, so
this may be an interesting approach for the creation of new games. Another interesting
idea is the one presented by M. Caon et al. [33] for the creation of smart environments.
Using multiple Kinects, these researchers developed a system capable of recognizing
with great precision the direction in which a user is pointing in a room. Their smart
living room is composed of several smart objects, like a media centre, several lamps,
a fan, etc., and is able to identify several users' postures and pointing gestures. If a user
points at any of those objects, they will automatically change their state (on/off). A lot
of work can be done to improve home automation based on this idea (and using Kinect).
Motion Detection Real Time 3D Walkthrough in Limkokwing University of
Creative Technology (Modet-Walk) using Kinect Xbox, by Behrang Parhizkar,
Kanammal A/p Sandrasekaran and Arash Habibi Lashkari, aims to provide interactive
communication by integrating Kinect into a 3D walkthrough. The project is based on
motion detection interacting with a virtual 3D walkthrough of the real environment,
and combining a 3D walkthrough with the Kinect now seems feasible. The paper
emphasizes combining a 3D virtual walkthrough with Kinect motion detection, and
investigates how motion detection with Kinect can help people understand, translate
and give meaning to the environment displayed around them ubiquitously.
In almost all of the papers presented above, researchers have used both the RGB
and depth sensors to track the human body or objects. At the time this literature review
was conducted, we were not able to find any relevant paper in which researchers
studied how well Kinect's RGB camera can recognize and track colours and/or changes
in object size under different lighting conditions, especially on smart textiles.
Searching online, we can find some videos of developers and users demonstrating Kinect
applications where colour recognition and tracking seem to be the main objective.
However, this is clearly not enough to report on or to form an informed opinion.
Such information would be valuable, as these features are very important for the
successful achievement of the proposed goals.
We believe that, given all the possibilities Kinect offers, much more research will
be based on it in the coming years. One fact supporting this idea was the release
of Kinect for Windows in February 2012. Kinect for Windows consists of an
improved Kinect device and a new SDK version designed especially for Windows PCs.
This will allow the development of new kinds of applications and, consequently,
new tools will become available to researchers to perform new studies.
CHAPTER 2
EXISTING SYSTEM
2.1 INTRODUCTION
Virtual reality (VR) is the creation of a highly interactive, computer-based
multimedia environment in which the user becomes a participant with the computer in
what is known as a synthetic environment. Virtual reality uses computers to immerse
one inside a three-dimensional program rather than simulating it in two dimensions
on a monitor. Utilizing the concept of virtual reality, the computer engineer integrates
video technology, high-resolution image processing, and sensor technology into the
data processor so that a person can enter into and react with three-dimensional spaces
generated by computer graphics. The goal computer engineers have is to create an
artificial world that feels genuine and will respond to every movement one makes, just
as the real world does.
Naming discrepancies aside, the concept remains the same: using computer
technology to create a simulated, three-dimensional world that a user can manipulate
and explore while feeling as if he were in that world. Scientists, theorists and engineers
have designed dozens of devices and applications to achieve this goal. Opinions differ
on what exactly constitutes a true VR experience, but in general it should include:
(1) Three-dimensional images that appear to be life-sized from the perspective
of the user;
(2) The ability to track a user's motions, particularly his head and eye
movements, and correspondingly adjust the images on the user's display
to reflect the change in perspective.
Virtual realities are a set of emerging electronic technologies, with applications
in a wide range of fields. This includes education, training, athletics, industrial design,
architecture and landscape architecture, urban planning, space exploration, medicine
and rehabilitation, entertainment, and model building and research in many fields of
science. Virtual reality (VR) can be defined as a class of computer-controlled
multisensory communication technologies that allow more intuitive interactions with
data and involve human senses in new ways. Virtual reality can also be defined as an
environment created by the computer in which the user feels present. This technology
was devised to enable people to deal with information more easily. Virtual Reality
provides a different way to see and experience information, one that is dynamic and
immediate. It is also a tool for model building and problem solving. Virtual Reality is
potentially a tool for experiential learning. The virtual world is interactive; it responds
to the user's actions.
Virtual Reality is defined as a highly interactive, computer-based multimedia
environment in which the user becomes the participant in a computer-generated world.
It is the simulation of a real or imagined environment that can be experienced visually
in the three dimensions of width, height, and depth and that may additionally provide
an interactive experience visually in full real-time motion with sound and possibly with
tactile and other forms of feedback. VR incorporates 3D technologies that give a real-
life illusion. VR creates a simulation of real-life situations. The emergence of augmented
reality technology in the form of interactive games has produced a valuable tool for
education. One of the emerging strengths of VR is that it enables objects and their
behaviour to be more accessible and understandable to the human user.
2.2 DIFFERENT KINDS OF VIRTUAL REALITY
There is more than one type of virtual reality, and there are different schemas for
classifying the various types. Jacobson (1993a) suggests four types of virtual reality:
immersive, desktop, projection and simulation. Two further types are often added,
giving six in total:
(1) Immersive virtual reality
(2) Desktop virtual reality
(3) Projection virtual reality
(4) Simulation virtual reality
(5) Augmented virtual reality
(6) Text-based virtual reality
2.2.1 Immersive virtual reality
An immersive VR system offers the most direct experience of virtual environments.
Here the user either wears a head-mounted display (HMD) or uses some form of head-
coupled display, such as a Binocular Omni-Orientation Monitor (BOOM), to view the
virtual environment, in addition to tracking devices and haptic devices. It is a type
of VR in which the user becomes immersed (deeply involved) in a virtual world. It is
also a form of VR that uses computer-related components.
2.2.2 Augmented Reality
A variation of immersive virtual reality is augmented reality, where a see-
through layer of computer graphics is superimposed over the real world to highlight
certain features and enhance understanding. Augmented virtual reality is the idea of
taking what is real and adding to it in some way so that the user obtains more information
from their environment. Azuma (1999) explains that "Augmented Reality is about
augmentation of human perception: supplying information not ordinarily detectable
by human senses." According to Isdale (2001), there are four types of augmented reality
(AR), distinguished by their display type:
1. Optical see-through AR uses a transparent head-mounted display (HMD)
to display the virtual environment (VE) directly over the real world.
2. Projector-based AR uses real-world objects as projection surfaces for the VE.
3. Video see-through AR uses an opaque HMD to display merged video of the
VE with the view from cameras on the HMD.
4. Monitor-based AR also uses merged video streams, but the display is a more
conventional desktop monitor or a handheld display. Monitor-based AR is perhaps the
least difficult to set up, since it eliminates HMD issues.
2.2.3 Text-based Virtual Reality
In this type of virtual reality, readers of a text form a mental model of the
virtual world in their heads from the descriptions of people, places and things.
2.2.4 Through the Window
With this kind of system, also known as desktop VR, the user sees the 3D
world through the window of the computer screen and navigates through the space with
a control device such as a mouse. Like immersive virtual reality, this provides a first-
person experience. One low-cost example of a through-the-window virtual reality
system is the 3D architectural design planning tool Virtus Walkthrough, which makes it
possible to explore virtual reality on a Macintosh or IBM computer. Another example
comes from the field of dance, where a computer program called Life Forms lets
choreographers create sophisticated human motion animations.
2.2.5 Projected Realities
Projected realities (mirror worlds) provide a second-person experience in
which the viewer stands outside the imaginary world but communicates with characters
or objects inside it. Mirror-world systems use a video camera as an input device. Users
see their images superimposed on or merged with a virtual world presented on a large
video monitor or video-projected image.
2.3 EXISTING SYSTEMS
2.3.1 Head-Mounted Display (HMD)
The head-mounted display (HMD) was the first device providing its wearer with
an immersive experience. Evans and Sutherland demonstrated a head-mounted stereo
display in 1965. A typical HMD houses two miniature display screens and an
optical system that channels the images from the screens to the eyes, thereby
presenting a stereo view of a virtual world. A motion tracker continuously measures the
position and orientation of the user's head and allows the image generating computer to
adjust the scene representation to the current view. As a result, the viewer can look
around and walk through the surrounding virtual environment. To overcome the often
uncomfortable intrusiveness of a head-mounted display, alternative concepts (e.g.,
BOOM and CAVE) for immersive viewing of virtual environments were developed.
Fig 1: HMD
2.3.2 BOOM
The BOOM (Binocular Omni-Orientation Monitor), from Fakespace Labs, is a
head-coupled stereoscopic display device. Screens and the optical system are housed in a
box that is attached to a multi-link arm. The user looks into the box through two holes,
sees the virtual world, and can guide the box to any position within the operational
volume of the device. Head tracking is accomplished via sensors in the links of the arm
that holds the box.
Fig 2: BOOM
2.3.3 CAVE
The CAVE (Cave Automatic Virtual Environment) was developed at the
University of Illinois at Chicago and provides the illusion of immersion by projecting
stereo images on the walls and floor of a room-sized cube. Several persons wearing
lightweight stereo glasses can enter and walk freely inside the CAVE. A head tracking
system continuously adjusts the stereo projection to the current position of the leading
viewer. The advantages of the CAVE are that it gives a wide, surrounding field of view
and that it can provide a shared experience to a small group. A variety of input
devices like data gloves, joysticks, and hand-held wands allow the user to navigate
through a virtual environment and to interact with virtual objects. Directional
sound, tactile and force feedback devices, voice recognition and other technologies are
being employed to enrich the immersive experience and to create more sensualized
interfaces.
Fig 3: CAVE
2.3.4 Data Glove
A data glove is outfitted with sensors on the fingers as well as overall
position/orientation tracking equipment. The data glove enables natural interaction with
virtual objects by hand-gesture recognition. Modern VR gloves are used to
communicate hand gestures (such as pointing and grasping) and in some cases return
tactile signals to the user's hand.
Fig 4: Data glove
Concerned about the high cost of the most complete commercial solutions,
Pamplona et al. propose a new input device: an image-based data glove (IBDG). By
attaching a camera to the user's hand and a visual marker to each fingertip, they
use computer vision techniques to estimate the relative positions of the fingertips. Once
they have information about the tips, they apply inverse kinematics techniques to
estimate the position of each finger joint and recreate the movements of the user's
fingers in a virtual world. Adding a motion tracker device, one can also map the pitch,
yaw, roll and XYZ translations of the user's hand, (almost) recreating every gesture
and posture performed by the hand in a low-cost device.
CHAPTER 3
KINECT
3.1 INTRODUCTION
The Microsoft Xbox 360 Kinect has revolutionized gaming in that you are able to
use your entire body as the controller. Conventional controllers are not required
because the Kinect sensor picks up natural body movements as inputs for the
game. Three major components make the Kinect function as it does: movement
tracking, speech recognition, and the motorized tilt of the sensor itself. The name
Kinect is a blend of two words: kinetic and connect.
3.2 EXPLANATION
Kinect is best described by the features that set it apart from other virtual
reality devices:
Full Body Gaming
Controller-free gaming means full-body play. Kinect responds to how
you move. So if you have to kick, then kick. If you have to jump, then jump.
You already know how to play. All you have to do now is get off the couch.
Something For Everyone
Whether you're a gamer or not, anyone can play and have a blast. And with
advanced parental controls, Kinect promises a gaming experience that's safe,
secure and fun for everyone.
It's All About You
Once you wave your hand to activate the sensor, your Kinect will be able
to recognize you and access your Avatar. Then you'll be able to jump in and out
of different games, and show off and share your moves.
3.3 ARCHITECTURE/ COMPONENTS
3.3.1 COMPONENTS
The main components of KINECT include:
Video
Color CMOS Camera
Infrared (IR) CMOS Camera
Infrared Projector
Audio
Multi-Array Microphone
Tilt Control
Motor
Accelerometer
Processor And Memory
PrimeSense chip PS1080-A2
64 MB DDR2 SDRAM
3.3.2 ARCHITECTURE
3.3.2.1 AN RGB COLOR SPACE
An RGB color space is any additive color space based on the RGB color model. A
particular RGB color space is defined by the three chromaticities of the red, green, and blue
additive primaries, and can produce any chromaticity within the triangle defined by those
primary colors. The complete specification of an RGB color space also requires a white
point chromaticity and a gamma correction curve. An LCD display can be thought of as a
grid of thousands of little red, green, and blue lamps, each with its own dimmer switch.
The gamut of the display depends on the three colors used for the red, green and blue
lights.
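To make this definition concrete, the following is a minimal sketch (assuming numpy) of the standard derivation of an RGB-to-XYZ conversion matrix from the chromaticities of the primaries and the white point; the sRGB/D65 values in the example are well-known published constants, used purely for illustration:

```python
import numpy as np

def rgb_to_xyz_matrix(xy_r, xy_g, xy_b, xy_white):
    """Build the RGB -> XYZ matrix from primary and white-point
    chromaticities, following the standard derivation."""
    def xyz(xy):
        x, y = xy
        # Convert an xy chromaticity to an XYZ vector with Y = 1.
        return np.array([x / y, 1.0, (1.0 - x - y) / y])

    # Columns are the (unscaled) XYZ coordinates of the primaries.
    m = np.column_stack([xyz(xy_r), xyz(xy_g), xyz(xy_b)])
    # Scale each primary so that R = G = B = 1 maps to the white point.
    s = np.linalg.solve(m, xyz(xy_white))
    return m * s

# sRGB primaries and the D65 white point (published reference values).
M = rgb_to_xyz_matrix((0.64, 0.33), (0.30, 0.60), (0.15, 0.06),
                      (0.3127, 0.3290))
print(M @ np.array([1.0, 1.0, 1.0]))  # ~ D65 white point, Y = 1
```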
3.3.2.2 3D SENSOR
It is a device that analyzes a real-world object or environment to collect data on its
shape and possibly its appearance (i.e. color). The collected data can then be used to
construct digital, three dimensional models useful for a wide variety of applications.
Multiple scans, even hundreds, from many different directions are usually
performed to obtain the required information about all sides of the subject. These
scans have to be brought into a common reference system, a process usually called
alignment or registration, and then merged to create a complete model. This whole process,
going from the single range map to the whole model, is usually known as the 3D scanning
pipeline.
3.3.2.3 MULTIPLE MICROPHONE ARRAY
A microphone array is any number of microphones operating in tandem. Typically,
an array is made up of omnidirectional microphones distributed about the perimeter of a
space, linked to a computer that records and interprets the results into a coherent form.
Arrays may also be formed from a number of very closely spaced microphones. In Kinect,
the microphone array features four microphone capsules and operates with each channel
processing 16-bit audio at a sampling rate of 16 kHz.
3.4 TECHNOLOGIES USED
Kinect uses a combination of the above-mentioned hardware to create a virtual
environment, mapping the actual physical data to various rendering points and commands.
3.4.1 SENSING TECHNOLOGY
Behind PrimeSense's 3D sensing technology there are three main parts
that make it work: an infrared laser projector, an infrared camera, and the RGB color
camera. The projector simply floods the room with IR laser beams, creating a depth
field that can be seen only by the IR camera. Because infrared is insensitive to ambient
light, the Kinect can be played in any lighting conditions. However, because the face
recognition system depends on the RGB camera along with the depth sensor, light is
needed for the Kinect to recognize a calibrated player accurately. The following image
shows a generalized concept of how Kinect's depth sensing works.
Figure 3.1: How the sensor sees in 3D
In more detail, the IR depth sensor is a monochrome complementary metal-
oxide-semiconductor (CMOS) camera. This means it sees only two colors, in
this case black and white, which is all that's needed to create a "depth map" of any room.
The IR camera used in the Kinect has VGA resolution (640x480), refreshing at a rate of
30 Hz. Each camera pixel has a photodiode connected to it, which receives the IR light
beams bounced off objects in the room. The voltage level of each
photodiode depends on how far the object is from the camera: an object that is closer to
the camera appears brighter than an object that is farther away, so the voltage produced
by the photodiode encodes the distance of the object. Each voltage is then amplified and
sent to an image processor for further processing. With this process being updated 30
times per second, the Kinect has no problem detecting full-body human movements very
accurately, provided the player is within the recommended distance.
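As an illustration of how raw sensor readings relate to distance: community documentation describes the Kinect as reporting an 11-bit disparity value per pixel. The exact conversion used inside the device is proprietary; the sketch below uses an empirical approximation popularized by the OpenKinect community (the constants are community-measured, not official Microsoft values):

```python
import math

def raw_disparity_to_metres(raw: int) -> float:
    """Approximate metric depth from an 11-bit Kinect disparity value.
    Empirical community fit; a raw value of 2047 marks pixels where
    the sensor could not measure depth."""
    if raw >= 2047:
        raise ValueError("no depth measured at this pixel")
    return 0.1236 * math.tan(raw / 2842.5 + 1.1863)

print(round(raw_disparity_to_metres(800), 3))  # about 1.2 m
```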
Figure 3.2: Infrared beams in the room
Although the hardware is the basis for creating an image that the processor
can interpret, the software behind the Kinect is what makes everything possible. Using
statistics, probability, and hours of testing different natural human movements, the
programmers developed software to track the movements of 20 main joints on a human
body. This software is how the Kinect can differentiate a player from, say, a dog that happens
to run in front of the IR projector, or between different players playing a game together.
The Kinect is capable of tracking up to six different players at a time, but as of
now the software can only track up to two active players.
One of the main features of the Kinect is that it can recognize you individually.
When calibrating yourself with the Kinect, the depth sensor and the color camera work
together to develop an accurate digital image of your face. The 8-bit color
camera, also VGA resolution, detects and stores the skin tone of the person it is calibrating.
The depth sensor helps make the facial recognition more accurate by creating a 3D shape
of your face. Storing these images of your face and skin tone is how the Kinect
recognizes you when you step in front of the projected IR beams. As mentioned earlier, for
the facial recognition to work accurately there needs to be a certain amount of light.
Another feature of the color camera is that it takes videos or snapshots at key moments
during game play so you can see how you look while playing.
Figure 3.3: Facial recognition
3.4.2 SPEECH RECOGNITION
The Xbox 360 Kinect is also capable of speech recognition, meaning it will respond
not only to natural body movements but to voice commands as well. This
technology was developed for Kinect solely by Microsoft. Microsoft engineers
travelled to an estimated 250 different homes to test their voice recognition system.
They placed 16 microphones all over each room to test the acoustics, echoing, etc., to get a
feel for how the Kinect would respond in different environments. The end result was
placing 4 downward-facing microphones on the bottom of the Kinect unit, which
listen to human voices. This is also part of why the Kinect is so physically wide:
the 3D sensing portion only needs about half the width the Kinect has now, and the
rest accommodates the microphone placement. The combination of the microphone
placement and the motion sensing technology allows the Kinect to zero in on the user's
voice and tell where the sound is coming from, while cancelling out other ambient
noise. With 4 microphones, the audio portion of the Kinect has 4 separate channels.
The resolution of the audio is 16 bits, sampled at 16 kHz. Three major languages are
supported by Kinect thus far (English, Spanish, and Japanese), with
plans to support other popular languages soon. The Kinect is always listening as long
as it is turned on; when the user says "Xbox", the user is prompted to select one
of the options from the screen. Popular options are "Play Game", "Watch a Movie" or
"Sign In". One of the major techniques involved in the Kinect's ability to block out noise
is known as echo cancellation.
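Kinect's actual audio pipeline is proprietary, but the principle of locating a talker with a microphone array can be illustrated with a minimal delay-and-sum beamformer sketch. The microphone positions in MIC_X below are illustrative, not the real hardware layout:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
FS = 16_000             # Kinect audio sample rate: 16-bit at 16 kHz
# Illustrative microphone x-positions (metres); not the real Kinect layout.
MIC_X = np.array([-0.113, 0.036, 0.076, 0.113])

def delay_and_sum(channels: np.ndarray, angle_rad: float) -> np.ndarray:
    """Align the 4 channels for a far-field source at `angle_rad` from
    broadside and average them, reinforcing sound from that direction.
    `channels` has shape (4, n_samples)."""
    # Relative arrival delays across the array for this direction.
    delays = -MIC_X * np.sin(angle_rad) / SPEED_OF_SOUND
    shifts = np.round(delays * FS).astype(int)
    shifts -= shifts.min()                  # make all shifts non-negative
    n = channels.shape[1] - shifts.max()
    aligned = np.stack([ch[s:s + n] for ch, s in zip(channels, shifts)])
    return aligned.mean(axis=0)

def estimate_direction(channels: np.ndarray) -> float:
    """Scan candidate angles; the angle yielding the most output energy
    is the estimated direction of the talker."""
    angles = np.deg2rad(np.linspace(-50, 50, 41))
    energies = [np.sum(delay_and_sum(channels, a) ** 2) for a in angles]
    return float(angles[int(np.argmax(energies))])
```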
Figure 3.4: Kinect sensor
3.4.3 MOTORISED TILT
The Kinect comes equipped with a built-in motor that can tilt the entire unit
up or down, expanding its field of view. Without moving, the Kinect has a 43° vertical
viewing angle and a 57° horizontal viewing angle. The motorized tilt can pivot the
sensor through a further 27°, extending its vertical coverage. The Kinect is powered via
a standard USB connection; however, it also requires a special type of connector for the
motor. USB is capable of supplying 2.5 W, which is not enough power to run the sensor
and the motor simultaneously. So Microsoft developed a special connector that draws
power from the Xbox's power supply; however, this comes only with the newer Xbox
models. Older Xbox models must use a separate power supply for the Kinect.
3.4.4 HUMAN DETECTION
Figure 3.5: Overview of the human detection method
3.4.4.1 Pre-processing
To prepare the data for processing, some basic pre-processing is needed. In the
depth image taken by the Kinect, all the points where the sensor is not able to measure
depth are set to 0 in the output array. We regard this as a kind of noise, and to avoid its
interference we want to recover the true depth value. Assuming the space is continuous,
a missing point is likely to have a depth value similar to its neighbours'. With this
assumption, we regard all 0 pixels as vacant and needing to be filled.
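A minimal sketch of this hole-filling step, assuming the depth frame is a numpy array in which 0 marks vacant pixels. The median-of-neighbours rule is one simple way to realize the smoothness assumption; the source does not specify the exact filling rule:

```python
import numpy as np

def fill_depth_holes(depth: np.ndarray, max_passes: int = 10) -> np.ndarray:
    """Fill 0-valued (vacant) pixels with the median of their valid 3x3
    neighbours, relying on the assumption that depth varies smoothly."""
    d = depth.astype(float).copy()
    for _ in range(max_passes):
        holes = np.argwhere(d == 0)
        if holes.size == 0:
            break                       # every pixel now has a depth value
        for y, x in holes:
            patch = d[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
            valid = patch[patch > 0]
            if valid.size:              # fill only if a known neighbour exists
                d[y, x] = np.median(valid)
    return d
```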
3.4.4.2 2D chamfer distance matching
The first stage of the method uses the edge information embedded in the depth
array to locate possible regions that may indicate the presence of a person. It is a
rough scanning approach: the detection result should have as low a false
negative rate as possible, but may have a comparatively high false positive rate, to be
passed to the next stage. We use 2D chamfer distance matching in this stage for quick
processing. Chamfer distance matching is a good 2D shape matching algorithm that
is invariant to scale, and it utilizes the edge information in the depth array, which gives
the boundaries of all the objects in the scene.
We use the Canny edge detector to find all edges in the depth array. To reduce
calculation and the disturbance from surrounding irregular objects, we eliminate
all edges whose sizes are smaller than a certain threshold. We use a binary head
template and match it to the resulting edge image. To increase efficiency,
a distance transform is calculated before the matching process. This produces a distance
map of the edge image, where each pixel contains the distance to the closest edge pixel
in the edge image. Matching consists of translating and positioning the template at various
locations of the distance map; the matching measure is determined by the pixel values of
the distance image which lie under the data pixels of the transformed template. The lower
these values are, the better the match between image and template at that location.
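A sketch of this pipeline using OpenCV's Canny detector and distance transform. The Canny thresholds, the head template, and the coarse 4-pixel search stride are illustrative choices, not values from the original method:

```python
import cv2
import numpy as np

def chamfer_match(gray_8u: np.ndarray, template: np.ndarray):
    """Chamfer matching: slide a binary template over the distance
    transform of an edge image. `gray_8u` is an 8-bit rendering of the
    depth array; `template` is a binary (0/1) head template."""
    edges = cv2.Canny(gray_8u, 50, 150)
    # Distance from every pixel to the nearest edge pixel (edges -> 0).
    dist = cv2.distanceTransform(255 - edges, cv2.DIST_L2, 3)
    th, tw = template.shape
    n_on = max(int(template.sum()), 1)
    best_score, best_pos = np.inf, None
    for y in range(0, dist.shape[0] - th, 4):        # coarse 4-px stride
        for x in range(0, dist.shape[1] - tw, 4):
            # Mean distance under the template's "on" pixels: lower is better.
            score = float((dist[y:y + th, x:x + tw] * template).sum()) / n_on
            if score < best_score:
                best_score, best_pos = score, (x, y)
    return best_pos, best_score
```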
3.4.4.3 Generate 3D model
Considering that the computational complexity of 3D model fitting is comparatively high,
we want the model to be view invariant so that we don't have to use several different
models, or rotate the model and run the fit several times. The model should generalize the
characteristics of the head from all views: front, back, side, and also higher and lower views,
for when the sensor or the person is placed higher or lower. To meet
these constraints in the simplest way, we use a hemisphere as the 3D head model.
3.4.4.4 Extract contours
We extract the overall contour of the person so that we may track his or her hands and
feet and recognize the activity. In an RGB image, even though the person is standing on the
ground, it is not much of a problem to detect the boundary between the feet and the ground
plane using gradient features. However, in a depth array, the values at the person's feet and
the local ground plane are the same. Therefore, it is not feasible to compute a human's whole-
body contour from a depth array using regular edge detectors. The same applies when the
person touches any other object that is partially at the same depth as the person. To
resolve this issue, we take advantage of the fact that a person's feet generally appear upright
in a depth array regardless of the posture.
We use the filter response to extract the boundary between the person and the
ground. We develop a region growing algorithm to extract the whole-body contour from
the processed depth array. It is assumed that the depth values on the surface of a human
object are continuous and vary only within a specific range. The algorithm starts with a
seed location, which is the centroid of the region detected by 3D model fitting. The rule
for growing a region is based on the similarity between the region and its neighboring
pixels. The similarity between two pixels x and y in the depth array is defined as:
S(x, y) = |depth(x) - depth(y)|
Figure 3.6: Contour extraction. (A) Original depth array; some parts of the body have
merged with the background. (B) Input depth array to the region growing algorithm.
Region growing proceeds until the similarity between the region and its neighboring
pixels exceeds a threshold:
i. Initialize: region = seed.
ii. (1) Find all neighboring pixels of the region.
(2) Measure the similarity s1, s2, ... between each of these pixels and the region,
and sort the pixels by similarity.
(3) If smin < threshold:
(3.1) Add the pixel with the highest similarity (lowest s) to the region.
(3.2) Calculate the new mean depth of the region.
(3.3) Repeat (1)-(3).
Otherwise, the algorithm terminates.
iii. Return the region.
Figure 3.7: Region growing algorithm. (C) Result of our region growing algorithm.
(D) The extracted whole-body contours superimposed on the depth map.
The depth of a region is defined by the mean depth of all the pixels in that region:
depth(region) = (1 / N) * sum of depth(x) over all N pixels x in the region. A code
sketch of the whole procedure follows below.
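The sketch below implements the region-growing procedure as described, assuming a 2D depth array and a seed at the centroid found by the 3D model fit. For simplicity, similarities are computed against the region mean at the time a pixel is discovered, a slight approximation of the re-sorting described above:

```python
import heapq
import numpy as np

def grow_region(depth: np.ndarray, seed: tuple, threshold: float) -> np.ndarray:
    """Best-first region growing from `seed`: repeatedly absorb the
    neighbouring pixel whose depth is closest to the region's running
    mean depth, stopping once the best candidate differs by more than
    `threshold`. Returns a boolean mask of the grown region."""
    h, w = depth.shape
    in_region = np.zeros((h, w), dtype=bool)
    in_region[seed] = True
    total, count = float(depth[seed]), 1
    frontier = []  # min-heap of (S, y, x); S = |depth(pixel) - mean depth|

    def push_neighbours(y, x):
        mean = total / count
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not in_region[ny, nx]:
                s = abs(float(depth[ny, nx]) - mean)
                heapq.heappush(frontier, (s, ny, nx))

    push_neighbours(*seed)
    while frontier:
        s, y, x = heapq.heappop(frontier)
        if s > threshold:
            break                       # best candidate is too dissimilar
        if in_region[y, x]:
            continue                    # already absorbed via another path
        in_region[y, x] = True          # absorb pixel, update region mean
        total, count = total + float(depth[y, x]), count + 1
        push_neighbours(y, x)
    return in_region
```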
3.4.4.5 TRACKING
Finally, we give preliminary results on tracking using depth information based on
our detection result. Tracking in RGB images is usually based on color, under the
assumption that the color of the same object in different time frames should be similar.
In depth images we don't have such color information; what we have is the 3D spatial
information of the objects, so we can measure their movements in 3D space.
We assume that the coordinates and speed of the same objects in neighboring frames
change smoothly.
3.5 WORKING OF KINECT
The Kinect uses structured light and machine learning.
Inferring body position is a two-stage process: first compute a depth map
(using structured light), then infer body position (using machine learning).
The system uses many college-level math concepts, and demonstrates
the remarkable advances in computer vision in the last 20 years.
Fig. 3.8: Stage 1. The depth map is constructed by analyzing a speckle pattern of
infrared laser light.
The Kinect uses an infrared projector and sensor; it does not use its RGB
camera for depth computation.
The technique of analyzing a known pattern is called structured light. Its general
principle is to project a known pattern onto the scene and infer depth from the
deformation of that pattern. The Kinect combines structured light with two classic
computer vision techniques: depth from focus and depth from stereo.
Fig. 3.9: Depth from focus uses the principle that objects that are more blurry are
further away.
The Kinect dramatically improves the accuracy of traditional depth from focus. The
Kinect uses a special (astigmatic) lens with different focal lengths in the x and y
directions. A projected circle then becomes an ellipse whose orientation depends on
depth.
Depth from stereo uses parallax: if you look at the scene from another angle,
objects that are close get shifted to the side more than objects that are far away.
The Kinect analyzes the shift of the speckle pattern by projecting from one
location and observing from another.
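This reduces to classic triangulation: with a baseline b between projector and camera, focal length f, and observed pattern shift (disparity) d, depth is Z = f * b / d. A minimal sketch with illustrative parameter values (the commonly cited Kinect baseline is about 7.5 cm):

```python
def depth_from_disparity(disparity_px: float,
                         focal_px: float = 580.0,    # illustrative IR focal length
                         baseline_m: float = 0.075   # projector-camera baseline
                         ) -> float:
    """Classic triangulation: Z = f * b / d. Nearby objects shift more
    (larger disparity), so they come out with smaller depth."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(20.0))  # ~2.2 m for a 20-pixel pattern shift
```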
Inferring body position is a two-stage process: first compute a depth map, then infer
body position.
Stage 1: The depth map is constructed by analyzing a speckle pattern of infrared
laser light.
Stage 2: Body parts are inferred using a randomized decision forest, learned
from over 1 million training examples.
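The published Microsoft Research approach (Shotton et al.) classifies each pixel into a body part using a forest of decision trees whose split tests are simple depth-comparison features. Below is a sketch of one such feature, with the depth normalization that makes the probe pattern scale-invariant; the offsets u and v are learned per tree node, and the values here are arbitrary examples:

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # probes that fall off the image read as "very far away"

def depth_feature(depth: np.ndarray, px: tuple, u: tuple, v: tuple) -> float:
    """Depth-comparison feature f = d(px + u/d(px)) - d(px + v/d(px)).
    Normalizing the offsets by the depth at `px` makes the probe pattern
    shrink for distant people, so the feature is depth invariant."""
    d_at = max(float(depth[px]), 1e-3)   # guard against zero depth

    def probe(offset):
        y = int(px[0] + offset[0] / d_at)
        x = int(px[1] + offset[1] / d_at)
        if 0 <= y < depth.shape[0] and 0 <= x < depth.shape[1]:
            return float(depth[y, x])
        return BACKGROUND_DEPTH

    # At each tree node the pixel branches left or right depending on
    # whether this feature value is below a learned threshold.
    return probe(u) - probe(v)
```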
The basic techniques Kinect uses to create a virtual reality environment for the
user are:
Motion Sensor:
Kinect uses a motion sensor that tracks your entire body. So when you play, it's
not only about your hands and thumbs; it's about all of you: arms, legs, knees, waist,
hips and so on. To get into the game, you'll need to jump off the couch.
Skeletal Tracking:
As you play, Kinect creates a digital skeleton of your avatar based on depth data.
So when you move left or right or jump around, the sensor will process it and translate
it into gameplay.
Facial Recognition:
Kinect ID remembers who you are by collecting physical data that's stored in your
profile. So when you want to play again, your Kinect will know it's you.
Voice Recognition:
Kinect uses four strategically placed microphones within the sensor to determine
an acoustic profile of the room you play in, so it is calibrated to pick up your
voice accurately.
CHAPTER 4
APPLICATIONS
4.1 VIRTUAL REALITY
As the technologies of virtual reality evolve, the applications of VR become
practically unlimited. It is assumed that VR will reshape the interface between people and
information technology by offering new ways for the communication of information,
the visualization of processes, and the creative expression of ideas. A virtual
environment can represent any three- dimensional world that is either real or abstract.
This includes real systems like buildings, landscapes, underwater shipwrecks,
spacecraft, archaeological excavation sites, human anatomy, sculptures, crime
scene reconstructions, solar systems, and so on. Of special interest is the visual and
sensual representation of abstract systems like magnetic fields, turbulent flow
structures, molecular models, mathematical systems, auditorium acoustics, stock
market behavior, population densities, information flows, and any other
conceivable system including artistic and creative work of abstract nature. These
virtual worlds can be animated, interactive, shared, and can expose behavior and
functionality.
Useful applications of VR include training in a variety of areas (military,
medical, equipment operation, etc.), education, design evaluation (virtual
prototyping), architectural walk-through, human factors and ergonomic studies,
simulation of assembly sequences and maintenance tasks, assistance for the
handicapped, study and treatment of phobias (e.g., fear of height), entertainment, and
much more. Virtual reality appears to offer educational potentials in the following areas:
(1) Data gathering and visualization,
(2) Project planning and design,
(3) The design of interactive training systems,
(4) Virtual field trips,
(5) The design of experiential learning environments.
Virtual reality also offers many possibilities as a tool for non-traditional
learners, including the physically disabled and those undergoing rehabilitation who
must learn communication and psychomotor skills.
In industry, VR has proven to be an effective tool for helping workers
evaluate product designs. In 1999, BMW explored the capability of VR for verifying
product designs. They concluded that VR has the potential to reduce the number of
physical mock-ups needed, to improve overall product quality, and to obtain quick
answers in an intuitive way during the concept phase of a product. In addition, Motorola
developed a VR system for training workers to run a pager assembly line (Wittenberg,
1995).
In the past decade medical applications of virtual reality technology have been
rapidly developing, and the technology has changed from a research curiosity to a
commercially and clinically important area of medical informatics technology. Virtual
reality is under exploration as a therapeutic tool for patients. For example, psychologists
and other professionals are using virtual reality as a tool with patients who are afraid of
heights. NASA has developed a number of virtual environment projects. These include
the Hubble Telescope Rescue Mission training project, the Space Station Cupola
training project, and the shared virtual environment where astronauts can practice
reconnoitring outside the space shuttle for joint training, human factors, and
engineering design. NASA researcher Bowen Loftin has developed the Virtual
Physics Lab where learners can explore conditions such as changes in gravity. Virtual
reality can make it possible to reduce the time lag between receiving equipment
and implementing training by making possible virtual prototypes or models of the
equipment for training purposes.
In the entertainment field, virtual realities are used in movies and games. One of the
advantages of using VR games is that they create a level playing field. These virtual
environments eliminate contextual factors that create inequalities between learners
and interfere with the actual learning skills featured in the training program, that
is, interpersonal skills, collaboration, and team-building. Serious games are being more
and more deployed in such diverse areas as public awareness, military training, and
higher education. One of the driving forces behind this stems from the rapidly growing
availability of game technologies, providing not only better, faster, and more realistic
graphics, physics, and animations, but above all making the language of game
development accessible to increasingly more people. One game-based simulation
proposes an architecture for a professional fire-fighter training simulator that incorporates
novel visualization and interaction modes. This serious game, developed in cooperation
with the government agency responsible for the training of fire and rescue personnel, is
a good example of how virtual reality and game technology help to make the delicate
combination of engaging level design and carefully tuned learning objectives work.
The emergence of augmented reality technology in the form of interactive
games has produced a valuable tool for education. The live, communal nature of these
games, blending virtual content with global access and communication, has resulted in
a new research arena previously called "edutainment" but more recently called
"learning games". Windows Live combined with Xbox 360 with Kinect technology
provides an agile, real-time environment with case-based reasoning, where learners
can enjoy games, simulations and face to face chat, stream HD movies and
television, music, sports and even Twitter and Facebook, with others around the world,
or alone, in the privacy of the home.
4.2 TELEIMMERSIVE CONFERENCING
Fig. 4.1: Avatar Kinect virtual environment.
With increasing economic globalization and workforce mobilization, there is a
strong need for immersive experiences that enable people across geographically
distributed sites to interact collaboratively. Such advanced infrastructures and tools
require a deep understanding of multiple disciplines. In particular, computer vision,
graphics, and acoustics are indispensable to capturing and rendering 3D environments
that create the illusion that the remote participants are in the same room. Existing
videoconferencing systems, whether they are available on desktop and mobile devices
or in dedicated conference rooms with built-in furniture and life-sized high-definition
video, leave a great deal to be desired: mutual gaze, 3D, motion parallax, spatial audio,
to name a few. For the first time, the necessary immersive technologies are emerging
and coming together to enable real-time capture, transport, and rendering of 3D
holograms. The Immersive Telepresence project at Microsoft Research addresses the
scenario of a fully distributed team. The figure illustrates three people joining a
virtual/synthetic meeting from their own offices in three separate locations. A capture
device (one or multiple Kinect sensors) at each location captures users in 3D with high
fidelity (in both geometry and appearance). They are then put into a virtual room as if
they were seated at the same table.
The user's position is tracked by the camera, so the virtual room is rendered
appropriately at each location from the user's eye perspective. This produces the right
motion parallax effect, exactly like what a user would see in the real world if the three
people met face to face. Because a consistent geometry is maintained and the user's
position is tracked, the mutual gaze between remote users is preserved. In Figure 4.1,
users A and C are looking at each other, and B will see that A and C are looking at each
other because B only sees their side views. Furthermore, the audio is also spatialized,
and the voice of each remote person comes from his location in the virtual room. The
display at each location can be 2D or 3D, flat or curved, single or multiple, transparent
or opaque, and so forth; the possibilities are numerous. In general, the larger a display
is, the more immersive the user's experience. Because each person must be seen from
different angles by remote people, a single Kinect does not provide enough spatial
coverage, and the visual quality is insufficient. Cha Zhang at Microsoft Research, with
help from others, has developed an enhanced 3D capture device that runs in real time
with multiple IR projectors, IR cameras, and RGB cameras.
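Capturing a participant in 3D typically begins by back-projecting each depth pixel into a metric point cloud with the camera's intrinsic parameters; clouds from several Kinects can then be registered into the shared virtual room. A minimal sketch of the back-projection under a pinhole camera model, with illustrative intrinsics:

```python
import numpy as np

def depth_to_point_cloud(depth_m: np.ndarray,
                         fx: float = 580.0, fy: float = 580.0,  # illustrative
                         cx: float = 320.0, cy: float = 240.0):
    """Back-project a metric depth map (H x W, metres) into an N x 3
    point cloud using the pinhole camera model."""
    h, w = depth_m.shape
    ys, xs = np.mgrid[0:h, 0:w]
    z = depth_m
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels with no measured depth
```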
4.3 OTHER APPLICATIONS
Meet students from different schools.
Practice proper footwork for dancing (ballroom, square, etc.).
See if students really studied for a test by checking whether they logged on to the
content.
Enhance the real-world environment.
The game Your Shape: Fitness Evolved helps students with disabilities practice
range of motion on a prescribed schedule and then assess their performance.
Display virtual humans so learners can study their parts and take an anatomy
course without the formaldehyde.
Easier for special-needs people to play, because they don't have to use hard-
to-hold controllers.
Kinect Adventures encourages full-body motion, getting people moving.
Kinectimals teaches students how to care for and feed a pet.
Kinect Sports can be used by students with disabilities who cannot
participate in gym.
These games can also be used to encourage movement and reduce boredom
during indoor recess.
Brings kids, parents, educators, mentors, etc. together by breaking the ice.
Promotes teamwork, sportsmanship and fair play.
In defence, to control unmanned automated weaponry.
In deep space exploration.
In underwater exploration.
In remote surgery.
CHAPTER 5
CONCLUSION
Kinect is a "controller-free gaming and entertainment experience" for the Xbox
360. By integrating all these techniques to a single console, Kinect act as a perfect
device for creating a virtual reality for the user. Several project researches are now
carried out using Kinect as the main tracking device. Some of the researches and
projects have already proved that Kinect is not just a gaming console, but also an eye
to a computer. The Kinect sensor offers an unlimited number of opportunities for old
and new applications. This article only gives a taste of what is possible. Thus far,
additional research areas include hand-gesture recognition, human-activity recognition,
body biometrics estimation (such as weight, gender, or height), 3D surface
reconstruction, and healthcare applications. Here, I have included just one reference per
application area, not trying to be exhaustive.
REFERENCES
[1] www.xbox.com/KINECT
[2] www.ieee.org
[3] http://kinecthacks.net/
[4] K. Sung, "Recent Videogame Console Technologies," Computer, vol. 44, no. 2, pp.
91-93, Feb. 2011.
[5] P. Doliotis, A. Stefan, C. McMurrough, D. Eckhard, and V. Athitsos, "Comparing
Gesture Recognition Accuracy Using Color and Depth Information," in Proceedings of
the 4th International Conference on PErvasive Technologies Related to Assistive
Environments, 2011.
[6] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, "Robust hand gesture recognition with
Kinect sensor," in Proceedings of the 19th ACM International Conference on
Multimedia, 2011, pp. 759-760.
[7] F. Kistler, D. Sollfrank, N. Bee, and E. André, "Full Body Gestures Enhancing a
Game Book for Interactive Story Telling," in Interactive Storytelling, vol. 7069,
Springer Berlin/Heidelberg, 2011, pp. 207-218.
