Lakis Christodoulou
BIOMED MEDICAL SYSTEMS
All content following this page was uploaded by Lakis Christodoulou on 08 November 2016.
FINAL
Table of Contents
List of Figures
Figure 1. HARV ISR Stereo Vision Camera
Figure 2. HARV 3D Vision System
Figure 3. Capella - Stereo vision camera reference design
Figure 4. e-CAM_9V024_STEREO - Stereo Vision Camera Board Features
Figure 5. e-CAM_9V024_STEREO Camera Board
Figure 6. Point Grey - Bumblebee2 Stereo Vision Camera
Figure 7. Point-Grey - Stereo Vision Conversion from 2D to 3D
Figure 8. Sharp 3D Mobile Stereo Vision module
Figure 9. HTC EVO 3D
Figure 10. VGA Camera MT9V022 Module
Figure 11. Stereo Camera Set-up from Xilinx Spartan-3A DSP Video Starter Kit (VSK)
Figure 12. Active Stereo Vision System
Figure 13. USB OEM Stereo Vision Camera module
Figure 14. Quantum Stereo Vision Module
Figure 15. Red Rover's Stereo Vision Camera
Figure 16. An example 3D anaglyph made from NASA Mars rover navigation images
Figure 17. MobileRanger™ C3D Stereovision System
Figure 18. MobileRanger™ C3D Stereovision System
Figure 19. PCI nDepth™ 6cm Baseline Stereo Vision Camera
Figure 20. Videre Stereo and Multi-view Vision Camera Systems
Figure 21. US Army Stereo Vision Camera System
Figure 22. Microsoft Kinect 3D Camera
Figure 23. Microsoft Kinect Sensor Block Diagram – SoC Hardware Architecture
Figure 24. Stereo Vision Triangulation
Figure 25. The Microsoft Kinect 3D Camera Sensor-System
Figure 26. PrimeSensor Reference Design Hardware
Figure 27. PrimeSense Depth Image CMOS
Figure 28. The SAFFiR Autonomous Car of the Future
Figure 29. The 6D-Vision’s Algorithm – Stereo Vision Diagram
Figure 30. Volvo Automotive Stereo Vision Camera System
Figure 31. Intelligent Stereo Goggles Vision
Figure 32. 3DSS Stereo Vision Camera-Scanner
Executive Summary
This technical report and book chapter summarizes the state of the art in 3D Stereo Vision Camera-sensors and systems, covering the latest R&D advancements and technologies in the field. A survey of 3D Stereo Vision Camera-sensors and systems is of high importance to us, since our research is based on 3D stereo image and video processing and analysis algorithms for human surveillance, security, and monitoring. Our work has focused primarily on algorithms and techniques for motion detection and object tracking in human behavior monitoring scenes. After investigating the existing 3D Stereo Vision Camera-sensors and systems in the literature, we proposed and implemented a robust 3D stereo vision system for multimodal segmentation, detection, recognition, and tracking, aimed at human scene segmentation and at object detection and tracking in video sequences with complex, moving backgrounds. The study of 3D Stereo Vision Camera-sensors provides the essential understanding of 3D stereo computer vision and 3D depth information, improves the performance of 3D stereo segmentation and recognition, and supports our development of a low-cost 3D stereo vision system for multimodal segmentation, detection, recognition, and tracking.
Depth Map and 3D Imaging Applications: Algorithms and Technologies presents various 3D
algorithms developed in recent years and investigates the application of 3D methods in various
domains. Organized into five sections, the book offers perspectives on 3D imaging algorithms, 3D shape
recovery, stereoscopic and autostereoscopic vision, 3D vision for robotic applications, and 3D
imaging applications. It is an important resource for professionals, scientists, researchers,
academics, and software engineers in image/video processing and computer vision.
http://www.igi-global.com/book/depth-map-imaging-applications/52998
Abstract
The recent advancements in hardware multi-processing and 3D stereo computer vision have
driven 3D video technology toward the design and development of new stereo vision cameras
for a variety of applications. The rapid deployment of fast hardware-electronic prototyping, the
acceleration of field-programmable gate array (FPGA) technology for designing and developing
optimized, high-performance digital electronic circuitry, and the research advances in 3D stereo
computer vision and 3D stereo video processing, transmission, viewing, and playback have
opened a huge technological and industrial market for 3D stereo vision camera-sensors and
systems. A variety of range-sensing hardware devices capable of generating point clouds is now
available, including industry-standard stereo cameras such as the Point Grey Bumblebee2,
laser range-finders, the Microsoft Kinect, and even smartphones like the HTC EVO 3D. Mobile
robots use such range sensors to construct representations of their surrounding environment in
order to make safe navigation decisions, build useful maps, and acquire models of the
environment. Stereo cameras provide visual information about the environment: two
overlapping images (hence `stereo') allow us to measure the visual disparity of corresponding
features in the scene, from which we can construct a depth map, or range image, of the
visible surfaces in the environment. These surfaces are represented as a collection of discrete
3D points relative to the location of the stereo camera, often referred to as a `point
cloud.' The present paper provides an overview of the latest advancements and
technologies in stereo vision cameras.
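The disparity-to-depth step described above can be sketched in a few lines. This is a minimal illustrative example, not the pipeline of any camera discussed in this report; the focal length (in pixels) and baseline are made-up values.

```python
# Sketch: turning a disparity map into a depth map ("range image") via
# the stereo relation Z = f * B / d. The focal length fx_px and the
# baseline below are illustrative assumptions.

def disparity_to_depth(disparity, fx_px, baseline_m):
    """Per-pixel depth in meters; pixels with no match (d <= 0) get 0.0."""
    return [[fx_px * baseline_m / d if d > 0 else 0.0 for d in row]
            for row in disparity]

depth = disparity_to_depth(
    [[8.0, 4.0],
     [0.0, 2.0]],               # disparities in pixels; 0.0 = no match
    fx_px=500.0, baseline_m=0.06)
print(depth)   # roughly [[3.75, 7.5], [0.0, 15.0]] (up to float rounding)
```

Note the reciprocal relation: halving the disparity doubles the depth, which is why distant surfaces have coarse depth resolution.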
A key strength of stereo vision is that it can be used to locate an object in 3D space. It can
also give valuable information about that object (such as color, texture, and patterns that
can be used by intelligent machines for classification). A visual system, or light sensor,
retrieves a great deal of information that other sensors cannot. Stereo vision is also a passive
sensing modality: it uses only the radiation already available in its environment, and it is
non-intrusive because it does not need to transmit anything for its readings. An active sensor,
by contrast, sends some form of energy into the environment, which it then collects for its
readings; for example, a laser sends out light that it then collects, and radar sends out its own
form of electromagnetic energy. A passive sensor is ideal when one wants to avoid influencing
the environment or avoid detection.
Research Motivation
3D Stereo Vision Camera-sensors provide clear advantages over standard monoscopic
conventional camera sensors. Their main benefit is the additional optical-spatial information
gained from the side-by-side stereo capture format, which provides the disparity signal, the
3D depth information, and better handling of occlusion. The main advantages are:
• 3D Vision sensors simulate human eyes to provide both color and depth information of
each pixel in the field of view. Depth information allows researchers to develop efficient
algorithms to quickly extract out pixels within a defined distance, and identify objects of
interest from noisy visual background.
• 3D Vision accelerator provides a board-level solution for stereo disparity and depth,
relieving the computer CPU from this demanding task.
• 3D Stereo Vision can dramatically increase detection, classification, and tracking
performance.
• Laser-based 3D scanners cost thousands of US dollars, whereas inexpensive stereo vision
camera-sensors can offer a low-cost stereo vision solution.
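The first advantage above — quickly extracting pixels within a defined distance — amounts to thresholding the depth map. A minimal sketch, with illustrative depth values and an assumed 1.5 m cutoff:

```python
# Sketch: using per-pixel depth to "extract out" pixels within a defined
# distance, separating foreground objects from a noisy background.
# The depth values and the cutoff are illustrative assumptions.

def segment_by_depth(depth_map, max_depth_m):
    """Binary mask: True where the pixel is valid and closer than max_depth_m."""
    return [[0.0 < d <= max_depth_m for d in row] for row in depth_map]

depth = [[0.8, 2.4, 1.2],
         [3.0, 1.4, 0.0]]      # meters; 0.0 = invalid / no depth reading
mask = segment_by_depth(depth, max_depth_m=1.5)
print(mask)   # [[True, False, True], [False, True, False]]
```

Objects of interest can then be found by grouping the True pixels into connected regions.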
Keywords: 3D stereo vision, computer vision, depth map, stereo camera
Problem Statement
Estimation Principle
Today, many sensors are used for various purposes. The main principles used for
people/object detection are infrared, stereo camera, and time of flight.
Of the several distance-measuring methods available, we consider the stereo camera method.
This method uses two images acquired from a right-eye CMOS sensor and a left-eye CMOS
sensor: it compares the differences between the two images and calculates the distance to
the object. Compared to infrared sensors, which use only one or a few distance
measurement spots, the stereo camera uses full images to measure distance, which makes it
possible to acquire distance information for every pixel. With this capability, stereo camera
sensors can detect not only the existence of an object, but also its size and movement.
Ricoh’s assembly technology is what made this high accuracy and miniaturization possible.
To minimize module size, Ricoh adopted the smallest possible image pick-up devices.
Extraordinary assembly accuracy is required to minimize the influence of misregistration.
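The "compare the difference between the two images" step is usually a correspondence search along each scanline. The sketch below illustrates the idea with a 1-D sum-of-absolute-differences (SAD) search; real matchers use 2-D blocks, and the scanline data here is synthetic.

```python
# Illustrative 1-D SAD correspondence search: a feature at column x in
# the left image appears shifted to x - d in the right image; we search
# candidate disparities d and keep the one with the lowest SAD cost.

def sad(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def match_disparity(left_row, right_row, x, window=3, max_disp=8):
    """Estimate the disparity of the pixel at column x of the left row."""
    ref = left_row[x:x + window]
    best_d, best_cost = 0, float("inf")
    for d in range(min(max_disp, x) + 1):
        cost = sad(ref, right_row[x - d:x - d + window])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic scanlines: the right row is the left row shifted left by 2.
left  = [0, 0, 9, 7, 5, 0, 0, 0, 0, 0]
right = [9, 7, 5, 0, 0, 0, 0, 0, 0, 0]
print(match_disparity(left, right, x=2))   # recovers disparity 2
```

Running this per pixel yields the per-pixel distance information described above, since depth is inversely proportional to the recovered disparity.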
The adhesive process has a critical impact on the registration accuracy of the module.
Conventional methods typically use adhesive bonds, but Ricoh adopted laser welding (see
Table 1).

Table 1. Registration and fixing time per piece
Laser welding: 30 seconds/piece
Conventional adhesive bonding: a few minutes/piece

By using laser welding, Ricoh was able to avoid the misregistration caused by glue hardening
and by scatter during glue application. Beyond that, instant adhesion significantly shortened
the process time. Applying laser welding enabled Ricoh to achieve both miniaturization and
high accuracy.
Applications - Technologies
Being able to solve the stereo matching problem is important in two respects. Firstly, it
sheds light on how human depth perception might work. Secondly, there are
numerous applications in computer vision. For example, automated 3D visualisation of
terrain and cities has recently gained popularity. In this context, stereo-derived 3D models
can be integrated into Google Earth to allow the user to take a walk in a 3D computer
reconstruction of Vienna. In medical imaging, 3D reconstructions of organs created from
multiple 2D (MRI) images can aid in diagnosis. Apart from visualisation, stereo
reconstructions can be applied for robot navigation (autonomously driving car), but also to
assist handicapped (blind) people to navigate in their environment. Without being
exhaustive, other applications include 3D tracking (surveillance, pose estimation, human-
computer interaction), depth segmentation (z-keying), industrial applications (quality
assurance) and novel view generation (free viewpoint video), to name just a few of them.
Basically, whenever one needs to infer geometric information from the surrounding world,
stereo vision represents a low-cost and non-intrusive alternative to active devices, such as
range finders.
We have added value to our products by combining various technologies with the stereo
camera distance measurement sensor. For example, we were able to halve the auto-focus
time by using the distance sensor as an adjunct to the contrast-based auto-focus
traditionally used in digital cameras.
Beyond that, the sensor can be used as an industrial embedded module (Figure 3)
incorporating a high-speed distance calculation chip that provides a luminance image and
distance information simultaneously.
This sensor module, which achieves both miniaturization and high accuracy, is not limited to
consumer products. It can also be used as an input device for image recognition, or as an
embedded module for industrial goods.
There is large interest in stereo vision cameras and systems, with great potential in the
R&D sector for stereoscopic remote vision, robot vision, tele-medical vision, automotive
vision, and many other applications. Stereo vision camera systems also show great potential
in hybrid sensor systems, including GPS/GPRS, laser telemetry sensors, digital signal
processing and intelligent video analysis, 3D depth view and 3D object segmentation, and
infrared (IR) night vision sensors.
Stereo Vision Camera Sensors or Systems are identified in the following categories:
The remote vision work focuses on human/robotic interfaces, compact sensor gimbals, and
head-aimed vision systems for military and industrial applications.
Based on the same technological concept, Chatten Associates has also presented the HARV
3D Vision System [2], which provides an end-to-end solution for high-definition, immersive
visual tele-presence with natural stereoscopic depth perception.
This vision system shows that stereo and high definition each offer advantages over
standard-definition monoscopic camera video, particularly for remote manipulation. The
HARV-3D Head-Aimed Remote Viewer provides viewers with an immersive tele-presence
with natural stereoscopic depth perception. The HARV-3D digital video pipeline provides
low-latency stereo digital video encoding/decoding and Ethernet video and control transport
based on the H.264/AVC video codec. The HARV-3D Vision system supports Stereo HD Zoom,
Stereo SD Zoom, Micro Stereo, Mono HD Zoom, and Mono SD Zoom configurations.
HD Spatial Resolution:
Visible & NIR
Resolution: HD (720p/59.94 or 1080i/59.94*)
and SD (NTSC crop or squeeze)
Aspect Ratio: 16:9
HFOV: 63.5° wide
Lakis Christodoulou, All Rights Reserved ® 2013, Copyright © 2013, 30-Oct-2013
Electrical & Computer Engineering & Computer Science
SD Spatial Resolution:
Visible
Resolution: SD (NTSC)
Aspect Ratio: 4:3
HFOV: 40°
The HARV 3D Vision System includes a HARV-3D Gimbal paired with a HARV-3D Viewer
(wherever you look, the camera looks), with options for low-latency digital video &
transport.
The stereo vision camera reference design Capella [3] can be used to develop stereo
vision algorithms, and by anyone who wants to integrate stereo vision into a product design.
Capella features the Gumstix® Overo® COM, the Tobi base board, and the
e-CAM_9V024_STEREO camera daughter card. The e-CAM_9V024_STEREO delivers pixel-
synchronous stereo frames to the OMAP/DM37x processor.
Capella, designed and developed by e-con Systems, is the world’s first embedded pixel-
synchronous stereo vision camera reference design for Texas Instruments’ (TI) OMAP35x and
AM/DM37x processors on Gumstix® Overo® COMs.
e-con Systems offers a complete ready-to-use Stereo camera reference design along with
the software drivers and SDK. This Stereo Vision Package contains the Stereo Camera Board,
e-CAM_9V024_STEREO with pre-aligned and pre-calibrated M12 lenses assembled over
Gumstix® Tobi baseboard powered by Gumstix® Overo® COM in an ABS enclosure. All the
necessary software with V4L2 stereo camera drivers and OpenCV library will be preloaded in
the SD card of the Overo COM. A set of external interface cables, power adaptors, a mini
tripod for convenient mounting, and a CD with the necessary documentation, illustrations,
and sample stereo test applications are also included in the package.
Software support
The following picture shows the major hardware elements of the Capella stereo vision
reference design: the Gumstix® Overo® COM with the Tobi base board used in the
reference design kit, together with the e-CAM_9V024_STEREO stereo camera board and the
interfacing 27-pin flex cable.
Point Grey has presented the Bumblebee stereo vision camera systems, based on two- and
three-view vision sensing.
Point Grey Research unveiled the Bumblebee2 stereo vision camera system with faster
acquisition times, improved 3-D data quality, on-board colour processing and GPIO
connectors for external trigger and strobe functionality. The binocular Bumblebee2 contains
two 1/3-inch progressive scan CCD sensors, and transmits both the left and right images to a PC
via an IEEE-1394 interface. All Point Grey Research stereo vision products include the
Digiclops and Triclops SDK (software development kit), enabling users to control camera
settings, adjust image quality and access real-time depth range images using stereo vision
technology. Extensive sample programmes and source code are also included for ease of
integration. The camera is available in color or monochrome as a 640 x 480 at 48 FPS or a
1024 x 768 at 18 FPS option, and offers a choice of 3.8 or 6.0 mm focal length lenses.
Lenses: 2.5mm (100° HFOV), 3.8mm (65° HFOV), 6mm (43° HFOV) focal lengths
*Based on a stereo resolution of 640x480 and is valid for all camera models. Calibration
accuracy will vary from camera to camera.
Distance/depth (Z) is calculated using the above equation. The depth is measured in
real-world linear units (e.g., meters).
(r) is the focal length. The focal length can be calibrated by placing a reference object of
known linear width at a known distance from the cameras and measuring the width of the
object as seen by the camera (in pixels). The ratio between pixels and real linear distance is
used later to convert the units of (a) and (c). The distance at which this ratio is calibrated
serves as the focal length; therefore the focal length should be expressed in real linear
units (e.g., centimeters).
(a) and (c) are the optical displacements of the target; these measurements are initially in
pixels. To find (a), measure the distance between the center of camera one's view
and the center of the target; (c) is found the same way for camera two. Convert
these distances from pixels into centimeters using the reference ratio/focal length so that
the distance (Z) has the correct units.
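The equation itself is not reproduced above. The standard triangulation relation consistent with this description is Z = f · B / (a + c), where B is the baseline between the two cameras; note that B does not appear explicitly in the text, so it — and every number below — is an illustrative assumption.

```python
# Worked example of the triangulation described above, assuming the
# standard relation Z = f * B / (a + c). The baseline B and all numeric
# values are illustrative assumptions, not measurements from the text.

def depth_from_displacements(f_cm, baseline_cm, a_cm, c_cm):
    """Depth Z in cm from the focal length, baseline, and the two
    displacements (a) and (c), all already converted to cm."""
    return f_cm * baseline_cm / (a_cm + c_cm)

# Suppose calibration gave f = 50 cm, the cameras sit 6 cm apart, and
# the target is displaced 0.4 cm and 0.6 cm from the two image centers:
z = depth_from_displacements(f_cm=50.0, baseline_cm=6.0, a_cm=0.4, c_cm=0.6)
print(z)   # 300.0 cm, i.e. the target is about 3 m away
```

The example also shows the sensitivity that motivates careful unit conversion: an error of a millimeter in (a) + (c) here would shift Z by tens of centimeters.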
This whole idea works on the principle of optical displacement (parallax). The closer an
object is, the more it appears to move when the viewer changes perspective; conversely,
the farther away an object is, the less it appears to move when the viewer changes
perspective in the same way. The reason the sun is visible for so long during the day despite
the earth's motion is that the sun is far enough away that its optical displacement is small.
Cars on the highway move much slower relative to a stationary viewer than the earth does
relative to a stationary sun, but those cars move out of view much faster than the sun moves
out of the earth's view. The optical displacement of those moving cars is much greater than
the sun's optical displacement, because the distance between the target (cars) and the
viewer is so much smaller than in the other case where the sun is the target and the earth is
the viewer. Distance calculation via optical displacement measurement does not require
target dimensions to be known, making it an ideal method for any kind of exploratory
mission.
Applications
In 2010, Sharp developed a 3D camera module for mobile devices capable of capturing
high-definition (HD) 3D video, using progressive capture with 720 effective scanning lines
and a spatial resolution of 1280 (H) x 720 (V) pixels. Deploying stereo vision in mobile and
wireless handheld devices can support 3D vision and depth mapping for real-time
face-to-face communication, and can also enable 3D video games on devices such as the
iPad.
This camera module can be embedded in mobile devices such as digital cameras, mobile
phones, and smartphones; the Sharp Aquos SH-12C and the HTC EVO 3D, for example,
both include a stereoscopic (dual-lens) rear-facing 3D camera. 3D is the latest feature
people want to see on smartphones, and with devices like the LG Optimus 3D and HTC EVO
3D soon to hit the market, they now have a new player to contend with in the
Gingerbread-toting Sharp Aquos SH-12C, featuring 3D and dual cameras.
The Sharp Aquos SH-12C is not a name that rolls off the tongue easily. However, its dual
rear cameras go all the way up to 8 megapixels, compared to the HTC EVO 3D’s 5
megapixels.
HTC EVO 3D
The current HTC EVO 3D captures and displays life in 3D, no glasses required. Dual
5-megapixel camera lenses and a 720p HD camcorder with stereo sound recording let you
capture immersive footage that jumps out at you.
• The 2D/3D switch allows you to set the camera to capture photos or videos in 2D or 3D.
• The camera lenses allow you to capture high-definition photos and videos in 2D or 3D.
HTC is a bit late on the mobile 3D front, as one of its competitors, LG, has already
introduced a 3D-capable smartphone called the Optimus 3D that features an autostereoscopic
3D display as well as a 3D camera. However, HTC is now ready to offer its own alternative in
the form of the new EVO 3D smartphone, a mobile device that is surprisingly similar to the
Optimus 3D from LG. The HTC EVO 3D features a 4.3-inch autostereoscopic 3D display (no
glasses required for the 3D effect) with a 540×960 qHD resolution in 2D mode. The new
3D-capable smartphone from HTC weighs 170 grams (6.0 ounces) with the battery and
measures 126x65x12.05 mm (4.96″ x 2.56″ x 0.47″), slightly bigger than a traditional
high-end smartphone. The phone uses a fast 1.2GHz dual-core Snapdragon processor and
features a higher-capacity 1730 mAh battery to ensure longer use of the device. On the back
of the device there is a 3D camera using two 5-megapixel sensors with a dual-LED flash
between them. The 3D camera can record 5-megapixel 2D images or 2-megapixel 3D photos,
as well as 720p videos in 2D or 3D mode.
The device’s camera system lets you capture photos and videos in glasses-free 3D, which
means you can view the 3D media you’ve captured without wearing 3D glasses. To capture
photos and record videos in 3D, slide the 2D/3D switch to 3D before you take a picture or
record a video clip.
To record video in HD, set Video quality to HD 720P (1280 x 720).
Supported high-definition MP4 video formats in Gallery:
• H.263 profile 0 @ 30 fps, WVGA (800x480), max 2 Mbps
• MPEG-4 simple profile @ 30 fps, 720p (1280x720), max 6 Mbps
• H.264 baseline profile @ 30 fps, 720p (1280x720), max 6 Mbps
More information about this camera can be found in the MT9V022 product brief available from
Micron Technology Inc.
Figure 11. Stereo Camera Set-up from Xilinx Spartan-3A DSP Video Starter Kit (VSK)
The camera provided with VSK is based on a Micron MT9V022 CMOS image
sensor. The image sensor is interfaced to the FMC-Video via an RJ45 connector with
a proprietary pin assignment. A standard CAT6 Ethernet cable can be used to connect
the camera to FMC-Video.
Interface Standards/Protocols:
A typical active vision system is a camera head that can control some (or all) of the
following parameters:
• Spatial position
• Orientation of the viewing direction / neck movements (pan and tilt motors)
• Focus
• Zoom
• Iris
• Further camera parameters
http://ni.www.techfak.uni-bielefeld.de/node/2908
A USB camera with two image sensors that are triggered simultaneously for exposure.
This camera provides the raw data straight from the sensors, so you can write your own
stereo vision software for it.
Features
http://www.visionhardwarepartner.nl/products/camera-%252aslash%252a-
imaging/camera/oem-stereovision-camera.html
QuantumVision's 3-D system performs stereo vision with two image sensors
The Hammerhead 3-D machine-vision system [5] provides stereo vision without requiring
multiple separate cameras. It measures 6 × 2.37 × 1.4 in. and includes two image sensors,
integrated environmental control for cold-temperature applications, Ethernet, PoE, RS-232,
RS-422/485, USB, and industrial digital I/O. Synchronization of the left and right image
sensors is handled automatically by the hardware. It can address security and surveillance,
robotic pick-and-place, vehicle guidance, quality assurance, sorting, material handling, and
optical gauging applications. The system can improve 2-D image processing with a library of
algorithms that includes 1-D and 2-D barcodes, linear measurement tools, and pattern
matching.
QuantumVision
Four decades ago, Apollo astronauts landed on the moon and captured 3D images of the
lunar surface. Astrobotic will return to the moon and not only generate 3D imagery, but also
produce high-definition 3D video. This media is used for driving, exploration, science, and to
convey a rich, remote experience.
Red Rover is equipped with two stereo cameras that extract 3D structure and create maps of
the moon. The rover uses these maps to plan a safe path around obstacles, such as rocks or
craters. The locations of these obstacles are detected by measuring the disparity between
the obstacle’s position in the left and right stereo images. Human eyes detect the position of
objects and perceive depth in much the same way. If only one camera were used for
navigation, precise positions of obstacles relative to the rover would be very difficult to
determine.
Additionally, scientists and the public alike can move through these 3D maps to experience
what it would be like to walk on the moon. Soon all of mankind can take that “one small
step” and walk in Neil Armstrong’s footprints.
Figure 16. An example 3D anaglyph made from NASA Mars rover navigation images. The
yellow lines illustrate the disparity between matching objects in the two stereo photos. The
greater the disparity, the closer the object is to the rover.
RE2's manipulation technologies are also being used on mobile platforms. They are
developing a Robotic Nursing Assistant (RNA) to help nurses with difficult tasks, such
as helping a patient sit up, and transferring a patient to a gurney. The RNA uses a
mobile hospital platform with dexterous manipulators to create a capable tool for
nurses to use. RE2 is also working on an autonomous robotic door opening kit for
unmanned ground vehicles.
RE2 had its own set of challenges: build a robust and capable hardware and software
platform for these participants to use. The ARM robot is a two-arm manipulator with a
sensor head. The hardware, valued at around half a million dollars, includes:
• Manipulation
• Two Barrett WAM arms (7-DOF with force-torque sensors)
• Two Barrett Hands (three-finger, tactile sensors on tips and palm)
• Sensor head
http://www.ros.org/news/robots/mobile-manipulators/
Figure 17. MobileRanger™ C3D Stereovision System. Left: A custom built trinocular stereo
cluster made of 3 Point Grey gray scale cameras and 1 color camera. Middle: A TYZX
DeepSea G2 stereo network camera with on board CPU. Right: A G2 camera with side
mounted FLIR Photon thermal infrared camera.
3D MLI Sensor™
http://www.tyzx.com/products/DeepSeaG2.html
MobileRanger
http://www.mobilerobots.com/Libraries/Downloads/MobileRobots_MobileRangerC3D_Ster
eocamera_Datasheet_-_ACA0195_ACA0196_ACA0295_ACA0296.sflb.ashx
your application's needs. The stereo camera connects easily to the PCI card using one
standard CAT6 cable up to 5 meters in length. The cable carries all necessary power, data,
and control for the stereo camera. With progressive scan, global shutter, and low-noise
imaging technology, this stereo camera is ideal for a wide variety of imaging applications in
real-world environments.
Technical Specification

nDepth™ Vision Processor Subsystem
Resolution: WVGA (752x480)
Disparity Frame Rate: 30 frames per second at WVGA with 92 disparity levels
Disparity Range: Up to 124
Camera Calibration: Calibration coefficients generated at the factory; the processor rectifies and undistorts images in real time
Calibration Error: 0.1 pixel RMS error
Stereo Algorithm: Sum of Absolute Differences (SAD) with 9x9 block matching
Left/Right Check: Identifies places where the correlation is contradictory and thus uncertain
Host Interface: Standard PCI 33, direct DMA access
Processor Upgrades: Processor functionality can be upgraded in the field
6cm Baseline Stereo Vision Camera
Resolution: Two 752x480 1/3-inch wide-VGA CMOS digital image sensors
Frame Rate: Programmable up to 60 frames per second
Baseline: 6cm (contact us for custom baseline cameras)
Mounting: Includes three standard tripod mounts on the
Videre Design
a standard 1394 PCI board or PCMCIA card can be used. The card must be OHCI (Open Host
Controller Interface) compliant.
• 320x240
• 640x480
• 1280x960
http://www.videredesign.com/index.php?id=1
http://www.videredesign.com/assets/docs/manuals/smallv-4.4d.pdf
Designed and tested on vehicles as depicted in Figure 1, the system can detect both
stationary and moving pedestrians. It exploits passive sensors, which detect apparent
motion by comparing the change in infrared temperature when, for example, a human
passes in front of an infrared source at a different temperature, such as a building.
In the first of the four processing steps, the two stereo systems are used independently to
scan the target area. In this phase, different approaches are used to highlight portions of
the images that warrant further attention: for example, warm areas are detected in FIR
images, edge density is computed from FIR and daylight images, and techniques such as the
disparity space image, among others, further process the initial data.
Stereo-based computation of the scene allows the 3D position of features such as roads, as
well as their slope, distance, and size to be measured against the calibration parameters of
the system so that features incompatible with the presence of a person (or a small group of
people) can be discarded.
In the second step, areas highlighted in the two different spectra are filtered and fused by
applying symmetry, size, and distance constraints. In the third step, different models and
filters, including neural networks and adaptive boosting, are used to evaluate the presence
of human shapes.
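The step of discarding features "incompatible with the presence of a person" can be sketched with a pinhole-model size check: a person of real height H at stereo-measured depth Z should appear roughly f·H/Z pixels tall. This is an illustrative reconstruction of that idea, not the published system's filter; the focal length, height range, and candidates are assumed values.

```python
# Sketch: reject candidate regions whose pixel height is incompatible
# with a human at the depth measured by stereo. F_PX and the plausible
# height range are illustrative assumptions.

F_PX = 800.0                    # focal length in pixels (assumed)
H_MIN_M, H_MAX_M = 1.0, 2.2     # plausible human heights in meters

def plausible_person(height_px, depth_m):
    """True if a region's pixel height matches a human at that depth."""
    implied_height_m = height_px * depth_m / F_PX
    return H_MIN_M <= implied_height_m <= H_MAX_M

# (pixel height, stereo depth in meters) for three candidate regions:
candidates = [(170, 8.0), (40, 8.0), (300, 2.0)]
kept = [c for c in candidates if plausible_person(*c)]
print(kept)   # [(170, 8.0)] -- the others imply 0.4 m and 0.75 m "people"
```

The same reasoning extends to the road-slope and distance constraints mentioned above: any geometric quantity recovered from stereo can be tested against a prior model of what a pedestrian scene should look like.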
Microsoft has recently released the Microsoft Kinect for video gaming as a real-feeling
game experience. The Kinect sensor is also a major breakthrough for robotics and stereo
vision applications, thanks to the interesting stereo vision capabilities and features it
provides.
Microsoft Kinect has introduced a very smart and successful depth-sensing circuitry and
methodology. It applies the stereo vision principle by combining the captured color and
depth images: the color image can be projected back out into space to create a
"holographic" representation of the persons or objects that were captured.
The interesting part of the Kinect sensor is that it can easily be converted into a 3D camera
by combining the depth and color image streams received from the device and projecting
them back into 3D space, so that real 3D objects inside the camera's field of view
are recreated virtually at their proper sizes.
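This back-projection of combined depth and color streams can be sketched as follows. A real Kinect needs a calibrated mapping between its depth and color cameras; the sketch assumes (for illustration only) that the two streams are already registered, and the intrinsics fx, cx, cy are made-up values.

```python
# Sketch: lifting each depth pixel (u, v, Z) to a camera-frame 3D point
# and pairing it with the color at the same pixel, yielding a colored
# point cloud. Registration between the two cameras is assumed done.

def back_project(u, v, z_m, fx, cx, cy):
    """Pinhole back-projection of a pixel with depth z_m (meters)."""
    return ((u - cx) * z_m / fx, (v - cy) * z_m / fx, z_m)

def colored_cloud(depth, color, fx, cx, cy):
    return [back_project(u, v, d, fx, cx, cy) + (color[v][u],)
            for v, row in enumerate(depth)
            for u, d in enumerate(row) if d > 0]

depth = [[2.0, 0.0],
         [1.0, 4.0]]                  # meters; 0.0 = no depth reading
color = [["red", "green"],
         ["blue", "gray"]]
cloud = colored_cloud(depth, color, fx=580.0, cx=0.5, cy=0.5)
print(len(cloud))   # 3 colored points (the 0.0-depth pixel is dropped)
```

Because each point keeps its metric Z, objects reappear at their proper sizes when the cloud is rendered, which is exactly the "holographic" effect described above.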
The following figure shows the Kinect sensor-system hardware block diagram. The Kinect
contains a regular color camera, sending images of 640x480 pixels 30 times a second. It also
contains an active-sensing depth camera using a structured-light approach (using what
appears to be an infrared LED laser and a micromirror array), which also sends depth
images of 640x480 pixels 30 times a second (although it appears that not every pixel is
sampled on every frame). In total, the Kinect contains an RGB camera, a depth sensor, and a
multi-array microphone.
Figure 23. Microsoft Kinect Sensor Block Diagram – SoC Hardware Architecture
PS1080 SoC Functionality. Photo by ifixit from:
http://www.ifixit.com/Teardown/Microsoft-Kinect-Teardown/4066/1
How does a Kinect sense depth?
– The IR emitter projects an irregular pattern of IR dots of varying intensities.
– The Depth Camera reconstructs a depth image by recognizing the distortion in this pattern.
The Microsoft Kinect is an accessory for the XBOX 360 console that turns the user’s body
into the controller. It is able to detect multiple bodies simultaneously and use their
movements and voices as input. The hardware for the Kinect is comprised of a color VGA
camera, a depth sensor, and a multi-array microphone. The VGA camera is used to
determine different features of the user and space by detecting RGB colors. It is mainly used
for facial recognition of the user. The multi-array microphone is a set of four microphones
that are able to isolate the voices of multiple users from the ambient noises in the room,
therefore allowing users to be a few feet away from the device but still be able to use the
voice controls. The third component of the hardware, the depth sensor (generally referred
to as the 3D camera), has two parts to it: an infrared projector and a CMOS (complementary
metal-oxide semiconductor) sensor. The infrared projector casts out a myriad of infrared
dots that the CMOS sensor is able to “see” regardless of the lighting in the room. This is,
therefore, the most important portion of the Kinect which allows it to function. But there is
a second component that would render the Kinect quite useless otherwise: the software
that interprets the inputs from the hardware. These two components will be the main focus
of the paper.
User videos posted on YouTube show the Kinect’s large array of scattered infrared dots
literally painting the user’s living room in a swathe of green lights. The rays are cast out via
the infrared projector in a pseudo-random array across a large area. The CMOS sensor is
then able to read the depth of all of the pixels at 30 frames per second. It is able to do this
because it is an active pixel sensor (APS), which is comprised of a two-dimensional array of
pixel sensors. Each pixel sensor has a photo detector and an active amplifier. This camera is
used to detect the location of the infrared dots.
Following this, depth calculations are performed on the scene using a method called Stereo
Vision Triangulation, which requires two cameras. The depth measurement requires that
corresponding points in one image be found in the second image. Once those
corresponding points are found, we can then
find the disparity (the number of pixels between a point in the right image and the
corresponding point in the left image) between the two images. If the images are rectified
(along the same parallel axis), then, once we have the disparity, we can then use
triangulation to calculate the depth of that point in the scene.
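As a rough illustration of this correspondence search, the sketch below finds the disparity of a single pixel in a rectified pair by sliding a window along the matching row and minimizing the sum of absolute differences (SAD). The images, window size, and disparity range are synthetic illustrations, not Kinect data.

```python
import numpy as np

def disparity_sad(left, right, y, x, max_d=16, win=5):
    """Disparity of pixel (y, x) in the left image: slide a window along
    the same row of the rectified right image and pick the shift with the
    smallest sum of absolute differences (SAD)."""
    h = win // 2
    patch = left[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    best_d, best_cost = 0, float("inf")
    for d in range(max_d):
        xr = x - d                       # candidate column in the right image
        if xr - h < 0:
            break
        cand = right[y - h:y + h + 1, xr - h:xr + h + 1].astype(float)
        cost = np.abs(patch - cand).sum()
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# Synthetic rectified pair: the right view is the left view shifted 4 px.
rng = np.random.default_rng(0)
left = rng.integers(0, 255, (40, 60))
right = np.roll(left, -4, axis=1)        # true disparity = 4
print(disparity_sad(left, right, 20, 30))   # -> 4
```

Real matchers refine this with subpixel interpolation and robustness to texture-poor regions, but the core idea is the same one-dimensional search along the epipolar line.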
Stereo vision is a technique that uses two cameras to measure distances from the cameras,
similar to human depth perception with human eyes. The process uses two parallel cameras
aligned at a known distance of separation. Each camera captures an image and these images
are analyzed for common features. Triangulation is used with the relative position of these
matched pixels in the images, as seen in Figure 24 below.
Triangulation requires knowing the focal length of the camera (f), the distance between the
camera bases (b), and the center of the images on the image plane (c1 and c2). Disparity (d) is
the difference between the lateral distances to the feature pixel (v2 and v1) on the image
plane from their respective centers. Using the concept of similar triangles, the distance from
the cameras (D) is calculated as D = b * f / d.
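The relation D = b * f / d can be checked numerically. The focal length and baseline below are illustrative, roughly Kinect-like values, not calibrated parameters.

```python
# Similar-triangles depth: D = b * f / d.
# f = 580 px and b = 7.5 cm are assumed illustrative values.
def depth(f_px, baseline_m, disparity_px):
    if disparity_px <= 0:
        return float("inf")          # zero disparity -> point at infinity
    return baseline_m * f_px / disparity_px

print(round(depth(580, 0.075, 29), 3))   # a 29 px disparity -> 1.5 m
```

Note the inverse relationship: halving the disparity doubles the estimated distance, which is why depth accuracy degrades for far-away objects.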
The result for the computer vision system is a depth field map, a grayscale image of equal
size to the original image. Each gray level represents a distance from the camera. For
example, a black pixel identifies a point in the computer's vision as being at infinite
distance, while a white pixel signifies a point very near the camera. This processing can be
done on a computer, but some cameras exist that do the processing on-board using an FPGA.
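The depth-to-grayscale convention described above can be sketched as follows; the near/far clipping range here is an assumed example, not a property of any particular camera.

```python
import numpy as np

def depth_to_gray(depth_m, near=0.5, far=5.0):
    """Map metric depth to an 8-bit gray level: far maps to black (0) and
    near maps to white (255). The near/far range is an assumed example."""
    d = np.clip(np.asarray(depth_m, dtype=float), near, far)
    return (255 * (far - d) / (far - near)).astype(np.uint8)

print(depth_to_gray([0.5, 2.75, 5.0]))   # -> [255 127   0]
```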
This stereoscopic triangulation requires two cameras, but the Kinect is unique in that the
depth sensor has only one camera to perform these calculations. This is because the infrared
projector is, in and of itself, a "camera" in the sense that it has an image to compare with
the image taken from the CMOS sensor camera. The projected speckles are semi-random in
the sense that they form a generated pattern whose layout the Kinect already knows. Since
the device knows where the speckles are located, it has a reference image which can be
compared against to find corresponding points. The CMOS sensor captures an offset image
to detect differences in the scene, where the disparity between dots can be analyzed and
the depth can therefore be calculated. Assuming the images are rectified, it is simple to
calculate the depth with the equation in Figure 24. After the depth calculations are
obtained, all the data is interpreted and used in the system.
Figure 25. The Microsoft Kinect 3D Camera Sensor-System (an I.R. transmitter, 3D depth
sensors, an RGB camera, a multi-array microphone, and a motorized tilt base)
The RGB camera delivers the three basic color components, displays the video and helps
enable facial recognition. It outputs video at a frame rate of 30 Hz and uses a maximum
resolution of 640 × 480 pixels, 32-bit color.
The 3D depth sensor in Figure 25 [39] consists of an infrared laser projector which captures
video data in 3D under any lighting conditions. The laser is projected into the room. The
sensor is able to detect the information based on what is reflected back at it. Together, the
projector and sensor create a depth map. Thus, the 3D depth camera provides detailed 3D
information about the environment. Simply said, it determines how far away an object is
from the camera. It has a practical ranging limit of 1.2–3.5 m distance when used with the
Xbox software.
The infrared (IR) camera is used for tracking movement and depth. Combined with an IR
emitter, the IR camera floods the room with invisible infrared light. The eye does not see
IR light, so lighting becomes a non-issue for the Kinect. The multi-array microphone enables
voice recognition, distinguishing different voices in a room among the different players,
and it suppresses ambient noise. The four microphones are located along
the bottom of the Kinect and they dictate the size and shape of the sensor device. The
microphone array operates with each channel processing 16-bit audio at a sampling rate of
16 kHz.
The motorized tilt is a pivot for sensor adjustment to track users, even as they move
around. It is capable of tilting the sensor up to 27° either up or down, while the angular
field of view is 57° horizontally and 43° vertically.
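From these field-of-view angles one can estimate how much of the scene the sensor spans at a given distance, e.g. for a player standing 2 m away (a simple geometric check, not a published Kinect specification):

```python
import math

# Scene extent covered by a given field of view at a given distance.
def coverage(distance_m, fov_deg):
    return 2 * distance_m * math.tan(math.radians(fov_deg) / 2)

# At 2 m, the 57° x 43° field of view spans roughly 2.2 m x 1.6 m.
print(round(coverage(2.0, 57), 2))   # horizontal extent in metres
print(round(coverage(2.0, 43), 2))   # vertical extent in metres
```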
The area required to use the Kinect is approximately 6 m², although the sensor can maintain
tracking through an extended range of roughly 0.7 m to 6 m. The Kinect manual specifies
that the sensor can detect a single user at approximately 2 meters from the sensor, while
for two people, users should stand approximately 2.5 meters from the sensor.
Kinect is capable of simultaneously tracking up to six people, including two active players,
and it can track 20 joints per player in real time. However, PrimeSense, which developed the
3D depth sensors, has stated that the number of people the device can "see" (but not
process as players) is only limited by how many will fit in the field-of-view of the camera.
PrimeSensor NITE Middleware
PrimeSense, an Israeli startup, is the leader in sensing and recognition solutions, and its
product portfolio includes the PrimeSensor Reference Design hardware, a 3D data
generation unit and the PrimeSense NITE Middleware.
The PrimeSensor Reference Design (Figure 26) [40] is a low-cost, plug-and-play USB device.
This solution enables a device to perceive the world in 3D and to translate these perceptions
into a synchronized depth image, in the same way that humans do. Basically, the Reference
Design generates real-time depth, color, and audio data of the scene.
The 3D data generation unit provides the 3D sensing technology for the Kinect camera device.
It is a motion-control system that lets the players control the interface through full-body
gestures.
It is important to understand the difference between 3D cameras like the Kinect on one
hand, regular (2D) cameras on the other hand, and so-called "3D cameras" -- actually,
stereoscopic 2D cameras -- on the third hand (ouch).
Any camera, 2D or otherwise, works by projecting 3D objects (or people...), which you can
think of as collections of 3D points in 3D space, onto a 2D imaging plane (the picture) along
straight lines going through the camera's optical center point (the lens). Normally, once 3D
objects are projected to a 2D plane that way, it is impossible to go back and reconstruct the
original 3D objects. While each pixel in a 2D image defines a line from that pixel through the
lens back out into 3D space, and while the original 3D point that generated the pixel must lie
somewhere on that line, the distance that 3D point "traveled" along its line is lost in
projection. There are approaches to estimate that distance for many pixels in an image by
using multiple images or good old guesswork, but they have their limitations.
A 3D camera like a Kinect provides the missing bit of information necessary for 3D
reconstruction. For each 2D pixel on the image plane, it not only records that pixel's color,
i.e., the color of the original 3D point, but also that 3D point's distance along its projection
line. There are multiple technologies to sense this depth information, but the details are not
really relevant. The important part is that now, by knowing a 2D pixel's projection line and a
distance along that projection line, it is possible to project each pixel back out into 3D space,
which effectively reconstructs the originally captured 3D object(s). This reconstruction,
which can only contain one side of an object (the one facing the camera), creates a so-called
facade. By combining facades from multiple calibrated 3D cameras, one can even generate
more complete 3D reconstructions.
Lakis Christodoulou, All Rights Reserved ® 2013 , CopyRights© 2013 30-Oct-2013
Electrical & Computer Engineering & Computer Science
Lakis Christodoulou FINAL
There exist stereoscopic cameras on the market, which are usually advertised as "3D
cameras." This is somewhat misleading. A stereoscopic camera, which can typically be
recognized by having two lenses next to each other, does not capture 3D images, but rather
two 2D images from slightly different viewpoints. If these two images are shown to a viewer,
where the viewer's left eye sees the image captured through the left lens, and the right eye
the other one, the viewer's brain will merge the so-called stereo pair into a full 3D image.
The main difference is that the actual 3D reconstruction does not happen in the camera, but
in the viewer's brain. As a result, images captured from these cameras are "fixed." Since they
are not really 3D, they can only be viewed from the exact viewpoint from which they were
originally taken. Real 3D pictures, on the other hand, can be viewed from any viewpoint,
since that simply involves rendering the reconstructed 3D objects using a different
perspective.
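The per-pixel back-projection described earlier (recovering a 3D point from a pixel whose depth along its viewing ray is known) can be sketched with a simple pinhole model. The intrinsics below (fx, fy, cx, cy) are assumed illustrative values, not calibrated parameters of any particular camera.

```python
# Pinhole back-projection: recover the 3D point for pixel (u, v) whose depth
# Z along the viewing ray is known. The intrinsics are illustrative values.
def back_project(u, v, Z, fx=580.0, fy=580.0, cx=320.0, cy=240.0):
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return (X, Y, Z)

# The image centre at 2 m depth lies on the optical axis:
print(back_project(320, 240, 2.0))   # -> (0.0, 0.0, 2.0)
```

Applying this to every pixel of a depth image, and coloring each point from the registered color image, yields exactly the one-sided "facade" reconstruction discussed above.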
While it is possible to convert stereo pairs into true 3D images using computer vision
approaches (so-called depth-from-stereo methods), those do not work very well in practice.
Figure 27. Primesense Depth Image CMOS. (The I.R. invisible light is emitted and tracked by
the depth image CMOS. The PS1080 SoC then generates a depth image.)
This is a PrimeSense diagram explaining how their reference platform works. The Kinect is
the first (and, at the time, only) implementation of this platform.
One camera (together with the IR transmitter) provides input for the depth map (rumored
to be just 320x240), while the other camera captures the human visual spectrum at 640x480
resolution.
SAFFiR Robot
SAFFiR, also known as the Shipboard Autonomous Firefighting Robot, is being developed by
scientists at America's Naval Research Laboratory.
The robot is designed with enhanced multi-modal sensor technology for advanced
navigation and a sensor suite that includes a camera, gas sensor, and stereo IR camera to
enable it to see through smoke. Its upper body will be capable of manipulating fire
suppressors and throwing propelled extinguishing agent technology (PEAT) grenades. It is
battery powered, holding enough energy for 30 minutes of firefighting. Like a sure-footed
sailor, the robot will also be capable of walking in all directions, balancing in sea conditions,
and traversing obstacles.
http://www.nrl.navy.mil/media/news-releases/2012/nrl-designs-robot-for-shipboard-
firefighting
Cars of the near future will be advanced, multi-sensor platforms whose capabilities depend
on radar range, communication latency, and pixel resolution.
http://www.wired.com/magazine/2012/01/ff_autonomouscars/3/
1 Radar
High-end cars already bristle with radar, which can track nearby objects. For instance,
Mercedes’ Distronic Plus, an accident-prevention system, includes units on the rear bumper
that trigger an alert when they detect something in the car’s blind spot.
2 Lane-keeping
Windshield-mounted cameras recognize lane markings by spotting the contrast between the
road surface and the boundary lines. If the vehicle leaves its lane unintentionally, brief
vibrations of the steering wheel alert the driver.
3 LIDAR
Google employs Velodyne’s rooftop Light Detection and Ranging system, which uses 64
lasers, spinning at upwards of 900 rpm, to generate a point cloud that gives the car a 360-
degree view.
4 Infrared Camera
Mercedes’ Night View assist uses two headlamps to beam invisible, non-reflective infrared
light onto the road ahead. A windshield-mounted camera detects the IR signature and shows
the illuminated image (with hazards highlighted) on the dashboard display.
5 Stereo Vision
Mercedes’ prototype system uses two windshield-mounted cameras to build a real-time 3-D
image of the road ahead, spotting potential hazards like pedestrians and predicting where
they are headed.
6 GPS/Inertial Measurement
A self-driver has to know where it’s going. Google uses a positioning system from Applanix,
as well as its own mapping and GPS tech.
7 Wheel Encoder
Wheel-mounted sensors measure the velocity of the Google car as it maneuvers through
traffic.
Mercedes-Benz 6D-Vision
The “6D-Vision” system from Mercedes-Benz uses two cameras that view their surroundings
in the same manner that a human being’s two eyes do. This stereo arrangement enables 3D
depiction of the vehicle’s surroundings in real time. The system uses this information to
identify every object around the vehicle and assess the risk it might pose for a potential
collision.
They modeled their system on the function of the human eye and brain, whose abilities it
was even able to surpass. The new "6D vision" technology succeeds in identifying children
at play at the side of the road in less than 0.2 seconds – a human being takes more than
twice as long. To achieve this remarkable feat, a stereo camera records three-dimensional
images of the surroundings in front of and next to the vehicle in rapid succession. An
algorithm developed just for this purpose analyzes the images virtually instantaneously. By
comparing
the sequence of images, the system also recognizes whether and how fast objects such as
cyclists, pedestrians, or cars are moving. It even works very reliably in inclement weather
and at twilight.
Daimler will soon be including 6D vision systems in its Mercedes vehicle series – as the
basis for innovative assistance systems that recognize pedestrians, assist drivers as they pass
through blind crossings or navigate narrow highway construction sites. The research team
from Sindelfingen hopes that their innovations will find widespread acceptance in the
automotive industry – so that as many road users as possible are provided with an additional
safety feature. In an attempt to ensure that this is the case, the company plans to make the
technology available to other manufacturers. 6D vision has the potential to revolutionize
electronic vision not only in cars, but also in service robots that act independently. These
robots are designed to serve as household helpers or to assist in caring for the infirm. To do
so, they must be able to monitor their surroundings and to recognize where and how their
charge moves around. The six-dimensional look at the world provided by automotive
research makes it possible.
When cars and pedestrians meet it’s rarely the pedestrian that gets the better of the
collision. This is why Continental has announced a stereo camera system designed to help
prevent this type of accident. (“Two Eyes Are Better Than One – The Stereo Camera”)
The system uses two CMOS cameras mounted 20cm (8 inches) apart and facing through the
windshield. This separation apparently allows the distance of an object in the 20 to 30 meter
range to be determined to within 20 to 30 cm (that's 8 to 12 inches). Stereo vision is becoming
quite well known for tasks like robot guidance, so I guess it’s a logical extension to move it
into an unconstrained urban environment. What’s not clear to me is how the system works
at night. Perhaps there’s a passive IR illumination system built-in? I also wonder how it will
deal with rain, frost or snow. In fact, why not just go with a Kinect-style IR pattern projection
approach?
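A standard back-of-the-envelope formula for stereo depth uncertainty, dZ ≈ Z² · Δd / (f · b), is consistent with the quoted figures. The focal length and disparity-matching error below are assumed illustrative values, not Continental's published parameters.

```python
# Stereo depth uncertainty: dZ ≈ Z^2 * dd / (f * b), where dd is the
# disparity-matching error in pixels. f = 1000 px and dd = 0.1 px are
# assumed illustrative values, not Continental's published parameters.
def depth_error(Z_m, f_px=1000.0, baseline_m=0.2, dd_px=0.1):
    return Z_m * Z_m * dd_px / (f_px * baseline_m)

for Z in (20, 30):
    print(Z, round(depth_error(Z), 2))   # ~0.2 m at 20 m, ~0.45 m at 30 m
```

The quadratic growth with range also explains why a short 20 cm baseline is adequate for urban distances but not for highway ranges.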
“It detects objects and people who move within the visual field that a person with no visual
pathologies would have,” said Professor Vergaz, leader of the research team who has
developed the “intelligent” goggles. “Very often the patient does not detect them due to
problems of contrast. The information regarding depth is what is most missed by patients
who use this type of technical aid.”
Minoru 3D Webcam
An alternative would be simply to get a 3D webcam that has the two lenses already fixed in
place and software that synchronizes them, so you only need to capture the output in the
right 3D format. One such 3D webcam is the Minoru, for example:
The latest Fujifilm FinePix Real 3D W1 digital camera represents an advanced stereo vision
camera for digital still stereo images, as shown in the following figure. The Fujifilm 3D
Stereo Digital Camera acts like a digital human vision system (Figure …). The diagrams in
Figure … show how the screen sends a different image to each eye, similar to how we see
reality: each eye views the scene from a slightly different angle, which causes the brain to
interpret a sense of depth.
Figure 34. Fujifilm 3D Stereo Digital Camera as a Digital Human Vision System
The key advantage of the Fujifilm FinePix Real 3D W1 is that it can record both in 2D and in
3D (still images and movies). The Real 3D W1 camera uses two separate Fujinon lenses along
with two 10 megapixel CCD sensors to be able to simultaneously take two images just like
the eyes of a normal human see everything. The two images can then be combined into a 3D
stereo picture with the help of the RP (Real Photo) Processor 3D or into a 3D stereo video
clip with resolution of up to 640×480. On the back of the camera a 2.8-inch LCD screen is
available that is capable of displaying both 2D and 3D content so that you will be able to
preview the images and videos you shoot without having to worry if they are in 2D or 3D
mode. Another interesting feature of the camera, when not shooting in 3D mode, is the
ability to take two 2D pictures with different settings in order to get the best result without
having to take two different pictures one after the other. And you get all that in a compact,
good-looking camera that is very easy to use for taking both 2D and 3D images or even
shooting videos.
The Sony 3D Stereo Camera is also a digital 3D stereo camera for stereo still images [26],
[27]. The Sony Bloggie 3D can record 1080p 3D videos via its two sensors/lenses and also
has a 2.4-inch glasses-free 3D LCD. The Sony 3D Stereo Camera brings two Exmor CMOS
sensors with a spatial resolution of 1920×1080p in MP4 Full-HD format at a 16:9 aspect ratio.
The Sony Bloggie 3D camera is certainly one of the latest advanced stereo vision cameras for
capturing 2D and 3D still images at Full HD 1080p (1920×1080p) based on the advanced
H.264/MPEG-4 AVC technology. In particular, the Sony 3D Camera captures high-definition
2D or 3D videos with the MHS-FS3 3D Bloggie HD camera. 3D videos can be played directly
on the 2.4-inch LCD screen without any special 3D glasses required, or you can share the 3D
fun with others by connecting your Bloggie camera directly to any compatible 3D HDTV. You
can watch all 3D video in 2D as well. The Bloggie 3D comes with dual lenses for recording
high-definition 3D video and still images. Play back your 3D content via a 3D-capable HDMI
cable (sold separately) and a compatible 3D HDTV right from your camera. The LCD is also
3D ready, enabling you to view your 3D content directly on the camera's LCD without the
need for 3D glasses. The Bloggie 3D camera lets you record your favorite moments in High
Definition MP4 (H.264) format and features a 5MP CMOS sensor that lets you take crisp 5MP
still images (2D) and 2MP 3D still photos. The 2.4-inch LCD screen will rotate its orientation
automatically, however you hold the camera – horizontally, vertically, even upside down.
Features:
The previous generation of Sony CineAlta CCD-based Super 35mm cameras used a linear
RGB pattern.
The dual-camera setup can be adjusted in the horizontal/vertical direction, as can the
distance between the two cameras. This provides more flexibility for camera alignment, 3D
depth adjustment, and base object distance selection. Besides, it still keeps the two
webcams as two separate devices, much better than a fixed single 3D webcam design in
terms of value and functionality. And the 12 different video mixing modes provide multiple
2D and 3D viewing methods besides anaglyph 3D.
AKC D32 3D Stereo Web-camera: a stereo 3D/2D camera and 3D webcam with a microphone
This 3D webcam [28] connects easily to your PC's USB port just like any other webcam, but
that's where the similarities end. The 3D webcam software has stereoscopic anaglyphic
processing that lets you be seen in three dimensions. The red and cyan anaglyph image
produced by it can be viewed by anybody who is wearing commonly available red and cyan
3D glasses (one pair included). The 3D webcam can also be used as a standard 2D webcam
for anyone who doesn't have the 3D glasses at hand. There is also a Picture in Picture
function allowing you to show the two images separately.
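The red/cyan anaglyph processing mentioned above can be sketched in a few lines: the red channel is taken from the left view and the green/blue channels from the right view. This is a minimal illustration of the technique, not the webcam vendor's actual software.

```python
import numpy as np

def red_cyan_anaglyph(left_rgb, right_rgb):
    """Red channel from the left view, green and blue from the right view."""
    out = right_rgb.copy()
    out[..., 0] = left_rgb[..., 0]
    return out

# Tiny synthetic views: a reddish left frame and a greenish right frame.
left = np.zeros((2, 2, 3), dtype=np.uint8);  left[..., 0] = 200
right = np.zeros((2, 2, 3), dtype=np.uint8); right[..., 1] = 150
print(red_cyan_anaglyph(left, right)[0, 0])   # -> [200 150   0]
```

Viewed through red/cyan glasses, each eye then receives (approximately) only its own view, which is what produces the depth effect.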
The True3Di 3D stereoscopic microscope system [32] is made up of hardware and software:
the microscopic monitor (True3Di) and the microscopic software (SIDP). It can be used for
anatomical surgery or high-precision electronic parts assembly. It provides accurate distance
and depth perception for precise and safe operation.
Its 3D stereoscopic display enables many viewers to watch the screen simultaneously.
On each scope, the SM-045 has 2 CCD cameras that transfer the images to the monitor,
rather than requiring the operator to look through the small scope.
Professional, complete and modular Stereoscopic solutions for MIS Medical market.
Manual 3D stereoscopic vision system and integration with best surgical robots.
• 3D Stereoscopic FDA approved rigid optical scopes (10mm, 5mm(in development))
• Multiple format 3D stereoscopic Full HD visualization in real-time
• Recording and playback of Full HD 3D stereoscopic images in real-time.
• Full high definition 3D Medical Grade camera (Six 1/2" Sensors - 30fps at
1920x1080p)
• Stereoscopic 3D displays from 19" to 300", from single viewer with no glasses to
multiple viewers in full high definition (1920x1080 per each eye).
• 3D Stereoscopic HD Video Image Streaming and Satellite transmission in Real-Time
[34] http://www.solid-look.com/
[36] Two 2K Full HD cameras, mounted on a common base and running in sync, creating a
stereo Full HD camera system.
In 2006-2007, the research group of the Multimedia Lab (MLAB) at Florida Atlantic
University carried out large-scale work on 3D vision imaging systems, starting multiple R&D
projects on 3D stereo and multi-view vision systems based on advanced 3D image-video
processing algorithms, multi-view video coding and interpolation techniques, and 3D stereo
depth image processing and analysis [37], [38].
The Center for Coastline Security Technology (CCST) focuses on research, simulation, and
evaluation of coastal defense and marine domain awareness equipment, sensors and
components. It builds upon the existing efforts and expertise in coastal systems and sensor
research at the Institute for Ocean and Systems Engineering (IOSE), the NASA Imaging
Technology Center, the Department of Computer Science and Engineering and the
University Consortium for Intermodal Transportation Safety and Security at Florida Atlantic
University.
Figure 46 shows the general architecture of the CCST 3D video system. The stereo views are
encoded at the sender by exploiting the large amount of redundancies among the views.
This asymmetric view coding provides the opportunity to achieve a range of video qualities,
in which the quality of each view can be dropped so that the video bit rate is reduced
without significantly affecting the quality of the 3D observation. The MLab research group
has used the H.264/MPEG-4 AVC standard as the core compression engine with inter-view
prediction to increase compression efficiency. The coded views are communicated to the
receiver while the decoded views are rendered on an appropriate display. The 3D displays
use a pair of coded views (left and right view) to display 3D video with depth perception.
The Sharp LL-151-3D autostereoscopic display was used to render the stereoscopic videos.
The display is 15 inches with XGA resolution (1024 by 768 pixels). This display, which
renders depth very accurately, gives a true 3D experience. The perception of depth is
achieved by a parallax barrier that diverts different patterns of light to the left and right eye.
For a low-cost 3D stereo vision camera system, the MLab research group utilized Panasonic
SD/HD camera video recorders ….
The MLab-CCST research group has developed a 3D/multi-view video coding system (Fig. …)
with an initial focus on security and surveillance. The goal of this project is to develop
technologies and tools for efficient compression, communication, and playback of multi-view
and 3D video.
Figure 48 shows the general architecture of a multi-view video system. The multiple views
are encoded at the sender by exploiting the large amount of redundancies among the views.
We use H.264 as the core compression engine with inter-view prediction to increase
compression efficiency. The coded views are communicated to the receiver where the
decoded views are rendered on an appropriate display. The 3D displays use a pair of coded
views to display 3D video with depth perception.
The main technical objectives of the research, analysis, and development of the 3D
Stereo/Multi-view Camera System were to support and advance the following:
• 3D Video Compression
• 3D video improvement of surveillance applications
• Efficient compression of stereo views
• Development of a 3D video player for Sharp autostereoscopic displays – no glasses
required
• Asymmetric video encoding that exploits the human visual system
• One of the stereo views can be coded at lower quality
• Multi-view Video Coding
• Arrays of cameras record the same scene
• Video compressed by exploiting redundancies among views
The pair of color HDMAX cameras (Figure 50) was used for the 3D video imaging system. The
underlying ultra-high-resolution (UHR) camera system development, supported by the NASA
Imaging research group at FAU, the US Navy, and private industry, resulted in a QUAD HD
CMOS camera system with 8 times the resolution of HDTV at 60 FPS – no commercial camera
in the world was available with comparable specifications.
The main technical objective of the CCST research group was to develop a new advanced
HDMAX high-resolution QUAD HD progressive-scan electronic camera system (Figure 49) to
support 3D imaging and 3D video technologies for coastline security applications.
Two Quad HD cameras equipped with 50mm lenses, as shown in Fig. …, were mounted side
by side, and polarization-coded 3-D imagery was projected onto the screen with good results.
The 3D Stereo Quad HD-MAX Camera system in Figure 50 is defined by the following
research technologies:
• Thickness variations in the linear polarizers were found to introduce aberrations in the
imaging system with a subsequent reduction in image sharpness. Thinner, optically flatter
polarizing film was ordered and tested with positive results.
• Expected differences in the psychological impact of the 3D imagery were observed
when camera separation and angular orientation were changed.
• The 3-D imaging system, combining two cameras with the two projectors, was tested
on multiple occasions with different groups of people as observers.
Figure 50. The pair of color HDMAX cameras used for the
NASA-CCST 3D Stereo Video Imaging system.
3D viewing for 3D playback and a real-time viewing experience was accomplished by
interfacing the 3D Stereo HDMAX Camera System with the Sony SRX-R105 projector
configuration. The 3D Stereo HDMAX Camera System was capable of delivering 3D imaging
at the NASA Imaging Technology Center and the MLab research group at Florida Atlantic
University. This configuration gave pleasing results, although it does not provide full
realism in the display because of discrepancies between the apparent distance to an object
as determined by the angular subtense of the object and the apparent distance to the object
as determined by the angular convergence of the observer’s two eyes. In simple terms, it
distorts depth somewhat. Basic configuration information is as follows.
The objective of this segment of the project was to develop a high definition, high frame rate
color 3D Stereo HDMAX Camera System for surveillance. A 3840x2160 30P color Stereo
HDMAX CMOS camera with variable frame rate and remotely controlled infrared filter
changers was designed, fabricated, tested, and demonstrated. The camera gathers 50 times
the amount of information in its field of view as standard-resolution video cameras do.
Camera Positioning
The two HDMAX cameras, equipped with their standard 50-mm focal length Canon
camera lenses, are positioned side-by-side and separated by approximately 4 inches. At this
separation, the cameras will produce 3D imagery that exaggerates somewhat the depth of
the scene as viewed on the projection screen. The cameras should both be level and at the
same elevation. They are then aligned such that the optical axes of the cameras converge
approximately 12 feet in front of the camera pair.
Figure 51.a Pt. Hueneme image test target at 3 miles – long shot
Figure 51.b Pt. Hueneme image test target at 3 miles – 5x zoom
Figure 51.c Pt. Hueneme image test target at 3 miles – 10x zoom
Summary
This section provides technical documentation and an introductory overview of the current
research activities in the field of 3D stereo image and video analysis algorithms for
human-object segmentation, detection, recognition, and tracking systems. The current
system is being developed at Cyprus University of Technology by Lakis Christodoulou and
Takis Kasparis. The multimedia image-video processing and analysis research group at the
Dept. of Electrical and Computer Engineering and Computer Science (C&EC&CS) at CUT has
focused on developing robust techniques and methodologies, including advanced and
innovative statistical adaptive image-video processing algorithms for 3D stereo video, and
on investigating and developing the techniques and technologies needed to create and
analyze 3D images and 3D videos provided by low-cost stereo standard-definition (SD) and
high-definition (HD) cameras, with a specific focus on human surveillance and border and
coastline security applications.
Figure 4.1.1 illustrates how these research efforts fit into the overall surveillance system, which is based on a mounted low-cost stereo high-definition webcam.
For our 3D Stereo Vision Camera system we propose a low-cost stereo vision solution:
• No 3D auto-stereoscopic display required
• 3D video content evaluation based on 3D depth video processing, displayed in 2D format
Software Specs
• MATLAB R2009a V.7.2.
• Image Processing Toolbox
• Signal Processing Toolbox
• Algorithm development in MATLAB m-scripts
Introduction
The current research project proposes a novel hybrid motion object detection and
segmentation algorithm based on a statistical and adaptive threshold. Moving object
detection and segmentation is very important in intelligent video surveillance. The main
motivation of this research work is to overcome current technical difficulties of existing
motion and segmentation techniques, and realize efficient and fast detection and
segmentation algorithm. The actual driven-motivation is to use the proposed hybrid motion
object detection and segmentation algorithm for an introduced 3D Stereo/Multiview image
sensor system introducing multi-data fusion and modeling for smart video surveillance and
motion object monitoring. The introduced new hybrid object detection and segmentation
would be deployed and used in a prototype 3D Stereo vision system for automatic and
intelligent surveillance and monitoring that will provide object recognition and tracking. In
the case of stereo or multi-view video capturing, recording, processing, and analyzing it is
very important to develop and build efficient, robust, and fast detection and segmentation
algorithms. The research project will focus on developing intelligent, biologically inspired
image and video analysis algorithms that are capable of performing relevant human or other
object motion surveillance tasks based on visual information acquired from one and more
cameras.
We introduce a smart, efficient algorithm for motion detection and foreground-background segmentation, using DSP and adaptive-threshold techniques that outperform conventional moving-object detection and segmentation algorithms. The proposed algorithm is a hybrid motion technique that relies on three-frame differencing and on statistical quantities such as the mean and the variance. The algorithm also combines a foreground-background segmentation methodology with a motion detection stage. The hybrid motion algorithm has been tested and verified on a gate-entrance and access-control human surveillance scenario. Experimental results show that the improved hybrid motion algorithm overcomes the technical difficulties of the three-frame differencing method, with low computational complexity, a high detection and segmentation accuracy rate, and fast processing speed. The methodology also provides a low-cost web-camera solution for visual surveillance and automated monitoring applications that is efficient and robust for video analytics. The main contribution is a novel hybrid motion detection and segmentation algorithm for video objects, based on adaptive and statistical DSP techniques. Future work involves additional adaptive motion detection and probabilistic techniques, multi-object detection, segmentation, and recognition, and incorporating 3D stereo/multi-view processing.
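The hybrid three-frame differencing idea above can be sketched in a few lines. This Python/NumPy fragment is a hedged illustration rather than the published algorithm: the minimum-based fusion of the two frame differences and the mean-plus-k-sigma threshold are assumptions standing in for the exact rules.

```python
import numpy as np

def three_frame_diff_mask(prev, curr, nxt, k=2.0):
    """Sketch of hybrid three-frame differencing (assumed form).

    Fuses |curr - prev| and |nxt - curr| with a pixel-wise minimum and
    binarizes with an adaptive threshold T = mean + k * std of the fused
    difference image.
    """
    d1 = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
    d2 = np.abs(nxt.astype(np.float64) - curr.astype(np.float64))
    diff = np.minimum(d1, d2)   # both diffs must fire: suppresses ghosting
    thresh = diff.mean() + k * diff.std()
    return diff > thresh

# Synthetic example: a bright 20x20 block moves between frames.
rng = np.random.default_rng(0)
prev = rng.integers(0, 20, (120, 160)).astype(np.uint8)
curr = prev.copy(); curr[40:60, 60:80] += 100   # object at frame n
nxt = prev.copy();  nxt[40:60, 90:110] += 100   # object at frame n+1
mask = three_frame_diff_mask(prev, curr, nxt)
print(mask.sum(), "foreground pixels")  # 400 (the object at frame n)
```

The minimum fusion marks only pixels that changed in both consecutive differences, which localizes the object at the current frame and suppresses the trailing "ghost" left behind by simple two-frame differencing.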
Work Done
Camera Environment Modeling
Object Segmentation
Object Detection
Current Status
Disparity Map
3D Depth Map
Correspondence Matching Algorithms
3D Stereo Vision Algorithms
Evaluation Algorithms
Stereo to Left and Right video channel conversion
Next Targets
Object Classification
Object Tracking
Multi-Object Detection
Future Work
Design and implement a novel IR (infrared) stereo vision camera system with:
• The development of a remotely piloted platform for border and coastline surveillance
• An infrared, high-definition, high-frame-rate color stereo camera for surveillance
• Video analysis and image and video data mining
• Surveillance, monitoring, and detection simulation
Lakis Christodoulou, All Rights Reserved ® 2013, Copyright © 2013, 30-Oct-2013
Electrical & Computer Engineering & Computer Science
Lakis Christodoulou FINAL
3D VISION CAMERAS
Solution to challenges: range/3D vision camera
3D vision camera: 2D camera + depth perception
Low cost: one sensor vs. current multi-sensor systems:
- SRR 24 GHz (30 m) + LRR 77 GHz (150 m)
- LRR (150 m) + stereo camera (25 m)
- Laser radar (40 m)
High performance: combined range detection + obstacle classification (2D pixel array)
Two main techniques:
- Stereo vision
- Time of flight (TOF)
3D VISION CAMERAS – STEREO VISION
Range measurement using stereo vision:
▸ Depth perception is achieved by comparing and processing the scene from two different points of view
▸ Main drawbacks:
- Two sensors required
- Very complex processing (the correspondence problem)
- Sensitive to misalignment (calibration)
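Despite these drawbacks, the range measurement itself follows directly from triangulation: for a rectified pair with focal length f (in pixels), baseline B, and disparity d (in pixels), depth is Z = f·B/d. A minimal sketch, with illustrative values not taken from the text:

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: 700 px focal length, 0.10 m baseline, 20 px disparity.
print(depth_from_disparity(700.0, 0.10, 20.0))  # 3.5 (metres)
```

The inverse dependence on d explains the short useful range of stereo cameras quoted above: at long range the disparity shrinks below the accuracy of the correspondence matching.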
Research Literature
Statistical adaptive threshold
In 2011 we presented the research work "Advanced Statistical and Adaptive Threshold Techniques for Moving Object Detection and Segmentation" at the DSP2011 International Conference.
The proposed statistical adaptive algorithm is implemented in four main stages: i) statistical analysis and computation based on the spatial content of the current frame; ii) temporal n-frame differencing, consisting of the two-frame and three-frame differencing methods for object detection; iii) adaptive thresholding based on the robust statistical quantities computed in (i) and on the temporal differencing, taking into account the variations of the moving pixels; and iv) segmentation into foreground moving objects and non-moving background based on the statistical comparison of the two-frame and three-frame differencing. Figure 1 shows the processing stages of the statistical adaptive algorithm for moving-object detection and segmentation. The discussion and analysis of the algorithm is divided into four subsections.
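The four stages can be sketched end to end. This Python/NumPy fragment is an assumed reconstruction, not the published implementation: in particular the threshold form T = k·σ (with σ from the stage-(i) spatial statistics) and the AND-combination of the two differencings in stage (iv) are placeholders for the exact published rules.

```python
import numpy as np

def detect_and_segment(f_prev, f_curr, f_next, k=1.5):
    """Assumed four-stage sketch: statistics, differencing, adaptive
    threshold, foreground/background segmentation."""
    f_prev, f_curr, f_next = (a.astype(np.float64) for a in (f_prev, f_curr, f_next))
    # (i) spatial statistics of the current frame
    mu, sigma = f_curr.mean(), f_curr.std()
    # (ii) temporal differencing: two-frame and three-frame
    two_frame = np.abs(f_curr - f_prev)
    three_frame = np.minimum(two_frame, np.abs(f_next - f_curr))
    # (iii) adaptive threshold from the stage-(i) statistics (assumed form)
    T = k * sigma
    # (iv) foreground where both differencings exceed the threshold
    fg = (two_frame > T) & (three_frame > T)
    return fg, ~fg

# Synthetic frames: a bright 10x10 object jumps between positions.
prev = np.full((100, 100), 10, np.uint8)
curr = prev.copy(); curr[40:50, 40:50] = 200
nxt = prev.copy();  nxt[40:50, 55:65] = 200
fg, bg = detect_and_segment(prev, curr, nxt)
print(fg.sum(), bg.sum())  # 100 9900
```

Requiring both differencings to exceed the threshold implements the stage-(iv) comparison: the foreground mask follows the object's position in the current frame only.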
[Figure 1 block diagram: Video Input → Image Pre-processing (Block 1) → Hybrid Motion detection over frames n-1, n, n+1 via single-frame and three-frame differencing, with FrameDiff_I = n - (n-1) and FrameDiff_II = (n+1) - n (Block 2) → Statistical Analysis & Adaptive Threshold (Block 3) → Foreground/Background Object Segmentation (Block 4); the frame timeline runs n0, n1, n2, n3, …, nF-1.]
Lakis Christodoulou, Takis Kasparis, and Christos Loizou, "A Novel Hybrid Motion Object Detection and Segmentation Algorithm Based on a Statistical and Adaptive Threshold", Dept. of Electrical Engineering & Information Technology (EE&IT), Cyprus University of Technology, and Dept. of Computer Science, School of Sciences, Intercollege, Limassol, Cyprus; 4th Cyprus Workshop on Signal Processing and Informatics, University of Cyprus, New Campus, THEE001 Room 148, Nicosia, Cyprus, July 14, 2011.
http://cwspi2011.cs.ucy.ac.cy
http://www.dsp-conferences.info/4th%20CWSPI%20Final%20Program-rev.pdf
[Block diagram: Left Camera and Right Camera → stereo to left and right video channel conversion → Image Analysis & Preprocessing (IA&P) applied to the left video and right video channels → analysis algorithm. IA&P: Image Analysis & Preprocessing.]
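The "stereo to left and right video channel conversion" step listed above, assuming the camera delivers side-by-side packed frames (an assumption; the actual capture format is not specified here), reduces to splitting each frame at its horizontal midpoint before per-channel IA&P:

```python
import numpy as np

def split_side_by_side(frame):
    """Split a side-by-side stereo frame into (left, right) halves.
    Assumes left|right horizontal packing with an even width."""
    h, w = frame.shape[:2]
    if w % 2 != 0:
        raise ValueError("side-by-side frame width must be even")
    return frame[:, : w // 2], frame[:, w // 2 :]

# Toy 4x8 "frame": left half is columns 0-3, right half columns 4-7.
frame = np.arange(4 * 8).reshape(4, 8)
left, right = split_side_by_side(frame)
print(left.shape, right.shape)  # (4, 4) (4, 4)
```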
[ICCV_01MARCH2011.doc]
Published by Hindawi: http://www.hindawi.com/journals/ivp/
References
[1] http://www.chattenassociates.com/content/harv-isr-gimbal
[2] http://www.chattenassociates.com/content/harv-3d-hd-stereo-vision-system
[3] http://www.e-consystems.com/Stereo-Vision-Camera.asp
[4] http://www.ptgrey.com/products/stereo.asp
[5] http://sharp-world.com/corporate/news/100512.html
[6] http://qvcorp.com/#loc=testimonials
[7] http://www.roadnarrows.com/store/e-con-cameras-solutions/capella-stereo-vision-camera.html
[8] http://astrobotic.net/2011/02/21/stereo-vision-for-3d-mapping-and-navigation/
[10] http://www.tyzx.com/products/DeepSeaG2.html
[11] http://www.mobilerobots.com/researchrobots/accessories/mobilerangerc3d.aspx
[12] http://www.focusrobotics.com/products/systems.html
[14] http://www.videredesign.com/index.php?id=1 and http://www.videredesign.com/assets/docs/manuals/smallv-4.4d.pdf
[17] http://www.mercedes-benzsa.co.za/media-room/news/15032386811/daimler-6d-vision-technology-as-one-of-the-most-promising-innovations-in-germany/
[18] Uwe Franke, Clemens Rabe, Hernan Badino, and Stefan Gehrig, "6D-Vision: Fusion of Stereo and Motion for Robust Environment Perception", DaimlerChrysler AG, 70546 Stuttgart, Germany, {uwe.franke, clemens.rabe, hernan.badino, stefan.gehrig}@daimlerchrysler.com
[19] http://www.deutscher-zukunftspreis.de/en/nominierter/6d-vision-%E2%80%93-recognizing-danger-faster-humans
[20] http://www.6d-vision.com/
[21] http://3dvision-blog.com/tag/fujifilm-3d-camera/
[22] http://www.fujifilm.com/products/3d/camera/finepix_real3dw1/
[23] www.drt3d.com/W06-Fuji3d.pdf
[24] http://www.digitalcamerareview.com/default.asp?newsID=4756&review=sony+bloggie+3d
[26] http://thetechjournal.com/electronics/camera-electronics/sony-bloggie-3d-now-available.xhtml#ixzz1vUmcstDu
[27] http://www.docs.sony.com/release/MHSFS3_handbook.pdf
[28] http://www.hytekautomation.ca/BNE001.aspx?productId=20
[29] http://www.aliexpress.com/product-fm/513577626-Free-shipping-popular-3D-webcam-3D-stereo-camera-with-a-microphone-stereo-3D-2D-camera-wholesalers.html?tracelog=back_to_detail_a
[30] http://www.htc.com/www/smartphones/htc-evo-3d/
[31] http://dl3.htc.com/htc_na/user_guides/htc-evo-3d-sprint-ug.pdf
[32] http://www.true3di.com/3d-microscopic.html
[33] http://www.quantificare.com/index.php?option=com_content&view=article&id=9&Itemid=53
[34] http://www.solid-look.com/
[36] http://www.surveyor.com/stereo/stereo_info.html
[37] Lakis Christodoulou, Liam M. Mayron, Hari Kalva, Oge Marques, and Borko Furht, "3D TV Using MPEG-2 and H.264 View Coding and Autostereoscopic Displays", Proceedings of the 14th Annual ACM International Conference on Multimedia, 2006.
[38] Lakis Christodoulou, Hari Kalva, Liam Mayron, Oge Marques, and Borko Furht, "Challenges and Opportunities in Video Coding for 3D TV", Dept. of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431.
[39] Glennoah Billie, "Microsoft Kinect Sensor Evaluation", NASA USRP Internship Final Report, Johnson Space Center, 8/5/2011, Southwestern Indian Polytechnic Institute, Albuquerque, New Mexico, 87184.
[40] Daria Nitescu, Denis Lalanne, and Matthias Schwaller, "Evaluation of Pointing Strategies for Microsoft Kinect Sensor Device", Final Project Report, University of Bern, University of Neuchatel, University of Fribourg, 14 February 2012.
[41] http://www.digitalmanu.com/pr-e.htm
[42] True-View™ – The 3D Device for Your Smartphone, http://www.bornrich.com/true-view-the-3d-device-for-your-smartphone.html