
A Survey on Autostereogram

Sweta Kinagi1, Pooja Deshpande2, Gogte Institute of Technology, Udyambag, Belgaum - 590 008, Karnataka, India. E-mail: 1swetakinagi@gmail.com; 2poooo.bd@gmail.com

ABSTRACT:

With the development of 3D technologies, the conversion of existing 2D videos to 3D videos has become an important component of 3D content production. One of the key steps in 2D to 3D conversion is the generation of a dense depth map. For 2D to 3D video conversion, the depth map is constructed by estimating the relative depth of each pixel, so that a depth value is assigned to every point in the map. In a 3D video generation flow, the depth map plays an important role [1, 7, 8]. This survey describes some of the existing depth map generation methods for 2D to 3D conversion proposed by computer vision research communities across the world.

I. INTRODUCTION

3D image processing has become a trend in the related visual processing fields. Video object modeling and depth map generation require considerable knowledge about visual depth cues, which are the main inputs for depth map generation. A depth map, as defined in [1], is a 2D function that gives the depth (with respect to the viewpoint) of an object point as a function of the image coordinates. Usually, it is represented as a gray level image in which the intensity of each pixel registers its depth. In active acquisition, a laser element emits a light wall towards the real world scene; the light hits the objects in the scene and is reflected back. The reflection is subsequently registered and used for the construction of a depth map.

Depth map generation methods are roughly divided into two kinds: active methods and passive methods. The active methods include stereo cameras with structured light, time-of-flight sensors, depth from defocus, etc. The passive methods include depth from motion, depth from shading, depth from texture, depth from geometric perspective, etc. For the traditional conversion of 2D images to 3D, the passive methods are used, and a lot of research is going on to obtain a 3D form from a 2D video [2]. The use of a depth map as part of the 2D to 3D conversion process has a number of desirable characteristics: 1. the resolution of the depth map may be lower than that of the associated 2D image; 2. it can be highly compressed; 3. 2D compatibility is maintained; and 4. real time generation of stereo, or multiple stereo pairs, is possible.

II. LITERATURE REVIEW

S. Battiato et al. [10] summarize a way of generating a depth map based on several steps: gradient plane generation, depth gradient assignment, consistency verification of the detected regions, and finally depth map generation. The method combines depth from texture with depth from geometric perspective to generate a depth map from pictorial cues. There are three depth cues in human vision: the pictorial cue, the monocular cue, and the binocular cue. For 2D video to 3D conversion, the binocular cue can only be exploited through depth from motion; if there is no motion in the 2D video, the pictorial and monocular cues are utilized.

In [2], the researchers discuss short-term motion-assisted color segmentation for generating a depth map that is temporally and spatially smooth. The method contains four parts: motion/edge detection, K-means color segmentation, connected component analysis, and motion/image segment adaptation. The flow of the short-term motion-assisted color segmentation is shown in Fig. 1. The current frame is processed by the K-means algorithm and the motion/edge detection parts. The frame buffer stores the previous frame information, which is sent to the motion and edge detection part for motion extraction. After the K-means procedure, the connected component algorithm is applied to obtain the color silhouettes of the objects, and the edge registration part keeps the motion and edge information in memory. Finally, the motion/image segment adaptation part combines the data of the connected components with the motion/edge information to form object segments, which are then converted into a depth map.
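To make the flow concrete, the following is a minimal sketch of this kind of motion-assisted color segmentation pass in Python with OpenCV. It is not the implementation from [2]; the frame-differencing motion test, the number of clusters, and the two-level depth assignment are illustrative assumptions.

```python
# Minimal sketch of a motion-assisted color segmentation pass (not the
# implementation from [2]); assumes two 8-bit BGR frames of equal size.
import cv2
import numpy as np

def depth_from_motion_segments(prev_frame, curr_frame, k=8, motion_thresh=25):
    # Motion detection: absolute difference against the previous frame.
    diff = cv2.absdiff(cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY))
    motion_mask = (diff > motion_thresh).astype(np.uint8)

    # K-means color segmentation of the current frame.
    pixels = curr_frame.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, _ = cv2.kmeans(pixels, k, None, criteria, 3,
                              cv2.KMEANS_PP_CENTERS)
    label_map = labels.reshape(curr_frame.shape[:2]).astype(np.uint8)

    # Connected components give the color silhouettes of candidate objects.
    depth = np.zeros(curr_frame.shape[:2], np.uint8)
    for color_label in range(k):
        n, comps = cv2.connectedComponents(
            (label_map == color_label).astype(np.uint8))
        for comp_id in range(1, n):
            segment = comps == comp_id
            # Motion/image segment adaptation: segments overlapping the motion
            # mask are treated as foreground and given a nearer (brighter) depth.
            moving_ratio = motion_mask[segment].mean()
            depth[segment] = 200 if moving_ratio > 0.3 else 60
    return depth
```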

Fig. 1. Short-term motion-assisted color segmentation.

The subjective view results are shown in Fig. 2 (a)-(f). The Akko & Kayo video sequence is a complex scene with people walking on a stage, and the method separates the moving people from the complex stage. The Children video sequence shows two boys playing with a ball; the two boys are correctly separated, but their shadows on the floor are not removed. The last two pictures show the results for the Weather video sequence. In traditional online short-term segmentation methods, the announcer cannot be separated without background registration; here, the announcer can be distinguished from the synthetic background.

Fig. 2. Subjective view results: (a) original Akko & Kayo sequence, (b) the depth map of Akko & Kayo, (c) original Children sequence, (d) the depth map of Children, (e) original Weather sequence, (f) the depth map of Weather.

Thus, for depth map generation in a 3D technology system, the depth maps are generated smoothly in both the spatial and temporal domains.

In [3], a 2D to 3D conversion approach based on edge defocus and segmentation is presented. It relies on 2D wavelet analysis of Lipschitz regularity for defocus estimation on edges, and it can effectively eliminate the horizontal stripes that appear in depth maps produced by traditional one-dimensional wavelet based approaches. The main steps of the algorithm are as follows:

Step 1: The frequency energy of local regions is analyzed through wavelet transforms in local block windows. For each point, a local window of size N x N is created with the point at its center. The number of non-zero coefficients in the high frequency wavelet bands (the LH, HL, and HH bands) indicates how little the details are blurred, and therefore gives a relative depth value. The range of depth is adjusted to 0-255 (0 denotes black and 255 denotes white in the depth map). More non-zero coefficients correspond to a larger depth value, which indicates a nearer distance.

Step 2: Edge focus analysis based on the Lipschitz exponents in the 2-D wavelet domain is performed. This retains the edge contours and yields a non-striped depth map.

Step 3: Edge points are connected. Edge enhancement helps to form complete edges and lessens the errors caused by edge discontinuities when refining the depth on the basis of the Lipschitz exponents.

Step 4: The depth map is refined according to the Lipschitz exponents on edges in the 2-D wavelet domain.

Step 5: Color-based segmentation and depth optimization in each homogeneous color segment is performed. This helps to optimize the depth of foreground regions with low frequency energy and to smooth the depth map within each homogeneous color region.

In their experiments, the authors first compared their method with that of S. A. Valencia et al. [6], using the color image provided in [6], shown in Fig. 3 (g); the final results are shown in Fig. 3 (h) and (i), respectively. It can be observed that their result avoids the horizontal stripes and retains the object edges well. Foreground details such as the trees are also recognized with their relative depths. However, errors still occur in some background regions because they are surrounded by focused edges, and the segmentation parameters still influence the final depth. Segments that coincide with the edges detected by the Lipschitz singularity give good results.

Fig. 3. (g) Original image; (h) final depth map; (i) the final depth map after optimization.
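As an illustration of Step 1, the sketch below estimates a relative depth value from the count of non-zero high-frequency wavelet coefficients around each point, using PyWavelets. The wavelet choice, window size and zero-threshold are assumptions of this sketch, and the Lipschitz-based refinement of Steps 2-5 is omitted; none of these details are taken from [3].

```python
# Sketch of Step 1 only: estimate relative depth from the count of non-zero
# high-frequency wavelet coefficients in an N x N window around each point.
import numpy as np
import pywt

def depth_from_defocus(gray, wavelet="haar", N=16, eps=1e-3):
    # Single-level 2D DWT: the LH, HL and HH bands hold the detail energy.
    _, (LH, HL, HH) = pywt.dwt2(gray.astype(np.float32), wavelet)

    # 1 where any detail coefficient is non-negligible, 0 elsewhere.
    detail = ((np.abs(LH) > eps) | (np.abs(HL) > eps) |
              (np.abs(HH) > eps)).astype(np.float32)

    # Count active coefficients in sliding N x N windows via an integral image.
    h, w = detail.shape
    half = N // 2
    padded = np.pad(detail, half, mode="edge")
    integral = padded.cumsum(0).cumsum(1)
    counts = (integral[N:, N:] - integral[:-N, N:]
              - integral[N:, :-N] + integral[:-N, :-N])

    # More non-zero coefficients -> sharper -> nearer -> larger depth value.
    if counts.max() > 0:
        depth = (255 * counts / counts.max()).astype(np.uint8)
    else:
        depth = np.zeros_like(counts, dtype=np.uint8)

    # The detail bands are half-resolution; upsample back to the image size.
    return np.kron(depth, np.ones((2, 2), np.uint8))[:gray.shape[0],
                                                     :gray.shape[1]]
```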

According to [4], the depth extraction method based on motion and geometric information for 2D to 3D conversion has two major depth extraction modules: depth from motion and depth from geometric perspective. The H.264 motion estimation result is utilized and cooperates with moving object detection to diminish block effects and generate a motion-based depth map. On the other hand, a geometry-based depth map is generated by edge detection and the Hough transform. Finally, the motion-based depth map and the geometry-based depth map are integrated into one depth map by a depth fusion algorithm. For the experimental setup, a 352 x 288, H.264 encoded test stream named Hall Monitor, with dynamic foreground objects in front of a static background, was used. Fig. 4 shows the overall depth extraction process.
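A rough sketch in the spirit of [4] is given below: a geometry-based depth map derived from a horizontal mainline found by Canny edge detection and the Hough transform, fused with a motion-based depth map. The linear depth ramp and the per-pixel maximum fusion rule are assumptions for illustration; [4] describes its own depth fusion algorithm, which is not reproduced here.

```python
# Sketch: geometry-based depth from a Hough-detected horizon line, fused with
# a motion-based depth map. Parameters and the fusion rule are assumptions.
import cv2
import numpy as np

def geometry_depth(gray):
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 120)
    h, w = gray.shape
    horizon_y = h // 3  # fallback if no near-horizontal mainline is found
    if lines is not None:
        for rho, theta in lines[:, 0]:
            if abs(theta - np.pi / 2) < np.deg2rad(10):  # near-horizontal line
                horizon_y = int(np.clip(rho / np.sin(theta), 0, h - 1))
                break
    # Depth gradient: far (dark) at the horizon, near (bright) at the bottom.
    rows = np.arange(h)
    ramp = np.clip((rows - horizon_y) / max(h - 1 - horizon_y, 1), 0, 1)
    depth = np.zeros((h, w), np.uint8)
    depth[:] = (255 * ramp).astype(np.uint8)[:, None]
    return depth

def fuse(motion_depth, geom_depth):
    # Simple fusion rule: keep the nearer (larger) estimate at each pixel.
    return np.maximum(motion_depth, geom_depth)
```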

Fig. 4. The overall depth extraction process: (a) the original 2D image, (b) the binarization result of moving object detection, (c) the modified motion-based depth map, (d) mainlines, (e) the geometry-based depth map, (f) the final fused depth map.

According to [5], manual ways of generating depth maps, such as hand-drawn object outlines manually associated with an artistically chosen depth value, or semi-automatic outlining with corrections made manually by an operator, are slow, expensive, time consuming and unreliable where complex outlines are encountered. Therefore, [5] developed an efficient interactive, semi-automated process in which a special effects artist guides the generation of depth maps using a machine learning algorithm (MLA), for rapid 2D to 3D conversion.

MLA: An MLA can be considered as a black box that is trained to learn the relationships between a set of inputs and a set of outputs. As such, most MLAs consist of two stages: training and classification. For this application, the inputs relate to the position and color of individual pixels, and the five inputs of a pixel are defined as x, y, r, g, and b, where x and y represent the Cartesian coordinates and r, g, b represent the red, green and blue color components of the pixel. The output of the MLA is the depth of the pixel, denoted z. The MLA is first trained using samples with known depth; it adjusts its internal configuration to learn the relationships between the samples and their associated depths. After training comes classification: the MLA is presented with samples with unknown depth values and uses the relationships established during training to determine an output depth value. The learning process is applied in two related phases of the rapid 2D to 3D conversion process: 1. depth mapping, which assigns depths to key frames, and 2. depth tweening, which generates depth maps for the frames between the key frames mapped in the previous phase.

Depth Mapping: During the depth mapping phase, the MLA is applied to a single key frame. Figure 5 indicates how an MLA provided with a relatively small number of training samples, indicated by the depth-colored dots on the source frame, can generate an accurate depth map. In this instance 623 samples were used, which represents approximately 0.2% of the total number of pixels in the image. In more complex scenes additional training data is required, but it is rarely necessary to supply more than 5% of the image as training samples to achieve an acceptable result. In this example, the results from the MLA are composited on top of a perspective depth ramp by adding a horizon line. Depth maps are median filtered and smoothed to reduce stereoscopic rendering artifacts.
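The sketch below illustrates the depth mapping interface described above: five inputs (x, y, r, g, b) per pixel and one output z. [5] does not name a specific learning algorithm, so a k-nearest-neighbour regressor from scikit-learn is used here purely as a stand-in; the feature normalization is likewise an assumption.

```python
# Sketch of the depth-mapping phase: train a learner on sparse (x, y, r, g, b)
# samples with known depth z, then classify every pixel of the key frame.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def depth_map_from_samples(frame_rgb, sample_xy, sample_z, k=5):
    h, w, _ = frame_rgb.shape

    def features(xs, ys):
        # Five inputs per pixel: position (x, y) and colour (r, g, b).
        rgb = frame_rgb[ys, xs].astype(np.float32)
        return np.column_stack([xs / w, ys / h, rgb / 255.0])

    xs, ys = sample_xy[:, 0], sample_xy[:, 1]
    mla = KNeighborsRegressor(n_neighbors=k)
    mla.fit(features(xs, ys), sample_z)            # training stage

    # Classification stage: predict a depth for every pixel in the frame.
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    z = mla.predict(features(gx.ravel(), gy.ravel()))
    return z.reshape(h, w).astype(np.uint8)
```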

Fig. 5. (a) An example source frame; the dots indicate the positions of the training samples, and the color of each dot indicates the depth associated with that pixel. A horizon line may be used to add depth ramps. (b) The completed depth map derived from the MLA with the added depth ramp.

Depth Tweening: Depth maps are generated for key frames using the process described above. These frames are strategically located at points in an image sequence where there is a significant change in the color and/or position of objects. Key frames may be identified manually, or techniques used for detecting shot transitions may be used to automate this process. During the depth tweening phase of the rapid conversion process, MLAs are used to generate depth maps for each frame between any two existing key frames. A separate MLA is trained for each key frame source and depth pair. For any other frame in the sequence, the x, y, r, g, b values are input into both MLAs and the resulting depths z1 and z2 are combined using a normalized time-weighted sum,

z = (w1 z1 + w2 z2) / (w1 + w2),

where the weight given to each MLA decreases with the temporal distance between the current frame and the corresponding key frame. Here f is the time-code of the frame under consideration, k1 is the time-code of the first key frame and k2 is the time-code of the second key frame; the parameter P controls the rate at which the influence of an MLA decays with time.

Fig. 6. An illustration of the depth tweening process. At each key frame an MLA is trained using the known depth map of the source image. At any given tween frame the results of these MLAs are combined to generate a tweened depth map.
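The sketch below shows one plausible form of the normalized time-weighted combination. The exact weighting used in [5] is not reproduced here; an inverse-power weight that decays with temporal distance at a rate set by P is assumed instead.

```python
# Sketch of depth tweening: combine the two key-frame MLA outputs with a
# normalized time-weighted sum. The inverse-power weighting below is an
# assumed form consistent with the text, not necessarily the formula in [5].
import numpy as np

def tween_depth(z1, z2, f, k1, k2, P=1.0):
    """z1, z2: depth maps predicted for frame f by the MLAs trained on key
    frames k1 and k2; P controls how fast a key frame's influence decays."""
    w1 = 1.0 / (abs(f - k1) ** P + 1e-6)   # near key frame k1 -> large weight
    w2 = 1.0 / (abs(f - k2) ** P + 1e-6)
    z = (w1 * np.asarray(z1, np.float32) +
         w2 * np.asarray(z2, np.float32)) / (w1 + w2)
    return z.astype(np.uint8)

# Example: frame 6 between key frames 1 and 14 leans towards the first MLA.
# tweened = tween_depth(z_from_mla1, z_from_mla2, f=6, k1=1, k2=14, P=1.0)
```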

On a short sequence of 43 frames, three key frames (at frames 1, 14 and 43) were depth mapped and the remaining frames were converted by depth tweening. This 43-frame sequence was successfully depth mapped by providing around 8,000 training samples over the three key frames, which represents only 0.05% of the total number of pixels depth mapped in the sequence.

III. APPLICATIONS

2D to 3D conversion has applications in a wide range of fields, such as:

- Robot vision
- Smart phone applications
- Animations
- Interactive marketing and communications
- 3D CAD drafting, etc.

Fig. 7. Source (left) and depth map (right) generated by depth tweening at frame 6.

Fig. 7 shows the depth map generated by tweening at frame 6 using the key frames at frame positions 1 and 14. The frames that are furthest from a key frame generally contain the most errors, as the difference between the source at training and at classification is highest. The depth map in Fig. 7 accurately represents the major structure of the scene, although there are misclassification errors between the oarsman's head and the background. Similarly, Fig. 8 shows the depth map generated by tweening at frame 32 using the key frames at frames 14 and 43.

Fig. 8. Source (a) and depth map (b) generated by depth tweening at frame 32.

IV. CONCLUSION

A variety of depth map generation methods and techniques have been developed over the years. The results of these 2D to 3D conversion algorithms are the 3D coordinates of a small set of points

in the images. This group of algorithms is less suitable for the most popular 3D television applications. Depth cues based on multiple images yield, in general, more accurate results, while depth cues based on a single still image are more versatile. A single solution for converting the entire class of 2D images to 3D models does not exist; combining depth cues enhances the accuracy of the results. It has been observed that machine learning is a new and promising research direction in 2D to 3D conversion. There is a need to explore alternatives rather than confining ourselves to the conventional methods based on depth maps.

REFERENCES

[1] Q. Wei, "Converting 2D to 3D: A Survey," http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.2308.
[2] Yu-Lin Chang, Chih-Ying Fang, Li-Fu Ding, Shao-Yi Chen, and Liang-Gee Chen, "Depth Map Generation for 2D-to-3D Conversion by Short-Term Motion Assisted Color Segmentation," Proceedings of the IEEE International Conference on Multimedia and Expo, July 2007.
[3] Ge Guo, Nan Zhang, Longshe Huo, and Wen Gao, "2D to 3D Conversion Based on Edge Defocus and Segmentation," ICASSP 2008.
[4] Xiaojun Huang, Lianghao Wang, Junjun Huang, Dongxiao Li, and Ming Zhang, "A Depth Extraction Method Based on Motion and Geometry for 2D to 3D Conversion," Third International Symposium on Intelligent Information Technology Application, IEEE, 2009.
[5] Phil Harman, Julien Flack, Simon Fox, and Mark Dowley, "Rapid 2D to 3D Conversion," Proceedings of SPIE Volume 4660, Stereoscopic Displays and Virtual Reality Systems IX, San Jose, CA, 23 May 2002, pp. 78-86, ISBN/ISSN: 9780819444004.

[6] S. A. Valencia and R. M. Rodríguez-Dagnino, "Synthesizing Stereo 3D Views from Focus Cues in Monoscopic 2D Images," Proc. SPIE, vol. 5006, pp. 377-388, 2003.
[7] Xiaojun Huang, Lianghao Wang, Junjun Huang, Dongxiao Li, and Ming Zhang, "A Depth Extraction Method Based on Motion and Geometry for 2D to 3D Conversion," Third International Symposium on Intelligent Information Technology Application, IEEE, 2009.
[8] Li Sisi, Wang Fei, and Liu Wei, "The Overview of 2D to 3D Conversion System," 11th IEEE International Conference on Computer-Aided Industrial Design & Conceptual Design (CAIDCD), 17 Nov 2010, ISBN: 978-1-4244-7974-0/10.
[9] Chenglei Wu, Guihua Er, Xudong Xie, Tao Li, Xun Cao, and Qionghai Dai, "A Novel Method for Semiautomatic 2D to 3D Video Conversion," 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, May 28-30, 2008, Istanbul, Turkey.
[10] S. Battiato, S. Curti, M. La Cascia, M. Tortora, and E. Scordato, "Depth-Map Generation by Image Classification," http://wenku.baidu.com/view/d9bca4c69ec3d5bbfd0a74bc.html.
[11] Donghyun Kim, Dongbo Min, and Kwanghoon Sohn, "Stereoscopic Video Generation Method Using Motion Analysis," 3DTV Conference, IEEE, 7-9 May 2007, ISBN: 978-1-4244-0722-4.
[12] Lai-Man Po, Xuyuan Xu, Yuesheng Zhu, Shihang Zhang, Kwok-Wai Cheung, and Chi-Wang Ting, "Automatic 2D-to-3D Video Conversion Technique Based on Depth-from-Motion and Color Segmentation," 10th IEEE International Conference on Signal Processing (ICSP), 2010, ISBN: 978-1-4244-5897-4.

[13] Eduardo Ramos-Diaz, Miguel Cruz-Irisson, Luis Nino-de-Rivera, and Volodymyr Ponomaryov, "3D Color Video Conversion from 2D Video Sequence Using Stereo Matching Technique," 52nd IEEE International Midwest Symposium on Circuits and Systems, 2009, ISBN: 978-1-4244-4480-9/09.
[14] Ianir A. Ideses, Leonid P. Yaroslavsky, Barak Fishbain, and Roni Vistuch, "3D from Compressed 2D Video," Stereoscopic Displays and Virtual Reality Systems XIV, Proc. SPIE 6490, 64901C, 2007.
