
Chapter 3 Theoretical Considerations

3.1 Image and Video

Image Representation

A digital image is a representation of a two-dimensional image as a finite set of digital values called pixels, a term derived from "picture element". The image is discretized both in spatial coordinates and in brightness. Each pixel of an image corresponds to a part of a physical object in the 3D world, which is illuminated by light that is partly reflected and partly absorbed. Part of the reflected light reaches the sensor used to image the scene and is responsible for the value recorded at that pixel. The pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers (Petrou and Bosdogianni, 1999). The usual size of such an array is a few hundred pixels by a few hundred pixels, and the dimensions are commonly chosen as powers of 2, such as 512 x 512 or 256 x 256. The number of horizontal and vertical samples in the pixel grid is called the image dimensions and is specified as width x height. These values are often transmitted or stored in a compressed form. The number of bits b needed to store an image of size N x N with 2^m different grey levels is

b = N x N x m

This is why we often try to reduce m and N without significant loss in quality, because they determine the storage size. Digital images can be created in a variety of ways with input devices such as digital cameras and scanners.
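As a quick check of this formula, the short sketch below computes the storage required for a typical 512 x 512 image with 8 bits per pixel; the figures are example inputs only and are not tied to any particular device.

#include <cstdio>

int main() {
    // Example inputs: a 512 x 512 image with 2^8 = 256 grey levels (m = 8).
    const long N = 512;
    const long m = 8;
    const long bits  = N * N * m;   // b = N x N x m
    const long bytes = bits / 8;
    std::printf("%ld bits = %ld bytes = %ld KB\n", bits, bytes, bytes / 1024);
    return 0;
}

For these values b = 512 x 512 x 8 = 2,097,152 bits, i.e. 256 KB before any compression.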

Binary and Grayscale

There are many kinds of digital images, such as binary, grayscale, and color. Digital images can be classified according to the number and nature of the values of a pixel. Each pixel of an image is represented by a specific position in some 2D region. A binary image is an image that has been quantized to two values, usually denoted 0 and 1 but often stored with pixel values 0 and 255, representing black and white. A grayscale image is an image in which the value of each pixel is a single sample. Images of this sort are typically composed of shades of gray, varying from black to white depending on intensity, though in principle the samples could be displayed as shades of any color, or even coded with various colors for different intensities. An example is shown in figure 3.1. The original image (leftmost) is the letter a as a grayscale image with intensities from 0 to 255, the center image is a zoomed-in version that reveals the individual pixels of the letter, and the rightmost image shows the normalized numerical values of each pixel. In this example the coding used is that 1 (255) is brightest and 0 (0) is darkest.
Figure 3.1
Color

A color image is a digital image that includes color information for each pixel, usually stored in memory as a raster map, a two-dimensional array of small integer triplets, or as three separate raster maps, one for each channel. One of the most popular color models is the RGB model. The colors red, green, and blue were formalized by the CIE (Commission Internationale d'Eclairage), which in 1931 specified the spectral characteristics of red (R), green (G), and blue (B) to be monochromatic light of wavelengths 700 nm, 546.1 nm, and 435.8 nm respectively (Morris, 2004). Almost any color can be matched using a linear combination of red, green, and blue:

C = rR + gG + bB

Today there are many RGB standards in use, such as ISO RGB, sRGB, ROMM RGB, and NTSC RGB (Buckley et al., 1999). These standards are specifications of the RGB color space for specific applications.

Another color model is the HSV model. HSV uses three components to represent an image: the underlying color of the sample, the hue (H); the saturation or depth of the sample's color (S); and the intensity or brightness of the sample, the value (V).
Figure 3.2
RGB and HSV Colorspaces
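As an illustration of how the two color spaces in Figure 3.2 relate, the sketch below shows the commonly used RGB-to-HSV conversion. It assumes the RGB components are already normalized to [0, 1], and the function name rgbToHsv is chosen here for illustration.

#include <algorithm>
#include <cmath>

// Sketch of the usual RGB -> HSV conversion; inputs r, g, b are assumed to be
// normalized to [0, 1]. Outputs: h in degrees [0, 360), s and v in [0, 1].
void rgbToHsv(double r, double g, double b, double &h, double &s, double &v) {
    const double maxc = std::max({r, g, b});
    const double minc = std::min({r, g, b});
    const double delta = maxc - minc;

    v = maxc;                                  // value = brightness
    s = (maxc > 0.0) ? delta / maxc : 0.0;     // saturation = depth of colour

    if (delta == 0.0) { h = 0.0; return; }     // grey: hue is undefined, use 0
    if (maxc == r)      h = 60.0 * std::fmod((g - b) / delta, 6.0);
    else if (maxc == g) h = 60.0 * ((b - r) / delta + 2.0);
    else                h = 60.0 * ((r - g) / delta + 4.0);
    if (h < 0.0) h += 360.0;
}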

Resolution

The term resolution is often used as a pixel count in digital imaging. Resolution is sometimes identified by the width and height of the image as well as the total number of pixels in the image. For example, an image that is 2048 pixels wide and 1536 pixels high (2048 x 1536) contains 2048 x 1536 = 3,145,728 pixels, or 3.1 megapixels. The resolution of an image expresses how much detail we can see in it clearly and depends on N and m. It is also a measurement of sampling density: the resolution of a bitmap image gives the relationship between pixel dimensions and physical dimensions. The most commonly used measurement is ppi, pixels per inch.

Megapixels

Megapixels refer to the total number of pixels in the captured image. An easier metric is the image dimensions, which represent the number of horizontal and vertical samples in the sampling grid. An image with a 4:3 aspect ratio and dimensions 2048 x 1536 pixels contains a total of 2048 x 1536 = 3,145,728 pixels, approximately 3 million, and is thus a 3 megapixel image.

Table 3.1. Common image dimensions

Dimensions   Megapixels   Name                Comment
640x480      0.3          VGA
720x576      0.4          CCIR 601 DV PAL     Dimensions used for PAL DV and PAL DVDs
768x576      0.4          CCIR 601 PAL full   PAL with square sampling grid ratio
800x600      0.4          SVGA
1024x768     0.8          XGA                 The most common (2004) computer screen dimensions
1280x960     1.2
1600x1200    2.1          UXGA
1920x1080    2.1          1080i HDTV          Interlaced, high-resolution digital TV format
2048x1536    3.1          2K                  Typically used for digital effects in feature films
3008x1960    5.3
3088x2056    6.3
4064x2704    11.1

Scaling / Resampling

When we need to create an image with dimensions different from the one we have, we scale the image. Another name for scaling is resampling: resampling algorithms try to reconstruct the original continuous image and sample it again on a new sample grid.
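A minimal sketch of the simplest such algorithm, nearest-neighbour resampling, is given below. It operates on a plain 8-bit grayscale buffer, and the function name and layout are ours for illustration; production resamplers approximate the continuous image with better reconstruction kernels such as bilinear or bicubic interpolation.

#include <vector>

// Nearest-neighbour resampling of an 8-bit grayscale buffer stored row by row.
std::vector<unsigned char> resizeNearest(const std::vector<unsigned char> &src,
                                         int srcW, int srcH, int dstW, int dstH) {
    std::vector<unsigned char> dst(static_cast<size_t>(dstW) * dstH);
    for (int y = 0; y < dstH; ++y) {
        const int sy = y * srcH / dstH;          // nearest source row
        for (int x = 0; x < dstW; ++x) {
            const int sx = x * srcW / dstW;      // nearest source column
            dst[static_cast<size_t>(y) * dstW + x] =
                src[static_cast<size_t>(sy) * srcW + sx];
        }
    }
    return dst;
}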

Sample depth

Sample depth is the number of bits used in the binary representation of each sample of the image. The spatial continuity of the image is approximated by the spacing of the samples in the sample grid, and the values we can represent for each pixel are determined by the sample format chosen.

8bit
A common sample format is 8-bit integers. An 8-bit integer can only represent 256 discrete values (2^8 = 256), so brightness is quantized into these levels.

12bit
For high dynamic range images (images with detail in both shadows and highlights), the 256 discrete values of 8-bit samples do not provide enough precision to store an accurate image. Some digital cameras operate with more than 8-bit samples internally, and higher-end cameras also provide RAW images that are often 12-bit (2^12 = 4096).

16bit
The PNG and TIFF image formats support 16-bit samples. Many image processing and manipulation programs perform their operations in 16 bits even when working on 8-bit images, to avoid quality loss during processing. The film industry in Hollywood often uses floating point values to represent images, to preserve both contrast and information in shadows and highlights.
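The sketch below illustrates one way the promotion from 8-bit to 16-bit samples can be done before processing; the scale factor 257 is chosen so that 0 maps to 0 and 255 maps to 65535 exactly, and the function name is ours for illustration.

#include <cstdint>

// Promote an 8-bit sample to 16 bits so intermediate rounding costs less
// precision. Multiplying by 257 maps 0 -> 0 and 255 -> 65535 exactly.
inline std::uint16_t promoteTo16Bit(std::uint8_t sample) {
    return static_cast<std::uint16_t>(sample) * 257;
}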

3.2 Input and Output Devices

3.2.1 PC Camera

A PC camera, popularly known as a web camera or webcam, is a real-time camera widely used for video conferencing via the Internet. Images acquired from this device can be uploaded to a web server, making them accessible through the world wide web, instant messaging, or a PC video-calling application. Over the years, several applications have been developed, including in the fields of astrophotography, traffic monitoring, and weather monitoring. Web cameras typically include a lens, an image sensor, and some support electronics. The image sensor can be a CMOS or CCD device, the former being dominant in low-cost cameras. Consumer webcams typically offer a resolution in the VGA region at a rate of around 25 frames per second. Various lenses are available, the most common being a plastic lens that can be screwed in and out to manually adjust the camera focus. The support electronics read the image from the sensor and transmit it to the host computer.

3.2.2 Projector

Projectors are classified into two technologies, DLP (Digital Light Processing) and LCD (Liquid Crystal Display). This refers to the internal mechanism that the projector uses to compose the image (Projectorpoint).

3.2.2.1 DLP
Digital Light Processing technology, originally developed by Texas Instruments, uses an optical semiconductor known as the Digital Micromirror Device, or DMD chip, to recreate the source material. There are two ways in which DLP projection creates a color image: single-chip DLP projectors and three-chip DLP projectors. With a single DMD chip, colors are generated by placing a color wheel between the lamp and the DMD chip. The color wheel is typically divided into four sectors: red, green, blue, and an additional clear section used to boost brightness. The clear section is often omitted, since its gain in brightness comes at the cost of color saturation. The DMD chip is synchronized with the rotating color wheel, so that when a given color section of the wheel is in front of the lamp, that color is displayed by the DMD. In a three-chip DLP projector, a prism is used to split the light from the lamp. Each primary color of light is routed to its own DMD chip, then recombined and directed out through the lens. Three-chip DLP is referred to in the market as DLP2.

Advantages of DLP projectors

There are several advantages of DLP projectors over LCD projectors. First, there is less 'chicken wire' or 'screen door' effect on DLP because the pixels are much closer together. Another advantage is higher contrast compared to LCD. DLP projectors are also more portable because they require fewer components, and finally, it has been claimed that DLP projectors last longer than LCD projectors (Projectorpoint).


Disadvantages of DLP projectors

DLP projectors also have disadvantages to consider. The picture dims as the lamp deteriorates with time, and color saturation is lower. The 'rainbow effect', which is present only on single-chip DLP projectors, appears when looking from one side of the screen to the other, or when looking away from the projected image to an off-screen object (Projectorpoint). To reduce the effect, manufacturers use color wheels rotating at a much higher speed or color wheels with more color segments.

3.2.2.2 LCD

LCD projectors contain three separate LCD glass panels, one each for the red, green, and blue components of the image signal sent to the projector. As light passes through the LCD panels, individual pixels can be opened to allow light to pass or closed to block it. This activity modulates the light and produces the image that is projected onto the screen (Projectorpoint).

3.3 Image Processing

Image processing is basically the transformation of images into images. The images undergo signal processing techniques that manipulate them to the user's requirements. These techniques either enhance wanted parts of an image or suppress unwanted ones.

Preprocessing Algorithms

Preprocessing algorithms and techniques are used to perform the necessary data reduction and to make the analysis easier. This is the stage where we eliminate information that is unwanted for a specific application. Such techniques include extracting the region of interest (ROI), performing basic mathematical operations, enhancing specific features, and reducing data (Umbaugh, 2005).

• Defining Region-of-Interest

In image analysis we seldom need the whole image; we usually want to concentrate on a specified area of the image called the region of interest (ROI). Image geometry operations are used to extract the ROI. Examples of these operations include crop, zoom, and rotate (Umbaugh, 2005).

• Arithmetic and Logical Operations

Arithmetic and logical operations are applied in the preprocessing stage to combine images in different ways. These operations include addition, subtraction, multiplication, division, AND, OR, and NOT (Umbaugh, 2005).

• Spatial Filters

Spatial filtering is used for noise reduction and image enhancement. This is done by applying filter functions or filter operators in the spatial domain of the image (Umbaugh, 2005); a small averaging-filter sketch is given after this list.

• RGB to Binary Conversion

Converting RGB to binary is important because, besides making the analysis easier, it also reduces the size of the image: a binary image has only two intensity values (0 and 1), in contrast to an RGB image, which has three channels each with 256 intensity values (0 to 255).
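As a concrete example of the spatial filtering mentioned above, the sketch below applies a 3x3 averaging (mean) filter to an 8-bit grayscale buffer; the function name and buffer layout are ours for illustration, and border pixels are simply copied to keep the example short.

#include <vector>

// 3x3 averaging (mean) spatial filter on an 8-bit grayscale buffer stored
// row by row; a simple noise-reduction filter applied in the image domain.
std::vector<unsigned char> meanFilter3x3(const std::vector<unsigned char> &src,
                                         int width, int height) {
    std::vector<unsigned char> dst(src);   // border pixels stay unchanged
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += src[static_cast<size_t>(y + dy) * width + (x + dx)];
            dst[static_cast<size_t>(y) * width + x] =
                static_cast<unsigned char>(sum / 9);
        }
    }
    return dst;
}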

Thresholding

Thresholding is the process of reducing the gray levels of a monochrome image to two values and is the simplest way to do image segmentation. One of the values marks "object pixels" and the other marks "background pixels". A pixel is marked as an object pixel when its value is greater than the threshold value, and as a background pixel otherwise. Usually, an object pixel is given a value of '1' while a background pixel is given a value of '0'.

g(i, j) = 0 if f(i, j) ≤ θ
g(i, j) = 1 otherwise
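A minimal sketch of this rule, assuming an 8-bit grayscale image stored as a flat array, is shown below; the function name is ours for illustration.

#include <vector>

// g(i, j) = 0 if f(i, j) <= theta, 1 otherwise. Output uses 0 for
// background pixels and 1 for object pixels.
std::vector<unsigned char> thresholdImage(const std::vector<unsigned char> &f,
                                          unsigned char theta) {
    std::vector<unsigned char> g(f.size());
    for (size_t i = 0; i < f.size(); ++i)
        g[i] = (f[i] <= theta) ? 0 : 1;
    return g;
}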

The main difficulty in thresholding lies in selecting the correct value for the threshold. There are many ways to choose the threshold value; the simplest is to use the mean or median. This is effective provided that the object pixels are brighter than the background and brighter than the average. A next step up is to build a histogram of the frequency of occurrence of the image pixel values and use the valley point as the threshold. The histogram approach assumes that there is some average value for the background pixels and another for the object pixels, but that the actual pixel values vary around these averages. A more effective way to obtain the threshold value is to use an iterative method.

There are two ways to perform the iterative method. The first method incrementally searches through the histogram for a threshold. Starting at the lower end of the histogram, the average of the gray values less than the suggested threshold is computed and labeled L, and the average of the gray values greater than the suggested threshold is computed and labeled G. The average of L and G is then computed. If this average is equal to the suggested threshold, it becomes the threshold; otherwise the suggested threshold is incremented and the process repeats (Umbaugh, 2005).

The second method searches iteratively rather than scanning the whole histogram. First an initial threshold value is suggested; a suitable choice is the average of the image's four corner pixels. The following steps are similar to the first method; the only difference lies in how the suggested threshold is updated: in this method the updated value is set to the average of L and G (Umbaugh, 2005). A sketch of this method is given below.
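The following is a minimal sketch of this second iterative method, assuming an 8-bit grayscale image stored as a flat array; the function name and the convergence tolerance are ours for illustration, and the caller supplies the initial guess (for example, the mean of the four corner pixels).

#include <cmath>
#include <vector>

// Iterative threshold selection: split the pixels at the current threshold,
// average the means of the two groups (L and G), and repeat until the
// threshold stops changing.
double iterativeThreshold(const std::vector<unsigned char> &pixels,
                          double threshold) {
    for (;;) {
        double sumLow = 0.0, sumHigh = 0.0;
        long countLow = 0, countHigh = 0;
        for (unsigned char p : pixels) {
            if (p <= threshold) { sumLow += p;  ++countLow;  }
            else                { sumHigh += p; ++countHigh; }
        }
        // Group means; fall back to the current threshold if a group is empty.
        const double meanLow  = countLow  ? sumLow  / countLow  : threshold;
        const double meanHigh = countHigh ? sumHigh / countHigh : threshold;
        const double updated  = 0.5 * (meanLow + meanHigh);
        if (std::fabs(updated - threshold) < 0.5) return updated;   // converged
        threshold = updated;
    }
}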

3.4 Motion Detection

Image Differencing

A common method for detecting moving objects is image differencing. Differencing successive pairs of frames should reveal the pixels that differ, which should belong to the moving object. However, certain considerations complicate the matter. Regions of constant intensity and edges parallel to the direction of motion give no sign of motion (Davies, 2005). Image differencing also suffers from noise: it is prone to errors due to subtle changes in illumination, caused by environmental changes and by the digitization process of the camera, in which internal noise produces subtle differences between successive frames.

The documentation of the OpenCV library suggests using the mean of a number of frames as the reference for differencing. The mean is calculated as

m(x, y) = S(x, y) / N

and the standard deviation as

σ(x, y) = sqrt( Sq(x, y) / N − (S(x, y) / N)^2 )

where S(x, y) is the sum of the individual pixel intensities at point (x, y), Sq(x, y) is the sum of the squares of the individual pixel intensities at point (x, y), and N is the total number of frames. A pixel p(x, y) of a new frame is regarded as part of the moving object if it satisfies the condition

(m(x, y) − p(x, y)) > cσ(x, y)

where c is a constant that controls the sensitivity of the differencing. If c = 3, this is known as the 3 sigma rule (Intel, 2001).
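A minimal sketch of this scheme on flat 8-bit grayscale frame buffers is shown below; the structure and member names are ours for illustration, and the detection test uses the absolute difference |m − p|, the usual symmetric form of the inequality above.

#include <cmath>
#include <vector>

// S and Sq accumulate, per pixel, the sum and the sum of squares of
// intensities over N reference frames; a pixel of a new frame is flagged
// as moving when |m(x,y) - p(x,y)| > c * sigma(x,y). With c = 3 this is
// the "3 sigma rule".
struct BackgroundModel {
    std::vector<double> S, Sq;  // per-pixel sum and sum of squares
    long N = 0;                 // number of accumulated reference frames

    explicit BackgroundModel(size_t pixels) : S(pixels, 0.0), Sq(pixels, 0.0) {}

    void accumulate(const std::vector<unsigned char> &frame) {
        for (size_t i = 0; i < frame.size(); ++i) {
            S[i]  += frame[i];
            Sq[i] += static_cast<double>(frame[i]) * frame[i];
        }
        ++N;
    }

    // Returns a mask with 1 where the pixel is considered part of a moving object.
    std::vector<unsigned char> detect(const std::vector<unsigned char> &frame,
                                      double c = 3.0) const {
        std::vector<unsigned char> mask(frame.size(), 0);
        for (size_t i = 0; i < frame.size(); ++i) {
            const double mean  = S[i] / N;
            const double var   = Sq[i] / N - mean * mean;
            const double sigma = std::sqrt(var > 0.0 ? var : 0.0);
            mask[i] = (std::fabs(mean - frame[i]) > c * sigma) ? 1 : 0;
        }
        return mask;
    }
};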

3.5 Object Detection

3.6 OpenCV

3.7 Visual C++

3.8 .Net Windows API

References

Petrou, M., and Bosdogianni, P. (1999). Image Processing: The Fundamentals. John Wiley & Sons: New York.

Morris, T. (2004). Computer Vision and Image Processing. Palgrave Macmillan: New York.

Kolas, O. (2005) Image Processing with gluas: introduction to pixel molding. Available:
http://pippin.gimp.org/image_processing/chap_dir.html

Buckley, R., et al. (1999). Standard RGB color spaces. In the IS&T/SID Seventh Color Imaging Conference: Color Science, Systems and Applications. Scottsdale, Arizona.

DLP and LCD Projector Technology Explained. (n.d.). Retrieved June 2, 2006, from
http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm.

Webcam. (n.d.). Wikipedia. Retrieved June 03, 2006, from Answers.com Web site:
http://www.answers.com/topic/web-cam.
Davies, E. (2005). Machine Vision: Theory, Algorithms, Practicalities. Elsevier: CA.

Intel (2001). Open source computer vision library reference manual. Available:
http://developer.intel.com

Umbaugh, S. (2005). Computer Imaging: Digital Image Analysis and Processing. CRC
Press: Boca Raton, Florida.

Shapiro, L. and Stockman, G. (2001). Computer Vision. Prentice Hall. Upper Saddle
River, New Jersey.

Sites: http://www.microscope-microscope.org/imaging/image-resolution.htm
