
Digital Image Processing

Prof. P. K. Biswas


Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 1
Introduction

Welcome to the course on digital image processing. I am Dr. P. K. Biswas, Professor in the Department of Electronics and Electrical Communication Engineering, IIT Kharagpur, and I will be the faculty for this course on digital image processing.

Now, in today’s lecture, we will have an introduction to the various image processing techniques and their applications and in subsequent lectures, we will go into the details of different image processing algorithms. To start with, let us see what digital image processing means.

(Refer Slide Time: 1:40)

So, if you just look at this name - digital image processing; you find that there are 3 terms. The first one is processing, then image, then digital. So, digital image processing means processing of images which are digital in nature by a digital computer. Before we come to the other details, let us see why we need to process the images.

(Refer Slide Time: 2:20)

So, you find that digital image processing techniques are motivated by 2 major applications. The first application is improvement of pictorial information for human perception. This means that whatever image we get, we want to enhance the quality of the image so that the image will have a better look and appear much better when you look at it.

The second important application of digital image processing techniques is for autonomous machine applications. This has various applications in industries, particularly for quality control in assembly automation and many such applications. We will look at them one after another. And of course, there is a third application which is efficient storage and transmission. Say for example, if we want to store an image on a computer, then this image will need a certain amount of disk space.

Now, we will look at whether it is possible to process the image using certain image properties so that the disk space required for storing the image will be less. Not only that, we can also have applications where we want to transmit the image or the video signal over a transmission medium and in that case, if the bandwidth of the transmission medium is very low, we will see how to process the image or the video so that it can be transmitted over low bandwidth communication channels. So, let us first look at the first major application which is meant for human perception.

(Refer Slide Time: 4:18)

Now, these methods mainly employ the different image processing techniques to enhance the pictorial information for human interpretation and analysis. A typical application of these kinds of techniques is noise filtering. In some cases, the images that you get may be very noisy. So, we have to filter those images so that the noise present in them can be removed and the image appears much better.

In some other kinds of applications, we may have to enhance certain characteristics of the image. So, among the different kinds of applications under this category, one is contrast enhancement. Sometimes the image may have very poor contrast and we have to enhance the contrast of that image so that it is better visually.
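
As a side illustration (my own, not part of the lecture), a minimal contrast-stretching sketch in Python with NumPy might look like the following; the simple min-max stretch and the function name are assumptions, and the enhancement methods covered later in the course are more elaborate.

```python
import numpy as np

def stretch_contrast(image: np.ndarray) -> np.ndarray:
    """Linearly stretch intensities of an 8-bit grayscale image to the full 0-255 range."""
    img = image.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # flat image: nothing to stretch
        return image.copy()
    stretched = (img - lo) / (hi - lo) * 255.0
    return stretched.astype(np.uint8)

# Example: a synthetic low-contrast image with values only between 100 and 140
low_contrast = np.random.randint(100, 141, size=(64, 64), dtype=np.uint8)
enhanced = stretch_contrast(low_contrast)
print(low_contrast.min(), low_contrast.max())   # roughly 100 and 140
print(enhanced.min(), enhanced.max())           # 0 and 255 after stretching
```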

In some other cases, the image may be blurred. This blurring may occur because of various reasons. Maybe the camera setting is not proper or the lens is not focused properly. So, that leads to one kind of blurring. The other kind of blurring can occur if we take a picture from a moving platform; say for example, from a moving car or from a moving train. In that case also, you might have observed that the image that you get is not a clear image but a blurred image.

So, we will look at whether image processing techniques can help to rectify those images. The other kind of application is remote sensing. In remote sensing, the types of images which are used are aerial images and in most of the cases, the aerial images are taken from a satellite.

Now, let us look at the different examples under these different categories.

(Refer Slide Time: 6:14)

Here you find that you have a noisy image. The first image that is shown in this slide is a noisy image and this kind of image is quite common on a TV screen. Now, you find that digital image processing techniques can be used to filter these images and the filtered image is shown on the right hand side; you find that the filtered image looks much better than the noisy image that we have shown on the left side.

(Refer Slide Time: 6:54)

In the second category, for image enhancement; you find that again on the left hand side we have an image and on the right hand side, we have the corresponding image which is processed to enhance its contrast. If you compare these two images, you find that in the low contrast image there are many details which are not clearly visible. Say for example, the water line of the river. Simultaneously, if you look at the right image which is the processed and enhanced version of the low contrast image, you find that the water lines of the river are clearly visible. So, here after processing, we have got an image which is visually much better than the original low contrast image.

(Refer Slide Time: 7:42)

There is another example of image enhancement here. On the left hand side, we have a low
contrast image but in this case it is a color image and I am sure that none of you will like to have
a color image of this form. On the other hand, on the right hand side, we have the same image
which is enhanced by using the digital image processing techniques and you find that in this
case, again the enhanced image is much better than the low contrast image.

(Refer Slide Time: 8:15)

I talked about another kind of enhancement where we have said that in some applications, the image may be blurred. So here, on the top row, on the left side, you find an image which is blurred and in this case, the blurring has occurred because of a defocused lens. When the image was taken, the lens was not focused properly.

And, we have talked about another kind of blurring which occurs if you take a picture from a moving platform, maybe from a moving train or from a moving car. In such cases, the type of image that we usually get is the kind of image that we have shown on the right hand side of the top row.

And, here you find that this kind of blurring is mostly motion blurring, and the third image, on the bottom row, shows the processed image where by processing these different blurred images, we have been able to improve the quality of the image.

(Refer Slide Time: 9:23)

Now, another major application of digital image processing techniques is in the area of medicine. I am sure that many of you must have seen the CT scan images where the images of the human brain are formed by using the CT scan machines. Here, it shows one slice of a CT scan image and the image is used to determine the location of a tumor.

So, you find that the left hand image is the original CT scan image and the middle image and the image on the right hand side are the processed images. So, here in the processed images, the yellow and red regions tell you about the presence of a tumor in the brain.

Now, these kinds of images and image processing techniques are very important in medical applications because by these processing techniques, the doctors can find out the exact location of the tumor, the size of the tumor and many other things which can help the doctor to plan the operation process and obviously this is very important because in many cases, it saves lives.

(Refer Slide Time: 10:44)

This is another application of the image processing techniques in the medical field where we have shown some mammogram images which show the presence of cancerous tissue. So, image processing techniques in the medical field are very helpful to detect the formation of cancers.

(Refer Slide Time: 11:07)

This shows another kind of image which is very popular and I believe most of you have heard the name ultrasonography. So, here we have shown 2 ultrasonic images which are used to study the growth of a baby while the baby is in the mother’s womb and this also helps the doctor to monitor the health of the baby before the baby is actually born.

(Refer Slide Time: 11:39)

Image processing techniques are also very important for remote sensing applications. So here, this is a satellite image which is taken over the region of Calcutta and you find that there is a lot of information present in the image; the thick blue line shows the river Ganges and there are different color codings used for indicating different regions. When we have a remote sensing image, an aerial image of this form which is taken from a satellite, we can study various things.

For example, we can study whether the river has changed its path, we can study what the growth of vegetation over a certain region is, we can study if there is any pollution in some region in that area. So, these are various applications of these remote sensing images. Not only that, such remote sensing images or aerial images can also be used for planning a city.

Suppose we have to build a city over a certain region; then through these aerial images what we can study is the nature of the region over which the city has to be built and through this, one can determine where the residential area has to be grown, where an industrial area has to be grown, through which regions the roads have to be constructed, where you can construct a car parking region, and all those things can be planned if you have an aerial image like this.

(Refer Slide Time: 13:27)

Here is another application of remote sensing images. So, here you find that the remote sensing images are used for terrain mapping. So, this shows the terrain map of a hilly region which is not accessible very easily. So, what we can do is we can get the images of that region from the satellite, then process those images to find out the 3D terrain map, and here this particular image shows such a terrain map of a hilly region.

(Refer Slide Time: 14:04)

This is another application of remote sensing images. Here you find that this particular satellite image shows a fire which took place in Borneo. You see that these kinds of images are useful to find out what the extent of the fire is or in which direction the fire is moving and once you identify that, you can determine what loss has been caused in the wake of this fire. Not only that; if we can find out the direction in which the fire is moving, we can warn the people well beforehand so that precautionary action can be taken and many lives as well as much property can be saved.

(Refer Slide Time: 14:51)

Image processing techniques are also very important for weather forecasting. I am sure that whenever you look at the news on a television channel, when the weather forecast is given, some images are overlaid on a map which tell you what the cloud formation in certain regions is. That gives you an idea of whether there is going to be some rain, whether there are going to be some storms and things like that.

This is an image which shows the formation of hurricane Dennis which happened in 1990 and through this image, we can find out what the extent of this hurricane is, what the strength of this hurricane is and what precautionary measures can be taken to save lives as well as property beforehand. Image processing techniques are also very useful for atmospheric study.

(Refer Slide Time: 15:51)

So, if you look at this image, you find that in the central part of the image what has been shown is the formation of an ozone hole. Many of you know that the ozone layer is very important for us because it gives us a protective layer over our atmosphere and because of this protective ozone layer, many of the unwanted rays from the sun cannot enter our earth’s atmosphere and thereby our health is protected.

Whenever there is formation of such an ozone hole, this indicates that all those unwanted rays can enter the earth’s surface through that ozone hole. So, in the region over which such an ozone hole is formed, people of that region have to take some precautionary measures to protect themselves against such unwanted radiation. So, this is also very important. Such image processing techniques are very important for atmospheric study.

(Refer Slide Time: 16:56)

Image processing techniques are also important for astronomical studies. Say for example, in this
particular case, it shows the image of a star formation process.

(Refer Slide Time: 17:08)

Again, the next image, it shows the image of a galaxy. So, you find that the application of the
image processing techniques is becoming unlimited. So, these are applied in various fields for
various purposes.

(Refer Slide Time: 17:27)

Next we come to the other domain of application of image processing techniques which is the
machine vision applications. You find that all the earlier applications which we have shown;
there the purpose was the visualization, the improvement of the visual quality of the image so
that it becomes better for human perception.

When it comes to machine vision applications, the purpose of image processing techniques is different. Here, we are not much interested in improving the visual quality. What we are interested in is processing the images to extract some description or some features which can be used for further processing by a digital computer, and such processing can be applied in industrial machine vision for product assembly and inspection. It can be used for automated target detection and tracking. This can be used for fingerprint recognition. This can also be used for processing of aerial and satellite images for weather prediction, crop assessment and many other applications.

So, let us look at these different applications one after another.

(Refer Slide Time: 18:53)

So, this shows an application of automation of a bottling plant. What the plant does is it fills some liquid, some chemical, into bottles and after they are filled up, the bottles are carried away by the conveyor belts and after that these are packed and finally sent to the customers.

So, here checking the quality of the product is very important and in this particular application, the quality of the product indicates whether the bottles are filled properly or some bottles are coming out empty or partially filled. So naturally, the application will be that if we can find out that some bottles are partially filled or some bottles are empty, then naturally we do not want those bottles to be delivered to the customers because if the customer gets such bottles, then the goodwill of that company will be lost.

So, detection of the empty bottles or partially filled bottles is very important and here image processing techniques can be used to automate this particular process. So here, you find that we have shown an image, a snapshot of this bottling process, where you find that there are some bottles which are completely filled up and one bottle in the middle which is partially filled.
So naturally, we want to detect this particular bottle and remove it from the production line so that finally when the bottles go to the customer, no empty bottle or partially filled bottle is given to the customers.

(Refer Slide Time: 20:38)

Let us see another application of image processing for machine vision purposes. Now, before I go to that application, I have shown an image to highlight the importance of boundary information in image processing. So here, you find that we have shown the boundary image of an animal. There is no other information available in this image except the boundary contours and if I ask you whether you can identify this particular animal, I am sure that all of you will identify it to be a giraffe.

So, you find that even though we do not have any other information except the boundary or the border of the giraffe, still we have been able to identify this particular animal. So, in many cases or in most of the cases, the boundaries contain most of the information about the objects present in the scene and using this boundary information, we can develop various applications of image processing techniques. Here is an application.

(Refer Slide Time: 21:44)

So, this is again an automated inspection process and here the objects that we are interested to inspect are some refractive kits. So here, you find that we have shown 4 different images. The first one is the original image of the refractive kit which is captured by the camera. The second one is what we call a thresholded image or a segmented image; we will come to the details of this later. Here we have been able to identify which regions actually belong to this object and which regions belong to the background.

Naturally, when we are interested in inspecting this particular object, we will not be interested in the background region. What we will be interested in is the region that belongs to the particular object. So, this background and object separation process is very important in all these kinds of applications.
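
As a rough illustration of this background and object separation (my own sketch, not from the lecture), a simple global threshold in Python/NumPy could look like this; the fixed threshold value is an assumption, and the actual segmentation methods are discussed in later lectures.

```python
import numpy as np

def threshold_segment(image: np.ndarray, t: int) -> np.ndarray:
    """Return a binary mask: True where the pixel is brighter than t (object), False otherwise (background)."""
    return image > t

# Example: a bright object (value 200) on a dark background (value 30)
img = np.full((100, 100), 30, dtype=np.uint8)
img[30:70, 30:70] = 200
mask = threshold_segment(img, t=128)
print(mask.sum())   # 1600 object pixels (the 40 x 40 square)
```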

The third image, that is the left one on the bottom, is a filled image. You find that the second image is not very smooth; there are a number of black patches over the white region. So, the third one has filled up all those black patches and it shows the profile of the object, the 2D projection of the object that we can get.

And the fourth image, you find that it shows the boundary of this object and using this boundary information, we can inspect various properties of this particular object. For example, in this particular application there can be 2 different types of defects. One kind of defect is the structural defect.

Now, when we say structural defect, by structure what I mean is the dimension of every side of the object and the angle at every corner of the object; these are the structural pieces of information of that particular object. The other kind of inspection that we are interested to do is on the surface characteristics of this particular object; whether the surface is uniform or non-uniform. So, let us see how these inspections can be made.

(Refer Slide Time: 24:13)

So, here you find that in the first image, what we have done is we have processed the boundary image in such a way that, since there are 4 different boundary regions, we have fitted 4 different straight lines and these 4 straight lines tell you what the ideal boundary of the object should be. And, once you get these 4 straight lines, using them we can find out the points of intersection of these 4 straight lines and using these points of intersection, we know that in the ideal situation, those points of intersection are actually the locations of the corners of the object.

So, you find that in the first image, there are 4 white dots which indicate the corners of the object and once you get this information - the corners of the object and the boundary lines of the object - we can find out the dimension or the length of each and every side of the object. We can also find out the angle subtended at every corner of the object.
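
To illustrate this step (my own hedged sketch, not the lecture's implementation), the following Python/NumPy code computes the intersection of two fitted boundary lines and the angle subtended at that corner; the (a, b, c) line representation and the example lines are assumptions.

```python
import numpy as np

def intersect(l1, l2):
    """Intersection of two lines given as (a, b, c) with a*x + b*y + c = 0."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel")
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return x, y

def corner_angle(l1, l2):
    """Angle in degrees between the two boundary lines, i.e. the angle subtended at the corner."""
    n1 = np.array(l1[:2], dtype=float)
    n2 = np.array(l2[:2], dtype=float)
    cosang = abs(np.dot(n1, n2)) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# Two ideal boundary lines: x = 10 and y = 20 meet at (10, 20) with a 90 degree corner
line_a = (1.0, 0.0, -10.0)
line_b = (0.0, 1.0, -20.0)
print(intersect(line_a, line_b))     # (10.0, 20.0)
print(corner_angle(line_a, line_b))  # 90.0
```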

And from this, if I compare the information that we have obtained through image processing with the information which is already stored in the database for this particular object, we can find out whether the dimensions that we have got are within the tolerable limit or not. So, if they are within the tolerable limit, then we can accept the object. If they are not within the tolerable limit, then we will not accept the object.

Now, if you look at the original image once more, you will find that there are 2 different corners, the corner on the right hand side and the corner on the left hand side, which are broken. Not only that; on the left hand side, if you look at the middle, you can identify that there is a certain crack. So, these are also defects of this particular object and through these image processing techniques, we are interested to identify these defects.

Now, let us see how these defects have been identified. So, here again, in the first image, once we have got the ideal boundary and ideal corners of the object, we can fill up the region bounded by these 4 different edges to get an ideal projection of the object. So, the second image in this particular slide shows you what the ideal projection is. The third image shows you what the actual projection is that has been obtained after applying the image processing techniques.

Now, if we take the difference of this ideal projection and the actual projection, then we can identify these defects. So, you find that in the fourth image, the 2 different corner breaks have been represented by white patches and also on the left hand side in the middle, you can see that the crack is also identified. So, these image processing techniques can be used for inspection of industrial objects like this.
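
A minimal sketch of this idea (my own, with assumed array shapes and values) compares the ideal and actual binary projections and marks the disagreeing pixels as defects:

```python
import numpy as np

def defect_map(ideal: np.ndarray, actual: np.ndarray) -> np.ndarray:
    """Pixels where the ideal binary projection and the actual one disagree are marked as defects."""
    return np.logical_xor(ideal.astype(bool), actual.astype(bool))

# Example: the actual projection has one missing (broken) corner
ideal = np.ones((50, 50), dtype=np.uint8)
actual = ideal.copy()
actual[:10, -10:] = 0            # broken top-right corner
defects = defect_map(ideal, actual)
print(defects.sum())             # 100 defect pixels
```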

(Refer Slide Time: 27:36)

And as we mentioned, the other kind of inspection that we are interested in is of the surface characteristics; whether the surface is uniform or non-uniform. So, when we want to find out or study the surface characteristics, the type of processing technique which will be used is called texture processing. And, this one shows that the surface of the object is not really uniform; rather it contains 2 different textures and in the right image, those two textures are indicated by 2 different gray shades.

(Refer Slide Time: 28:16)

This shows the application of image processing techniques for automated inspection in other applications, for example, the inspection of integrated circuits during the manufacturing phase. Here you find that in the first image, there is a broken bond, whereas in the second image some bond is missing which should have been there. So, naturally these are the defects which ought to be identified because otherwise, if this IC is made, then the IC will not function properly.

(Refer Slide Time: 28:55)

So, those are the applications which are used for machine vision, for automating some operation, and in most of the cases it is used for automating the inspection process or automating the assembly process. Now, we have another kind of application, obtained by processing a sequence of images which is known as a video sequence. A video sequence is nothing but different image frames which are displayed one after another.

So naturally, when the image frames are displayed one after another, then if there is any movement in the image, that movement is clearly detected. So, the major emphasis in image sequence processing is to detect the moving parts. This has various applications, for example, detection and tracking of moving targets, and a major application is in security surveillance.

The other application can be to find the trajectory of a moving target. Also, monitoring the
movement of organ boundaries in medical applications is very very important and all these
operations can be done by processing video sequences. So, let us take one such example.

(Refer Slide Time: 30:19)

Here you find that in the first image, some person is moving against a green background. So, let
us see this. So, here you find that a person is moving against the background. So, through image
processing techniques, we can identify this movement. So, in the second processed sequence,
you find that the person is moving which is clearly shown against a black background. That
means we have been able to separate the background from the moving object.
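
As a hedged sketch of how such a separation might be done (not necessarily the method used in this demonstration), simple background differencing in Python/NumPy marks the pixels that differ from a static background; the threshold and the synthetic frames are assumptions.

```python
import numpy as np

def moving_mask(frame: np.ndarray, background: np.ndarray, thresh: int = 25) -> np.ndarray:
    """Mark pixels whose absolute difference from the static background exceeds a threshold."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > thresh

# Example: a static background with a small bright "person" patch appearing in the new frame
background = np.full((120, 160), 60, dtype=np.uint8)
frame = background.copy()
frame[40:80, 70:90] = 200        # the moving object
mask = moving_mask(frame, background)
print(mask.sum())                # 800 moving pixels; everything else stays "black"
```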

(Refer Slide Time: 31:03)

Now, in this particular application which has been shown, the image or the video sequence was taken in broad daylight, but in many other applications, particularly for security applications, the images ought to be taken during the night also, when there is no sunlight. Then what kind of image processing techniques can be used for such surveillance applications?

So, here we have shown a sequence which is taken during the night and the kind of imaging that you have to use is not ordinary optical imaging; here we have to go for infrared imaging or thermal imaging. So, this particular sequence is a thermal sequence. So, here again you find that a person is moving against a still background.

So, if you just concentrate on this region, you find that the person is moving and again through image processing techniques, we have identified just this moving person against the still background. So, here you find that the person is moving and the background is completely black. So, these kinds of image processing techniques can also be used for video sequence processing. Now, let us take a look at another application of this image processing technique.

(Refer Slide Time: 33:10)

Let us look at this. Here, we have a moving target, say like this, and our interest is to track this particular target. That is, we want to find out what trajectory this particular moving object is following. So, what we will do is we will just highlight the particular point that we want to track. So, I make a window like this so that the window covers the region that I want to track. And, after making the window, I make a template of the object region within this window.

So, after making the template, I go for tracking this particular object. So, you find that again in this particular application, the object is being tracked in this video sequence. Just look at this: over the sequence, the object is changing its shape. But even after the shape has changed, we have still been able to track this particular object. But when the shape has changed so much that the object cannot be matched any further, it indicates poor detection.
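
For illustration only (my own sketch, not the tracker used in this demonstration), a brute-force template match by sum of squared differences can locate the window in each new frame; the exhaustive search and the synthetic frame are assumptions, and it will fail exactly when the object's appearance has changed too much, as described above.

```python
import numpy as np

def track_template(frame: np.ndarray, template: np.ndarray):
    """Exhaustive search: return the top-left (row, col) where the sum of squared differences is minimal."""
    fh, fw = frame.shape
    th, tw = template.shape
    best, best_pos = None, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw].astype(np.int32)
            ssd = np.sum((patch - template.astype(np.int32)) ** 2)
            if best is None or ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos

# Example: the template is a bright 8x8 block; in the new frame it has moved to (20, 30)
frame = np.zeros((64, 64), dtype=np.uint8)
frame[20:28, 30:38] = 255
template = np.full((8, 8), 255, dtype=np.uint8)
print(track_template(frame, template))   # (20, 30)
```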

So, what is the application of this kind of image processing technique? Here, the application is that if I track this moving object using a single camera, then with the help of that single camera, I can find out the azimuth and elevation of that particular object with respect to a certain reference coordinate system.

If I track the object with 2 different cameras and find out the azimuth and elevation with the help of the 2 different cameras, then I can identify the X, Y, Z coordinates of that object with respect to that 3D coordinate system and by locating those positions in different frames, I can find out which path the object is following over time and I can determine the trajectory that the moving object follows. So, these are the different applications of video sequence processing.

(Refer Slide Time: 35:25)

So, we have mentioned that we have a third application which is compression. And in compression, what we want is to process the image to reduce the space required to store that image or, if we want to transmit the image, we should be able to transmit the image over a low bandwidth channel.

Now, let us look at this image and at that blue circular region; you find that in this particular region, the intensity of the image is more or less uniform. That is, if I know the intensity of the image at a particular point, I can predict what the intensity of its neighboring point is.

So, if the prediction is possible, then it can be argued: why do we have to store all those image points? Rather, I store one point and the prediction mechanism by which its neighborhood can be predicted. Then the same information can be stored in a much smaller space. Look at the second region. Here again, in most of the regions, you find that the intensity is more or less uniform except in certain regions like the eye, the head boundary, the ear and things like that. So, these are the kinds of things which are known as redundancy.

So, whenever we talk about an image, the image usually shows 3 kinds of redundancies. The first kind of redundancy is called pixel redundancy, which is just shown here. The second kind of redundancy is called coding redundancy and the third kind of redundancy is called psychovisual redundancy. So, these are the 3 kinds of redundancy which are present in an image.

So, whenever we talk about an image, the image contains 2 types of entities. The first one is the information content of the image and the second one is the redundancy, which appears in the 3 different forms mentioned above.

So, what is done for image compression is that you process the image and try to remove the redundancy present in the image and retain only the information present in the image. So, if we retain only the information, then obviously, the same information can be stored using a much smaller space.

The applications of this are reduced storage, as I have already mentioned. If I want to store this image on a hard disk or if I want to store the video sequence on a hard disk, then the same image or the same digital video can be stored in a much smaller space.

The second application is reduction in bandwidth. That is, if I want to transmit this image over a communication channel or if I want to transmit the video over a communication channel, then the same image or the same video will take much lower bandwidth of the communication channel. Now, given all these applications, this again shows what we get after compression.

(Refer Slide Time: 38:48)

So here, we find that we have the first image which is the original image. The second one shows the same image but here it is compressed 55 times. So you find, if I compare the first image and the second image, that the visual quality of the 2 images is almost the same; at least visually, we cannot make out much of a difference.

Whereas, if we look at the third image which is compressed 156 times and compare this third image with the original image, you find that in the third image there are a number of blocky regions; these are called blocking artifacts and they are clearly visible when you compare it with the original image.

The reason is, as we said, that the image contains information as well as redundancy. So, if I remove the redundancy and retain only the information, then the reconstructed image does not look much different from the original image. But there is another kind of compression technique which is called lossy compression.

In case of lossy compression, what we remove is not only the redundancy but also some of the information, in such a way that after removing that information, the quality of the reconstructed image is still acceptable.

Now, in such cases, because you are removing some of the information which is present in the image, naturally the quality of the reconstructed image will not be as good as that of the original image. So, naturally there will be some loss or some distortion and this is taken care of by what is called the rate distortion theorem.

Now, if I just compare the space requirements of these 3 images: if the original image is of size say 256 by 256 pixels with one byte per pixel, that is 64 kilobytes, the second image, which is compressed 55 times, will take something slightly above 10 kilobytes. So, you find the difference; the original image takes 64 kilobytes, where the second one takes something around 10 kilobytes, whereas the third one will take something around 500 bytes or even less than 500 bytes. So, you find how much reduction in the space requirement we can achieve by using these image compression techniques.

(Refer Slide Time: 41:40)

So, given these various applications, now let us look at some history of image processing. Though the application of digital image processing has become very popular over the last 1 or 2 decades, the concept of image processing is not that young. In fact, as early as the 1920’s, image processing techniques were being used and during those days, the digital images were used to transmit newspaper pictures between London and New York. These digital pictures were carried by submarine cable, by the system which was known as the Bartlane system.

Now, when you transmit these digital images via submarine cable; then obviously on the
transmitting side, I have to have a facility for digitization of the image. Similarly, on the
receiving side, I have to have a facility for reproduction of the image.

(Refer Slide Time: 42:50)

So, in those days, on the receiving side, the pictures were being reproduced by telegraphic printers. And, here you find a particular picture which was reproduced by a telegraphic printer. Next, in 1921, there was an improvement in the printing procedure. In the earlier case, the images were reproduced by telegraphic printers.

In 1921, what was introduced was the photographic process for picture reproduction and in this case, on the receiver side, instead of using the telegraphic printer, the digital images or the codes of the digital images were perforated on a tape and photographic printing was carried out using those tapes.

So, here you find that there are 2 images. The second image is obviously the image that we have
shown in the earlier slide; the first image is the image which has been produced using this
photographic printing process.

So, here you find that the improvement, both in terms of tonal quality as well as in terms of
resolution is quite evident. So, if you compare the first image and the second image, the first
image appears much better than the second image.

(Refer Slide Time: 44:19)

Now, the Bartlane system that I mentioned, which was being used during the 1920’s, was capable of coding 5 distinct brightness levels. This was increased to 15 levels by 1929. So, here we find an image with 15 different intensity levels and the quality of this image is better than the quality of the image which was produced by the earlier system.

(Refer Slide Time: 44:54)

Now, from 1929, for the next 35 years, researchers paid their attention to improving the image quality or the reproduction quality. And in 1964, these image processing techniques were being used at the Jet Propulsion Laboratory to improve the pictures of the moon which had been transmitted by Ranger 7. And, we can say that this is the time from which digital image processing techniques got a boost and this is considered to be the basis of modern image processing techniques.

(Refer Slide Time: 45:44)

Now, given the applications as well as the history of digital image processing techniques, let us now see how an image is to be represented in a digital computer. This representation is very important because unless we are able to represent the image in a digital computer, obviously we cannot process the image.

So, here you find that we have shown an image and at a particular point (x, y) in the image; conventionally, the X axis is taken vertically downwards and the Y axis is taken horizontally towards the right. And if I look at this image, this image is nothing but a 2 dimensional intensity function which is represented by f (x, y).

Now, at any particular point (x, y), we find the intensity value which is represented by f (x, y). This f (x, y) is nothing but a product of 2 terms. So, here you find that this f (x, y) is represented by the product of 2 terms; one term is r (x, y) and the other term is i (x, y). This r (x, y) is the reflectivity of the surface at the corresponding image point.

After all, how do we get an image, or how can we see an object? You find that there must be some light source. If I take an image in daylight, this light source is usually the sun. So, the light from the light source falls on the object surface, gets reflected, reaches our eye and then only we can see that particular object.

So, here we find that this r (x, y) represents the reflectivity of the point on the object surface from where the light gets reflected and falls on the imaging plane. And this i (x, y) represents the intensity of the incident light. So, if I take the product of the reflectivity and the intensity, these 2 terms are responsible for giving the intensity at a particular point in the image.
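
As a toy numerical illustration (my own construction, with made-up values), the image intensity at each point is just the elementwise product of the reflectivity and the illumination:

```python
import numpy as np

# Image intensity as the product of reflectivity r(x, y) in [0, 1]
# and incident illumination i(x, y) >= 0 (values below are assumptions).
r = np.array([[0.2, 0.8],
              [0.5, 1.0]])          # surface reflectivity at each point
i = np.array([[100.0, 100.0],
              [ 50.0,  50.0]])      # incident light intensity at each point
f = r * i                           # f(x, y) = r(x, y) * i(x, y)
print(f)
# [[ 20.  80.]
#  [ 25.  50.]]
```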

Now if you look at this, if this is an analog image, then how many points can we have in this image? Obviously, there is information at every possible point both in the X direction and in the Y direction. That means there are an infinite number of points in this particular image and at every point, the intensity value is also continuous between some minimum and some maximum; theoretically, the minimum value can be 0 and the maximum value can be infinite.

So, can we represent or store such an image in a digital computer where I have an infinite number of points and infinitely many possible intensity values? Obviously not, so what we have to do is go for some processing of this image.

(Refer Slide Time: 48:58)

And, what we do is, instead of storing all the intensity values at all possible points in the image, we try to take samples of the image on a regular grid. So here, the grid is superimposed on this particular image and what we do is we take image samples at the various grid points.

So, the first step that we need for representation of an image in a digital computer is spatial discretization by grids. And once we get these sample values, at every point, the value of that particular sample is again continuous. So, it can assume any of infinitely many possible values which again cannot be represented in a digital computer. So, after sampling, the second operation that we have to do is discretization of the intensity values of the different samples: the process which is called quantization.
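
A small sketch of the spatial discretization step (my own, with an assumed synthetic intensity function) evaluates a continuous f(x, y) only at the grid points, producing a finite matrix of samples:

```python
import numpy as np

def sample_image(f, height, width, rows, cols):
    """Evaluate a continuous intensity function f(x, y) on a regular rows x cols grid
    covering 0 <= x <= height, 0 <= y <= width (x downwards, y to the right)."""
    xs = np.linspace(0.0, height, rows)
    ys = np.linspace(0.0, width, cols)
    return np.array([[f(x, y) for y in ys] for x in xs])

# Example "continuous" image: a smooth intensity ramp plus a sinusoidal pattern (assumed)
f = lambda x, y: 100 + 50 * np.sin(0.5 * x) + 0.3 * y
samples = sample_image(f, height=10.0, width=10.0, rows=8, cols=8)
print(samples.shape)    # (8, 8): a finite matrix of sample values
```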

(Refer Slide Time: 50:00)

So, effectively what we need is that an image is to be represented by a matrix like this. So, this is a matrix of finite dimension; it has m number of rows and n number of columns. Typically, for image processing applications, the image size which is used is either 256 by 256 elements, 512 by 512 elements, 1 k by 1 k elements and so on; each of these elements in this matrix representation is called a pixel or a pel.

Now, coming to quantization of these matrix elements: you find that each of the locations represents a particular grid location where I have stored a particular sample value. Each of these sample values is quantized and typically, for image processing applications, the quantization is done using 8 bits for a black and white image and using 24 bits for a color image. Because in the case of color, there are 3 color planes - red, green and blue - and for each of the planes, if I use 8 bits for quantization, then that gives us 24 bits which are used for representation of a digital color image.
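
As a hedged illustration of the quantization step (my own sketch; the min-max scaling is an assumption), continuous sample values are mapped onto 256 integer levels, i.e. 8 bits per pixel:

```python
import numpy as np

def quantize(samples: np.ndarray, levels: int = 256) -> np.ndarray:
    """Map continuous sample values onto a finite set of integer levels (8 bits -> 256 levels)."""
    lo, hi = samples.min(), samples.max()
    scaled = (samples - lo) / (hi - lo) * (levels - 1)
    return np.round(scaled).astype(np.uint8)

# Continuous-valued samples become integers in 0..255, i.e. one byte per pixel;
# a colour image would repeat this for each of the R, G and B planes (3 x 8 = 24 bits).
samples = np.random.uniform(0.0, 1.0, size=(4, 4))
pixels = quantize(samples)
print(pixels.dtype, pixels.min(), pixels.max())   # uint8, 0, 255
```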

(Refer Slide Time: 51:20)

So, here we find that it just shows an example that, given this image, if I take a small rectangular area somewhere here, then the intensity values of that rectangular area are given by a matrix like this.

(Refer Slide Time: 51:39)

Now, let us see what the steps in digital image processing techniques are. Obviously, the first step is image acquisition. The next step after acquisition is that we have to do some kind of processing, known as preprocessing, which takes care of removing the noise or enhancement of the contrast and so on. The third operation is segmentation; that is, partitioning an input image into its constituent parts or objects. The segmentation is also responsible for extracting the object points from the boundary points.

(Refer Slide Time: 52:18)

After segmentation, the next step is to extract some description of the image objects which is suitable for further computer processing. These steps are mainly used for machine vision applications. Then we have to go for recognition. So, once we get descriptions of the objects, from those descriptions we have to interpret or recognize what that object is. And, the last step is the knowledge base, where the knowledge base helps in efficient processing as well as inter-module cooperation among all the previous processing steps.

(Refer Slide Time: 52:58)

So, here we have shown all those different steps with the help of a diagram where the first step is image acquisition, the second step is preprocessing, and then we go for segmentation, then for representation and description and finally for recognition and interpretation, and you get the image understanding result.

And at the core of this system, we have shown a knowledge base and here you find that the knowledge base has a link with all these different modules. So, the different modules can take the help of the knowledge base for efficient processing as well as for communicating or exchanging information from one module to another.
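
Purely as a schematic sketch (my own; every function here is a hypothetical placeholder, not an API from the lecture), the chain of modules with a shared knowledge base might be wired together like this:

```python
# Hypothetical placeholder functions standing in for the modules in the block diagram.
def acquire():                 return "raw image"
def preprocess(img):           return f"denoised({img})"
def segment(img):              return f"regions({img})"
def describe(regions):         return f"features({regions})"
def recognize(features, kb):   return f"label({features}, using {kb})"

knowledge_base = "object models"           # shared by all stages in the diagram
result = recognize(describe(segment(preprocess(acquire()))), knowledge_base)
print(result)
```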

So, these are the different steps which are involved in digital image processing techniques and in our subsequent lectures, we will elaborate on these different processing steps one after another.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 2
Image Digitization - I
Hello, welcome to the video lecture series on digital image processing. In our earlier class, that is during the introductory lecture on digital image processing, we have seen the various applications of digital image processing techniques.

(Refer Slide Time: 1:05)

We have also talked about the history of image processing techniques and we have seen that though the digital image processing techniques are very popular and used in wide application areas these days, the concept of digital image processing is quite old.

In fact, we have seen that as early as the 1920’s, digital image processing techniques were being used to transmit newspaper images from one place to another. After talking about the history, we have also seen the various steps that are involved in image processing techniques and while talking about the various steps, we have seen that the first step that has to be done before any processing can be done on the images is digitization of images.

So, in today’s lecture and in the next lecture, we will talk about the digitization process through which an image taken from a camera can be digitized and that digital image can finally be processed by a digital computer. So, in today’s lecture, we will talk about image digitization techniques.

(Refer Slide Time: 2:37)

Now, during this course, we will talk about why image digitization is necessary, we will also talk
about what is meant by signal bandwidth, we will talk about how to select the sampling
frequency of a given signal and we will also see the image reconstruction process from the
sampled values.

(Refer Slide Time: 3:03)

So, in today’s lecture, we will try to find out the answers to 3 basic questions. The first question is: why do we need digitization? Then, we will try to find out the answer to what is meant by digitization and thirdly, we will go into how to digitize an image. So, let us talk about these one after another. Firstly, let us see why image digitization is necessary.

(Refer Slide Time: 3:37)

You find that in this slide, we have shown an image, the image of a girl, and as we have just indicated in our introductory lecture, an image can be viewed as a 2 dimensional function given in the form of f (x, y).

Now, this image has a certain length and a certain height. The image that has been shown here has a length of L. This L will be in units of distance or units of length. Similarly, the image has a height of H which is also in units of distance or units of length. Any point in this 2 dimensional space will be identified by the image coordinates X and Y.

Now, you find that conventionally, we have said that the X axis is taken as vertically downwards and the Y axis is taken as horizontal. So, every coordinate in this 2 dimensional space will have a limit like this: the value of X will vary from 0 to H and the value of Y will vary from 0 to L.

Now, if I consider any point (x, y) in this image, the intensity or the colour value at the point (x, y), which can be represented as a function of X and Y where (x, y) identifies a point in the image space, will actually be a multiplication of 2 terms. One is r (x, y) and the other one is i (x, y).

We have said during our introductory lecture that this r (x, y) represents the reflectance of the surface point to which this particular image point corresponds and i (x, y) represents the intensity of the light that is falling on the object surface. Theoretically, this r (x, y) can vary from 0 to 1 and i (x, y) can vary from 0 to infinity.

So, a point f (x, y) in the image can have any value between 0 and infinity. But practically, the intensity at a particular point or the colour at a particular point given by (x, y) varies from a certain minimum, which is given by I min, to a certain maximum, I max. So, the intensity at this point (x, y), which is represented by f (x, y), will vary from a certain minimum intensity value to a certain maximum intensity value.

Now, look at the second figure in this particular slide. It shows that if I take a horizontal line on this image space and if I plot the intensity values along that line, the intensity profile will be something like this. It again shows that this is the minimum intensity value along that line and this is the maximum intensity value along the line. So, the intensity at any point in the image or the intensity along a line, whether it is horizontal or vertical, can assume any value between the maximum and the minimum.

Now, here lies the problem. We are considering a continuous image in which the intensity can assume any value between a certain minimum and a certain maximum, and the coordinates X and Y can also assume any value: X can vary from 0 to H and Y can vary from 0 to L.

(Refer Slide Time: 7:30)

Now, from the theory of real numbers you know that between any 2 points, there are an infinite number of points. So again, when I come to this image, as X varies from 0 to H, there can be infinitely many possible values of X between 0 and H.

Similarly, there can be infinitely many values of Y between 0 and L. So effectively, that means that if I want to represent this image in a computer, then this image has to be represented by an infinite number of points. Secondly, when I consider the intensity value at a particular point, we have said that the intensity value f (x, y) varies between a certain minimum I min and a certain maximum I max.

Again, if I take these 2, I min and I max, to be the minimum and maximum possible intensity values, here again the problem is that the number of intensity values that can lie between the minimum and the maximum is again infinite. Which again means that if I want to represent an intensity value in a digital computer, then I have to have an infinite number of bits to represent the intensity value and obviously such a representation is not possible in any digital computer.

So naturally, we have to find a way out. Our requirement is that we have to represent this image in a digital computer, in a digital form. So, what is the way out? In our introductory lecture, if you remember, we have said that instead of considering every possible point in the image space, we will take some discrete set of points and those discrete points are decided by a grid.

So, if we have a uniform rectangular grid; then at each of the grid locations, we can take a
particular point and we will consider the intensity at that particular point. So, this is the process
which is known as sampling.

(Refer Slide Time: 9:52)

So, what is desired is that an image should be represented in the form of a finite 2 dimensional matrix like this. So, this is a matrix representation of an image and this matrix has got a finite number of elements. So, if you look at this matrix, you find that this matrix has got M number of rows, indexed from 0 to M minus 1, and the matrix has got N number of columns, indexed from 0 to N minus 1.

Typically, for image processing applications, we have mentioned that the dimension is usually taken either as 256 by 256 or 512 by 512 or 1 k by 1 k and so on. But whatever the size, the matrix is still finite; we have a finite number of rows and a finite number of columns. So, after sampling, what we get is an image in the form of a matrix like this.

Now, the second requirement: if I do not do any other processing on these matrix elements, what do these matrix elements represent? Every matrix element represents an intensity value at the corresponding image location and we have said that the number of possible intensity values between a certain minimum and maximum can again be infinite, which again cannot be represented in a digital computer.

So, here what we want is that each of the matrix elements should also assume one of a finite set of discrete values. So, we do both of these: the first operation is sampling, to represent the image in the form of a finite 2 dimensional matrix, and then each of the matrix elements again has to be digitized so that a particular element in the matrix can assume only values from a finite set of discrete values. These 2 together complete the image digitization process. Now, here is an example.

(Refer Slide Time: 12:08)

You find that we have shown an image on the left hand side and if I take a small rectangle in this image and try to find out what the values in that small rectangle are, you find that these values are in the form of a finite matrix and every element in this small rectangle, or in this small matrix, assumes an integer value. So, an image, when it is digitized, will be represented in the form of a matrix like this.

(Refer Slide Time: 12:44)

So, what have we said till now? It indicates that by digitization what we mean is, first, representation of the image by a finite 2 dimensional matrix - the process known as sampling - and second, that each matrix element must be represented by one of a finite set of discrete values, an operation which is called quantization.

In today’s lecture, we will mainly concentrate on sampling; quantization we will talk about later.

(Refer Slide Time: 13:31)

Now, let us see what the different blocks in an image processing system should be. Firstly, we have seen that computer processing of images requires that images be available in digital form and so we have to digitize the image, and the digitization process is a 2 step process.

The first step is sampling and the second step is quantization. Then finally, when the digitized image is processed by the computer, our final aim will obviously be to see what the processed output is.

So, we have to display the image on a display device. Now, when the image is being processed, the image is in digital form. But when we want to have the display, we must have the display in analog form.

So, whatever process we have done during digitization, during visualization or display we must do the reverse process. So, for displaying the image, it has to first be converted into an analog signal which is then displayed on a normal display.

(Refer Slide Time: 14:52)

So, if you just look at it in the form of a block diagram, it appears something like this: during digitization, first we have to sample the image by a unit which is known as a sampler, then every sample value has to be digitized - the process known as quantization - and after quantization we get a digital image which is processed by the digital computer.

And, when we want to see the processed image, that is, how the image looks after the processing is complete, the digital computer gives the digital output. This digital output goes to a D to A converter and finally, the digital to analog converter output is fed to the display, and on the display we can see how the processed image looks.

(Refer Slide Time: 15:49)

Now, let us come to the first step of the digitization process, that is sampling. To understand sampling, before going to 2 dimensional images, let us take an example from 1 dimension. That is, let us assume that we have a 1 dimensional signal x(t) which is a function of t. Here, we assume this t to be time and you know that whenever some signal is represented as a function of time, the frequency content of the signal is represented in hertz, and hertz means cycles per unit time.

So here again, when you look at this particular signal x(t), you find that this is an analog signal. That is, t can assume any value; t is not discretized. Similarly, the functional value x(t) can also assume any value between a certain maximum and minimum. So obviously, this is an analog signal and we have seen that an analog signal cannot be represented in a computer.

So, what is the first step that we have to do? As we said, for the digitization process, the first operation that we have to do is the sampling operation.

(Refer Slide Time: 17:16)

So, for sampling, instead of considering the signal values at every possible value of t, what we do is we consider the signal values at certain discrete values of t. So here, in this figure it is shown that we consider the value of the signal x(t) at t equal to 0, at t equal to delta t_S, at t equal to 2 delta t_S, at t equal to 3 delta t_S and so on.

So, instead of considering the signal values at every possible instant, we are considering the signal values at some discrete instants of time. This is the process known as sampling and here, since we are considering the signal values at an interval of delta t_S, we can find out what the sampling frequency is.

So, delta t_S is the sampling interval and the corresponding sampling frequency, if I represent it by f_S, becomes f_S = 1 / delta t_S. Now, when you sample the signal like this, you find that there is a lot of information which is being missed. For example, here we have a local minimum, here we have a local maximum, here again we have a local minimum and a local maximum, and when we sample at an interval of delta t_S, this is information which cannot be captured by these samples. So, what is the alternative?

(Refer Slide Time: 19:00)

The alternative is: let us increase the sampling frequency, or let us decrease the sampling interval. So, if I do that, you find that these bold golden lines represent the earlier samples that we had, whereas these dotted green lines represent the new samples that we want to take, and when we take these new samples, what we do is we reduce the sampling interval by half. That is, our earlier sampling interval was delta t_S; now I make the new sampling interval, which is represented as delta t_S', equal to delta t_S / 2.

And obviously, in this case, the sampling frequency, which is f_S' = 1 / delta t_S', now becomes twice f_S. That is, earlier we had a sampling frequency of f_S; now we have a sampling frequency of 2 f_S. And when I increase the sampling frequency, you find that with the earlier samples, represented by these solid lines, this particular information, that is the dip in between these 2 solid lines, was missed.

Now, when I introduce a new sample in between, then some information of this minimum or of
this local maximum can be retained. Similarly here, some information of this local minimum can also be retained. So obviously, it says that when I increase the sampling frequency or I reduce
the sampling interval, then the information that I can maintain in the sampled signal will be more
than when the sampling frequency is less.
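
As a rough illustration of this point, here is a minimal Python sketch (the test signal and the sampling intervals are arbitrary choices for illustration, not taken from the lecture). It samples the same 1 dimensional signal at an interval delta_t and at delta_t/2; since the finer grid contains the coarse samples, its largest sample can only be closer to the true peak of the signal.

```python
import numpy as np

# An arbitrary test signal (chosen for illustration only, not from the lecture).
def x(t):
    return np.sin(2 * np.pi * 3.0 * t) + 0.5 * np.sin(2 * np.pi * 7.0 * t)

delta_t = 0.05                                   # coarse sampling interval
t_coarse = np.arange(0.0, 1.0, delta_t)          # samples at 0, delta_t, 2*delta_t, ...
t_fine = np.arange(0.0, 1.0, delta_t / 2.0)      # half the interval, twice the frequency
t_dense = np.arange(0.0, 1.0, 1e-4)              # near-continuous reference

print("true peak value          :", round(x(t_dense).max(), 3))
print("largest sample at f_s    :", round(x(t_coarse).max(), 3))
print("largest sample at 2*f_s  :", round(x(t_fine).max(), 3))
```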

Now, the question comes whether there is a theoretical background by which we can decide the proper sampling frequency for a given signal. We will come to that a bit later.

Now, let us see what the sampling actually means. We have seen that we have a continuous signal X (t) and for digitization, instead of considering the signal values at every possible value of t, we have considered the signal values at some discrete instants of time t.

(Refer Slide Time: 21:36)

Now, this particular sampling process can be represented mathematically as follows. Consider that I have a sampling function and that this sampling function is a 1 dimensional array of Dirac delta functions situated at a regular spacing of delta t. So, this sequence of Dirac delta functions can be represented in this form.

So, you find that this is a sequence of Dirac delta functions and the spacing between 2 delta functions is delta t. In short, this kind of function is represented by a comb function, a comb function of t at an interval of delta t, and mathematically this comb function can be represented as the sum of delta (t minus m delta t), where m varies from minus infinity to infinity.

Now, this is the Dirac delta function. The Dirac delta function says that if I have a Dirac delta
function delta (t), then the functional value will be 1 whenever t is equal to 0 and the functional value will be 0 for all other values of t. In this case, when I have delta (t minus m delta t), then this functional value will be 1 only when the quantity t minus m delta t within the parenthesis
becomes equal to 0. That means this functional value will assume a value 1 whenever t is equal
to m times delta t for different values of m varying from minus infinity to infinity.

So effectively, this mathematical expression gives rise to a series of Dirac delta functions in this
form where at an interval of delta t, I get a value of 1. For all other values of t, I get values of 0.

(Refer Slide Time: 23:35)

Now, coming to this sampling: we have represented the same figure here, where we had this continuous signal X (t), the original signal. After sampling, we get a number of samples like this. These samples can now be represented by multiplication of X (t) with the series of Dirac delta functions that we have seen, that is comb of t delta t.

So if I multiply this, whenever this comb function gives me a value 1, only the corresponding value of X (t) will be retained in the product, and whenever this comb function gives a value 0, the corresponding values of X (t) will be set to 0.

So effectively, this particular sampling, by which from this analog, continuous signal we have gone to this discrete signal, can be represented mathematically as x S (t) is equal to X (t) into comb of t delta t; and if I expand this comb function and consider only the values of t where this comb function has a value 1, then this mathematical expression is translated to the sum of x (m delta t) into delta (t minus m delta t), where m varies from minus infinity to infinity.
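
For reference, the sampling function and the sampled signal described above can be set down compactly in standard notation (this is just the verbal description above written as equations):

\[
\operatorname{comb}(t;\Delta t) \;=\; \sum_{m=-\infty}^{\infty} \delta(t - m\,\Delta t),
\qquad
x_S(t) \;=\; x(t)\,\operatorname{comb}(t;\Delta t) \;=\; \sum_{m=-\infty}^{\infty} x(m\,\Delta t)\,\delta(t - m\,\Delta t).
\]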

(Refer Slide Time: 25:04)

So, after sampling, from a continuous signal we have got the sampled signal represented by x S (t), where the sample values exist at discrete instants of time. What we get is a sequence of samples, as shown in this figure, where x S (t) has the signal values at discrete time instants and during the other time intervals the value of the signal is set to 0.

Now, this sampling will be proper if we are able to reconstruct the original continuous signal X
(t) from these sampled values and we will find out that while sampling, we have to maintain
certain conditions so that the reconstruction of the analog signal X (t) is possible.

Now, let us look at some mathematical background which will help us to find out the conditions
which we have to impose for this kind of reconstruction.

(Refer Slide Time: 26:26)

So, here you find that if we have a continuous signal in time which is represented by X (t), then
we know that the frequency components of this signal X (t) can be obtained by taking the Fourier
transform of this X (t).

So, the Fourier transform of X (t), which is represented by F of X (t) and also written in the form of capital X of omega where omega is the frequency variable, is given mathematically by the integral of X (t) e to the power minus j omega t dt, taken from minus infinity to infinity. So, this mathematical expression gives us the frequency components obtained by the Fourier transform of the signal X (t).

Now, this is possible if the signal X (t) is aperiodic. But when the signal X (t) is periodic, in that case, instead of taking the Fourier transform, we have to go for the Fourier series expansion, and the
Fourier series expansion of a periodic signal say v (t) where we assume that v (t) is a periodic
signal is given by this expression where omega 0 is the fundamental frequency of this signal v (t)
and we have to take the summation from n equal to minus infinity to infinity.
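
Set down as equations, the two expressions just described are (using the same symbols as above):

\[
X(\omega) \;=\; \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt
\qquad \text{(Fourier transform, aperiodic signal)},
\]
\[
v(t) \;=\; \sum_{n=-\infty}^{\infty} C(n)\, e^{\,j n \omega_0 t}
\qquad \text{(Fourier series, periodic signal with fundamental frequency } \omega_0\text{)}.
\]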

(Refer Slide Time: 29:05)

Now, in this case, C (n) is known as the n'th Fourier coefficient, and the value of C (n) is obtained as C (n) is equal to 1 upon T 0 times the integral of v (t) e to the power minus j n omega naught t dt, where this integration has to be taken over one period, that is T 0 .

Now, in our case, when we have v (t) in the form of series of Dirac delta functions, in that case
we know that the value of v (t) will be equal to 1 when t equal to 0 and value of v (t) is equal to
0 for any other value of t within a single period. So, in our case T 0 that is the period of this
periodic signal is equal to delta T S because every delta function appears at an interval of delta
T S.
And, we have v (t) is equal to 1 for t is equal to 0 and v (t) is equal to 0 otherwise.

Now, if I impose this condition to calculate the value of C (n); in that case, we will find that the
value of this integral will exist only at t equal to 0 and it will be 0 for any other value of t.

(Refer Slide Time: 30:55)

So by this, we find that C (n) now becomes equal to 1 upon delta t S and this 1 upon delta t S is
nothing but the sampling frequency we will put as say omega s. So, this is the frequency of the
sampling signal.

Now, with this value of C (n), now the periodic signal v (t) can be represented as 1 upon delta t s
summation of e to the power j n omega naught t for n equal to minus infinity to infinity. So, what
does it mean? This means that if I take the Fourier series expansion of our periodic signal which
is in our case Dirac delta function; this will have frequency components, various frequency
components where the fundamental components of the frequency is omega naught and it will
have other frequency components of twice omega naught, thrice omega naught, 4 times omega
naught and so on.

So, if I plot those frequencies or frequency spectrum, we find that we will have the fundamental
frequency omega naught or in this case this omega naught is nothing but same as the sampling
frequency that is omega s, we will also have a frequency component of twice omega s, we will
also have a frequency component of thrice omega s, and this continues like this.

So, you find that the comb function as the sampling function that we have taken, the Fourier
series expansion of that is again a comb function.
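
In equation form, the result of this derivation is

\[
C(n) \;=\; \frac{1}{T_0}\int_{T_0} v(t)\, e^{-j n \omega_0 t}\, dt \;=\; \frac{1}{\Delta t_S},
\qquad
v(t) \;=\; \frac{1}{\Delta t_S}\sum_{n=-\infty}^{\infty} e^{\,j n \omega_s t},
\]

so the spectrum of the impulse train has equal-strength components at 0, plus or minus omega s, plus or minus 2 omega s and so on, which is again a comb, now in the frequency domain.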

(Refer Slide Time: 32:38)

Now, this is about the continuous domain. When we go to discrete domain; in that case, for a
discrete time signal say x (n) where n is the nth sample of the signal x, the Fourier transform of
this is given by X (k) is equal to the sum of x (n) e to the power minus j 2 pi by N into n k, where n varies from 0 to N minus 1 and capital N indicates the number of samples for which we are taking the Fourier transform.

And, given this Fourier transform, we can find out the original sampled signal by the inverse Fourier transformation, which is obtained as x (n) is equal to 1 upon N times the sum of X (k) e to the power j 2 pi by N into n k, where this time the summation has to be taken over k for k equal to 0 to N minus 1.

So, you find that we get a Fourier transform pair. In one case, from the discrete time signal, we
get the frequency components, discrete frequency components by the forward Fourier transform
and in the second case, from the frequency components, we get the discrete time signal by the
inverse Fourier transform and these 2 equations taken together forms a Fourier transform pair.
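
This transform pair can be written out directly in a few lines of Python (a sketch using numpy, with an arbitrary 8-sample test signal; the 1 upon N factor is placed on the inverse transform, which is the common convention and matches numpy's FFT):

```python
import numpy as np

def dft(x):
    """Forward transform: X(k) = sum_n x(n) exp(-j 2*pi*n*k / N)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(N, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

def idft(X):
    """Inverse transform: x(n) = (1/N) sum_k X(k) exp(+j 2*pi*n*k / N)."""
    N = len(X)
    k = np.arange(N)
    n = k.reshape(N, 1)
    return (np.exp(2j * np.pi * n * k / N) @ X) / N

x = np.array([2.0, 5.0, 7.0, 9.0, 3.0, 0.0, 0.0, 0.0])  # arbitrary test samples
X = dft(x)
print(np.allclose(X, np.fft.fft(x)))    # True: matches numpy's FFT
print(np.allclose(idft(X).real, x))     # True: the pair reconstructs x(n)
```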
Now, let us go to another concept, a concept called convolution.

(Refer Slide Time: 34:57)

You find that we have represented our sampled signal as x S (t) is equal to X (t) multiplied by
comb function t delta t. So, what we are doing is we are taking 2 signals in time domain and we
are multiplying these 2 signals. Now, what will happen if we take the Fourier transform of these
2 signals?

Or let us put it like this: I have 2 signals, X (t) and another signal say h (t). Both these signals are in the time domain. We define an operation called convolution, h (t) convoluted with X (t), which is represented as the integral of h of tau into x of (t minus tau) d tau, where the integration is taken over tau from minus infinity to infinity. Now, what does it mean?

This means that whenever we want to take the convolution of 2 signals h (t) and X (t), the first thing we do is time invert the signal X (t); so instead of taking x of tau, we take x of minus tau. So, if I have 2 signals of this form, say h (t) is represented like this and we have a signal X (t) which is represented like this, then as our expression says, the convolution of h (t) and X (t) is nothing but the integral of h (tau) X (t minus tau) d tau over minus infinity to infinity. This is h (t) and this is X (t).

(Refer Slide Time: 37:11)

Then what we have to do is for convolution purpose, we are taking h of tau and x of minus tau.
So, if I take x of minus t, this function will be like this. So, this is x of minus t and for this
integration, we have to take h of tau for a value of tau and x of minus tau that has to be translated
by this value t and then the corresponding values of h and x have to be multiplied and then we
have to take the integration from minus infinity to infinity.

So, if I take an instance like this, at this point I want to find out what the convolution value is. Then I have to multiply the corresponding values of h with these values of x: at each and every time instant I have to do the multiplication, and then I have to integrate from minus infinity to infinity. I will come to the application of this a bit later.
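
In standard notation, the operation just described is

\[
(h * x)(t) \;=\; \int_{-\infty}^{\infty} h(\tau)\, x(t-\tau)\, d\tau ,
\]

that is: flip x, shift the flipped copy by t, multiply it point by point with h (tau), and integrate over tau.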

(Refer Slide Time: 39:15)

Now, let us see: if we have a convoluted signal, say h (t) convoluted with X (t), and I want to take the Fourier transform of this signal, then what will we get? The Fourier
transform of this will be represented as h tau x of t minus tau d tau. So, this is the convolution.
Integration over tau from minus infinity to infinity and then for the Fourier transform, I have to
do e to the power minus j omega t dt and then again, I have to take the integral from minus
infinity to infinity. So, this is the Fourier transform of the convolution of those 2 signals h (t) and
X (t).

Now, if you do this integration, you find that this same integration can be written in this form. I
can take out h tau out of the inner integral; the inner integral I can represent as x of t minus tau e
to the power minus j omega t minus tau dt. So, I can put this as the inner integral, then I have to
multiply this whole term by e to the power minus j omega tau d tau and then this integration will
be from tau equal to minus infinity to infinity.

Now, what does this inner integral mean? From the definition of Fourier transform, this
inner integral is nothing but the Fourier transform of X (t). So, this expression is equivalent to h
of tau x of omega e to the power minus j omega tau d tau where this integration will be taken
over tau from minus infinity to infinity.

Now, what I can do is because this x omega is independent of tau, so I can take out this x omega
from this integral.

(Refer Slide Time: 42:01)

So, my expression will now be x omega, then within the integral, I have h of tau e to the power
minus j omega tau d tau where the integration is taken over tau from minus infinity to infinity.
Again, you find that from the definition of Fourier transformation, this is nothing but the Fourier
transformation of the time signal h (t).

So effectively, this expression comes out to be x of omega into h of omega where x of omega is
the Fourier transform of the signal X (t) and h of omega is the Fourier transform of the signal h
(t). So effectively, this means that if I take the convolution of 2 signals X (t) and h (t) in time
domain, this is equivalent to multiplication of the 2 signals in the frequency domain. So,
convolution of the 2 signals X (t) and h (t) in the time domain is equivalent to multiplication of
the same signals in the frequency domain. The reverse is also true.

That is if we take the convolution of x omega and h omega in the frequency domain, this will be
equivalent to multiplication of X (t) and h (t) in the time domain. So, both these relations are true
and we will apply these relations to find out that how the signal can be reconstructed from its
sample values.
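
A quick numerical check of this property (a sketch with arbitrary finite-length sequences; for finite signals the product of FFTs corresponds to circular convolution, so the sequences are zero-padded to the full output length first):

```python
import numpy as np

h = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0, 7.0])

# Linear convolution in the "time" domain.
y_time = np.convolve(h, x)

# Multiplication in the frequency domain (zero-pad so that circular
# convolution equals linear convolution).
N = len(h) + len(x) - 1
y_freq = np.fft.ifft(np.fft.fft(h, N) * np.fft.fft(x, N)).real

print(np.allclose(y_time, y_freq))   # True
```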

So, now let us come back to our original signal. Here, we have seen that we have been given these sample values, and from the sample values our aim is to reconstruct this continuous signal X
(t).

(Refer Slide Time: 44:23)

And, we have seen that this sampling is actually equivalent to multiplication of 2 signals in the
time domain. One signal is X (t), the other signal is comb function, comb of t delta t. So, these
relations, as we have said, are true: if I multiply 2 signals X (t) and y (t) in the time domain, that is equivalent to convolution of the 2 signals X omega and Y omega in the frequency
domain.

Similarly, if I take the convolution of 2 signals in the time domain, that is equivalent to multiplication of the same signals in the frequency domain. So, for sampling, we have said that we have got x S (t), that is the sampled version of the signal X (t), which is nothing but multiplication of X (t) with the series of Dirac delta functions represented by comb of t delta t. So, equivalently in the frequency domain, I can find out x S of omega, which is
equivalent to the frequency domain representation x omega of the signal X (t) convoluted with
the frequency domain representation of the comb function, comb t delta t and we have seen that
this comb function, the Fourier transform or the Fourier series expansion of this comb function is
again a comb function.

So, what we have is we have a signal x omega, we have another comb function in the frequency
domain and we have to take the convolution of these 2.

(Refer Slide Time: 46:06)

Now, let us see this convolution in detail. What does this convolution actually mean? Here we have taken 2 signals h (n) and x (n). Both of them, for this purpose, are in the discrete, sampled domain. So, h (n) is represented by this and x (n) is represented by this.

You find that this h (n) is actually nothing but a comb function: in this case, the value of h (n) is equal to 1 at n equal to 0, h (n) is equal to 1 at n equal to minus 9, h (n) is equal to 1 at n equal to plus 9, and this pattern repeats.

So, this is nothing but the representation of a comb function; and assume that my x (n) is of this form, that is, at n equal to 0 the value of x (n) is equal to 7, x (minus 1), that is at n equal to minus 1, is 5 and x (minus 2), at n equal to minus 2, is equal to 2.

Similarly, on this side, for n equal to 1, x (1) is equal to 9 and x (2) is equal to 3; and the convolution expression that we have stated in the continuous domain is translated, in the discrete domain, to this form: y (n) is equal to the sum of h (m) into x (n minus m), where m varies from minus infinity to infinity. So, let us see how this convolution actually takes place.

(Refer Slide Time: 47:44)

So, if I really understand this particular expression that h (m) x of n minus m, sum of this from m
equal to minus infinity to infinity; we said that this actually means that we have to take the time
inversion of the signal x (n).

So, if I take the time inversion, the signal will be something like this: 3, 9, 7, 5 and 2; and when I take the convolution, that is, when I want to find the various values of y (n), that particular expression can be computed in this form.

So, if I want to find the value of y (minus 11), what I have to do is give a translation of minus 11 to this particular signal x of minus m. So, it comes here, and then I have to take the summation of the product from m equal to minus infinity to infinity. So, what does this give?

You find that I do point by point multiplication of these signals: 0 multiplied with 3, plus 0 multiplied with 9, plus 0 multiplied with 7, plus 0 multiplied with 5, plus 1 multiplied with 2. So, the value that I get is 2, and this 2 comes at the location y of minus 11.

(Refer Slide Time: 49:12)

Now, for getting the value of y of minus 10, again I do the same computation and here you find
that this 1 gets multiplied with 5 and all the other values get multiplied with 0, and when you take the summation of all of them, I get 5 here.

(Refer Slide Time: 49:26)

Then, following the same operation, at n equal to minus 9 I get the value 7.

(Refer Slide Time: 49:34)

I get the value at minus 8.

(Refer Slide Time: 49:36)

I get the value at minus 7.

(Refer Slide Time: 49:38)

I get the value at minus 6; and at minus 6, you find that the value is 0.

(Refer Slide Time: 49:42)

If I continue like this, here again at n equal to minus 2, I get value equal to 2.

(Refer Slide Time: 49:43)

(Refer Slide Time: 49:44)

(Refer Slide Time: 49:46)

(Refer Slide Time: 49:53)

At n equal to minus 1, I get value equal to 5.

(Refer Slide Time: 49:56)

At n equal to 0, I get value of 7.

(Refer Slide Time: 50:00)

At n equal to plus 1, I get value of 9.

(Refer Slide Time: 50:03)

At n equal to plus 2, I get value of 3.

(Refer Slide Time: 50:07)

At n equal to plus 3, again I get the value of 0.

(Refer Slide Time: 50:11)

So, if I continue like this, you find that after completion of this convolution process, this h (n) convoluted with x (n) gives me this kind of pattern; and here you notice one thing: when I have convoluted this x (n) with this h (n), the convolution output y (n) is just the repetition of the pattern of x (n), repeated at those locations where the value of h (n) was equal to 1. So, by this convolution, what I get is the repetition of the pattern x (n) at the locations of the delta functions in the function h (n).
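
The worked example above can be reproduced in a few lines of Python (a sketch; the index bookkeeping is my assumption about how the lecture's n-axis is laid out): convolving the pattern x (n) = 2, 5, 7, 9, 3 centred at n = 0 with an impulse train that is 1 at n = 0 and n = plus or minus 9 replicates the pattern at each impulse location.

```python
import numpy as np

# x(n) = 2, 5, 7, 9, 3 at n = -2 ... +2
x = np.array([2.0, 5.0, 7.0, 9.0, 3.0])

# h(n) = 1 at n = -9, 0, +9 and 0 elsewhere (array indices 0..18 stand for n = -9..+9)
h = np.zeros(19)
h[[0, 9, 18]] = 1.0

y = np.convolve(h, x)   # length 19 + 5 - 1 = 23, covering n = -11 ... +11

print(y[:8])    # [2. 5. 7. 9. 3. 0. 0. 0.]  -> pattern repeated around n = -9
print(y[9:14])  # [2. 5. 7. 9. 3.]           -> pattern repeated around n = 0
```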

(Refer Slide Time: 51:00)

So, applying this, when I convolute the Fourier transform X (omega) of the signal X (t) with the Fourier transform of this comb function, that is comb (omega), in the frequency domain; what I get is something like this.

(Refer Slide Time: 51:12)

When X (t) is band limited, that means the maximum frequency component in the signal X (t) is
omega naught; then the frequency spectrum of the signal X (t) which is represented by x omega
will be like this.

Now, when I convolve this with this comb function comb of omega, then as we have done in the
previous example; what I get is at those locations where the comb function had a value 1, I will
get just a replica of the frequency spectrum x omega. So, this x omega gets replicated at all these
locations.

So, what do we find here? You find that the same frequency spectrum X omega gets translated like this; that is, when X (t) is sampled, the frequency spectrum of x S , that is X S (omega), looks like this. Now, this helps us in reconstruction of the original signal X (t), because around omega equal to 0, I get a copy of the original frequency spectrum.

So, what I can do is take a low pass filter whose cut off frequency is just beyond omega naught and pass the signal with this spectrum through that low pass filter. In that case, the low pass filter will take out just this particular frequency band and it will cut out all the other frequency bands.

So, since I am getting the original frequency spectrum of X (t), signal reconstruction is possible. Now, notice one thing here: as we said, we will try to find out the condition under which the original signal can be reconstructed.

Here you find that we have a frequency gap between this frequency band and this translated
frequency band. Now, the difference between the centre of this frequency band and the centre of the translated frequency band is nothing but 1 upon delta t S , which is equal to omega s, that is the sampling frequency.

Now, as long as the condition holds that 1 upon delta t S minus omega naught is greater than omega naught, that is, the lowest frequency of this translated frequency band is greater than the highest
frequency of the original frequency band; then only these 2 frequency bands are disjoint and
when these 2 frequency bands are disjoint, then only by use of a low pass filter, I can take out
this original frequency band.

And from this relation, you get the condition that 1 upon delta t S , that is the sampling frequency, represented here as f S , must be greater than twice omega naught, where omega naught is the highest frequency component in the original signal X (t); and this is what is known as the Nyquist criterion. That is, we can perfectly reconstruct the continuous signal only when the sampling frequency is more than twice the maximum frequency component of the original continuous signal.
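
The effect of this condition can be demonstrated numerically with a short sketch (the 5 Hz sinusoid and the two sampling rates are arbitrary illustration choices): sampled above twice its frequency, the dominant FFT bin sits at the true frequency, while sampled below that rate the same sinusoid shows up at a false, lower frequency.

```python
import numpy as np

f0 = 5.0    # highest (and only) frequency present in the signal, in Hz

def dominant_frequency(fs, duration=2.0):
    """Sample cos(2*pi*f0*t) at rate fs and return the strongest FFT frequency."""
    t = np.arange(0.0, duration, 1.0 / fs)
    x = np.cos(2 * np.pi * f0 * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

print(dominant_frequency(fs=50.0))   # well above 2*f0: reports ~5 Hz
print(dominant_frequency(fs=8.0))    # below 2*f0 = 10 Hz: aliases to ~3 Hz
```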

(Refer Slide Time: 54:41)

Now, let us have some quiz questions on today’s lecture. The first question - what are the steps
involved in image digitization process? I repeat, what are the steps involved in image digitization
process? The second question - what is sampling? What is sampling?

The third question - here we have given a periodic signal in time which is a periodic square wave; in this square wave the on time is 3 microseconds and the off time is 7 microseconds. So, you have to find out the frequency spectrum of this periodic signal. So, for this periodic signal, the on time is 3 microseconds and the off time is 7 microseconds; so obviously, the time period of this periodic signal is 10 microseconds. You can assume the amplitude of this signal to be 1 and you have to find out the frequency spectrum of this periodic signal.

The fourth question - if a speech signal has a bandwidth of 4 kilohertz, every sample is digitized using 8 bits, and the digital speech is to be transmitted over a communication channel; then what is the minimum bandwidth requirement of the channel?

So, the speech signal has a bandwidth of 4 kilohertz, every sample is digitized using 8 bits and the digital speech is to be transmitted over a communication channel; you have to find out what will be the minimum bandwidth requirement of the channel.

Obviously, because the signal is digital, by bandwidth requirement I mean the bit rate requirement of the channel.

(Refer Slide Time: 57:06)

The next question - here again we have given 2 signals in time. One is a periodic square wave,
the second signal is aperiodic, it is just a square pulse. We can assume that the on time of this square wave and the on time of this square pulse are the same. Then we have to find out what will be the convolution result if you convolve these 2 signals in the time domain. So, you have to find
out the convolution output when these 2 signals are convolved in the time domain.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 3
Image Digitization - II
Hello, welcome to the course on digital image processing.

(Refer Slide Time: 1:06)

In the last class we have seen different sampling techniques particularly the sampling of 1
dimensional signal f (t) which is a function of a single variable t. We have also talked about what
is meant by the bandwidth of a signal and to find out the bandwidth of a signal, we have made
use of the mathematical tools like Fourier series and Fourier transform.

We have used Fourier series if the 1 dimensional signal f (t) is a periodic signal and if f (t) is an
aperiodic signal, then we have made use of the Fourier transform to find out the bandwidth or the
frequency spectrum of that signal.

Then we have talked about the sampling of this 1 dimensional signal and when we talked about
the sampling, we have said that the sampling frequency must be greater than twice of the
bandwidth of the signal to ensure proper reconstruction of the original signal from the sampled
values; and this particular minimum sampling frequency, which has to be twice the bandwidth of the signal, is known as the Nyquist rate. And, we have also said that if the sampling frequency is less than the Nyquist rate, that is less than twice the bandwidth of the signal, then what occurs is known as aliasing.

(Refer Slide Time: 2:59)

In today's lecture we will see the frequency spectrum of an image and we will also explain how to sample the image in 2 dimensions, and then we will go to the second stage of the digitization process.

We have already said that image digitization consists of 2 phases. In the first phase, we have to go for sampling and in the second phase we have to go for quantization of each of the samples. Then we will talk about what is meant by this quantization, we will also talk about the optimum mean square error or Lloyd-Max quantizer, and then we will talk about how to design an optimum quantizer for a given signal probability density function.

Now, let us briefly recapitulate what we have done in the last class.

(Refer Slide Time: 3:49)

This is a signal X (t) which is a function of a single variable, say t. What we have done is sample this 1 dimensional signal with a sampling function represented in the form of a comb function, say comb of t delta t, and we get the sample values represented by X S (t); and we have also said that this X S (t) can be represented as the multiplication of X (t) by the comb function of t delta t.

Now, the same function can also be represented in the form of the summation of X (m delta t) into delta (t minus m delta t), where the sum of delta (t minus m delta t), taken from m equal to minus infinity to infinity, is what gives you the comb function.

So, this X S of t that is the sampled value, that is the sampled version of the signal X (t) can be
represented in the form of X (m delta t) into delta (t minus m delta t) where m varies from minus
infinity to infinity.

Now, our problem is that given these sample values; how to reconstruct the original signal X (t)
from the sampled values of X (t) that is X S (t)?

(Refer Slide Time: 5:35)

And for this purpose, we have introduced what is known as convolution theorem.

(Refer Slide Time: 5:38)

The convolution theorem says that if we have 2 signals X (t) and y (t) in the time domain, then the multiplication of X (t) and y (t) in the time domain is equivalent to the convolution of the frequency spectrum of X (t) with the frequency spectrum of y (t) in the frequency domain. So, that is
to say that X (t) into y (t) is equivalent to X omega convoluted with Y omega.

Similarly, if you take the convolution of X (t) and y (t) in time domain, that is equivalent to
multiplication of X omega and Y omega in the frequency domain. So, by using this concept of the convolution theorem, we will see how to reconstruct the original signal X (t) from the
sampled values of X S (t).

Now, as per this convolution theorem, we have seen that X S of t is nothing but multiplication of
X (t) into the comb function comb of t delta t. So, in the frequency domain that will be
equivalent to X S of omega is equal to X omega convoluted with the frequency spectrum of comb
of t delta t S where delta t S is the sampling interval.

(Refer Slide Time: 7:24)

We have also seen that if X omega is the frequency spectrum of the signal, which is presented here, and this is the frequency spectrum of the sampling function; then when we convolute these 2, the convolution result will be like this, where the original frequency spectrum of the signal gets replicated along the frequency axis at an interval of 1 upon delta t S , where 1 upon delta t S is nothing but the sampling frequency f S .

And here you find that for proper reconstruction, what we have to do is take out this original spectrum, the spectrum of the original signal; and if we want to take out this, then we have to make use of a filter which will take out only this particular band, and the remaining frequency components will simply be discarded. For this filtering operation to be successful, we need a condition on 1 upon delta t S minus omega naught, where omega naught is the bandwidth of the signal or the maximum frequency component present in the signal X (t).

So, 1 upon delta t S minus omega naught must be greater than or equal to omega naught and that
leads to the condition that the sampling frequency f S must be greater than twice of omega naught
where omega naught is the bandwidth of the signal and this is what is the Nyquist rate.

(Refer Slide Time: 9:00)

Now, what happens if the sampling frequency is less than twice of omega naught? In that case,
as it is shown in this figure; you find that subsequent frequency bands after sampling, they
overlap and because of this overlapping, a single frequency band cannot be extracted using any
of the low pass filters.

So effectively, as a result, the signal which is reconstructed after low pass filtering is a distorted signal; it is not the original signal. This effect is what is known as aliasing. So, now let us see what happens in the case of a 2 dimensional image, which is a function of 2 variables x and y.

(Refer Slide Time: 9:52)

Now, you find here, in this slide, we have shown 2 figures. On the top, we have shown the same
signal X (t) which we have used earlier which is a function of t and the bottom figure, is an
image which is a function of 2 variables x and y.

Now, if t is time, in that case X (t) is a signal which varies with time and for such a signal, the
frequency is measured as you know in terms of hertz which is nothing but cycles per unit time.
Now, how do you measure the frequency in case of an image?

You find that in case of an image, the dimension is represented either in the form of say 5
centimeter by 5 centimeter or say 10 centimeter by 10 centimeter and so on. So, for an image,
when you measure the frequency; it has to be cycles per unit length, not the cycles per unit time
as is done in case of a time varying signal.

(Refer Slide Time: 11:08)

Now, in this figure we have shown that as in case of the signal X (t), we had its frequency
spectrum represented by X of omega, and we say that the signal X (t) is band limited if X of omega is equal to 0 for omega greater than omega naught, where omega naught is the
bandwidth of the signal X (t).

Similarly, in case of an image, because the image is a 2 dimensional signal which is a function of 2 variables x and y, it is quite natural that in case of an image, we will
have frequency components which will have 2 components - one in the x direction and other in
the y direction. So we call them, omega x and omega y.

So, you see that the image is band limited if f of omega x omega y is equal to 0 for omega x
greater than omega x 0 and omega y greater than omega y 0 . So in this case, the maximum
frequency component in the x direction is omega x 0 and the maximum frequency component in
the y direction is omega y 0 .

And, this figure on the bottom left shows what the frequency spectrum of an image looks like and
here you find that the base of this frequency spectrum on the omega x omega y plane is what is
known as the region of support of the frequency spectrum of the image.

(Refer Slide Time: 012:58)

Now, let us see what happens in case of 2 dimensional sampling, or when we try to sample an image. The original image is represented by the function f (x, y); and just as we have seen in case of a 1 dimensional signal, where x (t) is multiplied by comb of t delta t for the sampling operation, in case of an image also f (x, y) has to be multiplied by comb (x, y; delta x, delta y) to give you the sampled signal f S (x, y).

Now, this comb function, because it is again a function of the 2 variables x and y, is nothing but a 2 dimensional array of delta functions, where along the x direction the spacing is delta x and along the y direction the spacing is delta y.

So again, as before, this f S (x, y) can be represented in the form of the sum of f (m delta x, n delta y) multiplied by the delta function delta (x minus m delta x, y minus n delta y), where both m and n vary from minus infinity to infinity.
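
Written out in standard notation, this two-dimensional sampling relation is

\[
f_S(x,y) \;=\; f(x,y)\,\operatorname{comb}(x,y;\Delta x,\Delta y)
\;=\; \sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty}
 f(m\,\Delta x,\, n\,\Delta y)\,\delta(x - m\,\Delta x,\; y - n\,\Delta y).
\]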

(Refer Slide Time: 14:35)

So, as we have done in case of 1 dimensional signal; if we want to find out the frequency
spectrum of this sampled image, then the frequency spectrum of the sampled image f S omega x
omega y will be same as f omega x omega y which is the frequency spectrum of the original
image f (x, y) which has to be convoluted with comb omega x omega y where comb omega x
omega y is nothing but the Fourier transform of comb x y delta x delta y.

And if you compute this Fourier transform, you find that comb (omega x, omega y) comes out in the form omega xs omega ys comb (omega x, omega y; 1 upon delta x, 1 upon delta y), where omega xs is nothing but 1 upon delta x, which is the sampling frequency along the x direction, and omega ys is equal to 1 upon delta y, which is nothing but the sampling frequency along the y direction.

(Refer Slide Time: 15:49)

So, coming back to the same concept as in the case of the 1 dimensional signal X (t): F S (omega x, omega y) is now the convolution of F (omega x, omega y), which is the frequency spectrum of the original image, with comb (omega x, omega y), where comb (omega x, omega y) is the Fourier transform of the sampling function in 2 dimensions.

And, as we have seen earlier, such a convolution operation replicates the frequency spectrum of the original signal along the omega axis in the case of a 1 dimensional signal; so here again, in the case of a 2 dimensional signal, the original spectrum will be replicated along both the omega x and omega y directions.

(Refer Slide Time: 16:39)

So as a result, what we get is a 2 dimensional array of the spectrum of the image as shown in this
particular figure. So here again, you find that we have simply shown the region of support getting replicated. You find that along the y direction and along the x direction, the spectrum gets replicated, and the spacing between 2 subsequent frequency bands along the x direction is equal to omega xs, which is nothing but 1 upon delta x; and along the y direction, the spacing is 1 upon delta y, which is nothing but omega ys, the sampling frequency along the y direction.

(Refer Slide Time: 17:41)

Now, if we want to reconstruct the original image from this particular spectrum, then what we have to do is take out one particular frequency band: the frequency band which is around the origin in the frequency domain. If we want to take out this particular frequency band, then as we have seen before, the signal has to be low pass filtered. So we pass it through a low pass filter whose response is given by H (omega x, omega y) equal to 1 upon (omega xs into omega ys) for (omega x, omega y) in the region R, where the region R just covers this central band, and equal to 0 outside this region R.

In that case, we will be able to take out just this particular frequency component within the region R by using this low pass filter; and again, for taking out this particular frequency region, the same Nyquist rate condition applies. That is, the sampling frequency in the x direction must be greater than twice omega x naught, which is the maximum frequency component along x, and the sampling frequency along the y direction again has to be greater than twice omega y naught, which is the maximum frequency component along the y direction.
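
As a minimal sketch of such an ideal low-pass operation, here is a small Python routine that keeps only the central band of a discrete image spectrum (the test image and the cut-off fractions are arbitrary illustration choices, and the 1 upon (omega xs into omega ys) gain of the continuous-domain filter is not included here):

```python
import numpy as np

def ideal_lowpass(image, keep_x=0.25, keep_y=0.25):
    """Keep only the central frequency band (region R) of the image spectrum.

    keep_x, keep_y are the fractions of the x and y bandwidths retained
    (arbitrary illustration values, not taken from the lecture)."""
    F = np.fft.fftshift(np.fft.fft2(image))
    rows, cols = image.shape
    wy = np.abs(np.arange(rows) - rows // 2) <= keep_y * (rows // 2)
    wx = np.abs(np.arange(cols) - cols // 2) <= keep_x * (cols // 2)
    mask = np.outer(wy, wx)                  # 1 inside region R, 0 outside
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

img = np.random.rand(64, 64)                 # stand-in for an image
smooth = ideal_lowpass(img)
print(img.shape, smooth.shape)
```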

(Refer Slide Time: 19:15)

So, let us see some results. Here we have shown 4 different images. You find that
the first image which is shown here was sampled with 50 dots per inch or 50 samples per inch.
Second one was sampled with 100 dots per inch, third one with 600 dots per inch and the fourth
one with 1200 dots per inch.

So, out of these 4 images, you find that the quality of the first image is very very bad. It is very
blurred and the details in the image are not at all recognizable. As we increase the sampling
frequency, when you go for the second image where we have 100 dots per inch; you find that the
quality of the reconstructed image is better than the quality of the first image. But here again,
still you find that if you study this particular region, or wherever you have edges, the edges are really not continuous; they are slightly broken.

So, if I increase the sampling frequency further, you find that these breaks in the edges have been smoothed out. So, with a sampling frequency of 600 dots per inch, the quality of the image is quite acceptable. Now, if we increase the sampling frequency further, going from a 600 dots per inch to a 1200 dots per inch sampling rate, you find that the improvement in the image quality is not as much as the improvement we got when we moved from, say, 50 dots per inch to 100 dots per inch, or from 100 to 600 dots per inch.

So, it shows that once your sampling frequency is above the Nyquist rate, you are not going to get much further improvement in the image quality; whereas, when the sampling frequency is less than the Nyquist rate, the reconstructed image is very bad.

So till now, we have covered the first phase of the image digitization process, that is sampling, and we have also seen through the examples of the reconstructed images how the quality of the reconstructed image varies if we vary the sampling frequency below and above the Nyquist rate. So, now let us go to the second phase, that is quantization of the sampled values.

(Refer Slide Time: 21:51)

Now, this quantization is a mapping of the continuous variable u to a discrete variable u prime
where u prime takes values from a set of discrete variables.

So, if your input signal is u, after quantization the quantized signal becomes u prime, where u prime is one of the discrete values shown in this case as r 1 to r L . So, we have L discrete values r 1 to r L and u prime takes the value of one of them.

(Refer Slide Time: 22:35)

Now, what is this quantization? You find that after sampling of a continuous signal, what we
have got is a set of samples. These samples are discrete in time domain. But still, every sample
value is an analog value; it is not a discrete value.

So, what we have done after sampling is, instead of considering the signal values at all possible time instants, we have considered the signal values at some discrete time instants, and at each of these discrete time instants I get a sample value. Now, the value of
this sample is still an analog value.

(Refer Slide Time: 23:23)

Similar is the case with an image. So here, in case of an image, the sampling is done in 2
dimensional grids where at each of the grid locations, we have a sample value which is still
analog.

Now, if I want to represent a sample value on a digital computer, then this analog sample value
cannot be represented. So, I have to convert this sample value again in the discrete form. So, that
is where the quantization comes into picture.

(Refer Slide Time: 23:56)

Now, this quantization is a mapping which is generally a staircase function. So, for quantization
what is done is you define a set of decision or transition levels which in this case has been shown
as transition level t k where k varies from 1 to L plus 1.

So, we have defined a number of transition levels or decision levels which are given as t 1 , t 2 , t 3 , t 4 , up to t L plus 1 , where t 1 is the minimum value and t L plus 1 is the maximum value; and we have also defined a set of reconstruction levels r k .

So, as we have shown in the previous slide, the reconstructed value u prime takes one of the discrete values r k . The quantized value will take the value r k if the input signal u lies between the decision levels t k and t k plus 1. So, this is how you do the quantization.
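
This staircase mapping can be written directly with numpy's digitize, as in the small sketch below (the decision and reconstruction levels here are arbitrary illustration values, not the optimum ones discussed later):

```python
import numpy as np

# Decision (transition) levels t_1 ... t_{L+1} and reconstruction levels r_1 ... r_L
t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # L = 4 intervals
r = np.array([0.125, 0.375, 0.625, 0.875])  # one output value per interval

def quantize(u):
    """Map u to r_k whenever t_k <= u < t_{k+1} (values outside the range are clipped)."""
    k = np.digitize(u, t[1:-1])              # interval index 0 ... L-1
    return r[k]

u = np.array([0.03, 0.26, 0.49, 0.51, 0.97])
print(quantize(u))                           # [0.125 0.375 0.375 0.625 0.875]
```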

(Refer Slide Time: 25:17)

So, let us come to this particular slide. So, it shows the input output relationship of a quantizer.
Along the horizontal direction we have put the input signal u, and along the vertical direction we have put the output signal u prime, which is the quantized signal.

So, this particular figure shows that if your input signal u lies between the transition levels t 1 and
t 2 ; then the reconstructed signal or the quantized signal will take a value r 1 . If the input signal
lies between t 2 and t 3 , the reconstructed signal or the quantized signal will take a value r 2 .
Similarly, if the input signal lies between t k and t k plus 1, then the reconstructed signal will take
the value of r k and so on.

So, given an input signal which is analog in nature, you are getting the output signal which is
discrete in nature. So, the output signal can take only one of these discrete values, the output
signal cannot take any arbitrary value.

Now, let us see what the effect of this is. As we have shown in the second slide, ideally we want that whatever the input signal is, the output signal should be the same as the input signal, and that is necessary for a perfect reconstruction of the signal. But whenever we go for quantization, your output signal, as it takes one of a discrete set of values, is not always going to be the same as the input signal.

So, in this particular slide, again we have shown the same staircase function, where along the horizontal direction we have the input signal and along the vertical axis we have put the output signal. This staircase function shows the quantization function that will be used, and the green line, which is inclined at an angle of 45 degrees with the u axis, shows what the ideal input output characteristic should be.

So, if the input output function follows this green line, then for every possible input signal I have the corresponding output signal; the output signal is able to take every possible value. But when we use this staircase function, because of the staircase effect, whenever the input signal lies within a certain region, the output signal takes a single discrete value. So, because of this staircase function, you are always introducing some error in the output signal, that is in the quantized signal. Now, let us see what the nature of this error is.

(Refer Slide Time: 28:26)

So, here we have shown the same figure. Here you find that whenever this green line which is
inclined at 45 degree with the u axis crosses the staircase function; at this point, whatever is your
signal value, it is same as the reconstructed value.

So, only at these cross over points, your error in the quantized signal will be 0. At all other
points, the error in the quantized signal will be a non zero value. So, at this point the error will be maximum and negative, and it will keep on reducing; at this point it is going to be 0, and beyond this point again it is going to increase.

So, if I plot this quantization error, you find that the plot of the quantization error will be
something like this between every transition levels. So, between t 1 and t 2 the error value is like
this, between t 2 and t 3 the error continuously increases, between t 3 and t 4 the error continuously
increases and so on. Now, what is the effect of this error on the reconstructed signal?

(Refer Slide Time: 29:52)

So for that, let us take again a 1 dimensional signal f (t) which is a function of t as is shown in
the slide. And, let us see that what will be the effect of quantization on the reconstructed signal.

(Refer Slide Time: 30:03)

So, here we have plotted the same signal, but now the signal is plotted in the vertical direction so that we can see which part of the signal lies between which particular transition levels. You find that this part of the signal lies between the transition levels t k minus 1 and t k .

So, when the signal, input signal lies between the transition levels t k minus 1 and t k ; the
corresponding reconstructed signal will be r k minus 1. So, that is shown by this red horizontal
line.

Similarly, the signal from this portion to this portion lies in the range t k and t k plus 1. So,
corresponding to this, the output reconstructed signal will be r k . So, which is again shown by this
horizontal red line and this part of the signal, the remaining part of the signal lies within the
range t k plus 1 and t k plus 2 and corresponding to this, the output reconstructed signal will have
the value r k plus 1.

(Refer Slide Time: 31:21)

So, to have a clearer figure: in this, the green curve shows the original input signal and the red staircase function shows the quantized signal f prime (t).

Now from this, it is quite obvious that I can never get back the original signal from this quantized signal, because within this region the signal might have had any arbitrary value and that detail is lost in the quantized output.

So, because from the quantized signal I can never get back the original signal, we are always introducing some error in the reconstructed signal which can never be recovered, and this particular error is known as quantization error or quantization noise.

Obviously, the quantization error or quantization noise will be reduced if the quantizer step size, that is the transition interval from t k to t k plus 1, is reduced and, similarly, the spacing between the reconstruction levels r k and r k plus 1 is also reduced.

(Refer Slide Time: 32:41)

So, the aim of the quantizer design will be to minimize this quantization error. Accordingly, we have to have an optimum quantizer, and this optimum mean square error quantizer, known as the Lloyd-Max quantizer, minimizes the mean square error for a given number of quantization levels. Here we assume that u is a real scalar random variable with a continuous probability density function p u (u), and it is desired to find the decision levels t k and the reconstruction levels r k for an L level quantizer which will minimize the quantization noise or quantization error.

Let us see how to do it. Now, you remember that u is the input signal and u prime is the
quantized signal. So, the error of reconstruction is the input signal minus the reconstructed
signal.

(Refer Slide Time: 33:51)

So, the mean square error is given by the expectation of (u minus u prime) square, and this expectation is nothing but the integral of (u minus u prime) square multiplied by the probability density function p u (u) du, integrated from t 1 to t L plus 1. You remember that t 1 was the minimum transition level and t L plus 1 was the maximum transition level. So, if I integrate the function (u minus u prime) square p u (u) du over the interval t 1 to t L plus 1, I get the mean square error.

This same integration can be rewritten in terms of (u minus r i ) square, because r i is the reconstruction level, or the reconstructed signal, in the interval t i to t i plus 1 (note that for each term the limits of integration are t i to t i plus 1, not t 1 to t L plus 1).

So, I integrate (u minus r i ) square p u (u) du over the interval t i to t i plus 1, and then I take the summation of this for i equal to 1 to L. Thus, this modified expression is the same as the earlier one, and it tells you what the square error of the reconstructed signal is; and the purpose of designing the quantizer will be to minimize this error value.
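
Collected into one line, the quantity being minimised (called zeta in what follows) is

\[
\zeta \;=\; E\!\left[(u-u')^2\right]
\;=\; \int_{t_1}^{t_{L+1}} (u-u')^2\, p_u(u)\, du
\;=\; \sum_{i=1}^{L} \int_{t_i}^{t_{i+1}} (u-r_i)^2\, p_u(u)\, du .
\]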

Now we have to design the transition levels and the reconstruction levels which will minimize this error, and from school level mathematics we know that the way to do that is to differentiate the error function with respect to t k and with respect to r k and equate the resulting expressions to 0.

(Refer Slide Time: 36:22)

So, if I differentiate this particular error value - the sum over i of the integral of (u minus r i ) square p u (u) du from t i to t i plus 1 - and call the error zeta, then what I get is: del zeta del t k is equal to (t k minus r k minus 1 ) square p u (t k ) minus (t k minus r k ) square p u (t k ), and this has to be equated to 0.

Similarly, the second equation: del zeta del r k is equal to minus 2 into the integral of (u minus r k ) p u (u) du, taken from t k to t k plus 1, which is also equated to 0.

(Refer Slide Time: 37:00)

Now, by solving these 2 equations and using the fact that t k minus 1 is less than t k , we get 2 expressions: one for the transition level and the other for the reconstruction level. The transition level t k is given by (r k plus r k minus 1 ) by 2, and the reconstruction level r k is given by the integral from t k to t k plus 1 of u p u (u) du divided by the integral from t k to t k plus 1 of p u (u) du.
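
In equation form, the two Lloyd-Max conditions just stated are

\[
t_k \;=\; \frac{r_k + r_{k-1}}{2},
\qquad
r_k \;=\; \frac{\displaystyle\int_{t_k}^{t_{k+1}} u\, p_u(u)\, du}
              {\displaystyle\int_{t_k}^{t_{k+1}} p_u(u)\, du}.
\]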

(Refer Slide Time: 37:48)

So, what do we get from these 2 equations? They tell us that the optimum transition levels t k lie halfway between the optimum reconstruction levels; that is quite obvious because t k is equal to (r k plus r k minus 1 ) by 2, so this transition level lies halfway between r k and r k minus 1. The second observation is that the optimum reconstruction levels in turn lie at the center of mass of the probability density in between the transition levels, which is given by the second equation, that is, r k is equal to the integral of u p u (u) du from t k to t k plus 1 divided by the integral of p u (u) du again from t k to t k plus 1. So, this is nothing but the center of mass of the probability density over the interval t k to t k plus 1.

So, this optimum quantizer, the Lloyd-Max quantizer, gives you the optimum reconstruction levels and the optimum transition levels in terms of the probability density of the input signal.

(Refer Slide Time: 39:12)

Now, you find that these 2 equations are non linear equations and we have to solve these non
linear equations simultaneously, given the boundary values t 1 and t L plus 1. And for solving this,
one can make use of Newton's iterative method to find out the solutions.

An approximate solution or an easier solution will be when the number of quantization levels is
very large. So, if the number of quantization levels is very large, you can approximate the probability density function p u (u) as a piecewise constant function.

(Refer Slide Time: 39:56)

So, how do you do this piecewise constant approximation? In this figure, a probability density function has been shown which looks like a Gaussian function. We approximate it this way: in between the levels t j and t j plus 1, we take the midpoint t j hat, which lies halfway between t j and t j plus 1, and within this interval we approximate p u (u), which actually varies, by the constant value p u (t j hat).

So, in between t j and t j plus 1, that is in between every 2 transition levels, we approximate the probability density function by a constant which is equal to the probability density function at the point halfway between these 2 transition levels. If I do that, this continuous probability density function will be approximated by a staircase function like this.

(Refer Slide Time: 41:03)

So, if I use this approximation and recompute those values, we find that t k plus 1 can now be computed as the integral from t 1 to (t 1 plus z k ) of p u (u) to the power minus 1 third du, multiplied by A and divided by the integral from t 1 to t L plus 1 of p u (u) to the power minus 1 third du, plus t 1 ; where the constant A is t L plus 1 minus t 1 (we have said that t L plus 1 is the maximum transition level and t 1 is the minimum transition level) and z k is equal to (k by L) into A, where k varies from 1 to L.

So, we can find out t k plus 1 by using this particular formulation, in which the continuous probability density function is approximated by a piecewise constant probability density function; and once we do that, we can find out the values of the corresponding reconstruction levels.
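
For a concrete sense of how such a quantizer can be obtained in practice, here is a minimal Python sketch of the usual fixed-point (Lloyd) iteration that simply alternates between the two optimality conditions, using a standard Gaussian density evaluated on a fine grid. The number of levels, the grid, the boundary values and the iteration count are all arbitrary illustration choices, and this is not the Newton-based procedure mentioned above.

```python
import numpy as np

def lloyd_max(pdf, t1, tL1, L, iters=200):
    """Alternate between r_k = centre of mass of p_u over (t_k, t_{k+1})
    and t_k = midpoint of (r_{k-1}, r_k)."""
    u = np.linspace(t1, tL1, 4001)          # fine grid over the dynamic range A
    p = pdf(u)
    t = np.linspace(t1, tL1, L + 1)         # initial transition levels (uniform)
    r = np.empty(L)
    for _ in range(iters):
        for k in range(L):
            m = (u >= t[k]) & (u <= t[k + 1])
            r[k] = np.sum(u[m] * p[m]) / np.sum(p[m])   # centre of mass
        t[1:-1] = 0.5 * (r[:-1] + r[1:])    # t_1 and t_{L+1} stay fixed
    return t, r

gauss = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
t, r = lloyd_max(gauss, t1=-4.0, tL1=4.0, L=8)
print(np.round(r, 3))   # reconstruction levels crowd together where p_u is large
```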

(Refer Slide Time: 42:10)

Now, for solving this particular equation, the requirement is that t 1 and t L plus 1 have to be finite; that is, the minimum transition level and the maximum transition level must be finite. At the same time, we have to assume t 1 and t L plus 1 a priori, before placement of the decision and reconstruction levels.

These boundary values t 1 and t L plus 1 determine the dynamic range A of the quantizer. When we have fixed t 1 and t L plus 1, any value less than t 1 or any value greater than t L plus 1 cannot be properly quantized by this quantizer; so these values represent the dynamic range of the quantizer.

(Refer Slide Time: 43:24)

Now, once we get the transition levels, we can find out the reconstruction levels by averaging successive transition levels. Once I have the reconstruction levels and the transition levels, the quantization mean square error can be computed as follows: the mean square error of this designed quantizer will be 1 upon 12 L square into the cube of the integral, taken between t 1 and t L plus 1, of p u (u) to the power 1 third du. This expression gives an estimate of the quantizer error in terms of the probability density and the number of quantization levels.

(Refer Slide Time: 44:16)

Normally, 2 types of probability density functions are used. One is the Gaussian, where the Gaussian probability density function is given by the well known expression p u (u) equal to 1 upon root over (2 pi sigma square) into the exponential of minus (u minus mu) square by (twice sigma square); the other is the Laplacian probability density function, which is given by p u (u) equal to (alpha by 2) into the exponential of minus alpha into the absolute value of (u minus mu), where mu and sigma square denote the mean and variance of the input signal u. The variance in case of the Laplacian density function is given by sigma square equal to 2 upon alpha square.

(Refer Slide Time: 45:12)

Now, you find that though the earlier quantizer can be designed for any kind of probability density
function, it is not always possible to find out the probability density function of a signal
a priori.

So, what is done in practice is that you assume a uniform distribution, a uniform probability
distribution, which is given by p u (u) equal to 1 upon (t L plus 1 minus t 1) where u lies between
t 1 and t L plus 1, and p u (u) equal to 0 when u is outside this region t 1 to t L plus 1.

(Refer Slide Time: 46:16)

So, this is the uniform probability distribution of the input signal u, and by using this uniform
probability distribution, the same Lloyd-Max quantizer equations give r k as follows: if I
compute this, then you will find the reconstruction level r k will be nothing but (t k plus 1 plus t k)
by 2, where t k will be (r k plus r k minus 1) by 2 which is the same as (t k plus 1 plus t k minus 1) by 2.
So, I get the reconstruction levels and the decision levels for a uniform quantizer.

(Refer Slide Time: 46:49)

Now, these relations lead to t k plus 1 minus t k being the same as t k minus t k minus 1, and that is
a constant equal to q which is known as the quantization step. So finally, what we get is that the
quantization step is given by (t L plus 1 minus t 1) by L, where t L plus 1 is the maximum transition
level, t 1 is the minimum transition level and L is the number of quantization levels.

We also get the transition level t k in terms of the transition level t k minus 1 as t k equal to t k minus 1
plus q, and the reconstruction level r k in terms of the transition level t k as r k equal to t k plus q
by 2. So, we obtain all the related terms of a uniform quantizer using this mean square error
quantizer design which is the Lloyd-Max quantizer for a uniform distribution.
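These relations are simple enough to put into a short sketch. The following Python snippet (an added illustration; the function names and the example range are arbitrary choices) builds the transition and reconstruction levels of such a uniform quantizer and quantizes a few samples:

import numpy as np

def uniform_quantizer(t1, tL1, L):
    """Uniform (linear) quantizer: step q = (t_{L+1} - t_1) / L,
    transition levels t_k = t_1 + (k - 1) q, reconstruction levels r_k = t_k + q / 2."""
    q = (tL1 - t1) / L
    t = t1 + q * np.arange(L + 1)      # t_1 ... t_{L+1}
    r = t[:-1] + q / 2.0               # r_1 ... r_L (midpoints of each interval)
    return q, t, r

def quantize(u, t1, tL1, L):
    """Map input samples to their reconstruction levels."""
    q, t, r = uniform_quantizer(t1, tL1, L)
    k = np.clip(np.floor((np.asarray(u) - t1) / q).astype(int), 0, L - 1)
    return r[k]

# Example: a 4-level quantizer over the range [0, 1]
print(uniform_quantizer(0.0, 1.0, 4))        # q = 0.25, t = [0, .25, .5, .75, 1], r = [.125, .375, .625, .875]
print(quantize([0.1, 0.6, 0.99], 0.0, 1.0, 4))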

(Refer Slide Time: 48:10)

So, here you find that all the transition levels as well as the reconstruction levels are equally
spaced and the quantization error in this case is uniformly distributed over the interval minus q
by 2 to q by 2. And the mean square error in this particular case, if you compute it, will be given
by 1 upon q into the integral of u square du taken from minus q by 2 to q by 2, which will be
nothing but q square by 12.

(Refer Slide Time: 48:39)

So, for uniform distributions, the Lloyd-Max quantizer equations become linear because all the
equations that we have derived earlier are linear equations giving equal intervals between the
transition levels and the reconstruction levels, and so this is also sometimes referred to as a linear
quantizer.

(Refer Slide Time: 49:03)

So, there are some more observations on this linear quantizer. The variance sigma u square of a
uniform random variable whose range is A is given by A square by 12. Now, if we have a uniform
quantizer where every level has to be represented by B bits, the number of steps will be 2 to the
power B and thus the quantization step will be q equal to A upon 2 to the power B. From this you
will find that the ratio of the quantization error variance q square by 12 to the signal variance
sigma u square will be equal to 2 to the power minus 2B.

And from this, we can compute the signal to noise ratio in case of a uniform quantizer, where
the signal to noise ratio is given by 10 log of 2 to the power twice B, where the logarithm has to be
taken with base 10, and this is nothing but 6B dB. So, this says that the signal to noise ratio that
can be achieved by an optimum mean square quantizer for a uniform distribution is
6 dB per bit.
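A quick numerical check of this 6 dB per bit rule (an added sketch, assuming a full-range uniform input so that the signal variance is A square by 12):

import numpy as np

# q = A / 2^B, noise variance = q^2 / 12, signal variance = A^2 / 12,
# so SNR = 10 log10(2^(2B)), which is approximately 6 dB per bit.
for B in (1, 4, 8):
    A = 1.0                          # the range cancels out in the ratio
    q = A / 2 ** B
    snr_db = 10 * np.log10((A ** 2 / 12) / (q ** 2 / 12))
    print(B, round(snr_db, 2))       # prints about 6.02 * B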

That means if you increase the number of bits by 1, the number of quantization levels will be
increased by a factor of 2, and in that case you gain 6 dB in the signal to noise ratio of the
reconstructed signal. So, with this we come to an end of our discussion on the image digitization
process.

So, here we have seen how to sample a signal in 1 dimension and how to sample an image in 2
dimensions. We have also seen that after you get the sample values, where each of the sample
values is analog in nature, how to quantize those sample values so that you can get the
corresponding digital signal as well as the digital image.

(Refer Slide Time: 51:43)

So now, you remember that in the previous class we had given some tutorial problems. I will
discuss only a few of the tutorial problems which are a bit critical; some of them are very simple.

So, one of the tutorial problems was that find the frequency spectrum of the following periodic
signal where this periodic signal is a square wave, where the on period is 3 micro second and the
off period is 7 micro second. Let us see, what is the solution to this particular problem.

Let us try to solve a general case. That is again we have a square wave whose time period is say
T 0 and the on period is tau and we divide this between minus tau by 2 and plus tau by 2. We
have also said earlier and you will find that this signal is a periodic signal and we have said that
for a periodic signal, the frequency spectrum is obtained by Fourier series expansion.

So, if we expand a signal v (t) with a Fourier series, then the expansion will be the summation of
c n f 0 into e to the power j 2 pi n f 0 t, where f 0 equal to 1 upon T 0, that is the fundamental frequency.

(Refer Slide Time: 53:18)

And for this term, you have to take the summation from n equal to minus infinity to infinity, where
c n f 0, that is the n'th Fourier coefficient, is given by the expression c n f 0 equal to 1 upon T 0
into the integral of v (t) e to the power minus j 2 pi n f 0 t dt, where this integration has to be taken
over one period T 0.

And if you take this integration, you will find that the final expression will come in this form:
A f 0 tau into sin of (pi n f 0 tau) divided by (pi n f 0 tau), which is represented in the form A f 0 tau
into the sinc function of n f 0 tau.

(Refer Slide Time: 54:11)

And if I plot this, the plot would be something like this, where we have plotted c n f 0 versus f,
and this will be a line spectrum where I get lines at f equal to 0, f equal to f 0, f equal to twice f 0
and so on, and the envelope of these frequency components will follow the sinc function.

Now in this, if we put tau equal to 3 micro second and t 0 equal to 10 micro second; then you will
get the solution to the problem that was given in the last class.
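A small numerical sketch of this line spectrum for the given numbers (an added illustration; the amplitude A is assumed to be 1 since it is not specified in the problem):

import numpy as np

# Fourier coefficients of the square wave: c_n = A * f0 * tau * sinc(n * f0 * tau),
# where np.sinc(x) = sin(pi x) / (pi x). For the tutorial problem, the on period is
# tau = 3 us and the total period is T0 = 3 + 7 = 10 us.
A = 1.0
tau = 3e-6
T0 = 10e-6
f0 = 1.0 / T0
n = np.arange(0, 6)
c_n = A * f0 * tau * np.sinc(n * f0 * tau)
print(c_n)    # line spectrum samples at f = 0, f0, 2*f0, ...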

(Refer Slide Time: 54:59)

Let me put the other one. That is a speech signal has a bandwidth of 4 kilo hertz. If every sample
is digitized using 8 bits and the digital speech is to be transmitted over a communication channel;
what is the minimum bandwidth requirement of the channel?

Again, this problem is a very simple problem because the bandwidth of the speech signal is
stated to be 4 kilo hertz, so the minimum sampling frequency following the Nyquist rate will be
twice the bandwidth, which is equal to 8 k samples per second. And it is stated that the
number of bits per sample is equal to 8. So, the number of bits generated by this sampled signal
will be 8 into 8 k, that is 64 kilo bits per second.

So, if we want to transmit this digitized speech over a channel, then the minimum channel
bandwidth requirement will be 64 kilo bits per second. So again, this is a simple problem.
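The same arithmetic, written out as a tiny sketch (added for illustration):

# Minimum bit rate for the digitized speech signal
bandwidth_hz = 4000                      # stated speech bandwidth of 4 kHz
sampling_rate = 2 * bandwidth_hz         # Nyquist rate: 8 k samples per second
bits_per_sample = 8
print(sampling_rate * bits_per_sample)   # 64000 bits per second, i.e. 64 kbps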

(Refer Slide Time: 56:10)

Coming to the next problem which was given, we had given 2 different signals and you have to
find out what will be the convolution of these two different signals.

(Refer Slide Time: 56:18)

Again here, you will find that if I simply follow the convolution equation, which is y (t) equal to
the integral of h (tau) x (t minus tau) d tau, where the integration has to be taken from minus
infinity to infinity, you will find that the final convolution that you get is a triangular wave of this form.

Of course, the shape of the triangular waveform will depend upon what the on period and off
period of these 2 different signals are. You can work it out and you will find that the final
convolution output will be a triangular wave like this.
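A discrete illustration of this convolution (an added sketch; the exact signals were given on the slide, so the pulse widths below are only assumptions for illustration):

import numpy as np

# Two rectangular pulses of equal width convolve to a triangular waveform
# (unequal widths would give a trapezoid instead).
dt = 0.01
h = np.ones(100)                 # rectangular pulse of width 1.0
x = np.ones(100)                 # second rectangular pulse of the same width
y = np.convolve(h, x) * dt       # numerical approximation of the convolution integral
print(y.max())                   # peak value is about 1.0, the area of full overlap
print(np.argmax(y))              # the peak occurs where the two pulses fully overlap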

(Refer Slide Time: 57:05)

Now, let us come to today’s tutorial questions. So, today I give you 3 tutorial questions. The first
one is what is aliasing? The second question is an image is described by the function f (x, y)
equal to 2 cos 2 pi into 3x plus 4y. If this image is sampled at delta x equal to delta y equal to
0.2; then what will be the reconstructed image?

So, you have been given an image which is represented by a function that is f (x, y) equal to
twice cos 2 pi into 3x plus 4y, the image is sampled both in x direction and y direction where the
sampling interval in both the directions is 0.2; then you have to find out that what will be the
reconstructed image.

And the third problem - the output of an image sensor takes values between 0.0 and 10.0. If it is
quantized by a uniform quantizer with 256 levels, what will be the transition and reconstruction
levels? So, you have an image sensor, the image sensor produces analog values between 0 and
10. These analog values are to be quantized by a uniform quantizer which has 256 levels. Then
you have to find out what will be the optimum transition and reconstruction
levels.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture – 4
Pixel Relationships
Hello, welcome to the video lectures series on Digital Image Processing. Till the last class, we
have talked about the image digitization process and we have seen that there are 2 steps involved
for digitization of an image.

(Refer Slide Time: 1:10)

The first step is image sampling, in which case instead of taking the intensity values at
every possible location in the image, we take the pixel values or intensity values at
some discrete locations in the 2 dimensional space, and this is the process that we call image
sampling.

After sampling, what we get is some discrete set of points and at those discrete set of points, we
get the sample values and the sample values or the intensity values at those discrete set of points
are analog in nature.

So, the final step of digitization of an image is the quantization, in which case those
analog sample values are quantized to one of the discrete levels and depending upon the number
of levels that we choose, the number of bits needed to represent each and every sample value is
different.
So in general, what is used is 8 bits for digitization or quantization of every sample value. So, if
it is a black and white image or a simple grey level image, then per point or per pixel we
have 8 bits; whereas, if it is a color image, and we know that in case of a color image there are 3
different planes - the red plane, green plane and blue plane - for each of these different planes,
every point is represented by 8 bits.

So, for a color image, normally we have 24 bits for every pixel; whereas for a grey level image
or black and white image, we have 8 bits per pixel. So, after digitization, what we have found is
that an image is represented in the form of a matrix or a 2 dimensional matrix which we call a
digital image, and each of the matrix elements is now called a pixel.
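To make this concrete, a small Python sketch of such a matrix representation (added here as an illustration; the array sizes and values are arbitrary):

import numpy as np

# A grey level image is a 2-D matrix of 8-bit values (0..255), one byte per pixel;
# a color image carries three such planes (red, green, blue), i.e. 24 bits per pixel.
rows, cols = 4, 5                                    # toy image size for illustration
gray = np.zeros((rows, cols), dtype=np.uint8)        # 8 bits per pixel
color = np.zeros((rows, cols, 3), dtype=np.uint8)    # 3 planes x 8 bits = 24 bits per pixel
gray[1, 2] = 200                                     # pixel at row 1, column 2
color[1, 2] = (255, 0, 0)                            # a red pixel at the same location
print(gray.dtype, gray.shape, color.shape)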

Now, when we represent the image by a matrix and the different points are called pixels, then it
is found that some important relationships exist among those pixels. And in today’s lecture we
will try to find out what are the different important relationships that exist among different pixels
of an image.

(Refer Slide Time: 3:57)

So, in today’s lecture, we will try to see that what are the relationships that exist among the
pixels of an image and among these relationships, the first relationship that we will talk about is
the neighborhood and we will also see that what are different types of neighborhood of a pixel in
an image.

Then we will also try to explain what is meant by connectivity in an image. We will also
learn the connected component labeling algorithm and the importance of this connected component
labeling algorithm, and the properties based on connectivity we will discuss later. We will
also explain what is adjacency and we will see what are the different types of adjacency
relationships.
Then we will also learn different distance measures and towards the end of today lecture, we will
try to find out what are the different image operations. We will try to see what are pixel by pixel
operations and what are the neighborhood operations in an image. So, as we say that the first
relationship is the pixel relationship or the neighborhood relation.

(Refer Slide Time: 5:14)

(Refer Slide Time: 5:16)

Now, let us first try to understand what is meant by neighborhood. We say that the people
around us are our neighbors, or we say that a person who is living in the house next to mine is my
neighbor. So, it is the closeness of different persons which forms the neighborhood of a person.
So, the persons who are very close to me, they are my neighbors.

Similarly, in case of an image also, we say that pixels are neighbors if the pixels are very close
to each other. So, let us try to see formally what is meant by neighborhood in case of an image
pixel. Here, let us consider a pixel p at location (x, y), shown as the middle pixel.

Now, find that because the image in our case is represented by a 2 dimensional matrix; so the
matrix will have a number of rows and a number of columns. So, when I consider this pixel p
whose location is x y that means the location of the pixel in the matrix is in row number x, in row
x and in column y. Obviously, there will be a row which is just before x that is row x minus 1,
there will be a row just after the row x which is row x plus 1. Similarly, there will be a column
just before the column y that is column y minus 1 and there will be a column just after column y
which is column y plus 1.

So, come back to this figure. Coming to this particular pixel p which is at location x y, I can have
2 different pixels. One is in the row just above row x, other one is in the row just below row x
but in the same column location y. So, I will have 2 different pixels. One is in the vertically
upward direction, the other one is in the vertically downward direction. So, these are the 2 pixels
which are called the vertical neighbors of point p.

Similarly, if I consider the columns, there will be a pixel at location (x, y minus 1), that is in row
number x, column number y minus 1, and there is a pixel at (x, y plus 1), that is row
number x and column number y plus 1. So in this case, these are the 2 pixels which are the
horizontal neighbors of the point p. So, in this case these are not 4, rather this should be 2; here,
this will also be 2. So, this pixel p has 2 neighbors in the horizontal direction and 2 neighbors in
the vertical direction.

So, these total 4 pixels are called the 4 neighbors of the point p, and this set is represented by
N 4 (p). That is, these pixels are the 4 neighbors of the pixel p or point p. If you find out the
distance between these neighboring pixels and p, you find that each of these neighbors is at a
unit distance from point p. Obviously, if p is a boundary pixel, then it will have fewer neighbors.
Let us see why.

(Refer Slide Time: 9:18)

So, I have a 2 dimensional image where this image is represented in the form of a matrix. So, I
have pixels in different rows and pixels in different columns. Now, if this pixel p, if the point p is
one of the boundary pixels say I take this corner pixel; then as we said that for a pixel p usually
there are 4 different pixels taken from a row above it, a row below it, the column before it and
the column after it.

But when I consider this particular pixel p, this pixel p does not have any pixel in the row above
this pixel; it does not have any pixel in the column before this particular column. So, for this
particular pixel p, I will have only 2 neighboring pixels. One is in this location, the other one is
in this location which are part of 4 neighbors or N 4 (p).

So, you find that for all the pixels which belong to the boundary of an image, the number of
neighboring pixels is less than 4; whereas for all the pixels which lie inside the image, the
number of neighborhood pixels is equal to 4. So, this is what the 4 neighborhood of
a particular pixel is.
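A short Python sketch of this 4 neighborhood, including the boundary behavior just described (added as an illustration; the function name is my own choice):

def n4_neighbors(x, y, rows, cols):
    """4-neighborhood N4(p) of the pixel p at (row x, column y):
    the pixels directly above, below, left and right of p.
    Boundary pixels get fewer than 4 neighbors, as discussed above."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for (i, j) in candidates if 0 <= i < rows and 0 <= j < cols]

print(n4_neighbors(0, 0, 5, 5))   # corner pixel: only 2 neighbors
print(n4_neighbors(2, 2, 5, 5))   # interior pixel: 4 neighbors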

Now, as we have done, as we have taken the points from vertically upward direction and
vertically downward direction or horizontally from the left as well as from right; similarly, we
can find that there are 4 other points which are in the diagonal direction. So, those points are
here.

(Refer Slide Time: 11:43)

Again, I consider this point p at location (x, y). But now if I consider the diagonal points, you find
that there are 4 diagonal points. One at location (x minus 1, y minus 1), the other one at location
(x plus 1, y plus 1), one at location (x minus 1, y plus 1) and the other one at location
(x plus 1, y minus 1).

Now, we say that these 4 pixels because they are in the diagonal direction; these 4 pixels are
known as the diagonal neighbors of point p and it is represented by N D (p). So, I have got 4
pixels which belong to N 4 (p) that is those are the 4 neighbors of point p and I have got 4 more
points which are the diagonal neighbors are represented by N D (p).
Now, I can combine these 2 neighborhoods and I can say an 8 neighborhood. So, again coming
back to this, if I take the points both from N 4 (p) and N D (p), they together are called 8 neighbors
of point p and represented by N 8 (p). So obviously, this N 8 (p) is the union of N 4 (p) and N D (p).

And naturally, as we have seen in the previous case, if the point p belongs to the
boundary of the image, then the number of diagonal neighbors N D (p) of the point p will be less
than 4. Similarly, the number of points belonging to N 8 (p), or the number of 8 neighbors of the
point p, will be less than 8. Whereas, if p is inside an image, that is it is not a boundary point, in
that case there will be 8 neighbors - 4 in the horizontal and vertical directions and 4 in the
diagonal directions. So, there will be 8 neighbors of point p if point p is inside an image. So,
these are the different neighborhoods of the point p.
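Along the same lines as the earlier sketch, the diagonal neighborhood and the 8 neighborhood can be written as follows (again an added illustration with my own function names):

def nd_neighbors(x, y, rows, cols):
    """Diagonal neighborhood N_D(p): the 4 diagonal pixels around p."""
    candidates = [(x - 1, y - 1), (x - 1, y + 1), (x + 1, y - 1), (x + 1, y + 1)]
    return [(i, j) for (i, j) in candidates if 0 <= i < rows and 0 <= j < cols]

def n8_neighbors(x, y, rows, cols):
    """8-neighborhood N8(p) = N4(p) union N_D(p): all pixels at most one row
    and one column away from p, excluding p itself."""
    candidates = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if not (dx == 0 and dy == 0)]
    return [(i, j) for (i, j) in candidates if 0 <= i < rows and 0 <= j < cols]

print(len(nd_neighbors(0, 0, 5, 5)))   # corner pixel: 1 diagonal neighbor
print(len(n8_neighbors(2, 2, 5, 5)))   # interior pixel: 8 neighbors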

(Refer Slide Time: 14:01)

Now, after this we have another property which is called as connectivity. Now, the connectivity
of the points in an image is a very very important concept which is used to find out the region
property of an image or the property of a particular region within the image.

You recollect that in our introductory lecture, we have given an example of segmentation. That is
we had an image of certain object and we wanted to find out the points which belong to the
object and the points which belong to the background. And for doing that, we had used a very
very primitive operation called the thresholding operation.

So here, in this particular case, we have shown that if the intensity value f (x, y) at a particular
point (x, y) is greater than a certain threshold, say Th, in that case we decided that the point (x, y)
belongs to the object; whereas, if the intensity level at the point (x, y) is less than the
threshold, then we decided that the point (x, y) belongs to the background.

So, simply by performing this operation, and if you represent every object point as a white
pixel or assign a value 1 to it and every background pixel as a black pixel or assign a value 0 to
it; in that case, the type of image that we will get after the thresholding operation is like this.
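This thresholding operation is a one-line array operation in practice. A minimal sketch (added for illustration; the small test image and the threshold value are arbitrary):

import numpy as np

def threshold(image, th):
    """Binary segmentation by thresholding: pixels with intensity greater than
    th get the value 1 (object), the rest get 0 (background)."""
    return (np.asarray(image) > th).astype(np.uint8)

img = np.array([[ 10,  20, 200],
                [ 15, 210, 220],
                [ 12,  18,  25]])
print(threshold(img, 100))
# [[0 0 1]
#  [0 1 1]
#  [0 0 0]]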

(Refer Slide Time: 15:51)

So, here this is the original image and you find that for all the points which belong to the object,
the intensity value is greater than the threshold. So, the decision that we have taken is these
points belong to the object, so in case of segmented image or the thresholded image; we have
assigned a value 1 to each of those image points. Whereas, in other regions, say region like this;
we decided that these points belong to the background. So, we have assigned a value 0 to these
particular points.

Now, just by performing this operation, what we have done is we have identified certain pixels -
the pixels which belong to the background and the pixels which belong to the object. But
just by identification of the pixels belonging to the background or to the object, I cannot
find out what the property of the object is until and unless I do some more
processing to say that those pixels belong to the same object. That means I have to do some sort
of grouping operation. Not only this, I can have a situation like this.

(Refer Slide Time: 17:15)

Say for example, I have in this entire image, 2 different objects; one object may be in this
location and another object may be say somewhere here. So, just by using this thresholding
operation, what I have done is I have decided that all the pixels in this region will get a value 1,
all the pixels in this region will also get a value 1. So, both these 2 sets of pixels, they belong to
the object.

But here, our solution does not end there. We have to identify that a set of pixels or this
particular set of pixels belong to one object, this particular set of pixels belong to another object.
They do not belong to the same object. So for this, what is needed is I have to identify that which
pixels are connected and also I have to identify which pixels are not connected.

So, I will say that the pixels having value equal to 1 which are connected, they belong to one
region and another set of pixels or points having value equal to 1 but not connected to the other
set, they belong to some other object.

So, this connectivity property between the pixels is a very very important property and using this
connectivity property; we can establish the object boundaries, we can find out what is the area of
the object and likewise we can find out many other properties of the object or the descriptors of
the object which will be useful for further high level processing techniques where we will try to
recognize or identify a particular object. So, now let us try to see that what is this connectivity
property? What do we mean by connectivity?

(Refer Slide Time: 19:31)

We say that 2 pixels are connected if they are adjacent in some sense. So, this term - some sense -
is very very important. This adjacency means that they have to be neighbors. That means, if I say
that 2 points p and q are connected, then by adjacency we mean that p must be a neighbor of q or
q must be a neighbor of p. That means q has to belong to N 4 (p) or N D (p) or N 8 (p), and in
addition to this neighborhood, one more constraint that has to be put is that the
intensity values or the gray levels of the 2 points p and q must be similar.

So, let us take this example. Here, we have shown 3 different situations where we have taken
points p and q. So, here you find that point q belongs to the 4 neighborhood of point p; here
point q belongs to the diagonal neighborhood of point p; here again point q belongs to the 4
neighborhood of point p. And in these cases, we will say that points p and q are connected.
Obviously, the neighborhood restriction holds true because q and p are neighbors, and
along with this, we have said that another restriction or another constraint must be satisfied, that
their intensity values must be similar.

So, in this particular case, because we are considering a binary image; so we will say that if q
belongs to the neighborhood of p or p belongs to the neighborhood of q and the intensity value at
point p is same as the intensity value at point q, so because it is binary image, this value will be
either 0 or 1.

So in this case, if I assume that if the pixels have value equal to 1, then we will assume that those
2 pixels to be connected. So in this case, if for both p and q, the intensity value is equal to 1 and
since they are the neighbors, so we will say that points p and q are connected.

(Refer Slide Time: 22:22)

Now from this, connectivity can be defined in a more general way. So, the earlier example that
we have taken is the connectivity in case of a binary image where the intensity values are either 0
or 1. This connectivity property can also be defined in case of gray level image. So, how do we
define connectivity in case of a gray level image? In case of a gray level image, we define a set
of gray levels.

Say for example, in this case, we have defined V to be the set of gray levels which is used to define
the connectivity of 2 points p and q, so that the intensity values at points p and q must belong to
the set V. So, note that it is not the points p and q themselves but their intensity values f (p) and f (q) that must belong to V.

So, if the intensity values at the points p and q belong to the set V and points p and q are
neighbors, then we will say that points p and q are connected, and here again we can define 3
different types of connectivity. One is 4 connectivity; that is, in this case, the intensity values at p
and q must be from the set V and p must be a 4 neighbor of q or q must be a 4 neighbor of p; in
that case, we define 4 connectivity.

Similarly, we define 8 connectivity if the intensity values at points p and q belong to the set V and
p is an 8 neighbor of q or q is an 8 neighbor of p. There is another type of connectivity which is
defined, which is called M connectivity or mixed connectivity. In case of M connectivity, it is
defined like this: the intensity values at points p and q obviously have to be from the same set V,
and either q belongs to N 4 (p), or q belongs to N D (p), that is the diagonal neighborhood of p, and
N 4 (p) intersection with N 4 (q) contains no pixel with a value from the set V.

So, this concept puts some restriction on the 8 connectivity, in the sense that here we say that
either q has to be a 4 neighbor of p, or q has to be a diagonal neighbor of p but at the same time
N 4 (p) intersection with N 4 (q) must contain no pixel with a value from V; and you find that this
N 4 (p) intersection with N 4 (q) indicates the set of points which are 4
neighbors of both the points p and q.

So, this says that if the point q is a diagonal neighbor of p and there is a common point with a value
from V which is a 4 neighbor of both the points p and q, then M connectivity is not valid. So,
the reason why this M connectivity is introduced is to avoid some problems that may arise with
the simple 8 connectivity concept. Let us see what these problems are.

(Refer Slide Time: 26:09)

So, the problem is like this. Here again, we have taken the example from a binary image and in
case of a binary image, we say that 2 points may be connected if the values of both the points
equal to 1. So, set V contains a single intensity value which is equal to 1. Now, here we have
depicted one particular situation where we have shown the different pixels in a binary image. So,
find that if I consider this point at the middle of this image which is having a value 1, there is one
more pixel on the row above this which is also having a value 1 and a diagonal pixel which is
having a value 1 and a diagonally downward pixel which is also having a value equal to 1.

Now, if I define 4 connectivity; then you find that this point is 4 connected to this point, this
point is 4 connected to this point because this particular point is member of the 4 neighbor of this
particular point, this point is the member of 4 neighbor of this point. But by 4 connectedness, this
point is not connected because this is not a 4 neighbor of any of these points.

Now, from 4 connectivity, if I move to 8 connectivity; then what I get? Again, I have the same
set of points. Now, you find that we have defined 8 connectivity to be a union of or 8
neighborhood to be a union of 4 neighborhood and diagonal neighborhood. So, because this is
union of 4 neighborhood and diagonal neighborhood, so I will have set of points which are
connected to 4 neighbors. I will also have set of connections or set of points which are connected
through diagonal neighbors.
So, as shown in the second figure, here you find that when I consider this central pixel; again
these 2 connectivity which are 4 connectivity, they exist. In addition to this, this point which was
not connected considering the 4 neighborhood, now gets connected because this belongs to the
diagonal neighborhood of this central point. So, these 2 points are also connected.

Now, the problem arises here. This point was connected through 4 neighborhood and at the same
time, this point because this is a diagonal neighbor of the central point; so this point is also
connected to this diagonal neighborhood. So, if I consider this situation and I simply have 8
connectivity, I consider 8 connectivity; then you find that multiple number of paths for
connection exists in this particular case.

So, the M connectivity or mixed connectivity has been introduced to avoid this multiple
connection path. So, you just recollect the restriction that we have put in case of mixed
connectivity. In case of mixed connectivity, we have said that 2 points are M connected if one is
a 4 neighbor of the other, or one is a diagonal neighbor of the other and at the same time they do
not have any common 4 neighbor with a value from V.

So, just by extending this concept in this case; you find that for M connectivity, these are
diagonal neighbors, so they are connected. But these 2 points, though they are diagonal
neighbors, but they are not M connected because these 2 points have a point here. This point is a
4 neighbor of this, at the same time; this point is 4 neighbor of this.

So, when I introduce this M connectivity concept, you find that the problem that arises, that is the
multipath connection which has come up in case of 8 connectivity, no longer exists in case of M
connectivity. So, in case of M connectivity, even if we consider the diagonal neighbors, the
problem of multiple paths does not arise. So, this is the advantage that you get in case of M
connectivity.
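A small Python sketch of this M adjacency test (added for illustration, following the condition just described; the helper names, the example image and the set V are my own choices):

import numpy as np

def _n4(p):
    x, y = p
    return {(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)}

def _nd(p):
    x, y = p
    return {(x - 1, y - 1), (x - 1, y + 1), (x + 1, y - 1), (x + 1, y + 1)}

def m_adjacent(img, p, q, V=(1,)):
    """M (mixed) adjacency: both pixel values must come from V, and either q is a
    4 neighbor of p, or q is a diagonal neighbor of p and the common 4 neighbors of
    p and q contain no pixel with a value from V. The second condition removes the
    multiple connection paths that plain 8 adjacency allows."""
    img = np.asarray(img)
    rows, cols = img.shape
    def in_V(pt):
        i, j = pt
        return 0 <= i < rows and 0 <= j < cols and img[i, j] in V
    if not (in_V(p) and in_V(q)):
        return False
    if q in _n4(p):
        return True
    if q in _nd(p):
        return not any(in_V(r) for r in (_n4(p) & _n4(q)))
    return False

# The ambiguous configuration discussed above: the two diagonal 1s share a common
# 4 neighbor that is also 1, so they are 8-adjacent but not M-adjacent.
img = [[0, 1, 0],
       [0, 1, 1],
       [0, 0, 0]]
print(m_adjacent(img, (0, 1), (1, 2)))   # False: common 4 neighbor (1, 1) has value 1
print(m_adjacent(img, (1, 1), (1, 2)))   # True: they are 4 neighbors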

(Refer Slide Time: 30:53)

Now, from the connectivity, we come to the relationship of adjacency. So, you say that 2 pixels p
and q are adjacent if they are connected. So, since for connectivity we have introduced 3
different types of connectivity; that is 4 connectivity, 8 connectivity and M connectivity. So, for
all these 3 different types of connectivity, we will have 3 different types of adjacency because
our adjacency definition is 2 points are adjacent if they are connected.

So, just by extension of this definition, you find that we have got 3 different types of adjacency.
The first one is 4 adjacency, the second one is 8 adjacency and the third one is m adjacency and
you define this type of adjacency depending upon the type of connectivity that is used.

Now, this is about the adjacency of 2 different points or 2 or more different points. We can
extend this concept of adjacency to image regions. That is, we can also say that 2 image regions
may be adjacent or they may not be adjacent. So, what is the condition for adjacency of 2 image
regions?

So, in this case you find that we define the adjacency for 2 image regions like this that if there
are 2 image subsets - S i and S j , we say that S i and S j will be adjacent if there exists a point p in
image region S i and a point q in image region S j such that p and q are adjacent.

(Refer Slide Time: 32:56)

So, just let us try to elaborate this. So, I have this overall image region. This is the whole image
and within this image, I have 2 image regions; one is here, the other one is here. So, the
adjacency relation between 2 image regions is defined like this that I have to have some point in
one region which is adjacent to a point in the other region.

So, if I call, say this is image region S i and this is image region S j ; then I must have some point p
in image S i and some other point q in the image S j so that this p and q, they are adjacent. So, if p
and q are adjacent, then I say that this image region S i is adjacent to image region S j . That means
S i and S j ; they must appear one after the other, one adjacent to the other. So, this is the
adjacency relation.

(Refer Slide Time: 34:11)

After the adjacency relation, we can also define a path between 2 points p and q. So, the
definition of a path is like this that we say that a path exists from a point p having a coordinate
(x, y) to a point q having coordinate (s, t) if there exists a sequence of distinct pixels say (x 0 , y 0 )
(x 1 , y 1 ) (x 2 , y 2 ) and so on upto (x n , y n ) where (x 0 , y 0 ) is equal to (x, y).

That is, it is the same as point p, and (x n , y n ) is equal to (s, t), which is the same as point q. And all
the subsequent points must be adjacent, in the sense that (x i , y i ) has to be adjacent to
(x i minus 1 , y i minus 1 ) for all values of i lying between 1 and n.

So, if I have such a sequence of points between p and q such that all the points which are
traversed in between p and q, all those subsequent points; they are adjacent, then we say that a
path exists between from point p to point q. And we also define the length of the path to be the n.
That is considering both point p and q, if I have n plus 1 number of points including the end
points p and q and all the points in between; then the length of the path is said to be n. So, this is
what we define as path.

Now, very important concept that can arise from here, that is how to define a connected region.
We have said that 2 pixels are connected, we said 2 pixels are connected if they are adjacent in
some sense that is they are neighbors and their intensity values are also similar. We have also
defined 2 regions to be adjacent if there is a point in one region which is adjacent to some other
point in another region and we have also defined a path between a point p and q if there are a set
of points in between which are adjacent to each other. Now, this concept can be extended to
define what is called a connected component.

(Refer Slide Time: 37:10)

So, let us see what is a connected component. We take a subset S of an image I and we take 2
points p and q which belong to this subset S of image I. Then you say that p is connected to q in
S - just note this phrase, that p is connected to q in the subset S - if there exists a path from p
to q consisting entirely of pixels in S. For any such p belonging to S, the set of pixels in S that
are connected to p is called a connected component of S. So, the concept is like this.

(Refer Slide Time: 38:07)

Say, this is my entire image I and within this image, I take a sub image, say S, and here say we
have a point p. Take any other point q; if there exists a path from p to q such that all the
intermediate points on the path also belong to the same subset S, then we say that the points p
and q are connected. And, if there are a number of such points to which a path exists from p,
then the set of all these points is said to be connected to p and they form a connected component
of S. So, you find that just by using this concept of connected component, I can identify a region
in an image.

So, going back to our earlier example where we have said that simply by identifying that a pixel
belongs to an object, does not give me the entire solution because I have to group the pixels
which belong to the same object and give them some identification that these are the group of
pixels which belong to the same object and then I can go for extracting the region property which
will tell me what is the property of that particular object.

And now, that belongingness to a particular object can be found out by using this concept of
connected component. So, any 2 pixels of a connected component we say they are connected to
each other and distinct connected components are disjoint. Obviously, the points belonging to
one particular region and points belonging to another particular region, they are not connected
but the points belonging to a particular region they are connected with each other. So, for this
identification or group identification, what you have to do is when we identify a set of pixels
which are connected; then for all those points belonging to a particular group, we
have to assign a particular identification number.

(Refer Slide Time: 40:58)

Say for example, in this particular figure, you find that there are 2 group of pixels. So, in the first
figure we have a set of pixels, here we have another set of pixels here. So, find that these set of
pixels are connected, these set of pixels they are also connected. So, this forms one connected
component, these set of pixels form another connected component.

So, this connected component labeling problem is that I have to assign a group identification
number to each of these pixels. That means the first set of pixels which are connected to each
other; I have to give them one group identification number. In this particular case, all these pixels
are identified to be yellow and I have to give another group identification to this second set of
pixels. So, in this particular case all these pixels are given the color red.

Now, once we identify this group of pixels to belong to a particular region, then we can go for
finding out some region properties and those region properties may be the shape of that
particular region, it may be the area of that particular region, it may be the boundary or the length
of the boundary of this particular region, and many other shape, area or boundary based features
can be extracted once we identify these different regions.

(Refer Slide Time: 42:45)

Now, let us see what will be the algorithm that has to be followed to find out the group
identification for a particular region or for the pixels belonging to a particular region. So, the
idea of the algorithm is like this: you scan the image from left to right and from top to
bottom.

So, as shown in this particular figure, if I scan the image like this, from left to right and from top
to bottom, this will be our scanning order, and for the time being let us assume that we are
using 4 connectivity. So, by using this 4 connectivity, whenever we reach a particular point, say p,
you find that before reaching this particular point p and following this order of scanning, the
4 neighbors of point p which will already have been scanned are the point which is above it, that is
point r in this particular figure, and the point which is just to the left of p, that is point t in this figure.

So, before this particular point p is scanned, the points belonging to the 4 neighbors of p which
will have been scanned are point r and point t. So, by using this particular fact about which points
will be scanned before you scan point p, we can develop a component labeling algorithm.
The purpose of the component labeling algorithm is to assign an identification number to each of
the connected pixels which will identify it as belonging to a particular region.

(Refer Slide Time: 44:41)

So, the steps involved will be like this. So, when I consider a point p, I assume that I (p) is the
pixel value at that particular location and I also say that L (p) will be the label assigned to the
pixel at location p. Then the algorithm steps will be like this that if I (p) equal to 0 because as we
have seen in the previous case that after segmentation, we say that whenever the intensity value
at a particular location is above certain threshold, we assign a value 1 to that particular location.
Whereas, if the intensity value is less than the threshold; we assign a value 0 to that particular
location.

So, by using this convention, when I want to find out the region property based on the shape,
the pixels which are of importance are the points having a value equal to 1, and we assume that
the points having a value equal to 0 belong to the background, so they are not of importance.

So, just by this, if a point has a value equal to 0, that is I (p) equal to 0, we do not assign any label
to it; we just move to the next scanning position. But if I (p) equal to 1, that is the value at that
particular point is equal to 1, and while scanning we have already come across the 2 points r and t;
so when I find a point p for which the value is equal to 1 and the values at both the points r and t
are equal to 0, then we assign a new label to position p.

If I (p) equal to 1 and only one of the 2 neighbors, that is r and t, is 1; because r and t have
already been scanned, whichever of r and t had a value equal to 1 should have got a particular
label L. So, in this particular case, if I (p) equal to 1 and one of r and t is equal to 1, then to p
we assign the same label which was assigned to r or t, whichever was 1 before.
So, if I (p) equal to 1 and only one of the 2 neighbors is 1; then assign the label of that neighbor
to point p. But the problem comes if I (p) equal to 1 and both r and t are equal to 1. So, the
problem becomes simpler or assignment is simple if the label which was assigned to t and the
label which was assigned to r, that were same. So, if L (r) equal to L (t), then you assign the
same label to point p. So, you see in this particular case that if L (r) equal to L (t), then L (p) gets
L (r) which is obviously same as L (t). But the problem comes if the label assigned to r and the
label assigned to t, they were not the same.

So, in this particular case what we have to do is we have to assign one of the 2 labels to point p
and we have to note that these 2 labels are equivalent because p and t or p and r, they are
adjacent and for r and t the labels were different. So, after doing the initial labeling, we have to
do some post processing so that all these pixels p, r and t, they get the same label.

So, here what we have to do is we have to assign to point p one of the labels, the label of r or the
label of t, and we have to keep a note that the label of the other pixel and the label which is
assigned to p are equivalent, so that in the post processing phase this anomaly that has been
generated can be removed.

(Refer Slide Time: 49:08)

So, at the end of the scan, all pixels with value 1 will have some label and some of the labels
will be equivalent. So, during post processing, what we will do is we will
identify all the equivalent pairs to form equivalence classes and we can assign a different label
to each of these equivalence classes; and in the second pass, you go through the image once more
and for all the labels which belong to a particular equivalence class, you replace the original
label by the label that has been assigned to the equivalence class.

So, at the end of these 2 passes, that is at the end of the second pass, you get a labeled image where
you can identify the region belongingness of a particular pixel by looking at what label is assigned
to that particular pixel.
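The two-pass procedure just described can be sketched in Python as follows (an added illustration, not the lecture's own code; it uses 4 connectivity, keeps the smaller of two conflicting labels, which is one valid tie-breaking choice, and resolves the equivalences with a simple union-find structure):

import numpy as np

def label_components(binary):
    """Two-pass connected component labeling with 4-connectivity: the first pass
    assigns provisional labels using the already-scanned top (r) and left (t)
    neighbors and records equivalences; the second pass replaces every label by
    the representative of its equivalence class."""
    img = np.asarray(binary)
    rows, cols = img.shape
    labels = np.zeros((rows, cols), dtype=int)
    parent = {}                       # union-find structure over the labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):                  # record that labels a and b are equivalent
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    next_label = 1
    # First pass: scan left to right, top to bottom
    for x in range(rows):
        for y in range(cols):
            if img[x, y] == 0:
                continue
            top = labels[x - 1, y] if x > 0 else 0      # label of r
            left = labels[x, y - 1] if y > 0 else 0     # label of t
            if top == 0 and left == 0:
                labels[x, y] = next_label               # new label
                parent[next_label] = next_label
                next_label += 1
            elif top != 0 and left != 0:
                labels[x, y] = min(top, left)
                union(top, left)                        # note the equivalence
            else:
                labels[x, y] = max(top, left)           # the single labeled neighbor

    # Second pass: replace every label by its equivalence-class representative
    for x in range(rows):
        for y in range(cols):
            if labels[x, y] != 0:
                labels[x, y] = find(labels[x, y])
    return labels

# Small example with two separate connected regions
img = [[1, 1, 0, 0, 1],
       [0, 1, 0, 1, 1],
       [0, 1, 0, 0, 1]]
print(label_components(img))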

(Refer Slide Time: 50:13)

So here, let us come to see an example. So, in this particular example, you find that we have
shown 2 different connected regions of pixels having value equal to 1. So, here what we do is,
during the initial pass, as we scan this image from left to right and from top to bottom, the first
object pixel that I get, I assign a label 1 to it.

Then you continue your scanning. When you come to the second white pixel, you find that by
connectivity it belongs to the same region. But when I come to this particular pixel, if I go through
the … top pixel and the left pixel, I find that there is no other pixel which is having a value equal
to 1. So, I have to assign a new label to this particular pixel and this pixel gets a value equal to 2.

Come to the next one. This pixel again gets a value equal to 1 because its top pixel is equal to 1.
Coming to the next one, this one gets a value equal to 2 because its top pixel has a value equal to
2. Come to the next one. This gets a value 3 because it has to be a new label because neither its
top neighbor nor the left neighbor has any other label. The next one again gets a value 3 because
its left neighbor has the label 3.

Again this gets a value one because top neighbor is equal to 1. This gets a value one because the
left neighbor is equal to 1. This gets a value 1, this gets a value 1. Now, you find that in this case,
there is an anomaly because for this pixel, the top pixel has label equal to 2 and the left pixel has
value equal to 1. So, I have to assign one of these 2 labels to this particular pixel.

So here, we have assigned label 1 to this pixel and after this, what we have to do is we have to
declare that 2 and 1; they are equivalent. Then you continue your processing. Here again, I have
to assign a new label because none of the top or the left pixels of this, neighbors of this have got
any label. So, it gets the label 4.

Coming to the next one, here you find that its top one has got label 3 and left one has got label 4.
So, I have to assign one of the labels and in this case, the label assigned is 4. But at the same
time, I have to keep a note that label 3 and label 4, they are equivalent. So, you mark 3 and 4
equivalent. So, if you continue like this; the next one is again 4, the next one again gets 4, this
one gets 1, this one again gets 1, this one gets 4, this one gets 4, this one gets the 5 and that is a
new label because for this particular pixel its top or left one does not have any label.

Coming to the next one, it gets a label 1 because here the top pixel has already had a label 1 but
at this particular point, we have to keep a note that 5 and 1 are equivalent. So, you note 5 and 1
to be equivalent. You continue like this. The other pixel gets the label 4, this pixel gets the label
4, this pixel gets the label 4, this pixel again gets the label 5 because its top pixel is already
having a label equal to 5.

So, at the end of this scanning, I get 3 equivalent pairs. One equivalent pair is (1, 2), the other
equivalent pair is (3, 4) and the third equivalent pair is (1, 5). So, in the second pass, what I have
to do is I have to process these equivalent pairs to identify the equivalence classes, that means all
the labels which are equivalent.

(Refer Slide Time: 54:42)

So, by processing this, you find that 1 and 2 and 1 and 5, they are equivalent. So, 1, 2 and 5,
these 3 labels form a particular equivalence class and similarly 3 and 4, they are equivalent,
forming another equivalence class. So, if I assign a label 1 to the equivalence class containing the
labels 1, 2 and 5 and at the same time I assign a label 3 to the equivalence class containing
labels 3 and 4; then during the second pass, what I will do is I will scan over this image
which is already labeled and I will reassign the labels.

So, wherever the label was equal to 1, I will maintain that equal to 1 and wherever I get a label
which is 2 or 5, I will reassign that label to be equal to 1. So, in this particular case, if you
remember this pixel had got a label equal to 2, I reassign because 2 belongs to an equivalence
class consisting of the labels 1, 2 and 5 to which we have assigned the label equal to 1. So,
wherever I get label 2, I reassign that label equal to 1.

So, continuing this way; this was already 1, this was 2 which has been reassigned to be 1, this
was 3 which remains because 3 and 4 form equivalence class and the label assigned to this
equivalence class was 3. This is also 3 that remains, this was 1 that remains, this was 1 that
remains, this was 1 that remains, this was say possibly 2 or 1. So, I make it equal to 1. This had
got a label equal to 4. So, that has been reassigned the label value equal to 3.

So, find that at the end of this second pass, I identify all the pixels belonging to a particular
group to have a single label and similarly all the pixels belonging to this particular group to have
another label. So, I will stop here today, I will continue with this lecture in the next class.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 5
Pixel Relationships II
Hello, welcome to the video lecture series on digital image processing. In our last class, we have
started our discussion on pixel relationships. Today we will continue with the same topic of
discussion on pixel relationships. In the last class, we have seen what is meant by pixel
neighborhood and we have also seen different types of neighborhood.

(Refer Slide Time: 1:29)

We have explained what is meant by connectivity, we have seen what is adjacency and different
types of adjacency and we have also seen a connected component labeling problem.

(Refer Slide Time: 1:46)

In today’s lecture, we will learn different distance measures, we will see application of distance
measures, we will see arithmetic and logical operations on images and we will see neighborhood
operation on images.

(Refer Slide Time: 2:07)

So, let us first repeat what we have done in the last class, that is on connected component
labeling. Connected component labeling is a very very important step, specifically for high level
image understanding purposes. Because, as we have seen earlier, if an image contains a
number of objects, then the image region can be divided into 2 regions - object region and
background region. And there you have seen that we have taken a decision rule that if the
intensity value at a particular pixel is above a certain threshold, then we assume that particular
pixel belongs to the object, and if the intensity is less than the threshold, we assume that the
particular pixel belongs to the background.

Now, this simple rule simply tells for a pixel whether it belongs to an object or it belongs to the
background. But if there are multiple objects present in the same image, this simple rule does
not tell which pixel belongs to which particular object, whereas the component labeling algorithm
or the component labeling problem gives a solution because it not only tells you that a pixel
belongs to a particular object but it also associates every pixel with a particular object.

So, once you have this association of the pixels belonging to a particular object, you can find out
different properties of the particular object. For example; what is the area of that object region,
what is the shape of the object region, what is the boundary of that object region and different
other shape related features which can be used for a high level understanding purpose.

Now, let us see that what is meant by a connected component. Here we assume S to be a subset
of an image I and let us assume that we have 2 pixels p and q which belongs to the image subset
S. Then we say that the pixel p is connected to pixel q or point p is connected to point q in the
image subset S if there is a path from p to q consisting entirely of pixels in the region S.

So for any point p belonging to S, the set of pixels in S that are connected to p is called a
connected component of p. So, any 2 pixels of a connected component are connected to each
other and you find that distinct connected components have to be disjoint.

(Refer Slide Time: 4:58)

So, the problem of connected component labeling is something like this. Here, we want to assign
some label to each of the pixels, where the label indicates to which particular object that
particular pixel belongs. So, the pixels belonging to a particular object should obtain the
same label, whereas the pixels belonging to different objects should be labeled with different
numbers.

So, the connected component labeling problem is the ability to assign different labels to various
disjoint connected components of an image. So, as has been shown in this particular example,
here we have got 2 different object regions. One object region is this and the other object region
is this. So, after connected component labeling, we should be able to say that this, this, this and
so on, these pixels belong to one particular region; similarly, these pixels belong to another
particular object. So, after component labeling, we should be able to partition these pixels into
2 different classes, as is shown here in yellow color and red color.

So, the pixels having yellow color, they belong to one particular object and pixels having a red
color, they belong to another object. So once I identify this belongingness of a pixel to a
particular object, I can find out the different properties of that particular region as we have
already said; we can find out what is the shape of the particular region, we can find out what is the
area of the object, we can find out what is the boundary of this object and also we can find out
different other shape area or boundary based features.

So now, let us see that what should be the algorithm for connected component labeling. To tell
you the algorithm, what is done is given an image; now because we have distinguished between
the pixels belonging to the object and the pixels belonging to the background, so the image now
will be a binary image, there will be a set of points having value equal to 1 and a set of points
having value equal to 0. So, the pixels with value equal to one belongs to an object, whereas
pixels with value equal to 0 belong to the background. So, for component labeling the approach
taken is like this you scan the image that is the binary image in a raster scan fashion that is from
left to right and from top to bottom.

(Refer Slide Time: 7:59)

So, given an image in this form, you find that if we scan this image from left to right and from top to
bottom, then what we get is something like this: while scanning, we want to label the different
points in this image with a particular number, which is the labeling problem. Now in this
particular case, if we try to find out what should be the label at this particular point p, here I
assume that p is an object point; then following the way of our scanning, you find that before p is
scanned, the points which belong to the 4 neighbors of this point p which will already have been
scanned are one point above p in the same column and one point to the left of p in the same row.

So, those points here are point r and point t. So, r is above p and t is to the left of p and these are
the only points belonging to the 4 neighbor of p which will be scanned before the point p is
scanned.

(Refer Slide Time: 9:22)

So here, our labeling algorithm will be like this. If I want to label the point p, I
assume that I (p) is the intensity value at the location p and L (p) is the label assigned
to the pixel at location p. Now obviously, if I (p) is equal to 0, that means point p belongs to the
background. So, I do not need to label that particular point p and I simply move to the next
point following the same scanning order. But if the intensity at point p is equal to 1, then I have
to label that particular point.

So here, I check what the intensity values at r and t are. If both at r and t the intensity values
are equal to 0, then I assign a new label to that particular point p. So, if I (p) equal to 1 and
both I (r) and I (t) are equal to 0, then I assign a new label to pixel
position p. But if I (p) equal to 1 and one of its neighbors, that is either r or t, is having a value
equal to 1; because r and t are already scanned before scanning the point p, we find that
either r or t, whichever is having a value equal to 1, will already be labeled.

So in this case, you assign the same label to the point p. If I (p) equal to 1 and both r and t are
ones, then I have 2 situations. The first is that the label at r and the label at t are the same. So, if
they are same, you assign the same label to point p. But the problem comes if the label at r and
the label at t are different. So, both r and t are object points but the labels are different.

So in this case, when I assign a label to the particular point p, I have to choose one of those two
labels, either the label of t or the label of r, which will be assigned to point p. Now, whichever
label I assign to point p, I have to make an association with the other label saying that the label
which is assigned to point p and the other label which is not assigned to point p, they are
equivalent.

So once I do this, after this pass of the algorithm what I have is again the binary image, but now the different object points will be having different labels and at the same time I will also have a set of equivalent pairs. So, what I have to do is process all these equivalent pairs to find out the equivalence classes, that is, all the labels which are equivalent following the transitivity relation, and then to each equivalence class I have to assign a unique label.

And after that, I have to do a second pass over the labeled image, and for every label which belongs to a particular equivalence class, I have to replace the original label by the label which is assigned to its equivalence class.

(Refer Slide Time: 12:47)

So, you find that this component labeling algorithm is a 2 pass algorithm. In the first pass, you assign some labels and you generate some equivalent pairs or equivalence relations, and in the second pass, you do the final labeling of the object points where different pixels belonging to the same region get the same label.
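To make the two passes concrete, here is a minimal Python sketch of the labeling procedure just described, using 4-connectivity and a small union-find table to store the equivalent pairs; the function and variable names are my own, and this is only an illustrative sketch, not the exact implementation used in the lecture.

```python
import numpy as np

def label_components(img):
    """Two-pass connected component labeling with 4-connectivity.
    img is a 2D array of 0s and 1s; returns an array of integer labels."""
    labels = np.zeros(img.shape, dtype=int)
    parent = {}                       # union-find table: label -> representative

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    def union(a, b):                  # record that labels a and b are equivalent
        ra, rb = find(a), find(b)
        parent[max(ra, rb)] = min(ra, rb)

    next_label = 1
    rows, cols = img.shape
    # First pass: raster scan, assign provisional labels, note equivalences.
    for i in range(rows):
        for j in range(cols):
            if img[i, j] == 0:
                continue
            up   = labels[i - 1, j] if i > 0 else 0    # label of r (point above)
            left = labels[i, j - 1] if j > 0 else 0    # label of t (point to the left)
            if up == 0 and left == 0:
                labels[i, j] = next_label              # new label
                parent[next_label] = next_label
                next_label += 1
            elif up != 0 and left != 0:
                labels[i, j] = min(up, left)
                if up != left:
                    union(up, left)                    # the two labels are equivalent
            else:
                labels[i, j] = max(up, left)           # copy the single labeled neighbor
    # Second pass: replace every label by the representative of its class.
    for i in range(rows):
        for j in range(cols):
            if labels[i, j] != 0:
                labels[i, j] = find(labels[i, j])
    return labels

# A small example with two separate regions:
img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1]])
print(label_components(img))
```

Running this on the small array at the end prints two distinct labels, one per connected region, analogous to the two labeled regions discussed next.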

(Refer Slide Time: 13:12)

So now, let us see how the algorithm works. You find that in this case, we have shown 2 different regions of connected pixels. The pixels belonging to a particular object are marked with the color white and the pixels belonging to the background are marked with the color black.

Now, I scan this particular binary image from left to right and from top to bottom and assign labels to the different pixels. The first object point that I encounter gets a label 1. Then I continue my scanning and go to the next point belonging to the object, and here you find that this next point gets a label 2 because none of the points above it or to the left of it has got any label.

I continue further and come to the third point, and you find that for the third point, the point above it already has the label 1. So, I assign the same label 1 to this particular point. Continuing further, the next point gets label 2 because the point above it had a label equal to 2. The next point gets a new label, which in this case is equal to 3, because when I try to assign a label to this particular point, I find that none of the points above it or to the left of it has got any label. So, this point gets a new label, equal to 3 in this particular case.

If I continue, you find that the next point again gets label 3. Continuing further, the next point gets label 1, the next one also gets label 1 and the next one also gets label 1. Now comes the problem: when I go to the next point, you find that the point above it had a label equal to 2 and the point to the left of it had a label equal to 1. So in this case, we have given label 1 to this point. But once I have given label 1 to this point, I find that 1 and 2 are equivalent. So, after labeling this point with label 1, I have to record that 1 and 2 are equivalent, and that has to be noted. So, I note that 1 and 2 are equivalent.

You continue further; the next point gets a label equal to 4 and the next point also gets a label equal to 4. Again, I get the same situation that the point above it had a label equal to 3, so I have to note that 3 and 4 are equivalent. So, the equivalence between 3 and 4 is noted. I continue this way; the next point gets 4, the next point also gets 4, the next point gets label 1, the next point label 1, the next point gets label 4, the next point also gets 4, then I get a label 5, the next point gets a label 1, but at this point again I have to note that 5 and 1 are equivalent.

So, if I continue this, I get the label of every point, and at the end of the first scan I have 3 equivalent pairs: (1, 2), (3, 4) and (1, 5). So, next what I have to do is process these equivalent pairs to find out what the equivalence classes are.

(Refer Slide Time: 16:57)

So, that is what I do next. If I process the equivalent pairs (1, 2) and (1, 5), I find that 1, 2 and 5 are all equivalent, and here I assign the label 1 to this equivalence class containing labels 1, 2 and 5. Similarly, 3 and 4 are also equivalent, and here I assign the label 3 to this particular class (3, 4).

So, after these new labels are assigned to the equivalence classes, I do a second pass over the same labeled image and every label is now replaced by the label assigned to its equivalence class. At the end of the second pass, you find that what I get is labeled regions in the image. Now, the 2 regions are clearly identifiable: one region gets label equal to 1 and the other region gets label equal to 3.

So once I get this, I know which pixel or which point belongs to which particular object. All the points having label equal to 1 belong to one object and all the points having label equal to 3 belong to some other object. So, now by identifying the labels, I can find out what the shape of the different objects is, what the area of the different objects is, and many such shape related features can be extracted after this labeling.

So, you find that this component labeling algorithm or component labeling operation is a very very important operation which is useful for high level image understanding purposes.

(Refer Slide Time: 18:53)

So, after this component labeling algorithm, let us now move to another concept, that of distance measures. Coming to finding out the distance between 2 points, we are all familiar with the fact that if I know the coordinates or the locations of 2 different points, I can find out the distance between the 2 points.

(Refer Slide Time: 19:16)

Say for example, if I have 2 points, one point p and another point q, and I know that the coordinate of point p is given by (x, y) and the coordinate of point q is given by (s, t); then we all know from our school level mathematics that the distance between the 2 points p and q, which I represent as D(p, q), is given by the square root of (x minus s) squared plus (y minus t) squared. So, this is what all of us know from our school level mathematics.
Now, when I come to the digital domain, this is not the only distance measure that can be used. There are various other distance measures which can be used in the digital domain, for example the city block distance, the chess board distance and so on. So, if D is to be a distance function or a distance metric, what are the properties that should be followed by this distance function D?

So for this, let us take 3 points: p having the coordinate (x, y), q having the coordinate (s, t) and another point z having the coordinate (u, v). Then D is called a valid distance measure or valid distance metric if the following properties hold. First, for any 2 points p and q, D(p, q) must be greater than or equal to 0, and D(p, q) will be 0 only if p is equal to q.

That is quite obvious because the distance of a point from the point itself has to be equal to 0. Then, the distance function should be symmetric; that is, if I measure the distance from p to q, it should be the same as the distance measured from q to p. That is the second property that must hold true: D(p, q) should be equal to D(q, p).

And there is a third property which is an inequality: if I take a third point z, then the distance between p and z, that is D(p, z), must be less than or equal to the distance between p and q plus the distance between q and z. This again is quite obvious from our school level mathematics; if I have the points p and q and another point z, the distance between p and z cannot exceed the distance between p and q plus the distance between q and z.

(Refer Slide Time: 22:30)

So, this is what we have all done in our school level mathematics and the same properties must hold true in the digital domain where we talk about other distance functions. These are the 3 properties which must hold true for a function if the function is to be considered a distance function or a distance metric.

(Refer Slide Time: 23:23)

Now, the first of these is the distance we have already seen: if p has the coordinate (x, y) and q has the coordinate (s, t), then D(p, q), the distance between p and q, is equal to the square root of (x minus s) squared plus (y minus t) squared. This distance measure is called the Euclidean distance.

So, in the case of the Euclidean distance, consider the set of points q where D(p, q), the Euclidean distance between p and q, is less than or equal to some value r. The set of all these points is the set of points contained within a disk of radius r where the center of the disk is located at p.

(Refer Slide Time: 24:19)

And again, this is quite obvious. Suppose I have a point p here and I take a point q such that the distance between p and q is r. If I take the set of all points whose distance is equal to r, that forms a circle like this, and all other points having a distance less than r from the point p will be the points within this circle. So, the set of all points whose Euclidean distance value is less than or equal to r forms a disk of radius r where the center of the disk is at location p.

(Refer Slide Time: 25:11)

Now, coming to the second distance measure, which is called the D4 distance or city block distance and is also known as the Manhattan distance; this is defined as D4(p, q) equal to the absolute value of (x minus s) plus the absolute value of (y minus t).

(Refer Slide Time: 25:44)

So, in this case it is something like this: if I have point p with coordinate (x, y) and point q with coordinate (s, t), then the D4 distance, as it is defined, is equal to the absolute value of (x minus s) plus the absolute value of (y minus t).

So, this clearly indicates how much distance I have to move along the x direction and how much distance I have to move along the y direction if I want to move from point p to point q, because the absolute value of (x minus s) is the distance travelled along the x direction and the absolute value of (y minus t) is the distance travelled along the y direction.

So, the sum of these distances along the x direction and the y direction gives you the city block distance, that is D4. And here you find that the points having a city block distance from point p less than or equal to some value r form a diamond centered at point p. This is quite obvious from here: if p is the point at the center, then all the points having a city block distance equal to 1 are just the 4 neighbors of the point p.

Similarly, all the points having a city block distance equal to 2 are simply the points which are at distance 2, that is, the distance taken in the horizontal direction plus the distance taken in the vertical direction becomes equal to 2. The set of all points with city block distance equal to 2 simply forms a diamond of radius 2, and similarly for other points at distances 3, 4 and so on.

(Refer Slide Time: 28:13)

Now, we come to the third distance measure which is the chess board distance. As you have seen, in the case of the city block distance, the distance between 2 points was defined as the sum of the distances that you cover along the x direction and the y direction. In the case of the chess board distance, it is the maximum of the distances that you cover along the x direction and the y direction.

So, this is D8(p, q), which is equal to the maximum of the absolute values of (x minus s) and (y minus t). Following the same argument, here you find that the set of points with a chess board distance of less than or equal to r now forms a square centered at point p. So here, all the points with a chess board distance equal to 1 from point p are nothing but the 8 neighbors of point p.

Similarly, the set of points with a chess board distance equal to 2 will be just the points immediately outside the points having a chess board distance equal to 1. So, if you continue like this, you will find that all the points having a chess board distance of less than or equal to r from a point p form a square with point p at the center. So, these are the different distance measures that can be used in the digital domain.
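As a small illustration, here is how the three distance measures could be written in Python; treat it as a sketch of the definitions above, with function names of my own choosing.

```python
import math

def euclidean(p, q):
    """D(p, q) = sqrt((x - s)^2 + (y - t)^2)"""
    (x, y), (s, t) = p, q
    return math.sqrt((x - s) ** 2 + (y - t) ** 2)

def city_block(p, q):
    """D4(p, q) = |x - s| + |y - t|  (Manhattan distance)"""
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def chess_board(p, q):
    """D8(p, q) = max(|x - s|, |y - t|)"""
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

p, q = (2, 3), (5, 7)
print(euclidean(p, q))    # 5.0
print(city_block(p, q))   # 7
print(chess_board(p, q))  # 4
```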

Now, let us see what the applications of these distance measures are. One obvious application is that if I want to find out the distance between 2 points, I can make use of either the Euclidean distance, the city block distance or the chess board distance. Now, let us see one particular application other than just finding out the distance between 2 points.

(Refer Slide Time: 30:14)

Say for example, here I want to match 2 different shapes which are shown in this particular
diagram. Now, here you find that these 2 shapes are almost similar except that you have a hole in
the second shape. But if I simply go for matching these 2 shapes, they will be almost similar. So,
just by using these original figures, I cannot possibly distinguish between these 2 shapes.

So, if I want to say that these 2 shapes are not the same, that they are dissimilar, I cannot work on these original shapes but I can make use of some other feature of the shapes. So, let us see what that other feature is. If I take the skeleton of each shape, you find that the third figure gives you the skeleton of the first shape. Similarly, the fourth figure gives you the skeleton of the second shape.

Now, if I compare these 2 skeletons rather than comparing the original shapes, you will find that there is a lot of difference between these 2 skeletons. So, I can now describe the shapes with the help of the skeletons in the sense that I can find out how many line segments there are in each skeleton. Similarly, I can find out how many points there are where more than 2 line segments meet.

So by this, if I compare the 2 skeletons, you find that for the skeleton of the first shape, there are
only 5 line segments, whereas for the skeleton of the second shape there are 10 line segments.
Similarly, the number of points where more than 2 line segments meet; in the first skeleton there
are only 2 such points, whereas in the second skeleton there are 4 such points.

So, if I compare using the skeletons rather than comparing the original shapes, you find that a lot of difference can be found, both in terms of the number of line segments the skeleton has and in terms of the number of points where more than 2 line segments meet.

So using these descriptions which I have obtained from the skeletons, I can distinguish between the two shapes shown in this particular figure. Now, the question is what this skeleton is and how we get it.

(Refer Slide Time: 33:01)

So, if you analyze the skeletons, you will find that the skeletons are obtained by removing some of the foreground points, but the points are removed in such a way that the shape information as well as the dimensional information, that is, roughly what the extent of that particular shape is, is more or less retained in the skeleton. So, this is what the skeleton of a particular shape is, and now the question is how to obtain the skeleton.

(Refer Slide Time: 33:41)

Now, before coming to how the skeleton is actually computed, let us see how the skeleton can be visualized. The skeleton can be thought of in this manner: I assume that the foreground region in the input binary image is made of some uniform slow burning material, and then what I do is light a fire at all the points along the boundary of this foreground region.

Now, if I light the fire at all the boundary points simultaneously, the fire fronts will move inwards slowly because the foreground region consists of slow burning material. You will find that as the fire fronts move in, there will be some points in the foreground region where the fire coming from 2 different parts of the boundary meets, and at those points the fire will extinguish itself.

So, the set of all those points is what is called the quench line, and the skeleton of the region is nothing but the quench line that we obtain by using this fire propagation concept.

(Refer Slide Time: 35:04)

Now, this simple description of the movement of the fire front does not by itself give you an idea of how to compute the skeleton of a particular shape. For that, what we can use is the distance measure, in the following manner. When we light the fire at all the boundary points simultaneously and the fire moves slowly inside the foreground region, we can note at every point the minimum time the fire takes to reach that particular point. If we note this time at every such foreground point, then effectively what we get is a distance transformation of the image.

So, in this case, you find that the distance transform is normally used for binary images, and because at every point we are noting the time the fire takes to reach that particular point, by applying the distance transformation what we get is an image whose shape is similar to the input binary image; but in this case, the image itself will not be binary, it will be a grey level image where the grey level intensity of the points inside the foreground region has changed to show the distance of each point from the closest boundary point.

(Refer Slide Time: 36:46)

So, let us see what this distance transform means. Here we have shown a particular binary image where the foreground region is a rectangular region, and if I take the distance transform of this, the distance transformed image is shown on the right hand side. Here you find that all the boundary points get a distance value equal to 1, the points just inside the boundary points get a distance value equal to 2, and the points further inside get a distance value equal to 3.

So, you find that the intensity value that we assign to different points within the foreground region increases slowly from the boundary to the interior points. This is nothing but a grey level image which you get after performing the distance transformation.
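The following is a rough Python sketch of a city block (D4) distance transform computed with a classical two-pass chamfer scan. The two-pass scheme is an assumption on my part (the lecture only defines what the transform is), but it reproduces the behaviour described above: boundary foreground pixels get the value 1 and interior pixels get larger values.

```python
import numpy as np

def distance_transform_d4(img):
    """City block distance of every foreground pixel (value 1) to the
    nearest background pixel (value 0), via a two-pass chamfer scan."""
    big = img.shape[0] + img.shape[1]            # larger than any possible distance
    d = np.where(img == 1, big, 0).astype(int)
    rows, cols = img.shape
    # Forward pass: propagate distances from the top-left.
    for i in range(rows):
        for j in range(cols):
            if d[i, j] != 0:
                if i > 0:
                    d[i, j] = min(d[i, j], d[i - 1, j] + 1)
                if j > 0:
                    d[i, j] = min(d[i, j], d[i, j - 1] + 1)
    # Backward pass: propagate distances from the bottom-right.
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if d[i, j] != 0:
                if i < rows - 1:
                    d[i, j] = min(d[i, j], d[i + 1, j] + 1)
                if j < cols - 1:
                    d[i, j] = min(d[i, j], d[i, j + 1] + 1)
    return d

# A rectangular foreground region surrounded by background, as in the slide:
img = np.zeros((7, 9), dtype=int)
img[1:6, 1:8] = 1
print(distance_transform_d4(img))
```

Printing the result shows 1s on the boundary of the rectangle, 2s just inside, and larger values towards the center, exactly the kind of grey level image described above.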

(Refer Slide Time: 37:52)

So now, if I apply the distance transformation to the 2 rectangular shapes that we have just discussed: on the left hand side, we have the original shape or the binary image and on the right hand side, what is shown is the distance transformed image. Here you find again that in this distance transformed image, as you move inside the foreground region, the distance value increases gradually.

And now, if you analyze this distance transformed image, you find that there are a few points at which there is some discontinuity of the curvature. So, from this distance transformed image, if I can identify the points of curvature discontinuity, those are actually the points which lie on the skeleton of this particular shape.

(Refer Slide Time: 38:46)

So, as shown in the next slide, on the left hand side we have the original image, the middle column shows the distance transformed image and the rightmost column shows the skeleton of this particular image. And if you now correlate the rightmost column with the middle column, you find that the skeleton in the rightmost column can easily be obtained from the middle column, which shows the distance transform of the shape that we have considered.

(Refer Slide Time: 39:24)

So, this shows skeletons of some more shapes. Again, on the left hand side we have the original image, in the middle column we have the distance transformed image and in the rightmost column we have the skeleton of the particular shape. Here again you can find that the relation between the skeleton and the distance transformed image is quite prominent. So, whenever we go for a shape matching or shape discrimination problem, if instead of processing the original shapes we compare the shapes using their skeletons, then the discrimination will be better than if we compare the original shapes.

Now, when I go for the distance transformation of a particular shape, as we have seen, we can use different types of distance measures or distance metrics: the Euclidean distance metric, the city block distance metric or even the chess board distance metric. For each of these distance metrics, the distance transformation will be different.

And obviously, the different distance transformations will produce different results, but the skeletons that we will get from these distance transformed images using the different distance metrics will be only slightly different; they will be almost similar. So, this is just another application of the distance metric.

(Refer Slide Time: 41:13)

And, you find that here the skeleton is very very useful because it provides a simple and compact representation of a shape that preserves many of the topological and size characteristics of the original shape.

(Refer Slide Time: 41:34)

Also, from this skeleton we can get a rough idea of the length of a shape, because when I get the skeleton, I get the different end points of the skeleton, and if I find out the distances between every pair of end points in the skeleton, then the maximum of all those pairwise distances gives me an idea of the length of that particular shape. And as we have already said, using this skeleton we can qualitatively differentiate between different shapes, because we can get a description of a shape from its skeleton in terms of the number of line segments that the skeleton has and also in terms of the number of points in the skeleton where more than 2 line segments meet.

So, as we have said, the distance metric or the distance function is not only useful for finding out the distance between 2 points in an image; it is also useful for other applications. Here too we have essentially found distances between pairs of points: for skeletonization, we have first taken the distance transformation, and in the distance transformation we have taken the distance of every foreground pixel from its nearest boundary pixel. That is what gives you a distance transformed image, and from the distance transformed image we can find out the skeleton of that particular shape by considering the points of curvature discontinuity in the distance transformed image. And later on we will also see that this distance metric is useful in many other cases.

(Refer Slide Time: 43:33)

Now, after our discussion of these distance metrics, let us see what simple operations we can perform on images. As you have seen, in a numerical system, whether it is the decimal number system or the binary number system, we can have arithmetic as well as logical operations. Similarly, for images also we can have arithmetic and logical operations.

Now coming to images, I can add 2 images pixel by pixel; that is, a pixel from one image can be added to the corresponding pixel of a second image. I can subtract 2 images pixel by pixel; that is, a pixel of one image can be subtracted from the corresponding pixel of another image. I can go for pixel by pixel multiplication and I can also go for pixel by pixel division. So, these are the different arithmetic operations that I can perform on 2 images, and these operations are applicable both in the case of grey level images and in the case of binary images.

Similarly, in the case of binary images, we can have logical operations: ANDing pixel by pixel, ORing pixel by pixel and similarly inverting pixel by pixel. So, these are the different arithmetic operations that we can do on grey level images and the logical operations that we can do on binary images.
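A short Python sketch of these pixel by pixel operations, using NumPy arrays to stand in for small images; the array contents here are made up purely for illustration.

```python
import numpy as np

# Two small grey level images (values 0-255) for the arithmetic operations:
A = np.array([[10, 200], [50, 100]], dtype=np.int32)
B = np.array([[20,  30], [60, 255]], dtype=np.int32)

added      = np.clip(A + B, 0, 255)   # pixel by pixel addition, clipped to the grey range
subtracted = np.clip(A - B, 0, 255)   # pixel by pixel subtraction
multiplied = A * B                    # pixel by pixel multiplication
divided    = A // np.maximum(B, 1)    # pixel by pixel division (avoiding division by zero)

# Two binary images for the logical operations:
P = np.array([[1, 0], [1, 1]], dtype=bool)
Q = np.array([[1, 1], [0, 1]], dtype=bool)

anded    = P & Q        # pixel by pixel AND
ored     = P | Q        # pixel by pixel OR
inverted = ~P           # pixel by pixel NOT (inversion)
xored    = P ^ Q        # pixel by pixel XOR

print(added, anded.astype(int), sep="\n")
```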

(Refer Slide Time: 45:06)

So here is an example. If I have a binary image A where all the pixels with value equal to 1 are shown in green color and the pixels with value equal to 0 are shown in black color, then I can just invert this particular binary image, that is, I can apply a NOT or invert operation. NOT of A is another binary image where all the pixels which were black in the original image now become white, or 1, and the pixels which were white, or 1, in the original image become equal to 0.

(Refer Slide Time: 45:49)

Similarly, I can perform other operations: given 2 images A and B, I can find out A AND B, the logical ANDing operation, which is shown in the left image. Similarly, I can perform the XOR operation, and the image that I get after XOR is shown in the right image.

(Refer Slide Time: 46:09)

So, these are the different pixel operations or pixel by pixel operations that I can perform. In
some other applications, we can also perform some neighborhood operations. That is the
intensity value at a particular pixel may be replaced by a function of the intensity values of the
pixels which are neighbors of that particular pixel p.

Say for example, in this particular case, this 3 by 3 matrix represents a part of an image which has got nine pixel elements Z1 to Z9, and I want to replace every pixel value by the average of its neighborhood, considering the pixel itself.

So, if I want to take the average at location Z5, the average is simply given by (Z1 plus Z2 plus Z3 and so on up to Z9) divided by 9. This is a simple averaging operation at individual pixels that I can perform, which is nothing but a neighborhood operation because at every pixel we are replacing the intensity by a function of the intensities of its neighborhood pixels. And we will see later that this averaging operation is the simplest form of low pass filtering to remove noise from a noisy image.

(Refer Slide Time: 47:47)

Now, this kind of neighborhood operation can be generalized with the help of templates. What we do is define a 3 by 3 template, which is shown in the right hand figure, where the template contains nine elements W1 to W9. If I want to perform the neighborhood operation, what I do is place this particular template on the original image in such a way that the center of the template coincides with the pixel at which I want to replace the value.

Then, at that particular location, we replace the value with the weighted sum of the values taken from the image and the corresponding weights from the template. So, in this case, the new value is given by Z equal to the summation of Wi times Zi, with i varying from 1 to 9. And here you find that if I simply put Wi equal to 1/9, that is, all the points in the template have the same value equal to 1/9, then the resultant image that I get is nothing but the averaged image which we obtained just in the previous slide.

So, this neighborhood operation using a template is a very very general operation; it is useful not only for averaging but for many other neighborhood operations, and we will see later that it can be used for noise filtering and for thinning binary images.
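Here is a small Python sketch of the template (neighborhood) operation described above: each interior pixel is replaced by the weighted sum of its 3 by 3 neighborhood, and choosing all weights equal to 1/9 reduces it to the simple averaging just discussed. The function name and the border handling (leaving border pixels unchanged) are my own choices.

```python
import numpy as np

def apply_template(image, template):
    """Replace each interior pixel by the weighted sum z = sum_i w_i * z_i
    of its 3 x 3 neighborhood, with weights taken from the template."""
    out = image.astype(float).copy()
    rows, cols = image.shape
    for i in range(1, rows - 1):            # border pixels are left unchanged here
        for j in range(1, cols - 1):
            neighborhood = image[i - 1:i + 2, j - 1:j + 2]
            out[i, j] = np.sum(template * neighborhood)
    return out

image = np.array([[10, 10, 10, 10],
                  [10, 90, 90, 10],
                  [10, 90, 90, 10],
                  [10, 10, 10, 10]], dtype=float)

# With all weights equal to 1/9 the operation reduces to 3 x 3 averaging,
# the simplest low pass filter mentioned above.
averaging_template = np.full((3, 3), 1.0 / 9.0)
print(apply_template(image, averaging_template))
```

Swapping in a different template (for example one with positive and negative weights) is what later turns this same operation into noise filtering, thinning or edge detection.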

(Refer Slide Time: 49:24)

This same template operation can also be used for edge detection in different images. So, with this, we complete our lecture today. Now, let us see some of the solutions of the problems that we had given in lecture 3.

(Refer Slide Time: 49:52)

So, at the end of lecture 3, I had given a problem: an image is described by a function f(X, Y) equal to 2 cos(2 pi (3X + 4Y)) and this image is sampled at delta x equal to delta y equal to 0.2; then what will be the reconstructed image?

The solution is like this: if I take the Fourier transform of the given image, you will find that the Fourier transform is a pair of 2 dimensional delta functions, given by delta(w_x minus 3, w_y minus 4) plus delta(w_x plus 3, w_y plus 4). In this particular case, the maximum frequency in the X direction, w_x0, is equal to 3 and the maximum frequency in the Y direction, w_y0, is equal to 4.

So, these are the bandwidths of the image in the X direction and the Y direction, and here the sampling interval has been given as delta x equal to delta y equal to 0.2. From this sampling interval, if I calculate the sampling frequencies, you find that the sampling frequency in the X direction, omega_xs, will be the same as the sampling frequency in the Y direction, omega_ys, and both will be equal to 5.

(Refer Slide Time: 51:23)

Now, once I have these sampling frequencies, the frequency spectrum of the sampled image will be as given here, and for reconstruction the low pass filter that is used usually has a bandwidth which is half of the sampling frequency. That means the low pass filter used in this particular case will have a bandwidth of minus 2.5 to 2.5, both for omega_x and for omega_y.

(Refer Slide Time: 52:09)

So, if you take out a particular frequency band from the sampled image by using this low pass filter, you will find that the frequency component that you take out is nothing but the one at frequency 2 along the X direction and frequency 1 along the Y direction.
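To make the aliasing arithmetic behind this explicit, here is a brief restatement in standard notation; it is my own sketch of the intermediate step, using only the frequencies and sampling interval stated above.

```latex
% Impulses of f(X,Y) = 2\cos\!\big(2\pi(3X+4Y)\big) sit at (\omega_x,\omega_y)=(3,4) and (-3,-4).
% Sampling at \Delta x=\Delta y=0.2 replicates them at multiples of \omega_{xs}=\omega_{ys}=1/0.2=5:
(3,4)-(5,5)=(-2,-1), \qquad (-3,-4)+(5,5)=(2,1).
% Only these two replicas lie inside the reconstruction filter passband
% |\omega_x|\le 2.5,\ |\omega_y|\le 2.5, so the reconstructed image is
\hat{f}(X,Y) = 2\cos\!\big(2\pi(2X+Y)\big) \neq f(X,Y).
```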

So, using this frequency band passed by the low pass filter, the reconstructed image will be given by 2 cos(2 pi (2X + Y)). Here again you find that the reconstructed image is naturally not the same as the original image. A lot of information has been lost, the reason being that the sampling frequency that we have chosen does not meet the sampling criterion. Then we had given a second problem which is for a uniform quantizer design.

(Refer Slide Time: 52:53)

And here again, the solution is very simple. Simply follow the steps of the uniform quantizer design and you will get the transition levels and the reconstruction levels as given in these 2 expressions.

(Refer Slide Time: 53:11)

Now, let us give some quiz questions for lecture numbers 4 and 5. In the first one, you have been given 2 points p and q; you have to determine whether q is a 4-neighbor of p, an 8-neighbor of p or a diagonal neighbor of p. In the second problem, again between 2 points p and q, you have to find out the Euclidean distance, the city block distance and the chess board distance.

(Refer Slide Time: 53:46)

In the third problem, you have been given 2 regions - image region S1 and image region S2. You have to find out whether S1 and S2 are 4-connected, 8-connected or m-connected.

(Refer Slide Time: 54:04)

In the fourth problem, you are given a binary image; you have to find out the skeleton of this binary image.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 6
Basic Transformations
Hello, welcome to the video lecture series on digital image processing. In today's lecture, we will discuss some basic mathematical transformations and we will see how these mathematical transformations help us in understanding the imaging model and the image formation process of a camera.

(Refer Slide Time: 1:29)

So, in our last lecture, we learnt different distance measures and we have seen their applications: in addition to finding out the distance between 2 different points, the distance measures are also used to find the distance transformation of a binary image, and using the distance transformation we can find the skeleton of the image, which gives a compact representation or compact description of the shape.

We have also seen different arithmetic and logical operations that can be performed on 2 images
and we have also seen what are the different neighborhood operations that can be performed on a
single image.
(Refer Slide Time: 2:19)

So, in today's lecture, as we said, we will discuss some basic mathematical transformations which include translation, rotation and scaling, and we will discuss these both in 2 dimensions and in 3 dimensions.

We will also discuss the inverse transformations of these different mathematical transformations. We will find out the relationship between the Cartesian coordinate system and the homogeneous coordinate system, and we will see that this homogeneous coordinate system is very very useful while discussing image formation by a camera.

We will also talk about the perspective transformation and imaging process and then we will talk
about the inverse perspective transformation.
(Refer Slide Time: 3:21)

Now, coming to the basic mathematical transformations, let us first talk about the translation operation, and we will start our discussion with a point in 2 dimensions. You know that if I have a 2 dimensional coordinate system given by the axes x and y, and I have a point P with coordinate (x, y), I may want to translate this point P(x, y) by a vector (x0, y0). After translating this point by the vector (x0, y0), I get the translated point, say point P prime, whose coordinates are x prime and y prime.

And because the translation vector in this case is (x0, y0), you know that after translation, the new position x prime will be given by x plus x0 and y prime will be given by y plus y0. So, this is the basic relation when a point at location (x, y) is translated by a vector (x0, y0). Now, let us see how this can be represented more formally by using a matrix equation. If I write this relation in the form of a matrix, the equation looks like this.
(Refer Slide Time: 5:07)

I have to find out the new location vector (x prime, y prime), and we have said that x prime is nothing but x plus x0 and y prime is nothing but y plus y0. If I represent this particular relation in the form of a matrix, it will simply look like this. You find that if you evaluate this particular matrix expression, it gives you the same expressions, x prime equal to x plus x0 and y prime equal to y plus y0.

So, on the right hand side, you find that I have the product of 2 matrices which is added to another column matrix or column vector.

(Refer Slide Time: 6:28)


Now, if I want to combine all these operations in a single matrix form, then the operation will be something like this: on the left hand side I will have x prime and y prime in the form of a column matrix, and on the right hand side I will have the matrix with rows (1 0 x0) and (0 1 y0) multiplied by the column vector (x y 1). If I again do the same matrix computation, it gives x prime equal to x plus 0 plus x0, which is nothing but x plus x0; similarly, y prime will be 0 plus y plus y0, which is nothing but y plus y0.

But you find that in this particular case there is some asymmetry in the expression. If I want to make this expression symmetric, then I can write it in this form: on the left hand side (x prime, y prime, 1), where I introduce one more component which I make equal to 1, and on the right hand side the matrix with rows (1 0 x0), (0 1 y0) and (0 0 1) multiplied by (x y 1). So, we find that the second expression which I have just obtained from the first one is now a symmetric expression, and this is what is called a unified expression.

So, we find that basically what I have is the original coordinate (x, y) of the point P, appended with one more component which is given as 1. If this modified coordinate is now transformed by the transformation matrix with rows (1 0 x0), (0 1 y0) and (0 0 1), then I get the translated point as (x prime, y prime, 1), and if I just neglect the additional component, which in this case is 1, then I get the translated point P prime. So, that is about translation.

(Refer Slide Time: 9:14)

In the same manner, consider a point P again in 2D, having a coordinate (x, y), and suppose I want to rotate this point P around the origin by an angle theta. One way of representing this point P is as follows: if r is the distance of point P from the origin and the angle it makes with the x axis is alpha, then I can represent its coordinates as x equal to r cosine alpha and y equal to r sine alpha.

Now suppose I rotate this point P by an angle theta in the clockwise direction; the new position of P will now be P prime, having the coordinates x prime and y prime, and the angle of the rotation is theta. So, our job is to find out what these coordinates x prime and y prime will be. Here you find that I can write x prime as r cosine (alpha minus theta) and y prime as r sine (alpha minus theta).

(Refer Slide Time: 11:07)

So I have x prime equal to r cosine (alpha minus theta) and y prime equal to r sine (alpha minus theta). If I simply expand the cosine term, it will be r cosine alpha cosine theta plus r sine alpha sine theta. Now, we know that r cosine alpha is nothing but x and r sine alpha is nothing but y, so this becomes x cosine theta plus y sine theta.

Similarly, if I expand the sine term, it becomes r sine alpha cosine theta minus r cosine alpha sine theta, which is y cosine theta minus x sine theta. So, here we find that x prime is given by x cosine theta plus y sine theta and y prime is given by minus x sine theta plus y cosine theta.
(Refer Slide Time: 12:38)

So now, if I represent this in the form of a matrix equation, it becomes (x prime, y prime) equal to the matrix with rows (cosine theta, sine theta) and (minus sine theta, cosine theta) multiplied by the original coordinates (x, y). Here you find that if I rotate the point p by an angle theta around the origin in the clockwise direction, the transformation matrix which gives the rotation is this particular matrix with rows (cosine theta, sine theta) and (minus sine theta, cosine theta).

Now, in the same manner, if I go for scaling, say with a scaling factor Sx in the x direction and a scaling factor Sy in the y direction, then the transformation for scaling can also be represented as (x prime, y prime) equal to the matrix with rows (Sx, 0) and (0, Sy) multiplied by the original coordinate (x, y).

So, here you find that the transformation matrix for performing the scaling operation is nothing but the matrix with rows (Sx, 0) and (0, Sy). These are the simple transformations that I can have in 2 dimensions. Now, it is also possible to concatenate the transformations. For example, here I have considered the rotation of a point around the origin. If my application demands that I rotate the point p around an arbitrary point q in 2 dimensions, then finding out the expression for this rotation of point p by an angle theta around another point q directly is not an easy job; that expression will be quite complicated.

So, I can simplify this operation just by translating the point q to the origin, with the point p also translated by the same vector. After performing this translation, if I now rotate point p by the same angle theta, it becomes a rotation around the origin, so the same transformation matrix that we have found here, with rows (cosine theta, sine theta) and (minus sine theta, cosine theta), is applicable. After performing this rotation, you translate back the rotated point by the same vector but in the opposite direction.
(Refer Slide Time: 15:38)

So, here the transformation that we are applying is this: first we perform a translation of point P by the vector, and after performing this translation, we perform the rotation, say the transformation R theta. So, first we translate by a vector, say r, then we perform the rotation R theta, and after doing this, whatever point I get has to be translated back by minus r. So, I will put it as a translation by the vector minus r.

This entire operation gives you the rotation of a point P around the point Q. If I want to rotate P around Q by an angle theta, then this operation can be performed by the concatenation of this translation and rotation, followed by the inverse translation which puts back the point to the position where it should be after rotating around point Q by the angle theta.
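As an illustration, here is a small Python sketch of these 2D transformations in the unified (homogeneous) form, including the rotation of a point p about an arbitrary point q by translate, rotate, then translate back; the helper names and the sample numbers are my own.

```python
import numpy as np

def translate(x0, y0):
    """2D translation in homogeneous (unified) coordinates."""
    return np.array([[1, 0, x0],
                     [0, 1, y0],
                     [0, 0, 1]], dtype=float)

def rotate(theta):
    """Clockwise rotation by theta about the origin, as derived above."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, s, 0],
                     [-s, c, 0],
                     [ 0, 0, 1]], dtype=float)

def scale(sx, sy):
    return np.array([[sx, 0, 0],
                     [0, sy, 0],
                     [0,  0, 1]], dtype=float)

# Rotating a point p about an arbitrary point q by angle theta:
# translate q to the origin, rotate, then translate back.
p = np.array([4.0, 3.0, 1.0])       # point p in unified form (x, y, 1)
qx, qy = 2.0, 1.0                   # the arbitrary centre of rotation q
theta = np.pi / 2

M = translate(qx, qy) @ rotate(theta) @ translate(-qx, -qy)
p_rotated = M @ p
print(p_rotated[:2])
```

Note that the rightmost matrix in the product is applied first, which is why the translation of q to the origin appears on the right of the composite matrix M.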

So, these are the basic mathematical transformations that we can do in 2 dimensional space. Now, let us see what the corresponding transformations will be if I move from 2 dimensional space to 3 dimensional space.
(Refer Slide Time: 17:19)

So, the transformations that we will consider are translation, rotation and scaling, and the coordinate system that we will consider in this case is the 3D coordinate system.

(Refer Slide Time: 17:36)

So, first let us see translation. As we have seen in the case of 2 dimensions, if a point (x, y, z) is translated to a new coordinate, say (x star, y star, z star), using a displacement vector (x0, y0, z0), then the translated coordinates are: x star will be given by x plus x0, y star will be given by y plus y0 and z star will be given by z plus z0. You see that in our previous case, because we had only the coordinates x and y, the third expression, z star equal to z plus z0, was absent. But now we are considering a 3 dimensional space, a 3D coordinate system, so we have 3 coordinates x, y and z, and all 3 coordinates are to be translated by the translation vector (x0, y0, z0); the new point we get is (x star, y star, z star).

(Refer Slide Time: 18:56)

Now, if I write these 3 equations in the form of a matrix, the matrix equation will be like this: (X star, Y star, Z star) on the left hand side will be equal to the matrix with rows (1 0 0 X0), (0 1 0 Y0) and (0 0 1 Z0) multiplied by the column vector (X Y Z 1). So, this is similar to the situation that we have seen in the case of 2 dimensions, where we have to add an additional component equal to 1 to our original position vector (X Y Z).

So in this case, again we have added the additional component equal to 1, and our new position vector becomes (X Y Z 1), which has to be multiplied by the translation matrix given by the rows (1 0 0 X0), (0 1 0 Y0) and (0 0 1 Z0). Again as before, we can go for a unified expression: this translation matrix at the moment has a dimension of 3 by 4, that is, 3 rows and 4 columns; in the unified representation, the matrix will have dimension 4 by 4, so it will be a square matrix, and the left hand side will also have the unified coordinate, that is (X star Y star Z star 1).
(Refer Slide Time: 20:26)

So, the unified representation, as we have already said, is given by (X star Y star Z star 1) equal to the translation matrix with rows (1 0 0 X0), (0 1 0 Y0), (0 0 1 Z0) and (0 0 0 1) multiplied by the column vector (X Y Z 1).

So, this particular matrix, that is the matrix with rows (1 0 0 X0), (0 1 0 Y0), (0 0 1 Z0) and (0 0 0 1), represents the transformation matrix used for translation, and we will represent this matrix by the uppercase letter T. So, that is about the simple translation that we can have.

(Refer Slide Time: 21:26)


So, in the unified matrix representation, we have a position vector V which is transformed by the transformation matrix A, where A is a 4 by 4 transformation matrix. If the original position vector was (X, Y, Z), we have added an additional component 1 to it in the unified matrix representation, so V now becomes a 4 dimensional vector having components X, Y, Z and 1.

Similarly, the transformed position vector V star is also a 4 dimensional vector having components X star, Y star, Z star and 1. So, this is how, in the unified matrix representation, we can represent the translation of a position vector or the translation of a point in 3 dimensions.

(Refer Slide Time: 22:28)

So, as I said, this is the transformation matrix which is used for translating a point in 3D by the displacement vector (X0, Y0, Z0).
(Refer Slide Time: 22:49)

Similarly, as we have seen in the case of scaling in 2 dimensions, if we have the scaling factors Sx, Sy and Sz along the directions x, y and z, that is, along the x direction we have the scaling factor Sx, along the y direction we have the scaling factor Sy and along the z direction we have the scaling factor Sz, then the transformation matrix for this scaling operation can be written as S equal to the matrix with rows (Sx 0 0 0), (0 Sy 0 0), (0 0 Sz 0) and (0 0 0 1).

So, here again you find that a position vector (X Y Z), which in unified form becomes (X Y Z 1), when transformed by this scaling matrix gives us the new position vector corresponding to the point (X Y Z 1) in scaled form, and if we remove the last component, which is equal to 1, what we get is the scaled 3D coordinate of the point which has been scaled up or scaled down.

It will be scaling up or scaling down depending upon whether the values of the scale factors are greater than 1 or less than 1.
(Refer Slide Time: 24:22)

Then coming to rotation, we have seen that translation and scaling in 3 dimensions are very simple, as simple as what we have done in the case of 2 dimensions. But rotation in 3 dimensions is a bit complicated because in 3 dimensions we have 3 different axes - the x axis, the y axis and the z axis; so, when I rotate a point around the origin by a certain angle, the rotation can be around the x axis, around the y axis or around the z axis.

So accordingly, I can have 3 different rotation matrices representing rotation around each particular axis, and specifically, if the rotation has to be done about an arbitrary point, then what we have to do is translate the arbitrary point to the origin by using the translation transformation.

After translating the point to the origin, we have to perform the rotation around the origin and then translate the point back to its original position. This gives us the desired rotation of any point P in 3D around any other arbitrary point Q, also in 3D.
(Refer Slide Time: 25:46)

So, now let us see what this rotation looks like in 3D. Here you find that we have shown on the right hand side a particular figure which shows the rotation of a point about the different axes. If the point is rotated about the x axis, the rotation angle is indicated by alpha; if it is rotated about the z axis, the rotation angle is indicated by theta; and if the rotation is done about the y axis, the rotation angle is indicated by beta.

So, when I rotate a point about the z axis, obviously the z coordinate of the point will remain unchanged even in the rotated position. What will change are the x coordinate and the y coordinate of the point in its new rotated position. And because the z coordinate remains unchanged, we can think of this as a rotation on a plane which is parallel to the xy plane.

So, the same transformation which we have used for rotating a point in 2 dimensions in the (x, y) coordinates, the same transformation matrix, holds true for rotating this point in 3 dimensions about the z axis. But now, because the number of components in our position vector is larger, we have to take care of the other components as well.

So, you find that when I rotate the point around the z axis, the rotation angle is given by theta and the rotation matrix is given by the rows (cosine theta, sine theta, 0, 0), (minus sine theta, cosine theta, 0, 0), (0 0 1 0) and (0 0 0 1). This is the transformation matrix or rotation matrix for rotating a point around the z axis.

Here you find that the first few components, that is cosine theta, sine theta, minus sine theta and cosine theta, form a 2 by 2 matrix identical to the transformation matrix or rotation matrix that we obtained in the case of 2 dimensions. This says that because the z coordinate remains the same, the x coordinate and the y coordinate under this rotation around the z axis follow the same relation that we derived in the case of 2 dimensions.
Similarly, when I rotate the point around the y axis, the angle of rotation is given by beta, and R beta, as given in this particular case, gives you the rotation matrix. If you rotate the point around the x axis, where the rotation angle is given by alpha, you find that R alpha gives you the corresponding rotation matrix or the corresponding transformation matrix for rotation about the x axis.

As before, here you find that when you rotate the point around the x axis, the x coordinate will remain the same, whereas the y coordinate and the z coordinate of the point are going to change. Similarly, when you rotate the point around the y axis, the y coordinate is going to remain the same but the x coordinate and the z coordinate are going to be different.

(Refer Slide Time: 29:45)

Now, as we have also mentioned in the case of 2 dimensions, different transformations can be concatenated. So, here we have shown how we can concatenate the different transformations. You find that all the transformations that we have considered in 3 dimensions are in the unified form; that is, every transformation matrix is a 4 by 4 matrix, and for every coordinate that we consider, we add 1 to the coordinate (X Y Z) so that our position vector becomes a 4 dimensional vector and the transformed point is also a 4 dimensional vector.

So, if I want to concatenate the different transformations, that is translation, scaling and rotation about the Z axis, this can be done as follows: first you translate the point V by the translation matrix T, then you perform scaling by S, then you perform the rotation R theta. All these 3 different transformation matrices, that is R theta, S and T, each of them being a 4 by 4 matrix, can be combined into a single transformation matrix, in this case A, which is nothing but the product of R theta, S and T, and this A again will be a matrix of dimension 4 by 4.
Now, you note that whenever we are going for concatenation of transformations, the order in
which these transformations are to be applied, that is very very important because these matrix
operations are in general not commutative.
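The sketch below, in Python, builds the 4 by 4 unified matrices for translation, scaling and rotation about the z axis, forms the composite A = R theta S T, and also shows that changing the order of the factors changes the result; the helper names and the particular numbers are my own and are only meant to illustrate the point about non-commutativity.

```python
import numpy as np

def translate3(x0, y0, z0):
    """4 x 4 translation matrix in unified form."""
    T = np.eye(4)
    T[:3, 3] = [x0, y0, z0]
    return T

def scale3(sx, sy, sz):
    """4 x 4 scaling matrix in unified form."""
    return np.diag([sx, sy, sz, 1.0])

def rotate_z(theta):
    """4 x 4 rotation about the z axis in unified form."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, s, 0, 0],
                     [-s, c, 0, 0],
                     [ 0, 0, 1, 0],
                     [ 0, 0, 0, 1]])

v = np.array([1.0, 2.0, 3.0, 1.0])          # point (1, 2, 3) in unified form

# Concatenation: translate first, then scale, then rotate about z.
A = rotate_z(np.pi / 4) @ scale3(2, 2, 2) @ translate3(1, 0, 0)
print(A @ v)

# The order matters: matrix multiplication is not commutative in general.
B = translate3(1, 0, 0) @ scale3(2, 2, 2) @ rotate_z(np.pi / 4)
print(np.allclose(A @ v, B @ v))            # False for this choice of parameters
```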

(Refer Slide Time: 31:51)

Now, just to illustrate that these matrix operations are not commutative, let us take this example. Suppose I have this particular point V and I want to perform 2 kinds of operations on it: one is a translation by a vector, where the translation vector is represented by this arrow by which the point has to be translated, and the point V is also to be rotated by a certain angle.

Now, there are 2 orders in which these 2 operations can be done. The first one shows the case where I rotate point V first, that is, the operation RV; here the transformation matrix is R for the rotation operation, and after rotating this point V by using the transformation R, I translate the rotated point by using the transformation matrix T.

If I do that, you find that V is the original position of the point. If I first rotate it by using this rotation transformation, the point comes here in the rotated position, and after this if I translate this point by the translation vector, then the translated point comes over here, which is represented by V2.

So, the point V2 is obtained from V by first applying the rotation transformation R followed by the translation transformation T. Now, if I do it in reverse, that is, first I translate the point V using the translation transformation T and after this translation I rotate the translated point, which is now TV, by the same angle theta; so what I do is first translate the point using the transformation T and then rotate this translated point using the rotation transformation R, and this gives me the new point, which is V1.

Now, from here you find that in the earlier case I got the point V2 and now I get the point V1, and V1 and V2 are not the same point. So, this clearly illustrates that whenever we go for concatenation of different transformations, we have to be very very careful about the order in which the transformations are applied, because if the order in which the transformations are applied varies, then we are not going to get the same end result. So, for any such concatenation, the order in which the transformations are applied has to be thought out very very carefully.

(Refer Slide Time: 35:23)

So, that is about the transformation of a single point. Now, suppose I have to transform a set of points; say for example, in a 2 dimensional space XY, I have a square figure like this.

(Refer Slide Time: 35:42)


So, this will have 4 vertices, which I can represent as point P1, point P2, point P3 and point P4. Now, so far we have discussed the transformation of a single point around the origin or the transformation of a single point around another arbitrary point in the same space. Here, if I have to transform this entire figure, that is, for example, I want to rotate the entire figure about the origin or I want to translate the entire figure by a certain vector, say to this particular position, then you find that all these points P1, P2, P3 and P4 are going to be translated by the same displacement vector.

So, it is also possible that we can apply transformation to all the points simultaneously, rather
than applying transformation to individual points 1 by 1. So, for a set of m points, what we have
to do is we have to construct a matrix V of a dimension 4 by m. That is every individual point
will now be considered, of course in the unified form, will now be consider as a column vector of
this matrix which is of dimension of 4 by m. And then, we have apply the transformation A to
this enter matrix and the transformation after this transformation, we get the new matrix V star
which is given by the transformation A multiplied by the B.

So, you find that the i'th column of the matrix V star, which is V i star, is the transformed point corresponding to the i'th column of matrix V, which is represented by V i. So, if I have a set of points which are to be transformed by the same transformation, then all those points can be arranged as the columns of a new matrix. If I have m points, I will have a matrix with m columns, and the matrix will also obviously have 4 rows. This new 4 by m matrix is transformed using the same transformation operation, I get the transformed points again in the form of a matrix, and from that transformed matrix I can identify which column is the transformed point of which of the original points.
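
As a small illustration of this (a sketch in Python with NumPy; the square and the displacement are arbitrary values of my own), the four corner points are stacked as the columns of a 4 by m matrix and transformed with a single matrix product:

    import numpy as np

    # Corners P1..P4 of a square in the plane Z = 0, written in unified form
    # and stacked as the columns of a 4 x m matrix (here m = 4)
    V = np.array([[0, 1, 1, 0],      # X coordinates of P1, P2, P3, P4
                  [0, 0, 1, 1],      # Y coordinates
                  [0, 0, 0, 0],      # Z coordinates
                  [1, 1, 1, 1]])     # the appended row of 1s

    # Translation of every point by the same displacement vector (2, 3, 0)
    A = np.array([[1, 0, 0, 2],
                  [0, 1, 0, 3],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])

    V_star = A @ V                   # all m points transformed at once
    print(V_star[:, 0])              # column i of V_star is the transformed P(i+1)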

(Refer Slide Time: 38:59)


Now, once we get these transformations, we can also get the corresponding inverse transformations. In most cases, the inverse transformations can be obtained just by observation. Say, for example, if I translate a point by a displacement vector V, then the inverse transformation should bring that translated point back to its original position. So, if my translation is by a vector V, the inverse translation should be by the vector minus V. So, the inverse translation matrix T inverse can be obtained as (1 0 0 minus X0) (0 1 0 minus Y0) (0 0 1 minus Z0) and then (0 0 0 1).

So, you remember that the corresponding forward translation matrix was (1 0 0 X0) (0 1 0 Y0) (0 0 1 Z0) and (0 0 0 1). We find that X0, Y0 and Z0 have just been negated to give the inverse translation matrix T inverse. Similarly, by the same observation, we can get the inverse rotation R theta inverse, where what we have to do is, in the original rotation matrix, which has the terms cosine theta, sin theta, minus sin theta and cosine theta, replace theta everywhere by minus theta, which gives the inverse rotation matrix around the Z axis.
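
A quick way to check this observation is the following sketch in Python (NumPy assumed; the displacement components are illustrative values of my own): the matrix obtained by negating X0, Y0 and Z0 is indeed the inverse of the translation matrix.

    import numpy as np

    x0, y0, z0 = 2.0, -1.0, 3.0      # illustrative displacement components (assumption)

    T = np.array([[1, 0, 0, x0],
                  [0, 1, 0, y0],
                  [0, 0, 1, z0],
                  [0, 0, 0, 1]])

    T_inv = np.array([[1, 0, 0, -x0],
                      [0, 1, 0, -y0],
                      [0, 0, 1, -z0],
                      [0, 0, 0,  1]])

    # Their product is the 4 x 4 identity matrix, so T_inv really is the inverse
    print(np.allclose(T @ T_inv, np.eye(4)))        # True
    print(np.allclose(T_inv, np.linalg.inv(T)))     # True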

(Refer Slide Time: 40:45)

Similarly, we can also find the inverse matrix for scaling, where Sx is replaced by 1 upon Sx, and likewise for the other scale factors. So, these are the basic mathematical transformations. Now, we will see another form of transformation, which is called the perspective transformation. This perspective transformation is very important for understanding how a point in the 3D world is imaged by a camera.

So, this perspective transformation is also known as the imaging transformation, and its purpose is to project a 3D world point onto the image plane; this gives an approximation to the image formation process that is actually followed by a camera. Now, let us see what this perspective transformation is.
(Refer Slide Time: 41:49)

Here, we have shown a figure which is an approximation of the image formation process. You find that we have 2 coordinate systems which are superimposed one over the other. One is the 3D world coordinate system represented by capital X, capital Y, capital Z, and I also have the camera coordinate system which is given by lowercase x, lowercase y and lowercase z.

Now, here we have assumed that the camera coordinate system and the 3D world coordinate system are perfectly aligned. That is, the X axis of the 3D world coordinate system coincides with the x axis of the camera coordinate system, the Y axis of the world coordinate system coincides with the y axis of the camera coordinate system and similarly, the Z axis of the world coordinate system coincides with the z axis of the camera coordinate system. So, both of these coordinate systems have the same origin.

Now, suppose I have a point (X, Y, Z) in 3 dimensions, and I assume that the center of the lens is at location (0, 0, lambda); so obviously lambda, which is the Z coordinate of the lens center, is nothing but the focal length of the camera, and we assume that this particular 3D point (X, Y, Z) is mapped to the image coordinates given by lowercase x and lowercase y.

Now, our purpose is this: if I know the 3D coordinates capital X, capital Y, capital Z and I know the value of lambda, that is the focal length of the camera, is it possible to find out the image coordinates corresponding to this 3D world point (X, Y, Z)?
(Refer Slide Time: 44:31)

So, for this we apply the concept of similar triangles. By using similar triangles, we can find the expression that lowercase x by lambda is equal to minus capital X by (capital Z minus lambda), which is nothing but capital X by (lambda minus capital Z), and in the same manner, y by lambda is given by capital Y by (lambda minus capital Z).

So from this, I can find that the image coordinates of the 3D world point capital X, capital Y, capital Z are given by: the image coordinate x is lambda capital X divided by (lambda minus capital Z) and similarly, the image coordinate y is lambda capital Y divided by (lambda minus capital Z). Now, these expressions can also be represented in matrix form, and we find that if I go for the homogeneous coordinate system, then this matrix expression is even simpler.
(Refer Slide Time: 45:44)

So, let us see what this homogeneous coordinate system is. If I have the Cartesian coordinates capital X, capital Y, capital Z, then we have said that in the unified coordinate system we just append a value 1 as an additional component. In the homogeneous coordinate system, instead of simply appending 1, we append an arbitrary non-zero constant, say k, and multiply all the coordinates X, Y and Z by the same value k.

So, given the Cartesian coordinates capital X, capital Y, capital Z, I can convert them to the homogeneous coordinates k times capital X, k times capital Y, k times capital Z and k. The inverse process is also very simple: if I have a homogeneous coordinate, then what I have to do is divide all the components of the homogeneous coordinate by the fourth component.

So, in this case the fourth component is k and the other terms are kX, kY and kZ; if I divide these 3 terms by the fourth component k, I get back the Cartesian coordinates X, Y, Z. So, I can convert the coordinates of a 3D point from the Cartesian coordinate system to the homogeneous coordinate system, and I can also very easily convert from the homogeneous coordinate system back to the Cartesian coordinate system.
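
These two conversions can be written down in a couple of lines; here is a minimal sketch in Python (NumPy assumed, and the value of k is chosen arbitrarily since any non-zero constant works):

    import numpy as np

    def to_homogeneous(w, k=1.0):
        # Cartesian (X, Y, Z) -> homogeneous (kX, kY, kZ, k) for any non-zero k
        return np.array([k * w[0], k * w[1], k * w[2], k])

    def to_cartesian(wh):
        # Homogeneous 4-vector -> Cartesian: divide the first 3 components by the 4th
        return wh[:3] / wh[3]

    w = np.array([1.0, 2.0, 3.0])
    wh = to_homogeneous(w, k=5.0)
    print(wh)                  # [ 5. 10. 15.  5.]
    print(to_cartesian(wh))    # [1. 2. 3.]  -- we get the Cartesian point back
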
(Refer Slide Time: 47:24)

Now, to understand the imaging process, let us define a perspective transformation which is given by P equal to (1 0 0 0) (0 1 0 0) (0 0 1 0) and then (0 0 minus 1 upon lambda 1); you remember, this lambda is the focal length of the camera. We also convert our world coordinate W to the homogeneous form, so it becomes (kX, kY, kZ, k).

Now, if I transform this homogeneous world coordinate by this perspective transformation matrix P, then I get the homogeneous camera coordinate C h, which is given by (kX, kY, kZ, minus kZ by lambda plus k). So, this is the homogeneous camera coordinate.

Now, if I convert this homogeneous camera coordinate to the Cartesian camera coordinate, I find that the Cartesian camera coordinate is given by C equal to (small x, small y, small z), which is nothing but (lambda X divided by lambda minus Z, lambda Y divided by lambda minus Z, lambda Z divided by lambda minus Z). On the right hand side, X, Y, Z are all in uppercase, indicating that they are the coordinates of the world point.
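
Putting these steps together, the projection of one world point can be sketched in a few lines of Python (NumPy assumed; the world point and the focal length are arbitrary illustrative values, not numbers from the lecture):

    import numpy as np

    lam = 0.05                          # focal length lambda (illustrative)
    X, Y, Z = 1.0, 2.0, 0.5             # world point (illustrative)

    # Perspective transformation matrix P
    P = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, -1.0 / lam, 1]])

    wh = np.array([X, Y, Z, 1.0])       # homogeneous world coordinate with k = 1
    ch = P @ wh                         # homogeneous camera coordinate
    x, y, z = ch[:3] / ch[3]            # back to Cartesian camera coordinates

    # The result agrees with x = lambda*X/(lambda - Z) and y = lambda*Y/(lambda - Z)
    print(x, lam * X / (lam - Z))
    print(y, lam * Y / (lam - Z))
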
(Refer Slide Time: 49:05)

Now, if I compare this expression with the camera coordinates that we obtained with respect to our previous diagram, you find that here we get lowercase x is nothing but lambda capital X divided by (lambda minus Z) and similarly, lowercase y is nothing but lambda capital Y divided by (lambda minus Z).

So, using our previous diagram, we have also seen that these lowercase x and lowercase y are the coordinates of the image point, on the image plane, of the world point capital X, capital Y, capital Z. This shows clearly that using the perspective transformation that we have defined as the matrix P, I can find the image coordinates of the world point capital X, capital Y, capital Z.

Now, in this particular case, the third component that we have obtained, that is the value of z, is of no importance to us, because we are assuming that the imaging plane is the X Y plane of both the world coordinate system and the camera coordinate system, so the value of z on the image plane is always equal to 0.

So, we will stop our discussion here today. In the next class we will see that, just as with the perspective transformation we can project a 3D world point onto the imaging plane, similarly, using the inverse perspective transformation, whether it is possible, given a point in the image plane, to find out the corresponding 3D point in the 3D world coordinate system.
(Refer Slide Time: 51:37)

So, now let us discuss the questions that we had given at the end of lecture 5. In the first question, there was a figure showing 2 points, p and q, and we had to find out which of the options are true: whether q is a 4 neighbor of p, q is an 8 neighbor of p or q is a diagonal neighbor of p.

Now, here you find that in this particular figure, when you look at the locations of p and q, the 4 neighbors of p are the 2 points which are vertically upward and downward and the 2 points which are horizontally to the left and to the right; but the point q is in the diagonally upward direction of point p, and we have said that all the 8 points around point p form the 8 neighbors of point p.

So, in this particular case, the point q is a diagonal neighbor of point p and it is also one of the 8 neighbors of point p. So, the options that q belongs to N 8 (p) and that q belongs to N D (p) are both true, whereas the option a, that q belongs to N 4 (p), is wrong. The second question is quite obvious, quite simple: you have to find out the Euclidean distance, the city block distance and the chess board distance between the 2 points p and q.
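
For reference, the three distances follow directly from the coordinate differences; a minimal sketch in Python (NumPy assumed, and since the figure is not reproduced here the two points are arbitrary ones of my own):

    import numpy as np

    p = np.array([3, 4])                   # illustrative pixel coordinates
    q = np.array([1, 7])

    d = np.abs(p - q)
    euclidean  = np.sqrt(np.sum(d ** 2))   # square root of the sum of squared differences
    city_block = np.sum(d)                 # D4: sum of the absolute differences
    chessboard = np.max(d)                 # D8: maximum of the absolute differences

    print(euclidean, city_block, chessboard)   # 3.605..., 5, 3
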
(Refer Slide Time: 53:05)

So, this is a very simple question; you can work it out yourself. Let us go to the third question. Here, we have given 2 sub images S1 and S2 and you have to determine whether S1 and S2 are 4 connected, 8 connected or m connected.

If you remember, when we discussed connectivity, we said that 2 sub images are 4 connected if there is any point in one of the sub images which is 4 connected to a point in the other sub image.

In this particular case, you find that there is no point in S1 which is 4 connected with a point in S2 and similarly, there is no point in S2 which is 4 connected to a point in S1. So, S1 and S2 cannot be 4 connected. But here you find that I have a point in S1, say here, and there is a point in S2, here; these 2 points are actually connected but they are not 4 connected.

So, we can say that these 2 points are connected in the 8 connected sense, because each point is an 8 neighbor of the other, and at the same time these points are also connected in the m connected sense, because these two points have no common 4 neighbor. So, we can say that the image regions S1 and S2 are both 8 connected as well as m connected, but the two regions S1 and S2 are not 4 connected.
(Refer Slide Time: 54:55)

Now, coming to the next question; here, we had given a binary image and you were asked to find out the skeleton of this binary image. You remember that when we discussed the skeletonization of a binary figure, the first operation that we performed was the distance transformation of that particular figure.

Now here, if I find the distance transformation, you find that all these boundary points will get a distance value equal to 1. Similarly, these points will also get a distance value equal to 1, while these other points will get a distance value equal to 2, and these points will also get a distance value equal to 2. So, once you find the distance values at every point in this manner, then from that distance transformation you can find the points where there is a discontinuity in curvature, and we have said that the points of curvature discontinuity form the skeleton of the particular image.
(Refer Slide Time: 56:10)

So now, let us have some quiz questions on today's lecture. The first question: what is the concatenated transformation matrix for translation by vector [1 1] followed by rotation by angle 45 degrees in 2 dimensions? The second question: here there is a figure, a square with 4 corners. If this figure is first scaled by a factor of 2 and then translated by vector [2 2], what is the transformed figure?

(Refer Slide Time: 56:47)

Third question: determine the figure if the translation is applied first, followed by the scaling. Fourth question: a unit cube with vertices (0 0 0), (0 0 1), (0 1 0), (0 1 1), (1 0 0), (1 0 1), (1 1 0) and (1 1 1) is scaled using Sx equal to 2, Sy equal to 3 and Sz equal to 4; what are the vertices of the transformed figure?

The fifth one: a camera lens has a focal length of 5; find out the image point corresponding to a world point at location (50, 70, 100). Assume the image coordinate system and the world coordinate system to be perfectly aligned.

Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 7
Camera Model & Imaging Geometry
Hello, welcome to the video lecture series on digital image processing. In our last lecture, we had
talked about a number of basic transformations and we have said that these transformations are
very very useful to understand the image formation process.

(Refer Slide Time: 1:21)

So, in the last class, we talked about the basic transformations like translation, rotation and scaling, and we discussed these transformations in both the 2 dimensional and 3 dimensional cases.

For all these transformations, we have also seen the corresponding inverse transformations. After that, we went on to the conversion from the Cartesian coordinate system to the homogeneous coordinate system and we have seen the use of the homogeneous coordinate system in the perspective transformation, which, we have said, is an approximation of the imaging process; so when a camera takes the image of a point in the 3 dimensional world, this imaging transformation can be approximated by the perspective transformation that we discussed in the last class.

(Refer Slide Time: 2:23)

Today, we will talk about the inverse perspective transformation. We have said that the perspective transformation takes a point or a set of points in the 3 dimensional world and maps these points onto the imaging plane, which is a 2 dimensional plane. The inverse perspective transformation just does the reverse: given a point in the imaging plane, we will see, using this inverse perspective transformation, whether it is possible to find out the point in the 3 dimensional coordinate system to which this particular image point corresponds.

Then we will also talk about the imaging geometry where the world coordinate system and the camera coordinate system are not aligned. You may remember that in the imaging geometry we considered in the last class, we assumed that the 3 dimensional world coordinate system is aligned with the camera coordinate system; that is, the X axis of the camera is aligned with the X axis of the 3D world, the Y axis of the camera is aligned with the Y axis of the 3D world and the Z axis of the camera is also aligned with the Z axis of the 3D world.

In addition, the origin of the camera coordinate system also coincides with the origin of the world coordinate system. In today's lecture, we will take a generalized imaging model where the camera coordinate system and the 3D world coordinate system are not aligned, which is the general situation.

Then we will try to see what transformations are involved in such a generalized imaging setup, which will help us understand the image formation process in a generalized setup, and we will illustrate this concept with the help of an example. Now, let us briefly recapitulate what we had done in the last class.

(Refer Slide Time: 4:36)

Now, this figure shows the imaging geometry that we had considered, where the 3D world coordinate system is aligned with the camera coordinate system. There, we have taken a 3D point whose coordinates are given by X, Y, Z, all in capitals, and the lowercase coordinates (x, y) are the corresponding image point in the imaging plane. We have assumed that the focal length of the camera is lambda, which means the coordinate of the lens center is (0, 0, lambda).

(Refer Slide Time: 5:25)

Now, using this particular figure, we have tried to find out a relation between the 3D world coordinates X, Y, Z and the corresponding image point (x, y). For that, what we have done is a conversion from the Cartesian coordinate system to the homogeneous coordinate system.

So, while doing this conversion, what we have done is multiply every component of the coordinates, that is X, Y and Z, by an arbitrary non-zero constant k, and the same value k is appended as the fourth component. So, for a Cartesian point (X, Y, Z), the corresponding homogeneous coordinate is given by (kX, kY, kZ, k).

(Refer Slide Time: 6:33)

So, for a world point X, Y, Z, once we have the corresponding homogeneous coordinate (kX, kY, kZ, k), we define a perspective transformation matrix P, which in this case is (1 0 0 0) (0 1 0 0) (0 0 1 0) and (0 0 minus 1 upon lambda 1). When the homogeneous coordinate W h is transformed with this perspective transformation matrix P, what we get is the homogeneous coordinate of the image point to which this world point W will be mapped, and this homogeneous coordinate after the perspective transformation is obtained as (kX, kY, kZ, minus kZ by lambda plus k).

And we have seen that if I convert this homogeneous image point into the corresponding Cartesian coordinates, this conversion gives us the Cartesian coordinates of the image point as (x, y, z) equal to (lambda X divided by lambda minus Z, lambda Y divided by lambda minus Z, lambda Z divided by lambda minus Z).

(Refer Slide Time: 7:32)

So, just note that (x, y, z) in lowercase letters indicates the camera or image coordinates, whereas X, Y, Z in capital form represents the 3D coordinates of the world point W. Now, what we are interested in are the image coordinates x and y; at this moment, we are not interested in the image coordinate z.

(Refer Slide Time: 8:28)

(Refer Slide Time: 8:52)

So, this can be obtained by a simple conversion: solving the same equations for lowercase x, lowercase y and lowercase z, we find that the image coordinates x and y in terms of the 3D coordinates capital X, capital Y and capital Z are given by x equal to lambda capital X divided by (lambda minus capital Z) and the image coordinate y equal to lambda capital Y divided by (lambda minus capital Z).

So, as we said, the other value, that is the z coordinate in the image plane, is of no importance at this particular moment; but we will see later, when we talk about the inverse perspective transformation and try to map an image point to the corresponding 3D point in the 3D world, that we will make use of this particular z coordinate in the image plane as a free variable.

(Refer Slide Time: 10:00)

So, now let us see what the corresponding inverse perspective transformation is. As we have said, a perspective transformation maps a 3D point onto a point in the image plane. The purpose of the inverse perspective transformation is just the reverse: given a point in the image plane, the inverse perspective transformation P inverse tries to find the corresponding 3D point in the 3D world.

So, for doing that, again we make use of the homogeneous coordinate system; that is, the image point C will be converted to the corresponding homogeneous form, which is given by C h, and the world point W will also be obtained in the homogeneous form W h.

And, we define an inverse perspective transformation P inverse which is given by (1 0 0 0) (0 1 0 0) (0 0 1 0) and (0 0 1 upon lambda 1), and you can easily verify that this transformation matrix is really the inverse of the perspective transformation matrix P, because if we multiply the perspective transformation matrix by this matrix P inverse, what we get is the identity matrix.
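
That verification takes only a couple of lines; a sketch in Python (NumPy assumed, with an arbitrary focal length):

    import numpy as np

    lam = 0.05   # focal length (illustrative value)

    P = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, -1.0 / lam, 1]])

    P_inv = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 1.0 / lam, 1]])

    # Multiplying the two matrices gives the 4 x 4 identity matrix
    print(np.allclose(P @ P_inv, np.eye(4)))   # True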

(Refer Slide Time: 11:42)

Now, given this inverse perspective transformation matrix, suppose we take an image point, say (x0, y0), and we want to find out the corresponding 3D world point W to which this image point (x0, y0) corresponds.

So, the first step is to convert this image point (x0, y0) to the corresponding homogeneous coordinate, which will be obtained as (kx0, ky0, 0), and then the fourth component comes as k. Here you find that we have taken the third component, the z coordinate, as 0, because what we have is a point in 2 dimensions, that is, on the imaging plane. So, we have assumed the z coordinate to be 0.

Now, if we transform this homogeneous coordinate (kx0, ky0, 0, k) with the inverse perspective transformation P inverse, then what we get is the homogeneous coordinate corresponding to the 3D world point, which is obtained as W h equal to (kx0, ky0, 0, k).

Now, if I convert this homogeneous coordinate to the Cartesian form, then the Cartesian coordinates corresponding to this homogeneous coordinate are obtained as W equal to (capital X, capital Y, capital Z), which is nothing but (x0, y0, 0).

So, you find that in this particular case, the 3D world coordinate comes out as (x0, y0, 0), which is the same image point from which we started. Moreover, for all the 3D points, the Z component always comes out as 0. Obviously, this solution is not acceptable, because for every point in the 3 dimensional world the Z coordinate cannot be 0. So, what is the problem here?

(Refer Slide Time: 14:09)

If you remember the figure of the imaging system that we have used, let me just draw that particular figure. We had an imaging plane, the X Y plane, like this, on which the camera coordinate system and the 3D world coordinate system are perfectly aligned.

So, we had this x the same as capital X, this y the same as capital Y, this z the same as capital Z, this is the origin of both the coordinate systems, and we had the optical center of the lens somewhere here. Now, I take some image point here and I draw a line passing through this image point and the camera optical center, and the world point W comes somewhere at this location.

So, we have seen in the previous figures that if I call this point C, this point C is the image point corresponding to the 3D world point W whose coordinates are given by capital X, capital Y, capital Z, and this C in our case has coordinates (x0, y0, 0). When we tried to map this point C back to the 3D world coordinate system, what we got is that for every point W, the value of Z comes out to be 0.

Now, the problem arises from the fact that if I analyze this particular mapping, that is, the mapping of point W in the 3D world to point C in the image plane, this mapping is not a 1 to 1 mapping; rather, it is a many to 1 mapping.

Say for example, if I take any point on this particular straight line passing through the point C
and the point (0 0 lambda) which is nothing but the optical center of the camera lens; then all
these points on this line will be mapped to the same point C in the image plane.

So naturally, this being a many to 1 mapping, when I do the inverse transformation using the inverse perspective transformation matrix, from the image point C to the corresponding 3D world point, the solution that I get cannot be an acceptable solution. So, we have to have something more in this formulation, and let us see what we can add here.

(Refer Slide Time: 16:58)

Now here, if I try to find the equation of the straight line which passes through the point (x0, y0), that is the image point, and the point (0, 0, lambda), that is the optical center of the camera lens, the equation of the straight line comes out in this form: capital X equal to x0 by lambda into (lambda minus capital Z) and capital Y equal to y0 by lambda into (lambda minus capital Z).

So, this is the equation of the straight line such that every point on this straight line is mapped to the same point (x0, y0) in the image plane. So, as we have said, the inverse perspective transformation cannot give you a unique point in the 3D world because the perspective transformation was not a 1 to 1 mapping.

So, even if we cannot get exactly the 3D point by using the inverse perspective transformation, at least the inverse transformation matrix should be able to tell me which particular line contains the points that map to this point (x0, y0) in the image plane. So, let us see whether we can have this information at least.

(Refer Slide Time: 18:19)

So, for doing this: in the earlier case, when we converted the image point (x0, y0) to the homogeneous coordinate, we had taken (kx0, ky0, 0, k). Now, what we will do with the z coordinate is, instead of assuming it to be 0, we will assume it to be a free variable. So, we will take the homogeneous coordinate to be (kx0, ky0, kz, k).

Now, when this point is inverse transformed using the inverse perspective transformation matrix, what we get is the world point in the homogeneous coordinate system as W h equal to P inverse C h, and in this particular case you will find that W h is obtained as (kx0, ky0, kz, kz by lambda plus k). So, we have W h in the homogeneous coordinate system.

Now, what we have to do is convert this homogeneous coordinate to the Cartesian coordinate system and, as we have said earlier, for this conversion we have to divide all the components by the last component. So, in this case, kx0, ky0 and kz will all be divided by (kz by lambda plus k).

(Refer Slide Time: 20:00)

So, after doing this division operation, what I get in the Cartesian coordinate system is W equal to (X, Y, Z), which is equal to (lambda x0 divided by lambda plus z, lambda y0 divided by lambda plus z, lambda z divided by lambda plus z).

So, on the right hand side, all the z's are in lowercase letters, which is the free variable that we have used for the image coordinate, and in the column matrix on the left hand side, X, Y and Z are in uppercase letters, which indicates that these are the 3D coordinates.

(Refer Slide Time: 21:05)

So, now what we do is try to solve for the values of capital X and capital Y. Just from this previous matrix, you find that capital X is given by lambda x0 divided by (lambda plus lowercase z), capital Y is given by lambda y0 divided by (lambda plus lowercase z) and capital Z is equal to lambda lowercase z divided by (lambda plus lowercase z).

So, from these 3 equations, I can obtain capital X equal to x0 by lambda into (lambda minus capital Z) and capital Y equal to y0 by lambda into (lambda minus capital Z). If you recall the equation of the straight line that passes through (x0, y0) and (0, 0, lambda), you will find that the equation of that straight line was exactly this: capital X equal to x0 by lambda into (lambda minus capital Z) and capital Y equal to y0 by lambda into (lambda minus capital Z).

So, by using this inverse perspective transformation, we have not been able to identify the 3D world point, which is of course not possible, but we have been able to identify the equation of the straight line such that the points on this straight line map to the image point (x0, y0) in the image plane. Now, if I want to find out exactly which particular 3D point this image point (x0, y0) corresponds to, then I need some more information.

Say, for example, I at least need to know the Z coordinate value of the particular 3D point W; once we know this, then using the inverse perspective transformation along with this information about the Z coordinate value, we can exactly identify the point W which maps to the point (x0, y0) in the image plane.
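
That last step can be sketched in a few lines of Python (NumPy assumed; the focal length, the image point and the assumed Z value are all illustrative): once the Z coordinate is known, the two line equations pin down X and Y, and projecting the recovered point back reproduces the original image point.

    import numpy as np

    lam = 0.05                  # focal length (illustrative)
    x0, y0 = 0.01, -0.02        # image point (illustrative)
    Z = 2.0                     # extra information: the Z coordinate of the world point

    # Point on the line of sight that has this Z value
    X = x0 / lam * (lam - Z)
    Y = y0 / lam * (lam - Z)
    W = np.array([X, Y, Z])

    # Sanity check: the perspective projection of W gives back (x0, y0)
    print(lam * X / (lam - Z), lam * Y / (lam - Z))   # 0.01 -0.02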

Now, in all the discussions so far, we have assumed that the world coordinate system and the camera coordinate system are perfectly aligned. Let us now discuss a general situation where the world coordinate system and the camera coordinate system are not perfectly aligned. Here, we assume that the camera is mounted on a Gimbal.

(Refer Slide Time: 23:36)

So, if you mount the camera on a Gimbal, then using the Gimbal the camera can be given a pan of angle theta and it can also be given a tilt by an angle alpha. Remember that pan is the rotation around the Z axis and tilt is the rotation around the X axis.

We also assume that the Gimbal center is displaced from the 3D world coordinate origin (0, 0, 0) by a vector W0, which is equal to (x0, y0, z0), and finally, we assume that the camera center, or the center of the imaging plane, is displaced from the Gimbal center by a vector r, which has components r1, r2 and r3 in the X, Y and Z directions of the 3D world coordinate system.

Now, our interest is this: given such an imaging arrangement, if we have a point W in the 3D world coordinate system, what will be the image point C to which this world point W is mapped? This is a general situation, and now let us see whether we can obtain the solution to this particular problem, that is, for this generalized imaging setup, what the image point C corresponding to a world point W will be.

(Refer Slide Time: 25:19)

So, the steps will be like this. Our earlier formulation was very simple because we assumed that the camera coordinate system and the 3D world coordinate system are perfectly aligned; so in this generalized situation, we will try to find a set of transformations which, if applied one after another, will bring the camera coordinate system and the world coordinate system into perfect alignment.

Once that alignment is made, we can apply the perspective transformation to the transformed 3D world points, and this perspective transformation applied to the transformed 3D world point gives us the corresponding image coordinates of the point W. So, what are the transformation steps that we need in this particular case?

(Refer Slide Time: 26:16)

So, the first step is to assume that the camera coordinate system and the 3D world coordinate system are perfectly aligned. From this position, we displace the Gimbal center from the origin by the vector W0; after displacing the Gimbal center from the origin by W0, we pan the X axis by an angle theta (a rotation about the Z axis), followed by a tilt of the Z axis by an angle alpha (a rotation about the X axis), which is finally followed by the displacement of the image plane with respect to the Gimbal center by the vector r.

So, we have 4 different transformation steps which are to be applied one after another, and these transformation steps will give you the transformed coordinates of the 3D world point W. So, let us see how these transformations are to be applied one after another.

(Refer Slide Time: 27:20)

So here, on the left hand side, we have shown a figure where the camera coordinate system and the world coordinate system are perfectly aligned. Now, from this alignment, if we give a displacement by a vector W0 to the Gimbal center, then the camera will be displaced as shown on the right hand side of the figure, where you find that the center is displaced by the vector W0.

You remember that if I displace the camera center by the vector W0, then all the world points will be displaced by the vector minus W0 with respect to the camera. Just recollect that when we try to find the image point of a 3D world point, the location of the image point is decided by the location of the 3D world point with respect to the camera coordinate system, not with respect to the 3D world coordinate system.

So, in this case also, after a set of transformations, we have to find out the coordinates of the 3D world point with respect to the camera coordinate system, where originally the coordinates of the 3D world point are specified with respect to the 3D world coordinate system.

So here, as we displace the camera center by the vector W0, all the world points will be displaced by the vector which is the negative of W0, that is, by minus W0; and if W0 has components x0 along the X direction, y0 along the Y direction and z0 along the Z direction, the corresponding translation of the 3D points will be by minus x0, minus y0 and minus z0.

And, we have seen earlier that if a 3D point is to be displaced by (minus x0, minus y0, minus z0), then in the unified representation the corresponding transformation matrix for this translation is given by G equal to (1 0 0 minus x0) (0 1 0 minus y0) (0 0 1 minus z0) and then (0 0 0 1). So, this is the transformation matrix which translates all the world points by the vector (minus x0, minus y0, minus z0), and this transformation is now with respect to the camera coordinate system.

The next operation, as we said, is that after this displacement we pan the camera by angle theta, and this panning is a rotation about the Z axis. So, when we pan about the Z axis, the coordinates which are going to change are the X coordinate and the Y coordinate; the Z coordinate value is not going to change at all. For this pan by an angle theta, again we have seen earlier that the corresponding transformation matrix is given by R theta equal to (cosine theta sin theta 0 0) (minus sin theta cosine theta 0 0) then (0 0 1 0) and then (0 0 0 1).

So, when we rotate the camera by an angle theta; all the world coordinate points, all the world
points will be rotated by the same angle theta but in the opposite direction and that corresponding
transformation matrix will be given by this matrix R theta.

So, we have completed 2 steps: first, the displacement of the camera center with respect to the origin of the world coordinate system, and then panning the camera by angle theta. The third step is that we now have to tilt the camera by an angle alpha, and again we have to find the corresponding transformation matrix for this tilt operation, which has to be applied to all the 3D points.

(Refer Slide Time: 31:51)

So, for this tilt operation by an angle alpha, the corresponding transformation matrix R alpha will be given by (1 0 0 0) (0 cosine alpha sin alpha 0) then (0 minus sin alpha cosine alpha 0) and (0 0 0 1). Just recollect that these are the basic transformations which we have already discussed in the previous class, and see how these transformations are being used to understand the imaging process. So far, we have applied one displacement and 2 rotation transformations, R theta and R alpha.

(Refer Slide Time: 32:35)

Now, note that R theta and R alpha can be combined into a single rotation matrix R, which is equal to R alpha concatenated with R theta, and the corresponding transformation matrix R will be given by (cosine theta, sin theta, 0, 0), then (minus sin theta cosine alpha, cosine theta cosine alpha, sin alpha, 0), then (sin theta sin alpha, minus cosine theta sin alpha, cosine alpha, 0) and then (0 0 0 1).
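
If you want to check this combined form numerically, here is a small sketch in Python (NumPy assumed; the pan and tilt angles are arbitrary test values of my own): it multiplies R alpha by R theta and compares the product with the closed-form matrix written above.

    import numpy as np

    theta = np.deg2rad(30.0)     # test pan angle (assumption)
    alpha = np.deg2rad(50.0)     # test tilt angle (assumption)
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)

    R_theta = np.array([[ ct, st, 0, 0],
                        [-st, ct, 0, 0],
                        [0, 0, 1, 0],
                        [0, 0, 0, 1]])

    R_alpha = np.array([[1, 0, 0, 0],
                        [0,  ca, sa, 0],
                        [0, -sa, ca, 0],
                        [0, 0, 0, 1]])

    # Closed form of R = R_alpha @ R_theta, as stated above
    R_closed = np.array([[ ct,       st,       0,  0],
                         [-st * ca,  ct * ca,  sa, 0],
                         [ st * sa, -ct * sa,  ca, 0],
                         [0, 0, 0, 1]])

    print(np.allclose(R_alpha @ R_theta, R_closed))   # True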

(Refer Slide Time: 33:13)

Then the final transformation that we have to apply is the displacement of the camera center, or the center of the imaging plane, from the Gimbal center by a vector r; this vector r has the components r1, r2 and r3 along the X, Y and Z directions, and by this transformation all the world points are to be translated by the vector (minus r1, minus r2, minus r3). The corresponding translation matrix will now be T equal to (1 0 0 minus r1) (0 1 0 minus r2) (0 0 1 minus r3) and then (0 0 0 1).

(Refer Slide Time: 34:06)

So, if I apply all these transformations one after another and represent the 3D world point W by the corresponding homogeneous coordinate W h, then you find that the transformations T, R and G taken together on this homogeneous coordinate W h give you the homogeneous transformed point W as seen by the camera. And once I have this transformed 3D point, simply applying the perspective transformation to it gives the camera coordinates in homogeneous form.

So, now the camera coordinate C h is given by P T R G into W h. Remember that this coordinate comes in the homogeneous form, so the final operation that we have to do is to convert this homogeneous coordinate C h into the corresponding Cartesian coordinate C.

(Refer Slide Time: 35:24)

So, if I solve those equations, the Cartesian image coordinates come out in this form (you can try to derive these equations yourself):

x = lambda [ (X - X0) cos(theta) + (Y - Y0) sin(theta) - r1 ] / [ -(X - X0) sin(theta) sin(alpha) + (Y - Y0) cos(theta) sin(alpha) - (Z - Z0) cos(alpha) + r3 + lambda ]

and the image coordinate y is given by

y = lambda [ -(X - X0) sin(theta) cos(alpha) + (Y - Y0) cos(theta) cos(alpha) + (Z - Z0) sin(alpha) - r2 ] / [ -(X - X0) sin(theta) sin(alpha) + (Y - Y0) cos(theta) sin(alpha) - (Z - Z0) cos(alpha) + r3 + lambda ].

So, these are the various transformation steps that we have to apply if I have a generalized imaging setup in which the 3D coordinate axes and the camera coordinate axes are not aligned. The steps that we have to follow are: first, assume that the camera coordinate axes and the 3D coordinate axes are perfectly aligned; then give a set of transformations to the camera to bring it to its given setup, and apply the corresponding transformations, but in the reverse direction, to the 3D world points.

So, by applying these transformations to the 3D world points, the 3D world points as seen by the camera will be obtained in the transformed form, and after that, if I apply the simple perspective transformation to these transformed 3D points, what I get are the image points corresponding to those 3D world points.

(Refer Slide Time: 37:39)

Now, let us see an example to illustrate this operation. Let us take a figure where we assume that the center of the imaging plane of the camera is located at (0, 0, 1) with respect to the 3D world coordinate system X, Y, Z. We have an object placed in the X Y plane where one of the corners of the object is at location (1, 1, 0.2), and we want to find out what the image coordinates will be for this particular 3D world point, which is a corner of the object as placed in this figure.

So, what we will do is apply the set of transformations to the camera plane one after another, and find out the corresponding transformations to the 3D world points that will give us the world points as seen by the camera. Initially, I again assume that the world coordinate system and the camera coordinate system are perfectly aligned.

(Refer Slide Time: 39:12)

Now, after this assumption, what I have to do is give a displacement to the camera by the vector (0, 0, 1). So, I bring the camera to a location here; this is my camera, where the image plane center is at location (0, 0, 1), and this is my X axis, this is my Y axis and this is the Z axis. Now, if I do this transformation, then you find that all the 3D points will be translated by the vector (0, 0, minus 1) with respect to the camera coordinate system.

So, the first transformation matrix which has to be applied to all the points in the 3D coordinate system is given by (1 0 0 0) (0 1 0 0) (0 0 1 minus 1) and then (0 0 0 1). This is the first transformation that has to be applied to all the 3D points. Now, after the camera is displaced by the vector (0, 0, 1), the next operation that we have to apply is to pan the camera by an angle of 135 degrees.

I should mention that in this arrangement, the pan is 135 degrees and the tilt is also 135 degrees. So, after this initial transformation, the displacement of the camera by the vector (0, 0, 1), we have to apply a pan of 135 degrees to the camera.

(Refer Slide Time: 41:21)

So, to represent that, let us take a 2 dimensional view. As we said, panning is nothing but rotation around the Z axis; so if I say that this is the X axis and this is the Y axis, then by panning we have to make an angle of 135 degrees between the X axis of the camera coordinate system and the X axis of the world coordinate system. So, the situation will be something like this.

So, this is the Y axis of the camera coordinate system and this is the X axis of the camera coordinate system, and by a pan of 135 degrees we have to rotate the camera imaging plane in such a way that the angle between the X axis of the camera coordinate system and the X axis of the 3D world coordinate system is 135 degrees. Once we do this, you find that this rotation of the camera is in the anticlockwise direction.

So, the corresponding transformation of the 3D world points will be in the clockwise direction but by the same angle, 135 degrees, and the corresponding rotation matrix, now given by R theta, will be equal to (cosine 135 degrees sin 135 degrees 0 0) then (minus sin 135 degrees cosine 135 degrees 0 0) then (0 0 1 0) then (0 0 0 1).

So, this is the rotation transformation that has to be applied to all the world coordinate points. So,
after we apply this R theta, the next operation that we have to perform is to tilt the camera by an
angle 135 degree.

(Refer Slide Time: 43:36)

So again, to have a look at this tilt operation, we take a 2 dimensional view. The view will be something like this: we take this as the Z axis of the 3D world coordinate system, and in this case it will be the Y Z plane of the 3D world coordinate system, and by tilt what we mean is something like this.

This is the Z axis of the camera coordinate system, and the angle between the Z axis of the 3D world coordinate system and that of the camera coordinate system is again 135 degrees; this is the tilt angle alpha. Here again you find that the tilt is in the anticlockwise direction, so the corresponding transformation of the 3D world points will be a rotation of the 3D world points by 135 degrees in the clockwise direction around the X axis, and the corresponding transformation matrix in this case will be R alpha equal to (1 0 0 0) (0 cosine 135 degrees sin 135 degrees 0) (0 minus sin 135 degrees cosine 135 degrees 0) (0 0 0 1).

So, this is the transformation matrix that has to be applied for the tilt operation. After doing this, recall that the 3D world point for which we want to find the corresponding image point is given by (1, 1, 0.2).

(Refer Slide Time: 45:40)

This is the 3D world point, and after application of all these transformations, we write the transformed coordinates of this 3D world point as x hat, y hat, z hat, represented in the unified form; so, this will be like this.

It has to be R alpha, R theta, then T applied to the original world coordinate (1, 1, 0.2, 1) in the unified form. Now, if I compute this product R alpha R theta T using the transformation matrices that we have just derived, you will find that this combined transformation matrix comes out as (minus 0.707, 0.707, 0, 0) then (0.5, 0.5, 0.707, minus 0.707) then (0.5, 0.5, minus 0.707, 0.707) then (0, 0, 0, 1).

(Refer Slide Time: 46:37)

So, this is the overall transformation matrix which takes care of the translation of the image plane, the pan by angle theta and also the tilt by angle alpha. If I apply this transformation to my original 3D world coordinate, which was (1, 1, 0.2, 1), then what I get are the coordinates of the point as observed by the camera. If you compute this, you will find that it comes out as (0, 0.43, 1.55, 1); again, this is in the unified form.

So, the corresponding Cartesian coordinates will be given by x hat equal to 0, y hat equal to 0.43 and z hat equal to 1.55. These are the coordinates of the same 3D point as seen by the camera. So now, what we have to do is apply the perspective transformation to this particular point.

(Refer Slide Time: 48:36)

So, if I apply the perspective transformation and assume that the focal length of the camera is 0.035, then we obtain the image coordinates as x equal to lambda x hat divided by (lambda minus z hat), which in this case is of course 0, and y equal to lambda y hat divided by (lambda minus z hat), which, if you compute it, comes out as minus 0.0099. So, these are the image coordinates of the world point that we considered.

Now, note that the y coordinate in the image plane has come out to be negative. This is expected, because the transformed 3D coordinate obtained after applying the transformations came out to be positive, and in the image plane there is an inversion; so this y coordinate value comes out to be negative.

So, this particular example illustrates the set of transformations that we have to apply, followed by the perspective transformation, so that we can get the image point for any arbitrary point in the 3D world. With this, we complete our discussion on the different transformations and the different imaging models that we have considered.
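
The whole chain for this example can be reproduced in a short Python sketch (NumPy assumed; it also assumes, as the example implicitly does, that the displacement r of the image plane center from the Gimbal center is zero). Running it gives approximately x = 0 and y = minus 0.0099, the values obtained above.

    import numpy as np

    lam = 0.035                                # focal length used in the example
    theta = alpha = np.deg2rad(135.0)          # pan and tilt angles
    w = np.array([1.0, 1.0, 0.2, 1.0])         # the world point (1, 1, 0.2) in unified form

    # Initial translation of the world points by (0, 0, -1), i.e. the camera center
    # displaced to (0, 0, 1); this is the matrix written as T in the example above
    T1 = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, -1],
                   [0, 0, 0, 1]])

    R_theta = np.array([[ np.cos(theta), np.sin(theta), 0, 0],
                        [-np.sin(theta), np.cos(theta), 0, 0],
                        [0, 0, 1, 0],
                        [0, 0, 0, 1]])

    R_alpha = np.array([[1, 0, 0, 0],
                        [0,  np.cos(alpha), np.sin(alpha), 0],
                        [0, -np.sin(alpha), np.cos(alpha), 0],
                        [0, 0, 0, 1]])

    P = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, -1.0 / lam, 1]])

    ch = P @ R_alpha @ R_theta @ T1 @ w        # homogeneous image coordinate
    x, y, z = ch[:3] / ch[3]                   # Cartesian image coordinate
    print(round(x, 4), round(y, 4))            # approximately 0.0 -0.0099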

(Refer Slide Time: 50:33)

Now, let us see the answers to the quiz questions that we had given in the last class. The first question was: what is the concatenated transformation matrix for translation by vector [1 1] followed by rotation by angle 45 degrees in 2 dimensions? Here you find that first you have to apply the translation, followed by the rotation.

So, the concatenated transformation will be the rotation matrix (cosine 45 degrees sin 45 degrees 0) (minus sin 45 degrees cosine 45 degrees 0) (0 0 1) multiplied by the translation matrix (1 0 1) (0 1 1) (0 0 1). If you multiply these 2 matrices, the concatenated transformation matrix comes out to be (1 over root 2, 1 over root 2, root 2) then (minus 1 over root 2, 1 over root 2, 0) then (0 0 1).

(Refer Slide Time: 51:26)

The second question asked was: for the square figure whose corners are at locations (1, 1), (minus 1, 1), (minus 1, minus 1) and (1, minus 1), if this figure is first scaled by a factor of 2 and then translated by the vector [2 2], what is the transformed figure? In this case, the transformed figure will again be a square whose corners, after both these transformations, will lie at (0, 0), (4, 0), (4, 4) and (0, 4).

(Refer Slide Time: 52:02)

The third question we had asked is: what will the figure be if the transformations are applied in the reverse order? In this case, first you have to apply the translation, followed by the scaling, and after the final transformation your corner coordinates will be (2, 2), (6, 2), (6, 6) and (2, 6).

(Refer Slide Time: 52:30)

Then we had given another problem with the vertices of a cube, where the scale factors were 2 along the X axis, 3 along the Y axis and 4 along the Z axis.

(Refer Slide Time: 52:50)

So, if we transform this cube with these scale factors, then finally I get the transformed coordinates as (0 0 0), (0 0 4), (0 3 0), (0 3 4), (2 0 0), (2 0 4), (2 3 0) and (2 3 4).

(Refer Slide Time: 53:07)

Then we had given the fifth problem, where a 3D world point and the camera focal length were given, and we had to find out the corresponding image point.

(Refer Slide Time: 53:21)

So here, the image point is obtained as x equal to minus 2.63 and y equal to minus 3.68 after
applying the transformations that we have discussed.

(Refer Slide Time: 53:34)

Now, coming to today's quiz questions. The first question: for a camera with a focal length of 0.05, find out the locus of the points which will be imaged at location (0.2, minus 0.3) on the image plane. Assume that the camera coordinate system and the world coordinate system are perfectly aligned.

The second question: a camera with a focal length of 0.04 meter is placed at a height of 1 meter and is looking vertically downwards to take an image of the XY plane. If the size of the image sensor plate is 4 millimeters by 3 millimeters, find the area on the XY plane that can be imaged.

(Refer Slide Time: 54:21)

Then the third question: a camera is mounted on a Gimbal system that enables the camera to pan and tilt at any arbitrary angle. The Gimbal center is placed at location (0, 0, 5) and the camera center is displaced from the Gimbal center by (0.2, 0.2, 0.2) in a world coordinate system XYZ. Assuming that the camera has a pan of 45 degrees and a tilt of 135 degrees, find out the image coordinates of the world point (1, 1, 0.5).

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 8
Camera Calibration and Stereo Imaging
Hello, welcome to the video lecture series on digital image processing. Up to the last class, we have seen that, given a particular imaging geometry where a camera is placed in a 3D coordinate system and the world coordinate system and the camera coordinate system are not perfectly aligned, there is a set of transformations which are to be applied to the points in the 3D world coordinate system so that they are transformed into the form as seen by the camera. Following that, if we apply the perspective transformation, we get the image coordinates of different points in the 3D world coordinate system.

(Refer Slide Time: 1:52)

So, what we have seen in the last class is that once we have an image point in the image plane, how to apply the inverse perspective transformation to get the equation of the straight line such that the points on that straight line map to that particular image point on the imaging plane.

Then we have seen a generalized imaging geometry where the world coordinate system and the camera coordinate system are not aligned, and we have discussed the set of transformations which are involved in such a generalized imaging setup. We have also seen how to find the image coordinates of any arbitrary point in the 3D world coordinate system in such a generalized imaging setup, and we have illustrated the concept with the help of an example.

(Refer Slide Time: 2:48)

In today's lecture, we will see how to calibrate the camera for a given imaging setup, and then we will also explain the concept of how to extract a 3D point from 2 images, which are also known as stereo images.

(Refer Slide Time: 3:09)

So, what we had in the last class is an imaging setup like this, where the 3D world coordinate system is given by capital X, capital Y and capital Z. In this world coordinate system, we had placed the camera, whose coordinate system is given by small x, small y and small z, and we have assumed that the camera is mounted on a Gimbal, where the Gimbal is displaced from the origin of the world coordinate system by a vector W0 and the center of the camera is displaced from the Gimbal by a vector r.

The camera is given a pan of angle theta and it is also given a tilt of angle alpha, and in such a situation, if W is a point in the 3D world coordinate system, we have seen how to find the corresponding image point of W in the image plane of the camera.

(Refer Slide Time: 4:20)

So, for that, we have applied a set of transformations with whose help we have brought the 3D world coordinate system and the camera coordinate system into alignment; and after the 3D world coordinate system and the camera coordinate system are perfectly aligned by that set of transformations, we have seen that we can find the image point corresponding to any 3D world point by applying the perspective transformation.

(Refer Slide Time: 4:55)

So, the transformations that we have to apply are: first, a transformation for the displacement of the Gimbal center from the origin of the 3D world coordinate system by the vector W0; followed by a transformation corresponding to the pan of the X axis of the camera coordinate system by angle theta; followed by a transformation corresponding to the tilt of the Z axis of the camera coordinate system by angle alpha; and finally, the displacement of the camera image plane with respect to the Gimbal center by the vector r.

(Refer Slide Time: 5:37)

So, the first transformation, which corresponds to translating the Gimbal center from the origin of the world coordinate system by the vector W0, is given by the transformation matrix G, which in this case is (1 0 0 minus X0) (0 1 0 minus Y0) (0 0 1 minus Z0) and (0 0 0 1). The pan of the X axis of the camera coordinate system by an angle theta is given by the transformation matrix R theta, where R theta in this case is (cosine theta sin theta 0 0) then (minus sin theta cosine theta 0 0) then (0 0 1 0) and (0 0 0 1).

Similarly, the transformation matrix corresponding to the tilt by an angle alpha is given by the transformation matrix R alpha, which in this case is (1 0 0 0), (0 cosine alpha sin alpha 0), then (0 minus sin alpha cosine alpha 0) and then (0 0 0 1).

(Refer Slide Time: 6:58)

Then the next transformation we have to apply is the translation of the center of the image plane with respect to the Gimbal center by the vector r. If we assume that the vector r has components r1, r2 and r3 along the X direction, Y direction and Z direction of the 3D world coordinate system, then the corresponding transformation matrix for this translation is given by T equal to (1 0 0 minus r1) (0 1 0 minus r2) (0 0 1 minus r3) and then (0 0 0 1).

We have also seen in the last class that the rotation matrices R theta and R alpha can be
combined together to give a single transformation matrix R which is nothing but the product of R
alpha and R theta.

(Refer Slide Time: 8:06)

Now, once we get these transformation matrices, then after transforming first by the translation matrix G, then by the rotation matrix R, followed by the second translation matrix T, what we have done is align the coordinate system of the camera with the 3D world coordinate system; that means every point in the 3D world will now have a transformed coordinate as seen by the camera coordinate system.

So, once we do this, then finally applying the perspective transformation to these transformed 3D world coordinates gives us the coordinate of the point in the image plane for any point in the 3D world coordinate system. So, here you find that the final form of the expression is like this: both the world coordinate system and the camera coordinate system are represented in the homogeneous form.

So, w h is the homogeneous coordinate corresponding to the world coordinate W and c h is the homogeneous form of the image coordinate C. So, for a world point W whose homogeneous coordinate is represented by w h, here you find that I can find out the image coordinate of the point W, again in the homogeneous form, which is given by this matrix equation: c h is equal to P T R G times w h.

And, here you note that each of these transformation matrices that is P, T, R and G all of them
are of dimension 4 by 4. So, when I multiply all these matrices together to give a single
transformation matrix, then the dimension of that transformation matrix will also be 4 by 4.
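
Continuing the small NumPy sketch above, and assuming the perspective transformation matrix P has the standard form with a minus 1 upon lambda entry in its last row (consistent with the inverse matrix P inverse used later in this lecture), the whole chain can be multiplied out and applied to a world point like this; the focal length and the test point are made-up numbers.

def perspective(lam):
    # perspective transformation matrix for a camera of focal length lambda (assumed form)
    P = np.eye(4)
    P[3, 2] = -1.0 / lam
    return P

lam = 0.05                                # illustrative focal length
A = perspective(lam) @ T @ R @ G          # single combined 4x4 transformation

def project(A, world_point):
    # put the world point in homogeneous form (k = 1), apply A, divide by the 4th component
    wh = np.append(np.asarray(world_point, dtype=float), 1.0)
    ch = A @ wh
    return ch[0] / ch[3], ch[1] / ch[3]   # image plane coordinates (x, y)

x, y = project(A, (1.0, 1.0, 0.5))        # made-up world point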

(Refer Slide Time: 10:13)

So, what we have now is of this form. So, after doing this transformation, I can find out the
image coordinate of the corresponding point W where X and Y coordinates will be given by
these expressions.

(Refer Slide Time: 10:28)

So, after doing this, what I have is a transformation equation or a matrix equation so that for any world point W, I can find out the homogeneous coordinate of the image point corresponding to that particular point W.

Now, the transformations which are involved, that is P, T, R and G, are, as we said, each of dimension 4 by 4; so if I represent the combined transformation matrix by a matrix A, then this matrix A will also be a 4 by 4 matrix and now the entire transformation equation in matrix form will be c h is equal to A into w h.

Now, find that given a particular setup, the transformations T, R and G, they depend up on the
imaging setup. So, G corresponds to translation of the Gimbal center from the origin of the 3D
world coordinate system, R corresponds to pan angle and the tilt angle and T corresponds to
translation of the image plane center from the Gimbal center.

So, these 3 transformation matrices depend upon the geometry of the imaging system. Whereas the other transformation matrix, that is P or the perspective transformation matrix, is entirely a property of the camera, because we will find that the components of this transformation matrix P have a term lambda which is equal to the focal length of the camera.

So, it is possible that for a given camera for which the focal length lambda is known, I can find out the corresponding perspective transformation matrix P. Whereas, to find out the other transformation matrices like T, R and G, I have to physically measure what is the translation of the Gimbal center from the origin of the 3D world coordinate system, what is the pan angle and what is the tilt angle. I also have to measure physically what is the displacement of the image plane center from the Gimbal center.

And in many cases, measuring these quantities is not very easy and it is more difficult if the
imaging setup is changed quite frequently. So, in such cases, it is always better that you first
have an imaging setup and then try to calibrate the imaging setup with the help of the images of
some known points of 3D objects that will be obtained with the help of the same imaging setup.
So by calibration, what I mean is as we said that now I have a combined transformation matrix
for the given imaging setup which is A which is nothing but the product of PTR and G. So, these
being a 4 by 4 matrix, what I have to do is I have to estimate the different element values of this
matrix A. So, if I can estimate the different element values of the total transformation matrix A
from some known images, then given any other point in the 3D, I can find out what will be the
corresponding image point.

Not only that if I have an image point, a point in the image by applying the inverse
transformation, I can find out what will be the equation of the straight line on which the
corresponding world point will be lying. So, this calibration means that we have to estimate the
different values of this matrix A. Now, let us see how we can estimate these values of the matrix
A.

(Refer Slide Time: 14:41)

So, here you find that we have this matrix equation which is of this form. That is c h is equal to A
into w h where we have said, the w h is the world coordinate of the 3D point put in homogenous
form and c h is the image point on the image plane again in the homogenous form and A is the
total transformation matrix.

So here, if the world point W has the coordinates say X, Y and Z, the corresponding homogeneous coordinate will be given by w h equal to some constant k times X, some constant k times Y, some constant k times Z, and the 4th element will be k. So, this will be the homogeneous coordinate w h corresponding to the point W.

Now, without any loss of generality, I can assume the value of k equal to 1. So, if I take k equal to 1 and if I expand this matrix equation, then what I get is the components c h1, c h2, c h3, c h4 on the left hand side, equal to the matrix A, which has components a 11, a 12, a 13, a 14, a 21, a 22, a 23, a 24, a 31, a 32, a 33, a 34, a 41, a 42, a 43, a 44, multiplied by the homogeneous coordinate of the point in 3D space, which is now (X, Y, Z, 1).

So, you remember that we have assumed the value of k to be equal to 1. So, I get a matrix
equation like this. Now, from this matrix equation, I have to find out or I have to estimate the
component values a 11 , a 12 , a 13 and so on.

(Refer Slide Time: 17:40)

Now here, once I have the homogeneous image coordinates as c h1, c h2, c h3 and c h4, then we have already discussed that the corresponding Cartesian coordinates in the image plane are given by x equal to c h1 divided by c h4 and y equal to c h2 divided by c h4.

So, this is simply a conversion from the homogeneous coordinate system to the Cartesian coordinate system. Now here, if I replace the values of c h1 and c h2 by x times c h4 and y times c h4 in our matrix equation, then the matrix equation will look like: xc h4, yc h4, then c h3 which we let remain as it is, and finally c h4; this will be equal to a 11, a 12, a 13, a 14, a 21, a 22, a 23, a 24, a 31, a 32, a 33, a 34, a 41, a 42, a 43, a 44 multiplied by the 3D point coordinate in homogeneous form which is (X Y Z 1).

(Refer Slide Time: 19:34)

So, if I expand this matrix equation, what I get is: xc h4 will be given by a 11 X plus a 12 Y plus a 13 Z plus a 14, then yc h4 will be equal to a 21 X plus a 22 Y plus a 23 Z plus a 24, and c h4 is given by a 41 X plus a 42 Y plus a 43 Z plus a 44. Now, you find that while trying to solve these matrix equations, we have ignored the third component of the image point. That is because the third component corresponds to the Z value and we have said that for this kind of calculation, the Z value is not important to us.

Now, from these given 3 equations, what we can do is we can find out the value of c h4 in terms of X, Y, Z and if I replace this value of c h4 in the earlier 2 equations, then these 2 equations will simply be converted to the form a 11 capital X plus a 12 capital Y plus a 13 capital Z minus a 41 small x capital X minus a 42 small x capital Y minus a 43 small x capital Z plus a 14, this is equal to 0, and a 21 capital X plus a 22 capital Y plus a 23 capital Z minus a 41 small y capital X minus a 42 small y capital Y minus a 43 small y capital Z plus a 24, this is equal to 0.

So, these 2 equations are now converted to this particular form. Now, if you study these 2 equations, you will find that small x and small y are the coordinates in the image plane of a point in the 3D world coordinate system whose coordinates are given by capital X, capital Y and capital Z. So, if I take a set of images for which the points in the 3D world coordinate system, that is capital X, capital Y and capital Z, are known and I also find out the corresponding image coordinates in the image plane, then for every such pair of readings I get 2 equations. One is the first equation, the other one is the second equation.

Now, if you study these 2 equations, you will find that there are a number of unknowns. The unknowns are a 11, a 12, a 13, a 14, a 41, a 42, a 43, a 21, a 22, a 23 and a 24. So, counting the number of unknowns we have in these equations gives 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11; so, 11 or 12? Have I missed something?

Sorry, there should be one more term in each equation: in the first equation, a term minus a 44 x, and in the second equation, a term minus a 44 y. So, this a 44 is another unknown and there are 12 unknowns in all. So, for solving these 12 unknowns, we need 12 different equations and for every known point in the 3D world, I get 2 equations. So, if I take such images for 6 known points, then I can form 12 equations and using those 12 equations, I can solve for these 12 unknowns using some numerical techniques.
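
As one possible way of carrying out this numerical estimation, here is a hedged least-squares sketch in Python/NumPy. It is not the specific procedure of the lecture; it simply stacks the two equations per known point and fixes a 44 equal to 1 as a normalization, since the equations only determine A up to a scale factor.

import numpy as np

def calibrate(world_pts, image_pts):
    # world_pts: list of known (X, Y, Z); image_pts: list of measured (x, y) for the same points
    rows, rhs = [], []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z]); rhs.append(x)
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z]); rhs.append(y)
    sol, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float), rcond=None)
    A = np.zeros((4, 4))
    A[0, :] = sol[0:4]        # a11 .. a14
    A[1, :] = sol[4:8]        # a21 .. a24
    A[3, :3] = sol[8:11]      # a41, a42, a43
    A[3, 3] = 1.0             # normalization a44 = 1
    return A                  # the third row is not recovered, since the z component was ignored

With at least 6 known points there are at least 12 equations, and the least-squares solution gives the remaining element values.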

So once I have the values of these 12 unknowns; so, what I have is the transformation matrix A,
the total transformation matrix A using which I can find out what will be the image point of any
point in the 3 dimensional coordinate system. And, for given any point in the image plane, I can
also find out what will be the equation of a straight line on which the corresponding 3
dimensional world point will exist.

So, camera calibration using this procedure can be done for any given imaging setup. But the problem still exists that given an image point, I cannot uniquely identify the location of the 3D world point. So, for identification of the 3D world point, or finding out all the 3 coordinates X, Y and Z of a 3D world point, I can make use of another camera.

(Refer Slide Time: 26:25)

So, let us look at a setup like this where on the left side I have image 1 and on the right side I have image 2. Image 1 is captured with the help of one camera and image 2 is captured with the help of another camera.

So, image 1 has the coordinate system say X 1 Y 1 Z 1 , image 2 has the coordinate system X 2 Y 2
Z 2 and we can assume that the 3D world coordinate system that is capital X, capital Y and
capital Z is aligned with the left camera. That means the left image coordinate system is same as
the 3D world coordinate system, whereas the right image coordinate system is different.

Now, once I have this, given a point W in the 3 dimensional world in the 3 dimensional space,
you will find that the corresponding image point in image 1 is given by X 1 Y 1 and the image
point for the same point W in image 2 is given by X 2 Y 2 .

I assume that both the cameras are identical, that means they have the same value of the focal length lambda. So, they will have the same perspective transformation as well as inverse perspective transformation. Now, once I know that in image 1 the image point corresponding to point W is the point (X 1, Y 1), then by applying the inverse perspective transformation, I can find out the equation of the straight line on which the point W will exist.

Similarly, from image 2 where I know the location X 2 Y 2 of the image point, if I apply the
inverse perspective transformation, I also get equation of another straight line on which this point
W will exist. So, now you find that by using these 2 images, I got equations of 2 straight lines.
So, if I solve these 2 equations, then the point of intersection of these 2 straight lines gives me
the X Y Z coordinate of point W.

But here you find that we have taken a general stereo imaging setup where there is no alignment
between the left camera and the right camera or between the first camera and the second camera.
So, for doing all the mathematical operations, what we have to do is we have to again apply a set
of transformations to one of the camera coordinate systems so that both the camera coordinate
systems are aligned.

So, these transformations will again involve maybe a transformation for some translation, a transformation for some rotation and possibly also a transformation for scaling if the image resolutions of the two cameras are not the same. So, there will be a set of transformations, a number of transformations and the corresponding mathematical operations to align the 2 camera systems.

But here you find that the positioning of the cameras is in our control. So, why do we consider such a generalized setup? Instead, we can arrange the cameras in such a way that the imaging planes of both the cameras are coplanar and we choose the coordinate systems in such a way that the X axis of one camera and the X axis of the other camera are perfectly aligned. The Y axis and the Z axis of the two cameras will then only be separated by a displacement along the X direction.

(Refer Slide Time: 30:22)

So effectively, the camera setup that we will be having is something like this. Here you find that for the 2 cameras, image plane 1 and image plane 2 are in the same plane. The X axes of the two camera coordinate systems are collinear. The Y axis and the Z axis of the two cameras have a shift of value B along the X direction. This value B is called the camera displacement.

We assume that both the cameras are identical otherwise. That is, they have the same resolution and they have the same focal length lambda. Again, here for a given 3D point W, we have in image 1 the corresponding image point as (x 1, y 1) and in image 2, we have the corresponding image point as (x 2, y 2).

(Refer Slide Time: 31:21)

Now, this imaging setup can be seen as a section where you find that these XY planes of both the
cameras are now perpendicular to the plane. So, I have X axis which is horizontal, the Z axis
which is vertical and the Y axis which is perpendicular to this plane.

So, in this figure, I assume that the camera coordinate system of one of the cameras, in this case
the camera 1 which is also called the left camera is aligned with the 3D world coordinate system
capital X, capital Y capital Z. The coordinate system of the left camera is assumed to be X 1 Y 1
Z 1 , the coordinate of the right camera is assumed to be X 2 Y 2 Z 2 .

Now, given this particular imaging setup, you will find that for any particular world point, say W, with respect to camera 1 and camera 2, this point W will have the same value of the Z coordinate and the same value of the Y coordinate, but it will have different values of the X coordinate because the cameras are shifted or displaced only along the X axis, not in the Y or Z direction.

So, origin of this world coordinate system and origin of the left camera system, they are perfectly
aligned. Now, taking this particular imaging setup, now I can develop a set of equations.

(Refer Slide Time: 30:03)

So, the set of equations will be something like this. We have seen that for image 1, for point W,
the corresponding image point is at location (x 1 , y 1 ); for the same point W in the right image, the
image point is at location (x 2 , y 2 ). So, these are the image coordinates in the left camera and the
right camera.

Now, by applying the inverse perspective transformation, we find that the equation of the straight line with respect to the left camera on which point W will lie is given by an equation X 1 is equal to x 1 by lambda into lambda minus Z. Similarly, with respect to the right camera, the equation of the straight line on which the same point W will exist is given by X 2 is equal to x 2 by lambda into lambda minus Z, where capital X 1 is the X coordinate of the point W with respect to the coordinate system of camera 1 and capital X 2 is the X coordinate of the 3D point W with respect to the coordinate system of the second camera.

Now, recollect the figure that we have shown, that is the arrangement of the cameras where the cameras are displaced by a displacement B. So, with respect to that camera arrangement, we can easily find out that the value of X 2 will simply be X 1 plus the displacement B. Now, if I replace this value of X 2, which is X 1 plus B, in this particular equation, then I get an equation of the form x 1 by lambda into lambda minus capital Z, plus B, which is equal to x 2 by lambda into lambda minus Z. And from this, I get an equation of the form Z equal to lambda minus lambda times B divided by x 2 minus x 1.

So, you find that this Z is the Z coordinate of the 3D point W with respect to the coordinate system of the first camera; it is the same as the Z value with respect to the coordinate system of the second camera, and it is also the Z value with respect to the 3D world coordinate system. So, that means it gives me the Z value of the 3D point for which the left image point was (x 1, y 1) and the right image point was (x 2, y 2), and I can estimate this value of Z from the knowledge of the focal length lambda, from the knowledge of the displacement between the 2 cameras which is B, and from the knowledge of the difference of the x coordinates, that is x 2 minus x 1, in the left image and the right image.

So, this x 2 minus x 1 - this term, this particular quantity is also known as disparity. So, if I know
this disparity for a particular point in the left image and the right image, I know the lambda that
is focal length of the camera and I know the displacement between the 2 cameras, I can find out
what is the corresponding depth value that is Z.

(Refer Slide Time: 37:25)

And once I know this depth value, I can also find out the x coordinate and the y coordinate of the
3D point W with respect to the 3D world coordinate system for which we have already seen. The
equations are given by X is equal to x 1 by lambda into lambda minus Z and Y equal to y 1 by
lambda into lambda minus Z.

So, first we have computed the value of Z from the knowledge of disparity, camera focal length
and the displacement between the cameras and then from this value of Z and the image
coordinates in say the left image that is the (x 1 y 1 ), I can find out what is the X value, X
coordinate value and Y coordinate value of that particular 3D point. Now in this, you find that
the very very important computation is that given a point in the left image, what will be the
corresponding point in the right image.
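
The depth recovery just described fits in a few lines. The following is a small sketch with made-up numbers, where x 1, y 1, x 2, y 2 are expressed in the same units as the focal length lambda and the displacement B.

def triangulate(x1, y1, x2, y2, lam, B):
    # Z from the disparity (x2 - x1), then X and Y from the left image point
    disparity = x2 - x1
    Z = lam - lam * B / disparity
    X = x1 / lam * (lam - Z)
    Y = y1 / lam * (lam - Z)
    return X, Y, Z

# illustrative values only (meters): focal length 0.04, camera displacement 0.06
print(triangulate(0.002, 0.001, 0.003, 0.001, lam=0.04, B=0.06))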

(Refer Slide Time: 38:33)

So, this is the problem which is called as stereo correspondence problem. So, in today’s lecture,
we are not going to deal with the details of the stereo correspondence problem that is how do you
find out a point in the left image and the corresponding point in the right image. But today what
we will discuss is about the complexity of this correspondence operation.

So, our problem is like this: we have a left image and we have a right image. So, this is the left image and this is the right image. So, if I have a point say C L in the left image, I have to find out a point C R in the right image which corresponds to C L and once I do this, I find out the image coordinate of this point C L, which is (x 1, y 1), and the image coordinate of this point C R, which is (x 2, y 2).

So, once I know these image coordinates, I can compute x 2 minus x 1, which is the disparity, and then this x 2 minus x 1 is used for computation of the depth value Z. Now, what about the complexity of this search operation? Say, I identify a particular point C L in the left image; then the corresponding point C R may appear anywhere in the right image.

So, if I have images whose dimensions are of order N by N, that means I have N number of rows and N number of columns, then you find that for every point in the left image, I have to search N square number of points in the right image and because there are N square number of points in the left image, in the worst case, I have to search N to the power 4 number of points to find out the correspondence between every point in the left image and the corresponding point in the right image.

So, this is a massive computation. So, how do we reduce this computation? Fortunately, the imaging geometry that we have used helps us in reducing the amount of computation that we will be doing.

(Refer Slide Time: 41:14)

So, you find that for the point (X, Y, Z) in the 3D space, the corresponding left image point is given by x 1 is equal to lambda times capital X 1 divided by lambda minus capital Z 1. So, I assume that capital X 1, capital Y 1 and capital Z 1 form the coordinate system of the first camera and I also assume that capital X 2, capital Y 2 and capital Z 2 form the coordinate system of the second camera. So, this is for camera 1 and this is for camera 2.

So, with respect to camera 1, the image point x 1 is given by lambda times capital X 1 divided by lambda minus capital Z 1. Similarly y 1, the y coordinate in the first image, is given by lambda times capital Y 1 divided by lambda minus Z 1.

Now, with respect to the second image, the image coordinate x 2 is given by lambda times capital X 2 divided by lambda minus capital Z 2. Similarly, y 2 is given by lambda times Y 2 divided by lambda minus capital Z 2.

Now, you find that in the imaging setup that we have used, capital Z 1 is equal to capital Z 2 and capital Y 1 is equal to capital Y 2, but capital X 1 is not equal to capital X 2. This is because the 2 cameras are displaced only in the X direction; they do not have any displacement in the Y direction, neither do they have any displacement in the Z direction.

So, for both the camera coordinate systems, the Z coordinate and the Y coordinate values of the point will be the same, whereas the X coordinate will be different. So, since Z 1 is equal to Z 2 and Y 1 is also equal to Y 2, you find that among the image coordinates on the 2 images, image 1 and image 2, y 1 will be equal to y 2.

So, what does this mean? This means that whatever the (x 1, y 1) value of point C L in the left image, the corresponding right image point C R will have a different x coordinate value but it will have the same y coordinate value. That means 2 corresponding image points must lie on the same row.

So, if I pick up a C L belonging to row I in the left image, the corresponding point C R in the right image will also belong to the same row I. So by this, for a given point I do not have to search the entire right image to find out the correspondence; I will simply search that particular row in the right image to which C L belongs to find the correspondence.
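
How the actual matching is done is taken up in a later lecture; purely as an illustration of how the row constraint cuts down the search, here is a hedged block-matching sketch that compares a small window around C L with candidate windows on the same row of the right image, using the sum of absolute differences as a made-up similarity cost.

import numpy as np

def match_along_row(left, right, row, col, half=3):
    # search only the same row of the right image for the patch centred at (row, col) in the left image
    # assumes (row, col) lies far enough from the image border for the window to fit
    patch = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    best_col, best_cost = None, np.inf
    for c in range(half, right.shape[1] - half):
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        cost = np.abs(patch - cand).sum()        # sum of absolute differences
        if cost < best_cost:
            best_cost, best_col = cost, c
    return best_col                              # the disparity then follows from best_col and col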

So, this saves a lot of time while searching for correspondence between a point in the left image and the corresponding point in the right image. So, till now we have discussed how, using 2 different cameras in a stereo imaging setup, we can find out the 3D world coordinates of the points which have an image point in the left image and a corresponding point in the right image.

But by studying this stereo imaging setup, you can find out that it may not always be possible to find a point in the right image for every possible point in the left image. So, there will be a certain region in the 3 dimensional space such that for all the points in that region, I will have image points both in the left image and the right image. But for any point outside that region, I will have points only in 1 of the images, either in the left image or in the right image; I cannot have points in both the images and unless I have points in both the images, I cannot estimate the 3 dimensional X, Y, Z coordinates of those points.

(Refer Slide Time: 46:34)

So, till now we have seen that using a single camera, I cannot estimate the depth value of a point
in the 3D but if I have 2 cameras and using stereo setup, I can estimate the depth value of the 3
dimensional points.

(Refer Slide Time: 46:54)

So, now let us try to answer some of the questions that we had asked in the last class. So, in the
last class we had asked a question that for a camera with focal length of 0.05, find out the locus
of the points which will be imaged at location 0.2 and minus 0.3 on the image plane. Assume the
camera coordinate system and the world coordinate system to be perfectly aligned. Now, this
particular question can be answered very easily.

(Refer Slide Time: 47:33)

You find that in this particular case, the value of lambda is equal to 0.05, x 0 is equal to 0.2 and y 0 is equal to minus 0.3. So once I know the value of lambda, I can find out the inverse perspective transformation matrix. The inverse perspective transformation matrix is given by P inverse, whose rows are (1 0 0 0), (0 1 0 0), (0 0 1 0) and (0 0 1 upon lambda 1); in this case, 1 upon lambda is 20. This is the inverse perspective transformation matrix.

And because x 0 and y 0 are the image coordinates, I can find out the corresponding homogeneous coordinate of the 3D point as P inverse times (k times x 0, k times y 0, k times z, k).

So, here we have find that when we discussed about the inverse perspective transformation, we
have said that this Z is taken as a free variable which helps us to get the equations of the straight
lines and by solving this, you will get the equations in the form the derivations we have already
done in the previous class.

The equations of the straight lines are given as X equal to x 0 by lambda into lambda minus Z, which, if you compute it in this particular case, comes out to be 0.2 minus 4Z, and the Y coordinate is given by Y equal to y 0 by lambda into lambda minus Z, which, if you put in the value of lambda again, comes out in the form minus 0.3 plus 6Z.

So, these 2 equations together, that is X equal to 0.2 minus 4Z and Y equal to minus 0.3 plus 6Z, give you the equation of the straight line on which the point W will lie. So, this is how the first problem can be solved.
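
As a quick numerical check of this locus (not part of the original lecture), the two expressions can be evaluated for a few values of the free variable Z:

lam, x0, y0 = 0.05, 0.2, -0.3
for Z in (0.0, 0.01, 0.02):
    X = x0 / lam * (lam - Z)      # equals 0.2 - 4Z
    Y = y0 / lam * (lam - Z)      # equals -0.3 + 6Z
    print(Z, round(X, 4), round(Y, 4))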

(Refer Slide Time: 50:15)

Now, in the second problem, we had seen that a camera with focal length of 0.04 is placed at a height of 1.0 meter and is looking vertically downwards to take images of the XY plane. If the size of the image sensor plate is 4 millimeter by 3 millimeter, find the area on the XY plane that can be imaged. So, here again you find that we can apply a set of transformations to bring the camera coordinate system and the 3D world coordinate system into alignment.

So, the set of transformations that you have to apply is: firstly, the camera has to be translated by a vector (0 0 1); after that, because the camera looks vertically downwards, we do not have to apply any pan theta but we have to apply a tilt of 180 degrees. So, just by these 2 transformations, that is first a translation by (0 0 1) followed by a tilt of 180 degrees, I can get the transformation matrix which will align the 3D world points with the camera coordinate system. After doing this transformation, I apply the inverse perspective transformation. Here it has been shown that the sensor plate size is 4 mm by 3 mm. So, if I take the corners of the sensor plane as the extreme image points, then applying the inverse perspective transformation on these extreme image points, I get the extreme points on the XY plane which give us the bound of the region on which the imaging can be done.

Now, this problem can also be solved very easily without going for all these mathematical transformations.

(Refer Slide Time: 52:21)

So, the solution can be obtained like this. Say, I have this imaging plate, so the imaging plate is something like this. Let us assume that this is the Z axis, this is the X axis and this is the Y axis of the 3D world coordinate system, and this is the center of the imaging plane. The size of the image plate is given as 4 mm by 3 mm, so here the extreme values will be 2 and 1.5, and similarly I can find out the coordinates of the other extreme points of this image plane.

Now, I take a line from this point passing through the lens center of the camera; in this case, the focal length of the camera has been given as 0.04 meter. So, from this point to this point, the length of this line segment is 0.04 meter. The camera is at a height of 1 meter, so the height of the remaining segment down to the XY plane is 0.96 meter.

So now, by applying the concept of similar triangles, I can find out that the X coordinate and Y coordinate on the XY plane will be 24 times the x coordinate and y coordinate on the image plate, because 0.96 divided by 0.04 is 24. So here, the x coordinate has a value of 2 mm; this point will be on the negative side, so the X coordinate will have a value of minus 48 mm, and the y coordinate has a value of 1.5 mm, so the Y coordinate will have a value of minus 1.5 into 24, that is minus 36 mm.

In the same manner, I can find out the other 3 extreme points on this XY plane. So, the region bounded by these 4 points gives me the region on the XY plane, or the area on the XY plane, which can be imaged using this particular imaging setup.
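
The same similar-triangles argument can be written out in a few lines; this is only a sketch of the arithmetic, with the magnification factor 24 obtained as (1.0 minus 0.04) divided by 0.04.

lam, height = 0.04, 1.0                     # focal length and camera height, in meters
scale = (height - lam) / lam                # 0.96 / 0.04 = 24
sensor_w, sensor_h = 0.004, 0.003           # 4 mm by 3 mm sensor plate
ground_w, ground_h = scale * sensor_w, scale * sensor_h
print(ground_w, ground_h)                   # 0.096 m by 0.072 m area on the XY plane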

(Refer Slide Time: 55:15)

Now, coming to the third one; a camera is mounted on a Gimbal system that enables the camera
to pan and tilt at any arbitrary angle. The Gimbal center is placed at location 0.05 and the camera
center is displaced from Gimbal center by 0.02 (0.2, 0.2, 0.2) in a world coordinate system X, Y,
Z. Assuming the camera has a pan of 45 degrees and tilt of 135 degrees, find out the image
coordinate of a world point (1, 1, 0.5).

Now, this is a direct procedure which we have followed when we discussed the generalized imaging setup. So, you simply replace the transformation matrices in that equation by the transformation matrices of this particular problem and you directly get the solution to this particular problem.

(Refer Slide Time: 56:13)

Now, I have some question for today’s lecture. First problem is; 2 identical cameras having a
focal length of 0.04 meter are used for stereo imaging. If the camera displacement along X axis
is 6 centimeter, left image point corresponding to a world point W is (0.2, 0.3) and the
corresponding right image point is (0.4, 0.3); find out the 3D location of W with respect to a
world coordinate system aligned with the coordinate system of the left camera.

(Refer Slide Time: 56:57)

Then, the second problem: 2 identical cameras with focal length of 40 millimeter and image sensor size of 4 millimeter by 4 millimeter are used for stereo imaging. If the camera separation is 6 centimeter, what is the minimum depth that can be measured using this stereo setup? Thank you.

Digital Image Processing

Prof. P.K. Biswas

Department of Electronics & Electrical Communication Engineering

Indian Institute of Technology, Kharagpur

Lecture - 9

Interpolation & Resampling

Hello, welcome to the video lecture series on digital image processing. Till the last class, we have seen various geometric transformations and we have seen how those geometric transformations can be used to model an image formation process. We have also seen how to calibrate a camera, given a particular imaging setup, and we have also seen that using 2 identical cameras, we can have a stereo imaging setup using which the 3D coordinates of a point in the 3 dimensional scene can be obtained.

(Refer Slide Time: 1:49)

Now, in today's lecture, we will try to explain some interpolation operations. We will explain when the interpolation operation is needed and at the end of today's lecture, the students will be able to write algorithms for different image transformations and the needed interpolation operations. Now, let us see why and when we need image interpolation and image resampling. So, let us first introduce this problem.

(Refer Slide Time: 2:25)

Say for example, if we have a 3 by 3 image like this. So, we have this X Y coordinate system and in this X Y coordinate system we have a 3 by 3 image. So, I have an image pixel here, here, here, here, here, here, here, here and here. So, you can easily identify the coordinates of these image pixels: this particular image pixel is (0, 0), this is (1, 0), this is (2, 0), this is (0, 1), this is (1, 1), this is (2, 1), this is (0, 2), this is (1, 2) and this is (2, 2).

Now, let us try to apply some simple geometric transformations on this image. Say for example, I want to scale up this image by a factor of 3 in both the X dimension and the Y dimension. So, if I scale up this image by a factor of 3, then this 3 by 3 image will become a 9 by 9 image. And, let us see how the pixels in the 9 by 9 image can be obtained.

So, I just apply a scaling operation with factor S x equal to 3 and S y equal to 3, that is, in both the X direction and the Y direction I apply a scaling of factor 3. So naturally, this 3 by 3 image, after being scaled up by the factor 3 in both the directions, will be converted to an image of 9 by 9. So, let us see how these pixel values will look.

Again, I put this X Y coordinate system. Now, you remember that this scaling operation is given
by say (x hat y hat) is equal to (S x 0, 0 S y ) into column vector xy, where this (S x 0) and (0 S y )
this is the transformation matrix, xy is the coordinate of the pixel in the original image and (x hat
y hat) is the coordinate of the pixels in the transform.

So, if I simply apply this scaling transformation, then obviously this (0, 0) point will lie at
location (0, 0) in the transform. But what will happen to the other image points? So, because this
will be converted to a 9 by 9 image, so let us first form a 9 by 9 grid.

So now, you find that this (0, 0) pixel, even after this scaling transformation, remains at location (0, 0). But the other pixels, say for example this pixel (1, 0) which was originally at location X coordinate equal to 1 and Y coordinate equal to 0, will be transformed so that the Y coordinate remains 0 but the X coordinate becomes equal to 3. So, this point will be mapped to this particular location.

Similarly, (2, 0) will be mapped to location (6, 0). So, this becomes 1, 2, 3, 4, 5, 6; so, this pixel will be mapped to this particular location. Similarly, the (0, 1) pixel will now be mapped to location (0, 3), and the (0, 2) pixel will now be mapped to location (0, 6); so 3, 4, 5, 6.

Similarly, I will have pixels in these different locations in the scaled image. But you find that
because in the original image I had 9 different pixels, even in the transformed image I got 9
different pixels that is 3 pixels in the horizontal direction and 3 pixels in the vertical direction;
but because I am applying a scaling of factor 3 in both X direction and Y direction, my final
image size after scaling should be 9 pixels in the horizontal direction and 9 pixels in the vertical
direction.

So, you find that there are many pixels which are not filled up in this scaled-up image. Some of those pixels I can simply mark: this is one pixel which has not been filled up, this pixel has not been filled up, this one has not been filled up, and so on. So likewise, there are many pixels in this scaled-up image which have not been filled up. Let us try to take another example.
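
The effect described above is easy to reproduce; the following small sketch forward-maps a 3 by 3 array into a 9 by 9 grid and counts the destination pixels that never receive a value.

import numpy as np

src = np.arange(1, 10).reshape(3, 3)        # a 3x3 image with values 1..9
sx = sy = 3                                 # scale factors
dst = np.full((3 * sy, 3 * sx), -1)         # -1 marks "not filled up"
for y in range(3):
    for x in range(3):
        dst[sy * y, sx * x] = src[y, x]     # forward mapping fills only 9 of the 81 pixels
print(int((dst == -1).sum()))               # 72 pixels of the scaled image are left empty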

(Refer Slide Time: 8:43)

So, instead of scaling, I now apply a rotation operation to all these different pixels. So, I have this 3 by 3 image and suppose I rotate this image by an angle of 45 degrees in the clockwise direction. If I rotate this image by 45 degrees in the clockwise direction, you know that we had a transformation matrix which takes care of rotation and that is given by (cosine theta sin theta) then (minus sin theta cosine theta).

So, this is the rotation matrix which, when applied to the different pixels in the original image, will give you the pixels in the rotated image. So, in the rotated image I can represent these pixels by (x hat, y hat) whereas my original pixels are x and y, and in this particular case the value of theta is simply 45 degrees. So, if I apply this rotation transformation to all the pixels in the original image, you will find that the (0, 0) location will be transformed to location (0, 0) even in the transformed image, the (0, 1) location will be transformed to location 0.707 and 0.707, then the (0, 2) point will be transformed to location 1.414 and 1.414.

Similarly (1, 0) location, pixel at location (1, 0) will be transformed to location 0.707 and minus 0.707, location (1, 1) will be transformed to location 1.414 and 0 and location (1, 2) will be transformed to location 2.121 and 0.707. Similarly the other coordinates, (2, 0) this pixel will be transformed to location 1.414 and minus 1.414. (2, 1), this will be transformed to location 2.121 and minus 0.707 and (2, 2) this particular image pixel will be transformed to location 2.828 and 0. So, these are the various transformed locations of the pixels in the rotated image.

Now, if you just look at these transformed locations, you find that the coordinates that we get are not always integer coordinates. In many cases, in fact in most of the cases in this particular example, the coordinates are real-valued coordinates. But whenever we are going to have a digital image, whether it is the original image or the transformed image, all the row indices and the column indices should have integer values; I cannot represent a real number or fractional number as a row index or a column index.

So in this case, whenever I am going for this particular transformation, what I have to do is
whenever I am getting a real number as a row index or a column index; I have to take its nearest
integer where that particular pixel will be put. So, in this case for the original image location (0,
1) which has now been transformed to location 0.707 and 0.707, this has to be mapped to a pixel
location (1, 1) in the transformed image.

So, if I do that mapping, if you find that all the pixel locations will now be mapped like this. So,
I put a new grid and the new grid will appear like this. So, you find that the (0, 0) location has
been matched to (0, 0) location; so, I have a point over here, pixel over here.

The (0, 1) location in the original image has been transformed to 0.707 and 0.707 in the transformed image. So, what I have to do is I have to take the nearest integer of these fractional numbers to decide where this particular pixel will be put. So, 0.707 in the X direction and 0.707 in the Y direction will be mapped to (1, 1) in the transformed image. So, this (0, 1) point will now be mapped to location (1, 1) in the transformed image; so, I get a point here.

Similarly for (0, 2), you find that the corresponding transformed location is 1.414 and 1.414. Again, the nearest integer of 1.414 is 1; so this point (0, 2) will also be mapped to location (1, 1) in the transformed image. (1, 0) in the same way will be mapped to location (1, minus 1). So, (1, 0) will be mapped to this particular point in the transformed image. (1, 1) will be mapped to (1.414, 0); here again, by integer approximation, this point will be mapped to location (1, 0) in the transformed image.

So, I have a pixel in this particular location. The (1, 2) point will be mapped to 2.121 and 0.707; so, again by integer approximation, I map this pixel to (2, 1). So, this is the point where the pixel (1, 2) of the original image will be mapped.

In the same manner, the (2, 0) point will be mapped to location (1, minus 1) where I already have a point. The (2, 1) location will be mapped to location (2, minus 1), so I will have a point here. The (2, 2) location will be mapped to (3, 0) in the transformed image, so I will have a pixel in this particular location.

So, you find that in the original image we had 9 pixels, whereas in this rotated image we are going to have only 7 pixels, and this happens because when you rotate an image, some of the integer coordinates of the original pixels turn out to be fractions or real numbers in the transformed image and these fractions or real numbers cannot be represented in digital form. So, we have to round those numbers to the nearest integer, and this leads to this kind of problem.

And, not only this, you find that if I just rotate the original image, ideally my rotated image should have been something like this. But here, you find that there are a number of points where I do not have any information: say for example, this point, this point, this point and similarly all these points, and the reason is digitization. So, to fill up these points, what I have to do is identify, in the transformed image, the locations where I do not have any information.

(Refer Slide Time: 17:47)

So, take the simple case, take the previous one. Here you find that I do not have any information
at location (1, 1) of the transformed image, of the scaled image. So, because I do not have any
information here, now I have to look in the original image to find out which value should be put
at this particular location.

Now, because this image I have obtained using a scaling of 3 in both X and Y direction, if I want
to go back to original image; then to this transformed image, I have to a apply a scaling of 1 third
in both X direction and Y direction. Now, you find that in the transformed image, this particular
location, this particular pixel has a coordinate of (1, 1). So, if I transform this, inverse transform
this using scaling factor of 1 third and 1 third; I get in the original image, the x coordinate should
be equal to 1 upon 3, the y coordinate should also be equal to 1 upon 3.

Now, here comes the problem. In the original image, I have information at location (0, 0), I have information at location (0, 1), I have information at location (1, 0) and I have information at location (1, 1). But at location (1 upon 3, 1 upon 3), I do not have any information, because you remember from our earlier classes that whenever we have gone for image digitization, the first step we had done was sampling and the second step was quantization.

Now, the moment we sample the image, what we have done is we have a taken some
representative value from discrete grid points. We have not considered the intensity values at all
possible points in the continuous image. So, in the process of the sampling, whatever value was
there at location (1 upon 3, 1 upon 3) in the continuous image, that information is lost. So, in the
digital image at this particular location (1 upon 3, 1 upon 3); I do not have any information. So,
what is the way out?

Now, the only thing that I can do is go for an approximation of the intensity value which should have been at this location (1 by 3, 1 by 3). Now, how to get that approximate value? That is the problem. So, the only way in which this can be done is that I have to interpolate the image at those points where I do not have any information.

And after interpolation, so in these locations where I do not have any information in my discrete
image, I have to interpolate the values in these locations and after interpolation, I have to check
what should be the interpolated value at location (1 by 3, 1 by 3). And whatever value I get at
this location (1 by 3, 1 by 3), this particular value has to be taken to fill up this location (1, 1) in
the transformed image. Similar is the case for the other transformation that is rotation.

In this case also, this rotated image, we have obtained by rotating the original image by 45
degree in the clockwise direction. So, whenever I find any point in the rotated image where there
is no information, what I have to do is that particular coordinate, I have to inverse transform that
is I have to give a rotation to that particular point by minus 45 degree; go back to the original
image point and obviously in this case, in the original image, these row column will not be
integers but they will be real numbers or fractions.

And because we have real number or fractions, for which we do not have any information in the
original digitized image; we have to go for interpolation and after interpolation, we have to go
for resampling to find out what should be the intensity value or approximate intensity value at
that particular location. Then take that intensity value and put it in to the point in the transformed
image where I do not have the information. So, this is why you find that the interpolation and
resampling is very very important whenever you are working in the digital domain or you are
doing some sort of transformations over a digital image.
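
To make the inverse-mapping idea concrete, here is a hedged sketch that scales an image up by going from each destination pixel back to a fractional source location and approximating the value there with bilinear interpolation of the four surrounding samples; the lecture itself will instead develop B spline interpolation for this approximation step.

import numpy as np

def bilinear(img, x, y):
    # approximate the intensity at a non-integer location (x, y) from its 4 neighbours
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, img.shape[1] - 1), min(y0 + 1, img.shape[0] - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bottom = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bottom

def scale_up(img, s):
    # inverse mapping: for every destination pixel, go back to the source and resample
    out = np.zeros((img.shape[0] * s, img.shape[1] * s))
    for v in range(out.shape[0]):
        for u in range(out.shape[1]):
            out[v, u] = bilinear(img, u / s, v / s)
    return out

print(scale_up(np.arange(9, dtype=float).reshape(3, 3), 3))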

(Refer Slide Time: 22:40)

Now, whenever we go for interpolation, this gives the situation: we have a 1 dimensional signal f(t), a function of t, and after sampling we have got the sampled signal f S (t). So here, you find that after sampling, what we have got are the sample values which are represented by f S (t).

Now, as we have f S (t), these values are present only at discrete locations. So, at any intermediate location, we do not have any information about this function f(t) and because we do not have this information, we go for interpolation for all those values of t where we do not have the samples present. And after interpolation, again, we have to go for resampling to fill up those positions.

So, this slide shows a sampled 1 dimensional signal f(t), a function of t. After sampling, we have represented the signal by f S (t), where f S (t) is nothing but a sequence of sample values. So, in this case you find that we have the values available for, say, t equal to 0, t equal to 1, t equal to 2, t equal to 3 and so on. But I do not have any information for a value of t which is in between 0 and 1.

So, if I need to obtain a value of f at location say 0.3, then what I have to do is I have to
interpolate this function f S (t) and I have to find out after resampling that what will be the value
of the function at t equal to 0.3. So, this is why the interpolation and resampling is very very
important whenever you are working with a digital signal and a digital image in particular and
you are going for any type of transformation, particularly the rotation and translation of the
digital image.

Usually the translation operation, if it is only translation, does not need any kind of resampling or interpolation. Now, whenever we interpolate an image, the interpolation operation should have certain desirable properties. Firstly, the interpolation function that you use for interpolating the discrete values should have a finite region of support. That means interpolation should be done based on local information; it should not be based on global information, that is, it should not take into consideration all the sample values of that particular digitized signal.

The second desirable property is that the interpolation should be very smooth, that is, the interpolation should not introduce any discontinuity in the signal. And the third property is that the interpolation should be shift invariant: if the signal is shifted or given some translation, then the same interpolation should still apply. The B spline function is one such function which satisfies all these 3 desired properties.

(Refer Slide Time: 26:55)

Now, let us see, what is this B spline function? B spline function is a piece wise polynomial
function that can be used to provide local approximation of curves using very small number of
parameters and because it is useful for local approximation of curves; it can be very very useful
for smoothening operation of some discrete curves, it is also very very useful for interpolation of
a function from discrete number of samples. So, let us see what this B spline function is.

(Refer Slide Time: 27:54)

A B spline function is usually represented by say x(t) is equal to sum of p i into B i k (t) where
you take the summation from i equal to 0 to n, where 0 to n that is n plus 1 is the number of
samples which are to be approximated.

Now, these points p i , they are called the control points and B i k is the normalized B spline of
order k. So B i k , this is the normalized B spline; B spline of order k and p i are known as the
control points. So, this control points actually decide that how the B spline functions should be
guided to give you a smooth curve.

Now, this normalized B spline B i,k can be recursively defined as follows: B i,1 (t) is equal to 1 whenever t i is less than or equal to t and t is less than t i plus 1, and B i,1 (t) is equal to 0 whenever t takes any other value.

So, B i,1 (t) is equal to 1 for all values of t lying between t i and t i plus 1, with t i inclusive, and it is 0 for any other value of t. Then we can find out B i,k (t) using the relation

B_{i,k}(t) = ((t - t_i) / (t_{i+k-1} - t_i)) B_{i,k-1}(t) + ((t_{i+k} - t) / (t_{i+k} - t_{i+1})) B_{i+1,k-1}(t).

So once we have B i,1 for different values of t, then from B i,1 we can recursively compute the values of B i,k using this relation.
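
This recursion translates directly into code. The sketch below assumes uniformly spaced integer knots t i equal to i, which is the case used in the rest of this lecture, so both denominators reduce to k minus 1.

def bspline(i, k, t):
    # normalized B spline B_{i,k}(t) on the uniform knots t_j = j
    if k == 1:
        return 1.0 if i <= t < i + 1 else 0.0
    left = (t - i) / (k - 1) * bspline(i, k - 1, t)
    right = (i + k - t) / (k - 1) * bspline(i + 1, k - 1, t)
    return left + right

# e.g. B_{0,4}(1.5) agrees with the cubic piece (-3t^3 + 12t^2 - 12t + 4)/6 at t = 1.5
print(bspline(0, 4, 1.5))   # 0.479166...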

(Refer Slide Time: 32:14)

Now, you find that once we have this relation of B i k (t), you can easily verify that once I have B 0
k(t); then B i k (t) is nothing but translates of B 0 k (t). So, this B i k (t) can be written as this is
nothing but B 0 k (t minus i). So, this can be easily verified from the way this B spline function is
defined.

Now, this B spline function for various values of i and k can be obtained like this. You can easily get that B 0 1 (t) will be equal to 1 whenever 0 is less than or equal to t and t is less than 1, because earlier we had said that B i 1 (t) is equal to 1 whenever t i is less than or equal to t and t is less than t i plus 1, and it is 0 otherwise. So, just by extending this, I can write that B 0 1 (t) will be equal to 1 whenever t lies between 0 and 1, with 0 inclusive, and it will be equal to 0 otherwise.

So, you find that B 0 , 1 (t) is constant in the region 0 to 1. Similarly, we can find out B 0 2 (t) will
be equal to t for 0 less than or equal to t less than 1. It will be equal to 2 minus t for 1 less than or
equal to t less than 2 and it will be equal to 0 otherwise.

(Refer Slide Time: 34:24)

Similarly, you can find that B 0 3 (t) can be written as t square by 2 for 0 less than or equal to t less than 1. It will be equal to minus t square plus 3t minus 1.5 for 1 less than or equal to t less than 2. It will be (3 minus t) squared upon 2 for 2 less than or equal to t less than 3, and it will be 0 otherwise.

Similarly, we can also find out that B 0 4 (t) will be equal to t cube by 6 for 0 less than or equal to t less than 1. It will be equal to (minus 3t cube plus 12 t square minus 12 t plus 4) by 6 for 1 less than or equal to t less than 2. It will be equal to (3t cube minus 24 t square plus 60 t minus 44) divided by 6 for 2 less than or equal to t less than 3. It will be equal to (4 minus t) cubed by 6 for 3 less than or equal to t less than 4, and it will be 0 otherwise.

So, you can obtain all these expressions from the definition of the B spline function that we have given earlier. Now, whenever I write B i k, this k is called the order of the B spline function.

(Refer Slide Time: 36:43)

Now, after giving these equations, let us try to see what the nature of these B spline functions is. So, you find that B i 1 (t) is constant and equal to 1 for t between i and i plus 1.

So, this first figure in this particular case tells you that what is the nature of this B 1 B i 1 . The
second figure shows what is the nature of this B spline function if it is B i 2 (t) and it shows that it
is a linear function. So, B i 2 (t) will lie between, will have a support from i to i plus 2 and the
points which will be supported by this B i 2 (t) are i, i plus 1 and i plus 2.

Similarly, a quadratic function, which is B i 3 (t), is given by this figure and a cubic function, which is B i 4 (t), is given by this fourth figure. And here you find that the region of support for this cubic function is 5 points, the region of support for the quadratic function is 4 points, the region of support for the linear B spline is 3 points, whereas the region of support for the B spline of order 1 is only 2 points. So, in all the cases, the region of support is finite.

Now, let us see that using these B splines, how we can go for interpolation.

(Refer Slide time: 38:31)

As we have said earlier that using these B splines, a function is approximated as f (t) equal to p i
B i 1 (t) or B i k (t) where i varies from 0 to n. So in this case, let us take the value of k to be equal
to 1. That means we have a B spline of order 1 and we have shown that if we have a B spline of
order 1, then the nature of the B spline function is it is constant between i and i plus 1 and which
is equal to 1.

So using this, if I try to interpolate this particular function as shown in this diagram which are
nothing but some sample values, I take say, t is equal to 0 here, t equal to 1 here, t equal to 2
here, 3 here, 4 here, 5, 6, 7, 8 and so on. Now, if I want to find out say f of 1.3. So, f of 1.3
should lie somewhere here. So, to find out this f of 1.3, what I have to do is I have to evaluate
this function f (t) where I have to put t is equal to 1.3 and this f (t) is given by this expression that
is p i B i 1 (t) where i varies from 0 to n.

Now, you find that if I take i equal to 1, then B 1 1 is a function like this: in between 1 and 2, this B 1 1 is constant and equal to 1. So, if I find out the value at this particular point, it will be p 1, that is f S (1), multiplied by B 1 1 at t equal to 1.3, which is equal to 1; and this B 1 1 is equal to 1 for all values of t from t equal to 1 to t equal to 2, but excluding t equal to 2.

So, if I interpolate this function using this B i 1 (t), you will find that the interpolation will be of this form; it goes like this. At all points between t equal to 1 and t equal to 2, the interpolated value is equal to f S (1). Similarly, between t equal to 2 and t equal to 3, the interpolated value is equal to f S (2), and so on.

(Refer Slide Time: 42:14)

Similarly, if I go for interpolation using say linear interpolation where value of k is equal to 2; in
that case, again you find that if I put say t equal to 0 here, t equal to 1, t equal to 2, t equal to 3, 4,
5, 6, 7, 8 like this; now, B 1 2 is a linear function between 1 and 3, so B 1 2 is something like this.
Similarly, B 2 2 is something like this, B 3 2 is something like this.

So, if now I want to have I want to interpolate or I want to find out the value of this function at
point say 2.5, at t equal to 2.5, so here. Then you will find that the sample values which take part
in this interpolation are f 2 and f 3 and by using these 2 sample values, by linear interpolation; I
have to find out what is the interpolated value at t equal to 2.5.

Now, take the case that I want to interpolate this function value f(t) at t equal to say 3.1, that means somewhere here. So, if I want to do this, then this will be nothing but p 2 into B 2 2 evaluated at the point t equal to 3.1, plus p 3 into B 3 2 evaluated again at the point 3.1. So, I want to interpolate the value here.

Now, you find that the weight of p 2 is given by this value, whereas the weight of p 3 is given by only this much. So, when I am interpolating at 3.1, I am giving less weightage to p 3 which is nearer to this particular point t equal to 3.1 and I am giving more weightage to this particular point p 2 which is away from 3.1, which is not very logical. So in this case, we have to go for some modification of this interpolation function. So, what is the modification that we can think of?
modification of this interpolation function. So, what is the modification that we can think of?

(Refer Slide Time: 45:35)

(Refer Slide Time: 45:45)

The kind of modification that we can do in this case is that instead of having the interpolation as,
say, f (t) equal to the summation of p i B i k (t) over i equal to 0 to n, I will just modify this
expression as f (t) equal to the summation of p i B i minus s, k (t). Again, i will vary from 0 to n,
and the value of this shift s will depend upon what the value of k is.

So, I will take s equal to 0.5 if I have k equal to 1, that is constant interpolation; I will take s
equal to 1 if I have k equal to 2, that is linear interpolation; and I will take s equal to 2 if k is equal
to 4, that is cubic interpolation. Now, you find that I have not considered k equal
to 3, which is quadratic interpolation, because quadratic interpolation leads to asymmetric
interpolation.

If I go for the other interpolations, that is k equal to 1 with of course a shift of B i k by a value of
0.5, so s equal to 0.5, or k equal to 2 with s equal to 1, or k equal to 4 with s equal to 2, what I get
is a symmetric interpolation. So, these are the different B spline interpolation functions that we
can use for interpolating the function from its sample values.
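To make this concrete, here is a minimal sketch, in Python, of how this shifted formulation could be implemented in one dimension for the constant (k equal to 1, s equal to 0.5) and linear (k equal to 2, s equal to 1) cases. The sample values and function names below are hypothetical, chosen only for illustration.

```python
def bspline1(t):
    # zeroth-order (constant) B-spline B_{0,1}(t): 1 on [0, 1), 0 elsewhere
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def bspline2(t):
    # first-order (linear) B-spline B_{0,2}(t): a triangle on [0, 2]
    if 0.0 <= t < 1.0:
        return t
    if 1.0 <= t < 2.0:
        return 2.0 - t
    return 0.0

def interpolate(samples, t, k):
    # Shifted B-spline interpolation f(t) = sum_i p_i * B_{i-s,k}(t),
    # with s = 0.5 for k = 1 and s = 1 for k = 2, as described above.
    # Since B_{i,k}(t) = B_{0,k}(t - i), we have B_{i-s,k}(t) = B_{0,k}(t - i + s).
    if k == 1:
        basis, s = bspline1, 0.5
    elif k == 2:
        basis, s = bspline2, 1.0
    else:
        raise ValueError("only k = 1 (constant) and k = 2 (linear) are sketched here")
    return sum(p * basis(t - i + s) for i, p in enumerate(samples))

samples = [0.5, 1.0, 0.6, 1.1, 0.8]      # hypothetical sample values f(0) ... f(4)
print(interpolate(samples, 2.3, k=1))    # 0.6  -> value of the nearest sample f(2)
print(interpolate(samples, 2.3, k=2))    # 0.75 -> 0.7*f(2) + 0.3*f(3)
```

With the shift in place, the constant case behaves as nearest neighbor interpolation and the linear case weights the two bracketing samples in proportion to their closeness, which is exactly the behaviour argued for above.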

(Refer Slide Time: 48:01)

So, this shows the situation when we have shifted this B i 1 by a value 0.5. So here, you find now
that B 0 1 is constant from minus 0.5 to plus 0.5, the next one from 0.5 to 1.5, then from 1.5 to 2.5 and
so on. Similarly for B i 2, the region of support for B 0 2 is now between minus 1 and 1, for B 1 2
it is from 0 to 2 and so on.

(Refer Slide Time: 48:46)

So by using this, similarly for cubic interpolation; I do the corresponding shifting.

(Refer Slide Time: 48:52)

And by using this, if I go for cubic interpolation, the interpolation can now be obtained as f (t)
equal to the summation of p i B i minus 2, 4 (t), where the summation has to be taken from i equal to 0 to i equal to
n. And here it shows that if I want to interpolate this function at this particular point, the weight
given to this particular sample is only this much, the weight given to this sample is
this much, the weight given to this sample is this much, and this last
sample is weighted by this much.

So, by taking this weighted average, with the weights given by these B spline functions, I can find out
what the interpolated value at this particular location will be.

(Refer Slide Time: 50:04)

So, if I use that cubic interpolation function, then for this set of sample values I will possibly have a
smooth interpolation like this. So, this kind of smooth interpolation is
possible using the cubic B spline function. Now, let us see some of the results on images that
we have obtained.

(Refer Slide Time: 50:40)

So, this is the example of a scaling operation. I have a small image which is scaled up by a factor
of 3 in both the X direction and the Y direction. You find that the image on the left side is obtained by
scaling without applying any interpolation. So obviously, you will find that here the image
appears to be a set of dots, because many of the pixels in the scaled image are not filled up. If I go for
nearest neighbor interpolation or constant interpolation, this is the reconstructed image that I
get, and you find that here I have blocking artifacts and the image appears to be a collection
of blocks.

(Refer Slide Time: 51:34)

Similarly, if I go for other interpolation techniques; if I use quadratic B spline interpolation, then
this is the quality of the image that we obtain. If we go for cubic B spline interpolation, then this
is the quality of the image that we can obtain.

(Refer Slide Time: 51:48)

The same experiment is also done in the case of rotation.

(Refer Slide Time: 51:58)

This particular image has been rotated by 30 degrees. If I do not apply any interpolation, the
rotated image is shown on the left hand side, and here again you find
that there are a number of black spots in this rotated image which could not be filled up
because no interpolation was used in this case. Whereas on the right hand side,
this particular image after rotation has been interpolated using a cubic interpolation function,
and you find that all those black spots in the digital image have been filled up.

(Refer Slide Time: 52:32)

Now, let us see the answers to the quiz questions that we had given in the last class. In the last
class, we had given a quiz question for finding out the 3D coordinates of a world point when
its coordinates in the left camera and in the right camera are given.

(Refer Slide Time: 52:44)

The value of the focal length lambda was given as 40 millimeter, the x coordinate of the image point
was given as 0.2 in the left camera and as 0.4 in the right camera, and the camera
separation was given as 6 centimeter, that is 60 millimeter.

So, if I simply use the formula Z equal to lambda plus lambda B by x 2 minus x 1, you will
find that this will be equal to 40 plus 40 into 60 divided by 0.2, because x 2 minus x 1 in this case
is 0.2, and this comes out to be 12040 millimeter. So, this is the value of Z, that is the
depth information, and once I have the value of Z, the 3D coordinates X and Y can be
computed from this value of Z.

So, X is nothing but x 1 by lambda into lambda minus Z, which is nothing but 0.2 upon 40 into
minus 12000, which is equal to minus 60 millimeter, and by applying the same procedure, you can
find out Y equal to minus 90 millimeter. So, this is about the first question.
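As a quick check of this arithmetic, the computation could be scripted as below. Note that the image y-coordinate of 0.3 mm is an assumed value (it is not repeated in this excerpt), chosen so that Y works out to minus 90 mm as stated.

```python
# Depth computation with the conventions used above:
# Z = lambda + lambda*B/(x2 - x1),  X = (x1/lambda)*(lambda - Z),  Y likewise with y1.
lam = 40.0            # focal length, mm
B = 60.0              # camera separation (baseline), mm
x1, x2 = 0.2, 0.4     # image x-coordinates in the left and right cameras, mm
y1 = 0.3              # assumed image y-coordinate, mm (not given in this excerpt)

Z = lam + lam * B / (x2 - x1)   # 12040.0 mm
X = (x1 / lam) * (lam - Z)      # -60.0 mm
Y = (y1 / lam) * (lam - Z)      # -90.0 mm
print(Z, X, Y)
```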

(Refer Slide Time: 54:44)

Second question was to find out what is the minimum depth that can be obtained by using the
stereo camera where the geometry of the stereo camera was specified. Now, this is also very
simple.

(Refer Slide Time: 55:01)

What you can do is the following. You find that we have 2 cameras with a certain dimension of the
imaging plate and a specified focal length. If I just find out the limits of the points that can be
imaged, we find that any point beyond this particular line cannot be imaged by the left camera,
and any point in this direction cannot be imaged by the right camera, because it goes beyond the
imaging plate.

And also, for finding out the depth information, it is necessary that the same point should be
imaged by both the cameras. So, the points which can be imaged by both the cameras are only
the points lying in this particular conical region. The points belonging to this region or the points
belonging to this region cannot be imaged by both the cameras. So, all the points must be lying
within this. So, the minimum depth which can be computed is this particular depth.

Now, I know the separation between the cameras, I know the dimension of the
imaging planes, and I also know the focal length. So, from this information, by using the
concept of similar triangles, you can easily find out the minimum depth that can be
computed by using this stereo setup.
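A rough sketch of that similar-triangle argument is given below. It assumes a simple pinhole model with parallel optical axes and hypothetical plate width, baseline and focal length values; depending on the exact geometry of the original problem (for example, whether depth is measured from the lens centres or from the image planes), an extra lambda term may need to be added, so treat this only as an illustration of the idea.

```python
# Under a pinhole model, a point at depth Z is visible to a camera if its lateral
# offset from that camera's optical axis is at most (W/2)*Z/lambda, where W is the
# plate width.  The two viewing cones start to overlap when W*Z/lambda >= B,
# giving Z_min = lambda*B/W (measured from the lens centres).  Values are hypothetical.
lam = 40.0    # focal length, mm
B = 60.0      # baseline, mm
W = 10.0      # width of each imaging plate, mm

Z_min = lam * B / W
print(Z_min)  # 240.0 mm
```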

(Refer Slide Time: 56:40)

Now, coming to today’s questions. So, the question number 1 - if an image represented by the
following matrix is scaled up by a factor of 3 in both X and Y directions, what will be the scaled
image with nearest neighbor interpolation and bilinear B spline interpolation?

(Refer Slide Time: 57:05)

The second question - if the same image which is given by this matrix again is rotated by an
angle of 45 degrees in the clockwise direction, what will be the rotated image? Use bilinear B
spline for interpolation.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 10
Image Interpolation – II
Hello, welcome to lecture series on digital image processing. In the last class, we started our
discussion on image interpolation and resampling. Today we will continue with the discussion on
the same topic that is image interpolation and resampling and today we will try to explain the
concepts with the help of a number of examples.

(Refer Slide Time: 1:20)

So in the last class, we talked about interpolation and resampling, and we mostly discussed the
interpolation problem in the case of a 1 dimensional signal. We have also seen
what the desirable properties of the interpolation functions are, and we have seen that it is the
B Spline function which satisfies all the desired properties of interpolation. And, we have seen
the B Spline functions of different orders and how these B Spline functions can
be used for interpolation operations.

(Refer Slide Time: 2:03)

In today’s lecture, we will explain the interpolation in 1 dimension with the help of a number of
examples. Then we will see the extension of this 1 dimensional interpolation to 2 dimensional
images. Again, we will explain the image interpolation with examples and then at the end of
today’s lecture, the students will be able to write algorithms for different image interpolation
operations.

So, let us see that what we mean by image interpolation operation. So here, we have shown a
diagram which shows the sample values of 1 dimensional signal say f (t) which is a function of t.

(Refer Slide Time: 2:52)

So, you find that in this particular diagram, we have been given a number of samples, and here the
samples are present for t equal to 0, t equal to 1, t equal to 2, t equal to 3, t equal to 4 and t equal
to 5. The function values are given only at these sample instants; you find that the sample values are
available only at 0, 1, 2, 3, 4 and 5. But in some applications, we may need to find out the
approximate value of this function at say t equal to 2.3 or say t equal to 3.7 and so on.

So again here, in this diagram, you find that t equal to 2.3 is somewhere here, where I do not have
any information, or say t equal to 3.7 is somewhere here, where again I do not have any information. So,
the purpose of image interpolation, or signal interpolation, is that by using the
sample values at these distinct locations, we have to reconstruct or approximate the
value of the function f (t) at any arbitrary point on the time axis. That is the basic purpose of
the interpolation operation.

So, whenever we go for some interpolation, we have to make use of certain interpolation
functions, and these interpolation functions should satisfy certain
conditions.

(Refer Slide Time: 4:33)

The conditions are; the interpolation function should have a finite region of support. That means
when we do the interpolation, we should not consider the sample values from say minus infinity
to plus infinity; rather if I want to approximate the function value at locations say t equal to 2.3,
then the samples that should be considered are the samples which are nearer to t equal to 2.3. So,
I can consider a sample at t equal to 1, I can consider the sample at t equal to 2, I can consider
the sample at t equal to 3, I can consider the sample at t equal to 4 and so on.

But to approximate the functional value at t equal to 2.3, I
should not consider the sample value at say t equal to 50. So, that is what is meant by a finite
region of support.

Then the second property which this interpolation operation must satisfy is that it should be a smooth
interpolation. That means by interpolation we should not introduce any discontinuity in the
signal. The third condition that must be satisfied for this interpolation
operation is that the interpolation must be shift invariant. That is, if I shift the signal by say t
equal to 5, even then the same interpolation functions should
give me the same result in the corresponding interval. This is what is known as the shift invariance
property of the interpolation.

And, we have seen in the last class that B Spline interpolation functions satisfy all these 3
properties which are desirable properties for interpolation. So, this B Spline functions are
something like this.

(Refer Slide Time: 6:31)

We have seen that for interpolation with the help of B Spline function, we use a B Spline
function which is given by B i k .

(Refer Slide Time: 6:48)

So, let me just go to the interpolation operation that we have to do. For interpolation, what we use is

f (t) = sum over i = 0 to n of p i B i,k (t),

where p i indicates the i'th sample and B i,k is the interpolation function. And, we have defined in
the last class that this B i,k can be defined recursively as

B i,k (t) = [(t - t i) / (t i+k-1 - t i)] B i,k-1 (t) + [(t i+k - t) / (t i+k - t i+1)] B i+1,k-1 (t),

where B i,1 (t) is equal to 1 for t i less than or equal to t less than t i+1 and is equal to 0 otherwise.

So, you find that once we have defined B i 1 (t) to be 1 within a certain region and equal to 0
beyond that region, then using this B i 1 I can calculate the values of the other B i k by
using the recursive relation. And pictorially, for k equal to 1, B i k is a
constant. For B i 2, that is for k equal to 2, it is a linear function. For k equal to 3,
B i 3 is a quadratic function, and for k equal to 4, that is B i 4, it is a cubic function.

And, as we have said in the last class, here you find the region of support for B i 1 is just 1
sample interval. For B i 2, the region of support is 2 sample intervals. For B i 3, it is
3 sample intervals and for B i 4, it is 4 sample intervals. And, we have mentioned in the last class
that out of these, the quadratic one, that is for the value k equal to 3, is normally not used because
it does not give a symmetric interpolation, whereas using the other 3, that is B i 1, B i 2 and B i 4,
we can get symmetric interpolation.

So normally, the functions, the B Spline functions which are used for interpolation purpose are
the first order that is equal to 1, the second order or linear that is k equal to 2 and the cubic
interpolation that is for k equal to 4.
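A small sketch of this recursive definition, assuming uniform integer knots t i = i as used in these lectures, might look like the following; the function name is hypothetical.

```python
def B(i, k, t):
    # Recursive B-spline of order k on uniform integer knots t_i = i.
    # Order 1 is the indicator of [i, i+1); higher orders use
    # B_{i,k}(t) = (t-i)/(k-1) * B_{i,k-1}(t) + (i+k-t)/(k-1) * B_{i+1,k-1}(t),
    # which is the general recursion above with t_i = i.
    if k == 1:
        return 1.0 if i <= t < i + 1 else 0.0
    return ((t - i) / (k - 1)) * B(i, k - 1, t) + \
           ((i + k - t) / (k - 1)) * B(i + 1, k - 1, t)

print(B(0, 2, 0.5))   # 0.5   (linear B-spline, halfway up the triangle)
print(B(0, 4, 1.3))   # ~0.348, agreeing with the analytical cubic formula given shortly
```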

(Refer Slide Time: 10:53)

And, we have also said that these functions for k equal to 1, k equal to 2 and k equal to 4 can be
approximated by B 0 1 (t) is equal to 1 for 0 less than or equal to t less than 1 and it is equal to 0
otherwise. So, only in the range 0 to 1 excluding t equal to 1, B 0 1 equal to 1 and beyond this
range, B 0 1 is equal to 0. Then B 0 2 is defined like this that it is equal to t for 0 less than or equal
to t less than 1 and it is equal to 2 minus t for 1 less than or equal to t less than 2 and it is equal to
0 otherwise.

So, here again you find that for the values of t between 0 and 1, B 0 2 (t) increase linearly. For t
equal to 1 to 2, the value of B 0 2 decreases linearly and beyond 0 and 2 that is for values of t less
than 0 and for values of t greater than 2, the value of B 0 2 is equal to 0.

Similarly for the cubic one, B 0 4 (t) is defined
as t cube by 6 for 0 less than or equal to t less than 1. It is defined as minus 3t cube plus 12t
square minus 12t plus 4 divided by 6 for 1 less than or equal to t less than 2. This is equal to 3t
cube minus 24 t square plus 60 t minus 44 divided by 6 for 2 less than or equal to t less than 3.
This is equal to 4 minus t whole cube divided by 6 for 3 less than or equal to t less than 4 and it
is 0 otherwise.
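These piecewise formulas translate directly into code; the sketch below is one way of writing them, and a spot value such as B 0 4 (1.3) agrees with the recursive evaluation sketched earlier.

```python
def B01(t):
    # constant B-spline: 1 on [0, 1), 0 elsewhere
    return 1.0 if 0 <= t < 1 else 0.0

def B02(t):
    # linear B-spline: triangle on [0, 2]
    if 0 <= t < 1: return t
    if 1 <= t < 2: return 2 - t
    return 0.0

def B04(t):
    # cubic B-spline on [0, 4], piece by piece as given above
    if 0 <= t < 1: return t**3 / 6
    if 1 <= t < 2: return (-3*t**3 + 12*t**2 - 12*t + 4) / 6
    if 2 <= t < 3: return (3*t**3 - 24*t**2 + 60*t - 44) / 6
    if 3 <= t < 4: return (4 - t)**3 / 6
    return 0.0

print(B04(1.3))   # ~0.348
```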

So, these are the different orders of the B Spline functions. So, here again you find that for value
equal to value of t less than 0, B 0 4 (t) equal to 0 and similarly for value of t greater than 4, B 0 4
(t) is also equal to 0. So, these are the B Spline functions using which the interpolation operation
can be done. Now, let us see that how do we interpolate.

(Refer Slide Time: 14:16)

Again, I take the example of this sample data, where I have a number of samples of a function of t
represented by f S (t), and the values of f S (t) are present at t equal to 0, t equal to 1, t
equal to 2, t equal to 3, t equal to 4 and t equal to 5. As we said, the interpolation is
given by f (t) equal to the summation of p i B i 1 (t) if I go for constant interpolation.

Now here, for B i 1 (t), when I take say B 0 1 (t), we have said that its value is equal to
1 for t lying between 0 and 1 and it is equal to 0 otherwise. So, suppose that in interpolating this particular
sample data I want to find out the value of the signal at say 1.3; so, this
is the point t equal to 1.3, and I want to find out the value of f (t) at t equal to 1.3. To do this,
my interpolation formula says that f (1.3) should be equal to p 0 into
B 0 1 (1.3) plus p 1 B 1 1 (1.3) and so on. Now, if I plot this B 0 1 (t) superimposed on this
particular sample data diagram, you find that B 0 1 (t) is
equal to 1 in the range 0 to 1 excluding 1.

Similarly for p 1 B 1 1 (1.3), the value of B 1 1 (t) will be equal to 1 in the range 1 to 2 excluding 2 and it will be 0
beyond this. So, when I try to compute the function value at t equal to 1.3, I have to compute p 0,
that is the sample value at t equal to 0, multiplied by this B 0 1 (1.3). Now, because B 0 1 (t) is equal
to 0 for values of t greater than or equal to 1, this particular term p 0 B 0 1 (1.3) will be
equal to 0.

Now, when I compute p 1 into B 1 1 (1.3), you will find that this B 1 1 (t) is equal to 1 in the
range 1 to 2 excluding 2 and beyond 2 the value of B 1 1 (t) is equal to 0; similarly, for
values of t less than 1, the value of B 1 1 (t) is also equal to 0. Similarly for p 2 B 2 1 (t), the
value is 1 within the range 2 to 3; within this range B 2 1 (t) is equal to 1 and beyond this, B 2 1 (t)
is equal to 0.

So, when I try to compute the value at the point 1.3, you find that this will be nothing but p 1 into
B 1 1 (1.3) and this B 1 1 (1.3) is equal to 1. So, the value at this point will be simply equal to p 1;
so in this case, it is f S (1). And, you find that for any other value of t within the range 1 to 2, the
value of f (t) will be the same as f (1) or p 1. So, I can do this interpolation
like this: between 1 and 2, the function value for all values of t is equal to f (1).
Following a similar argument, you will find that between 2 and 3 the function
value will be equal to f (2), between 3 and 4 the function value will be equal to f (3), between 4 and 5
the function value will be equal to f (4) and it will continue like this.

So, this is what I get if I use this simple interpolation formula, that is f (t) equal to the summation of p i B i 1 (t).
The situation is similar if I go for linear interpolation. So, what do I get in the case of linear
interpolation?

(Refer Slide Time: 19:29)

In the case of linear interpolation, f (t) is given by the summation of p i into B i 2 (t), where we have to take the
sum of these terms for values of i from i equal to 0 to n. So, what do we get in this case?
You find that we have said that B i 2 (t) is nothing but a linearly increasing and then decreasing
function. So, if I plot B 0 2 (t), it is a function like this which increases linearly between 0
and 1, reaches the value 1 at t equal to 1, and then from t equal to 1 to t equal to 2 the value of
B 0 2 (t) decreases linearly and becomes 0 at t equal to 2.

Similarly, B 1 2 (t) will have a function value something like this: it will increase linearly from 1
to 2, reach a value of 1 at t equal to 2, and again from t equal to 2 to t equal to 3 it decreases
linearly; at t equal to 3, the value of B 1 2 (t) becomes equal to 0. So, here again suppose I want to
find out, for example, the value of the function at say t equal to 1.7. I have the functional
value at t equal to 1, I have the value of the function at t equal to 2, and at t equal to 1.7 I
have to approximate the value of this function using its neighboring samples.

Now, if I try to approximate this using this particular interpolation formula, here again
f (1.7) is to be computed as p 0 B 0 2 (1.7) plus p 1 B 1 2 (1.7) and so on. Now,
here we find that the contribution to this point by the sample value p 0 is given
by the interpolation function B 0 2 (1.7), the contribution to this point t equal to 1.7 by
the sample f (1) or p 1 is given by B 1 2 (1.7), and the contribution by the sample value f (2) is
given by B 2 2 (1.7). But B 2 2 (t) is equal to 0 at t equal to 1.7, because B 2 2 (t) is something
like this.

So, the value of this function is 0 at t equal to 1.7, and the only contributions that we get at t equal to 1.7
are from the sample f (0) and from the sample f (1). So, using this, I can estimate what the
value of B 0 2 at 1.7 will be, and I can also estimate what the value of B 1 2 at 1.7 will be. And in this particular
case, we use a property of this B spline function that we have said earlier, that is B i k (t) is
nothing but B 0 k (t minus i). So, that is a property of these B spline functions.

So, when I do this, you find that this B 1 2 (1.7) is nothing but B 0 2 (t minus i) with the
value of i equal to 1, so this will be B 0 2 (0.7). So, if I simply calculate the value of B 0 2 for
different values of t, I can estimate what B 1 2 (1.7) will be. And in that case, the value at this
location, that is f (1.7), will now be given by p 0 into B 0 2 (1.7) plus p 1 into B 0 2 (0.7), because this
is the same as B 1 2 (1.7), where the value of p 0 is equal to 0.5, which is the sample value at
location t equal to 0, and the value of p 1 is equal to 1, that is the sample value at location t equal to 1.

Now, here we find that there is a problem: when I am trying to compute the value at t
equal to 1.7, the contribution comes only from the sample values at t equal to 0 and t equal to 1.
This interpolated or approximate value does not have any contribution from t equal to 2
or t equal to 3.

So, the interpolation or approximation that we are doing is very much biased, because it is only
considering the sample values to the left of this particular point; we are not considering the
sample values to the right of this particular point t equal to 1.7. So, that is the problem with this
basic interpolation formula.

(Refer Slide Time: 25:47)

So, to solve this problem, instead of using the simple formula, that is f (t) equal to
summation of p i B i k (t) where i varies from 0 to n, we slightly modify this interpolation formula.
We take, say, f star (t) equal to the summation of p i B i minus S, k (t), where again i varies from 0 to n. And we
decide the value of S as follows: for k equal to 1, that is when we go for constant interpolation, we assume
the value of S to be 0.5; for k equal to 2, that is for linear interpolation, we assume the value of S to be 1;
and for k equal to 4, that is for cubic interpolation, we assume the value of S to be equal to 2.

So, here again we find that I have not considered k equal to 3, because as we said, k equal to 3
gives you quadratic interpolation, and in the case of quadratic interpolation the interpolation is
not symmetric. So, what we effectively do by changing B i k (t) to B i minus S, k (t) is that we give
the B spline interpolation function a shift by S in the
left direction while we consider the contribution of the sample p i to the point t for
interpolation purposes.

Now, let us see what we get after doing this. As we said, with k equal to 1 I take the
value of S equal to 0.5. So, in the formula with p i B i k (t), when I consider the contribution of
sample p 0, in the earlier formulation we had to use the B spline function B 0 k (t); for k
equal to 1, that is constant interpolation, I had to consider B 0 1 (t). Using this modified
formulation, when I consider the contribution of point p 0, I do not take the B spline function to
be B 0 1 (t) but rather I take it to be B minus 0.5, 1 (t).

Similarly, for the linear interpolation, when I take the contribution of point p 0 to any arbitrary point,
I had to consider, as per the initial formulation, B 0 2 (t); using this modified
formulation, I will use B minus 1, 2 (t). Similarly for the cubic interpolation, again with p 0, I
will consider the B spline to be B minus 2, 4 (t) instead of B 0 4 (t).

(Refer Slide Time: 28:49)

So, here we find that in this particular diagram, using this formulation, effectively what we are
doing is shifting the B spline functions by the value of S in the leftward direction. In the earlier
case we had B 0 1 (t) equal to 1 between 0 and 1; so, B 0 1 (t) is something like
this.

Now, along with p 0, I do not consider B 0 1 (t) but I will consider B minus 0.5, 1 (t), and B minus 0.5, 1
(t) is equal to 1 for values of t between minus 0.5 and plus 0.5, and the value of B minus 0.5, 1 (t) will
be equal to 0 beyond this range. Similar is also the case for the linear B spline,
and it is also similar for the cubic B spline, that is B i 4 (t), which in this case becomes B i minus
2, 4 (t). So using this, let us see how it helps us in the interpolation operation.

(Refer Slide Time: 30:06)

So, as I said that for interpolation, when I consider the contribution of p 0 to any particular point;
along with p 0 , I will consider the B Spline function to be B minus 0.5 , 1 (t) for constant
interpolation. So, it appears like this that for contribution of p 0 , I consider B minus 0.5 , 1 (t). To find
out the contribution of p 1 , I consider the B Spline function to be B 0.5 , 1 (t). Similarly, in case of
a linear interpolation, to find out the contribution of point p 0 , I consider the B Spline function to
be B minus 1 , 2 (t) and to find out the contribution of p 1 , I consider the B Spline function to be B 0 2
(t) and so on.

(Refer Slide Time: 30:59)

Similar is also case for the cubic interpolation. Here again, to find out the contribution of say p 0 ,
I have to consider the B Spline function of B minus 2 , 4 (t). To find out the contribution of p 1 , I
have to consider the B Spline function of B minus 1 , 4 (t). To find out the contribution of p 2 , I have
to consider the B Spline function of B 0 4 (t) and so on.

Now, let us see, using this modified formulation, whether our interpolation
is going to improve or not. So, let us take this particular case; again, we go for constant
interpolation.

(Refer Slide Time: 31:47)

Here again, I have shown the same set of samples, and now suppose I want to find out the
value of the function at say t equal to 2.3. For this, I will consider the equation
to be the summation of p i into B i minus 0.5, 1 (t), since for constant interpolation the value of k is equal to 1,
and I will take the sum for i equal to 0 to n. So, this will give me the approximate or interpolated value at t.

So, again coming to this diagram, as we have said, when I consider B i minus 0.5, 1 (t) for i equal to 0,
B minus 0.5, 1 (t) is equal to 1 in the range minus 0.5 to plus 0.5, and beyond this
range, B minus 0.5, 1 (t) will be equal to 0.

So, when I compute p 0 B minus 0.5, 1 (t), for the computation of this particular component along with
the sample p 0, which is equal to 0.5, I have to consider the B spline
interpolation function shown here, which is equal to 1 from minus 0.5 to plus 0.5. So, here we
find that because this B minus 0.5, 1 (t) is equal to 0 beyond t equal to 0.5, this p 0
does not have any contribution to the point t equal to 2.3, because at this point
the product p 0 B minus 0.5, 1 (t) will be equal to 0.

Similarly, if I consider the effect of the point p 1 at t equal to 2.3, the effect of p 1 will also be
equal to 0, because the product p 1 B 0.5, 1 (t) is equal to 0 at t equal to 2.3. But if I
consider the effect of p 2, which is equal to 0.6 here, you find that for this, the B spline
interpolation function has a region of support something like this.

So, this is equal to 1 for t equal to 1.5 to t equal to 2.5 and it is equal to 0 beyond this region.
To find out the contribution of p 3, which is equal to 1.1, here again you can see that
the corresponding B spline function, that is B 2.5, 1 (t), is equal to 1 in the range 2.5 to 3.5 and it is equal to 0 outside.

So, even p 3 does not have any contribution to this particular point t equal to 2.3. So, at t equal to
2.3, if I expand this, I will have a single term which is equal to p 2 into B 1.5, 1 (t)
where t is equal to 2.3, and the same will be applicable for any value of t in the range t equal to
1.5 to t equal to 2.5.

(Refer Slide Time: 35:49)

So, I can say that using this formulation, I am getting an interpolation something like this.
After interpolation using this constant interpolation function with our
modified formulation, the value of the interpolated function will be: from t equal to 0 to t
equal to 0.5, the value of f (t) will be equal to f (0); from 0.5 to 1.5, the value of f (t) will be equal to f (1).

From 1.5 to 2.5, the value of f (t) will be equal to f (2); from 2.5 to 3.5, the value of f (t) will
be equal to f (3); from 3.5 to 4.5, the value of f (t) will be equal to f (4); and from 4.5 to 5.5, the
value of f (t) will be equal to f (5). This appears to be a more reasonable approximation, because
whenever we are trying to interpolate at a particular value of t, what we are doing
is finding out the nearest sample to that particular value of t, and whatever the value of the
nearest sample is, we are simply copying that value to the desired value of t.

So here, for any point within this range, that is for t equal to 1 to t equal to 1.5, the nearest sample
is f (1). For any point from 1.5 to 2, the nearest sample is p 2 or f (2), so this f (2) is copied to
the location t where t is from 1.5 to 2. So, this appears to be a more logical
interpolation than the original formulation. Similar is also the case for linear
interpolation.

(Refer Slide Time: 38:00)

In the case of linear interpolation, when I consider the value of p 0, what I do is consider B minus 1, 2
(t), and B minus 1, 2 (t) is something like this: from minus 1 to 0 it increases linearly and
attains a value of 1 at t equal to 0. Similarly, when I consider the contribution of p 1, the
corresponding B spline interpolation function that I have to consider is B 0, 2 (t), which is something
like this.

So now, if I want to find out the value at the same point, say 2.3, you find that the contribution of
p 1 will be equal to 0, because the value of B 0, 2 (t) is equal to 0 beyond t equal to 2. And by this,
you will find that the only contributions that you can get at this point t equal to 2.3 are from the
point p 2 and from the point p 3.

And, in this particular case, I will have f (2.3), which will be nothing but p 2 into B 1, 2 (2.3),
because with the shift the index becomes i minus 1 in this particular case, plus p 3 into B 2, 2 (2.3);
just this.

So, the contribution of the point p 2 to this point t equal to 2.3 is given by this value and the
contribution of the point p 3 is given by this value. And here you find that because the function
increases linearly from 0 to 1 between t equal to 2 and t equal to 3 and decreases linearly back to
0 as t varies from 3 to 4, the value that we will get will be
nothing but p 2 into the value of this function, which in this particular case is equal to 0.7, plus
p 3 into 0.3.

So, if I simply replace the value of p 2 which is equal to 0.6 and p 3 which is equal to 1.1, I can
find out what is the value of f (t) at t equal to 2.3. So, similar is also the case for cubic
interpolation.
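To make the arithmetic concrete, a quick check of this linear case with the sample values p 2 = 0.6 and p 3 = 1.1 used above could look like this; the helper name is hypothetical.

```python
def B02(t):
    # linear B-spline B_{0,2}: triangle on [0, 2]
    if 0 <= t < 1: return t
    if 1 <= t < 2: return 2 - t
    return 0.0

p2, p3 = 0.6, 1.1
t = 2.3
# shifted linear interpolation: f(t) = p2*B_{1,2}(t) + p3*B_{2,2}(t),
# and B_{i,2}(t) = B_{0,2}(t - i)
f = p2 * B02(t - 1) + p3 * B02(t - 2)
print(f)   # 0.75 = 0.7*0.6 + 0.3*1.1
```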

(Refer Slide Time: 41:30)

And in the case of cubic interpolation, you find that the nature of the region of support will be something like
this, and again by using the same formulation, that is f (t) equal to the summation of p i B i minus 2, 4 (t)
taken for i equal to 0 to n, I can find out what the value of f (t) will be at any arbitrary time instant t.

(Refer Slide Time: 42:44)

So, by modifying this interpolation operation, we can go for all these different types of
interpolation. Now, to explain this, let us take a particular example. So, I take an example like
this.
(Refer Slide Time: 43:05)

I take a function f whose sample values are like this: f of 1 is equal to say
1.5, f of 2 is equal to 2.5, f of 3 is equal to 3, f of 4 is equal to
2.5, f of 5 is equal to 3, f of 6 is something like 2.4, f of 7 is
something like say 1, f of 8 is something like 2.5, and I want to
find out the approximate value of this function at say t equal to 4.3.

So, given this sample values, I want to find out what is the value of this function at t equal to 4.3.
So, I want to find out f (4.3) given this sample values. Now, suppose the kind of interpolation
that I will use is a cubic interpolation, so I use cubic interpolation and using these samples, I
want to find out the value of f (4.3). So, let us see that how we can do it.

(Refer Slide Time: 44:46)

Here, you find that, considering the region of support, f (4.3) can be written as f (3) into B 1 4 (4.3);
since this f (3) is nothing but p 3 in our case and the index becomes i minus 2, 3 minus 2 is equal
to 1, so I am considering B 1 4 at location t equal to 4.3. To this will be added f (4), which is nothing but p 4,
into B 2 4 (4.3), plus f (5) into B 3 4 at location t equal to 4.3, plus f (6) into B 4 4 at location t equal to 4.3.

Now, as we have said, B i, k (t) is nothing but B 0 k (t minus i). So, just by using
this particular property of the B spline functions, I can now rewrite this equation in the form that
this will be equal to f (3), or p 3, into B 0 4 of t minus i with i equal to 1, so this will
be B 0 4 (3.3), plus f (4) into B 0 4 (2.3), plus f (5) into B 0 4 (1.3), plus f (6) into B 0 4 (0.3).

Now, we can compute the value of this B 0 4 (3.3), the value of B 0 4 (2.3), the value of
B 0 4 (1.3) and also the value of B 0 4 (0.3) using the analytical formula of B 0 4 (t) that we
have seen, which is nothing but a cubic formula in the variable t. So, if I do this, you will find
that this B 0 4 (3.3) gets a value of 0.057, B 0 4 (2.3) gets the value of 0.59, B 0 4 (1.3)
gets the value of 0.35 and this B 0 4 (0.3) gets a value of 0.0045, and you can verify this by
using the analytical formula.

And by using the values of f (3), f (4), f (5) and f (6), if I compute this equation, then I get the
final interpolated value to be 2.7068. Now, if I do the same computation using
constant interpolation, which as I said is nothing but nearest neighbor interpolation,
then when I try to find out the value f (4.3), the point t equal to 4.3 is nearest to
the point t equal to 4 at which I have a sample value.

So, using nearest neighbor or constant interpolation, f (4.3) will simply be equal to f (4), which
in our case is equal to 2.5. Whereas, if I go for linear interpolation, again you can compute,
using the linear B spline as we have described, that the value of f (4.3) will be equal to 2.65.
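The whole calculation can be reproduced with a few lines of code; the sketch below only re-checks the numbers quoted above, using the analytical cubic B spline and the sample values listed earlier. Note that with unrounded weights the cubic result comes out as roughly 2.702; the value 2.7068 follows from the rounded weights 0.057, 0.59, 0.35 and 0.0045.

```python
def B04(t):
    # cubic B-spline on [0, 4], using the analytical pieces given earlier
    if 0 <= t < 1: return t**3 / 6
    if 1 <= t < 2: return (-3*t**3 + 12*t**2 - 12*t + 4) / 6
    if 2 <= t < 3: return (3*t**3 - 24*t**2 + 60*t - 44) / 6
    if 3 <= t < 4: return (4 - t)**3 / 6
    return 0.0

f = {1: 1.5, 2: 2.5, 3: 3.0, 4: 2.5, 5: 3.0, 6: 2.4, 7: 1.0, 8: 2.5}
t = 4.3

# cubic: f(t) = sum_i p_i * B_{i-2,4}(t), and B_{i,k}(t) = B_{0,k}(t - i),
# so only the samples at 3, 4, 5, 6 contribute at t = 4.3
cubic = sum(f[i] * B04(t - (i - 2)) for i in range(3, 7))
print(round(cubic, 4))          # ~2.7022 with full-precision weights

print(f[round(t)])              # 2.5  -> nearest neighbor (constant) result
print(0.7 * f[4] + 0.3 * f[5])  # 2.65 -> linear result
```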

So, you find that there is a slight difference in the interpolated value depending on whether we go for constant
interpolation, linear interpolation or cubic interpolation. So, using this type
of formulation, we can go for interpolation of 1 dimensional sampled functions. Now, when
you go for the interpolation of image functions, you find that images consist of a number of rows
and a number of columns, and the interpolations that we have discussed so far are mainly valid
for 1 dimensional signals.

Now, the question is how do we extend this 1 dimensional interpolation operation into 2
dimensions so that it can be applied for interpolation images as well?

(Refer Slide Time: 50:31)

So, in case of image what we will do is as the image is nothing but a set of pixels arranged in a
set of rows and in a set of columns.

(Refer Slide Time: 50:36)

So, let us consider a 2 dimensional grid to consider the image pixel. So, I have the grid points
like this. So in case of an image, I have the image points or image pixels located at this location,
located at this location. So, these are the grid points where I have the image pixels. So, it is
something like this.

So, I said this is location (0, 0), this is location say (0, 1), this is location (0, 2) and so on so that I
have these as the 0’th row, these as the 0’th column and similarly I have row number 1, row
number 2, row number 3, row number 4 and column number 1, column number 2 and column
number 3, column number 4 and so on.

Now given this particular situation; so, I have the sample values present at all these grid points.
Now, given this particular pixel array which is again in the form of a 2 dimensional function
matrix, I have to find out what will be the pixel value at this particular location. Suppose, this is
say location (4, 4) let us assume, this is at pixel location at (4, 5) - 4th row, fifth column, this
may be a pixel locations say (5, 4) that is fifth row 4th column and this may be the pixel location
say (5, 5) that is fifth row and fifth column.

So, at these different pixel locations I have the intensity values or pixel values, and using these I
want to interpolate what the pixel value will be at the location say (4.3, 4.2). So, I want to
compute the pixel value at this particular location 4.3, 4.2. So, now you find that
the earlier discussion that we had for interpolation in the case of 1 dimension has to be
extended to 2 dimensions to get the image interpolation. The job is not very complicated; it
is again a very simple job.

What we have to do is take the rows one after another and also take the columns
one after another. So, first you go for interpolation along the rows and then you
go for interpolation along the columns, and for this interpolation, again I can go for either constant
interpolation, linear interpolation or cubic interpolation. But now,
because our interpolation will be in 2 dimensions, if it is linear
interpolation it will be called bilinear interpolation, and for cubic interpolation it will be called a
bicubic interpolation.

So, let us see what the nature of the interpolation will be if I go for constant interpolation.
Effectively, in this particular case, we will interpolate along row 4
and we will also interpolate along row 5. So, we will try to find out the pixel value at
location (4, 4.2) and we will also try to find out the pixel value at (5, 4.2). Once I have
the interpolated values at these two locations, that is (4, 4.2) and (5, 4.2),
then using these 2 values I will interpolate the value at the location (4.3, 4.2). So, we are
simply extending the concept of 1D interpolation to 2D interpolation.

(Refer Slide Time: 55:10)

Now, what will be the nature of this interpolation if I go for constant interpolation? As we
said, constant interpolation simply takes the nearest neighbor and copies it to the
arbitrary location which does not fall on the regular grid.

So, at this location I have a pixel value, at this location I have pixel value, at this location I have
a pixel value, at this location I have a pixel value. Now, if I want to find out what will be the
value the pixel at this particular location, what I do is I simply try to find out which is the nearest
neighbor of this particular point and here, you find that the nearest neighbor of this particular
point is this point.

So, all the points which are nearest to this pixel, that is all the points within this square, will get
the value of this particular pixel; it will be something like this. Similarly, all the
points lying within this region will get the pixel value of this region; similar is the
case here and similar is the case here. When you go for bilinear interpolation, if I want to find
out the pixel value at any arbitrary location which does not fall on the grid, something like this,
so somewhere here, what I have to do is: I consider this pixel and this pixel and do linear
interpolation to find out the value at this location; I consider this pixel and this pixel and do
linear interpolation to find out the value at this location; then, using these two, doing linear
interpolation along the column, I can find out the pixel value at this particular point, and the
same concept can also be extended for bicubic interpolation.

So by this, we have explained how to do interpolation: either constant interpolation, which we have said is
also the nearest neighbor interpolation, or linear interpolation, which in the case of an image becomes bilinear
interpolation, or cubic interpolation, which in the case of an image becomes bicubic interpolation, where you have
to do the interpolation along the rows and, after the rows, along the columns. It can
also be reversed: first you can do the interpolation along the columns, and then, using the interpolated
values along 2 or more columns, I can find out the interpolated value at any row location which
does not fall on a regular grid point.
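As a sketch of this row-then-column idea, bilinear interpolation of a grey level image at a fractional position might be written as below; the small image and the position (4.3, 4.2) are only illustrative.

```python
def bilinear(img, r, c):
    # Separable bilinear interpolation: interpolate along the two bracketing
    # rows first, then along the column, as described above.
    r0, c0 = int(r), int(c)          # top-left grid point, e.g. (4, 4) for (4.3, 4.2)
    dr, dc = r - r0, c - c0          # fractional parts, e.g. (0.3, 0.2)
    top    = (1 - dc) * img[r0][c0]     + dc * img[r0][c0 + 1]      # row r0 at column c
    bottom = (1 - dc) * img[r0 + 1][c0] + dc * img[r0 + 1][c0 + 1]  # row r0+1 at column c
    return (1 - dr) * top + dr * bottom                             # finally along the column

# a small illustrative 6x6 image; the pixel values are arbitrary
img = [[10 * i + j for j in range(6)] for i in range(6)]
print(bilinear(img, 4.3, 4.2))   # 47.2, between img[4][4] = 44 and img[5][5] = 55
```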

(Refer Slide Time: 58:06)

So, now let us see the results; these results we had already shown in the last class. You
find that the first one is interpolated using nearest neighbor interpolation, and as we have
explained, because the value of the nearest pixel is copied to all the arbitrary locations,
this is likely to give blocky … defects, and in this nearest neighbor interpolated image you also find that
those blocking defects are quite prominent. We have also seen the output with other interpolation
operations.

(Refer Slide Time: 58:39)

We have shown the output with Linear B Spline interpolation and also shown with cubic B
Spline interpolation.

(Refer Slide Time: 58:48)

So, this is case with rotation.

(Refer Slide Time: 58:52)

Again, when you rotate, if you do not interpolate, you get a number of black patches as
shown in the left image; if you go for interpolation, all those black patches are removed and
you get a continuous image as shown in the right image.

Now, this interpolation operation is useful not only for these translation or rotation kinds of
operations; you find it in many other applications, for example in the case of satellite imagery.
When the image of the earth's surface is taken with the help of a satellite, then because of the earth's
rotation, the pixels of the image obtained from the satellite do not always fall on a regular grid.
So in such cases, what we have to go for is to rectify or correct the distortion
which appears in the satellite images, and this distortion is mainly due to the rotation of the earth's
surface. For the correction of those distortions, a similar type of interpolation is also used.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 11
Image Transformation - I
In the last class of this course on digital image processing, we have seen the interpolation and re-sampling
operations on images and we have seen different applications of the interpolation and re-sampling
operations.

(Refer Slide Time: 1:09)

So, while we have talked about the interpolation and re-sampling, we have seen that it is the B
spline functions or B spline interpolation functions of different orders which are mainly used for
image interpolation purpose and before this interpolation, we have also talked about the basic
transformation operations and the transformation operations that we have discussed, those were
mainly in the class of geometric transformations.

That is, we talked about transformations like translation, rotation and scaling, and we have seen
that these are the kinds of transformations which are mainly used for coordinate transformation.
That is, given a point in one coordinate system, we can represent the point in another
coordinate system, where the second coordinate system may be a translated or rotated version of the
first coordinate system.

We have also talked about another type of transformation which is perspective transformation
and this perspective transformation is mainly used to find out or to map a point in a 3
dimensional world coordinate system to a 2 dimensional plane where this 2 dimensional plane is
the imaging plane.

So, there our purpose was: given the 3D coordinates of a point in a 3
dimensional coordinate system, what will be the coordinates of that point on the image plane
when it is imaged by a camera? In today's lecture we will talk about another kind of
transformation which we call image transformation.

(Refer Slide Time: 2:58)

So, we will talk about or explain the different image transformation operations. Now,
before coming to specific transformations like, say, the Fourier transform or the discrete cosine
transform, we will first talk about unitary transformation, which is a class of transformations,
and all the different transforms, whether the discrete Fourier transform, the discrete cosine
transform or the Hadamard transform, are different cases of this class of unitary transformations.

Then, when we talk about this unitary transformation, we will also explain what is an orthogonal
and orthonormal basis function. So, we will see that what is known as an orthogonal basis
function, what is also known as an orthonormal basis function. We will also explain how an
arbitrary 1 dimensional signal can be represented by series summation of orthogonal basis
vectors and we will also explain how an arbitrary image can be represented by a series
summation of orthonormal basis images.

(Refer Slide Time: 4:26)

Now firstly, let us see what image transformation is. You find that in this case, we have
shown a diagram where the input is an image and after the image is transformed, we get another
image. So, if the size of the input image is N by N, say it is having N number of rows and N
number of columns, the transformed image is also of same size that is of size N by N. And, given
this transformed image, if we perform the inverse transformation; we get back the original image
that is image of size N by N.

Now, if given an image, by applying transformation, we are transforming that to another image
of same size and doing the inverse transformation operation; we get back the original image.
Then the question naturally comes that what is the use of this transformation? And, here you find
that after a transformation, the second image of same size N by N that we get, that is called the
transformed coefficient matrix.

So, the natural question that arises in this case that if by transformation and going to another
image and by using inverse transformation, I get back the original image; then why do we go for
this transformation at all? Now, we will find and we will also see in our subsequent lectures that
this kind of transformation has got a number of very very important applications.

(Refer Slide Time: 5:59)

One of the applications is preprocessing. In the case of image preprocessing, if the
image contains noise, then you know that contamination by noise gives
rise to high frequency components in the image.

So, if by using some sort of unitary transformation we can find out what the frequency
components in the image are, and from these frequency coefficients we suppress the high
frequency components, then the reconstructed image that we get by taking the inverse transform
of the modified coefficient matrix is a filtered image. So,
filtering is a very important application where these image transformation techniques can be
applied.

We will also see later on that these techniques are very useful
for image enhancement operations. Say, for example, we have an image which is very blurred,
or whose contrast is very poor; then again, in the transform domain, or
using the transform coefficients, we can do certain operations by which we can enhance the
contrast of the image. So, that is what is known as an enhancement operation.

We will also see that these image transformation operations are very useful for data
compression. If I have to transmit an image or store an image on a hard disc, then
you can easily see that if I have an image of size say 512 by 512 pixels, every pixel contains 8 bits
if it is a black and white image, and normally 24 bits if it is a color image.

So, storing a colored image of 512 by 512 pixels takes a huge
amount of disc space. If by some operation I can reduce the
space required to store the same image, then obviously, on a limited disc space, I can store
a larger number of images.

Similar is the case if I go for transmission of the image, or transmission of image sequences or
video. In that case, the bandwidth of the channel over which this image or video has to be
transmitted is a bottleneck, which forces us to employ some data compression
techniques so that the bandwidth required for the transmission of the image or the video is reduced.

And, we will also see later on that this image transformation technique is the first step in most of
the data compression or image and video compression techniques. These transformation
techniques are also very useful for feature extraction operations. By features I mean that if, in
an image, I am interested in finding out the edges or the corners of
certain shapes, then if I work in the transform domain,
finding out the edges or the corners of certain objects also becomes very convenient.

So, these are some of the applications where these image transformation techniques can be used.
Apparently, we have seen that by image transformation I just transform an original image
to another image, and by inverse transformation that transformed image can be transformed back to
the original image. The applications of this image transformation operation can be like this, and
here I have cited only a few of the applications; we will see later that the applications of this image
transformation are many more than what I have listed here.

(Refer Slide Time: 10:19)

Now, what is actually done by image transformation? By image transformation, what we do is
try to represent a given image as a series summation of a set of unitary matrices. Now,
what is a unitary matrix? A matrix A is said to be a unitary matrix if A inverse, the inverse of A,
is equal to A star transpose, where A star is the complex conjugate of A. So, a matrix A will be
called a unitary matrix if the inverse of the matrix is the same as what we get by first taking the
conjugate of the matrix A and then taking its transpose; so A inverse will be equal to A star transpose,
where A star is the complex conjugate of the matrix A, that is the complex conjugate of each and every element of
matrix A. These unitary matrices we will call the basis images. So, the purpose of this
image transformation operation is to represent any arbitrary image as a series summation of such
unitary matrices, or a series summation of such basis images. Now, to start with, I will first try to
explain with the help of a 1 dimensional signal.
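As a quick illustration of this definition, one can check numerically that a small matrix is unitary by verifying A inverse equal to A star transpose, or equivalently A times A star transpose equal to the identity; the 2 by 2 matrix below is just an example.

```python
import numpy as np

# Example: the 2x2 matrix (1/sqrt(2)) * [[1, 1], [1, -1]].  It is real, so its
# complex conjugate is itself; such a real unitary matrix is also called orthogonal.
A = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

A_inv = np.linalg.inv(A)
A_star_T = A.conj().T

print(np.allclose(A_inv, A_star_T))          # True  -> A is unitary
print(np.allclose(A @ A_star_T, np.eye(2)))  # True, the equivalent check
```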

(Refer Slide Time: 11:55)

So, let us take an arbitrary 1 dimensional signal. So, I take a signal say x (t). So, I take an
arbitrary signal x (t) and you see that this is a function of t. So, this x (t), the nature of x (t) can
be anything. Say, let us take that I have a signal like this x (t) which is a function of t. Now, this
arbitrary signal x (t) can be represented as a series summation of a set of orthogonal basis
function.

So, I am just taking this as an example in for 1 dimensional signal and later on we will extend to
2 dimensions that is for the image. So, this arbitrary signal, this 1 dimensional signal x (t), we
can represent by the series summation of a set of orthogonal basis functions. Now, the question is
what is orthogonal? By orthogonal I mean that if I consider a set of real valued continuous
functions, so I consider a set of real valued continuous functions say a n (t) which is equal to set
say a 0 (t), a 1 (t) and so on.

So, this is a set of real valued continuous functions, and this set is said to be orthogonal over an
interval, say t 0 to t 0 plus T. So, I define that this set of real valued functions will be orthogonal
over the interval t 0 to t 0 plus capital T if I take the integration of a m (t) into a n (t) dt over the
interval capital T and this integral is equal to some constant k if m is equal to n and is
equal to 0 if m is not equal to n.

So, I take 2 functions a m (t) and a n (t), take their product and integrate the product over the interval
capital T. If this integral is equal to some constant, say k, when m is equal to n, and is
equal to 0 whenever m is not equal to n, then this set of real valued continuous
functions forms an orthogonal set of basis functions. And if the value of this constant k is
equal to 1, then we say that the set is orthonormal. So, for an orthogonal basis set as we have defined it,
if this non-zero constant k is equal to 1, then we say that it is an orthonormal set of basis functions.

(Refer Slide Time: 16:07)

Let us just take an example that we mean by this. Suppose, we take a set like this, say sin omega
t, sin twice omega t and sin 3 omega t. So, this is my set of functions a n (t). Now, if I plot sin
omega t over interval t equal to 0 to capital T; so this will be like this and where omega is equal
to 2 pi by capital T. So, capital T is the period of this sinusoidal waveform. Then, if I plot this sin
omega t, we will find that sin omega t in the period 0 to capital T is something like this. So, this
is t, this is sin omega t and this is the time period capital T.

If I plot sin of twice omega t over this same diagram, sin of twice omega t will be something like
this. So, this is sin of twice omega t. Now, if I take the product of sin omega t and sin twice
omega t in the interval 0 to capital T, the product will appear something like this.

So, we find that in this particular region, both sin twice omega t and sin omega t, they are
positive. So, the product will be of this form. In this region, sin omega t is positive but sin twice
omega t is negative. So, the product will be of this form. In this particular region, sin twice
omega t is positive whereas sin omega t is negative. So, the product is going to be like this. This
will be of this form and in this particular region, both sin omega t and sin twice omega t, they are
negative. So, the product is going to be positive. So, it will be of this form.

Now, if I integrate this, that is, if I integrate sin of omega t into sin of twice omega t dt over the
interval 0 to capital T; this integral is nothing but the area covered by this curve, and if you take
this area, you will find that the positive half will be cancelled by the negative half, so this
integration will come out to be 0.

Similar is the case if I multiply sin omega t with sin thrice omega t and take the integration.
Similar will also be the case if I multiply sin twice omega t with sin 3 omega t and take the
integration. So, this particular set that is sin omega t, sin twice omega t and sin 3 omega t, this
particular set is the set of orthogonal basis functions.
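
As a small numerical check (added for illustration, not from the lecture), the orthogonality of this particular set can be verified by approximating the integrals directly; the period T = 1 and the grid spacing are arbitrary choices.

```python
import numpy as np

T = 1.0                                    # period (arbitrary choice for this sketch)
w = 2 * np.pi / T                          # omega = 2*pi/T
dt = T / 100000
t = np.arange(0, T, dt)

def inner(m, n):
    # Riemann-sum approximation of the integral over [0, T] of sin(m w t) sin(n w t) dt
    return np.sum(np.sin(m * w * t) * np.sin(n * w * t)) * dt

print(inner(1, 2), inner(1, 3), inner(2, 3))   # all approximately 0 -> orthogonal
print(inner(1, 1), inner(2, 2), inner(3, 3))   # all approximately T/2, the constant k
```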

(Refer Slide Time: 20:37)

Now suppose, we have an arbitrary real valued function x (t), and we consider this function x (t)
within the region t 0 less than or equal to t less than or equal to t 0 plus capital T. This function
x (t) can be represented by a series summation; so, we can write x (t) as the summation of
C n a n (t). You remember that a n (t) is the set of orthogonal basis functions.

So, we represent x (t) as a series summation: x (t) is equal to the sum of C n a n (t) where n varies
from 0 to infinity. Then this term C n is called the n'th coefficient of expansion. Now, the problem
is: how do we find out or calculate the value of C n ?

To calculate the value of C n , what we can do is multiply both the left hand side and the right
hand side by another function from the set of orthogonal basis functions. So, we multiply both
the sides by a function say a m (t) and take the integration from t equal to 0 to capital T, that is,
take the integration over the interval capital T.

So, what we get is an integration of this form: the integral of x (t) a m (t) dt over capital T will be
equal to the integral over capital T of the summation of C n a n (t) into a m (t) dt, because we are
multiplying both the left hand side and the right hand side by the function a m (t) and taking the
integral over the interval capital T.

Now, if I expand this, you find that this will be of the form C 0 into the integration over capital T
of a 0 (t) into a m (t) dt, plus C 1 into the integration over the interval capital T of a 1 (t) into
a m (t) dt, and it will continue like this; we will have one term, say C m into the integral over T of
a m (t) into a m (t) dt, plus some more integration terms.

Now, as per the definition of orthogonality that we have stated, the integral of a n (t) into a m (t)
dt will be equal to some constant k if and only if m is equal to n, and this integral will vanish in
all the cases where m is not equal to n.

(Refer Slide Time: 24:49)

So, by using that formula of orthogonality, what we get in this case is simply the integral of x (t)
into a m (t) dt over capital T, and this will be simply equal to the constant k times C m because on
the right hand side of this expansion, all the terms will be equal to 0; only for the term a m (t)
into a m (t) dt will the value be equal to k.

So, what we get here is that the integration of x (t) a m (t) dt is equal to the constant k times C m .
From this we can easily calculate that the m'th coefficient C m will be given by 1 upon k into the
integration of x (t) a m (t) dt, where you take the integration over the interval capital T. And
obviously, you can find out that if the set is an orthonormal set, not just an orthogonal set, then
the value of k is equal to 1; so we can get the m'th coefficient C m simply as the integral of x (t)
a m (t) dt over the interval capital T, since the term 1 upon k will be equal to 1.
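
A minimal numerical sketch (added here; the test signal and the period T = 1 are assumptions) showing that the coefficients C m computed this way recover the weights of a signal built from the orthogonal set {sin(n omega t)}:

```python
import numpy as np

T = 1.0
w = 2 * np.pi / T
dt = T / 100000
t = np.arange(0, T, dt)

# assumed test signal built from the orthogonal set {sin(n w t)}
x = np.sin(w * t) + 0.5 * np.sin(3 * w * t)

k = T / 2                                    # the constant k for this particular set
for m in (1, 2, 3):
    a_m = np.sin(m * w * t)
    C_m = (1.0 / k) * np.sum(x * a_m) * dt   # C_m = (1/k) * integral of x(t) a_m(t) dt
    print(m, round(C_m, 4))                  # approximately 1, 0 and 0.5
```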

So, this is how we can get the m'th coefficient of expansion of any arbitrary function x (t), and
this computation can be done if the set of basis functions that we are taking, that is the set a n (t),
is an orthogonal set of basis functions. Now, let us see when the set of orthogonal basis functions
a n (t) is said to be complete.

(Refer Slide Time: 26:53)

We say that this set of orthogonal basis functions is complete, or closed, if one of the following 2
conditions holds. The first condition is that there is no signal x (t) with the integral of x square (t)
dt over the interval capital T less than infinity, that is, no signal with finite energy, such that the
integral of x (t) a n (t) dt is equal to 0 for every n.

This integration has to be taken over the interval capital T for n equal to 0, 1 and so on. The
second condition is that for any piece wise continuous signal x (t), again with the same condition
of finite energy, that is the integral of x square (t) dt over capital T must be less than infinity, and
for any epsilon greater than 0, however small this epsilon is, there exists an N and a finite
expansion x hat (t) equal to the summation of C n a n (t), with n varying from 0 to capital N minus
1, such that the integral of x (t) minus x hat (t), the whole squared, dt taken over the same interval
capital T is less than epsilon.

So, this says that for a piece wise continuous function x (t) having finite energy and for any
epsilon which is greater than 0, however small, there must be some constant capital N such that
we can have an expansion x hat (t) equal to the summation of C n a n (t), with n varying from 0 to
capital N minus 1, for which the integral of x (t) minus x hat (t), the whole squared, dt over
capital T is less than epsilon.

So, you find that this x (t) is the original signal, and in the earlier case we have seen that if we go
for the infinite expansion, then this x (t) can be represented exactly. Now, what we are doing is
going for a truncated expansion. We are not going to take the infinite number of terms but only
capital N number of terms. So obviously, this x (t) is not being represented exactly; what we are
going to have is an approximate expansion. And if x (t) is of finite energy, that is, the integral of
x square (t) dt over capital T is less than infinity, then we can say that there must be a finite
number of terms capital N for which the error of the reconstructed signal, that is the integral of
x (t) minus x hat (t) squared dt, which is nothing but the energy of the error introduced because
of this truncation, is limited; it must be less than or equal to epsilon where epsilon is a very small
positive value.

So, we say that the set of orthogonal basis functions a n (t) is complete or closed if at least one of
these conditions holds, that is, the first condition or the second condition. This says that when we
have a complete set of orthogonal functions, then the expansion over this complete orthogonal
set enables representation of x (t) by a finite set of coefficients, where the finite set of coefficients
is C 0 , C 1 and so on upto C N minus 1.

So, if we have a complete set of orthogonal functions, then using this complete set of orthogonal
functions, we can go for a finite expansion of a signal x (t) using the finite number of expansion
coefficients C 0 , C 1 upto C N minus 1 as is shown here. So, I have a finite set of expansion
coefficients.

So, from this discussion what we have seen is that an arbitrary continuous signal x (t) can be
represented by the series summation of a set of orthogonal basis functions, and this series
expansion is given as x (t) equal to the summation of C n a n (t) where n varies from 0 to infinity
if I go for the infinite expansion, or this can also be represented, as we have seen, by a finite
series expansion.

(Refer Slide Time: 33:05)

In this case, this will be represented by the summation of C n a n (t) where n will now vary from 0
to capital N minus 1. So, this is x hat (t). Obviously, we are going for an approximate
representation of x (t), not the exact representation of x (t). So, this is the case that we have for
continuous signals x (t). But in our case, we are not dealing with continuous signals; we are
dealing with discrete signals.

So in case of discrete signals, what we have is a set of samples or a series of samples. So, the
series of samples can be represented by say u (n) where 0 less than or equal to n less than or
equal to capital N minus 1.

(Refer Slide Time: 34:19)

So, we have a series of discrete samples in n. In this case, we have capital N number of samples.
Obviously, you can see that this is a 1 dimensional sequence of samples, and because it is a 1
dimensional sequence and the sample size is capital N, that is, we have capital N number of
samples, I can represent this set of samples by a vector say u of dimension capital N.

So, I am representing this by a vector u of dimension capital N and for transformation, what I do
is pre multiply this vector u by a unitary matrix A of dimension capital N by capital N. So, given
this vector u, if I pre multiply this with a unitary matrix capital A of dimension capital N by
capital N, then, since u is a vector of dimension capital N and A is a unitary matrix of dimension
capital N by capital N, this multiplication results in another vector v.

So, this vector v we call the transformed vector, and this unitary matrix A is called the
transformation matrix. So, what I have done is I have taken an N dimensional vector u and pre
multiplied that N dimensional vector u by a unitary matrix of dimension capital N by capital N.
So, after multiplication, I again got an N dimensional vector v.

Now, in matrix form, this is v equal to A times u. So now, what I do is expand this matrix
equation.

(Refer Slide Time: 36:51)

So, if I expand this matrix equation, it can be represented as a series summation which will be
given by v (k) equal to the summation of a (k, n) into u (n) where n varies from 0 to capital N
minus 1, and this has to be computed for k equal to 0, 1 upto N minus 1. So, I get all the N
elements of the vector v (k). Now, if A is a unitary matrix, then from vector v, I can also get back
our original vector u. For doing that, what we will do is pre multiply v by A inverse.

So, this should give me the original vector u, and this A inverse v, because A is a unitary matrix,
will be nothing but A conjugate transpose v. If I represent the same equation in the form of a
series summation, this will come out to be u (n) equal to the summation of a star (k, n) times
v (k) where k will now vary from 0 to N minus 1, and this has to be computed for all values of n
varying from 0, 1 upto N minus 1.

Now, what is this a star (k, n)? If I expand the matrix A in terms of its elements a (k, n), the first
row is a 00 , a 01 , a 02 upto a 0,N minus 1 , the next row is a 10 , a 11 and so on, and finally the k'th
row is a k0 , a k1 and so on, containing the element a (k, n).

Now, you find that in this expression we are multiplying v (k) by a star (k, n), which is the
conjugate of a (k, n). For a fixed k, the set of values a star (k, n), with n varying from 0 to capital
N minus 1, is nothing but a column of the matrix A star transpose. So, these are the column
vectors of the matrix A star transpose.

(Refer Slide Time: 40:18)

So, these column vectors, a star (k, n), are actually called the basis vectors of the matrix A, and
you remember this matrix A is a unitary matrix. Here, what we have done is that the sequence of
samples u (n), or the vector u, has been represented as the summation of a star (k, n) into v (k)
for k equal to 0 to N minus 1. So, this vector u has been represented as a series summation of a
set of basis vectors.

Now, if these basis vectors have to be orthogonal or orthonormal, then what is the property that
they have to follow? We have a set of basis vectors, and in this case we have said that the
columns of A star transpose form this set of basis vectors. So, if I take any 2 different columns
and take the dot product of those 2 columns, the dot product is going to be 0, and if I take the dot
product of a column with itself, this dot product is going to be non zero.

So, if I take 2 columns, say A i and A j , and take the dot product of these 2 columns, then this dot
product will be equal to some constant k whenever i is equal to j, and it will be equal to 0
whenever i is not equal to j. If this property is followed, then the matrix A will be a unitary
matrix. So, in this case, we have represented the vector u by a series summation of a set of basis
vectors. This is what we have got in case of a 1 dimensional signal, a 1 dimensional vector u.

(Refer Slide Time: 42:53)

Now, we are talking about image transformation; so, in our case, our interest is in image
transformations. The same concept of representing a vector as a series summation of a set of
basis vectors can also be extended to the case of an image. In case of an image, the vector u that
we defined in the earlier case will now be a 2 dimensional matrix. So, u, instead of being a
vector, will now be a 2 dimensional matrix, and we represent this by u (m, n) where m and n are
row and column indices with 0 less than or equal to m, n less than or equal to capital N minus 1.
So, you see that we are considering an image of dimension capital N by capital N.

Now, transformation of this image can be represented as v (k, l) equal to, again, the series
summation of a k l (m, n) into u (m, n) where m and n vary from 0 to capital N minus 1. So, here
you find that a k l is a matrix, again of dimension capital N by capital N, but in this case the
matrix itself has an index k, l, and this computation of v (k, l) has to be done for 0 less than or
equal to k, l less than or equal to capital N minus 1.

So, this clearly shows that each matrix that we are taking is of dimension capital N by capital N,
and not only that, we have capital N into capital N, that is capital N square, number of such
unitary matrices. This is because in a k l (m, n), both k and l take values from 0 to capital N
minus 1, so I have capital N square number of unitary matrices. And from this v (k, l), which in
this case is the transformation coefficient matrix, I can get back the original matrix u (m, n) by
applying the inverse transformation.

So, in this case u (m, n) will be given by the double summation of a star k l (m, n) into v (k, l)
where k and l vary from 0 to N minus 1, and this has to be computed for 0 less than or equal to
m, n less than or equal to capital N minus 1.

So, you find that by extending the concept of series expansion of a 1 dimensional vector to 2
dimensions, we can represent an image as a series summation of basis unitary matrices. In this
case, all of the a k l (m, n) will be the unitary matrices.
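
A small sketch of this direct 2 dimensional expansion (an added illustration; the unitary 2-D DFT basis a k l (m, n) = exp(-j 2 pi (km + ln)/N)/N is only an assumed choice); note that both the forward and the inverse computation run over all m, n for every k, l.

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)
U = rng.random((N, N))                       # an arbitrary N x N "image"
idx = np.arange(N)

def basis(k, l):
    # assumed basis: unitary 2-D DFT, a_kl(m, n) = exp(-j 2 pi (k m + l n) / N) / N
    return np.exp(-2j * np.pi * (k * idx[:, None] + l * idx[None, :]) / N) / N

# forward transform: v(k, l) = sum over m, n of a_kl(m, n) u(m, n)
V = np.array([[np.sum(basis(k, l) * U) for l in range(N)] for k in range(N)])

# inverse transform: u(m, n) = sum over k, l of a*_kl(m, n) v(k, l)
U_back = np.zeros((N, N), dtype=complex)
for k in range(N):
    for l in range(N):
        U_back += np.conj(basis(k, l)) * V[k, l]

print(np.allclose(U_back, U))                # True: the original image is recovered
```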

(Refer Slide Time: 47:40)

Now, what is meant by the orthogonality property in case of these matrices? The orthogonality
property says that if I take the summation of a k l (m, n) into a star k dash l dash (m, n) for m, n
equal to 0 to capital N minus 1, this will be equal to the Kronecker delta function of k minus k
dash and l minus l dash.

So, this functional value will be equal to 1 whenever k is equal to k dash and l is equal to l dash;
in all other cases, this summation will be 0. And the completeness property says that if I take the
summation of a k l (m, n) into a star k l (m dash, n dash), where the summation is taken over k
and l equal to 0 to capital N minus 1, this will be equal to the Kronecker delta function of m
minus m dash and n minus n dash.

So, this summation will be equal to 1 whenever m is equal to m dash and n is equal to n dash.
Now, by applying this kind of transformation, the matrix V which we get, which is nothing but
the set of v (k, l), is what is called the transformed matrix, and its elements are also called the
transform coefficients.

So, you find that in this particular case, any arbitrary image is represented by a series summation
of a set of basis images or a set of unitary matrices. Now, what happens if we truncate the
summation? First note that by this transformation, what we get is a set of coefficients whose size
is the same as the original image size; that is, if we have an N by N image, our coefficient matrix
will also be of size N by N.

Now, while doing the inverse transformation, if I do not consider all the coefficients but only a
subset of them, then what we are going to get is an approximate reconstructed image, and it can
be shown that this approximate reconstructed image will have a limited error if the set of basis
matrices, or set of basis images, that we are considering is complete. So, this error will be
minimized if the set of basis images that we consider is a complete set of basis images.

(Refer Slide Time: 50:48)

So in that case, what we will have is the reconstructed image u hat, given by the double
summation of v (k, l) into a star k l (m, n), where now, say, l varies from 0 to Q minus 1 and k
varies from 0 to P minus 1. So, instead of considering both k and l varying from 0 to N minus 1, I
am considering only Q number of coefficients along l and P number of coefficients along k. The
number of coefficients that I am considering for reconstructing the image, or for the inverse
transformation, is therefore P into Q instead of N square, and using these P into Q coefficients, I
get the reconstructed image u hat.

So obviously, this u hat is not the exact image; it is an approximate image because I did not
consider all the coefficient values. The sum of squared error in this case will be given by epsilon
square equal to the square of u (m, n), the original image, minus u hat (m, n), the approximate
reconstructed image, summed over m, n varying from 0 to N minus 1, and it can be shown that
this error will be minimized if our set of basis images a k l (m, n) is complete.
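
A brief sketch of this truncated reconstruction and the resulting sum of squared error (an added illustration, reusing the assumed 2-D DFT basis from the previous sketch):

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)
U = rng.random((N, N))
idx = np.arange(N)

def basis(k, l):
    # assumed basis: unitary 2-D DFT (same choice as in the earlier sketch)
    return np.exp(-2j * np.pi * (k * idx[:, None] + l * idx[None, :]) / N) / N

# full set of N^2 transform coefficients
V = np.array([[np.sum(basis(k, l) * U) for l in range(N)] for k in range(N)])

# truncated inverse: keep only P x Q of the N^2 coefficients
P, Q = 2, 2
U_hat = sum(V[k, l] * np.conj(basis(k, l)) for k in range(P) for l in range(Q))

sq_error = np.sum(np.abs(U - U_hat) ** 2)   # epsilon squared, the sum of squared error
print(sq_error)                             # non-zero, since coefficients were dropped
```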

Now, another point to be noted here: if you compute the amount of computation that is involved,
you will find that if N square is the image size, the number of computations needed both for the
forward transformation and for the inverse transformation will be of order N to the power 4. So,
for doing this, we have to incur a tremendous amount of computation.

So, one of the problems is how to reduce this computational requirement when we go for the
forward or the inverse transformation. We will continue with the discussion of this unitary
transformation in our next class.

(Refer Slide Time: 53:44)

Now, today let us consider the solution of the quiz questions that were given at the end of lecture
9. You remember that during lecture 9 we were discussing image interpolation and image
re-sampling, and one of the applications we mentioned is that if I scale an image, or rotate an
image, or want to correct an image, there will be many cases where the information will not be
available at regular grid points.

So, interpolation and re-sampling aim to fill up those grid points where no information is
available in the transformed image. Here we have been given a problem where the image is to be
expanded by a factor of 3.

So, for solving this particular problem if I consider this particular part, the same will be
applicable for the entire part. So, if I expand this or if I scale it up by a factor 3, you will find that
I will get a value 3 here. Then for 2 subsequent locations, there will not be any information. I
will also have a 3 here; here again, for 2 subsequent locations, there will be no information. I will
have a 3 here, here again for 2 subsequent locations I will not have any information, here I will
not have any information, here I will not have any information, here I will have the value 5.

Now, the purpose is I have to fill up all these blank spaces. Now, you find that if I go for the
nearest neighbor interpolation, the nearest neighbor interpolation says that you fill up this
location with the value of the pixel which is nearest to it. So, if I go for that nearest neighbor
interpolation; this location will be filled up by 3, this location will be filled up by 3, this location
will be filled up by 3, this location will be filled up by 2, this will be filled up by 3, this will be
filled up by 3, this will be filled up by 3, this will also be filled up by 3, this will be filled up by 3
whereas these 3 locations will be filled up by value 5.

So, this is what we have in case of nearest neighbor interpolation. If I go for bilinear B spline
interpolation, then what I have to do when I want to fill up these blank locations is this: here I
had the value 3 and here I had the value 5, so whenever I want to fill up a particular blank
location, I have to take a linear combination of this pixel and this pixel.

Each pixel gets a weight determined by the distance of that pixel from the particular point being
filled. So, we will find that this particular point will be filled up by a value equal to 0.7 times 3
plus 0.3 times 3, which in this particular case is also equal to 3. But when I come to this
particular pixel, the value that will be assigned to it will be equal to 0.7 times 5 plus 0.3 times 3.

So, if I compute this, I can find out the value at this particular location. Using a similar
procedure, I can find out the values at all the points where I do not have any information in the
transformed image, that is, at all such blank positions where there is no value.
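
As an added sketch (the row of values below is only an assumed example, and the exact 1/3 and 2/3 weights are used where the lecture rounds them to 0.3 and 0.7), nearest neighbor and 1 dimensional linear interpolation for a scale factor of 3 can be written as:

```python
import numpy as np

row = np.array([3.0, 3.0, 5.0])     # assumed row of pixel values (not the exact quiz image)
factor = 3
# positions of the up-scaled samples, expressed in the coordinates of the original row
new_pos = np.arange(len(row) * factor) / factor

# nearest neighbor interpolation: copy the value of the closest original pixel
nearest = row[np.clip(np.round(new_pos).astype(int), 0, len(row) - 1)]

# linear (1-D bilinear) interpolation: weighted average of the two neighbouring pixels,
# with weights proportional to one minus the distance to each neighbour
linear = np.interp(new_pos, np.arange(len(row)), row)

print(nearest)                      # [3. 3. 3. 3. 3. 5. 5. 5. 5.]
print(np.round(linear, 2))          # [3. 3. 3. 3. 3.67 4.33 5. 5. 5.]
```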

(Refer Slide Time: 57:54)

The second one is also of the same type, where the image is to be rotated by 45 degrees, and here
again, after rotating the image by 45 degrees, we will find that there are some locations where
there is no intensity value. For those locations, what you have to do is go for the inverse
geometric transformation and find out to which particular point in the original image this
location maps. Then you have to interpolate the value at that particular point, using bilinear
interpolation as is asked in this problem, following the procedure that we have discussed, and the
value that you get there, you place in the corresponding location in the transformed image.

So, following this similar procedure, you can also find out what will be the rotated image, rotated
interpolated image. Now, coming to today’s questions.

(Refer Slide Time: 58:48)

So, there are some quiz questions on today’s lecture. First one is what is meant by a set of
orthogonal functions? What is the difference between orthogonality and orthonormality? The
third problem: determine if the following set of vectors is orthogonal or not. The vectors are (1,
0, 0) (0, 1, 0) and (0, 0, 1).

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture 12
Image Transformation - II
Hello, welcome to the video lecture series on digital image processing. Last class, we started our
discussion on image transformation. Today, we are going to continue with the same topic, that is,
image transformation. So, let us see what we have done in our last lecture.

(Refer Slide Time: 1:16)

In our introductory lecture on image transformations, we covered the basics of image
transformation. We have seen what is meant by a unitary transform. We have also seen what
orthogonal and orthonormal basis vectors are. We have seen how an arbitrary 1 dimensional
signal can be represented by a series summation of orthogonal basis vectors, and we have also
seen how an arbitrary image can be represented by a series summation of orthonormal basis
images. So, when we talk about image transformation, basically the image is represented as a
series summation of orthonormal basis images.

(Refer Slide Time: 2:12)

After today’s lecture, the students will be able to analyze the computational complexity of image
transform operations. They will be able to explain what is meant by a separable unitary
transformation, they will also know how separable unitary transforms help to implement fast
transformations and of course, they will be able to write algorithms for fast transforms. So, first
let us see that what we have done in the last class.

(Refer Slide Time: 2:51)

In the last class, we have taken a 1 dimensional sequence of discrete signal samples given in the
form u (n), where n varies from 0 to some capital N minus 1. So, we initially took a 1
dimensional sequence of discrete samples like this, that is u (n), and we found out what is meant
by unitary transformation of this 1 dimensional discrete sequence. The unitary transformation of
this 1 dimensional discrete sequence is given by, say, v equal to A times u, where A is a unitary
matrix, and this can be expanded in the form v (k) equal to the summation of a (k, n) u (n), where
n varies from 0 to capital N minus 1, assuming that we have capital N number of samples in the
input discrete sequence.

Now, we say that this transformation is a unitary transformation if the matrix A is a unitary
matrix. So, what is meant by a unitary matrix? The matrix A will be said to be a unitary matrix if
it obeys the relation that A inverse, the inverse of matrix A, is given by A conjugate transpose.
That is, if you take the conjugate of every element of matrix A and then take the transpose of
those conjugated elements, the result should be equal to the inverse of matrix A itself.

So, this says that A into A conjugate transpose should be the same as A conjugate transpose into
A, which will be the same as an identity matrix. If this relation is true for the matrix A, then we
say that A is a unitary matrix, and the transformation which is given by this unitary matrix is a
unitary transformation. So, using this matrix A, we go for a unitary transformation.

Now, once we have this transformation and we get the transformation coefficients v (k), or the
transformed sequence v, we should also be able to find out how, from these transformation
coefficients, we get back the original sequence u (n). This original sequence is obtained by a
similar relation, which is given by u equal to A...

(Refer Slide Time: 6:23)

Obviously, it should be equal to A inverse v, and in our case, since A inverse is the same as A
conjugate transpose, this can be written as A conjugate transpose v. This expression can be
expanded as u (n) equal to the summation of v (k) a conjugate (k, n), where k varies from 0 to N
minus 1, and we have to compute this for all values of n varying from 0 to capital N minus 1, that
is, 0 less than or equal to n less than or equal to capital N minus 1.

So, by using the unitary transformation, we can get the transformation coefficients, and using the
inverse transformation, we can obtain the input discrete sequence from this sequence of
coefficients. This expression says that the input sequence u (n) is now represented in the form of
a series summation of a set of orthonormal basis vectors. So, this is what we get in case of a 1
dimensional sequence.

Now, let us see what will be the case in case of a 2 dimensional sequence.

(Refer Slide Time: 8:18)

So, if I go for the case of 2 dimensional signals, then the same transformation equation will be of
the form v (k, l) equal to the double summation of u (m, n) into a k , l (m, n), where both m and n
vary from 0 to capital N minus 1.

So here, u (m, n) is the input image; it is a 2 dimensional image. Again, we are transforming this
using the unitary matrix A, and in the expanded form the expression can be written like this:
v (k, l) equal to the double summation of u (m, n) a k , l (m, n), where both m and n vary from 0
to capital N minus 1, and this has to be computed for all the values of k and l in the range 0 to
capital N minus 1.

In the same manner, we can have the inverse transformation so that we can get the original 2
dimensional matrix from the transformation coefficient matrix, and this inverse transformation in
the expanded form can again be written like this. From v (k, l) we have to get back u (m, n), so
we can write u (m, n) again as the double summation of v (k, l) into a star k , l (m, n), where both
k and l vary in the range 0 to capital N minus 1, and this we have to compute for all values of m
and n in the range 0 to capital N minus 1. Here this image transform, that is a k , l (m, n), is
nothing but a set of complete orthonormal discrete basis functions.

And, in our last class, we have said what is meant by a complete set of orthonormal basis
functions; in this case, the quantities v (k, l) that we are getting are known as the transform
coefficients. Now, let us see what will be the computational complexity of these expressions.

If you take any of these expressions, say for example the forward transformation where we have
this particular expression v (k, l) is equal to double summation u (m, n) a k , l (m, n) where m and
n vary from 0 to capital N minus 1. That means both m and n; m will vary from 0 to capital N
minus 1, n will also vary from 0 to capital N minus 1.

So, to compute this v (k, l), you find that if I compute this particular expression; for every v (k,
l), the number of complex multiplication and complex addition that has to be performed is of the
order of capital N square and you remember that this has to be computed for every value of k and
l where k and l vary in the range 0 to capital n minus 1. That is k is having capital N number of
values, l will also have capital N number of values.

So, to find out a single coefficient v (k, l), we need of the order of capital N square complex
multiplications and additions, and this has to be computed for every v (k, l). Because both k and l
vary in the range 0 to capital N minus 1, there are capital N square coefficients, and for the
computation of each coefficient we need capital N square complex additions and multiplications.

So, the total amount of computation that will be needed in this particular case is of the order of
capital N to the power 4. Obviously, this is quite expensive for any of the practical size images
because in practical cases, we get images of the size of say 256 by 256 pixels or 512 by 512
pixels, even it can go upto say 1k by 1k number of pixels or 2k by 2k number of pixels and so
on.

So, if the computational complexity is of the order of capital N to the power 4, where the image
is of size N by N, you can see the tremendous amount of computation that has to be performed
for doing the image transformation using this simple relation. So, what is the way out? We have
to think about how we can reduce the computational complexity.

Obviously, to reduce the computational complexity, we have to use some mathematical tools and
that is where we have the concept of separable unitary transforms.

(Refer Slide Time: 15:12)

So, we have the transformation which is represented by the matrix A, or as a k , l (m, n), and we
say that this is separable if a k , l (m, n) can be represented in the form a k (m) into, say, b l (n),
or equivalently, in the form a (k, m) into b (l, n).

So, if this a k , l (m, n) can be represented as a product of a (k, m) and b (l, n), then it is called
separable. In this case, the 2 sets a (k, m), where k varies from 0 to capital N minus 1, and
b (l, n), where l also varies from 0 to capital N minus 1, are nothing but 1 dimensional complete
orthonormal sets of basis vectors.

Now, if we represent these sets of orthonormal basis vectors, a (k, m) and b (l, n), in the form of
matrices, that is, we represent a (k, m) as the matrix A and similarly the set b (l, n) as the matrix
B, then both A and B themselves should be unitary matrices, and we have said that if they are
unitary matrices, then A A conjugate transpose is equal to A conjugate transpose A, which should
be equal to the identity matrix.

So, if this holds true, then we say that the transformation that we are going to have is a separable
transformation, and we are going to see next how this separable transformation helps us to
reduce the computational complexity. In the original form, we had a computational complexity
of the order of capital N to the power 4, and we will see whether this computational complexity
can be reduced from the order of capital N to the power 4.

(Refer Slide Time: 18:42)

Now, in most of the cases, what we do is assume these 2 matrices A and B to be the same, and
that is how they are decided. So, if I take both A and B to be the same, then the transformation
equation can be written in the form v (k, l) equal to the double summation of a (k, m) u (m, n)
a (l, n).

Compare this with our earlier expression where we had a k , l (m, n). We are now separating this
a k , l (m, n) into 2 components: one is a (k, m), the other one is a (l, n), and this is possible
because the transformation that we are considering is separable. Because it is separable, we can
write v (k, l) in the form of the double summation of a (k, m) u (m, n) into a (l, n), where again
both m and n vary from 0 to capital N minus 1, and in matrix form this equation can be
represented as V equal to A U A transpose, where U is the input image of dimension capital N by
capital N, V is the coefficient matrix again of dimension capital N by capital N, and the matrix A
is also of dimension capital N by capital N.

In the same manner, for the inverse transformation, what we have is the coefficient matrix, and
we want to obtain the original image matrix from this coefficient matrix. The inverse
transformation can now be written as u (m, n) equal to the double summation of a star (k, m)
v (k, l) a star (l, n), where both k and l vary from 0 to capital N minus 1.

So, this is the expression for the inverse transformation, and again as before, this inverse
transformation can be represented in the form of a matrix equation which will look like this:
U equal to A conjugate transpose V into A conjugate. These are called 2 dimensional separable
transformations. So, you find that from our original expression, we have now brought it to an
expression in the form of a separable transformation.
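
A minimal sketch (added here; the normalized DFT matrix is only an assumed choice of the unitary matrix A) of the separable forward and inverse transforms written directly as matrix products:

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
U = rng.random((N, N))                         # input image matrix (arbitrary values)

n = np.arange(N)
A = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # an assumed unitary matrix

V = A @ U @ A.T                                # separable forward transform: V = A U A^T
U_back = A.conj().T @ V @ A.conj()             # separable inverse: U = A*^T V A*

print(np.allclose(U_back, U))                  # True
```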

(Refer Slide Time: 22:13)

So, consider this particular expression for V. If you go back to our previous slide, you will find
that V equal to A U A transpose; so, written in the form A U A transpose, I get the coefficient
matrix V from our original image matrix U by using this separable transformation. The same
equation can also be represented in the form V transpose equal to A [AU] transpose.

Now, what does this equation mean? It says that if I compute the matrix multiplication of A and
U, take the transpose of this, and then pre multiply that result with the matrix A itself, then what
we are going to get is the transpose of the coefficient matrix V. If I analyze this equation, it
simply indicates that these 2 dimensional transformations can be performed by first transforming
each column of U with matrix A and then transforming each row of the result to obtain the rows
of the coefficient matrix V. So, that is what is meant by this particular expression.

So, A into U transforms each column of the input image U with the matrix A, and then you
transform each row of this intermediate result again with matrix A, and that gives you the rows
of the coefficient matrix V. If I then take the transpose of this final result, what we are going to
get is the coefficient matrix that we wanted to have. Now, if I analyze this particular expression,
you will find that A is a matrix of dimension capital N by capital N and U is also a matrix of the
same dimension capital N by capital N.

And then, from matrix algebra, we know that if I want to multiply 2 matrices of dimension
capital N by capital N, the complexity, or the number of additions and multiplications that we
have to do, is of order capital N cube. So here, to perform this first multiplication, we need of the
order of N cube multiplications and additions. The resultant matrix is also of dimension capital N
by capital N, and the second matrix multiplication that we want to perform, that is A with [AU]
transpose, will also need of the order of N cube multiplications and additions.

So, the total number of addition and multiplication that we have to perform when I implement
this as a separable transformation is nothing but of order 2N cube and you compare this with our
original configuration when we had seen that the number of addition and multiplication that has
to be done is of order N to the power 4. So, what we have obtained in this particular case is the
reduction of computational complexity by a factor of capital N.

So, this simply indicates that if the transformation is done in the form of a separable
transformation, then, as we have seen, it is possible to reduce the computational complexity of
implementing the transformation operation. Obviously, the final result that you get, that is the
coefficient matrix, is the same as the coefficient matrix that you get when you implement this as
a non separable transformation.

So, the advantage that you get by implementing this as a separable transformation is the
reduction in computational complexity. Now, let us see what is meant by the basis images.

(Refer Slide Time: 27:36)

So, what is meant by a basis image? Here, suppose a k star denotes the k'th column of the matrix
A conjugate transpose, where A is the transformation matrix, and now we define the matrices
A k , l star as a k star into a l star transpose; so, a k star is the k'th column of the matrix A star
transpose and a l star is the l'th column of the matrix A conjugate transpose.

So, if I take the product of a k star and a l star transpose, then I get a matrix A k , l star. Let us
also define the inner product of 2 N by N matrices, say F and G. The inner product of these 2
matrices F and G is defined as the summation of f (m, n) g star (m, n), where both m and n vary
from 0 to capital N minus 1.

(Refer Slide Time: 30:21)

So now, by using these 2 definitions, if I rewrite the transformation equations, we can write
v (k, l) equal to the old expression, that is, the summation of u (m, n) a k , l (m, n) where both m
and n vary from 0 to capital N minus 1. As per our definition, if you just look at it, this is nothing
but the expression of an inner product.

So, this transformation equation is nothing but an inner product, the inner product of the image
matrix U with the matrix A star k, l. Similarly, if I write the inverse transformation u (m, n),
which is given again in the form of a double summation of v (k, l) into a star k, l (m, n) where k
and l vary from 0 to capital N minus 1, then in matrix form this will be written as U equal to the
summation of v (k, l) into A star k, l, where both k and l vary from 0 to capital N minus 1.

So, if you look at this particular expression, you will find that our original image matrix U is now
represented by a linear combination of N square matrices A star k, l, because both k and l vary
from 0 to capital N minus 1. Each of these N square matrices is of dimension capital N by capital
N, and these matrices A star k, l are known as the basis images.

So, this particular derivation simply says that the purpose of image transformation is to represent
an input image in the form of a linear combination of a set of basis images. Now, to see how
these basis images look, let us see some of the images.

(Refer Slide Time: 33:52)

So here, we have shown 2 images. We will see later that these are the basis images of dimension
8 by 8, and there are in total 8 into 8, that is 64, basis images. We will see later that in case of the
discrete Fourier transformation, we get 2 components, one real and one imaginary; so
accordingly, we have 2 sets of basis images, one corresponding to the real component and the
other corresponding to the imaginary component.

(Refer Slide Time: 34:35)

Similarly, this is another set of basis images, which corresponds to the discrete cosine
transformation. So again, here I have shown the basis images of size 8 by 8; of course, the images
are shown quite enlarged, and again we have 8 into 8, that is 64, images. Here, the row represents
the index k and the column indicates the index l. So again, we have 64 images, and each of these
64 images is of size 8 by 8 pixels.
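
As an added sketch (matplotlib is assumed to be available, and the orthonormal DCT matrix below is one standard definition), an 8 by 8 grid of discrete cosine transform basis images like the one described here can be generated as the outer products of the rows of the 1 dimensional DCT matrix:

```python
import numpy as np
import matplotlib.pyplot as plt

N = 8
n = np.arange(N)
# orthonormal 1-D DCT matrix: A[k, n] = alpha(k) * cos((2n + 1) k pi / (2N))
A = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

# since A is real and orthogonal, the (k, l)-th basis image is the outer product
# of its k-th and l-th rows; rows of the grid index k, columns index l
fig, axes = plt.subplots(N, N, figsize=(6, 6))
for k in range(N):
    for l in range(N):
        axes[k, l].imshow(np.outer(A[k], A[l]), cmap='gray')
        axes[k, l].axis('off')
plt.show()
```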

(Refer Slide Time: 35:22)

Similarly, we have the basis images for other transformations like the Walsh transform, the
Hadamard transform and so on. The purpose of showing these basis images is that, as we said,
the basic purpose of image transformation is to represent an input image as a linear combination
of a set of basis images, and when we take this linear combination, each of these basis images is
weighted by the corresponding transformation coefficient v (k, l) that we compute after the
transformation. As we have said, this v (k, l) is nothing but the inner product of the input image
with the k, l'th basis image.

(Refer Slide Time: 36:22)

So, when you compute this v (k, l) as we have seen earlier, this v (k, l) is represented as the inner
product of the input image U and the k, l'th basis image A star k, l. Because each coefficient
v (k, l) is the inner product of the input image U with the k, l'th basis image A k , l star, it is also
called the projection of the input image on the k, l'th basis image.

So, v (k, l) is the projection of the input image U onto the k, l'th basis image A k , l star, and this
also shows that any input image U of size capital N by capital N can be expanded using a
complete set of N square basis images. That is the basic purpose of the image transformation.
Now, let us take an example.

(Refer Slide Time: 38:24)

So, let us consider an example of this transformation. Say, we have been given a transformation
matrix A equal to 1 upon root 2 into (1, 1, 1, minus 1) and the input image matrix U equal to (1,
2, 3, 4). In this example, we will see how this input image U can be transformed with this
transformation matrix A and that, if I take the inverse transformation of the transformation
coefficients we get, we should be able to get back our original input image U.

Given this, we can compute the transformed image like this: the coefficient matrix V will be
given by 1 upon 2 into (1, 1, 1, minus 1) into our input image (1, 2, 3, 4) into (1, 1, 1, minus 1). If
you look at our expressions, when we computed V we had V equal to A U A transpose; so, by
using that, we have A U, then A transpose, and by the nature of this transformation matrix A, you
will find that A transpose is nothing but the same as A.

So, if you do this matrix computation, it will simply come out to be 1 upon 2 into (4, 6, minus 2,
minus 2) into (1, 1, 1, minus 1), and on completion of this matrix multiplication, the final
coefficient matrix V will come out to be (5, minus 1, minus 2, 0). So, I get the coefficient matrix
V as (5, minus 1, minus 2, 0).
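
This numerical example can be checked with a few lines (an added verification sketch, not part of the lecture):

```python
import numpy as np

A = np.array([[1,  1],
              [1, -1]]) / np.sqrt(2)     # transformation matrix from the example
U = np.array([[1, 2],
              [3, 4]], dtype=float)      # input image matrix

V = A @ U @ A.T                          # separable forward transform V = A U A^T
print(np.round(V))                       # [[ 5. -1.]
                                         #  [-2.  0.]]
```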

Now, let us see what the corresponding basis images will be for this particular transformation.

(Refer Slide Time: 41:41)

Now, when we defined the basis images, you remember that we have said that a k star is assumed
to be the k'th column of the matrix A star transpose. Using the same concept, the k, l'th basis
image A k , l star was computed as a k star multiplied with a l star transpose. So, this is how we
had defined the basis images.

So, using the same concept, in this particular example where the transformation matrix A is
given as 1 upon root 2 into (1, 1, 1, minus 1), I can compute the basis images. The 00'th basis
image A 0 , 0 star will simply be half into the outer product of the basis vectors (1, 1) and (1, 1)
transpose, which is nothing but half into (1, 1, 1, 1). Similarly, the 01'th basis image A 0 , 1 star
comes out to be half into (1, minus 1, 1, minus 1), the 10'th basis image A 1 , 0 star comes out to
be half into (1, 1, minus 1, minus 1), which is just the transpose of A 0 , 1 star, and the 11'th basis
image A 1 , 1 star comes out to be half into (1, minus 1, minus 1, 1).

So, this is simply a matter of matrix multiplication; we can compute these basis images from the
columns of A conjugate transpose.

(Refer Slide Time: 44:18)

Now, to see what the result of the inverse transformation will be, you remember the
transformation coefficient matrix V we had obtained as (5, minus 1, minus 2, 0); this was our
coefficient matrix. The inverse transformation is A conjugate transpose V A conjugate which, by
substituting these values, becomes half into (1, 1, 1, minus 1) then (5, minus 1, minus 2, 0) and
again (1, 1, 1, minus 1), and if you compute this matrix multiplication, the result will be (1, 2, 3,
4), which is nothing but our original image matrix U.
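
An added sketch verifying the basis images and this inverse reconstruction for the same example (the outer-product construction follows the definition given above):

```python
import numpy as np

A = np.array([[1,  1],
              [1, -1]]) / np.sqrt(2)
V = np.array([[ 5., -1.],
              [-2.,  0.]])               # coefficient matrix obtained above

cols = A.conj().T                        # a*_k is the k-th column of A conjugate transpose
U_rec = np.zeros((2, 2))
for k in range(2):
    for l in range(2):
        B_kl = np.outer(cols[:, k], cols[:, l])    # basis image A*_kl = a*_k (a*_l)^T
        U_rec += V[k, l] * B_kl                    # U = sum over k, l of v(k, l) A*_kl

print(np.round(U_rec))                   # [[1. 2.]
                                         #  [3. 4.]]
```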

So, here again, you will find that by the inverse transformation we get back our original image U.
We have also found the 4 basis images, A star 0, 0 , A star 0, 1 , A star 1, 0 and A star 1, 1 , for this
particular transformation matrix A which operates on the image matrix U. Now, let us look
further into this separable transformation.

(Refer Slide Time: 46:04)

So, what we had in our case is U as the original image matrix, and after transformation we get V
as the coefficient matrix; you will remember that both these matrices are of dimension capital N
by capital N. Now, what we do is represent both these matrices U and V in the form of vectors by
row ordering, that is, we concatenate one row after another. By this row ordering, we are
transforming a matrix of dimension capital N by capital N into a vector of dimension capital N
square, and the vector that we get from U, let us represent it by the variable u.

So, by row ordering, the input image matrix is mapped to a vector u and, similarly, the coefficient
matrix V is represented by the vector v. Once we do this, the transformation equation can also be
written as v equal to the Kronecker product of A with A, times u. This Kronecker product of A
and A can be represented by a single matrix, say script A, so the equation reads v equal to script
A into u.

Similarly, the inverse transformation can be written as u equal to the conjugate transpose of the
Kronecker product of A with A, times v, that is, script A conjugate transpose v. Here the symbol
between the two A's represents the Kronecker product, and the matrix script A, which is equal to
the Kronecker product of the 2 matrices A and A, is also a unitary matrix.

So, once we do this, after row ordering of the input image U and the coefficient matrix V so that
they are represented as 1 dimensional vectors of dimension capital N square, this 2 dimensional
image transformation is now represented in a 1 dimensional transformation form.

(Refer Slide Time: 49:48)

So, by this, any arbitrary 1 dimensional signal, say x, can now be transformed as y equal to A x,
where A is the transformation matrix, and we say that this transformation is separable if the
transformation matrix A can be represented as the Kronecker product of 2 matrices A 1 and A 2 .

So, whenever this transformation matrix A is represented as the Kronecker product of 2 matrices
A 1 and A 2 , this particular transformation is separable, because in this case the transformation
operation can be represented as Y equal to A 1 X A 2 transpose, where Y is the coefficient matrix
that is mapped into the vector y by row ordering and X is the input matrix that is mapped into the
vector x, again by row ordering.

Now, if we represent this in this form, then it can be shown that if both A 1 and A 2 are of
dimension N by N, then, because A is the Kronecker product of A 1 and A 2 , this A will be of
dimension N square by N square.

So, A has a total of N to the power 4 elements, and the amount of computation that you have to
do if you apply it directly will again be of order N to the power 4. But because this
transformation A is separable and can be represented as the Kronecker product of A 1 and A 2 ,
this particular operation can now be obtained using of the order of N cube operations.

So, this again says that if a transformation matrix is represented as the Kronecker product of 2
smaller matrices, then we can reduce the amount of computation. Obviously, if both A 1 and A 2
can be further represented as Kronecker products of other unitary matrices, then it is possible to
reduce the computation time further, and effectively that is what is done in case of fast
transformations.
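
An added sketch (with the normalized DFT matrix again assumed as A) showing that the row-ordered 1 dimensional form with the Kronecker product gives exactly the same coefficients as the separable 2 dimensional form:

```python
import numpy as np

N = 4
n = np.arange(N)
A = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # an assumed unitary matrix
rng = np.random.default_rng(0)
U = rng.random((N, N))

u = U.flatten()                        # row ordering of the image into an N^2 vector
AA = np.kron(A, A)                     # Kronecker product: an N^2 x N^2 unitary matrix

v = AA @ u                             # 1-D form of the 2-D transform
V = A @ U @ A.T                        # separable 2-D form
print(np.allclose(v, V.flatten()))     # True: both forms give the same coefficients

u_back = AA.conj().T @ v               # inverse in the 1-D form
print(np.allclose(u_back, u))          # True
```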

So, today we have discussed the separable transformation and we have seen how this separable
transformation can be used to reduce the computational complexity.

(Refer Slide Time: 53:05)

Now, the answers to the quiz questions that we gave in the last class. The first 2 are quite
obvious. The third one asked you to determine if the given set of vectors is orthogonal. What you
have to do is check whether these 3 vectors are pair wise orthogonal or not.

That is, only if you take the inner product of a vector with itself should you get a non zero value,
and if you take the inner product of 2 different vectors, you should get a 0 value. You will find
that (1, 0, 0) inner product with (1, 0, 0) is 1, but (1, 0, 0) inner product with (0, 1, 0) or with
(0, 0, 1) is 0, and you can verify that this particular set of vectors is orthogonal; in fact, it is not
only orthogonal, this particular set of vectors is orthonormal.

(Refer Slide Time: 54:21)

For the fourth one, you have to find out the coefficient C n . This comes from the definition: if you
integrate x (t) a n (t) dt over the interval capital T (and divide by the constant k, which is 1 for an
orthonormal set), that gives you the value of this coefficient C n . So, this again comes
straightaway from the lecture material that we covered in our previous class, that is lecture
number 11.

(Refer Slide Time: 54:53)

Now, coming to today's quiz questions: the first question is, what is the advantage of a separable
transform? The second question: under what condition is a transform said to be separable? The
third question: here we have given 2 matrices A and B; you have to find out the Kronecker
product of the matrices A and B.

(Refer Slide Time: 55:27)

The fourth one: here we have given 2 matrices, the first one being the transformation matrix A
and the second one the input image matrix U. You have to calculate the transform of this image
matrix U with the transformation matrix A, and you also have to find out the corresponding basis
images.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology Kharagpur
Lecture - 13
Hello, welcome to the video lecture series on digital image processing. In the last 2 classes, we have seen the basic theory of unitary transformations and we have analyzed the computational complexity of the unitary transformation operations, particularly with respect to image transformations.

(Refer Slide Time: 1:11)

We have explained the separable unitary transformation and we have seen how separable unitary transformation helps to implement fast transformations; fast transformation implementation, as you have seen during our last class, reduces the computational complexity of the transformation operations.

(Refer Slide Time: 1:55)

After giving a general introduction to unitary transformations, in today’s lecture we are going to discuss the Fourier transformation, which is a specific case of the unitary transformation. So, during today’s lecture, we will talk about the Fourier transformation both in the continuous domain as well as in the discrete domain.

We will see the properties of the Fourier transformation operations and we will also see what is meant by the fast Fourier transform, that is the fast implementation of the Fourier transformation operation.

Now, this Fourier transformation operation we have discussed in brief when we discussed the sampling theorem. That is, given an analog image or continuous image, the first step of discretization was sampling the analog image. So, during our discussion on sampling, we talked about the Fourier transformation and there we said that the Fourier transformation gives you the frequency components present in the image, and for sampling, we must meet the condition that the sampling frequency must be greater than twice the maximum frequency present in the continuous image.

In today’s lecture, we will discuss the Fourier transformation in greater detail. So, first let us see what is meant by the Fourier transformation.

(Refer Slide Time: 3:23)

As we have seen earlier, let us assume a function, say f (x); we will first talk about the Fourier transformation in the continuous domain, so we assume that f (x) is a continuous function of some variable x. Then the Fourier transformation of this function f (x) is written as follows.

This is also written as capital F of u and is given by the integral expression: integral of f (x) e to the power minus j 2 pi ux dx, where the integration is carried out from minus infinity to infinity. Now, this variable u is the frequency variable. So, given a continuous function f (x), by using this integration operation we can find out the Fourier transform of the continuous function f (x), and the Fourier transform is given by F (u).

Now, for this continuous Fourier transformation, the function f (x) has to meet some requirements: the function f (x) must be continuous and it must be integrable. So, if f (x) meets these 2 requirements, that is f (x) is continuous and integrable, then using this integral operation we can find out the Fourier transformation of this continuous function f (x).

Similarly, we can also have the inverse Fourier transformation. That is, given the Fourier transform F (u) of a function f (x), and provided F (u) is integrable, we can find out the inverse Fourier transform of F (u), which is nothing but the continuous function f (x). This is given by a similar integration operation, now of F (u) into e to the power j 2 pi ux du, and this integration again has to be carried out from minus infinity to infinity.

So, from f (x), using this integral operation, we can get the Fourier transformation F (u) and, if F (u) is integrable, then using the inverse Fourier transformation we can get back the original continuous function f (x). These 2 expressions, the expression for F (u) and the expression for f (x), are known as Fourier transform pairs.
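
For reference, the same transform pair can be written compactly in LaTeX notation (this is only a restatement of the spoken formulas above, with u as the frequency variable):

    \[
    F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-j 2\pi u x}\, dx,
    \qquad
    f(x) = \int_{-\infty}^{\infty} F(u)\, e^{j 2\pi u x}\, du .
    \]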

Now, from this expression, you find that for doing the Fourier transformation, what we are doing is taking the function f (x), multiplying it with the exponential e to the power minus j 2 pi ux and integrating this over the interval minus infinity to infinity. So naturally, this expression F (u) that you get is in general complex, because e to the power minus j 2 pi ux is a complex quantity.

(Refer Slide Time: 7:49)

So, in general, the function F (u) is a complex function, and because this F (u) is complex, we can break this F (u) into a real part, which we write as R (u), and an imaginary part, which is I (u). So, this F (u), which in general is a complex quantity, is now broken into the real part and the imaginary part; or the same F (u) can also be written in the form modulus of F (u) into e to the power j phi (u), where this modulus of F (u), which gives you the modulus of the complex quantity F (u), is nothing but the square root of R (u) square plus I (u) square and this is what is known as the Fourier spectrum of f (x).

So, this we call the Fourier spectrum of the function f (x), and the quantity phi (u), which is given by tan inverse of I (u) upon R (u), is what is called the phase angle. So, from this we get the Fourier spectrum of f (x), which is nothing but the magnitude of the Fourier transformation F (u), and the tan inverse of the imaginary component I (u) by the real component R (u), which is the phase angle for a particular value of u.

Now, there is another term which is called the power spectrum. The power spectrum of the function f (x), which is also represented as P (u), is nothing but the magnitude of F (u) square and if you expand this, it will be simply R square (u) plus I square (u). So, we get the power spectrum, we get the Fourier spectrum and we also get the phase angle from the Fourier transformation coefficients, and this is what we have in case of a 1 dimensional signal because we have taken a function f (x) which is a function of a single variable x.
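
As a hedged numpy illustration (my own sketch, not part of the lecture, and using the discrete transform as a stand-in for the continuous one; note that np.fft.fft carries no normalization factor), the spectrum, phase angle and power spectrum can be computed from the complex coefficients exactly as defined above:

    import numpy as np

    f = np.array([1.0, 2.0, 0.0, -1.0])   # arbitrary sample values of f(x)
    F = np.fft.fft(f)                      # complex coefficients R(u) + j I(u)

    spectrum = np.abs(F)        # sqrt(R^2 + I^2), the Fourier spectrum
    phase = np.angle(F)         # tan inverse of I/R, the phase angle (radians)
    power = spectrum ** 2       # R^2 + I^2, the power spectrum
    print(spectrum, phase, power)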

Now, because in our case we are discussing image processing operations and we have already said that an image is nothing but a 2 dimensional function of 2 variables x and y, we have to discuss the Fourier transformation in 2 dimensions rather than in a single dimension.

(Refer Slide Time: 11:26)

So, when we go for the 2 dimensional Fourier transformation, we talk about the 2D Fourier transform; the 1 dimensional Fourier transform that we have just discussed can be easily extended to 2 dimensions. In this case, the function is a 2 dimensional function f (x, y) of 2 variables x and y, and the Fourier transform of this f (x, y) is now given by F (u, v), which is equal to the double integral of f (x, y) e to the power minus j 2 pi (ux plus vy) dxdy, where both integrations are taken over the interval minus infinity to infinity.

So, we find that from the 1 dimensional Fourier transformation we have easily extended to the 2 dimensional Fourier transformation, and now the integration has to be taken over x and y because our image is a 2 dimensional function of 2 variables x and y. So, the forward transformation is given by the expression F (u, v) equal to the double integral of f (x, y) e to the power minus j 2 pi (ux plus vy) dxdy, with the integration taken from minus infinity to infinity.

In the same manner, we can take the inverse Fourier transformation to get f (x, y), that is the image, from its Fourier transform coefficients F (u, v) by a similar integral operation; in this case, it will be the double integral of F (u, v) e to the power j 2 pi (ux plus vy) dudv, and the integration again has to be taken from minus infinity to infinity.

(Refer Slide Time: 14:02)

So, for this 2 dimensional signal, the Fourier spectrum, that is the modulus of F (u, v), is given by the square root of R square (u, v) plus I square (u, v), where as before R gives you the real component and I gives you the imaginary component. We can get the phase angle in the same manner: the phase angle phi (u, v) is given by tan inverse of I (u, v) by R (u, v).

And the power spectrum, in the same manner, we get as P (u, v) equal to the magnitude of F (u, v) square, which is nothing but R square (u, v) plus I square (u, v). So, we find that all these quantities which we had defined in case of the single dimensional signal are also applicable in case of the 2 dimensional signal f (x, y). Now, to illustrate this Fourier transformation, let us take an example.

(Refer Slide Time: 15:46)

Suppose we have a continuous function like this, a function f (x, y) of 2 variables x and y, and the function in our case is such that f (x, y) assumes a constant value, say capital A, for all values of x lying between 0 and capital X and all values of y lying between 0 and capital Y.

So, what we get is a rectangular function like this, where for all values of x greater than capital X the function value is 0, for all values of y greater than capital Y the function value is also 0, and between 0 and capital X and 0 and capital Y the value of the function is equal to capital A. Let us see how we can find out the Fourier transformation of this particular 2 dimensional signal.

(Refer Slide Time: 16:50)

So, to compute the Fourier transformation, we follow the same expression. We have said that F (u, v) is nothing but the double integration from minus infinity to infinity of f (x, y) e to the power minus j 2 pi (ux plus vy) dxdy. Now, in our case, this f (x, y) is equal to the constant A as long as x lies between 0 and capital X and y lies between 0 and capital Y, and outside this region the value of f (x, y) is equal to 0.

So, you can break this particular integral into this form. It will be the same as capital A into the integration over x, which in this particular case is e to the power minus j 2 pi ux dx taken from 0 to capital X, multiplied by the integration of e to the power minus j 2 pi vy dy taken over the range 0 to capital Y.

So, if you compute these 2 integrals and evaluate the limits, you will find that the result takes the value A capital X capital Y into sin (pi u capital X) e to the power minus j pi u capital X upon pi u capital X into sin (pi v capital Y) e to the power minus j pi v capital Y upon pi v capital Y.

So, after doing all these integral operations, I get an expression like this. From this expression, if you compute the Fourier spectrum, the Fourier spectrum will be as follows.

(Refer Slide Time: 20:18)

So, what we are interested in is the Fourier spectrum. The Fourier spectrum, that is the modulus of F (u, v), will be given by A capital X capital Y into the magnitude of sin (pi u capital X) upon pi u capital X into the magnitude of sin (pi v capital Y) upon pi v capital Y. So, this is the Fourier spectrum of the Fourier transformation that we have obtained. Now, if we plot the Fourier spectrum, the plot will be something like this.
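
Written compactly in LaTeX (only a restatement of the result just derived; the absolute-value bars are added in the spectrum, since a spectrum is non-negative):

    \[
    F(u,v) = A\,X\,Y\,\frac{\sin(\pi u X)}{\pi u X}\, e^{-j\pi u X}\,
             \frac{\sin(\pi v Y)}{\pi v Y}\, e^{-j\pi v Y},
    \qquad
    |F(u,v)| = A\,X\,Y\,\left|\frac{\sin(\pi u X)}{\pi u X}\right|
               \left|\frac{\sin(\pi v Y)}{\pi v Y}\right| .
    \]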

(Refer Slide Time: 21:20)

So, this is the plot of the Fourier spectrum. You will find that this is again a 2 dimensional function; of course in this case, the spectrum that has been shown is shifted so that the spectrum comes within the displayed range for its complete visibility.

So, for a rectangular 2 dimensional function, you will find that the Fourier spectrum will be something like this, and assuming the centre to be at the origin of the frequency plane, you will find that along the u axis at the points 1 upon capital X, 2 upon capital X and so on, the value of this Fourier spectrum will be equal to 0. Similarly, along the v axis, at the values 1 upon capital Y, 2 upon capital Y and so on, the values of this spectrum will also be equal to 0. So, what we get is the Fourier spectrum and the nature of the Fourier spectrum of this particular 2 dimensional signal.

Now, so far what we have discussed is the case of continuous functions or analog functions, but in our case we are interested in discrete images or digital images, where the functions are not continuous but discrete. So, all the integration operations that we performed in case of the continuous functions will be replaced by corresponding summation operations.

(Refer Slide Time: 23:15)

So, when we go for the 2 dimensional discrete signal, the discrete Fourier transformation will be of this form: F (u, v), where the integrations are now replaced by summations. So, this will take the form of 1 upon M into N, then a double summation of f (x, y) e to the power minus j 2 pi (ux by capital M plus vy by capital N), where the summation is taken for y equal to 0 to capital N minus 1 and x equal to 0 to capital M minus 1, because our images are of size M by N. And because our images are discrete, the frequency variables are also going to be discrete.

So, the frequency variable u will vary over 0, 1 up to capital M minus 1 and the frequency variable v will similarly vary over 0, 1 up to capital N minus 1. So, this is the forward 2 dimensional discrete Fourier transformation. In the same manner, we can also obtain the inverse Fourier transformation for this 2 dimensional signal.

(Refer Slide Time: 25:01)

So, the inverse Fourier transformation will be given by f (x, y) equal to the double summation of F (u, v), which is the Fourier transformation of f (x, y), into e to the power j 2 pi (ux by M plus vy by N), where the summation is now taken for v equal to 0 to capital N minus 1 and u equal to 0 to capital M minus 1.

So, the frequency variable v varies from 0 to capital N minus 1 and u varies from 0 to capital M minus 1 and obviously, this will give you back the digital image f (x, y), the discrete image, where x will now vary from 0 to capital M minus 1 and y will now vary from 0 to capital N minus 1.

So, we have formulated these equations for the general case where the discrete image is represented by a 2 dimensional array of size capital M by capital N. Now, as we said, in most cases the image is represented in the form of a square array where M is equal to N; if the image is represented in the form of a square array, then the transformation equations will be represented as follows.

(Refer Slide Time: 26:51)

F (u, v) will be equal to 1 upon capital N into the double summation of f (x, y), and now because M is equal to N, the exponential becomes e to the power minus j 2 pi by N (ux plus vy), where both x and y vary from 0 to capital N minus 1. Similarly, the inverse Fourier transform f (x, y) will be given by 1 upon capital N into the double summation of F (u, v) e to the power j 2 pi by N (ux plus vy), where the variables u and v now vary from 0 to capital N minus 1.
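
As a hedged numpy sketch (my own illustration; note that np.fft.fft2 follows the unnormalized convention, so it must be divided by N to match the 1 upon N factor used in the lecture), the square-image DFT computed directly from the double summation agrees with the library routine:

    import numpy as np

    N = 4
    rng = np.random.default_rng(1)
    f = rng.standard_normal((N, N))          # a small N x N "image"

    x = np.arange(N)
    F_direct = np.zeros((N, N), dtype=complex)
    for u in range(N):
        for v in range(N):
            kernel = np.exp(-2j * np.pi * (u * x[:, None] + v * x[None, :]) / N)
            F_direct[u, v] = (f * kernel).sum() / N   # 1/N factor as in the lecture

    F_fft = np.fft.fft2(f) / N               # fft2 itself carries no 1/N factor
    print(np.allclose(F_direct, F_fft))      # True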

So, this is the Fourier transformation pair that we get in the discrete case for a square image, where the number of rows and the number of columns are the same, and we have discussed earlier that e to the power j 2 pi by N (ux plus vy) is what we have called the basis images. This we discussed when we talked about the unitary transformation, and we have shown at that time that these basis images will be like this.

(Refer Slide Time: 28:37)

So, as we have seen, the Fourier transformation is a complex quantity; so for the Fourier transformation, we will have 2 basis images. One basis image corresponds to the real part and the other basis image corresponds to the imaginary part, and these are the 2 basis images shown here.

Now, the Fourier transform, the Fourier spectrum, the phase and the power spectrum that we have defined in case of the analog image can all also be defined in case of the discrete image in the same manner.

(Refer Slide Time: 29:24)

So, in case of the discrete image, the Fourier spectrum is given by a similar expression: the modulus of F (u, v) is nothing but the square root of R square (u, v) plus I square (u, v); the phase is given by phi (u, v) equal to tan inverse of I (u, v) upon R (u, v); and the power spectrum P (u, v) is given by the similar expression, the modulus of F (u, v) square, which is nothing but R square (u, v) plus I square (u, v), where R (u, v) is the real part of the Fourier coefficient and I (u, v) is the imaginary part of the Fourier coefficient.

So, after discussing this Fourier transformation both in the forward direction and in the reverse direction, let us look at how these Fourier transform coefficients look.

(Refer Slide Time: 31:05)

So, here we have the result for one of the images, and you will find that this is a very popular image which is cited in most of the image processing text books, that is the image of Lena. If you take the discrete Fourier transformation of this particular image, the right hand side shows the DFT in the form of an intensity plot, and the bottom one is the 3 dimensional plot of the DFT coefficients.

Here again, when these coefficients are plotted, the origin is shifted to the centre of the plane so that you can have a better view of all the coefficients. Here you find that at the origin, the value of the coefficient is quite high compared to the values of the coefficients as you move away from the origin. So, this indicates that the Fourier coefficient is maximum, at least for this particular image, at the origin, that is when u equal to 0 and v equal to 0, and later on we will see that u equal to 0, v equal to 0 gives you the DC component of this particular image.

And in most images, the DC component is maximum and as you move towards the higher frequency components, the energy of the higher frequency components is less compared to the DC component. So, after discussing the Fourier transformation, the inverse Fourier transformation and how the Fourier coefficients look, let us see some of the properties of these Fourier transformation operations.

(Refer Slide Time: 33:01)

So now, we will see some of the important properties of the Fourier transformation. The first property that we will talk about is separability. Now, analyze the expression of the Fourier transformation where we have said that, assuming a square image of size N by N, the Fourier transformation F (u, v) is given by 1 upon N into the double summation of f (x, y) e to the power minus j 2 pi by N (ux plus vy), where both x and y vary from 0 to capital N minus 1.

Now, you find that this particular expression of the Fourier transformation can be rewritten in the form 1 upon N into the summation over x, from 0 to capital N minus 1, of e to the power minus j 2 pi by capital N ux, multiplied by capital N, multiplied by 1 upon capital N into the summation over y, from 0 to capital N minus 1, of f (x, y) e to the power minus j 2 pi by capital N vy.

So, it is the same Fourier transformation expression, but now we have separated the variables x and y into 2 different summation operations: the first summation involves the variable x and the second summation involves the variable y. Now, look at this function f (x, y) for which we are trying to find out the Fourier transformation. In the second summation, where the summation is taken over y varying from 0 to capital N minus 1, if we keep the value of x fixed, that is for a particular value of x, the different values of f (x, y) represent nothing but a particular row of the image.

So, for a fixed value of x, this f (x, y) represents a particular row of the image, which is nothing but a 1 dimensional signal. Looking at it that way, what we are doing is transforming the rows of the image, different rows for different values of x.

So, after elaboration of this particular expression, the same expression now gets converted to 1 upon capital N into the summation, x equal to 0 to capital N minus 1, of e to the power minus j 2 pi by capital N ux into capital N into F (x, v), where this F (x, v) represents the inner summation, that is 1 upon capital N into the summation over y of f (x, y) e to the power minus j 2 pi by capital N vy.

So, if you look at these expressions, you will find that the inner summation gives you the Fourier transformation of the different rows of the image, and that Fourier transformation of the different rows we now represent by F (x, v), where x is the index of a particular row. The outer summation then takes these intermediate Fourier coefficients and performs the Fourier transformation over the columns to give us the complete Fourier transformation F (u, v).

So, the first operation that we are performing is the Fourier transformation over the different rows of the image, multiplying this intermediate result by the factor capital N; then on this intermediate Fourier transformation matrix, we further take the Fourier transformation of the different columns to get the final Fourier transformation.
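
A hedged numpy sketch of this two-step scheme (my own illustration, not code from the course; with numpy's unnormalized 1D transforms, the factor of capital N and the 1 upon capital N of the lecture's convention cancel, so nothing extra is needed):

    import numpy as np

    N = 8
    rng = np.random.default_rng(2)
    f = rng.standard_normal((N, N))

    F_rows = np.fft.fft(f, axis=1)            # 1D transform of every row:   F(x, v)
    F_2d = np.fft.fft(F_rows, axis=0)         # then of every column:        F(u, v)

    print(np.allclose(F_2d, np.fft.fft2(f)))  # True: same as the direct 2D transform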

(Refer Slide Time: 39:03)

So graphically, we can represent this entire operation like this: this is our x axis, this is our y axis and I have an image f (x, y). First of all, we take the Fourier transformation along the rows, that is we do the row transformation, and after doing the row transformation we multiply all these intermediate values by the factor capital N. This gives us the intermediate Fourier transformation coefficients, which we now represent as capital F (x, v).

So, you get one of the frequency components, which is v, and then we take this intermediate result; initially we had done the row transformation and now we do the column transformation. After doing this column transformation, the axes become u and v and we get the final transformation coefficients capital F (u, v). Of course, in each of these arrays the origin is at (0, 0) and the indices run up to capital N minus 1 along both axes.

So, you will find that by using this separability property, the 2 dimensional Fourier transformation operation is now converted into two 1 dimensional Fourier transformation operations.

So, in the first step, we do the 1 dimensional Fourier transformation operation over the different rows of the image and multiply the intermediate result by the dimension of the image, which is capital N; then we take this intermediate result and again do a 1 dimensional Fourier transformation across the different columns of this intermediate result, and we finally get the 2 dimensional Fourier transformation coefficients.

So, because of separability, this 2 dimensional Fourier transformation has been converted to two 1 dimensional Fourier transformation operations and obviously, by using this, the operation becomes much simpler.

(Refer Slide Time: 42:16)

So, in the same manner as we have done in case of the forward Fourier transformation, we can also handle the inverse Fourier transformation. In case of the inverse Fourier transformation, our expression was f (x, y) equal to 1 upon capital N into the double summation of F (u, v) e to the power j 2 pi upon N (ux plus vy), where both u and v vary from 0 to capital N minus 1. In the same manner, I can also break this expression into 2 summations.

So, the first summation will be over u, varying from 0 to capital N minus 1, of e to the power j 2 pi by capital N ux, multiplied by capital N into 1 upon capital N into the summation over v, from 0 to capital N minus 1, of F (u, v) e to the power j 2 pi upon capital N vy. Again, as before, you will find that this second, inner operation is nothing but the inverse discrete Fourier transformation along a row. So, this second expression gives you the inverse Fourier transformation along the rows, and when you finally convert this, the final expression becomes 1 upon capital N into the summation of capital N times F (u, y) into e to the power j 2 pi upon capital N ux, where u varies from 0 to capital N minus 1; this particular expression is the inverse discrete Fourier transformation along the columns.

So, as we have done in case of forward Fourier transformation that is for a given image, you first
take the Fourier transformation of the different rows of the image to get the intermediate Fourier
transformation coefficient and then take the Fourier transformation of different columns of that
set of intermediate Fourier coefficients to get the final Fourier transformation.

In the same manner, for the inverse Fourier transformation, we can take the Fourier coefficient array, do the inverse Fourier transformation along the rows and then, as the second step, do the inverse discrete Fourier transformation along the columns of the intermediate results; these 2 operations complete the inverse Fourier transformation of the 2 dimensional array and give you back the 2 dimensional signal f (x, y).

So, because of this separability property, we have been able to convert the 2 dimensional Fourier transformation operation into two 1 dimensional Fourier transformation operations, and because it can now be implemented as 1 dimensional Fourier transformation operations, the operation is much simpler than a direct 2 dimensional Fourier transformation. Now, let us look at the second property of the Fourier transformation.

(Refer Slide Time: 46:11)

The second property that we will talk about is the translation property. The translation property says that if we have a 2 dimensional signal, say f (x, y), and translate it by a vector (x 0 , y 0 ), that is along the x direction you translate it by x 0 and along the y direction you translate it by y 0 , then the function that you get is f (x minus x 0 , y minus y 0 ).

So, if I take the Fourier transformation of this translated signal f (x minus x 0 , y minus y 0 ), how will the Fourier transformation look? Let us call this Fourier transformation F t (u, v). Going by the same expression, this will be nothing but 1 upon capital N into the double summation of f (x minus x 0 , y minus y 0 ) into e to the power minus j 2 pi by capital N into (ux plus vy).

So, writing ux plus vy as u (x minus x 0 ) plus v (y minus y 0 ) plus ux 0 plus vy 0 , this expands to 1 upon capital N into the double summation of f (x minus x 0 , y minus y 0 ) into e to the power minus j 2 pi by N into [u (x minus x 0 ) plus v (y minus y 0 )] into e to the power minus j 2 pi by N (ux 0 plus vy 0 ).

Now here, if you consider the first part, that is f (x minus x 0 , y minus y 0 ) e to the power minus j 2 pi by N [u (x minus x 0 ) plus v (y minus y 0 )] summed over x equal to 0 to N minus 1 and y equal to 0 to N minus 1, then by a change of variables this is nothing but the Fourier transformation F (u, v). So, by doing this translation, the final expression F t (u, v) comes in the form F (u, v) into e to the power minus j 2 pi by capital N into (ux 0 plus vy 0 ). So, this is the final expression for the Fourier transform of the translated signal.

So, if you compare these 2 expressions, F (u, v) and F t (u, v), you will find that the Fourier spectrum of the signal does not change after translation, because the magnitude of F t (u, v) and the magnitude of F (u, v) will be the same. Because of the translation, all that is produced is some additional phase difference.

So, whenever f (x, y) is translated by (x 0 , y 0 ), an additional phase difference is introduced by the term e to the power minus j 2 pi by capital N (ux 0 plus vy 0 ), but otherwise the magnitude of the Fourier transformation, that is the Fourier spectrum, remains unaltered.
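
A numpy check of this translation property (my own sketch; np.roll gives the circular shift that the periodic DFT actually assumes, and the shift amounts are arbitrary):

    import numpy as np

    N = 8
    rng = np.random.default_rng(3)
    f = rng.standard_normal((N, N))
    x0, y0 = 3, 5                                    # arbitrary translation

    F = np.fft.fft2(f)
    Ft = np.fft.fft2(np.roll(f, shift=(x0, y0), axis=(0, 1)))

    u = np.arange(N)[:, None]
    v = np.arange(N)[None, :]
    phase = np.exp(-2j * np.pi * (u * x0 + v * y0) / N)

    print(np.allclose(Ft, F * phase))            # True: only a phase factor is added
    print(np.allclose(np.abs(Ft), np.abs(F)))    # True: the spectrum is unchanged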

(Refer Slide Time: 50:45)

In the same manner, if we talk about the inverse Fourier transformation, the translated transform F (u minus u 0 , v minus v 0 ) gives rise to f (x, y) e to the power j 2 pi by capital N (u 0 x plus v 0 y). So, this says that if f (x, y) is multiplied by this exponential term, then its Fourier transformation is displaced by the vector (u 0 , v 0 ), and this is the property which we will use later on to see how the Fourier transformation coefficients can be better visualized.

So here, we get the forward Fourier transformation and the inverse Fourier transformation with translation, and we have found that a shift in f (x, y) by (x 0 , y 0 ) does not change the Fourier spectrum of the signal; what we get is just an additional phase term introduced in the Fourier transform.

So, with this let us conclude today’s lecture on the Fourier transformation. We will talk about the other properties of the Fourier transformation in our subsequent lectures.

(Refer Slide Time: 52:38)

Now, let us see the answers to some of the questions that we had presented during our last lecture. The first question was: what is the advantage of a separable transform? This we have already discussed during our lecture: if the transformation is separable, then we can go for a fast implementation of the transformation, that is the computational complexity of the transformation implementation will be much less if the transformation is separable.

The second question: under what condition is a transform said to be separable? This also we have discussed during our previous discussion, where we said that a transformation is separable if the unitary transformation matrix can be represented as a product of 2 matrices, say A 1 and A 2 , and if both these matrices A 1 and A 2 are also unitary; in that case, we say that the transformation is separable.

We have also said that this can be explained in terms of Kronecker products. That is, if the original transformation matrix A can be represented as the Kronecker product of 2 other matrices, both of which are unitary, then also the transformation is separable, and the advantage obviously is that for a separable transformation we can go for a faster implementation of that transformation.

Now, the third question: find the Kronecker product of A and B.

(Refer Slide Time: 54:34)

Now, the Kronecker product of 2 matrices A and B in this particular case will be obtained as follows. Suppose the matrix A is given by a 11 , a 12 , a 21 , a 22 , so this is our matrix A, and the matrix B is represented as b 11 , b 12 , b 21 , b 22 , this is matrix B. Then the Kronecker product of these 2 matrices A and B will be the block matrix a 11 into the matrix B, a 12 into the matrix B, a 21 into the matrix B and a 22 into the matrix B. So, this entire matrix is the Kronecker product of the 2 matrices A and B.

Now, from this definition, if I substitute the values of a 11 , a 12 , a 21 , a 22 , b 11 , b 12 , b 21 , b 22 from the given matrices A and B, what I get is the Kronecker product of the 2 matrices A and B. Now here, let me mention that from this definition it is quite obvious that A Kronecker product with B is not equal to B Kronecker product with A, and the same is in general true for ordinary matrix multiplication.

In general, A into B is not equal to B into A and in the same manner, A Kronecker product with B is in general not equal to B Kronecker product with A.
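
A small numpy illustration of this point (my own example values, not the quiz matrices):

    import numpy as np

    A = np.array([[1, 2],
                  [3, 4]])
    B = np.array([[0, 1],
                  [1, 0]])

    print(np.kron(A, B))                                   # each a_ij replaced by a_ij * B
    print(np.array_equal(np.kron(A, B), np.kron(B, A)))    # False: the order matters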

(Refer Slide Time: 56:43)

So, coming to our next question: for the 2 by 2 transform A and the image U as given below, calculate the transformed image V and the corresponding basis images. We had taken up during our lecture an example of exactly similar nature, and there also we have said that if you have the transformation matrix and the image, then the transformation coefficients V can be easily obtained as AUA transpose, where A is the transformation matrix and U is the image.

(Refer Slide Time: 57:07)

So, just by substituting the matrices A and U in this expression, we can get the coefficient matrix V. To get the basis images, what we have to do is take the outer products of the columns of the matrix A conjugate transpose.

See, in this particular case, because it is a real matrix, A conjugate will be the same as A. We take the transpose of it and, after taking the transpose, we take the different columns and take the outer products of the columns to find out the basis images. So in this particular case, the basis image A (0, 0) conjugate will be nothing but the outer product of the 0th column with the 0th column transpose, and following the same approach we can find out all the other basis images for this given transformation.

So, we have discussed all the problems that we had given at the end of our last lecture. Now, coming to today’s problems, we are giving 2 problems. The first one is: find out the discrete Fourier transformation coefficients of a digital image f (x, y) of size capital N by capital N, where f (x, y) is equal to 1 for all values of x and y.

(Refer Slide Time: 59:10)

The second problem is: consider the sample values of a one dimensional signal as given below and find out the DFT coefficients of these sample values.

Thank you.

Digital Image Processing
Prof. P.K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology Kharagpur
Lecture - 14
Fourier Transformation – II
Hello, welcome to the video lecture series on digital image processing.

(Refer Slide Time: 1:07)

In our last lecture, we have started discussion on the Fourier transformation and towards the end
we have seen some of the properties of the Fourier transformation. So, what we have done in the
last class is we have talked about the Fourier transformation both in the continuous and in the
discrete domain and we have talked about some of the properties of the Fourier transformation
like the separability property and the translation property.

Today, we will continue with our lecture on the Fourier transformation and will see the other
properties of the Fourier transformation and we will talk about how to implement Fourier
transformation in a faster way. That is we will talk about the fast Fourier transformation
algorithm.

(Refer Slide Time: 1:54)

So, in today’s lecture, we will see the properties of the discrete Fourier transformation,
specifically the periodicity and conjugate property of the Fourier transformation. We will talk
about the rotation property of the Fourier transformation, we will see the distributivity and the
scaling property of the Fourier transformation followed by the convolution and correlation
property of the Fourier transformation and then we will talk about an implementation, a fast
implementation of the Fourier transformation which is called fast Fourier transform.

(Refer Slide Time: 2:34)

So first, let us just try to recap what we have done in the last class.

(Refer Slide Time: 2:49)

So, in the last class, we have talked about the separability of the Fourier transformation and there we have seen that, given a 2 dimensional signal f (x, y) in the discrete domain, that is samples of this 2 dimensional signal f (x, y), we can compute the Fourier transformation of f (x, y) as F (u, v), which carries a factor of 1 upon capital N, where our original signal f (x, y) is of dimension capital N by capital N.

The Fourier transformation expression comes as f (x, y) into e to the power minus j 2 pi by N into (ux plus vy), summed over both x and y from 0 to capital N minus 1. If I rearrange this particular expression, then it can be written in the form 1 upon capital N into the summation of e to the power minus j 2 pi by capital N ux, multiplied by capital N, and then 1 upon N into another summation of f (x, y) e to the power minus j 2 pi by capital N vy.

So, in the inner summation, it is taken from y equal to 0 to capital N minus 1 and the outer
summation is taken from x equal to 0 to capital N minus 1 and here we have seen that this inner
summation, this gives the Fourier transformation of different rows of the input image f (x, y) and
the outer summation, this outer summation gives the Fourier transformation of different columns
of the intermediate result that we have obtained.

So, the advantage of this separability property that we have seen in the last class is that, because of it, we can do the 2 dimensional Fourier transformation in 2 steps. In the first step, we take the Fourier transformation of every individual row of the input image array and in the second step, we take the Fourier transformation of every column of the intermediate result obtained in the first step. So now, the implementation of the 2 dimensional Fourier transformation becomes very easy.

(Refer Slide Time: 6:19)

So, the scheme that we have said in the last class is like this. If I have an input array given by f
(x, y) where this is the x dimension, this is the y dimension. So, first what we do is we do row
transformation that is take Fourier transformation of every row of the input image, multiply the
result by capital N. So, what I get is an intermediate result array and this intermediate result array
gives Fourier transformation of different rows of the input image.

So, this is represented as F (x, v) and this is my x dimension and this becomes the v dimension.
And after getting this intermediate result, I take the second step of the Fourier transformation and
now the Fourier transformation is taken for every column. So, I do column transformation and
that gives us the final result of the 2 dimensional Fourier transformations F (u, v). So, this
becomes my u axis, the frequency axis u, this becomes frequency axis v and of course, this is the
origin (0, 0).

So, it shows that because of the separability property, the implementation of the 2 dimensional Fourier transformation has been simplified, because the 2 dimensional Fourier transformation can now be implemented as 2 steps of 1 dimensional Fourier transformation operations; that is how we get the final Fourier transformation F (u, v) as a sequence of 1 dimensional Fourier transformations, and we have seen in the last class that the same is also true for the inverse Fourier transformation.

The inverse Fourier transformation is also separable. So, given an array F (u, v), we can first do the inverse Fourier transformation of every row followed by the inverse Fourier transformation of every column, and that gives us the final output in the form of f (x, y), which is the image array. So, this is the advantage that we get because of the separability property of the Fourier transformation.

(Refer Slide Time: 9:05)

The second property that we have discussed in the last class is the translation property. The translation property says that if we have an input image array f (x, y) and translate this input image by (x 0 , y 0 ), what we get is the translated image f (x minus x 0 , y minus y 0 ). If we take the Fourier transformation of this, we have found that the Fourier transformation of this translated image, which we had represented as F t (u, v), became equal to F (u, v) into e to the power minus j 2 pi by capital N (ux 0 plus vy 0 ).

So, in this case, the Fourier transformation of the translated image is F (u, v), that is the Fourier transform of the original image f (x, y), multiplied by e to the power minus j 2 pi by N (ux 0 plus vy 0 ). So, if we consider the Fourier spectrum of this particular signal, you will find that the modulus of F t (u, v) will be the same as the modulus of F (u, v).

Now, this term e to the power minus j 2 pi by N (ux 0 plus vy 0 ) simply introduces an additional phase shift, but the Fourier spectrum remains unchanged. In the same manner, if the Fourier transform F (u, v) is translated by (u 0 , v 0 ), that is instead of taking F (u, v) we take F (u minus u 0 , v minus v 0 ), which obviously is the version of F (u, v) translated by the vector (u 0 , v 0 ) in the frequency domain, and if I take the inverse Fourier transform of this, the inverse Fourier transform will be f (x, y) into e to the power j 2 pi by N into (u 0 x plus v 0 y). This can also be derived in the same manner in which we have done the forward Fourier transformation.

So here, you find that if f (x, y) is multiplied by the exponential term e to the power j 2 pi by N (u 0 x plus v 0 y), then in the frequency domain, its Fourier transform is simply translated by the vector (u 0 , v 0 ); what we get is F (u minus u 0 , v minus v 0 ). So, under this translation property, the DFT pair becomes: if we have f (x, y) e to the power j 2 pi by capital N (u 0 x plus v 0 y), the corresponding Fourier transformation is F (u minus u 0 , v minus v 0 ), and if we have the translated image f (x minus x 0 , y minus y 0 ), the corresponding Fourier transformation is F (u, v) e to the power minus j 2 pi by N (ux 0 plus vy 0 ).

So, these 2 expressions give you the Fourier transform pairs, the DFT pairs, under translation. These are the 2 properties that we have discussed in the last lecture.

(Refer Slide Time: 14:15)

Today, let us talk about some other properties. The third property that we will talk about is the periodicity and conjugate property. The periodicity property says that both the discrete Fourier transform and the inverse discrete Fourier transform, that is the DFT and the IDFT, are periodic with a period of capital N. So, let us see how this periodicity can be proved.

The periodicity property says that F (u, v), the Fourier transform of our signal f (x, y), is equal to F (u plus N, v), which is the same as F (u, v plus N), which is the same as F (u plus capital N, v plus capital N). This is what is meant by periodic: the Fourier transformation F (u, v) is periodic both in the u direction and in the v direction, which gives rise to F (u, v) equal to F (u plus N, v plus N), which is the same as F (u plus N, v) and also the same as F (u, v plus N).

Now, let us see how we can derive or we can prove this particular property. So, you have seen
the Fourier transformation expression as we have discussed many times F (u, v) is equal to
double summation f (x, y) e to the power minus j 2 pi by capital N ux plus vy. Of course, we
have to have the scaling factor 1 upon capital N where both x and y vary from 0 to capital N
minus 1.

Now, if we try to compute F (u plus capital N, v plus capital N) then what do we get? Following
the same expression, this will be nothing but 1 upon capital N then double summation f (x, y) e
to the power minus j 2 pi upon capital N and now we will have ux plus vy plus capital Nx
because now u is replaced by u plus capital N, so we will have capital Nx plus capital Ny where
both x and y will vary from 0 to capital N minus 1.

Now, this same expression, if we take out this capital Nx and capital Ny in a separate
exponential, then this will take the form 1 upon capital N double summation f (x, y) e to the
power minus j 2 pi upon capital N ux plus vy into e to the power minus j 2 pi into x plus y. Now,
if you look at this second exponential term that is e to the power minus j 2 pi x plus y, you will
find that x and y are the integer values. So, x plus y will always be integer. So, this will be the
exponential e to the power minus j some k times 2 pi and because this is an exponentiation of
some integer multiple of 2 pi; so the value of this second exponential will always be equal to 1.

So finally, what we get is 1 upon capital N into the double summation of f (x, y) into e to the power minus j 2 pi upon capital N into (ux plus vy), and you will find that this is exactly the expression for F (u, v). So, as we said, the discrete Fourier transformation is periodic with period capital N both in the u direction as well as in the v direction, and that can very easily be proved like this; by this mathematical derivation, we have found that F (u plus capital N, v plus capital N) is the same as F (u, v), and the same is true in case of the inverse Fourier transformation.

So, if we derive the inverse Fourier transformation, then we will get a similar result showing that the inverse Fourier transformation is also periodic with period capital N. Now, the other property that we mentioned is the conjugate property.

(Refer Slide Time: 20:09)

The conjugate property says that if f (x, y) is a real valued function, then the Fourier transformation F (u, v) will be equal to F star (minus u, minus v), where F star indicates the complex conjugate, and obviously because of this, if I take the Fourier spectrum, the magnitude of F (u, v) will be the same as the magnitude of F (minus u, minus v). So, this is what is known as the conjugate property of the discrete Fourier transformation.
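
A numpy check of these two properties for a real-valued image (my own sketch; the direct summation below is unnormalized, which does not affect either property):

    import numpy as np

    N = 8
    rng = np.random.default_rng(4)
    f = rng.standard_normal((N, N))          # real-valued "image"
    F = np.fft.fft2(f)

    x = np.arange(N)
    def dft_at(u, v):                        # the double summation evaluated at one (u, v)
        kernel = np.exp(-2j * np.pi * (u * x[:, None] + v * x[None, :]) / N)
        return (f * kernel).sum()

    print(np.allclose(dft_at(2, 3), dft_at(2 + N, 3 + N)))   # periodicity with period N

    idx = (-np.arange(N)) % N                # indices of (-u) mod N and (-v) mod N
    Fm = F[idx][:, idx]                      # F(-u, -v)
    print(np.allclose(F, np.conj(Fm)))       # conjugate property for real f
    print(np.allclose(np.abs(F), np.abs(Fm)))  # hence equal spectra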

Now, you will find that the periodicity property helps to visualize the Fourier spectrum of a given signal. So, let us see how this periodicity property helps us to properly visualize the Fourier spectrum. For this, we will consider a 1 dimensional signal; obviously, this can very easily be extended to a 2 dimensional signal.

(Refer Slide Time: 21:43)

By this, what we mean is: if we have a 1 dimensional signal, say f (x), whose Fourier transform is given by capital F (u), then as we said, the periodicity property says that F (u) is equal to F (u plus capital N) and also the Fourier spectrum, the magnitude of F (u), is the same as the magnitude of F (minus u). So, this says that F (u) has a period of length capital N and, because the spectrum of F (u) is the same as that of F (minus u), the magnitude of the Fourier transform is centered at the origin. To see what we mean by this, let us consider a figure like this.

(Refer Slide Time: 22:43)

You will find that this is a typical Fourier spectrum of a particular signal, and here you find that this Fourier spectrum is centered at the origin; if you look at the frequency axis, that is the u axis, you will find that the magnitude of F (minus u) is the same as the magnitude of F (plus u).

So, this figure shows that if we look at the transform values from N by 2 plus 1 to N minus 1, these are nothing but the transform values of the half period lying to the left of the origin.

So, the transform values from N by 2 plus 1 to N minus 1 are nothing but the half period to the left of the origin 0. But what we have done is compute the Fourier transformation in the range 0 to N minus 1, so we get all the Fourier coefficients for values of u in the range 0 to N minus 1 and, because of this periodicity and conjugate property, we find that in this range 0 to capital N minus 1, what we get is 2 back to back half periods of this interval: this is one half period, this is another half period and they are placed back to back.

So, to display these Fourier transformation coefficients in the proper manner, what we have to do
is we have to displace the origin by a value capital N by 2.

(Refer Slide Time: 25:26)

So, by this displacement, what we get is this. Here you will find that in this particular case the origin has been shifted to capital N by 2. So now, instead of considering the Fourier transformation F (u), we are considering the Fourier transformation F (u minus capital N by 2), and for this displacement, what we have to do is multiply f (x) by minus 1 to the power x.

So, every f (x) has to be multiplied by minus 1 to the power x and if you take the DFT of this result, then what you get is the Fourier transformation coefficients in this particular form; this comes from the translation or shifting property of the Fourier transformation. So, this operation we have to do if we want a proper display of the Fourier transformation coefficients.
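
A hedged numpy sketch of this centering trick in two dimensions (my own illustration): multiplying f (x, y) by minus 1 to the power (x plus y) before taking the DFT moves the (0, 0) coefficient to the middle of the array, which is the same effect np.fft.fftshift produces for even N:

    import numpy as np

    N = 8
    rng = np.random.default_rng(5)
    f = rng.standard_normal((N, N))

    x = np.arange(N)[:, None]
    y = np.arange(N)[None, :]
    F_centered = np.fft.fft2(f * (-1.0) ** (x + y))    # multiply by (-1)^(x+y), then DFT
    F_shifted = np.fft.fftshift(np.fft.fft2(f))        # DFT, then shift origin to centre

    print(np.allclose(F_centered, F_shifted))          # True for even N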

(Refer Slide Time: 26:31)

The next property that we will talk about is the rotation property of the discrete Fourier transformation. To explain this rotation property, we will introduce the polar coordinate system, that is we will now replace x by r cosine theta, y by r sine theta, u by omega cosine phi and v by omega sine phi.

By this, our original 2 dimensional signal f (x, y) gets transformed into f (r, theta) and the Fourier transform coefficients F (u, v) get transformed into F (omega, phi). Now, using these polar coordinates, if we compute the Fourier transformation, it will be found that for f (r, theta plus theta 0 ), the corresponding Fourier transformation will be given by capital F (omega, phi plus theta 0 ).

So, this will be the Fourier transformation pair in the polar coordinate system. It indicates that our original signal was f (r, theta); if I rotate this f (r, theta) by an angle theta 0 , then the rotated image becomes f (r, theta plus theta 0 ), and if I take the Fourier transform of f (r, theta plus theta 0 ), that is the image rotated by an angle theta 0 , then the Fourier transform becomes F (omega, phi plus theta 0 ), where F (omega, phi) was the Fourier transform of the original image f (r, theta).

So, this simply says that if I rotate an image f (x, y) by an angle, say theta 0 , its Fourier transformation will also be rotated by the same angle theta 0 , and that is obvious from this particular expression because f (r, theta plus theta 0 ) gives rise to the Fourier transformation F (omega, phi plus theta 0 ), where F (omega, phi) was the Fourier transformation of f (r, theta). So, by rotating an input image by an angle theta 0 , the corresponding Fourier transform is also rotated by the same angle theta 0 .

So, to illustrate this, let us come to this particular figure.

(Refer Slide Time: 29:47)

So here, you find that we had an image where the pixel values are equal to 1 within a rectangle and the pixel values are equal to 0 outside it, and the corresponding Fourier transformation is this; here, the Fourier spectrum is represented in the form of intensity values in an image. The second pair shows the same rectangle now rotated by an angle of 45 degrees.

So here, we have rotated this rectangle by an angle of 45 degrees, and if you compare the Fourier transformation of the original rectangle with the Fourier transformation of this rotated rectangle, you will find that the Fourier transform coefficients are also rotated by the same angle of 45 degrees. So, this illustrates the rotation property of the discrete Fourier transformation.
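
A rough, qualitative sketch of this experiment (my own code, and only approximate, because rotating a sampled image requires interpolation and clips the corners; scipy.ndimage.rotate and matplotlib are assumed to be available):

    import numpy as np
    from scipy import ndimage
    import matplotlib.pyplot as plt

    img = np.zeros((128, 128))
    img[54:74, 44:84] = 1.0                            # a bright rectangle on a dark background

    rot = ndimage.rotate(img, 45, reshape=False, order=1)   # rotate by 45 degrees

    spec = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))     # centred log spectra
    spec_rot = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(rot))))

    fig, axes = plt.subplots(1, 4, figsize=(10, 3))
    for ax, im, title in zip(axes, [img, spec, rot, spec_rot],
                             ["image", "spectrum", "rotated", "rotated spectrum"]):
        ax.imshow(im, cmap="gray")
        ax.set_title(title)
        ax.axis("off")
    plt.show()    # the rotated spectrum appears rotated by roughly the same 45 degrees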

(Refer Slide Time: 31:06)

The next property that we will talk about is what is called the distributivity and scaling property. The distributivity property says: take 2 arrays f 1 (x, y) and f 2 (x, y), take the summation of these 2 arrays, that is f 1 (x, y) plus f 2 (x, y), and then take the Fourier transform of this result.

Now, this Fourier transformation will be same as the Fourier transformation of f 1 (x, y) plus
Fourier transformation of f 2 (x, y). So, this is true under addition. That is for these 2 signals f 1 (x,
y) and f 2 (x, y) if I take the addition, if I take the summation and then take the Fourier
transformation; the Fourier transformation of this will be the summation of the Fourier
transformation of individual signals f 1 (x, y) and f 2 (x, y).

But if I take the multiplication, that is if I take f 1 (x, y) into f 2 (x, y) and take the Fourier transformation of this product, this in general is not equal to the Fourier transform of f 1 (x, y) into the Fourier transform of f 2 (x, y), and the same is true for the inverse Fourier transformation.

So, this shows that the discrete Fourier transformation and its inverse are distributive over addition, but the discrete Fourier transformation and its inverse are, in general, not distributive over multiplication. So, the distributivity property is valid for the addition of 2 signals but it is not, in general, valid for the multiplication of 2 signals.
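
A quick numpy check of both statements (my own sketch with arbitrary arrays):

    import numpy as np

    rng = np.random.default_rng(6)
    f1 = rng.standard_normal((8, 8))
    f2 = rng.standard_normal((8, 8))

    print(np.allclose(np.fft.fft2(f1 + f2),
                      np.fft.fft2(f1) + np.fft.fft2(f2)))   # True: distributive over addition
    print(np.allclose(np.fft.fft2(f1 * f2),
                      np.fft.fft2(f1) * np.fft.fft2(f2)))   # False: not over multiplication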

(Refer Slide Time: 34:01)

So, the next property of the same discrete Fourier transform that we will talk about is the scaling property. The scaling property says that if we have 2 scalar quantities a and b, then given a signal f (x, y), if we multiply it by the scalar quantity a, its corresponding Fourier transformation will be F (u, v) multiplied by the same scalar quantity, and the inverse is also true.

So, if I multiply a signal by a scalar quantity a and take its Fourier transformation, then the Fourier transformation of this multiplied signal is nothing but the Fourier transformation of the original signal multiplied by the same scalar quantity, and the same is true for the reverse, that is for the inverse Fourier transformation.

The second one is: if I take f (ax, by), that is if the dimension x is scaled by the scalar quantity a and the dimension y is scaled by the scalar quantity b, the corresponding Fourier transformation will be 1 upon (a into b) times the Fourier transformation F (u by a, v by b), and the reverse also holds. So, these are the scaling properties of the discrete Fourier transformation.

Now, we can also compute the average value of the signal f (x, y). Now, the average value for f
(x, y) is given by if I represent it like this, this is nothing but 1 upon capital N square into
summation of f (x, y) where the summation has to be taken for x and y varying from 0 to capital
N minus 1. So, this is what is the average value of the signal f (x, y).

Now, consider the Fourier transform coefficient F (0, 0); what is this coefficient? This is nothing but 1 upon capital N into the double summation of f (x, y), taken for x and y varying from 0 to capital N minus 1, because all the exponential terms reduce to a value of 1.

So, you will find that there is a direct relation between the average of the 2 dimensional signal f (x, y) and its 0’th DFT coefficient. This clearly shows that the average value of f (x, y) is nothing but 1 upon capital N into the 0’th discrete Fourier transform coefficient F (0, 0); and because here the frequency u is equal to 0 and the frequency v is equal to 0, this coefficient is nothing but the DC component of the signal. So, the DC component divided by N gives you the average value of the particular signal.
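A quick NumPy check of this relation could look like the following; note that NumPy’s fft2 carries no 1 upon N factor, so that scaling is applied by hand to match the convention used here.

```python
import numpy as np

# Any N x N test array will do for this check.
N = 8
f = np.random.rand(N, N)

# NumPy's forward FFT has no normalisation, so F[0, 0] is the plain sum of f.
F = np.fft.fft2(f)

# Under the (1/N) * sum convention used here, F(0, 0) = sum(f) / N, so the
# average value (1/N^2) * sum(f) equals F(0, 0) / N.
F00 = F[0, 0] / N                     # restore the 1/N factor by hand
print(np.isclose(F00 / N, f.mean()))  # True: DC component / N == average
```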

(Refer Slide Time: 37:53)

The next property, this we have already discussed in one of our earlier lectures when we have
discussed about the sampling and quantization. That is the convolution property. In case of
convolution property, we have said that if we have say 2 signals f (x), multiply this with the
signal g (x); then the Fourier transform in the frequency domain, this is equivalent to F of u
convolution with G of u.

Similarly, if I take the convolution of 2 signals f (x) and g (x), the corresponding Fourier transformation in the Fourier domain will be the multiplication of F (u) and G (u). So, the convolution of 2 signals in the spatial domain is equivalent to multiplication of the Fourier transformations of the same signals in the frequency domain; on the other hand, multiplication of 2 signals in the spatial domain is equivalent to convolution of the Fourier transforms of the same signals in the frequency domain. So, this is what is known as the convolution property.

The other one is called the correlation property. The correlation property says that if we have 2
signals say f (x, y) and g (x, y), so now we are taking 2 dimensional signals and if I take the
correlation of these 2 signals say f (x, y) and g (x, y); in the frequency domain, this will be
equivalent to the multiplication F star (u, v) where this star indicates the complex conjugate into
G (u, v).

And similarly, if I take the multiplication in the spatial domain, that is f star (x, y) into g (x, y), then in the frequency domain this will be equivalent to F (u, v) correlation with G (u, v). So, these are the 2 properties which are known as the convolution property and the correlation property of the Fourier transformation. So with this, we have discussed the various properties of the discrete Fourier transformation.
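A small NumPy check of the convolution property could look like the following; note that the property applies to circular (periodic) convolution, which is what is computed directly here.

```python
import numpy as np

# Two short 1-D signals (arbitrary sample values, chosen only for this check).
f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, 0.0, 2.0])
N = len(f)

# Circular convolution computed directly in the spatial domain.
conv_direct = np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                        for n in range(N)])

# The same result via the convolution property: multiply the DFTs,
# then take the inverse DFT.
conv_via_dft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.allclose(conv_direct, conv_via_dft))   # True
```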

(Refer Slide Time: 41:03)

Now, let us see an implementation of the Fourier transformation. If you look at the expression of the Fourier transformation, which we have written many times, it is F (u, v) equal to 1 upon capital N into the double summation of f (x, y) e to the power minus j 2 pi by capital N (ux plus vy), where both x and y vary from 0 to capital N minus 1. If you analyze this particular expression, which we have done earlier also in relation with the unitary transformation, you will find that it takes of the order of N to the power 4 computations.

In case of a 1 dimensional signal, F (u) is given by the summation of f (x) e to the power minus j 2 pi by capital N ux, taken over x equal to 0 to capital N minus 1, and scaled by 1 upon N. This particular expression takes N square computations, and each of these computations is a complex addition and multiplication operation. So, you find that a computational complexity of N square for a data set of size capital N is quite high. Now, we have discussed earlier that if our transformations are separable, then we can go for a fast implementation of the transformations.

Let us see how that fast implementation can be done in case of this discrete Fourier
transformation. So, because of this separability property, we can implement this discrete Fourier
transformation in a faster way. For that, let us represent the expression F (u) equal to 1 upon capital N into the summation of f (x) e to the power minus j 2 pi by N ux, taken over x equal to 0 to capital N minus 1, in the form F (u) equal to 1 upon capital N into the summation of f (x) W N to the power ux, where x varies from 0 to capital N minus 1.

Here, this W N is nothing but e to the power minus j 2 pi by capital N; we have simply introduced this term to simplify our expressions. Now, if I assume, which generally is the case, that the number of samples N is a power of 2, then this capital N can be represented as 2 into capital M, where capital M is N by 2, and let us see how this particular assumption helps us.

(Refer Slide Time: 44:28)

And with this assumption, we can now rewrite F (u) as 1 upon 2 M into the summation of f (x) W 2 M to the power ux, where x now varies from 0 to 2 M minus 1, because N is equal to 2 M. The same expression I can rewrite as half into [1 upon capital M summation f (2x) W 2 M to the power (u into 2x) plus 1 upon capital M summation f (2x plus 1) W 2 M to the power (u into (2x plus 1))], where in both summations x varies from 0 to capital M minus 1.

Now, you see what we have done. f (2x), as x varies from 0 to capital M minus 1, gives us only the even samples of our input sequence; similarly, f (2x plus 1), as x varies from 0 to capital M minus 1, gives us only the odd samples of the input sequence. So, we have simply separated out the even samples from the odd samples. If I further simplify, using the fact that W 2 M to the power (2ux) is equal to W M to the power ux, this expression can be written in the form half into [1 upon capital M summation f (2x) W M to the power ux plus 1 upon capital M summation f (2x plus 1) W M to the power ux into W 2 M to the power u].

So, after some simplification, the same expression can be written in this particular form. Now, if you analyze this expression, you will find that the first summation gives you the Fourier transform of the even samples, which we write as F even (u), and the quantity in the second summation gives you the Fourier transform of the odd samples, which we write as F odd (u); in this particular case, u varies from 0 to capital M minus 1.

So, by separating the even samples and the odd samples, I can compute the Fourier transformation of the even samples to give me F even (u), I can compute the Fourier transformation of the odd samples to give me F odd (u), and then I can combine these 2 to give me the DFT coefficients F (u) for u from 0 to capital M minus 1.

(Refer Slide Time: 48:37)

So effectively, what we have got is F (u) equal to half of [F even (u) plus F odd (u) into W 2 M to the power u]. Now, from the definition of W M we can also show that W M to the power (u plus M) is the same as W M to the power u, and that W 2 M to the power (u plus M) is the same as minus W 2 M to the power u. This tells us that F (u plus capital M) is nothing but half of [F even (u) minus F odd (u) into W 2 M to the power u].

So here again, u varies from 0 to capital M minus 1, which means this second expression gives us the coefficients from capital M to 2 M minus 1. So, I get back all the coefficients: the first expression gives the coefficients from 0 to capital M minus 1 and the second gives the coefficients from capital M to 2 M minus 1. Now, what is the advantage that we have got? In our original formulation, we have seen that the number of complex multiplications and additions was of the order of N square.

Now, we have divided the N samples into 2 halves. When I compute the discrete Fourier transformation of each half, the amount of computation is N square by 4; so, considering the 2 halves separately, the total amount of computation is of the order of N square by 2.

So, straightaway we have got a reduction in the computation by a factor of 2. Further, the even half and the odd half of the samples can themselves be subdivided: from N by 2 we can go to N by 4 samples; from N by 4 to N by 8 samples; from N by 8 to N by 16 samples and so on, until we are left with only 2 samples.

So, if we go on breaking this sequence of samples into smaller sizes, compute the DFTs of each of those smaller sets of samples and then combine them together, you will find that we gain enormously in the amount of computation; it can be shown that for this fast Fourier transform implementation, the total number of computations is of the order of N log N, where the logarithm is taken to base 2.

So, this gives an enormous gain in computation as against the N square computations that are needed for the direct implementation of the discrete Fourier transformation. With this, we have come to the end of our discussion on the Fourier transformation.
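A minimal recursive sketch of this even and odd splitting, written in Python with NumPy, could look like the following; the 1 upon N scaling used in the lecture is left out so that the result can be compared directly with NumPy's own FFT.

```python
import numpy as np

def fft_radix2(x):
    """Recursive decimation-in-time FFT for a sequence whose length is a
    power of two, following the even/odd split described above.  The
    lecture's 1/N scaling is omitted (apply it at the end if desired)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:                       # a single sample is its own DFT
        return x
    F_even = fft_radix2(x[0::2])     # DFT of the even-indexed samples
    F_odd = fft_radix2(x[1::2])      # DFT of the odd-indexed samples
    # twiddle factors W_N^u = exp(-j 2 pi u / N) for u = 0 .. N/2 - 1
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([F_even + W * F_odd,    # F(u) for u = 0 .. N/2 - 1
                           F_even - W * F_odd])   # F(u + N/2)

x = np.random.rand(8)
print(np.allclose(fft_radix2(x), np.fft.fft(x)))  # True
```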

(Refer Slide Time: 52:31)

Now, let us discuss about these questions that we have given in our last class. The first question
we said that find out DFT coefficients of a digital image f (x, y) of size capital N by capital N
where f (x, y) equal to 1 for all values of x and y. Now, this computation is very simple.

(Refer Slide Time: 53:10)

Here, you find that F (u, v) will be 1 upon N into the double summation of f (x, y) e to the power minus j 2 pi by capital N (ux plus vy). For simplicity, let me break it into 2 summations; and since f (x, y) is equal to 1, I can simply drop the term f (x, y). So, this becomes 1 upon capital N into the summation over x of e to the power minus j 2 pi by capital N ux into the summation over y of e to the power minus j 2 pi by capital N vy.

Now, let us take one of these terms. If I expand the summation over y of e to the power minus j 2 pi by capital N vy, this will be 1 plus e to the power minus j 2 pi by capital N v plus e to the power minus j 2 pi by capital N into 2v, and it continues like this for a total of capital N terms; and if you look at this particular series, it is nothing but a GP series having capital N terms.

So, for v not equal to 0, this summation is [1 minus e to the power minus j 2 pi v] divided by [1 minus e to the power minus j 2 pi by capital N into v], which is equal to 0 because the numerator vanishes while the denominator does not; and for v equal to 0, every term of the series is 1, so the summation is equal to capital N. The same is the case for the other summation over x. So, by substituting this result in the expression, what we get is that F (u, v) is equal to capital N when both u and v are equal to 0, and F (u, v) is equal to 0 otherwise. So, this is the final result that we get for the first problem.
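A quick NumPy check of this result could be the following; NumPy's fft2 does not include the 1 upon N factor, so it is applied by hand to match the convention used here.

```python
import numpy as np

N = 4
f = np.ones((N, N))                  # f(x, y) = 1 for all x, y

# Divide by N to match the (1/N) * sum convention used in the lecture.
F = np.fft.fft2(f) / N

print(np.round(np.abs(F), 6))
# Only the (0, 0) coefficient is non-zero, and it equals N (= 4 here).
```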

Now coming to the second problem: consider the sample values of a 1 dimensional signal as given below; find out the DFT coefficients and also show that the inverse DFT produces the original sample values. This is very simple. You simply substitute these sample values in our DFT expression to get the DFT coefficients F (0), F (1), F (2) and F (3); then, whatever values you get as the coefficients, you substitute those values in our inverse DFT expression and you will see that you get back the same sample values.
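A small NumPy sketch of this round trip could be the following; arbitrary sample values are used here, since the actual values of the exercise are not reproduced.

```python
import numpy as np

# Hypothetical sample values for the 1-D signal.
f = np.array([2.0, 3.0, 4.0, 4.0])

F = np.fft.fft(f)           # DFT coefficients F(0) .. F(3)
f_back = np.fft.ifft(F)     # inverse DFT

print(np.allclose(f_back.real, f))   # True: the original samples are recovered
```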

(Refer Slide Time: 56:51)

Now, coming to the today’s questions, the today’s questions are: what is the time complexity of
fast Fourier transformation, the second question is show that the discrete Fourier transformation
and its inverse are periodic functions, third question is find out the Fourier coefficients for the
following set of 1 dimensional signal using the fast Fourier transformation technique and verify
that the result obtained using the fast Fourier transformation technique is same as that using
direct implementation of discrete Fourier transformation technique.

Thank you.

Digital image processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 15
Discrete Cosine Transform
Walsh Transform
Hadamard Transform

Welcome to this lecture on digital image processing. In our last class, we have discussed about
the discrete Fourier transform.

(Refer Slide Time: 00:54)

We have seen both the continuous domain and the discrete domain Fourier transformation. We have seen the properties of the Fourier transform, specifically the separability, translation, periodicity and conjugate properties, the rotation property, distributivity and scaling, and the convolution and correlation properties; and then finally, we have seen a fast implementation of the discrete Fourier transform, which we have called the fast Fourier transform or FFT operation.

(Refer Slide Time: 1:36)

In today’s lecture, we will talk about some other transformations in the digital domain. We will
talk about the discrete cosine transform, we will talk about the discrete Walsh transform, we will
talk about discrete Hadamard transform and we will also see some properties of these different
transformation techniques.

Now, during the last 2 classes, when we have talked about the discrete Fourier transformation, you might have noticed one thing: this discrete Fourier transformation is nothing but a special case of a class of separable transformations.

Some of these discussions, we have done while we have talked about the unitary transformation.
Now, before we start our discussion on the discrete cosine transformation or Walsh
transformation or Hadamard transform, let us have some more insight on this class of
transformations. Now, as we said that discrete Fourier transformation is actually a special case of
a class of transformations.

(Refer Slide Time: 2:53)

Let us see what is that class of transformation. You will find that if we define a transformation of
this form say T (u, v) is equal to double summation f (x, y) where f (x, y) is the 2 dimensional
signal into g (x, y, u, v) where both x and y vary from 0 to capital N minus 1.

So, we are assuming that our 2 dimensional signal f (x, y) is an N by N array, capital N by capital
N array, and the corresponding inverse transformation is given by f (x, y) equal to the double summation, over u and v varying from 0 to capital N minus 1, of the transform coefficients T (u, v) into h (x, y, u, v). Here, g (x, y, u, v) is called the forward transformation kernel and h (x, y, u, v) is called the inverse transformation kernel or the basis function.

Now, these transformations, this class of transformation will be separable if we can write g (x, y,
u, v) in the form g 1 (x, u) into g 2 (y, v). So, if g (x, y, u, v) can be written in the form g 1 (x, u)
into g 2 (y, v), then this transformation will be a separable transformation. Moreover, if g 1 and g 2
these are functionally same that means if I can write this as g 1 (x, u) into g 1 (y, v) that is I am
assuming g 1 and g 2 to be functionally same.

So, in that case, these class of transformations will be separable obviously because g (x, y, u, v)
we have written as product of 2 functions - g 1 (x, u) into g 2 (y, v). And since g 1 (x, u) and g 2 (y,
v); so this function g 1 and g 2 , they are functionally same, so this I can write as g 1 (x, u) into g 1
(y, v) and in this case, the function will be called as symmetric.

So here, what we have is that this particular class of transformations is called separable as well as symmetric, and the same is also true for the inverse transformation kernel h (x, y, u, v).

(Refer Slide Time: 6:26)

Now, you will find that for the 2 dimensional discrete Fourier transformation, we had g (x, y, u, v) of the form e to the power minus j 2 pi by capital N into (ux plus vy), and of course we had the multiplicative term 1 upon capital N. So, this was the forward transformation kernel in case of the 2 dimensional discrete Fourier transform or 2D DFT.

Obviously, this transformation is separable as well as symmetric because I can now write this g
(x, y, u, v) as g 1 (x, u) multiplied by g 1 (y, v) which is nothing but 1 over square root of capital N
e to the power minus j 2 pi by capital N ux into 1 over square root of N e to the power minus j 2
pi by capital N vy.

So, you find that the first factor g 1 (x, u) and the second factor g 1 (y, v) are functionally the same; only the arguments differ, ux in one case and vy in the other. So obviously, this 2 dimensional discrete Fourier transformation is separable as well as symmetric. As we said, the 2 dimensional discrete Fourier transformation thus represents a specific case of a class of transformations, and we had also discussed the same when we talked about the unitary transformation.

In today’s lecture, we will talk about some other transformations belonging to the same class. The first transformation belonging to this class that we will talk about is called the discrete cosine transformation or DCT. Let us see what the forward and inverse transformation kernels of this discrete cosine transform are.

(Refer Slide Time: 9:21)

So now, let us talk about the discrete cosine transform or DCT. In case of the discrete cosine transformation, the forward transformation kernel g (x, y, u, v) is given by alpha (u) into alpha (v) into cosine of (2x plus 1) u pi upon twice N into cosine of (2y plus 1) v pi upon twice N, which is the same as the inverse transformation kernel h (x, y, u, v).

So, you find that in case of the discrete cosine transformation, both the forward transformation kernel and the inverse transformation kernel are identical; and not only that, these transformation kernels are separable as well as symmetric, because I can take g 1 (x, u) equal to alpha (u) cosine of (2x plus 1) u pi upon twice N and g 1 (y, v) equal to alpha (v) cosine of (2y plus 1) v pi upon twice N.

Now, we have to see what the values of alpha (u) and alpha (v) are. Here, alpha (u) is given by square root of 1 upon capital N for u equal to 0, and it is equal to square root of 2 upon capital N for values of u equal to 1, 2 up to capital N minus 1.

So, these are the values of alpha u for different values of u and similar is the values of alpha v for
different values of v. Now, using these forward and inverse transformation kernels, let us see
how the basis functions or the basis images look like in case of discrete cosine transform.
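A minimal Python sketch of this kernel, which builds the 1 dimensional DCT basis matrix from alpha (u) and the cosine term and checks that its rows are orthonormal, could look like this.

```python
import numpy as np

def dct_matrix(N):
    """1-D DCT basis matrix built from the kernel above:
    C[u, x] = alpha(u) * cos((2x + 1) * u * pi / (2N))."""
    C = np.zeros((N, N))
    for u in range(N):
        alpha = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        for x in range(N):
            C[u, x] = alpha * np.cos((2 * x + 1) * u * np.pi / (2 * N))
    return C

C = dct_matrix(8)
# The rows are orthonormal, so C @ C.T is the identity matrix.
print(np.allclose(C @ C.T, np.eye(8)))   # True

# Because the kernel is separable and symmetric, the 2-D DCT of an
# N x N array f can be computed as C @ f @ C.T .
```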

(Refer Slide Time: 12:40)

(Refer Slide Time: 12:42)

So, this figure shows the 2 dimensional basis images or basis functions in case of discrete cosine
transformation where we have shown the basis images for an 8 by 8 discrete cosine
transformation or 8 by 8 2 dimensional discrete cosine transformations.

(Refer Slide Time: 13:12)

Now, using these kernels, now we can write the expressions for the 2 dimensional discrete cosine
transformation in the form of c (u, v) is equal to alpha (u) alpha (v) double summation f (x, y)
into cosine of 2x plus 1 into pi u upon twice N into cosine of twice y plus 1 pi v upon twice N
where both x and y vary from 0 to capital N minus 1.

Similarly, the inverse discrete cosine transformation can be written as f (x, y) is equal to double
summation alpha (u) times alpha (v) times c (u, v). So, c (u, v) is the coefficient matrix into
cosine of (twice x plus 1) u pi upon twice capital N into cosine of (twice y plus 1) into v pi upon
twice capital N and now u and v vary from 0 to capital N minus 1. So, this is the forward 2
dimensional discrete cosine transformation and this is the inverse 2 dimensional discrete cosine
transformation.

Now, you find that there is one difference between the two: in case of the forward discrete cosine transformation, the terms alpha (u) and alpha (v) are kept outside the double summation, whereas in case of the inverse discrete cosine transformation, the terms alpha (u) and alpha (v) are kept inside the double summation.

The reason is that in the forward transformation the summation is taken over x and y varying from 0 to capital N minus 1, so alpha (u) and alpha (v) are independent of the summation operation; whereas in the inverse discrete cosine transformation the double summation is taken over u and v varying from 0 to capital N minus 1, so the terms alpha (u) and alpha (v) have to be kept inside the double summation.

So, using this 2 dimensional discrete cosine transformation, let us see what kind of output we get for a given image.

(Refer Slide Time: 17:00)

This figure shows the discrete cosine transformation coefficients for the same image which is very popular in the image processing community, the image of Lena. The results are shown in 2 forms: the first figure shows the coefficients in the form of intensity values in a 2 dimensional array, whereas the other figure shows the same coefficients plotted as a surface in 3 dimensions.

Now, if you look closely at these output coefficients, you find that in case of the discrete cosine transformation, the energy of the coefficients is concentrated mostly in a particular region near the origin, that is near (0, 0), which is more visible in the 3 dimensional plot. So, the energy is concentrated in a small region of the coefficient space near the (0, 0) coefficient. This is a very important property of the discrete cosine transformation, which is called the energy compaction property.
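One rough way to see this energy compaction numerically is to take the 2 dimensional DCT of a smooth synthetic test image (standing in here for a natural image such as Lena) and measure what fraction of the total squared-coefficient energy falls in a small low-frequency corner; a possible NumPy sketch is given below.

```python
import numpy as np

N = 64
xs = np.arange(N)

# Orthonormal 1-D DCT basis matrix, C[u, x] = alpha(u) cos((2x+1) u pi / (2N)).
C = np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(xs, 2 * xs + 1) / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

# A smooth synthetic test image: a ramp plus a low-frequency sinusoid.
xx, yy = np.meshgrid(xs, xs, indexing='ij')
f = xx + yy + 20.0 * np.sin(2 * np.pi * xx / N)

F = C @ f @ C.T                        # separable 2-D DCT
energy = F ** 2
k = 8                                  # low-frequency 8 x 8 corner near (0, 0)
fraction = energy[:k, :k].sum() / energy.sum()
print(round(fraction, 4))              # close to 1: most energy sits near (0, 0)
```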

Now, among the other properties of the discrete cosine transformation, which are similar to those of the discrete Fourier transformation: as we have said, the discrete cosine transformation is separable as well as symmetric, and it is also possible to have a faster implementation of the discrete cosine transformation, or FDCT, in the same manner as we have implemented the FFT in case of the discrete Fourier transformation.

The other important property of the discrete cosine transformation is the periodicity property.
Now, in case of discrete cosine transformation, you will find that the periodicity is not same as
incase of discrete Fourier transformation. In case of Fourier transformation, we have said that the
discrete Fourier transform is periodic with period capital N where N is the number of samples. In
case of discrete cosine transformation, the magnitude of the coefficients are periodic with a
period twice N where N is the number of samples.

So, the period in case of the discrete cosine transformation is twice the period in case of the discrete Fourier transformation, and we will see later that this particular property helps to obtain smoother data compression using the discrete cosine transformation than using the discrete Fourier transformation.

The other property which obviously helps the data compression using discrete cosine
transformation is the energy compaction property because most of the signal energy or image
energy is concentrated in a very few number of coefficients near the origin or near the (0, 0)
value in the frequency domain in the uv plane.

So, by coding a few coefficients, we can represent most of the signal energy or image energy. That also helps in data compression using the discrete cosine transformation, a property which is not normally found in case of the discrete Fourier transformation.

So, after discussing about all these different properties of the discrete cosine transformation, let
us go to the other transformation which we have said as Walsh transformation.

(Refer Slide Time: 21:05)

So now, let us discuss the Walsh transform. In case of 1 D, the forward discrete Walsh transform kernel is given by g (x, u) equal to 1 upon capital N into the product, over i equal to 0 to lower case n minus 1, of minus 1 to the power b i (x) into b n minus 1 minus i (u). In this particular case, capital N is the number of samples and lower case n is the number of bits needed to represent both x and u, that is, capital N is equal to 2 to the power lower case n.

Now, the convention here is that b k (z) represents the k’th bit in the binary representation of z. That is the interpretation of the terms b i (x) and b n minus 1 minus i (u).

(Refer Slide Time: 24:00)

So, using this, the forward discrete Walsh transform in 1 dimension is given by W (u) equal to 1 upon capital N into the summation, over x equal to 0 to capital N minus 1, of f (x) into the product, over i equal to 0 to lower case n minus 1, of minus 1 to the power b i (x) into b n minus 1 minus i (u).

The inverse transformation kernel in case of this discrete Walsh transformation is essentially identical with the forward transformation kernel: h (x, u) is equal to the product, over i equal to 0 to lower case n minus 1, of minus 1 to the power b i (x) into b n minus 1 minus i (u). Using this inverse transformation kernel, we get the inverse Walsh transform as f (x) equal to the summation, over u equal to 0 to capital N minus 1, of W (u) into the product, over i equal to 0 to lower case n minus 1, of minus 1 to the power b i (x) into b n minus 1 minus i (u). So, this is the inverse kernel and this is the inverse transformation.

So here, you find that for the discrete Walsh transformation, the forward transformation and the inverse transformation are identical except for the multiplicative factor 1 upon capital N; because of this, the same algorithm that is used to perform the forward transformation can also be used to perform the inverse Walsh transformation.
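A direct, if slow, Python sketch of this 1 dimensional Walsh transform, implementing the kernel bit by bit, could be the following; the check at the end uses the fact just mentioned, that applying the same transform twice and multiplying by N returns the original samples.

```python
import numpy as np

def bit(value, k):
    """k-th bit in the binary representation of value."""
    return (value >> k) & 1

def walsh_1d(f):
    """Forward 1-D discrete Walsh transform using the kernel above.
    The length N must be a power of two; n is the number of index bits."""
    N = len(f)
    n = N.bit_length() - 1
    W = np.zeros(N)
    for u in range(N):
        total = 0.0
        for x in range(N):
            sign = 1
            for i in range(n):
                sign *= (-1) ** (bit(x, i) * bit(u, n - 1 - i))
            total += f[x] * sign
        W[u] = total / N
    return W

f = np.array([1.0, 2.0, 0.0, 3.0])            # arbitrary sample values
W = walsh_1d(f)
print(np.isclose(W[0], f.mean()))             # True: W(0) is the sample average
print(np.allclose(walsh_1d(W) * len(f), f))   # True: the same routine inverts it
```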

(Refer Slide Time: 26:35)

Now, in case of a 2 dimensional signal, we will have the forward transformation kernel g (x, y, u, v) equal to 1 upon capital N into the product, over i equal to 0 to lower case n minus 1, of minus 1 to the power [b i (x) into b n minus 1 minus i (u) plus b i (y) into b n minus 1 minus i (v)], and the inverse transformation kernel in this case is identical with the forward transformation kernel.

So, the inverse transformation kernel is given by 1 upon capital N product again i equal to 0 to
lower case n minus 1, minus 1 to the power b i (x) b n minus 1 minus i (u) plus b i (y) b n minus 1 minus i
(v).

(Refer Slide Time: 28:35)

So, using this forward transformation kernel and the inverse transformation kernel, now you find
that the inverse as well as the forward discrete Walsh transformation can now be implemented as
W (u, v) is equal to 1 upon capital N double summation f (x, y) into product i equal to 0 to n
minus 1, minus 1 to the power b i (x) into b n minus 1 minus i (u) plus b i (y) into b n minus 1 minus i (v) and
the summation has to be taken over x and y varying from 0 to capital N minus 1.

And in the same manner, because the forward transformation and the inverse transformation are identical in case of the discrete Walsh transformation, if I replace f (x, y) by W (u, v) in the same expression and take the summation over u and v varying from 0 to capital N minus 1, what I get is the inverse Walsh transformation, and I get back the original signal f (x, y) from the transform coefficients W (u, v).

So, you find that here, the same algorithm which is used for computing the forward Walsh
transformation can also be used for computing the inverse Walsh transformation. So now, let us
see that what are the basis functions of this Walsh transformation and what are the results on
some image.

(Refer Slide Time: 30:34)

So, for the Walsh transformation, the basis functions or the set of basis images appear like this. Here, the basis images are given for a 4 by 4 2D Walsh transformation, and if I apply this Walsh transformation on the same image, say Lena, this is the kind of result that we get.

(Refer Slide Time: 30:51)

So, here again you find the same behaviour as for the cosine transformation: the coefficients near the origin have the maximum energy, and as you go away from the origin in the uv plane, the energy of the coefficients reduces. So, this transformation also has the energy compaction property; but here you find that the energy compaction is not as strong as in case of the discrete cosine transformation.

So here, the concentration of coefficient energy in this particular region is not as strong as the compaction of energy in case of the discrete cosine transformation; and by analyzing the forward as well as the inverse Walsh transformation kernels, you can again find out that this Walsh transformation is separable as well as symmetric.

Not only that, for this Walsh transformation it is also possible to have a fast implementation of the 2D Walsh transformation, almost in the same manner as we have done in case of the discrete Fourier transformation, where we have computed the fast Fourier transform or FFT.

(Refer Slide Time: 32:29)

So, in case of the discrete Walsh transformation, the fast implementation will be even simpler. Because the Walsh transformation is separable, the 2D Walsh transformation can be implemented by using a sequence of 1 dimensional Walsh transformations, in the same way as we have implemented the 2D Fourier transformation; and that is also true in case of the discrete cosine transformation.

So, first you perform 1 dimensional Walsh transformation along the rows of the image and then
the intermediate result that you get, on that you perform 1 dimensional Walsh transformation
along the columns of the intermediate matrix. So, you get the final transformation coefficients.
The same is also true in case of discrete cosine transformation because the discrete cosine
transformation is also separable.
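A generic sketch of this row-then-column procedure, for any separable and symmetric transform with a 1 dimensional transform matrix T, could look like the following; the matrix used in the check is simply a 4 by 4 Walsh/Hadamard-type matrix normalised so that its rows have unit norm.

```python
import numpy as np

def transform_2d(f, T):
    """Separable 2-D transform from its 1-D transform matrix T:
    apply 1-D transforms along the rows, then along the columns of the
    intermediate result.  This is equivalent to T @ f @ T.T."""
    step1 = np.array([T @ row for row in f])           # along the rows
    step2 = np.array([T @ col for col in step1.T]).T   # along the columns
    return step2

T = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1,  1, -1],
              [1, -1, -1,  1]]) / 2.0    # rows normalised to unit norm
f = np.random.rand(4, 4)
print(np.allclose(transform_2d(f, T), T @ f @ T.T))    # True
```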

So, to illustrate the fast implementation of the Walsh transformation, I take the 1 dimensional case. Here, the fast implementation can be done in this form: W (u) equal to half of [W even (u) plus W odd (u)] and W (u plus capital M) equal to half of [W even (u) minus W odd (u)], where u varies from 0 to capital N by 2 minus 1 and capital M is equal to N by 2.

So, we find that almost in the same manner in which we have implemented the fast Fourier transformation, a fast discrete Walsh transformation can also be implemented. Here, we divide all the samples for which the Walsh transformation has to be taken into even numbered samples and odd numbered samples, compute the Walsh transformation of the even numbered samples, compute the Walsh transformation of the odd numbered samples, and then combine these 2 intermediate results to give the Walsh transformation of the total set of samples.

And this division can be applied recursively: first I have N samples, which I divide into N by 2 even samples and N by 2 odd samples; each of these N by 2 sample sets can again be divided into N by 4 even and odd samples; and if I continue this, I finally come to a stage where I am left with only 2 samples. I perform the Walsh transformation of those 2 samples and then hierarchically combine the intermediate results to get the final Walsh transformation.

So here again, by using this fast implementation of the Walsh transformation, you may find that
the computational complexity will be reduced drastically.

(Refer Slide Time: 35:53)

So, after discussing the Walsh transformation, let us go to the next transformation, which is called the Hadamard transform. In case of the Hadamard transform, let us first consider the 1 dimensional case. The forward transformation kernel is given by g (x, u) equal to 1 upon capital N into minus 1 to the power [the summation, over i equal to 0 to lower case n minus 1, of b i (x) into b i (u)].

So again, capital N as well as lower case n have the same interpretation as in case of the Walsh transformation; and using this forward transformation kernel, the forward Hadamard transformation can be obtained as H (u) equal to 1 upon capital N into the summation, over x equal to 0 to capital N minus 1, of f (x) into minus 1 to the power [the summation, over i equal to 0 to lower case n minus 1, of b i (x) into b i (u)].

And for Hadamard transform also, the forward transformation as well as the inverse
transformation, they are identical. That is the forward transformation kernel and the inverse
transformation kernel, they are identical. So here again, the same algorithm can be used for
forward transformation as well as the inverse transformation.

(Refer Slide Time: 37:53)

So here, the inverse transformation kernel is given by h (x, u) is equal to minus 1 to the power
summation b i (x) into b i (u) where i varies from 0 to lower case n minus 1 and using this, the
inverse Hadamard transformation is obtained as f (x) is equal to summation u varying from 0 to
capital N minus 1 H (u) minus 1 to the power summation b i (x) into b i (u) where i varies from 0
to lower case n minus 1. So, these are the forward and inverse Hadamard transformation in case
of 1 dimension.

(Refer Slide Time: 39:03)

Obviously, this can easily be extended to 2 dimensions as in the other cases, where the 2 dimensional forward transformation kernel will be given by g (x, y, u, v) equal to 1 upon capital N into minus 1 to the power [the summation, over i equal to 0 to lower case n minus 1, of b i (x) into b i (u) plus b i (y) into b i (v)]; and similarly, the inverse transformation kernel h (x, y, u, v) is the same as g (x, y, u, v), that is 1 upon capital N into minus 1 to the power [the summation, over i equal to 0 to lower case n minus 1, of b i (x) into b i (u) plus b i (y) into b i (v)].

So, you find that the forward transformation kernel and the inverse transformation kernel in case
of 2 dimensional discrete Hadamard transformations are identical. So, that gives us the forward
transformation and the inverse transformation for the 2 dimensional discrete Hadamard
transformations to be same which enables us to use the same algorithm or same program to
compute the forward transformation as well as the inverse transformation.

And, if you analyze this, you find that this Hadamard transformation is also separable and
symmetric. That means in the same manner, this 2 dimensional Hadamard transformation can be
implemented by using a sequence of 1 dimensional Hadamard transformations. So, for the image
first we implement 1 dimensional Hadamard transformation over the rows of the image and then
implement the 1 dimensional Hadamard transformations over the columns of this intermediate
matrix and that gives you the final Hadamard transformation output.

Now, let us further analyze the kernels of this Hadamard transformation; and because we have said that the 2 dimensional Hadamard transformation can be implemented as a sequence of 1 dimensional Hadamard transformations, we analyze further with respect to a 1 dimensional Hadamard transformation.

(Refer Slide Time: 42:09)

So, as you have seen, the 1 dimensional Hadamard transformation kernel is given by g (x, u) equal to 1 upon capital N into minus 1 to the power [the summation, over i equal to 0 to lower case n minus 1, of b i (x) into b i (u)]; and let me mention here that all these summations in the exponent follow modulo 2 arithmetic, which means they are actually nothing but exclusive OR operations on the individual bits.

Now, if I analyze this 1 dimensional forward Hadamard transformation kernel, you will find that if I omit the multiplicative term 1 upon capital N, then this forward transformation kernel leads to a matrix which is known as the Hadamard matrix.

(Refer Slide Time: 43:22)

So, to see that what is this Hadamard matrix, you will find that for different values of x and u,
the Hadamard matrix will look like this. So, here we have shown a Hadamard matrix for
Hadamard transformation of dimension 8. So, for N equal to 8, this Hadamard matrix has been
formed and here plus means it is equal to plus 1 and minus means it is equal to minus 1.

Now, if you analyze this particular Hadamard matrix, you will find that it is possible to formulate a recursive relation to generate the transformation matrices. How can that be done?

You will find that the top-left, top-right and bottom-left 4 by 4 blocks of this matrix are identical, whereas the bottom-right 4 by 4 block is just the negative of them; and the same pattern can be observed in all other parts of this matrix. So, by observing this, we can formulate a recursive relation to generate these transformation matrices.

(Refer Slide Time: 45:02)

So, to have that recursive relation, let us first take the Hadamard matrix of the lowest order, that is for N equal to 2. For this lowest order, we have the Hadamard matrix H 2 whose rows are (1, 1) and (1, minus 1). Then, using this recursively, a Hadamard matrix of dimension 2 N can be obtained from a Hadamard matrix of dimension N by the relation H 2 N equal to the block matrix with rows (H N, H N) and (H N, minus H N). So, a Hadamard matrix of higher dimension can be recursively formed from a Hadamard matrix of lower dimension, and this is a very important property of the Hadamard transformation.
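A minimal Python sketch of this recursive construction, with a check that the rows of the resulting matrix are mutually orthogonal, could be the following.

```python
import numpy as np

def hadamard(N):
    """Hadamard matrix of order N (a power of two, N >= 2), built by the
    recursive relation H_2M = [[H_M, H_M], [H_M, -H_M]]."""
    if N == 2:
        return np.array([[1, 1], [1, -1]])
    H = hadamard(N // 2)
    return np.block([[H, H], [H, -H]])

H8 = hadamard(8)
# Rows (and columns) are mutually orthogonal: H8 @ H8.T = 8 * I.
print(np.allclose(H8 @ H8.T, 8 * np.eye(8)))   # True
```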

Now, let us analyze the Hadamard matrix further. If I consider the number of sign changes along a particular column, you find that the number of sign changes along column number 0 is equal to 0; along column number 1 it is equal to 7; along column number 2 it is equal to 3; along column number 3 it is equal to 4; along column 4 it is equal to 1; along column 5 it is equal to 6; along column 6 it is equal to 2; and along column 7 it is equal to 5.

So, we define the number of sign changes along a particular column, that is for a particular value of u, as the sequency; this is similar to the concept of frequency in case of the discrete Fourier transformation or the discrete cosine transformation.

So here, you will find that for u equal to 0 the sequency is equal to 0, for u equal to 1 the sequency is equal to 7, for u equal to 2 the sequency is equal to 3, and so on. So, there is no straightforward relation between the value of u and the corresponding sequency, unlike in case of the discrete Fourier transform or the discrete cosine transform, where we have seen that increasing values of the frequency variable u correspond to increasing values of the frequency components.

(Refer Slide Time: 48:27)

So, if we want to have a similar kind of concept in case of the Hadamard transformation as well, then what we need is some sort of reordering of this Hadamard matrix, and that kind of reordering can be obtained by another transformation whose kernel is given by g (x, u) equal to 1 upon capital N into minus 1 to the power [the summation, over i equal to 0 to lower case n minus 1, of b i (x) into p i (u)]. Now, instead of b i (u) we are writing p i (u), and this particular term p i (u) can be obtained from b i (u) using the following relations.

p 0 (u) is given by b n minus 1 (u); p 1 (u) is given by b n minus 1 (u) plus b n minus 2 (u); p 2 (u) is given by b n minus 2 (u) plus b n minus 3 (u); and continuing like this, p n minus 1 (u) is given by b 1 (u) plus b 0 (u), where all these summations are again modulo 2 summations, that is, they can be implemented using binary exclusive OR operations.

Now, by using this modification, the modified forward transformation kernel that you get leads to a modified Hadamard matrix. So, let us see what this modified Hadamard matrix is.

The modified Hadamard matrix that you get is of this particular form, and if you look at this modified Hadamard matrix, you find that here the sequency for u equal to 0 is again equal to 0, the sequency for u equal to 1 is equal to 1, the sequency for u equal to 2 is equal to 2; so now, for increasing values of u, we have increasing values of sequency.
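One simple way to obtain this ordered matrix in code is to count the sign changes, that is the sequency, of each row of the natural-order Hadamard matrix and then sort the rows by sequency; the p i (u) relations above produce the same ordering. A possible sketch is given below.

```python
import numpy as np

def hadamard(N):
    """Natural-order Hadamard matrix of order N (a power of two, N >= 2)."""
    if N == 2:
        return np.array([[1, 1], [1, -1]])
    H = hadamard(N // 2)
    return np.block([[H, H], [H, -H]])

def sequency(row):
    """Number of sign changes along a row."""
    return int(np.sum(row[:-1] != row[1:]))

H = hadamard(8)
seq = [sequency(r) for r in H]
print(seq)                                 # [0, 7, 3, 4, 1, 6, 2, 5]

# Reorder the rows so that sequency increases with the index u.
H_ordered = H[np.argsort(seq)]
print([sequency(r) for r in H_ordered])    # [0, 1, 2, 3, 4, 5, 6, 7]
```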

(Refer Slide Time: 50:30)

So, using this modified or ordered Hadamard forward kernel, in case of 2 dimensions the ordered Hadamard basis functions are obtained in this particular form.

(Refer Slide Time: 51:14)

And if I compare these ordered Hadamard basis functions with the Walsh basis functions, you will find that the basis images in case of the Walsh transformation and the basis images in case of the ordered Hadamard transformation are identical; there is only a difference in the ordering of the Walsh basis functions and the ordered Hadamard basis functions.

Because of this, in many cases the term Walsh-Hadamard transformation is used, and this term is actually used to mean either the Walsh transformation or the Hadamard transformation, that is, one of the two. Now, using this ordered Hadamard transformation, the result on the same image is something like this.
you get is something like this.

(Refer Slide Time: 52:17)

So here you find, again, if I look at the energy distribution of different Hadamard coefficients,
the ordered Hadamard coefficients; you will find that here the energy is concentrated more
towards 0 compared to the Walsh transformation. So, the energy compaction property of the
ordered Hadamard transformation is more than the energy compaction property of Walsh
transformation.

(Refer Slide Time: 53:07)

Now, this slide shows a comparison of the transform coefficients of these different transformations. The first one shows the coefficient matrix for the discrete Fourier transformation, the second one for the discrete cosine transformation, the third one for the discrete Walsh transformation and the fourth one for the discrete Hadamard transformation, in its ordered form.

Now, by comparing all these 4 different results; you find that in case of discrete cosine
transformation, the discrete cosine transformation has the property of strongly concentrating the
energy in very few numbers of coefficients. So, the energy compaction property of discrete
cosine transformation is much more compared to the other transformations and that is why this
discrete cosine transformation is very popular for the data compression operations unlike the
other cases.

And, in case of discrete Fourier transformation and discrete cosine transformation, though we
can associate the frequency term with the transformation coefficients, it is not possible to have
such a physical interpretation of the coefficients of the discrete Walsh transformation nor in case
of discrete Hadamard transformation.

So, though we cannot have such a kind of physical interpretation but still because of this energy
compaction property, the Hadamard transform as well as the Walsh transform can have some
application in data compression operations. So, with this we come to our end of our discussion
on the discrete cosine transformation, discrete Walsh transformation and discrete Hadamard
transformation and we have also seen some comparison with the discrete Fourier transformation.

(Refer Slide Time: 55:06)

Now, let us discuss the questions that we had given in the last class. The first one is the time complexity of the fast Fourier transformation; if you analyze the way in which the fast Fourier transform is implemented, you can easily find out that the computational complexity of the fast Fourier transformation is of the order of N log N, where N is the data size.

The second question was to show that the DFT and its inverse are periodic functions with period capital N. In the expression, if you replace u by u plus capital N or v by v plus capital N, you will find that you get back the same expression, which means that the DFT as well as the inverse DFT are periodic with a period of capital N.

The third question was to find out the discrete Fourier coefficients for the sequence of samples given here; we have to compute the fast Fourier transformation and verify that the discrete Fourier transform coefficients and the FFT coefficients are identical. This is also quite straightforward: by using the expressions that we have discussed while talking about the Fourier transformation and the discrete Fourier transformation, these coefficients can easily be computed, and it can be verified that the DFT coefficients and the FFT coefficients are identical.

(Refer Slide Time: 56:37)

Now, coming to today’s quiz questions; the first question is which property of discrete cosine
transform makes it so popular for image compression applications. The second one, what is the
transformation kernel for Walsh transformation. Third question - what is the transformation
kernel for Hadamard transform.

Fourth question - explain the significance of modified Hadamard transform. Now, in this case,
by modified Hadamard transform I mean the ordered Hadamard transform. So, explain the
significance of ordered Hadamard transform. The fifth question - find out the Walsh transform
coefficients for the following samples of a 1 dimensional signal where sample values are f (0)
equal to 3, f (1) equal to 2, f (2) equal to 5 and f (3) equal to 4.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 16
K – L Transform

Welcome to the video lecture on digital image processing. For last few classes, we were
discussing about the image transformations.

(Refer Slide Time: 1:15)

So, we have talked about the unitary transformation, we have talked about the Fourier
transformation and in the last class, we have talked about the discrete cosine transform, we have
seen the discrete Walsh transform, discrete Hadamard transform. We have seen their properties
and we have compared the performance of these transformation operations.

(Refer Slide Time: 1:32)

In today’s lecture, we will talk about another transform operation which is fundamentally
different from the transformations that we have discussed in last few classes. So, the
transformation that we will discuss about today is called K – L transformation. We will see what
is the fundamental difference between K – L transform and other transformations, we will see the
properties of K – L transform, we will see the applications of K – L transform for data alignment
and data compression operations and we will also see the computation of K – L transform for an
image.

Now, as we said, the K – L transform is fundamentally different from the other transformations; so before we start our discussion on the K – L transform, let us see what that difference is. In all the previous transformations that we have discussed, whether it is the Fourier transformation, the discrete cosine transformation, the Walsh transformation or the Hadamard transformation, the transformation kernels, both the forward and the inverse transformation kernels, are fixed.

(Refer Slide Time: 2:51)

So, for example, in case of the discrete Fourier transformation or DFT, we have seen that the transformation kernel is given by g (x, u) equal to e to the power minus j 2 pi by N into ux, and similarly for the discrete cosine transformation as well as for other transformations like the Walsh transform or the Hadamard transform. In all those cases, the transformation kernels are fixed: the value of the kernel depends only upon the locations x and u, and the kernels are independent of the data over which the transformation has to be performed.

But unlike these transformations, in case of the K – L transformation the transformation kernel is actually derived from the data. So, the K – L transform actually operates on the basis of the statistical properties of a vector representation of the data.

So, let us see how this transformation is actually obtained. To go for the K – L transformation, our requirement is that the data has to be represented in the form of vectors. So, let us consider a population of vectors x, where each vector x has components x 1 , x 2 , x 3 up to x n ; that is, the vectors x are of dimension n.

Now, given such a set or population of vectors x, we can find out the mean vector mu x , which is nothing but the expectation value of the vector population x; and similarly, we can also find out the covariance matrix C x , which is given by the expectation value of (x minus the mean vector mu x ) into (x minus mu x ) transpose.

Here, since x is of dimension n, this covariance matrix will be of dimension n by n; and obviously, the dimensionality of the mean vector mu x will be equal to n.

(Refer Slide Time: 5:57)

Now, in this covariance matrix C x , you will find that an element C ii , that is an element in the i’th row and i’th column, is nothing but the variance of the element x i of the vectors x. Similarly, an element C ij is nothing but the covariance of the elements x i and x j of the vectors x. You will also find that this covariance matrix C x is real and symmetric; and because it is real and symmetric, we can always find a set of n orthonormal Eigen vectors of this covariance matrix C x .

Now, suppose e i is an Eigen vector of this covariance matrix C x corresponding to the Eigen value lambda i , and assume that the Eigen values are arranged in descending order of magnitude; that is, lambda j is greater than or equal to lambda j plus 1 for j varying from 1 to n minus 1. So, what we are taking are the Eigen values of the covariance matrix C x and the Eigen vector corresponding to every Eigen value.

Now, from this set of Eigen vectors we form a matrix, say A, in such a way that the first row of matrix A is the Eigen vector corresponding to the largest Eigen value, and the last row of matrix A is the Eigen vector corresponding to the smallest Eigen value of the covariance matrix C x .

(Refer Slide Time: 9:50)

Now, using such a matrix A, we form our transformation as y equal to A into (x minus mu x ), where x is a vector of the population and mu x is the mean vector.

Now, the transformed vectors y that you get follow certain important relationships. The first property is that the mean of these vectors y, that is mu y , is equal to 0.

Similarly, the covariance matrix of y, given by C y , can be obtained from C x , the covariance matrix of x, and the transformation matrix A that we have generated; the relationship is C y equal to A C x A transpose. Not only that, this covariance matrix C y is a diagonal matrix whose elements along the main diagonal are the Eigen values of C x . So, C y will be of the form: lambda 1 , 0, 0 and so on in the first row; 0, lambda 2 , 0 and so on in the second row; and continuing like this until the last row ends with lambda n . So, this is the covariance matrix C y of y.
covariance matrix of y that is C y .

And obviously, in this particular case, the Eigen values of C y are the same as the Eigen values of C x , which are nothing but lambda 1 , lambda 2 up to lambda n ; it is also a fact that the Eigen vectors of C y will be the same as the Eigen vectors of C x . And since the off diagonal elements of C y are always 0, this means that the different elements of the vectors y are uncorrelated.

So, the properties of the vectors y that we have got are: the mean of the vectors is equal to 0; the covariance matrix C y can be obtained from the covariance matrix C x and the transformation matrix A; the Eigen values of C y are the same as the Eigen values of C x ; and since the off diagonal elements of C y are equal to 0, the different elements of the vectors y are uncorrelated. Now, to see the implication of these observations, let us come to the following figure.

(Refer Slide Time: 13:40)

So, in this figure we have a 2 dimensional binary image. Here we assume that at all the pixel locations which are white an object element is present, and wherever the pixel value is 0 there is no object element present.

In this particular case, the object region consists of the pixels (3, 4), (4, 3), (4, 4), (4, 5), (5, 4), (5, 5), (5, 6) and (6, 5). So, these are the pixel locations which contain the object, and the other pixel locations do not contain the object.

Now, what we plan to do is to find out the K – L transform of those pixel locations where an object is present. So, from this, we have the population of vectors which is given by this.

(Refer Slide Time: 14:50)

We consider the locations of the pixels where an object is present, that is, where the pixel is white, and those locations are considered as vectors. So, the population of vectors x is (3, 4), because at location (3, 4) an object is present; (4, 3), there also an object is present; (4, 4), here also an object is present; then (4, 5), (5, 4), (5, 5), (5, 6) and (6, 5).

So, we have 1, 2, 3, 4, 5, 6, 7, 8 vectors, that is eight 2-dimensional vectors, in this particular population.


Now, from these vectors, it is quite easy to compute the mean vector mu x and you can easily
compute that mean vector mu x in this particular case will be nothing but 4.5, 4.5. So, this is the
mean vector that we have got.

So once we have the mean vector, now we can go for computing the covariance matrix and you
will find that the covariance matrix C x was defined as the expectation value of x minus mu x
into x minus mu x transpose. So, finding out x minus mu x into x minus mu x transpose for all the
vectors x and taking the average of them gives us the expectation value of x minus mu x into x
minus mu x transpose which is nothing but the covariance matrix C x .

(Refer Slide Time: 17:06)

So here, for the first vector x 1, since x 1 is nothing but the vector (3, 4), x 1 minus mu x will be equal to (minus 1.5, minus 0.5). So, you can find out x 1 minus mu x into x 1 minus mu x transpose; if we compute this, it will be the 2 by 2 matrix with entries 2.25, 0.75, 0.75 and 0.25.

So similarly, we find out x minus mu x into x minus mu x transpose for all other vectors in the
population x and finally, average of all of them gives us the covariance matrix C x and if you
compute like this, you can easily obtain that covariance matrix C x will come out to be 0.75,
0.375, 0.375 and 0.75.
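(As an illustrative aside, not part of the original lecture: the mean vector and covariance matrix above can be reproduced with a short NumPy sketch. The array names below are my own.)

```python
# Reproduce the mean vector and covariance matrix of the 8 object pixels.
import numpy as np

# Coordinates of the white (object) pixels, each treated as a 2-dimensional vector x.
X = np.array([[3, 4], [4, 3], [4, 4], [4, 5],
              [5, 4], [5, 5], [5, 6], [6, 5]], dtype=float)

mu_x = X.mean(axis=0)            # expected: [4.5, 4.5]

# C_x = E[(x - mu_x)(x - mu_x)^T], averaging with 1/N as in the lecture
# (not the unbiased 1/(N - 1) estimate).
D = X - mu_x
C_x = (D.T @ D) / X.shape[0]     # expected: [[0.75, 0.375], [0.375, 0.75]]

print(mu_x)
print(C_x)
```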

(Refer Slide Time: 18:15)

So, this is the covariance matrix of the population of vectors x. Now, once we have this covariance matrix, to find out the K – L transformation we have to find out the Eigen values of this covariance matrix. To determine the Eigen values, as you may know, we form the matrix with entries 0.75 minus lambda, 0.375, 0.375 and 0.75 minus lambda, set its determinant equal to 0 and then solve for the values of lambda.

(Refer Slide Time: 19:08)

So, if you do this, you will find that it simply gives an equation of the form (0.75 minus lambda) squared equal to 0.375 squared. If you solve this, the solution is very simple: lambda comes out to be 0.75 plus or minus 0.375, whereby you get lambda 1 equal to 1.125 and lambda 2 equal to 0.375.

So, these are the 2 Eigen values of the covariance matrix C x in this particular case, and once we have these Eigen values, we have to find out the Eigen vectors corresponding to them.

(Refer Slide Time: 20:39)

And to find out the Eigen vectors, you know that for a given matrix, in our particular case C x, the relation is C x into a vector Z equal to lambda times Z, if Z is the Eigen vector corresponding to the Eigen value lambda. If we solve this, we get 2 different Eigen vectors corresponding to the 2 different lambdas. So, corresponding to lambda 1 equal to 1.125, we have the Eigen vector e 1 which is given as 1 upon root 2 into (1, 1). So, this will be the corresponding Eigen vector.

Similarly, corresponding to the Eigen value lambda 2 equal to 0.375, we have the Eigen vector e 2 which is equal to 1 upon root 2 into (1, minus 1). So, once we get these Eigen vectors, we can formulate the corresponding transformation matrix. As we said, we get the transformation matrix A from the Eigen vectors of the covariance matrix C x; the rows of A are the Eigen vectors of C x such that the first row is the Eigen vector corresponding to the maximum Eigen value and the last row is the Eigen vector corresponding to the minimum Eigen value.

So, in this case, the transformation matrix A will simply be 1 upon root 2 times the matrix with rows (1, 1) and (1, minus 1). Now, what is the implication of this?
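(Before turning to that, here is a small NumPy continuation of the aside above. It is only a sketch; the sign of the computed Eigen vectors may differ from the ones written on the slide without affecting the result.)

```python
# Eigen analysis of C_x and construction of the K-L transformation matrix A.
import numpy as np

C_x = np.array([[0.75, 0.375],
                [0.375, 0.75]])          # covariance matrix from the example
mu_x = np.array([4.5, 4.5])

vals, vecs = np.linalg.eigh(C_x)         # eigh returns ascending Eigen values
order = np.argsort(vals)[::-1]           # reorder: largest Eigen value first
vals = vals[order]
A = vecs[:, order].T                     # rows of A are the Eigen vectors

print(vals)                              # expected: [1.125, 0.375]

# Forward K-L transform y = A (x - mu_x) for every object pixel.
X = np.array([[3, 4], [4, 3], [4, 4], [4, 5],
              [5, 4], [5, 5], [5, 6], [6, 5]], dtype=float)
Y = (A @ (X - mu_x).T).T

print(Y.mean(axis=0))                    # ~ [0, 0]  (zero mean)
print((Y.T @ Y) / Y.shape[0])            # ~ diag(1.125, 0.375) (uncorrelated)
```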

(Refer Slide Time: 23:10)

So, using this particular transformation matrix, if I apply the K – L transformation, then the transformed vector will be Y equal to A into (x minus mu x).

So, you will find that application of this particular transformation amounts to establishing a new coordinate system whose origin is at the centroid of the object pixels. That is, the K – L transformation basically establishes a new coordinate system whose origin will be at the center of the object and whose axes will be parallel to the directions of the Eigen vectors. What we mean by this is shown in the following figure.

(Refer Slide Time: 24:03)

So, this was our original figure where all the white pixels are the object pixels. Now, by application of this K – L transformation with transformation matrix A, we get 2 Eigen vectors, e 1 and e 2. You find that e 1 and e 2 form a new coordinate system; the origin of this coordinate system is located at the center of the object and the axes are parallel to the directions of the vectors e 1 and e 2. This figure also shows that this is basically a rotation transformation which aligns the data with the Eigen vectors, and because of this alignment the different elements of the vector Y become uncorrelated.

So, it is only because of this alignment that the data becomes uncorrelated. Also, because the Eigen values lambda i appear along the main diagonal of C y, as we have seen earlier, each lambda i basically gives the variance of the component Y i along the Eigen vector e i. Later on we will see the application of this kind of transformation to align objects along their Eigen vectors, and this is very important for object recognition purposes.

Now, let us see the other aspects of the K – L transformation. So, this is one of the applications
where we have said that this K – L transformation basically aligns the data along the Eigen
vectors. Another important property of K – L transformation deals with the reconstruction of the
vector x from the vector Y.

So, by the K – L transformation, what we have got is a set of vectors Y from another set of vectors x using the transformation matrix A, where A was derived using the Eigen vectors of the covariance matrix of x, that is C x.

(Refer Slide Time: 26:40)

So, our K – L transformation expression was Y equal to A into (x minus mu x). Now, the rows of this matrix A are the Eigen vectors of the covariance matrix C x, so A consists of rows which are orthonormal vectors, and because of this, the inverse of A is nothing but A transpose.
So now, the inverse of A is very simple: if you simply take the transpose of the transform matrix A, you get its inverse. From the forward K – L transform, we can therefore very easily find the inverse K – L transform to reconstruct x from the transformed data Y, and the reconstruction expression is very simple. It is given by x equal to A transpose Y plus mu x. This follows directly from the expression for the forward transformation.

Now, the important point is this: the matrix A above has been formed by using all the Eigen vectors of the covariance matrix C x. Suppose instead I make a transformation matrix where I do not take all the Eigen vectors of C x; rather, I consider only k of the Eigen vectors and, using those k Eigen vectors, I make a transformation matrix say A k.

So, this A k is formed using k of the Eigen vectors of matrix C x; I am not considering all the Eigen vectors of C x, and because I am taking only k Eigen vectors, I take those corresponding to the k largest Eigen values. Obviously, this matrix A k will have k rows and every row will have n elements, so the matrix A k will be of dimension k by n, and the inverse transformation will also be done in a similar manner.

So, using this transformation matrix A k , now I apply the transformation. So, I get Y equal to A k
into X minus mu x . Now, because A k is of dimension k by n and X is of dimension n by 1, so
naturally this transformation will generate vectors Y which are of dimension k.
Now, in our earlier case, in our original formulation, the transformed vector Y was of dimension n. But when I have made a reduced transformation matrix A k considering only k Eigen vectors, then using the same transformation, the transformed vectors Y that I get are no longer of dimension n; they are vectors of dimension k.

Now, using these vectors of reduced dimension if I try to reconstruct X, obviously the
reconstruction will not be perfect. But what I will get is an approximate value of X. So, let me
write that expression like this.

(Refer Slide Time: 31:06)

Here, what I will get is an approximate x; let me write it as x hat, which will be given by A k transpose Y plus mu x. Now, the matrix A k was of dimension k by n and the vector Y was of dimension k. When I take A k transpose, it becomes of dimension n by k, and if I multiply this matrix A k transpose, of dimension n by k, by the vector Y of dimension k, I obviously get a vector of dimension n by 1.

So, this inverse transformation gives me the approximate reconstructed x, and the dimension of x hat, which is the approximation of x, is the same as that of x, namely n. Thus, by this inverse transformation I get back a vector x hat which is of the same dimension as x but is not the exact value of x; it is an approximation. It can be shown that the mean square error of this reconstruction, that is the mean square error between x and x hat, is given by e ms equal to the sum of lambda j for j from 1 to n minus the sum of lambda i for i from 1 to k, which is nothing but the sum of lambda j for j from k plus 1 to n.

Now, look at this mean square error term. Remember that while forming our transformation matrix A k, we considered k Eigen vectors of matrix C x, and these k Eigen vectors correspond to the largest Eigen values of C x. In this expression, the mean square error is given by the sum of those Eigen values whose corresponding Eigen vectors were not considered for the formation of our transformation matrix, and because those are the smallest Eigen values, this transformation and the corresponding inverse transformation ensure that the mean square error between x and x hat will be minimum.

That is because this summation consists of only those Eigen values which have the smallest values. That is why the K – L transform is often called an optimum transform: it minimizes the error of reconstruction between x and x hat. This is a very important property of the K – L transformation which is useful for data compression, and let us now see how this property helps to reduce or compress the image data.
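(A small illustrative sketch of this optimality, using synthetic data of my own choosing, not the lecture's: keep only the k largest Eigen vectors in A k, reconstruct, and compare the mean square error with the sum of the discarded Eigen values.)

```python
# Truncated K-L transform and its reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 4                                   # 1000 sample vectors of dimension 4
X = rng.normal(size=(N, n)) @ rng.normal(size=(n, n))   # correlated data

mu_x = X.mean(axis=0)
D = X - mu_x
C_x = (D.T @ D) / N

vals, vecs = np.linalg.eigh(C_x)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

k = 2
A_k = vecs[:, :k].T                              # k x n truncated transform matrix

Y = (A_k @ D.T).T                                # k-dimensional transformed vectors
X_hat = Y @ A_k + mu_x                           # reconstruction: A_k^T y + mu_x

mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse, vals[k:].sum())                       # the two numbers agree
```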

So obviously, if I want to apply this K – L transformation over an image, the first thing I have to see is how to apply the K – L transform over an image.

(Refer Slide Time: 35:50)

So, a digital image is a 2 dimensional array of quantized intensity values; let us put a digital image in this form. Here, let us assume that this image consists of capital N number of rows and capital N number of columns. As we have said, in order to be able to apply the K – L transformation, the data has to be represented by a collection of vectors, and this 2 dimensional array of N rows and N columns can be converted into a set of vectors in more than one way.

So, let us assume in this particular case that we represent every column of this 2 dimensional array as a vector. If we do that, then the first column will be represented by a vector say x 0, the next column by a vector say x 1, the next by a vector say x 2, and in this way we will have N vectors as there are N columns.

So, once we have these N vectors, we can find out their mean vector mu x, which is given by 1 upon capital N times the summation of x i, where i varies from 0 to capital N minus 1. Similarly, we can also find out the covariance matrix of these N vectors; as we have already seen, this is 1 upon capital N times the summation of (x i minus mu x) into (x i minus mu x) transpose, where i varies from 0 to capital N minus 1.

And, here you will find that our mean vector mu x , this is of dimension capital N whereas the
covariance matrix C x , this is of dimension capital N by capital N. So, once we have obtained the
mean vector mu x and the covariance matrix C x , we can find out the Eigen vectors and Eigen
values of this covariance matrix C x and as we have already seen

(Refer Slide Time: 39:33)

that because this particular covariance matrix C x is of dimension capital N by capital N, there will be N Eigen values lambda i, where i varies from 0 to capital N minus 1, and corresponding to every Eigen value lambda i there will be an Eigen vector e i, where again i varies from 0 to capital N minus 1.

Now, given these N Eigen vectors for the N Eigen values, we can form the transformation matrix A. This transformation matrix A will be formed with first row e 0 transpose; I write this as a transpose because e 0, being an Eigen vector, is normally represented as a column vector.

So, we write the matrix A whose rows are the Eigen vectors of the covariance matrix C x. This A will have rows e 0 transpose, e 1 transpose and so on; as there are N Eigen vectors, the last row will be e N minus 1 transpose, where e 0 corresponds to the Eigen value lambda 0 and, as we have already said, our assumption is lambda 0 greater than or equal to lambda 1, which is greater than or equal to lambda 2, and continuing like this it is greater than or equal to lambda N minus 1. So, this is how we form the transformation matrix A.

Now, from this transformation matrix, we can make a truncated transformation matrix where, instead of using all the Eigen vectors of the covariance matrix C x, we consider only the first k Eigen vectors, which correspond to the k largest Eigen values.

(Refer Slide Time: 42:17)

So, we form the transformation matrix, the modified transformation matrix A k using the first k
number of Eigen vectors. So, in our case, A k will be e 0 transpose, e 1 transpose and likewise it
will go upto e K minus 1 transpose and using this A k , we take the transformation of the different
column vectors of the image which we have represented by vector x i . So, for every x i , we get a
transform vector say Y i .

So here, the transformation equation is Y i equal to A k, the modified transformation matrix, into (X i minus mu x), where i varies from 0 to capital N minus 1. Here, because the dimension of A k is k by N and X i and mu x are both of dimension N by 1, the difference X i minus mu x is a vector of dimension capital N by 1.

So, you will find that when I multiply, when I perform this transformation - A k into X i minus mu
x , this actually leads to a transformed vector Y i where Y i will be of dimension k by 1. So, this is
the dimensionality of Y i . That means using this transformation, with the transformation matrix
A k , we are getting the transformed vector Y i of dimension k.

So, if this transformation is carried out for all the column vectors of the 2 dimensional image, I get N transformed vectors Y i, each of dimension k. That means the transformed image will consist of N column vectors where every column is of dimension k, so the transformed image will be of dimension k by N, having k rows and N columns.

You remember that our original image was of dimension of capital N by capital N. Now, using
this transformed image if I do the inverse transformation to get back the original image; as we
said earlier that we do not get the perfectly reconstructed image, rather what we will get is an
approximate image.

(Refer Slide Time: 45:46)

So, this approximate image will be given by x i hat is equal to A k transpose Y i plus mu x where
this x i hat here you find that it will be of dimension capital N. So, collection of all these x i hats
gives you the reconstructed image from the transformed image. As we have said that the mean
square error between the reconstructed image and the original image in this particular case will
be minimum because that is how we have formed the transformation matrix and there we have
said that the mean square error of the reconstructed vector from the original vector was
summation of the Eigen values which are left out; corresponding to which the Eigen vectors
were not considered for formation of the transformation matrix.

So now, for getting the reconstructed image, what are the quantities that we have to save? Obviously, the first quantity is the transformation matrix A k; this A k needs to be saved. The other information that we have to save is the set of transformed vectors Y i for i equal to 0 to capital N minus 1.

So, if we save these 2 quantities A k and the set of transformed vectors Y i , then from these 2
quantities we can reconstruct an approximate original image given by the vectors x i hat. So, you
will find that in this case, the amount of compression that can be obtained depends upon what is
the value of K that is how many Eigen vectors we really consider; we really take into account for
formation of our transformation matrix A.

So, the value of k can be 1 where we considered only 1 Eigen vector to form our transformation
matrix A. It can be 2 where we consider only 2 Eigen vectors to form the transformation matrix
A and depending upon the number of the Eigen vectors, the amount of compression that we can
achieve will be varying.
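(The whole column-wise procedure can be summarised in a short sketch; this is only my own illustration on a synthetic image, not code from the lecture, and the function names are my own.)

```python
# Column-wise K-L compression of an image: keep k Eigen vectors, then reconstruct.
import numpy as np

def kl_compress_columns(img, k):
    """Treat every column of img as a vector and keep the k largest Eigen vectors."""
    cols = img.T.astype(float)              # one row per image column
    mu = cols.mean(axis=0)
    D = cols - mu
    C = (D.T @ D) / cols.shape[0]           # covariance matrix of the column vectors
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    A_k = vecs[:, order[:k]].T              # truncated transformation matrix A_k
    Y = (A_k @ D.T).T                       # each transformed column has dimension k
    return A_k, mu, Y

def kl_reconstruct_columns(A_k, mu, Y):
    """Approximate reconstruction x_hat = A_k^T y + mu for every column."""
    return (Y @ A_k + mu).T

# Example with a synthetic 64 x 64 image; a larger k gives a smaller error.
img = np.add.outer(np.arange(64.0), np.arange(64.0))
A_k, mu, Y = kl_compress_columns(img, k=5)
img_hat = kl_reconstruct_columns(A_k, mu, Y)
print(np.mean((img - img_hat) ** 2))
```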

Now, let us see what kind of results we get with different values of k.

(Refer Slide Time: 49:00)

So here, we have shown some images. The top image is the original image. When this original image is transformed and reconstructed using the transformation matrix with only 1 Eigen vector, the one which corresponds to the largest Eigen value of the covariance matrix, then the reconstructed image that we get is given by this result. Here we find that the reconstructed image is not at all good, but still from it we can make out what this image is about.

Now, if we increase the number of Eigen vectors in the transformation matrix, using 5 Eigen vectors, then the reconstructed image is given by this one. You will find that the amount of information contained in this particular image is quite improved, though it is not identical with the original image. If we increase the number of Eigen vectors further, that is if we use 25 Eigen vectors to form the transformation matrix, then this is the reconstructed image that we get.

Now, if you closely compare these 2 images, you will find that there are some artifacts. For example, in this particular region there is an artifact, something like a vertical line, which was not present in the original image, and that is improved to a large extent in this particular image. So, again the image quality has been improved as I increase the number of Eigen vectors from 5 to 25.

(Refer Slide Time: 51:41)

Similarly, if I increase the number of Eigen vectors further, to 50 Eigen vectors, the image is further improved; with 100 Eigen vectors I get further improvement, and with 128 Eigen vectors I get a still better reconstructed image. In this way, if I consider all the Eigen vectors of the covariance matrix to form the transformation matrix, the reconstruction will be perfect.

So here, we have discussed the K – L transformation, and we have said that the K – L transformation is fundamentally different from the other transformations that we discussed earlier, that is the discrete Fourier transformation, the discrete cosine transformation and so on. For those transformations, the transformation matrix or transformation kernel is fixed, whereas in case of the K – L transformation the transformation kernel, that is the transformation matrix A, is derived from the covariance matrix, and this covariance matrix represents the statistical properties of the vector representation of the data.

So here, the kernel of the transformation, or the transformation matrix, is dependent upon the data; it is not fixed. That is the fundamental difference between the K – L transformation and the other transformations. The advantage of the K – L transformation, which is quite obvious from the reconstructed images, is its energy compaction property; in case of the K – L transformation, the energy compaction is much higher than that of any other transformation.

Here, in the earlier result that we have shown, using only 1 Eigen vector as the transformation matrix, I can still reconstruct the image and say what the content of that image is, though the reconstruction quality is very poor. It shows that the compaction of energy into a small number of components is much higher in case of the K – L transform than in case of other transformations.

But it is quite obvious that the computational complexity of the K – L transformation is quite high compared to the other transformations, and in fact that is the reason why, despite its strong energy compaction property, the K – L transformation has not been very popular for data compression operations. With this, we come to the end of our discussion on transformations. Now, let us see the answers to the questions from our previous lecture.

(Refer Slide Time: 54:16)

So here, it is quite obvious that the property of the DCT which makes it popular for image compression applications is its energy compaction property. The transformation kernels for the Walsh transformation and the Hadamard transformation have already been discussed during our lectures.

Significance of the modified Hadamard transform: by modified Hadamard transform we mean the ordered Hadamard transform. Its significance is that we can correlate the sequency of the signal with the variable u, just as in case of the DFT or the DCT an increasing value of u means an increasing frequency component. Equivalent to the frequency component, in case of the Hadamard transform we have defined what is called sequency, and an increasing value of u should indicate an increasing value of sequency. That is basically the significance of the ordered Hadamard transform, and we saw during our last lecture that because of this we also get some energy compaction, that is, most of the energy is confined within a few Hadamard coefficients.

The fifth problem was one where you have to find out the Walsh transform coefficients of the given samples. This is also quite simple from the discussion that we had; if we simply substitute these values in the Walsh transform expressions that we discussed, we get the Walsh transform coefficients.

(Refer Slide Time: 55:56)

Now, coming to the questions on today's lecture: the first question is, what is the fundamental difference between the K – L transform and the discrete cosine transform? The second question, in what sense is the K – L transform optimal?

The third question is, how do you generate the transformation kernel for the K – L transform? Fourth, what is the role of the K – L transform in object recognition? Fifth, why is it important to arrange the Eigen vectors in a particular order to form the transform matrix? And the last question, why is the K – L transform not popular for data compression operations?

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 17
Image Enhancement
(Point Processing - I)
Hello, welcome to the video lecture series on digital image processing. During our last few lectures, we have talked about various image processing techniques.

(Refer Slide Time: 1:09)

So, during the past few lectures we have talked about the unitary transformation, then some specific cases of unitary transformations like the discrete Fourier transformation, the discrete cosine transformation, the discrete Walsh transformation, the discrete Hadamard transformation and the K - L transformation, and we have seen that the K - L transformation is fundamentally different from other transformations like the DFT, DCT, DWT or DHT in the sense that for all those transformations the transformation kernel is fixed, whereas for the K - L transformation the transformation kernel has to be derived from the image for which the transformation is to be taken.

Then we have also seen the properties of these different transformation techniques and we have
compared the performance of these transforms with respect to certain results.
(Refer Slide Time: 2:14)

Today and for the coming few lectures, we will be talking about image enhancement techniques. First we will see what the necessity of image enhancement is. Then we will see that image enhancement techniques fall under 2 broad categories. One of the categories is spatial domain operations. In spatial domain operations, the enhancement techniques work directly on the image pixels, and these spatial domain operations can have 3 different forms: one is point processing, another is histogram based processing and the third is mask processing.

Of course, the histogram based processing technique is also a form of point processing. For these spatial domain operations, we do not do any preprocessing of the images; the images are operated upon directly in the spatial domain to give us the transformed images, which are the enhanced images.

The other category of image enhancement techniques normally works on the discrete Fourier transform coefficients of the images. So, they are called frequency domain operations, and we will see later that there are different operations which can be done in the frequency domain, like low pass filtering, band pass filtering, high pass filtering and so on, and that there are different forms of these filters.
(Refer Slide Time: 3:59)

Now, let us see what is meant by image enhancement. By image enhancement we mean a technique of processing an image to enhance certain features of the image. As it is meant for the enhancement of certain features, obviously, depending upon which feature we want to enhance, there are different forms of image enhancement techniques. In some applications the input images may be noisy, and we want to reduce the noise so that the image becomes better visually. So, reduction or removal of the noise from the images is one form of image enhancement.

In many cases, the images which are captured by an image capturing device, say for example a camera, are very dark, and an image may become very dark because of various reasons. For such applications, the image enhancement technique may need to increase the contrast of the image or to increase its intensity. So, for that kind of application, we will have some other type of image enhancement technique.

Some applications may need that the edges of the objects present in the image be highlighted. In such cases, the image enhancement technique should be able to highlight the edges of the objects present in the image. You find that these image enhancement techniques vary depending upon the application; different types of applications need enhancement of different types of features in the image.

So, the ultimate aim of image enhancement is to process an image so that the result becomes more suitable than the original image for a certain specific application. As we have already said, the processing techniques are very much problem oriented, because different kinds of problems demand enhancement of different kinds of features in the image. So obviously the processing techniques are application dependent, and naturally a technique which is best suited for one kind of application is not necessarily best suited for another.
So, a technique for enhancement of x-ray image may not be the best for enhancement of
microscopic images. So, this is broadly what we mean by enhancement of an image and
obviously these are application dependent.

(Refer Slide Time: 6:59)

Now, as we have already said that image enhancement techniques fall under 2 broad categories;
the first category is the spatial domain technique where the image enhancement processes, they
work directly on the image plane itself that means these techniques try to directly manipulate the
pixels in the image. The other category of image enhancement techniques is frequency domain
techniques.

So, in case of frequency techniques, first we have to take the Fourier transformation of the
image, then whatever is the Fourier transformation coefficients that we get, you modify those
Fourier transformation coefficients and these modified set of coefficients, you take the inverse
Fourier transform of that to obtain the enhanced image or the modified image as we need.

First, we will be talking about image enhancement techniques in the spatial domain. Let us see what different spatial domain image enhancement techniques we can have. As we said, the spatial domain techniques work directly on the image pixels, so naturally we have to define a transformation function which will transform a pixel of the original image to a pixel in the enhanced or processed image.
(Refer Slide Time: 8:43)

Such a function can be defined in this form: we can write g(x) equal to some transformation T of f(x), or, because in this case we are dealing with 2 dimensional images, we write the expression as g(x, y) equal to some transformation T of the image f(x, y). Here f(x, y) is the original image and T is the transformation which is applied on this original image to give us the processed image g(x, y).

Now, in case of spatial domain techniques, this transformation T works directly on f(x, y), that is in the spatial domain or the image plane, to give us the processed image g(x, y). T is an operator which works on the original image f and is defined over some neighborhood of the point (x, y) in the original image, and later on we will see that this transformation operator T can also operate on more than one image.

For the time being, we are considering the case where this transformation operator T works on a single image. When we want to find out the processed value at location (x, y), the operator T works on the original image f at location (x, y), considering a certain neighborhood of the point (x, y), to determine what the pixel value at location (x, y) in the processed image g will be.

Now, the neighborhood of a point (x, y) is usually a square sub image which is centered at point
(x, y). So, let us look at this particular figure.
(Refer Slide Time: 11:06)

Here, you find that we have taken a rectangular image. So, this outer rectangle represents the
image f and within this image f, we have taken pixel at a particular location (x, y). So, this is the
pixel location (x, y) and the neighborhood of this point (x, y) as we said that it is usually a square
sub image around point (x, y). So, this shows a 3 by 3 neighborhood around the pixel point (x, y)
in the image f.

Now, what happens in case of point processing? We said that this operator, the transformation
operator T operates at point (x, y) considering a certain neighborhood of the point (x, y) and in
this particular case; we have shown a neighborhood size of 3 by 3 around point (x, y). Now, for
different applications, the neighborhood size may be different. We can have a neighborhood size
of 5 by 5, 7 by 7 and so on depending upon the type of the image and the type of operation that
we want to have.

Now, in case of point processing, the neighborhood that is considered is of size 1 by 1.
(Refer Slide Time: 12:42)

So, that means that this operator T, now works on the single pixel location, so it works on only
that particular pixel location (x, y) and depending upon the value, depending upon the intensity
value at that location (x, y), it determines what will be the intensity in the corresponding location
in the processed image g. It does not consider the pixel values of its neighboring locations.

In such cases, we can write the transformation function in the form s equal to T(r), where r is the pixel value in the original image and s is the pixel value at the corresponding location in the processed image. So, the transformation function simply becomes of the form s = T(r), where r and s are the gray levels at corresponding locations in the original and processed images. Now, this transformation function can be put in the form of these 2 figures.
(Refer Slide Time: 13:59)

In this particular case, the first figure shows a transformation function where along the horizontal axis we have put the intensity values r of the original image and along the vertical axis we have put the intensity values s of the processed image g; obviously they are related by s = T(r), and the transformation function is given by this particular curve.

So, this is our T(r), and in this figure the pixel values near zero have been marked as dark. It is quite obvious that in an image, if the intensity values of the pixels are near 0, that is very small intensity values, those regions appear very dark, and the regions where the intensity values are higher appear as light regions.

The first transformation function shows that a very narrow range of intensity values in the original image is mapped to a wide range of intensity values in the processed image g, and effectively this is the operation which gives enhancement of the image. The second figure shows a transformation function where, for some intensity value say I, for all the pixels in the input image whose intensity values are less than I, the corresponding pixel in the processed image is replaced by the value 0, whereas for a pixel whose intensity value is greater than I, the corresponding pixel in the processed image gets the maximum value.

So, this second transformation actually generates a binary image consisting of only the low value and the high value, and this particular operation is known as the thresholding operation. This is the kind of operation that is done in point processing.
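(A minimal sketch of such point processing, assuming an 8-bit image and a threshold level of my own choosing; it is only an illustration of the idea, not code from the lecture.)

```python
# Point processing: the output pixel depends only on the corresponding input pixel.
import numpy as np

L = 256                                   # number of gray levels (8-bit image)
I = 128                                   # chosen threshold level

def threshold(img):
    """s = 0 for r < I, s = L - 1 otherwise (the thresholding transformation)."""
    return np.where(img < I, 0, L - 1)

img = np.array([[10, 200], [127, 130]])
print(threshold(img))                     # [[0, 255], [0, 255]]
```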

Now, the other kind of spatial domain operation, where the neighborhood size is larger than 1, say 3 by 3, 5 by 5, 7 by 7 and so on, is usually known as a mask operation. In case of a mask operation, we have to define a neighborhood around every pixel (x, y) at which we want to get the intensity value in the processed image, and it is not only the intensity value of that particular pixel but also the intensity values of the pixels within its neighborhood that take part in deciding what the intensity value at the corresponding location (x, y) in the processed image g will be. So, let us see how that operation is done.

(Refer Slide Time: 17:37)

So here, we have again copied the same 3 by 3 neighborhood that we have seen in our previous slide. If I consider a 3 by 3 neighborhood, then for mask processing we also have to define a 3 by 3 mask, and on the right hand side of the figure we have defined a mask whose values are represented as w(-1, -1), w(-1, 0), w(-1, 1), w(0, -1), w(0, 0), w(0, 1), w(1, -1), w(1, 0) and w(1, 1). These values, also known as coefficients, are the entries of this 3 by 3 mask.

Now, to generate the intensity value at location (x, y) in the processed image, the operation that has to be done is given by the expression at the bottom, which says that g(x, y) is equal to the double summation of w ij into f(x plus i, y plus j), where the summation is taken over j equal to minus 1 to 1 and i equal to minus 1 to 1.

So, what does this actually mean? It means that if I place this mask on the image centered at location (x, y), then each pixel of the image under the mask and the corresponding mask coefficient have to be multiplied together, the sum is taken over all the mask locations, and what I get gives me the intensity value of the processed image g at location (x, y).
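(A short sketch of this mask operation, assuming zero padding at the borders; the averaging mask used here is only one possible choice of the coefficients, picked for illustration.)

```python
# Mask processing: g(x, y) = sum over i, j of w(i, j) * f(x + i, y + j).
import numpy as np

def mask_process(f, w):
    """Apply a (2a+1) x (2a+1) mask w to image f with zero padding at the borders."""
    a = w.shape[0] // 2
    padded = np.pad(f.astype(float), a, mode='constant')
    g = np.zeros(f.shape, dtype=float)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            region = padded[x:x + 2 * a + 1, y:y + 2 * a + 1]
            g[x, y] = np.sum(w * region)   # multiply coefficients and pixels, then sum
    return g

w_avg = np.ones((3, 3)) / 9.0              # a 3 x 3 averaging mask, for example
f = np.arange(25, dtype=float).reshape(5, 5)
print(mask_process(f, w_avg))
```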

So, this is what is meant by a mask operation, and depending upon the size of the neighborhood we consider, we have to define a 3 by 3 mask, a 5 by 5 mask, a 7 by 7 mask and so on. The coefficient values, the different w values in the mask, determine what kind of image enhancement operation we are going to do, whether it will be an image sharpening operation, an image averaging operation, an edge enhancement operation and so on. All of them depend upon the mask values, that is the w ij present in this particular mask.

So, this is the basic difference between the point processing and mask processing and obviously
both these processing techniques fall under the category of spatial domain techniques because in
these cases, we have not considered the discrete Fourier transform coefficients of the original
image which is to be processed.

(Refer Slide Time: 20:51)

Now, let us come to the point processing techniques. The first one that we will consider is a point processing technique which we call the negative image. In many cases, the images that we get contain white or gray level information embedded in black or very dark pixels, and the nature of the information is such that we have very little white or gray level information present against a background which is very dark. In such cases, extracting the information from the raw input images becomes very difficult.

In such cases, it is beneficial, instead of considering that raw image, to just take the negative of the image. That is, the white pixels or the larger intensity values in the image are made darker, and the darker intensity values are made lighter or brighter.

In effect, what we get is the negative of the image, and we will find from the results that visualizing or extracting the information we want from this negative image is more convenient than from the original image. The kind of transformation that we need in this particular case is shown in this figure.
Here, we consider that the digital image has capital L intensity levels, represented from 0 to capital L minus 1 in steps of 1. Again, along the horizontal axis we have put the intensity or gray level values of the input image and along the vertical axis we have put the intensity or gray level values of the processed image, and the corresponding transformation function T can now be represented as s = T(r), which is nothing but (L minus 1) minus r.

So, whenever r is equal to 0, s will be equal to L minus 1, which is the maximum intensity value in our digital image, and when r is equal to capital L minus 1, that is the maximum intensity value in the original image, s will be equal to 0. So, the maximum intensity value in the original image is converted to the minimum intensity value in the processed image, and the minimum intensity value in the original image is converted to the maximum intensity value in the processed image.
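(A one-line sketch of this negative transformation, assuming an 8-bit image with L = 256 levels; only an illustration, not code from the lecture.)

```python
# Negative transformation: s = (L - 1) - r.
import numpy as np

def negative(img, L=256):
    return (L - 1) - img.astype(np.int64)

img = np.array([[0, 64], [128, 255]])
print(negative(img))                      # [[255, 191], [127, 0]]
```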

So in effect, what we are getting is a negative of the image and graphically, this transformation
can be put in the form of this figure. So, here you find that this transformation is a straight line
with a slope of minus 45 degree and passing through the points (0, L minus 1) and (L minus 1, 0)
in this rs plane. Now, let us see what is the kind of result that we will get by applying this kind of
transformation.

(Refer Slide Time: 24:18)

Here we have shown 2 images. On the left hand side we have a digital mammogram image, and on the right hand side we have the negative of this image, which is obtained by the transformation that we have just discussed. In this original image we have some white grains and a white patch which indicates a cancerous region, and these grains, corresponding to the tissues, are not very prominent; it is very difficult to make out which is what in this original image.
Now, if I take the negative of this particular image, on the right hand side we have got this negative. Here, all the darker regions in the original image have been converted to brighter regions in the processed image, and the brighter regions in the original image have been converted to darker regions in the processed image.

And now it is very convenient to see what information we can get from this negative image. This kind of negative transformation is very useful in medical image processing, and this is just an example which shows that for understanding this particular digital mammogram image, the negative transformation gives us much more information than the original.

And as we said, may be this is the transformation which is best suited for this particular
application but this transformation may not be the best transformation for other kind of
applications.

Now, let us see what other kinds of image enhancement techniques we can have. The next image enhancement technique that we are going to discuss is again a very simple one, called contrast stretching. So, we will talk about the contrast stretching operation.

(Refer Slide Time: 26:32)

So, why do we need such contrast stretching? You might have found that in many cases the images that we get from an imaging device are very dark, and this may happen for various reasons. One of the reasons is that when you took the image of a certain object or scene, the illumination of the object or scene was very poor. That means the object itself was very dark, so naturally the image has become very dark.

The second reason why an image may be dark is that dynamic range of the sensor on which you
are imaging is very small. Now, what I mean by dynamic range is it is the capacity of the sensor
to record the minimum intensity value and the maximum intensity value. So, the difference
between the minimum intensity value and the maximum intensity value is what is the dynamic
range of the sensor.

So, even if your scene is properly illuminated, if your sensor itself is not capable of recording all the variations in the scene intensity, that also leads to an image which is very dark. Another reason which may lead to dark images is that when you took the photograph, maybe the aperture of the camera lens was not properly set; maybe the aperture was very small, so that only a very small amount of light was allowed to pass through the lens to the imaging sensor. If the aperture is not properly set, that also leads to an image which is very dark.

So, for such dark images, the kind of processing techniques which is very suitable is called the
contrast stretching operation. Now, let us see what is the kind of dark image that we can have.

(Refer Slide Time: 28:45)

Here, we show a low contrast image. Obviously, the contrast or intensity of the image is very poor and the overall appearance of this particular image is very dark. The purpose of contrast stretching is to process such images so that the dynamic range of the image becomes quite high and the different details of the objects present in the image become clearly visible.
(Refer Slide Time: 29:23)

Now, a typical transformation which may be applied for the contrast stretching operation is shown in this particular figure. In this transformation we have indicated 2 different points, one being (r 1, s 1) and the other being (r 2, s 2). It is the locations of these points (r 1, s 1) and (r 2, s 2) which control the shape of this transformation function and accordingly influence the different types of contrast enhancement that we can obtain in the processed image.

Now, the locations of (r 1, s 1) and (r 2, s 2) are very important. If we make r 1 equal to s 1 and r 2 equal to s 2, then the transformation function becomes a straight line with a slope of 45 degrees. That means whatever intensity we have in the original image, we will have the same intensity level in the processed image; by applying such a transformation with r 1 equal to s 1 and r 2 equal to s 2, the processed image does not undergo any variation from the original image.

For other combinations of (r 1, s 1) and (r 2, s 2), we really get some variation in the processed image. If I go to the other extreme and make r 1 equal to r 2, s 1 equal to 0 and s 2 equal to L minus 1, then that leads to the thresholding operation, and the corresponding transformation generates a binary image as the processed image. For the enhancement operation, what is usually used is r 1 less than r 2 and s 1 less than s 2, which gives us a transformation function as shown in this particular figure, and this transformation function generally leads to image enhancement.
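(A small sketch of this piecewise-linear contrast stretching; the values of (r 1, s 1) and (r 2, s 2) below are chosen arbitrarily for illustration, not taken from the lecture.)

```python
# Contrast stretching through the points (0, 0), (r1, s1), (r2, s2), (L-1, L-1).
import numpy as np

def contrast_stretch(img, r1, s1, r2, s2, L=256):
    """Piecewise-linear mapping; single valued and monotonic when r1 <= r2 and s1 <= s2."""
    r = img.astype(float)
    s = np.interp(r, [0, r1, r2, L - 1], [0, s1, s2, L - 1])
    return s.astype(np.uint8)

img = np.array([[60, 90], [120, 200]], dtype=np.uint8)
print(contrast_stretch(img, r1=80, s1=30, r2=150, s2=220))
```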

Now, the condition that r 1 is less than or equal to r 2 and s 1 is less than or equal to s 2 is very important. If this condition is maintained, then the transformation function that we get is single valued and monotonically increasing. That is very important for maintaining the order of the intensity values in the processed image; that is, a point which is darker in the original image will remain darker in the processed image and a point which is brighter in the original image will remain brighter in the processed image.

What changes is the difference between intensity values: the difference of intensity values in the original image versus the difference of intensity values in the processed image, and that is what gives us the enhancement. But if the order is reversed, the processed image will look totally different from the original image. All the transformations that we are going to discuss, except the negative operation that we described initially, maintain this particular property that the order of the intensity values is preserved; that is, the transfer function is monotonically increasing and single valued.

Now, using this particular transfer function, let us see what kind of result we can obtain.

(Refer Slide Time: 33:45)

In our earlier slide we had shown a low contrast image, which is shown again on the left hand side of this slide; this left hand side image is the original low contrast image. By using the contrast stretching operation, what we have got is the processed image shown on the right hand side, and here you can clearly observe that more details are visible in the processed image than in the original image.

Obviously, the contrast of the processed image has become much higher than the contrast of the original image. This technique, called contrast stretching, is mostly useful for images where the contrast is very poor, and we have said that we can get poor contrast for various reasons: either the scene illumination was poor, or the dynamic range of the image sensor was very small, or the aperture setting of the camera lens was not proper. In such cases, the dark images that we get can be enhanced by using this contrast stretching technique.

Now, there are some other kinds of applications where we need to reduce the dynamic range of the original image. For example, suppose I have an original image whose dynamic range is so high that it cannot be properly reproduced by our display device. Normally, we have a gray level or black and white display device which uses 8 bits; that means it can display intensity levels from 0 to 255, that is a total of 256 different intensity levels.

But if in the original image I have a minimum intensity value of say 0 and a maximum intensity value of say a few thousand, then because the dynamic range of the original image is very high and my display device cannot take care of such a high dynamic range, the display device will mostly display the highest intensity values, the lower intensity values will in most cases be suppressed, and the kind of image that we get is usually something like this.

(Refer Slide Time: 36:24)

Here, on the left hand side, we have shown an image which is basically the Fourier transform, that is the DFT coefficients, of a certain image. You find that only at the center do we have a bright dot, and outside this the image is mostly dark or black. But actually there are a number of intensity levels between the maximum and the minimum which could not be reproduced by this particular device because its dynamic range is very limited.

On the right hand side, we have shown the same image after some preprocessing, that is after reducing the dynamic range of the original image by using an image enhancement technique, and here you find that in the processed image, in addition to the bright spot at the center, many other coefficients are visible as you move away from the center. So here, our aim is to compress the dynamic range of the input image, and the kind of transformation which gives us this dynamic range compression is a logarithmic transformation of the following form.

(Refer Slide Time: 38:03)

Here again, we assume that r is the intensity of a pixel in the original image and s is the intensity of the pixel in the processed image, and the relation is s = T(r) = c log(1 plus the modulus of r), where c is a constant. This constant has to be decided depending upon the dynamic range of your display device and the dynamic range of the input image which is to be displayed. The log of 1 plus the modulus of r is taken because otherwise, whenever an intensity level in the input image is equal to 0, log of 0 is not defined.

So, to take care of that, we take 1 plus the modulus of r, and c log(1 plus the modulus of r) gives a compression of the dynamic range so that the image can be properly displayed on a display whose dynamic range is limited.
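(A minimal sketch of this logarithmic transformation; choosing c so that the maximum input maps to L minus 1 is my own assumption about how the constant is fixed, not something stated in the lecture.)

```python
# Dynamic range compression: s = c * log(1 + |r|).
import numpy as np

def log_transform(img, L=256):
    r = np.abs(img.astype(float))
    c = (L - 1) / np.log(1.0 + r.max())     # scale so that the maximum maps to L - 1
    return (c * np.log(1.0 + r)).astype(np.uint8)

# Typical use: displaying DFT magnitudes, whose dynamic range far exceeds 8 bits.
spectrum = np.abs(np.fft.fft2(np.random.default_rng(2).random((64, 64))))
print(log_transform(spectrum).min(), log_transform(spectrum).max())
```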
(Refer Slide Time: 39:17)

A similar operation, again used for enhancement, is called the power law transformation. The power law transformation is normally used for different imaging devices; it is used for image capturing devices, for image printers and so on. In case of the power law transformation, the relation between the original image intensity and the processed image intensity is given by s = T(r) = c into r to the power gamma.

(Refer Slide Time: 39:50)

The plot that we have shown is for different values of gamma with c equal to 1. For a value of gamma less than 1, towards the lower intensity side this transformation function expands a very small intensity range of the input image into a wider range, whereas on the higher intensity side a wide range of input intensities is mapped to a smaller range of intensity values in the processed image, and the reverse is true for values of gamma greater than 1.

Now, for this kind of transformation, the exponent is conventionally represented by the symbol
gamma, and that is why this kind of correction is also known as gamma correction. As I said, this
kind of processing is used for different types of display devices, printing devices and capturing
devices. The reason is that all those devices mostly follow this power law characteristic.

So, if I give an input image, it will be transformed by this power law before the image is
actually produced. Now, to compensate for the power law introduced by the device itself, if I do
the reverse operation beforehand, then the actual image that I want to display will be displayed
properly. Say for example, in the case of a CRT display, the relation between intensity and
voltage follows a power law with a value of gamma which normally varies from 0.8 to 2.5.

So, if I use the value of gamma equal to 2.5 and if I come to this particular figure, then you find
that with gamma equal to 2.5, this is the curve or this is the transformation function that will be
used. So, whichever image I want to display, the device itself will transform the image using this
particular curve before displaying the particular image and as this curve shows that the image
which will be displayed will normally be darker than the original image that we intend to
display.

So, what we have to do is take some corrective measure before giving the image to the CRT for
display; because of this correction, we compensate for the power law so that our image is
displayed properly.
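As an illustration, a small sketch of the power law transformation and of gamma correction for a
display with gamma 2.5 (the CRT example quoted above) is given below; the normalization to the
range 0 to 1, the function name and the usage shown in the comments are assumptions made for the
example.

import numpy as np

def power_law(image, gamma, c=1.0):
    # s = c * r**gamma, computed on intensities normalized to [0, 1].
    r = image.astype(np.float64) / 255.0
    s = c * np.power(r, gamma)
    return np.uint8(np.clip(s, 0.0, 1.0) * 255.0)

# Gamma correction: pre-transform with the reciprocal exponent so that the
# display's own power law (here assumed to be s = r**2.5) is cancelled.
# 'image' below stands for the 8-bit image to be displayed:
# corrected = power_law(image, 1.0 / 2.5)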

(Refer Slide Time: 43:11)


So, coming to the next slide, you find that here we have shown an image which is to be displayed;
the image is on the top left corner. The monitor has a power law characteristic given by
s = r^2.5 and, as we said, because of this power law characteristic the image will be darker,
which is obvious from the image as displayed on the device, shown on the right hand side: you
find that this image is darker than the original image.

So, to compensate for this, what we do is, before giving this image to the CRT for display, we go
for a gamma correction. That means we transform the image using the transformation function
s = r^(1/2.5). With this transformation, and if you refer back to our power law curves, you find
that the original image now becomes a brighter image; that is, the lower intensity range in the
input image has now been mapped to a larger intensity range in the processed image.

So, as a result, the image has become brighter, and when this brighter image is given to the CRT
for display, the monitor will apply its characteristic power law, that is s = r^2.5, and because
of this characteristic the gamma correction that we incorporated earlier gets nullified. On the
right bottom, we now have the actual image which will be displayed on the CRT screen, and this
image appears to be almost the same as the original image that we want to display.

So, this is also a sort of enhancement because if I do not use this kind of correction, then the
image that we are going to display on the CRT screen that will be a distorted image but because
of this power law correction or the gamma correction as it is called, the image that we get on the
CRT screen will be almost same as the original image that we want to display.

Now, this kind of power law transformation is not only useful for imaging devices like CRT
displays or image printers and so on; similar power law transformations can also be used for
enhancing images. The advantage of the power law transformation is that the transformation curve
takes various shapes depending upon the value of gamma. As we have shown in the previous slides,
if the value of gamma is less than 1, then on the darker side a small range of intensity values is
mapped into a larger range of intensity values in the processed image, whereas on the brighter
side a larger range of intensity values is mapped into a smaller range of intensity values in the
processed image, and the reverse is true when gamma is greater than 1.

So, by using different values of gamma, I can have different power law transformations and as a
result, what I can have is a controlled enhancement of the input images. So, as it is shown in this
particular case.
(Refer Slide Time: 46:51)

Here, you find that on the top left we have shown an aerial image, and you find that most of the
intensity values of this aerial image are on the brighter side. As a result, most of the portions
in this image are almost washed out and we cannot make out the details of the image very easily.

Now, if we process this image using the power law transformation, then you find that the other 3
images are obtained with different values of gamma: the top right image is obtained using the
power law transformation with a certain value of gamma, the bottom left image using some other
value of gamma, and the bottom right image using yet another value. Here, the top right image has
been corrected with a value of gamma which is less than the value used for the bottom left image,
which in turn is less than the value of gamma used for obtaining the bottom right image.

And, as is quite obvious, in all these cases you find that the washed out character of the
original image has been controlled; that is, in the processed images we can see much more of the
image detail, and as we increase the value of gamma the image becomes darker and darker, which is
obvious from the power law transformation function plot that we have already shown.

So, this is another kind of processing operation, the power law transformation, that can also be
used to enhance some features of the input image. The other kind of transformation that we can use
for image enhancement is called gray level slicing. In the case of gray level slicing, an
application may not be interested in all the intensity levels but may need only the intensity
levels within a certain range of gray level values.
(Refer Slide Time: 49:19)

So, in such cases, for enhancement, what we can use is the gray level slicing operation, and the
transformation functions are shown over here. The transformation function on the left hand side
says that for intensity levels in the range A to B the image will be enhanced, while for all other
intensity levels the pixels will be suppressed.

On the right hand side, the transformation function shows that again within A and B the image will
be enhanced, but outside this range the original intensities will be retained, and the results
that we get are something like this.

(Refer Slide Time: 49:52)


The first image shows that only the desired intensity levels are retained with enhancement; all
other regions have been suppressed. The right hand image shows that the desired range of
intensities has been enhanced while the other intensity levels have remained as they are.
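A minimal sketch of the two gray level slicing variants described above is given below; the range
limits a and b, the highlight value and the 8-bit assumption are illustrative choices, not values
fixed by the lecture.

import numpy as np

def slice_and_suppress(image, a, b, high=255, low=0):
    # Variant 1: intensities in [a, b] are raised to a bright value,
    # everything outside the range is suppressed.
    inside = (image >= a) & (image <= b)
    return np.where(inside, high, low).astype(np.uint8)

def slice_and_preserve(image, a, b, high=255):
    # Variant 2: intensities in [a, b] are highlighted,
    # pixels outside the range keep their original values.
    inside = (image >= a) & (image <= b)
    return np.where(inside, high, image).astype(np.uint8)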
So, with this we stop our today’s discussion on point processing. We will continue with this topic
in our next lecture.

Now, let us come to some questions on today’s lecture topic.

(Refer Slide Time: 50:38)

So, today's quiz questions are: first, what is meant by image enhancement? What are the different
types of image enhancement techniques? What is the transformation function to create a negative
image? For what type of images is the negative transformation useful? A captured image appears
very dark because of a wrong lens aperture setting; which enhancement technique is appropriate to
enhance such an image?
(Refer Slide Time: 51:16)

Then, what is the use of dynamic range compression of an image? Suggest a transformation
function for dynamic range compression? What is meant by gamma correction?

Thank you.
Digital Image Processing
Prof. Dr. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 18
Image Enhancement (Point Processing - II)

Hello, welcome to the video lecture series on digital image processing. In our last class, we have
discussed about the various point processing techniques.

(Refer Slide Time 1:07)

So, we have talked about the image enhancement using point processing and under that we have
talked about the various point processing techniques like negative image transformation and in
case of negative image transformation, we have seen that the processed image that we get is a
negative version of the input original image and such processed images are useful in case we
have very few pixels in the original image where the information content is mostly in the white
pixels or gray pixels which are embedded into large regions of dark pixels

So, in such cases, if we take the negative of the image; in that case, the processed imaging
processed image, the information content becomes much more convenient to visualize. The other
kind of point processing techniques that we have discussed is the contrast stretching operation.
In the case of the contrast stretching operation, we have seen that this kind of operation is
useful where the original image is very dark, and we have said that such dark images arise when
the scene illumination is very poor, when the dynamic range of the sensor is so small that it
cannot record all the intensity values present in the scene, or when the aperture setting of the
camera lens is not proper during image acquisition.

So, for these different kinds of cases, we can have a dark image and contrast stretching is a very
very useful technique to enhance the contrast of such dark images. The other kind of
transformation that we have used for image enhancement is a logarithmic transformation and
there we have said that logarithmic transformation basically compresses the dynamic range of
the input image and these kind of transformation, we have said that it is very very useful when an
image which is to be displayed on a display device but the dynamic range of the input image is
very large which the display device cannot handle. So, for such cases, you go for the logarithmic
transformation which compresses the dynamic range of the input image so that it can be
reproduced faithfully on the display.

Then we have also talked about the other kind of image enhancement techniques power-law
transformation and we have said that this power-law transformation is very very useful for image
display devices, for printing devices as well as for image acquisition devices because by nature,
all these devices provide a power-law transformation of the image that is to be produced;
whether it is on the display or it is on the printer or the image which is to be captured.

So, because the devices themselves transform the image using the power-law transformation,
then if we do not take any action before providing the image to those devices, then the images
which will be produced will be distorted in nature. So, the purpose of this power-law
transformation is you apply a power-law transformation to the input image in such a way that it
compensates the power-law transformation which is applied by the device.

So in effect, what we get is an output image; whether it is on the display or on the printer, will be
a faithful reproduction of the input image. The other kind of image enhancement technique that we
have discussed is the gray level slicing operation, and we have said that gray level slicing is
useful when the application wants certain gray levels to be enhanced.

There again, we have seen 2 different types of transformation functions. In one case, the
transformation enhances all the intensity values within a given range and the intensity values
outside that range are suppressed or made 0. In the other kind of gray level slicing
transformation, the intensity values within the given range are enhanced but outside that range
the intensity values remain untouched; that is, whatever the intensity value is in the original
image, the same intensity value is reproduced in the processed image, while within the given
range the intensity values are enhanced.

So, these kinds of applications, these kinds of transformation is very very useful for applications
where the application wants that intensity values within a certain range should be highlighted.
Now, all these different point processing techniques that we have discussed till now, they do not
consider the overall appearance of the image. They simply provide the transformation on a
particular intensity value and accordingly produce the output intensity value.

(Refer Slide Time: 7:23)

Now, in today's discussion, we will talk about another approach where the transformation
techniques also take into account the global appearance of the image. The histogram is such a
measure: it provides a global description of the appearance of an image. So, a few of the
enhancement techniques that we are going to discuss today are based on histogram processing.

So, in today’s discussion, we will talk about initially what is an histogram, then we will talk
about 2 histogram based techniques, one of them is called histogram equalization and the other
one is called histogram specification or sometimes it is also called histogram matching or
histogram modification. Then apart from this histogram based techniques, we will also talk about
2 more image enhancement techniques.

You may remember from our previous discussion that a transformation function T is applied on the
original image F to give us the processed image G, and that this transformation function T maps an
intensity value in the input image to an intensity value in the processed image. We also mentioned
that it is not necessary for the transformation function T to work on a single image; the
transformation function T can also work on multiple images, more than one image.

So, we will discuss 2 such approaches. One approach is image enhancement using image
subtraction operation and the other approach is image enhancement using image averaging
operation. So, first let us start discussion on histogram processing and before that let us see that
what we mean by the histogram of an image.

(Refer Slide Time: 9:34)

So, to define the histogram of an image, we consider that an image has gray level intensities in
the range 0 to L minus 1. So, we consider that the digital images that we are talking about have
L discrete intensity levels, and we represent those intensity levels by values in the range 0 to
capital L minus 1.

We say that a variable r_k represents the k'th intensity level. The histogram is then
h(r_k) = n_k, where n_k is the number of pixels in the image having intensity level r_k. So, once
we get the number of pixels having intensity value r_k, and if we plot this number of pixels
against the intensity value of those pixels, then the plot that we get is known as a histogram.

So, in this particular case, because we are considering discrete images, the histogram h(r_k) will
also be discrete. Here, r_k is a discrete intensity level, n_k is the number of pixels having
intensity level r_k, and h(r_k), which is the same as n_k, also assumes discrete values. In many
cases, we talk about what is called a normalized histogram.

So, instead of taking the simple histogram as just defined, we sometimes take a normalized
histogram. A normalized histogram is very easily derived from the original histogram: the
normalized histogram is p(r_k) = n_k / n.

As before, n_k is the number of pixels having intensity value r_k and n is the total number of
pixels in the digital image. So, you find from this expression that p(r_k) = n_k / n actually
tells you the probability of occurrence of a pixel having intensity value equal to r_k, and such
histograms give, as we said, a global description of the appearance of an image.
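In code, the histogram and the normalized histogram can be computed as in the short sketch below,
assuming an 8-bit image (L = 256) stored as a NumPy array of non-negative integers; the function
name is chosen only for illustration.

import numpy as np

def image_histograms(image, L=256):
    # h[k] = n_k : number of pixels with intensity r_k = k
    # p[k] = n_k / n : normalized histogram, i.e. the probability of intensity k
    h = np.bincount(image.ravel(), minlength=L).astype(np.float64)
    p = h / image.size
    return h, p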

So now, let us see that what are the different types of images that we can usually get and what
are the corresponding histograms.

(Refer Slide Time: 13:08)

So, here we find that the first image, as you see, is a very dark image. It is very difficult to
make out the content of this particular image. If we plot the histogram of this image, which is
shown on the right hand side, you find that the plot says that most of the pixels of this image
have intensity values which are near 0.

Because we are considering images in which every pixel is digitized using 8 bits, we have a total
of 256 intensity levels, represented by intensity values from 0 to 255, and for this particular
dark image you find that most of the pixels have intensity values near 0, which gives the image
its very dark appearance. Now, let us see a second image.

(Refer Slide Time: 14:23)

Here, you find that this image is very bright and if you look at the histogram of this particular
image; you find that for this image, the histogram shows that most of the pixels of this image
have intensity values which are near to the maximum that is near value 255 and because of this
the image becomes very bright. Let us come to a third image category.

(Refer Slide Time: 15:01)

This is an image where you find that the intensity values are higher than those of the first image
that we had shown but lower than those of the previous image. So, this is something in between,
and the histogram of this particular image shows that most of the pixels of this image have
intensity values which are in the middle range; not only that, the spread of the intensity values
of these pixels is also very small.

So, this image appears to be a medium kind of image, neither very dark nor very bright, but at
the same time the variation of the intensity values of this particular image is very poor. As a
result, the image gives a medium kind of appearance, but the variation of intensities is not very
clear; that means the contrast of the image is very poor. So, let us look at the fourth category
of image.

(Refer Slide Time: 16:29)

So this one, in this image, the histogram plot shows that the intensity values vary from very low
value to very high value that is it has a wide variation from 0 to 255 levels and as a result, the
image appears to be a very very prominent image having low intensity values, high intensity
values and at the same time, if you look at the image, you find that many of the details of the
image are easily visible from this particular image.

So, as we said, the nature of the histogram shows what the global appearance of an image is, and
this is quite obvious from these 4 different types of images that we have shown: the first one was
a dark image, the second one was a bright image, the third one was a medium category image whose
contrast was very poor, and this fourth one is an ideal image, at least for visualization
purposes, where the image brightness is proper and at the same time the details of the objects
present in the image can be very easily understood. So, this is a high contrast image.

So, when we talk about this histogram based processing, most of the histogram based image
enhancement techniques, they try to improve the contrast of the image; whether we talk about the
histogram equalization or the histogram modification techniques.

Now, when we talk about these histogram based techniques, the histogram just gives you a global
description of the image; it does not tell you anything about the content of the image, and that
is quite obvious in these cases. Just by looking at the histogram, we cannot say what the content
of the image is.

We can just have an idea of the global appearance of that particular image, and histogram based
techniques try to modify the histogram of an image to make the image appear in a particular way,
either dark or bright or with high contrast. Depending upon the type of operation that we do using
these histograms, we can have either a histogram equalization operation or a histogram
modification operation.

So now, having seen what a histogram is and what the histogram tells us, let us see how these
histograms can be processed to enhance images. The first technique that we will talk about is
histogram equalization.

(Refer Slide Time: 19:51)

So, for this histogram equalization operation, initially we will assume that r to be a variable
representing the gray level in an image. So, this r represents the gray level in an image and for
the time being, we will also assume that the pixel values in an image are continuous and they are
normalized in the range 0 to 1. So, we assume the normalized pixel values and the pixel values
can take values in the range 0 to 1 where 0 indicates a black pixel, so 0 indicates a black pixel
and 1 indicates a white pixel.

Later on, we will extend our ideas to a discrete formulation where we consider pixel values in the
range 0 to capital L minus 1, L being the number of discrete gray levels present in the image.
Now, as we said, for point processing we are interested in finding a transformation of the form
s = T(r), where r is the intensity in the original image and s is the intensity in the processed
image, that is, the transformed or enhanced image.

Now, this transformation function T has to satisfy 2 conditions. The first condition is that T(r)
must be single valued and monotonically increasing in the range 0 ≤ r ≤ 1. The second condition
that T(r) must satisfy is 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1.

Now, the first condition is very important because it maintains the order of the gray levels in
the processed image. That is, a pixel which is darker in the original image should remain darker
in the processed image, and a pixel which is brighter in the original image should remain brighter
in the processed image. So, the intensity ordering does not change in the processed image, and
that is guaranteed by the first condition, namely that T(r) should be single valued and
monotonically increasing for r in the range 0 to 1. The second condition, that 0 ≤ T(r) ≤ 1, is
the one which ensures that the processed image does not contain a pixel value higher than the
maximum intensity value that is allowed.

So, this ensures that the processed image will have pixel values which always lie within the
available minimum and maximum range, and it can be shown that if these conditions are satisfied by
T(r), then the inverse transformation, that is r = T^(-1)(s), will also satisfy these 2
conditions. So, we want a transformation function T which satisfies these conditions, and if T(r)
satisfies them, then the inverse transformation satisfies them as well.

Now, let us see how the histograms help us to get a transformation function of this form.

(Refer Slide Time: 25:14)

So now, as we said, we assume that the images take normalized intensity values in the range 0 to
1, with r an intensity value in the original image and s an intensity value in the processed
image. We take p_r(r) to be the probability density function of r, where r is the variable
representing intensity values in the original image, and p_s(s) to be the probability density
function of s, where s is the variable representing intensity values in the processed image. These
are the 2 probability density functions (PDFs) that we assume.

Now, given this, from elementary probability theory we know that if p_r(r) and the transformation
function T(r) are known, and T^(-1)(s) is single valued and monotonically increasing, then we can
obtain the PDF of s as p_s(s) = p_r(r) |dr/ds|, evaluated at r = T^(-1)(s).
So, this is what elementary probability theory gives us: if we know p_r(r), T(r) and T^(-1)(s),
with T^(-1)(s) single valued and monotonically increasing, then p_s(s) can be obtained from
p_r(r) as p_s(s) = p_r(r) |dr/ds|.

Now, all histogram processing techniques try to modify the probability density function p_s(s) so
that the image gets a particular appearance, and this appearance is obtained through the
transformation function T(r). So now, what type of transformation function T(r) can we have? Let
us consider a particular transformation function.

(Refer Slide Time: 28:43)

Say we take a transformation function of this form: s = T(r) = ∫ p_r(w) dw, where the integration
runs from 0 to r and r varies in the range 0 to 1. We find that this integral gives the cumulative
distribution function of the variable r. Now, if I take T(r) of this particular form, then this
T(r) will satisfy both of the conditions that we have stated earlier. And from this, we can
compute ds/dr, which is nothing but p_r(r).

So, by substitution into our earlier expression, you find that p_s(s), which as we have said is
nothing but p_r(r) |dr/ds|, in this particular case becomes p_r(r) times 1/p_r(r), which is equal
to 1. So, we find that if we take this particular transformation function, which is nothing but
the cumulative distribution function of the variable r, then the transformation generates an image
which has a uniform probability density function of the intensity values s.
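The key step of this argument can be written compactly as follows:

$$s = T(r) = \int_0^r p_r(w)\,dw \;\Rightarrow\; \frac{ds}{dr} = p_r(r) \;\Rightarrow\; p_s(s) = p_r(r)\left|\frac{dr}{ds}\right| = p_r(r)\cdot\frac{1}{p_r(r)} = 1, \qquad 0 \le s \le 1.$$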

And we have seen earlier that a high contrast image has a histogram in which pixels take intensity
values over the entire range 0 to 255. So, if I go for this kind of transformation, since we
obtain a uniform probability density function for the processed image, this is what is going to
enhance the contrast of the image. This result, that p_s(s) = 1, is very important, and you find
that we have obtained it irrespective of T^(-1)(s); that matters because it may not always be
possible to obtain T^(-1) analytically.

So, whatever be the nature of T inverse s, if we take that cumulative distribution function of r
and use that as the transformation function T (r); then the image is going to be enhanced. So, this
simply says that using CDF, the cumulative distribution function as the transformation function,
we can enhance the contrast of an image and by this contrast enhancement what you mean is the
dynamic range of the intensity values is going to be enhanced.

Now, what we have discussed till now, this is valid for the continuous domain. But the images
that we are going to consider, all the images are discrete image. So, we must have a discrete
formulation of whatever derivation that we have done till now.

So now, let us see that how we can have a discrete formulation of these derivations.

(Refer Slide Time: 32:51)

So, for the discrete formulation, what we have seen earlier is that p_r(r_k) is given by n_k / n,
where n_k is the number of pixels having intensity value r_k and n is the total number of pixels
in the image, and a plot of p_r(r_k) for all values of r_k gives us the histogram of the image.
The technique to obtain histogram equalization, and thereby the image enhancement, is first to
find the cumulative distribution function, the CDF, of r_k. So we get s_k = T(r_k), where T(r_k)
is now the cumulative distribution, the sum of p_r(r_i) for i from 0 to k, which is nothing but
the sum of n_i / n for i from 0 to k. The inverse of this is obviously r_k = T^(-1)(s_k) for
0 ≤ s_k ≤ 1.

So, if I use this as the transformation function, then the operation that we get is histogram
equalization and, as we have said, histogram equalization basically gives us a transformed image
where the intensity values have a uniform distribution; because of this, the processed image that
we get appears to be a high contrast image.
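A minimal sketch of this discrete formulation in Python with NumPy is shown below, assuming an
8-bit image (L = 256); it returns the values s_k in the range 0 to 1 exactly as defined above,
leaving the rescaling to the display range for later. The function name is only illustrative.

import numpy as np

def equalization_transform(image, L=256):
    # s_k = T(r_k) = sum over i = 0..k of n_i / n  (the discrete CDF)
    n = np.bincount(image.ravel(), minlength=L)
    return np.cumsum(n) / image.size     # s_k for every intensity k, in [0, 1]

# Applying the transform as a look-up on an image 'img':
# equalized = equalization_transform(img)[img]   # still in [0, 1]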

So, let us see that what are the results that we can get using such a kind of histogram equalization
operation?

(Refer Slide Time: 35:06)

So here, on the left hand side, we have an image, and it is obvious that the contrast of this
image is very poor. On the right hand side, we have shown the histogram of this particular image,
and here again you find from this histogram that most of the pixels in this image have intensities
which are very close to 0 and there are very few pixels with higher intensity values.

After histogram equalization, the image that you get is shown at the bottom, and here you find
that this image obviously has a higher contrast than the previous one: many details which are not
very clear in the original image are very clear in this second image. On the right hand side, we
have shown the histogram of this processed image, and if you compare the 2 histograms, you will
find that the histogram of the processed image is more or less uniform. So, by histogram
equalization we can have such a kind of enhancement.

(Refer Slide Time: 36:23)

This shows another image, again processed by histogram equalization. At the top, you find the
image of a part of a car; because of this enhancement, not only does the image appear better but,
if you look at the number plate, you find that in the original image the numbers are not readable
whereas in the processed image I can easily read the number, something like FN 0968. So, it is not
readable in the original but readable in the processed image, and the histogram of this processed
image is nearly uniform.

So, this is one kind of histogram based processing technique, histogram equalization, which gives
contrast enhancement. However, though it gives contrast enhancement, histogram equalization has
certain limitations. The first limitation is that the equalized image you get is fixed; I cannot
have any interactive manipulation of the image.

So, it generates only a single processed image. To overcome this limitation, if an application
demands that we enhance only a certain region of the histogram, that is, we want the details
within a certain region of the histogram and not what is given by the histogram equalization
process, then the kind of technique that should be used is what is called histogram matching or
histogram specification.

(Refer Slide Time: 38:18)

So, in the case of histogram specification techniques, what we have to have is a target histogram,
and the image has to be processed in such a way that the histogram of the processed image becomes
the same as that of the target histogram.

Now, to see how we can go about such histogram specification or histogram matching or histogram
modification, we initially assume that we again have 2 variables: a variable r representing the
continuous gray levels in the given image and a variable z representing the intensities in the
processed image. The processed image is specified in the form of the probability density function
p_z(z); this p_z(z) specifies our target histogram. From the given image we can obtain p_r(r),
that is, the histogram of the given image; this we can compute from the input image, whereas
p_z(z), the target histogram, is specified.

Now, for this histogram matching, if I equalize the given image using the transformation function
s = T(r), which, as we have seen earlier, is the integral of p_r(w) dw over the range 0 to r, then
what I get is an image whose intensity values have a uniform probability density function.

Next, using this p_z(z), we compute the transformation function G(z), which is obtained as the
integral of p_z(t) dt over the range 0 to z. Then, from these 2 equations, we set
G(z) = T(r) = s, and this gives z = G^(-1)(s) = G^(-1)(T(r)).
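In symbols, the chain of mappings just described is:

$$s = T(r) = \int_0^r p_r(w)\,dw, \qquad G(z) = \int_0^z p_z(t)\,dt, \qquad G(z) = T(r) = s \;\Rightarrow\; z = G^{-1}(s) = G^{-1}\!\big(T(r)\big).$$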

So, you find that the operations we are doing are: first, we equalize the given image using the
histogram equalization technique; we find the transformation function G(z) from the target
histogram that has been specified; then this equalized image is inverse transformed using the
inverse transformation G^(-1)(s). The resultant image of this operation is likely to have a
histogram given by the target histogram p_z(z).

So, our procedure is; first equalize the original image obtaining the histogram from the given
image, then find out the transformation function G (z) from the target histogram that has been
specified, then do the inverse transformation of the equalized image using not T inverse but
using the G inverse and this G inverse has to be obtained from the target histogram that has been
specified and by doing this, the image that you get becomes an histogram modified image, a
processed modified image, processed image whose histogram is likely to be same as the
histogram that has been specified as the target histogram.

So, again, this is a continuous domain formulation but our images are digital, so we have to go
for a discrete formulation of these derivations. Now, let us see how we can discretize this
particular formulation.

(Refer Slide Time: 43:24)

So again, as before, we can find s_k = T(r_k), which is equal to the sum of n_i / n for i from 0
to k; this we obtain from the given input image. From the target histogram that is specified, that
is p_z(z), we get a transformation function v_k = G(z_k), which is equal to the sum of p_z(z_i)
for i from 0 to k, and we set this equal to s_k. This has to be done for k = 0, 1, up to L minus
1, and then finally we obtain the processed image through the inverse, z_k = G^(-1)(T(r_k)).

So, this is the discrete formulation of the continuous domain derivations that we have done
earlier. Now, let us see what kind of operations we perform using this.

(Refer Slide Time: 45:20)

So here, the slide shows the transformation function s = T(r) on the left hand side, which is
obtained from the given image, and using the target histogram we obtain the function G(z). The
function T(r) gives the value s_k for a particular intensity value r_k in the given image. The
function G(z) is supposed to give an output value v_k for an input value z_k.

Now, coming to G(z), you find that z_k is the intensity value which is not known to us; we want to
find z_k from r_k. So, the operation we will be doing is this: whatever s_k we get from r_k, we
feed that s_k into the second transformation and then do the inverse transformation operation. As
shown in the next slide, we set s_k along the vertical axis of the v = G(z) transformation
function, then do the inverse transformation, that is, from s_k we come to z_k. So, what we have
to apply is an inverse transformation to get the value z_k for a given intensity value r_k in the
original image.

Now, conceptually or graphically, this is very simple, but the question is how to implement it. In
the continuous domain we may not get an analytical solution for G inverse, but in the discrete
domain the problem becomes simpler because we are dealing with only discrete values.

So, in the discrete domain, the transformations r_k to s_k, that is s = T(r), and z_k to v_k, that
is v = G(z), can be implemented by simple look-up tables. By this, what I mean is something like
this.

(Refer Slide Time: 47:43)

T(r) is represented by an array where, for r_k, the index k points into the array and the element
at that array location gives us the value s_k. So, whenever a value r_k is specified, using k we
immediately go to this array and the content of that array location gives us the corresponding
value s_k.

Similarly, for v_k = G(z_k), we have a similar operation: if z_k is known, I can use k as an index
into the array for z and get the corresponding value v_k. Now, the first case is very simple; I
know the value of r_k, so I can find the corresponding value of s_k from the array. But the second
one is an inverse operation: I know s_k, or, as we have equated s_k to v_k, I know v_k, and from
this v_k I have to find the corresponding value z_k. So, this is an inverse problem and, to solve
it, we have to go for an iterative solution approach.

(Refer Slide Time: 49:04)

So, we can obtain an iterative solution in this form. We know that G(z_k) = s_k, which gives
G(z_k) - s_k = 0. Our approach will be to iterate on the values of z_k to get a solution of this
equation, and this has to be done for k = 0, 1, up to L minus 1. So, what should we do?

The simplest approach is to initialize z to a value, say ẑ. We take z_k = ẑ for every value of k,
where ẑ is the smallest integer which satisfies G(ẑ) - s_k ≥ 0. So, our approach can be: start
with a very small value of ẑ, the smallest integer, then go on incrementing ẑ by 1 at every step
until this condition is satisfied. When the condition is satisfied, the value of ẑ that we get is
the z_k corresponding to the given value s_k.
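Putting the discrete formulation and this iterative inverse together, a minimal Python/NumPy
sketch of histogram specification is given below; the 8-bit assumption and the function name are
illustrative, and the target histogram p_z is assumed to be supplied as an array of L
probabilities.

import numpy as np

def histogram_specification(image, p_z, L=256):
    # Step 1: CDF of the input image, s_k = T(r_k)
    n = np.bincount(image.ravel(), minlength=L)
    T = np.cumsum(n) / image.size
    # Step 2: CDF of the target histogram, v_k = G(z_k)
    G = np.cumsum(p_z)
    # Step 3: for every s_k, find the smallest z with G(z) - s_k >= 0
    z_map = np.empty(L, dtype=np.uint8)
    for k in range(L):
        z = 0
        while z < L - 1 and G[z] < T[k]:
            z += 1
        z_map[k] = z
    # Step 4: apply the combined mapping r_k -> z_k as a look-up table
    return z_map[image]

The inner loop is exactly the iteration described above: start from the smallest integer and keep
incrementing z until G(z) - s_k becomes non-negative (np.searchsorted could perform the same
search, but the explicit loop mirrors the lecture).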

So now, let us stop our discussion today. We will continue with this topic in our next class.

(Refer Slide Time: 51:18)

Now, coming to the questions of today’s lectures; the questions are first, what is a histogram?
Give the transformation function of histogram equalization technique? What should be the nature
of the histogram of a histogram equalized image? Suppose, a digital image is subjected to
histogram equalization, what effect will a second pass equalization have over the equalized
image? What is histogram specification technique? What condition must be satisfied by the
target histogram to be used in histogram specification technique?

Thank you.

(Refer Slide Time: 52:07)

Hello, welcome to the video lectures series on digital image processing. For last few classes, we
have started our discussion on image enhancement techniques.

(Refer Slide Time: 52:42)

So, in the previous class, we have seen what is meant by histogram; we have seen how the global
appearance of an image is reflected in its histogram, we have seen that histogram based
enhancement techniques aims at modifying the global appearance of an image by modifying its
histogram. Then we have started discussion on histogram equalization technique and histogram
specification or histogram matching techniques.

(Refer Slide Time: 53:12)

So today’s class, we will talk about some implementation issues of histogram equalization and
histogram specification techniques and we will talk about this implementation issues with respect
to some examples. Then we will also compare the performance of histogram specification and
histogram equalization techniques with the help of some results obtained on some images. Then
lastly, we will talk about two more point processing techniques for histogram equalization; one
of them is histogram subtraction and other one is histogram averaging techniques. So now, let us
briefly recapitulate what we have done in the last class.

(Refer Slide Time: 54:04)

As we have said that histogram of an image that indicates what is the global appearance of an
image. We have also seen these images in the last class but just for a quick recapitulation; you
will find that on the left hand side, we have shown an image which is very dark and we call this

as the dark image and on the right hand side, we have shown the corresponding histogram and
you will find that this histogram shows that most of the pixels in this particular image, they are
having an intensity value which is near about 0 and there is practically no pixel having higher
intensity values and that is what gives this particular image a dark appearance.

(Refer Slide Time: 54:59)

Then the second one that we have shown is a bright image or a light image and again from this
particular histogram, you will find that most of the pixels in this particular image have intensity
values which are near to maximum value that is 255 in this particular case and since we are
talking about all the images in our application which are quantized where every pixel is
quantized with 8 bits, so the intensity levels will vary from 0 to the 255.

So, in our case, the minimum intensity of a pixel will be 0 and the maximum intensity of a pixel
will be 255. So, in this particular example, you will find that, as this histogram shows, most of
the pixels have intensity values which are near 255, that is, the maximum value.

(Refer Slide Time: 55:51)

What effect will a second pass equalization have over the equalized image? As we have already
mentioned, once an image is histogram equalized, the histogram of the processed image will ideally
be uniform; that means it will have a uniform probability density function, and if I want to
equalize this already equalized image, you will find that the corresponding transformation
function will be a linear one, a straight line inclined at an angle of 45 degrees with the x axis.

So, that clearly indicates that equalizing an already equalized image is not going to have any
further effect on the processed image. This is the ideal case; practically, we have seen that
after equalization the histogram that you get is not really uniform, so there will be some effect
in the second pass, but the effect may be negligible.

The sixth one is again a tricky one: what condition must be satisfied by the target histogram to
be used in the histogram specification technique? You will find that in the histogram
specification technique, the target histogram is used for the inverse transformation, that is G
inverse. So, the transformation function G has to be monotonically increasing, and that is only
possible if the value of p_z(z) is non-zero for every possible value of z. That is the condition
that must be satisfied by the target histogram.

(Refer Slide Time: 57:30)

Now, coming to today’s questions; first one is explain why the discrete histogram equalization
technique does not in general yield a flat histogram. The second, an image has a gray level PDF
p r (r) as shown here and the target histogram as shown on the right. We have to find out the
transformation in terms of r and z that is what is the mapping from r to z.

(Refer Slide Time: 57:57)

In the third question, two probability density functions are given; again, you have to find the
transformation between r and z.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 19
Image Enhancement Point Processing-III

Hello, welcome to the video lecture series on digital image processing. For last few
classes we have started our discussion on image enhancement techniques.

(Refer Slide Time: 1:13)

So, in the previous class, we have seen what is meant by histogram; we have seen how
the global appearance of an image is reflected in its histogram, we have seen that
histogram based enhancement techniques aims at modifying the global appearance of an
image by modifying its histogram. Then we have started discussion on histogram
equalization technique and histogram specification or histogram matching techniques.

(Refer Slide Time: 1:43)

So today’s class, we will talk about some implementation issues of histogram


equalization and histogram specification techniques and we will talk about this
implementation issues with respect to some examples. Then we will also compare the
performance of histogram specification and histogram equalization techniques with the
help of some results obtained on some images.

Then lastly, we will talk about two more point processing techniques for histogram
equalization; one of them is histogram subtraction and other one is histogram averaging
techniques. So now, let us briefly recapitulate what we have done in the last class.

(Refer Slide Time: 2:35)

As we have said that histogram of an image that indicates what is the global appearance
of an image. We have also seen these images in the last class but just for a quick
recapitulation; you will find that on the left hand side, we have shown an image which is
very dark and we call this as the dark image and on the right hand side, we have shown
the corresponding histogram and you will find that this histogram shows that most of the
pixels in this particular image, they are having an intensity value which is near about 0
and there is practically no pixel having higher intensity values and that is what gives this
particular image a dark appearance.

(Refer Slide Time: 3:30)

Then the second one that we have shown is a bright image or a light image and again
from this particular histogram, you will find that most of the pixels in this particular
image have intensity values which are near to maximum value that is 255 in this
particular case and since we are talking about all the images in our application which are
quantized where every pixel is quantized with 8 bits, so the intensity levels will vary from
0 to the 255.

So, in our case, the minimum intensity of a pixel will be 0 and the maximum intensity of a pixel
will be 255. In this particular example, you will find that, as this histogram shows, most of the
pixels have intensity values which are near 255, the maximum value.

(Refer Slide Time: 4:25)

Then, the next image shows a case where the image has pixels with intensity values in the middle
range but the range of intensity values is very narrow. As a result, the image is neither very
bright nor very dark but, at the same time, because the dynamic range of the intensity values is
very low, the image contrast is very poor.

(Refer Slide Time: 4:54)

So, the next slide shows what we call a high contrast image, where you find that most of the
details of the objects present in the image are visible and, by looking at the corresponding
histogram, we find that the pixels in this particular image have a wide range of intensity values,
starting from a very low value near 0 up to the maximum value, which is near 255.

So, we say that a particular image has high contrast if its pixel intensity values span a wide
range, from a very low value to a very high value. All these 4 examples tell us how the global
appearance of an image is reflected in its histogram, and that is why all histogram based
enhancement techniques try to adjust the global appearance of the image by modifying the histogram
of the corresponding image.

So, the first technique of this histogram based enhancement that we have discussed in the
last class is called histogram equalization. So, let us quickly review what we mean by
histogram equalization.

(Refer Slide Time: 6:22)

So here, in the case of histogram equalization, if I consider the discrete case, we have seen that
the histogram of an image is given in this form: p_r(r_k), where r_k is an intensity level present
in the image, is given by p_r(r_k) = n_k / n, where n_k is the number of pixels with intensity
value equal to r_k and n is the total number of pixels.

As this expression suggests, it tells us the probability of a pixel having the value r_k being
present in the image, and the plot of these p_r(r_k) values for different values of r_k defines
the histogram of this particular image.

Now, the histogram equalization technique makes use of this histogram to find the transformation
function from an intensity level in the original image to an intensity level in the processed
image, and that transformation function is given by s_k = T(r_k), which is the summation of
n_i / n for i from 0 to k, which is nothing but the summation of p_r(r_i) for i from 0 to k.

So, this is the transformation function that we get, which is to be used for histogram
equalization. Now, notice that because the histogram as defined, p_r(r_k) = n_k / n, is a
normalized histogram, every value of p_r(r_k) will be within the range 0 to 1. Similarly, when
this transformation function T(r_k) gives us a value s_k corresponding to an intensity level r_k
in the input image, the value of s_k in this particular case will also lie in the range 0 to 1.

So, the minimum value of the intensity, as suggested by this expression, will be 0 and the maximum
value will be equal to 1. But we know that for digital images the minimum intensity of a pixel can
be 0 and the maximum intensity can be L minus 1, that is, r_k varies from 0 to L minus 1, and in
our discrete case this L minus 1 is equal to 255 because the intensity values of our images are
quantized with 8 bits.

(Refer Slide Time: 10:16)

So, we can have intensities varying from 0 to 255, whereas this transformation function
s_k = T(r_k) gives us a maximum intensity value s_k in the processed image equal to 1. For
practical implementation, we therefore have to do some post processing so that the s_k values
obtained in the range 0 to 1 are mapped to the full dynamic range of the image, that is from 0 to
255, and the kind of mapping function that we use can be written as
s' = int[ (s - s_min) / (1 - s_min) × (L - 1) + 0.5 ], where L - 1 is the maximum intensity level,
the integer value is taken because we want only integer intensities, and the shift of 0.5 rounds
to the nearest integer.
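This rescaling step can be written as a small helper, sketched below under the assumption that s
is the array of transformed values in the range 0 to 1 and that its minimum is strictly less than
1; the function name is only illustrative.

import numpy as np

def rescale(s, L=256):
    # s' = int( (s - s_min) / (1 - s_min) * (L - 1) + 0.5 )
    s_min = s.min()
    return ((s - s_min) / (1.0 - s_min) * (L - 1) + 0.5).astype(np.uint8)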

So, whatever value of s we get by this Transformation s k equal to T (r k ), that value of s


has to be scaled by this function to give us an intensity level in the processed image
which varies from 0 to maximum level that is 0 to capital L minus 1 and in our case, this
capital L minus 1 will be equal to a value 255. Now, let us take an example to illustrate
this.

(Refer Slide Time: 12:35)

Suppose, we have an input image having 8 discrete intensity values that is r varies from
0, 1, 2 upto say 7. So, we have 8 discrete intensity values. Similarly, the processed image
that you want to generate, that will also have 8 discrete intensity values varying from 0 to
7.

Now, suppose the probability density function, or the histogram, of the input image is specified
like this: p_r(0), the probability that an intensity value equals 0, is 0; p_r(1), the probability
that an intensity value equals 1, is the same as p_r(2), which is given as 0.1; p_r(3) is given as
0.3; p_r(4) is equal to p_r(5), which is 0; p_r(6) is given as 0.4; and p_r(7) is given as 0.1.

Now, our aim is that given this histogram of the input image, we want to find out the
transformation function T (r) which will map such an input image to the corresponding
output image and the output image will be equalized. So to do this what we have to do is
we have to find out the mapping function T (r).

(Refer Slide Time: 14:58)

This mapping function we can generate in the form of a table. I have r varying from 0 to 7, and
the corresponding probability values p_r are 0, 0.1, 0.1, 0.3, 0, 0, 0.4 and 0.1. Then, obviously,
from this probability density function we can compute the transformation function T(r), which is
nothing but the summation of p_r(i) for i from 0 to r.

If we compute the transformation function, we find that it comes out like this: 0, then 0.1, then
0.2, then 0.5; because the next 2 probability density values are 0, it remains 0.5 for the next
two levels; then it becomes 0.9, and finally 1.0.
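This cumulative sum is easy to check in code; a tiny sketch using the probability values given in
this example is:

import numpy as np

p_r = np.array([0.0, 0.1, 0.1, 0.3, 0.0, 0.0, 0.4, 0.1])
T = np.cumsum(p_r)
# T matches the values worked out above:
# 0, 0.1, 0.2, 0.5, 0.5, 0.5, 0.9, 1.0 (up to floating point rounding)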

So, this is the Transformation function that we have. So, this means that if my input
intensity is 0, this transformation function will give me a value s; this is nothing but the
value say s k which will be equal to 0. If the intensity is 1, input intensity is 1, the output
s k will be equal to 0.1. If the input intensity is 2, output s k will be equal to 0.2.

Similarly, if the input intensity is 6, the output value will be 0.9. But naturally, because the
output intensities have to vary from 0 to the maximum value, which is equal to 7, we have to scale
these s values to cover this entire range of intensities, and for that we use the same mapping
function as before, s' = int[ (s - s_min) / (1 - s_min) × (L - 1) + 0.5 ], where in this
particular case L minus 1 is equal to 7.

So, doing this calculation and taking the nearest integer value, whatever we get will be the reconstructed intensity level. If I do this, then you will find that for r equal to 0 the reconstructed s dash is equal to 0, for r equal to 1 the reconstructed s dash is equal to 1 and for r equal to 2 the reconstructed s dash is equal to 2. For r equal to 3, I get s equal to 0.5 and the minimum s is 0, so the numerator becomes 0.5 and the denominator is also equal to 1; 0.5 into 7 gives 3.5, plus 0.5, which is equal to 4. So, when my input intensity is 3, the corresponding output intensity will be equal to 4.

Similarly, for r equal to 4, the output intensity will also be equal to 4 and for r equal to 5, the output intensity will also be equal to 4. For r equal to 6, if you calculate following the same relation, the output intensity comes out to be 7, and for r equal to 7, the output intensity will also be equal to 7.

So, the first column, that is the different values of r, and the last column, that is the corresponding values of s dash, together give us the mapping from a given input intensity value to the corresponding output intensity value, and applying this mapping gives the processed or enhanced image which is to be displayed.
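To make the two steps concrete, here is a minimal sketch in Python of how such a lookup table could be computed; NumPy, the function name and the variable names are my own assumptions, and a different rounding convention may shift one or two of the mapped levels by one.

```python
import numpy as np

def equalization_lut(p_r, L=8):
    """Build the r -> s' lookup table from a histogram p_r(0), ..., p_r(L-1)."""
    s = np.cumsum(p_r)                      # s_k = T(r_k), values in the range 0 to 1
    s_min = s.min()
    # s' = int((s - s_min) / (1 - s_min) * (L - 1) + 0.5): scale to 0..L-1 and
    # truncate after the 0.5 shift, which amounts to rounding to the nearest integer.
    return (((s - s_min) / (1.0 - s_min)) * (L - 1) + 0.5).astype(int)

p_r = np.array([0.0, 0.1, 0.1, 0.3, 0.0, 0.0, 0.4, 0.1])   # histogram of the example
lut = equalization_lut(p_r)
# The enhanced image is then simply lut[image] for an image with levels 0 to 7.
```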

So, this is how the histogram equalization operation has to be done, and as we have seen in the last class, using such histogram equalization operations we get results like the ones shown here.

(Refer Slide Time: 19:48)

So here, we have shown an image which is a very dark image and on the right hand side we have the corresponding histogram. Once we do histogram equalization, what we get is the equalized or processed image: on the bottom row, you find that we have a brighter image which is the histogram equalized image and on the right side we have the corresponding histogram.

And, as we have mentioned in our last class, whenever we go for histogram equalization, the probability density function of the intensity values of the equalized image is ideally a uniform distribution.

In this particular case, you will find that the histogram of the equalized image that we have got is not absolutely uniform; however, it is close to uniform. The theoretical derivation which shows that the intensity distribution will be exactly uniform holds only in the continuous case.

In practical, discrete situations, in most cases we do not get a uniform intensity distribution. The reason is that in the discrete case many of the allowed pixel values may simply not be present in the image, and because of this the histogram or intensity distribution that you get will in most cases not be uniform.

So, this shows us the cases of histogram equalization. Now, let us come to the case of
histogram specification or histogram modification as it is called.

(Refer Slide Time: 21:50)

So, we will talk about histogram specification. As we have told in our last class, histogram equalization is an automated process; whatever processed image you get by using the histogram equalization technique is fixed, and so histogram equalization is not suitable for interactive image manipulation, whereas interactive image manipulation or interactive enhancement can be done by the histogram specification technique.

So, in the histogram specification technique, what we do is this: we have the input image and we can find out the histogram of that particular image; then a target histogram is specified and we have to process the input image in such a way that the histogram of the processed image will resemble, or will be close to, the target histogram which is specified.

So, here we have 2 different distributions. Firstly, we have p r (r k ) which, as we have seen, is nothing but n k divided by n, where n k is the number of pixels in the given image with an intensity value equal to r k , and this we compute from the given image which is to be processed. Secondly, we have a target histogram which is specified, so that our processed image will have a histogram close to it, and this target histogram is specified in the form p z (z k ).

Note that we do not have the image corresponding to this particular histogram; it is only the histogram which is specified. We use the notations p r (r k ) and p z (z k ), where the subscripts r and z indicate that these 2 probability density functions p r and p z are different.

So, in case of histogram specification, the process is done in this manner: firstly, using p r (r k ), you find out the transformation function corresponding to equalization and, as we have seen earlier, that transformation function is given by s k equal to T (r k ), which is the sum of n i by n for i varying from 0 to k and which is obviously equal to the sum of p r (r i ) for i varying from 0 to k.

So, this is the transformation function that is computed from the histogram p r (r k ) obtained from the given image. To obtain the histogram specification, the process is like this: we define a variable z k and the corresponding transformation function v k equal to G (z k ), which we can compute from the specified histogram as the sum of p z (z i ) for i varying from 0 to k, and we set this equal to s k .

So, you find that this intermediate stage, that is v k equal to G (z k ) where the transformation function G (z k ) is given by the summation of p z (z i ) for i from 0 to k, is a hypothetical one because we really do not have the image corresponding to the specified histogram p z (z k ). Now, once I get this, to get the reconstructed or processed image, I have to take the inverse transformation.

So here, you find that for this particular z k , we have v k equal to G (z k ) which is equal to s k , and G (z k ) is computed in this form. From here, to get the intensity value z k in the processed image, we have to take the inverse transformation, but in this case the inverse transformation is not taken with respect to T; it has to be taken with respect to G. So, our z k in this case will be equal to G inverse of s k .

So, what we have to do is for the given image, we have to find out the transformation
function T (r) corresponding to the histogram of the given image and using this
transformation function, every intensity value of the input image which is r k has to be
mapped to an equalized intensity value which is equal to S k . So, that is the first step.

The second step is that from the specified histogram p z we have to get the transformation function G, and then the s k that we have obtained in the previous step has to be inverse transformed using this transformation function G to give the intensity value in the processed image which is equal to z k .

Now, as far as the discussion so far is concerned, finding out T (r k ), that is the forward process, is very simple, but the difficulty comes in getting the inverse transformation, that is G inverse. It may not always be possible to get analytical expressions for T and G and similarly, it may not always be possible to get an analytical expression for G inverse.

So, the best way to handle this inverse process is to go for an iterative approach. Now, let us see what this entire formulation means.

(Refer Slide Time: 29:01)

So here, we have shown the same formulation graphically. On the left hand side, we have the transformation function S equal to T (r), which is obtained from the histogram of the given image, and on the right side we have the transformation function V equal to G (z), which has to be obtained from the target histogram that is specified.

Now, once we have these 2 transformation functions, they straightaway tell you that given an r k I can find out the value of s k , and given z k I can find out the corresponding value of v k . But the problem is that z k is unknown; this is the value we have to find out by using the inverse transform G inverse.

So, the process is as per our definition: since we have seen that v k is equal to s k , what we do is, for a given intensity r k of the input image, we find out the corresponding value s k by using the transformation S equal to T (r); once we get this, we set v k equal to s k , that is, from the first transformation function we come to the second transformation function.

So, we set v k equal to s k and then find out z k in the reverse direction; now the direction of the transformation is reversed, and we find out z k from this value of s k using the transformation curve G (z). But as we have mentioned, it may not always be possible to find out analytical expressions for T and G, so though the method appears to be very simple, its implementation is not that simple. In the discrete domain, however, this particular case can be simplified, in the sense that both these transformation functions, that is S equal to T (r) and V equal to G (z), can actually be implemented in the form of arrays. The arrays are like this.

(Refer Slide Time: 31:17)

So, I have an array as shown at the top, where the intensity value r k of the input image is taken as an index into this array and the content of that array element is the corresponding value s k . Similarly, the second transformation function G (z) can also be implemented with the help of an array, where z k is an index into this array and the corresponding element of the array gives us the value of v k .

Now, you find that using these arrays, the forward transformation is very simple. When we want to find out s k equal to T (r k ), what we do is, using r k , simply go to the corresponding element of this array, read the value stored in that location and that value gives you s k .

But the matter is not so simple when we go for the inverse transformation. For the inverse transformation, what we have to do is find a location in the second array, that is, we have to find out the value of z k for which the stored element is equal to s k . This is what we have to do.

So, you find that while the forward transformation, that is s k equal to T (r k ), was very simple, the reverse transformation is not that simple. To do this reverse transformation, we have to go for an iterative procedure, which is something like this.

(Refer Slide Time: 33:19)

You find that as per our definition, we have v k equal to G (z k ) which is equal to s k . Since these two are equal, we must have G (z k ) minus s k equal to 0. The solution would have been very simple if z k were known, but here z k is exactly the value we are trying to find out.

So, to find out the value of z k , we take the help of an iterative procedure. What we do is initialize z k to some value, say z hat, and iterate on this z hat until a condition like G (z hat) minus s k greater than or equal to 0 is satisfied.

So, until this condition is satisfied, you go on incrementing the value of z hat by 1 at every iteration; that is, we start with the minimum value of z hat and increase it in steps of 1 until G (z hat) minus s k becomes greater than or equal to 0. The minimum value of z hat for which this condition is satisfied gives the corresponding value of z k . So, this is a simple iterative procedure and, as before, we can illustrate it with the help of an example.
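Before the numerical example, the inverse mapping can be sketched in Python as a direct transcription of this iterative rule; G and s k here are just a cumulative target histogram and one equalized value, and the names are my own.

```python
import numpy as np

def g_inverse(G, s_k):
    """Smallest z_hat for which G[z_hat] - s_k >= 0 (the iterative scheme above)."""
    z_hat = 0
    while G[z_hat] - s_k < 0:      # condition not yet satisfied
        z_hat += 1                 # increment z_hat by 1 at every iteration
    return z_hat

G = np.cumsum([0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0, 0.0])    # G(z) of a target histogram
print(g_inverse(G, 0.5))           # -> 3, since G(3) = 0.7 is the first value >= 0.5
```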

(Refer Slide Time: 35:05)

Here again, we assume that both r and z take values from 0 to 7 and we take the probability density function p r (r) like this: p r (0) is equal to 0, p r (1) is equal to p r (2) which is equal to 0.1, p r (3) is equal to 0.3, p r (4) is equal to p r (5) which is equal to 0, p r (6) is equal to 0.4 and p r (7) is equal to 0.1.

So, this is what we assume is obtained from the given image. Similarly, the target histogram is given in the form p z (z), where the values are: p z (0) is equal to 0, p z (1) is equal to 0.1, p z (2) is equal to 0.2, p z (3) is equal to 0.4, p z (4) is equal to 0.2, p z (5) is equal to 0.1 and p z (6) is equal to p z (7) which is equal to 0. So, this is the target histogram that has been specified.

Now, our aim is to find out the transformation or mapping function from r to z. For doing this, we follow a procedure similar to the one used in case of histogram equalization.

(Refer Slide Time: 37:22)

And as in case of histogram equalization, for the different values of r we have p r (r) as follows: for r equal to 0, p r (r) is 0; for r equal to 1, it is 0.1; for r equal to 2, it is also 0.1; for 3, it is 0.3; for 4 and 5, it is 0; for 6, it is 0.4 and for 7, it is 0.1. From this, we can find out the corresponding values of s, which are given by 0, 0.1, 0.2, 0.5, 0.5, 0.5, 0.9 and 1.0.

Similarly, from the target histogram, for the different values of z, that is 0, 1, 2, 3, 4, 5, 6, 7, the corresponding histogram p z (z) is 0, 0.1, 0.2, 0.4, 0.2, 0.1, 0, 0 and the corresponding G (z) is given by 0, 0.1, 0.3, 0.7, 0.9, 1.0, 1.0 and 1.0.

Now, to map from r to z, I follow the same procedure: first I map from r to s, then I map from s to z, and for that I have to find out the minimum value of z for which G (z) minus s is greater than or equal to 0.

So, here I put the corresponding values of z; let me call them z prime. When s is equal to 0, the minimum value of z for which G (z) minus s is greater than or equal to 0 is 0. For s equal to 0.1, again I start with z equal to 0 and I find that the minimum value of z for which G (z) minus s is greater than or equal to 0 is equal to 1.

For s equal to 0.2, the minimum value of z for which G (z) minus s is greater than or equal to 0 is equal to 2. When I come to r equal to 3, the corresponding value of s is equal to 0.5 and, doing the same thing, the minimum value of z for which the condition is satisfied is equal to 3. When I come to r equal to 4, the value of s is again 0.5 and the minimum value of z for which G (z) minus s is greater than or equal to 0 is again 3, because for z equal to 3, G (z) is equal to 0.7. So, this will also be equal to 3 and, following the same procedure for the remaining levels, the corresponding mapping comes out like this.

So, for r equal to 0, the corresponding processed image will have an intensity value equal to 0; for r equal to 1, it will be equal to 1; for r equal to 2, the processed intensity will be equal to 2; for r equal to 3, it will be equal to 3; but for r equal to 4 and 5, the processed image will have intensity values which are equal to 3. For r equal to 6, the processed image will have an intensity value equal to 4, and for r equal to 7, the processed image will have an intensity value equal to 5.

So, these 2 columns, the first column of r and the column of z prime, give us the mapping between an input intensity level and the corresponding processed image intensity level when we go for this histogram matching.

(Refer Slide Time: 41:52)

So, you find that our procedure will be something like this: first you obtain the histogram of the given image, then precompute a mapped level s k for each level r k using this particular relation. Then from the target histogram, you obtain the mapping function G using the corresponding expression, and then precompute the value of z k for each value of s k using the iterative scheme.

Once these 2 steps are over, I have a precomputed transformation function in the form of a table which maps an input intensity value to the corresponding output intensity value. Then, for the final enhancement of the image, I take each input intensity value and map it to the corresponding output intensity value using this mapping function.

So, that will be our final step, and if this is done for each and every pixel location in the input image, the final output image will be an enhanced image whose intensity levels have a distribution close to the distribution that is specified.
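Putting the steps together, a compact sketch of this precomputed table could look as follows in Python; this only mirrors the procedure described above, and NumPy and the helper names are my own choices, not the lecture's.

```python
import numpy as np

def specification_lut(p_r, p_z):
    """Precompute the r -> z mapping table for histogram specification."""
    T = np.cumsum(p_r)                   # step 1: s_k = T(r_k) from the input histogram
    G = np.cumsum(p_z)                   # step 2: G(z_k) from the target histogram
    lut = np.zeros(len(p_r), dtype=int)
    for r_k, s_k in enumerate(T):        # step 3: z_k = G inverse of s_k, by iteration
        z_hat = 0
        while G[z_hat] - s_k < 0:
            z_hat += 1
        lut[r_k] = z_hat
    return lut

p_r = [0.0, 0.1, 0.1, 0.3, 0.0, 0.0, 0.4, 0.1]     # histogram of the given image
p_z = [0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0, 0.0]     # specified target histogram
print(specification_lut(p_r, p_z))                 # -> [0 1 2 3 3 3 4 5], as in the example
# The processed image is then lut[image], applied at every pixel location.
```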

(Refer Slide Time: 43:11)

So, let us see what results we get using this histogram specification technique. Again, I take the same dolphin image. On the top is the original image and on the right hand side I have the corresponding histogram. On the bottom, I have the histogram matched image and on the right hand side I have its corresponding histogram.

So, you find that these 2 histograms are quite different, and a bit later I will come to what the corresponding target histogram was that we have taken.

(Refer Slide Time: 43:48)

Now, to compare the result of histogram equalization with histogram specification: on the top I have shown the same histogram equalized image that we have shown earlier, and on the bottom row we have shown the histogram matched image. In this case, the target histogram which was specified for the histogram specification was the histogram obtained using the equalization process.

So, this particular histogram was our target histogram, and using this target histogram, when we did the histogram specification operation, this is the processed image that we get and this is the corresponding histogram.

Now, if you compare these 2, the histogram equalized image and the histogram matched image, you can note a number of differences. For example, you find that the contrast of the background is much higher than in the histogram equalized image, and also the details on the waterfront, on the water surface, are more prominent in the histogram specified image than in the histogram equalized image. Similar differences can be obtained by specifying other target histograms. So, this is our histogram specification operation.

(Refer Slide Time: 45:19)

So, this shows another result with histogram specification. On the top we have the dark
image, on the left bottom we have the histogram equalized image, on the right bottom we
have the histogram specified image. Here again, you find that the background in case of
histogram equalized image is almost washed out but the background is highlighted in
case of histogram specified image.

So, this is the kind of difference that we can get between a histogram equalized image and a histogram specified image, and this is what is meant by histogram specification and histogram equalization. Now, as we have said, I will discuss 2 more techniques: one is the image differencing technique and the other is the image averaging technique for image enhancement.

Now, as the name suggests, whenever we go for image differencing, we have to take the difference of pixel values between 2 images.

(Refer Slide Time: 46:18)

So, given 2 images, say one image f (x, y) and the other image h (x, y), the difference of these 2 images is given by g (x, y) = f (x, y) minus h (x, y). As this operation suggests, in g (x, y) all those pixel locations will be highlighted wherever there is a difference between the corresponding locations in f (x, y) and h (x, y); wherever f (x, y) and h (x, y) are the same, the corresponding pixel in g (x, y) will have a value which is near to 0.
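As a small illustration, the operation itself is a single array subtraction; the rescaling back to a displayable 0 to 255 range shown below is a common practical step and not something prescribed in the lecture, and NumPy and the names are my own choices.

```python
import numpy as np

def difference_image(f, h):
    """g(x, y) = f(x, y) - h(x, y), rescaled to 0..255 for display."""
    g = f.astype(np.int16) - h.astype(np.int16)    # the raw difference may be negative
    g = g - g.min()                                # shift so that the minimum becomes 0
    if g.max() > 0:
        g = g * 255.0 / g.max()                    # stretch to the full display range
    return g.astype(np.uint8)

f = np.random.randint(0, 256, (4, 4), dtype=np.uint8)   # a hypothetical 8-bit frame
h = f.copy()
h[1, 2] = 255 - h[1, 2]                                 # identical except one altered pixel
print(difference_image(f, h))                           # bright only where f and h differ
```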

So, this image differencing operation mainly highlights the differences between 2 images, or the locations where the contents of the 2 images are different. Such an image difference operation is very useful, particularly in medical image processing.

In medical image processing, there is an operation which is called mask mode radiography. In mask mode radiography, what is done is that you take an x-ray image of a certain body part of a patient, captured with the help of a TV camera placed opposite the x-ray source, and then you inject a contrast medium into the blood stream of the patient; after injecting this contrast medium, you again take a series of images of the same anatomical region of the patient's body using the same TV camera.

Now, the image which is taken before the injection of the contrast medium is called the mask, and that is why the name is mask mode radiography. If you take the difference of all the frames obtained after the injection of the contrast medium from this mask image, then you will find that all the regions through which the contrast medium flows in the arteries are highlighted in the difference image. This kind of processing is very useful to find out how the contrast medium flows through the arteries of the patient, and that in turn helps to find out whether there is any arterial disease, for example a blockage in an artery or similar disorders.

So, mask mode radiography makes use of this difference image operation to highlight the arterial regions in the patient's body, which is useful to detect any arterial disease or disorder.

(Refer Slide Time: 49:46)

Now, to show a result: in this particular case, what is shown on the left hand side is the mask which was obtained, and on the right hand side is the difference image. This difference image is the difference of the images taken after injection of the contrast medium from the mask, and here you find that all the arteries through which the contrast medium is flowing are clearly visible; because of this, it is very easy to find out whether there is any disorder in the arteries of the patient.

Now, let us come to the other kind of operation we mentioned. Just as difference image processing can be used to enhance the contents of certain regions within the image wherever there is a difference between 2 images, similarly, if we take the average of a number of images of the same scene, then it is possible to reduce the noise in the image.

(Refer Slide Time: 51:04)

And, that noise reduction is possible because of the fact that if I have a pure image say f (x, y), then the image that we capture, if I call it g (x, y), is normally the pure image f (x, y) contaminated by an additive noise term say eta (x, y).

Now, if this noise eta (x, y) is additive and 0 mean, then by averaging a large number of such noisy frames it is possible to reduce the noise, simply because if I take the average of say k frames, then g tilde (x, y), which is the average of the k frames, is given by 1 over k times the summation of g i (x, y) for i varying from 1 to k, and if I take the expectation value of this g tilde (x, y), then this expected value is nothing but f (x, y). Our condition is that the noise must be 0 mean additive noise, and because it is 0 mean, I assume that at every pixel location the noise is uncorrelated and its mean is 0.

So, that is why, if you take the average of a large number of frames, the noise is going to get cancelled out, and this kind of operation is very useful in astronomical imaging because in astronomy the objects which are imaged are normally of very low intensity, so the image that you capture is likely to be dominated by the presence of noise.
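A minimal sketch of this frame averaging, assuming a stack of registered noisy frames is available as a NumPy array (the synthetic data and the names below are mine, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.integers(0, 256, (64, 64)).astype(float)       # the "pure" image f(x, y)
k = 128
frames = f + rng.normal(0.0, 20.0, (k, 64, 64))        # g_i(x, y) = f(x, y) + zero-mean noise

g_avg = frames.mean(axis=0)                            # g_tilde = (1/k) * sum of g_i
print(np.abs(frames[0] - f).mean(), np.abs(g_avg - f).mean())
# The residual noise of the average is roughly 1/sqrt(k) of that of a single frame.
```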

(Refer Slide Time: 52:47)

So, here we have the image of a galaxy; on the top right, we have the corresponding noisy image and on the bottom, we have images which are averaged over different numbers of frames. The last one is an average image where the average is taken over 128 frames, while for the others the number of frames is smaller, and it is quite obvious from this that as you increase the number of frames, the amount of noise in the processed image becomes less and less.

So, with this, we come to the end of our discussion on point processing techniques for image enhancement operations. Now, let us discuss the questions that were posed in the last class. The first one is: what is an image histogram? You will find that a few of the questions are very obvious, so we are not going to discuss them.

(Refer Slide Time: 54:02)

Now, the fourth one is very interesting: suppose a digital image is subjected to histogram equalization; what effect will a second pass of equalization have on the equalized image? As we have already mentioned, once an image is histogram equalized, the histogram of the processed image will ideally be uniform, that means it will have a uniform probability density function, and if I want to equalize this already equalized image, then the corresponding transformation function will be a linear one, a straight line inclined at an angle of 45 degrees with the x axis.

That clearly indicates that whatever equalization we do over an already equalized image is not going to have any further effect on the processed image. This is the ideal case; practically, we have seen that after equalization the histogram that you get is not really uniform, so there will be some effect in the second pass, but the effect may be negligible. The sixth one is again a tricky one: what condition must be satisfied by the target histogram to be used in the histogram specification technique?

You find that in case of the histogram specification technique, the target histogram is used for the inverse transformation, that is G inverse. So, it must be true that the transformation function G is monotonically increasing, and that is only possible if the value of p z (z) is non-zero for every possible value of z. That is the condition that must be satisfied by the target histogram.

Now, coming to today’s questions; first one is explain why the discrete histogram
equalization technique does not in general yield a flat histogram.

(Refer Slide Time: 55:50)

The second, an image has gray level PDF P r (r) as shown here and the target histogram as
shown on the right. We have to find out the transformation in terms of r and z that is what
is the mapping from r to z.

(Refer Slide Time: 56:17)

In the third question, two probability density functions are given. Again, you have to find out the transformation between r and z.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 20
Hello, welcome to the video lecture series on digital image processing. For the last few classes, we have been discussing image enhancement techniques and we have completed our discussion on point processing techniques for image enhancement.

(Refer Slide Time: 1:17)

So, what we have done till now is the point processing techniques for image enhancement. Under this, the first operation that we covered is the image negative, and there we have seen that the image negative operation is useful in case the image contains information in gray or white pixels that are embedded in dark regions of the image.

So, if you take the negative of such images, then the information content becomes dark whereas the background becomes white, and visualization of that information in the negative image is much easier.

The second operation that we have done is the logarithmic transformation for dynamic range compression. We use this logarithmic transformation because we have seen that in some cases the dynamic range, that is the difference between the minimum and the maximum intensity values of an image, is so high that a display device is normally not capable of dealing with it.

So, for such cases, we have to reduce the dynamic range of the image so that the image can be
displayed properly on the given display device and this logarithmic transformation operation
gives such a dynamic range compression operation.

Then the next technique we have talked about is the power-law transformation. In case of power-law transformation, we have seen that many devices, whether the image printing device, the image display device or the image acquisition device, themselves introduce some sort of power-law operation on the image that is to be displayed.

As a result, the image that we want to display or print becomes distorted; the appearance of the output image is not the same as the image that is intended to be output. So, this power-law which is introduced by those devices has to be corrected by some power-law operation.

So, we have seen that in case of this power-law compensation, we pre process the image with a power-law which is the inverse of the power-law introduced by the device and, as a result, when the pre processed image goes to the device, the output of the device is almost the same as the image that is intended.

So, to nullify the effect of the device, we go for the power-law compensation technique. The next operation that we have done is contrast stretching. In case of contrast stretching, we have seen that in many cases we can get a very dark image because the scene, at the time the image was taken, was very poorly illuminated.

The other reason why we can get such a dark image is that while taking the photograph, the aperture of the camera lens was not properly set, or the limitation may lie with the sensor itself: if the dynamic range of the image sensor is very narrow, such a sensor also leads to a dark image.

So, to enhance such dark images so that they can be visualized properly, we go for the contrast enhancement technique. The other kind of image enhancement we have talked about is the grey level slicing operation, and this kind of operation is useful in cases where the application needs to highlight a certain range of grey levels in the image.

In grey level slicing, we have seen 2 kinds of techniques: in the first one, the grey level region within the specified range is highlighted whereas the grey levels outside that particular specified range are suppressed, and in the second kind of grey level slicing operation, the grey levels within the specified range are highlighted whereas the grey levels outside the range remain as they are.

So, these are the 2 different types of grey level slicing operations we have talked about and, as I said, if the application needs enhancement of a certain range of grey levels and is not interested in the other intensity values, then we go for the grey level slicing kind of operation.

Then we have talked about other enhancement techniques which are based on histogram based processing operations. In the other point processing techniques, we define a transformation function which simply works on a particular pixel of the input image to generate the corresponding pixel of the output image; such transformation functions do not take into consideration the overall appearance of the image, and we have seen that the overall appearance of the image is actually reflected in what is called the histogram of the image.

So, these histogram based processing techniques try to improve the overall appearance of the image by modifying the histogram of that particular image, and under this category we have talked about 2 kinds of histogram based processing techniques: one of them was the histogram equalization technique and the second one was the histogram modification technique.

Then we have talked about 2 other kinds of image enhancement operations which do not operate on a single image but on multiple images. One of them is the image differencing operation, which highlights those regions where there is a difference between the given 2 images; only the regions where the 2 given images are different will be highlighted, and the regions where the 2 images are similar will be suppressed.

The other kind of operation that we have done is the image averaging operation, and we have said that this kind of operation is very useful where the object which is imaged is of very low intensity; while imaging such objects, the image that you get is likely to be dominated by noise.

So, if I get multiple frames of such noisy images and if the noise that is added to the image is a 0 mean noise, then taking the average of multiple frames is likely to cancel the noise part, and what ultimately comes out after the averaging operation is the actual image that is desired. So, these are the different point processing techniques for image enhancement that we have covered till our last class.

(Refer Slide Time: 9:26)

Now, in today’s class, we will talk about another spatial domain technique which is called the mask processing technique. In the previous lectures also, we were dealing with spatial domain techniques, and we have said that image enhancement techniques can broadly be categorized into spatial domain techniques and frequency domain techniques; the frequency domain techniques we will talk about later on.

So, in today’s class, we will talk about another class of spatial domain techniques which are known as mask processing techniques, and under this we will discuss 3 different types of operations. The first one is the linear smoothing operation; the second one is a nonlinear operation based on the statistical features of the image, known as the median filtering operation; and the third kind of mask processing technique that we will talk about is the sharpening filter. Now, let us see what this mask processing technique means.

(Refer Slide Time: 10:32)

Now, in our earlier discussions we have mentioned that while going for this contrast enhancement, what we basically do is, given an input image say f (x, y), we transform this input image by a transformation operator say T which gives us an output image g (x, y), and the nature of this output image g (x, y) depends upon the transformation operator T.

In the point processing technique, we have said that this transformation operator T operates on a single pixel of the image, that is, on a single pixel intensity value. But as we said earlier, T is in general an operator which operates on a neighborhood of the pixel at location (x, y); for the point processing operation, the neighborhood size was 1 by 1. Instead, we can consider a neighborhood of size more than 1, say a neighborhood of size 3 by 3, 5 by 5, 7 by 7 and so on.

So, if we consider a neighborhood of size more than 1, then the kind of operation that we get is known as a mask processing operation. Let us see what this mask processing operation actually means.

(Refer Slide Time: 12:13)

Here, we have shown a 3 by 3 neighborhood around a pixel location (x, y). So, this outer
rectangle represents a particular image and in the middle of this, we have shown a 3 by 3
neighborhood and this 3 by 3 neighborhood is taken around a pixel at location (x, y).

(Refer Slide Time: 12:39)

By mask processing what we mean is this: if I consider a neighborhood of size 3 by 3, I also consider a mask of size 3 by 3. So, we find that here on the right hand side we have shown a given mask of size 3 by 3, and the different elements in the mask, that is w(-1,-1), w(-1,0), w(-1,1), w(0,-1) and so on up to w(1,1), represent the coefficients of this mask.

So, for all these mask processing techniques what we do is we place this mask on this image
where the mask center coincides with the pixel location (x, y). Once you place this mask on this
particular image, then you multiply every coefficient of this mask by the corresponding pixel on
the image and then you take the sum of all these products.

So, the sum of all these products is given by this particular expression, and whatever sum you get is placed at location (x, y) in the image g (x, y). So, for the mask processing operation, the mathematical expression we get is g (x, y) = sum of w(i, j) f (x plus i, y plus j), where the summation is taken over j varying from minus 1 to 1 and i varying from minus 1 to 1.

So, this is the operation that has to be done for a 3 by 3 neighborhood in which case we get a
mask again of size 3 by 3. Of course, as we said that we can have masks of higher dimension; we
can have a mask of 5 by 5, if I consider a 5 by 5 neighborhood. I have to consider a mask of size
7 by 7, if I consider a 7 by 7 neighborhood and so on.

So, if this particular operation is done at every pixel location (x, y) in the image, then the output
image g (x, y) for various values of x and y that we get is the processed image g. So, this is what
we mean by mask processing operation.
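A minimal sketch of this operation in Python could look like the following; it ignores the image border for simplicity, and the function and variable names are my own, not the lecture's.

```python
import numpy as np

def apply_mask(f, w):
    """g(x, y) = sum over (i, j) of w(i, j) * f(x + i, y + j); borders are left untouched."""
    a, b = w.shape[0] // 2, w.shape[1] // 2            # half-sizes of the odd-sized mask
    g = f.astype(float).copy()
    for x in range(a, f.shape[0] - a):
        for y in range(b, f.shape[1] - b):
            region = f[x - a:x + a + 1, y - b:y + b + 1].astype(float)
            g[x, y] = np.sum(w * region)               # sum of products placed at (x, y)
    return g
```

With a 3 by 3 mask of all ones divided by 9, this reduces to the averaging (box) filter discussed next.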

Now, the first mask processing operation that we will consider is the image averaging or image smoothing operation. Image smoothing is a spatial filtering operation where the value at a particular location (x, y) in the processed image is the average of all the pixel values in the neighborhood of (x, y). Because it is an average, this is also known as an averaging filter, and later on we will see that this averaging filter is nothing but a low pass filter. When we have such an averaging filter, the corresponding mask can be represented in this form.

(Refer Slide Time: 15:55)

So, again here we are showing a 3 by 3 mask, and here we find that all the coefficients in this 3 by 3 mask are equal to 1. Going back to our mathematical expression, I get an expression of the form g (x, y) = (1/9) × sum of f (x plus i, y plus j), where the summation is taken over j equal to minus 1 to 1 and i equal to minus 1 to 1.

So naturally, as this expression says, what we are doing is taking the summation of all the pixels in the 3 by 3 neighborhood of the pixel location (x, y) and then dividing this summation by 9, which is nothing but the average of all the pixel values in the 3 by 3 neighborhood of (x, y), including the pixel at location (x, y) itself, and this average is placed at location (x, y) in the processed image g.

So, this is what is known as an averaging filter, and it is also called a smoothing filter. The particular mask for which all the mask coefficients are the same, equal to 1 in this particular case, is known as a box filter.

Now, when we perform this kind of operation, then naturally, because we are averaging all the pixels in the neighborhood, the output image is likely to be a smoothed image; that means it will have a blurring effect, and all the sharp transitions in the image will be smoothed out.

As a result, if there is any sharp edge in the image, the sharp edges will also be blurred. So, to reduce this blurring effect, there is another kind of averaging or smoothing mask which performs a weighted average.

(Refer Slide Time: 18:30)

So, such a kind of mask is given here. You find that in this mask the center coefficient is equal to 4; the coefficients vertically up, vertically down, horizontally left and horizontally right are equal to 2; and all the diagonal neighbours of the center element in this mask are equal to 1. So effectively, when we are taking the average, we are weighting every pixel in the neighborhood by the corresponding coefficient, and what we get is a weighted average.

So, the center pixel, that is the pixel at location (x, y), gets the maximum weightage and as you move away from the center location, the weightage of the pixels is reduced. When we apply this kind of mask, the general expression of the mask operation becomes g (x, y) = (1/16) × sum of w(i, j) f (x plus i, y plus j), with the summation taken from j equal to minus 1 to 1 and i equal to minus 1 to 1, and this gives the value which is to be placed at location (x, y) in the processed image g.

Now, the purpose of going for this kind of weighted averaging is that, because we are weighting the different pixels differently while taking the average, the blurring effect is reduced. In case of the box filter the image becomes quite blurred, and of course the blurring is more if I go for a bigger and bigger neighborhood or mask size; when we go for weighted averaging, the blurring effect is reduced. Now, let us see what kind of result we get.

(Refer Slide Time: 20:29)

So, this gives the general expression: when we consider arbitrary coefficients w(i, j), we have to have a normalization factor, that is, the summation has to be divided by the sum of the coefficients. As we said, the 3 by 3 neighborhood is only a special case and we can have neighborhoods of other sizes; here it is shown that we can have a neighborhood of size M by N where M equals 2a plus 1 and N equals 2b plus 1, with a and b some positive integers, and it is also shown that the mask is usually of odd dimension, not even dimension, and it is masks of odd dimension which are normally used in image processing.
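For instance, the box filter and the weighted averaging mask discussed above could be written down and applied like this; this is a sketch only, using SciPy's ndimage.convolve as one convenient way to slide the mask over the image, with the array values being the masks described in the lecture.

```python
import numpy as np
from scipy import ndimage

box_mask = np.ones((3, 3)) / 9.0                   # all coefficients 1, normalized by 9
weighted_mask = np.array([[1, 2, 1],
                          [2, 4, 2],
                          [1, 2, 1]]) / 16.0       # center weighted most, normalized by 16

image = np.random.randint(0, 256, (128, 128)).astype(float)
smoothed_box = ndimage.convolve(image, box_mask, mode='nearest')
smoothed_weighted = ndimage.convolve(image, weighted_mask, mode='nearest')
```

Since both masks are symmetric, this convolution gives the same result as the correlation-style sum written earlier.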

(Refer Slide Time: 21:21)

Now, using this kind of mask operation, here we have shown some results. You find that the top left image is a noisy image. When you do the averaging operation on this noisy image, the top right image shows the result of averaging with a mask of size 3 by 3, the bottom left image is obtained using a mask of size 5 by 5 and the bottom right image is obtained using a mask of size 7 by 7.

From these images, it is quite obvious that as I increase the size of the mask, the blurring effect becomes more and more pronounced. The bottom right image, which is obtained with a mask of size 7 by 7, is much more blurred compared to the other 2 images, and this effect is more prominent if you look at the edge regions of these images.

Say, if I compare this particular region with the similar region in the upper image or in the original image, you find that in the original image it is very sharp, whereas when I do the smoothing using a 7 by 7 mask it becomes very blurred, while the blurring when I use the 3 by 3 mask is much less. Similar results are obtained with other images also.

(Refer Slide Time: 23:02)

So, here is another image. Again, we do the masking operation or the smoothing operation with
different mask sizes. On the top left, we have an original noisy image and the other images are
the smoothed images using various mask sizes. So, on the right top, this is obtained using a mask
of size 3 by 3, the left bottom is an image obtained using a mask of size 5 by 5 and the right
bottom is an image obtained using a mask of size 7 by 7.

So, we find that as we increase the mask size, the noise is reduced to a greater extent, but at the cost of additional blurring. Though the noise is reduced, the image becomes very blurred. That is the effect of using the box filters or smoothing filters: the noise will be reduced, but the image will be blurred, or the sharp contrast in the image will be reduced.

So, there is a second kind of masking operation, based on order statistics, which reduces this kind of blurring effect. Let us consider one such filter based on order statistics.

(Refer Slide Time: 24:35)

Unlike the earlier filters, these filters are nonlinear. In case of order statistics filters, the response is based on the ordering of the pixel intensity values in the neighborhood of the point under consideration. What we do is take the set of intensity values in the neighborhood of the point (x, y), order those intensity values, and based on this ordering select a value which will be put at location (x, y) in the processed image g; that is how the output image is generated.

Here the processing is done using an order statistics filter, and a widely used filter of this type is what is known as the median filter. In case of a median filter, what I do is take a 3 by 3 neighborhood around the point (x, y) and consider the intensity values of all the 9 pixels in this 3 by 3 neighborhood.

Then, I arrange these pixel intensity values in a certain order and take their median. Now, how do you define the median? We define the median, say zeta, of a set of values such that half of the values in the set are less than or equal to zeta and the remaining half of the values are greater than or equal to zeta.

So, let us take a particular example. Suppose I take a 3 by 3 neighborhood around a pixel location (x, y) and the intensity values in this 3 by 3 neighborhood are 100, 85, 98, 99, 105, 102, 90, 101 and 108, and suppose this represents a part of my image f (x, y).

Now, what I do is take all these intensity values and put them in ascending order of magnitude. If I do that, the minimum of these values is 85, the next is 90, then 98, 99, 100, 101, 102, 105 and 108. So, these are the 9 intensity values put in ascending order of magnitude, and once I have put them in ascending order, from this list I take the fifth value, the middle one, which is equal to 100.

If I take this fifth value, you find that there is an equal number of values greater than or equal to it and less than or equal to it. So, I consider this particular pixel value 100, and when I generate the image g (x, y), at location (x, y) I put this value 100, which is the median of the pixel values within the neighborhood. This gives my processed image g (x, y).

Of course, the intensities in other locations in other pixel regions will be decided by the median
value of the neighborhood of the corresponding pixels. That is if I want to find out what will be
the pixel value at this location, then the neighborhood that I have to consider will be this
particular neighborhood.

So, this is how I can get the median filtered output, and as you can see, this kind of filtering operation is based on order statistics. Now, let us see what kind of result we can get using this median filter.
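The 3 by 3 median filtering just described could be sketched as below; the border handling and names are my own assumptions, and np.median is simply used to pick the middle of the 9 sorted values.

```python
import numpy as np

def median_filter_3x3(f):
    """Replace each interior pixel by the median of its 3 by 3 neighborhood."""
    g = f.copy()
    for x in range(1, f.shape[0] - 1):
        for y in range(1, f.shape[1] - 1):
            neighborhood = f[x - 1:x + 2, y - 1:y + 2]   # the 9 intensity values
            g[x, y] = np.median(neighborhood)            # the fifth value in sorted order
    return g

patch = np.array([[100,  85,  98],
                  [ 99, 105, 102],
                  [ 90, 101, 108]])
print(np.median(patch))     # -> 100.0, the value placed at the center in the worked example
```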

(Refer Slide Time: 30:26)

So here, it is again the same building image. The top left is the original noisy image; on the right hand side is the smoothed image obtained using the box filter, and on the bottom we have the image obtained using the median filter.

Here again, as you can see, the processed image obtained using the median filter operation maintains the sharpness of the image to a much greater extent than the one obtained using the smoothing filter.

(Refer Slide Time: 31:09)

Coming to the second one, this is again one of the images that we have shown earlier, a noisy image having 4 coins. Here again, you find that after doing the smoothing operation, the edges become blurred and, at the same time, the noise is not reduced to a great extent; this particular image is still noisy.

So, if I want to remove all this noise, I have to smooth the image using a larger neighborhood size, and the moment I go for a larger neighborhood, the blurring effect becomes more and more pronounced. The image on the right hand side is also a processed image, but here the filtering operation used is the median filter.

Here, we find that because of the median filtering operation, the noise in the output image has almost vanished, but at the same time the contrast or sharpness of the image remains more or less intact. So, this is the advantage that we get if we go for median filtering rather than smoothing or averaging filtering. To show the advantage of this median filtering, we will take another example.

(Refer Slide Time: 32:31)

So, this is a noisy image of a butterfly. On the bottom left, the image shown is an averaged image where the averaging is done over a neighborhood of size 5 by 5. On the bottom right is the image filtered using the median filter. This result clearly shows the superiority of median filtering over the smoothing or averaging operation, and such median filtering is very useful for a particular kind of random noise known as salt and pepper noise, so called because of its appearance in the image.

So, these are the different filtering operations which reduce the noise in an image, or which introduce blurring or smoothing over the image. We will now consider another kind of spatial filter which increases the sharpness of the image: the sharpening spatial filter.

(Refer Slide Time: 33:57)

So, we will consider sharpening spatial filter. So, the objective of this sharpening spatial filter is
to highlight the details, the intensity details or variation details in an image. Now, through our
earlier discussion, we have seen that if I do averaging over an image or smoothing over an
image, then the image becomes blurred or the details in the image are removed. Now, this
averaging operation is equivalent to integration operation.

So, if I integrate the image, what I am going to get is a blurring or smoothing effect. If integration gives a smoothing effect, it is quite logical to think that if I do the opposite operation, that is differentiation instead of integration, then the sharpness of the image is likely to be increased. So, it is the derivative operations which are used to increase the sharpness of an image.

Now, when I go for the derivative operations, I can use 2 types of derivatives: the first order derivative or the second order derivative, either of which can be used to enhance the sharpness of the image. Let us see what the desirable responses of these derivative operations are.

(Refer Slide Time: 36:07)

If I use a first order derivative filter, then the desirable behaviour of this filter is that its response must be 0 in areas of constant grey level in the image, it must be non zero at the onset of a grey level step or ramp, and it should be non zero along ramps. Whereas, if I use a second order derivative filter, then its response should be 0 in flat areas, it should be non zero at the onset and end of a grey level step or ramp, and it should be 0 along ramps of constant slope. So, these are the desirable responses of a first order derivative filter and of a second order derivative filter.

Now, whichever derivative filter I use; whether it is a first order derivative filter or a second
order derivative filter, I have to look for discrete domain formulation of those derivative
operations. So, let us see how we can formulate the derivative operations; the first order
derivative or the second order derivative in discrete domain.

(Refer Slide Time: 37:41)

Now, we know the definition of the derivative in the continuous domain. Let us consider the 1 dimensional case: if I have a function f (x) of a variable x, then its derivative is given by df (x)/dx = limit as delta x tends to 0 of [f (x plus delta x) minus f (x)] / delta x. So, this is the definition of the derivative in the continuous domain.

Now, when I come to discrete domain; in case of our digital images, the digital images are
represented by a discrete set of points or pixels which are represented at different grid locations
and the minimum distance between 2 pixels is equal to 1.

So, in our case, we will consider the value of delta x equal to 1 and this derivative operation in
case of 1 dimension, now reduces to del f del x is equal to f of x plus 1 minus f of x. Now, here I
use the partial derivative notation because our image is a 2 dimensional image. So, when I take
the derivative in 2 dimensions, we will have partial derivatives along x and we will have partial
derivatives along y. So, the first derivative, the first order derivative for 1 dimensional discrete
signal is given by this particular expression.

Similarly, the second order derivative of a discrete signal in 1 dimension can be approximated by del 2 f upon del x 2 which is given by f (x plus 1) plus f (x minus 1) minus 2 f (x).
So, this is the first order derivative and this is the second order derivative and you find that these
2 derivations, these 2 definitions of the derivative operations, they satisfy the desirable properties
that we have discussed earlier. Now, to illustrate the response of these derivative operations, let
us take an example.
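Before turning to the example on the slide, the two discrete definitions can be tried out directly. The following is only a small illustrative sketch in Python with NumPy, using a made-up 1 dimensional signal (these are not the values shown on the slide).

import numpy as np

# A made-up 1-D signal containing flat regions, a downward ramp and a step
f = np.array([5, 5, 5, 4, 3, 2, 1, 1, 1, 6, 6, 6], dtype=float)

# First order derivative: f(x+1) - f(x)
first = f[1:] - f[:-1]

# Second order derivative: f(x+1) + f(x-1) - 2 f(x)
second = f[2:] + f[:-2] - 2.0 * f[1:-1]

print(first)   # non zero all along the ramp and at the step
print(second)  # non zero only at the onset and end of the ramp; positive/negative double response at the step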

(Refer Slide Time: 40:21)

So, this is a 1 dimensional signal where the values of the 1 dimensional signals for various values
of x are given in the form of an array like this and the plot of these functional values, these
discrete values are given on the top. Now, if you take the first order derivative of this as we have
just defined, the first order derivative is given in the second array and the second order derivative
is given in the third array.

So, if you look at this functional value, the plot of this functional value, this represents various
regions, Say for example; here, this part is a flat region, this particular portion is a flat region,
this is also a flat region, this is also a flat region. This is a ramp region, this represents an isolated
point, this area represents a very thin line and here we have a step kind of discontinuity.

So now, if you compare the response of the first order derivative and the second order derivative of this particular discrete function, you find that the first order derivative is non zero along the ramp, whereas the second order derivative is 0 along the ramp; the second order derivative is non zero only at the onset and at the end of the ramp.

Similarly coming to this isolated point, if I compare the response of the first order derivative and
the response of the second order derivative, you find that the response of the second order
derivative for an isolated point is much stronger than the response of the first order derivative.

Similar is the case for a thin line. The response of the second order derivative is greater than the
response of the first order derivative. Coming to this step edge, the response of the first order
derivative and response of the second order derivative is almost same but the difference is in case
of second order derivative, I have a transition from a positive polarity to a negative polarity.

Now, because of this transition from positive polarity to negative polarity, the second order derivative normally leads to a double line response in case of a step discontinuity in an image whereas, the first order derivative leads to a single line. Of course, the usefulness of getting this double line, we will discuss later.

Now, as we have seen, the second order derivative gives a stronger response to isolated points and to thin lines, and because the details in an image normally appear either as isolated points or as thin lines, to which the second order derivative gives a stronger response; it is quite natural to think that a second order derivative based operator will be most suitable for image enhancement operations.

(Refer Slide Time: 43:58)

So, our observation is, as we have discussed previously, that the first order derivative generally produces a thicker edge because we have seen that along a ramp the first order derivative is non zero whereas, the second order derivative along a ramp is 0 but it gives non zero values at the starting of the ramp and at the end of the ramp.

So, that is why the first order derivatives generally produce a thicker edge in an image. The second order derivative gives a stronger response to fine details such as thin lines and isolated points. The first order derivative has a stronger response to a gray level step and the second order derivative produces a double response at step edges and, as we have already said, because the details in the image are either in the form of isolated points or thin lines, the second order derivatives are better suited for image enhancement operations.

So, we will mainly discuss the second order derivatives for image enhancement. But to use this for image enhancement operations, because our images are digital, as we have said many times, we have to have a discrete formulation of this second order derivative operation and the filter that we design should be isotropic. That means the response of the second order derivative filter should be independent of the orientation of the discontinuity in the image and the most widely used or popularly known second order derivative operator of isotropic nature is what is known as the Laplacian operator.

(Refer Slide Time: 45:45)

So, we will discuss the Laplacian operator and as we know, the Laplacian of a function is given by del square f equal to del 2 f del x 2 plus del 2 f del y 2. So, this is the Laplacian operator in the continuous domain but what we have to have is the Laplacian operator in the discrete domain. And, as we have already seen, del 2 f del x 2 in case of the discrete domain is approximated as f (x plus 1) plus f (x minus 1) minus 2 f (x).

So, this is in case of a 1 dimensional signal. In our case, our function is a 2 dimensional function, that is a function of the variables x and y. So for a 2 dimensional signal, we can write del 2 f del x 2 which will be simply f of (x plus 1, y) plus f of (x minus 1, y) minus 2 f (x, y). Similarly, del 2 f del y 2 will be given by f of (x, y plus 1) plus f of (x, y minus 1) minus 2 f (x, y).

(Refer Slide Time: 47:53)

And, if I add these 2, I get the Laplacian operator in discrete domain which is given by del 2 f is
equal to del 2 f del x 2 plus del 2 f del y 2 and we will find that this will be given as f (x plus 1, y)
plus f (x minus 1, y) plus f (x, y plus 1) plus f (x, y minus 1) minus 4 f (x, y) and this particular
operation can be represented again in the form of a 2 dimensional mask. That is for this
Laplacian operator, we can have a 2 dimensional mask and the 2 dimensional mask in this
particular case will be given like this.
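As a small sketch of this discrete Laplacian (the horizontal/vertical mask and the mask including the diagonal directions, as described on the next slide), the operation can be written with NumPy and SciPy as follows; this is only an illustration, not the code behind the slides.

import numpy as np
from scipy.ndimage import convolve

# Laplacian mask using only the horizontal and vertical neighbours
lap4 = np.array([[ 0,  1,  0],
                 [ 1, -4,  1],
                 [ 0,  1,  0]], dtype=float)

# Laplacian mask including the diagonal neighbours as well
lap8 = np.array([[ 1,  1,  1],
                 [ 1, -8,  1],
                 [ 1,  1,  1]], dtype=float)

def laplacian(image, mask=lap4):
    # Discrete Laplacian: f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)
    return convolve(image.astype(float), mask, mode='nearest')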

(Refer Slide Time: 48:54)

So, on the left hand side, the mask that is shown, this mask considers the Laplacian operation only in the vertical direction and in the horizontal direction and if we also include the diagonal directions, then the Laplacian mask is given on the right hand side. So, we find that using this particular mask which is shown on the left hand side, I can always derive the expression that we have just shown.

Now, here I can have 2 different types of masks. Depending upon the polarity of the coefficient at the center pixel, I can have the center coefficient with either negative or positive polarity. So, if the sign of the center coefficient is positive, then I can have a mask of this form where the center pixel will have a positive polarity but otherwise the nature of the mask remains the same.

Now, if I have these kinds of operations, then you find that the image that you get will just highlight the discontinuous regions in the image whereas, all the smooth regions in the image will be suppressed. So, this shows an original image.

(Refer Slide Time: 50:18)

On the right hand side, we have the output of the Laplacian and if you closely look at this particular image, you will find that all the discontinuous regions have some value. However, this particular image cannot be displayed properly.

(Refer Slide Time: 50:40)

So, we have to have some scaling operation so that the negative values produced by the Laplacian are also brought into the displayable range and the details can be shown properly. After such scaling, the kind of result that we get is something like this.

(Refer Slide Time: 51:08)


So with this, we come to the end of today's discussion. Now, let us go to some questions on today's lecture.

(Refer Slide Time: 51:50)

The first question is: a digital image contains an unwanted region of size 7 pixels; what should be the smoothing mask size to remove this region? Second question - why is the Laplacian operator normally used for image sharpening operations? Third question - what is unsharp masking? Fourth question - give a 3 by 3 mask for performing unsharp masking in a single pass through an image. Fifth, state some applications of the first derivative in image processing.

(Refer Slide Time: 52:40)

Then, what is ringing? Why do ideal low pass and high pass filters lead to ringing effects? How does blurring vary with the cut off frequency? Does the Gaussian filter lead to ringing effects? Give the transfer function of a high boost filter and what is the principle of the homomorphic filter?

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 21
Image Enhancement Frequency Domain Processing

For last few lectures, we are talking about image enhancement techniques specifically the spatial
domain techniques for image enhancement.

(Refer Slide Time: 00:29)

So, for last few lectures, we have talked about the point processing techniques and we have
talked about few mask processing techniques for image enhancement. Both point processing
techniques as well as mask processing techniques, we have said that they are spatial domain
techniques in the sense that they work directly on the image pixels.

So, among the mask processing techniques, what we have done so far is we have talked about the
linear smoothing filters or averaging filters and we have seen that this smoothing or averaging
filters are some sort of integration operation which integrates the image pixels. We have also
talked about a non linear filter or a filter based on ordered statistics which we have said is the
median filter and we have talked about a sharpening filter and we have said that this sharpening
filter is nothing but some sort of differential operators which differentiate the image pixels to
sharpen the image and we have said that for such sharpening operation, the kind of derivatives
which are most suitable is the second order derivative and accordingly, we have discussed about
the second order derivative operators which we have said as Laplacian operator and we have
demonstrated with results that how this Laplacian operators in the spatial domain, they try to
enhance the content of an image.

(Refer Slide Time: 2:14)

So today, we will talk about some more mask processing techniques: we will talk about unsharp masking, we will talk about the high boost filter and we will also see how the first order derivative operators can help in enhancement of image content particularly at the discontinuities and edge regions of an image and then we will go to our today's topic of discussion which we say
is the frequency domain techniques for image enhancement and here again, we will talk about
various types of filtering operations like low pass filtering, high pass filtering, then equivalent to
high boost filtering and then finally, we will talk about homomorphic filtering and all these
filtering operations will be in the frequency domain operations. So, let us first quickly see that
what we have done in the last class.

(Refer Slide Time: 3:06)

So in the last class, we have talked about the averaging filters or low pass filters and we have
talked about 2 types of spatial masks which are used for this averaging operations. One, we have
said as box filter and we have said that in case of box filter, all the coefficients in the filter mask,
they have the same value and in this case, all the coefficients have value equal to 1.

(Refer Slide Time: 3:36)

The other type of mask that we have used is for weighted average operation and here it shows the corresponding mask which gives the weighted averaging and we have said that if you use this weighted averaging mask instead of the box filter mask, then what advantage we get is this weighted average mask tries to retain the sharpness of the image or the contrast of the image as much as possible whereas if we simply use the box filter, then the image gets blurred too much.

(Refer Slide Time: 4:09)

Then these are the different kinds of results that we have obtained. Here, the result is shown for
an image which is on the top left. On the top right, the image is averaged by a 3 by 3 box filter;
on the bottom left, this is an averaging over 5 by 5 filter and on the bottom right, this is an image
with averaging over 7 by 7 filter and as it is quite obvious from this results that as we take the
average or smooth out the image with the help of this box filters, the images get more and more
blurred. Similar such results are also obtained and have been shown in this particular case.

(Refer Slide Time: 4:52)

Here also you find that using the low pass filter, the content, the noise in the image gets removed
but at the cost of the sharpness of the image. That is when we take the average over a larger
mask, a larger size mask; then it helps to reduce the noise but at the same time, a larger mask
introduces large amount of blurring in the original image.

(Refer Slide Time: 5:19)

So, there we have said that instead of using simple box filter or the simple averaging filter if I go
for order statistics, go for filtering based on order statistics like median filter or the pixel value at
a particular location in the processed image will be the median of the pixels in the neighborhood
of the corresponding location in the original image; in that case, this kind of filtering also
reduces the noise. But at the same time, it tries to maintain the contrast of the image.

So here, we have shown one such result. On the top left is the original noisy image, on the top
right is the image which is obtained using the box filter and the bottom image is the image which
is obtained using the median filter and here it is quite obvious that when we go for the median
filtering operation, the median filtering reduces the noise but at the same time, it maintains the
sharpness of the image whereas, if we go for box filtering of higher dimension of higher size,
then the noise is reduced but at the same time, the image sharpness is also reduced. That means
the image gets blurred.

(Refer Slide Time: 6:42)

This is another set of results where you will find that if you compare the similar results that you
have shown earlier using the median filter, the noise is almost removed but at the same time, the
contrast of the image is also maintained. So, this is the advantage of the median filter that we get
that in addition to removal of noise, you can maintain the contrast of the image. But this kind of
median filtering, as we have mention that this is very suitable for a particular kind of noise,
removal of a particular kind of noise which we have said the salt and pepper noise. The name
comes because of the appearance of these noises in the given images.

(Refer Slide Time: 7:25)

Then this shows another median filter output result. The bottom 2 images on the left side, it is the
image obtained using the box filter on the right hand side, it is the image obtained using the
median filter. The enhancement using the median filter over the box filter is quite obvious from
this particular image.

(Refer Slide Time: 7:47)

Then we have said that for the enhancement operation, we use the second order derivatives and the kind of mask that we have used for the second order derivative is the Laplacian mask and these are the 2 different masks which we have used for the Laplacian operation.

(Refer Slide Time: 8:07)

We can also use another type of masks where the center coefficients are positive. You find in
case of earlier masks, the center coefficients are negative whereas, all the neighboring
coefficients are positive in the Laplacian mask. In this case, the center coefficient is positive
whereas, all other neighboring coefficients are negative.

(Refer Slide Time: 8:27)

Now, using this Laplacian mask, we can find out the high frequency detailed contents of an
image as has been shown in this particular one. Here you find that the original image, when it is
processed using the Laplacian mask, the details of the image are obtained on the left hand side.
Bottom left, we have shown the details of the image. On the bottom right what we have done is it
is the same image which is displayed after scaling so that the details are displayed properly on
the screen.

Now here, what has been done is we have just shown the details of the image. But in many
applications what is needed is if this detailed information is super imposed on the original image,
then it is better for visualization. So, these detailed images are to be added to the original image
so that we can get an enhanced image.

(Refer Slide Time: 9:24)

So, the next one shows that if we have this original image, these are that same detailed images
that we have shown earlier. On the right bottom, you have the enhanced image or the detailed
images are added to the original image and for performing this operation, we can have a
composite mask where the composite mask is given like this.

(Refer Slide Time: 9:38)

Here you find that the center coefficient of the mask is equal to 5 whereas, you will recollect that in case of the Laplacian mask, the center coefficient of the corresponding mask was equal to 4.

So, if I change from 4 to 5, that means the original image f (x, y) is going to be added to the detailed image to give us the enhanced image. So, that is what is done by using this composite mask.
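A possible rendering of this composite mask in code is sketched below; only the horizontal/vertical variant is shown, and the 8 neighbour variant would use minus 1 in the diagonal positions with 9 at the centre. The clipping to the 0 to 255 range is an assumption made here for display purposes.

import numpy as np
from scipy.ndimage import convolve

# Composite sharpening mask: positive-centre Laplacian with 1 added to the centre coefficient
composite = np.array([[ 0, -1,  0],
                      [-1,  5, -1],
                      [ 0, -1,  0]], dtype=float)

def sharpen(image):
    # Adds the Laplacian detail to the original image in a single pass
    out = convolve(image.astype(float), composite, mode='nearest')
    return np.clip(out, 0, 255)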

(Refer Slide Time: 10:11)

And, this is the result that we obtained using the composite mask, similar to the one that we have shown earlier; you find that on the top, we have the original image and on the bottom right, we have the enhanced image. Bottom left is the enhanced image when we use a mask where only the horizontal and the vertical neighbors have non zero values whereas, the bottom right is obtained using the mask where we consider both the horizontal, vertical and diagonal coefficients to be non zero. And as it is quite clear from this particular result, when we go for the mask having both horizontal, vertical and diagonal components as non zero values, the enhancement is much more.

Now today, we will talk about some more spatial domain or mask operations. The first one that
we will talk about is called an unsharp masking.

(Refer Slide Time: 11:11)

So, by unsharp masking, we mean the following: for many years, the publishing companies were using a kind of enhancement where the enhancement in the image was obtained by subtracting a blurred version of the image from the original image. So in such cases, if I represent the sharpened image by f s, then it was obtained as f s (x, y) equal to f (x, y) minus f bar (x, y).

So, this f bar (x, y) is nothing but a blurred version of f (x, y). So, if we subtract the blurred image from the original image, what we get is the details in the image or we get a sharpened image. So, this f s (x, y) is the sharpened image and this kind of operation was known as unsharp masking.

Now, we can slightly modify this particular equation to get an expression for another kind of
masking operation which is known as high boost filtering. So, high boost filtering is nothing but
a modification of this unsharp masking operation. So, we obtain high boost filtering as we can
write it in this form f hb (x, y) which is nothing but A times f (x, y) minus f bar (x, y) for A
greater than or equal to 1.

So, we find that if I set the value of this constant A equal to 1, then this high boost filtering becomes the same as unsharp masking. Now, I can rewrite this particular expression in the form (A minus 1) f (x, y) plus f (x, y) minus f bar (x, y). Now, this f (x, y) minus f bar (x, y) is nothing but the sharpened image f s (x, y).

(Refer Slide Time: 14:17)

So, the expression that I finally get for high boost filtering is f hb (x, y) is equal to A minus 1 f (x,
y) plus f s (x, y). Now, it does not matter in which way we obtain the sharpened images. So, if I
use the Laplacian operator to obtain this sharpened image; in that case, the high boost filtered
output f hb (x, y) simply becomes A f (x, y) minus the Laplacian operator on f (x, y) and this is the
case when the center coefficient in the Laplacian mask will be negative or I will have the same
expression which is written in the form A f (x, y) plus Laplacian of f (x, y) when the center
coefficient in the Laplacian mask is equal to positive.

So, as we have seen earlier, this first expression will be used if the center coefficient in the Laplacian mask is negative and the second expression will be used if the center coefficient in the Laplacian mask is positive.
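To tie the unsharp masking and high boost expressions together, here is a hedged sketch built on a simple box blur; the blur size and the value of A are arbitrary illustrative choices, not values used in the lecture.

import numpy as np
from scipy.ndimage import uniform_filter

def high_boost(image, A=1.2, size=3):
    # f_hb = A*f - f_bar = (A - 1)*f + f_s, where f_s = f - f_bar is the unsharp mask.
    # With A = 1 this reduces to plain unsharp masking.
    f = image.astype(float)
    f_bar = uniform_filter(f, size=size)   # blurred version f_bar(x, y)
    f_s = f - f_bar                        # sharpened (detail) image f_s(x, y)
    return (A - 1.0) * f + f_s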

(Refer Slide Time: 15:51)

So, using this we can get a similar type of mask where the mask is given by this particular
expression. So, using these masks, we can go for high boost filtering operation and if I use this
high boost filtering, I get the high boost output as we have already seen earlier.

Now, so far the kinds of derivative operators that we have used for sharpening operation, all of
them are second order derivative operators; we have not used first order derivative operators for
filtering so far but first order derivative operators are also capable of enhancing the content of the
image particularly at discontinuities and at region boundaries or edges.

Now, the way we obtain the first order derivative of a particular image is like this.

(Refer Slide Time: 16:53)

What we use for obtaining the first order derivatives is the gradient operator, where the gradient operator is given like this. The gradient of a function f, which is a vector, so we will write it as a vector, is nothing but del f del x and del f del y. So, this is what gives the gradient of a function f and what we are concerned about for enhancement is the magnitude of the gradient.

So, the magnitude of the gradient, we will write it as del f, which is nothing but the magnitude of the vector grad f, which is given by del f by del x square plus del f by del y square and square root of this. But you find that if I use this particular expression, it leads to some computational difficulty in the sense that we have to go for squaring and then square root, and getting a square root in the digital domain is not an easy task.

So, what we do is we go for an approximation of this and the approximation is obtained as the magnitude of del f del x plus the magnitude of del f del y. So, this is what gives us the first order derivative operator on an image.
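Written out in the usual notation, this is simply a restatement of the gradient and of the magnitude approximation just described:

\nabla f = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix},
\qquad
|\nabla f| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^{2} + \left(\frac{\partial f}{\partial y}\right)^{2}}
\;\approx\;
\left|\frac{\partial f}{\partial x}\right| + \left|\frac{\partial f}{\partial y}\right|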

(Refer Slide Time: 18:35)

And, if I want to obtain del f del x, you find that this del f del x can simply be computed as [f (x plus 1, y minus 1) plus 2 f (x plus 1, y) plus f (x plus 1, y plus 1)] minus [f (x minus 1, y minus 1) plus 2 f (x minus 1, y) plus f (x minus 1, y plus 1)]. So, this is the first order derivative along the x direction and in the same manner, we can also obtain the first order derivative in the y direction.

Now, once we have this kind of discrete formulation of the first order derivative along x, I can similarly find out del f del y which will also have a similar form. So, once I have such discrete formulations of the first order derivatives, we can have a mask which will compute the first order derivative of an image.
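One way of writing this pair of first order derivative masks in code is sketched below (they are the masks shown on the next slide, the Sobel operators). Note that correlate is used so that the sum of products matches the formula above exactly; the clipping is an illustrative choice on my part.

import numpy as np
from scipy.ndimage import correlate

# Rows of the x-derivative mask: [-1 -2 -1], [0 0 0], [1 2 1]
sobel_x = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)
sobel_y = sobel_x.T   # derivative along the other direction

def gradient_magnitude(image):
    f = image.astype(float)
    gx = correlate(f, sobel_x, mode='nearest')   # del f / del x
    gy = correlate(f, sobel_y, mode='nearest')   # del f / del y
    return np.clip(np.abs(gx) + np.abs(gy), 0, 255)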

(Refer Slide Time: 20:12)

So, for computing the first order derivative along the x direction, the left hand side shows the mask and for computing the first order derivative along the y direction, the right hand side shows the mask. And later on, we will see that these operators are known as the Sobel operators and when we apply these first order derivatives on the images, the kind of processed image that we get is like this.

(Refer Slide Time: 20:42)

So, you find that on the left hand side, we have the original image and on the right hand side, we have the processed image and in this case, you find that this processed image highlights the edge regions or discontinuity regions in the original image. Now, in many practical applications such simple derivative operators are not sufficient. So in such cases, what we may have to do is we may have to go for combinations of various types of operators which give us the enhanced image. So, with this we come to the end of our discussion on spatial domain processing techniques.

Now, we start discussion on the frequency domain processing techniques. Now, so far you must
have noticed that this mask operations or the spatial domain operations using the masks,
whatever we have done that is nothing but convolution operation in 2 dimension.

(Refer Slide Time: 21:58)

So what we have done is we have the original image f (x, y), we defined a mask corresponding to
the type of operation that we want to perform on the original image f (x, y) and using this mask
the kind of operation that is done the mathematical expression of this is given on the bottom and
if you analyze this, you will find that this is nothing but a convolution operation.

So, using this convolution operation, we are going for spatial domain processing of the images. Now, we have already seen during our earlier discussions that a convolution operation in the spatial domain is equivalent to multiplication in the frequency domain. Similarly, a convolution in the frequency domain is equivalent to multiplication in the spatial domain.

(Refer Slide Time: 23:05)

So, what we have seen is that if we have a convolution of say 2 functions f (x, y) and h (x, y) in the spatial domain, the corresponding operation in the frequency domain is multiplication of F (u, v) and H (u, v) where F (u, v) is the Fourier transform of the spatial domain function f (x, y) and H (u, v) is the Fourier transform of the spatial domain function h (x, y).

Similarly, if we multiply two functions f (x, y) and h (x, y) in the spatial domain, the corresponding operation in the frequency domain is the convolution of the Fourier transform of f (x, y), which is F (u, v), with H (u, v). So, these are the convolution theorems that we have seen during our previous discussions.
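In symbols, the two statements of the convolution theorem being used are (with * denoting convolution and capital letters the Fourier transforms):

f(x, y) * h(x, y) \;\Longleftrightarrow\; F(u, v)\, H(u, v),
\qquad
f(x, y)\, h(x, y) \;\Longleftrightarrow\; F(u, v) * H(u, v).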

So, to perform this convolution operation; the equivalent operation can also be done in the
frequency domain if I take the Fourier transform of the image f (x, y) and I take the Fourier
transform of the spatial mask that is h (x, y). So, the Fourier transform of the spatial mask h (x,
y) as we have said that this is nothing but H (u, v) in this particular case.

So, the equivalent filtering operation can be done in the frequency domain by choosing the proper filter H (u, v). Then after taking the product of F (u, v) and H (u, v), if I take the inverse Fourier transform, I will get the processed image in the spatial domain. Now, to analyze this further, what we will do is we will take the case in 1 dimension and we will consider filters based on Gaussian functions for analysis purposes.

The reasons we are choosing this filters based on Gaussian functions is that the shapes of such
functions can be easily specified and easily analyzed. Not only that; the forward transformation,
the forward Fourier transformation and the inverse Fourier transformation of Gaussian functions
are also Gaussian.

(Refer Slide Time: 25:48)

So, if I take a Gaussian filter in the frequency domain, I will write it as H (u) is equal to some constant A times e to the power minus u square by 2 sigma square, where sigma is the standard deviation of the Gaussian function, and if I take the inverse Fourier transform of this, then the corresponding filter in the spatial domain will be given by h (x) is equal to root over 2 pi sigma A e to the power minus 2 pi square sigma square x square.

Now, if you analyze these 2 functions that is H (u) in the frequency domain and h (x) in the
spatial domain, you find that both these functions are Gaussian as well as real and not only that;
both this functions, they behave reciprocally with each other. That means when H (u) has a broad
profile; this particular function H (u) in the frequency domain, it has a broad profile that is it has
a large value of standard deviation sigma. The corresponding h x in the spatial domain will have
a narrow profile.

Similarly, if H (u) has narrow profile, h (x) will have a broad profile. Particularly, when this
sigma tends to infinity, then this function H (u) this tends to be a flat function and in such case,
the corresponding spatial domain filter h (x) this tends to be an impulse function. So, this shows
that both H (u) and h (x), they are reciprocal to each other.
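Writing the pair out explicitly makes this reciprocal behaviour visible, since a large sigma gives a broad H(u) but a rapidly decaying, narrow h(x), and vice versa:

H(u) = A\, e^{-u^{2}/2\sigma^{2}}
\;\;\Longleftrightarrow\;\;
h(x) = \sqrt{2\pi}\,\sigma\, A\, e^{-2\pi^{2}\sigma^{2}x^{2}}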

Now, let us see what will be the nature of these functions, nature of such low pass filter
functions.

(Refer Slide Time: 28:07)

So here, on the left hand side, we have shown the frequency domain filter H (u) as a function of u and on the right hand side, we have shown the corresponding spatial domain filter h (x) which is a function of x. Now from these plots, it is quite obvious that once I specify a filter H (u) as a function of u in the frequency domain, the corresponding filter h (x) in the spatial domain will have all positive values.

That is, h (x) never becomes negative for any value of x, and the narrower the frequency domain filter, the more it will attenuate the low frequency components, resulting in more blurring. And if I make the frequency domain filter narrower, the corresponding spatial domain filter or spatial domain mask will be flatter. That means the mask size in the spatial domain will be larger.

(Refer Slide Time: 29:22)

So, this slide shows 2 such masks that we have already discussed during our previous discussion.
So, this is the mask where all the coefficients are positive and same and in this mask, the
coefficients are all positive but the variation shows that it is having some sort of Gaussian
distribution in nature and we have already said that if the frequency domain filter becomes very
narrow, it will attenuate even the low frequency components leading to a blurring effect of the
processed image.

Correspondingly, in the spatial domain, the mask size will be larger and you have seen through our results that if I use a larger mask size for the smoothing operation, then the image gets more and more blurred.

(Refer Slide Time: 30:28)

Now, in the same manner, as we have said the low pass filter; we can also make the high pass
filters again in the Gaussian domain.

(Refer Slide Time: 30:38)

So in this case, using the Gaussian function, the high pass filter H (u) can be defined as A into 1 minus e to the power minus u square by 2 sigma square. So, this is the high pass filter which is defined using the Gaussian function. If I take the inverse Fourier transform of this, the corresponding spatial domain filter will be given by h (x) equal to A into delta (x) minus square root of 2 pi into sigma A into e to the power minus 2 pi square sigma square x square.

So, if I plot this in the frequency domain, this shows the high pass filter in the frequency domain.
So, as it is quite obvious from this plot that it will attenuate the low frequency components
whereas it will pass the high frequency components and the corresponding filter in the spatial
domain is having this form which is given by h (x) as the function of x.

Now, as you note from this particular figure, from this particular function h (x), h (x) can assume both positive as well as negative values and an important point to note over here is that once h (x) becomes negative, it remains negative; it does not become positive any more. And in the spatial domain, the Laplacian operator that we have used earlier was of a similar nature.

(Refer Slide Time: 32:33)

So, the Laplacian mask that we have used, we have seen that the center pixel is having a positive
value whereas all the neighboring pixels have the negative values and this is true for both the
Laplacian masks if I consider only the vertical and horizontal components or whether along with
vertical and horizontal components, I also consider the diagonal components.

So, these are the 2 Laplacian masks where the center coefficient is positive and the neighboring
coefficients once they become negative, they will remain negative. So, this shows that using the
Laplacian mask in the spatial domain, the kind of operation that we have done is basically a high
pass filtering operation.

So, now first of all, we will consider the smoothing frequency domain filters or low pass filters in the frequency domain. Now, as we have already discussed that edges as well as sharp transitions like noises, they lead to high frequency components in the image and if we want to reduce these high frequency components, then the kind of filter that we have to use is a low pass filter where the low pass filter will allow the low frequency components of the input image to be passed to the output and it will cut off the high frequency components of the input image which will not be passed to the output.

(Refer Slide Time: 34:09)

So, our basic model for this filtering operation will be like this that we will have the output in the
frequency domain which is given by G (u, v) which is equal to H (u, v) multiplied by F (u, v)
where this F (u, v) is the Fourier transform of the input image and we have to select a proper
filter function H (u, v) which will attenuate the high frequency components and it will let the low
frequency components to be passed to the output.

Now here, we will consider an ideal low pass filter where we will assume the ideal low pass filter to be like this: H (u, v) is equal to 1 if D (u, v) is less than or equal to some value say D 0 , where D (u, v) is the distance of the point (u, v) in the frequency domain from the origin of the frequency rectangle, and H (u, v) will be equal to 0 if the distance of the point (u, v) from the origin is greater than D 0 .

So, this clearly means that if I multiply F (u, v) with such an H (u, v), then all the frequency components lying within a circle of radius D 0 will be passed to the output and all the frequency components lying outside this circle of radius D 0 will not be allowed to pass to the output.

Now, if the Fourier transform F (u, v) is the centered Fourier transform, that means the origin of the Fourier transform rectangle is set at the middle of the rectangle; then this D (u, v), the distance value, is simply computed as the square root of (u minus M by 2) square plus (v minus N by 2) square, where we are assuming that we have an image of size M by N. So, for an M by N image size, D (u, v) will be computed like this if the Fourier transform F (u, v) is the centered Fourier transformation.
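A sketch of how such an ideal low pass filter can be constructed for an M by N centred spectrum is given below; the image size and cut off value in the example are arbitrary.

import numpy as np

def ideal_lowpass(M, N, D0):
    # H(u, v) = 1 inside a circle of radius D0 around the centre of the
    # M x N frequency rectangle, and 0 outside (for a centred spectrum)
    u = np.arange(M).reshape(-1, 1)
    v = np.arange(N).reshape(1, -1)
    D = np.sqrt((u - M / 2.0) ** 2 + (v - N / 2.0) ** 2)
    return (D <= D0).astype(float)

H = ideal_lowpass(256, 256, D0=30)   # example: cut off frequency of 30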

(Refer Slide Time: 36:56)

A plot of this kind of function is like this. So, here you find that the left hand side shows the
perspective plot of such an ideal filter whereas on the right hand side, we just show the cross
section of such an ideal filter and in such cases, we define a cut off frequency of the filter to be
the point of transition between H (u, v) equal to 1 and H (u, v) equal to 0.

So, in this particular case, this point of transition is the value D 0 , so we consider D 0 to be the cut off frequency of this particular filter. Now, it may be noted that such a sharp cut off filter is not realizable using electronic components. However, using software, using a computer program, it is different because we are just letting some values pass to the output and we are making the other values 0.

So, this kind of ideal low pass filter can be implemented using software whereas using electronic components, we are not able to implement such ideal low pass filters. So, a better approximation of this is a filter which is called the butter worth filter.

(Refer Slide Time: 38:10)

So, for a butter worth low pass filter, the frequency response is given by H (u, v) is equal to 1 upon 1 plus [D (u, v) by D 0 ] to the power 2n. So, this is a butter worth filter of order n. The plot of such a butter worth filter is shown here.
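In the same spirit, a hedged sketch of this butter worth low pass transfer function of order n could be the following (again for a centred spectrum of size M by N):

import numpy as np

def butterworth_lowpass(M, N, D0, n=1):
    # H(u, v) = 1 / (1 + (D(u, v) / D0)^(2n))
    u = np.arange(M).reshape(-1, 1)
    v = np.arange(N).reshape(1, -1)
    D = np.sqrt((u - M / 2.0) ** 2 + (v - N / 2.0) ** 2)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))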

(Refer Slide Time: 39:13)

So here, we have shown the perspective plot of the butter worth filter and on the right hand side, we have shown the cross section of this butter worth filter. Now, if I apply the ideal low pass filter and the butter worth filter on an image, let us see what will be the kind of output image that we will get.

So, in all this cases, we assume that first we take the Fourier transform of the image, then
multiply that Fourier transformation with the frequency response of the filters, then whatever the
product that we get, we take the inverse Fourier transformation of this to obtain our processed
image in the spatial domain.
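That sequence of steps can be sketched as a small helper, where H is any of the transfer functions built above for the same image size; the use of fftshift matches the centred D(u, v) assumed earlier.

import numpy as np

def filter_in_frequency_domain(image, H):
    F = np.fft.fftshift(np.fft.fft2(image.astype(float)))  # centred Fourier transform of the image
    G = H * F                                               # multiply by the filter response
    g = np.fft.ifft2(np.fft.ifftshift(G))                   # inverse transform back to the spatial domain
    return np.real(g)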

(Refer Slide Time: 40:01)

So here, we use 2 images for test purpose. On the left hand side, we have shown an image
without any noise and on the right hand side, we have shown an image where we have added
some amount of noise.

(Refer Slide Time: 40:15)

Then, if I process that image using the ideal low pass filter and using the butter worth filter; the
top rows shows the results with ideal low pass filter when the image is without noise and the
bottom row shows the result by applying the butter worth filter again when there is no noise
contamination with the image.

Here, you find as the top row shows that if I use the ideal low pass filter for the same cutoff
frequency say 10, the blurring of the image is very high compared to the blurring which is
introduced by the butter worth filter. If I increase the cut off frequency, when I go for cut off
frequency of 20; in that case you find that in the original image, in the ideal low pass filtered
image the image is very sharp but the disadvantage is that if you simply look at this locations say
along this locations, you find that there is some ringing effect. That means there are a number of
lines, undesired lines which are not present in the original image.

Same is the case over here. So, the ideal low pass filter introduces the ringing effect, the ringing effect which is not visible in case of the butter worth filter.

(Refer Slide Time: 42:00)

Now, the reason why the ideal low pass filter introduces the ringing effect is this: we have seen that for an ideal low pass filter in the frequency domain, the response was something like this. So, if I plot H (u) versus u, this is the response of the ideal low pass filter. Now, if I take the inverse Fourier transform of this, the corresponding h (x) will be a function of this form, like this. So here, you find that there is a main component, which is the central component, and there are other secondary components.

Now, the spread of this main component is inversely proportional to D 0 , which is the cut off frequency of the ideal low pass filter. So, as I reduce D 0 , this spread is going to increase and that is what is responsible for more and more blurring of the smoothed image. Whereas, the number of the secondary components over a unit length is again inversely proportional to this cut off frequency D 0 and these are the ones which are responsible for the ringing effect.

When I use the butter worth filter, the outputs that we have shown here are obtained using a butter worth filter of order 1, that is the value of n is equal to 1. So, a butter worth filter of order 1 does not lead to any kind of ringing effect. Whereas, if I go for a butter worth filter of higher order, that may lead to the ringing effect. In the same manner, we can also go for the Gaussian low pass filter.

(Refer Slide Time: 44:01)

And we have already said that for a Gaussian low pass filter, the filter response H (u, v) is given
by e to the power minus D square (u, v) upon 2 sigma square and if I allow sigma to be equal to
the cut off frequency say D 0 , then this H (u, v) the filter response will be e to the power minus D
square uv upon 2 D 0 square.

Now, if I use such a Gaussian low pass filter for the filtering operation and, as we have already said, the inverse Fourier transform of this is also Gaussian in nature; so using the Gaussian filters, we will never have any ringing effect in the processed image. So, this is the kind of low pass filtering operation or smoothing operation that we can have in the frequency domain. We can also have the high pass filtering or sharpening filters in the frequency domain.

So, as low pass filters give the smoothing effect, the sharpening effect is given by the high pass
filter. Again, we can have the ideal high pass filter, we can have the butter worth high pass filter,
we can also have the Gaussian high pass filter.

(Refer Slide Time: 45:31)

So, just in the reverse way we can define an ideal high pass filter as, for an high pass filter, the
ideal high pass filter will be simply H (u, v) is equal to 0 if D (u, v) is less than or equal to D 0
and this will be equal to 1 if D (u, v) is greater than D 0 . So, this is the ideal high pass filter.

Similarly, we can have butter worth high pass filter where H (u, v) will be given by the
expression 1 upon 1 plus D 0 by D (u, v) to the power 2n and we can also have the Gaussian high
pass filter which is given by H (u, v) is equal to 1 minus e to the power minus D square (u, v)
upon 2 D 0 square and you find that in all these cases; the response, the frequency response of an
high pass filter if I write it as H hp is nothing but 1 minus the response of a low pass filter.

(Refer Slide Time: 46:56)

So, the high pass filter response can be obtained by the low pass filter response where the cutoff
frequencies are same. Now, using such high pass filters, the kind of results that we can obtain is
given here.

(Refer Slide Time: 47:30)

So, this is the ideal high pass filter response where the left hand side gives you the perspective
plot and the right hand side gives you the cross section.

(Refer Slide Time: 47:43)

This shows the perspective plot as well as the cross section of a butter worth high pass filter of order 1 and if I apply such high pass filters to the same image, then the result that will be obtained is something like this.

(Refer Slide Time: 47:57)

So here, on the left hand side, this is the response of an ideal high pass filter. On the right hand side, we have shown the response of the butter worth high pass filter and in both these cases, the cut off frequency was taken to be equal to 10.

(Refer Slide Time: 48:20)

This one shows the case where the cut off frequency was taken to be equal to 50 and if you closely look at the ideal filter output, here again you can find that there are ringing effects around the boundaries whereas in case of the butter worth filter, there is no ringing effect. And again, we said that this is the butter worth filter of order 1; if I go for higher order butter worth filters, that also may lead to ringing effects whereas if I go for a Gaussian high pass filter, the Gaussian high pass filter does not lead to any ringing effect.

So, using the low pass filters, I can go for the smoothing operation and using the high pass filters, I can go for the image sharpening operation. The same operation can also be done using the Laplacian in the frequency domain.

(Refer Slide Time: 49:32)

It is simply because, if for a function f (x, y) I have the corresponding Fourier transform F (u, v) in the frequency domain, then if I perform the Laplacian del square f (x, y) and take the Fourier transform of this, it can be shown that this will be nothing but minus u square plus v square into F (u, v).
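In equation form, the property being used is the following (constant factors aside, in the convention used here; for a centred M by N transform the term u squared plus v squared becomes (u - M/2) squared plus (v - N/2) squared):

\mathcal{F}\left\{\nabla^{2} f(x, y)\right\} = -(u^{2} + v^{2})\, F(u, v),
\qquad\text{so that}\qquad
H(u, v) = -(u^{2} + v^{2}).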

So using this property, if I consider say H (u, v) equal to minus u square plus v square and using this as a filter, I filter F (u, v) and after that I compute the inverse Fourier transformation; then the output that we get is nothing but a Laplacian operated output which will obviously be an enhanced output. Another kind of filtering that we have already done in connection with our spatial domain operation is high boost filtering.

(Refer Slide Time: 51:13)

So, there we have said that in the spatial domain, the high boost filtering output, if I represent it as f hb (x, y), is nothing but A into f (x, y) minus f lp (x, y), which can be represented as (A minus 1) into f (x, y) plus the high pass filtered output f hp (x, y).

In the frequency domain, the corresponding filter can be represented by H hb (u, v) is equal to (A minus 1) plus H hp (u, v). So, this is the high boost filter response in the frequency domain.

(Refer Slide Time: 52:32)

So, if I apply this high boost filter to an image, the kind of result that we get is something like
this where again on the left hand side is the original image and on the right hand side, it is the
high boost filtered image. Now, let us consider another very very interesting filter which we call
as homomorphic filter, homomorphic filter.

(Refer Slide Time: 52:52)

The idea comes from one of our earlier discussions where we have said that the intensity at a particular point in the image is the product of 2 terms: one is the illumination term, the other one is the reflectance term. That is, we have earlier said that f (x, y) can be represented by an illumination term i (x, y) multiplied by r (x, y), where r (x, y) is the reflectance term.

Now, coming to the corresponding frequency domain: because this is the product of 2 terms, one the illumination and the other the reflectance, taking the Fourier transform directly on this product does not let us operate on the two terms separately. So, what we do is we define a function say z (x, y) which is the logarithm of f (x, y) and this is nothing but logarithm of i (x, y) plus logarithm of r (x, y) and if I compute the Fourier transform, then the Fourier transform of z (x, y) will be represented by Z (u, v) which will have 2 components, F i (u, v) plus F r (u, v), where F i (u, v) is the Fourier transform of ln i (x, y) and F r (u, v) is the Fourier transform of ln r (x, y).

(Refer Slide Time: 55:07)

Now, if I define a filter say H (u, v) and apply this filter on this Z (u, v), then the output that I get
is say S (u, v) which is equal to H (u, v) times Z (u, v) which will be nothing but H (u, v) times f i
(u, v) plus H (u, v) times F r (u, v).

Now, taking the inverse Fourier transform, I get s (x, y) is equal to i dash (x, y) plus r dash (x, y)
and finally I get g (x, y) which is nothing but e to the power s (x, y) which is nothing but e to the
power i dash (x, y) into e to the power r dash (x, y) which is nothing but i 0 (x, y) into r out (x, y).

So, the first term is the illumination component and second term is the reflectance component.
Now, because of this separation, it is possible to design a filter which can enhance the high
frequency components and it can attenuate the low frequency components. Now, it is generally
the case that in an image, the illumination components leads to low frequency components
because illumination is slowly fairing whereas as the reflectance component leads to high
frequency components, particularly at the boundaries of 2 reflecting objects.

As a result, the reflectance term leads to high frequency components and illumination terms
leads to low frequency components.
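A minimal sketch of the whole homomorphic chain is given below, assuming that some frequency domain filter H of the image size has already been constructed (for example the gamma H / gamma L filter described next); the plus 1 before the logarithm is only there to avoid taking the logarithm of 0.

import numpy as np

def homomorphic_filter(image, H):
    f = image.astype(float) + 1.0                   # avoid ln(0)
    Z = np.fft.fftshift(np.fft.fft2(np.log(f)))     # Z(u, v): Fourier transform of ln f(x, y)
    S = H * Z                                       # S(u, v) = H(u, v) Z(u, v)
    s = np.real(np.fft.ifft2(np.fft.ifftshift(S)))  # s(x, y)
    return np.exp(s) - 1.0                          # g(x, y) = e^{s(x, y)}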

(Refer Slide Time: 57:13)

So now, if we define a filter with a response like this and here if I say that I will have gamma H greater than 1 and gamma L less than 1, this will amplify all the high frequency components, that is the contribution of the reflectance, and it will attenuate the low frequency components, that is the contribution due to the illumination. Now, using this type of filtering, the kind of result that we get is something like this.

(Refer Slide Time: 57:46)

Here, on the left hand side is the original image and on the right hand side is the enhanced image. And if you look in the boxes, you find that many of the details in the boxes which are not available in the original image are now available in the enhanced image. So, using such homomorphic filtering, we can even go for this kind of enhancement where the contribution due to the illumination is reduced. So, even in the dark areas, we can take out the details.

So with this, we come to the end of our discussion on image enhancement. Now, let us go to some questions on today's lecture.

(Refer Slide Time: 58:28)

The first question is: a digital image contains an unwanted region of size 7 pixels; what should be the smoothing mask size to remove this region? Second question - why is the Laplacian operator normally used for image sharpening operations? Third question - what is unsharp masking? Fourth question - give a 3 by 3 mask for performing unsharp masking in a single pass through an image. Fifth, state some applications of the first derivative in image processing.

(Refer Slide Time: 59:18)

Then, what is ringing? Why do ideal low pass and high pass filters lead to ringing effects? How does blurring vary with the cut off frequency? Does the Gaussian filter lead to ringing effects? Give the transfer function of a high boost filter and what is the principle of the homomorphic filter?

Thank you.

Digital Image Processing

Prof. P.K. Biswas

Department of Electronics & Electrical Communication Engineering

Indian Institute of Technology, Kharagpur

Lecture - 22

Image Restoration - I

Hello, welcome to the video lecture series on digital image processing. During our last few
lectures, we have talked about various image enhancement techniques.

(Refer Slide Time: 1:04)

So, we have talked about image enhancement techniques both in the spatial domain as well as in
the frequency domain. So, among spatial domain techniques, we have talked about the point
processing techniques and we have also talked about the mask processing techniques and in
frequency domain; we have talked about ideal and butter worth low pass filters, we have talked
about ideal and butter worth high pass filters, we have talked about Gaussian filters and we have
also talked about homomorphic filters and we have said that when we are filtering an image in
the frequency domain using a low pass filter, if the low pass filter is an ideal low pass filter; in
that case, there is a ringing effect in the output of the image.

The ringing effect is reduced by using the butter worth filter because of smooth transition which is given by the butter worth filter from low frequency region to the high frequency region. However, even in the butter worth filter if we use a butter worth filter of order more than 1, that is if I use a butter worth filter of order 2 or order 3 and so on; in such cases also, the butter worth filter leads to the ringing effect.

However, we have discussed that if we use Gaussian filters, then Gaussian filters do not lead to
ringing effect at all. Same is the situation in case of the high pass filters where the high pass
filters try to enhance the high frequency components or detailed contents of an image and it
suppresses the low frequency components and that is the reason that the output of a high pass
filter we have seen that if there is any smooth region in the image, the smooth region is almost
appearing as black in the processed image.

The homomorphic filter, as we have discussed, is a very interesting filter. It tries to enhance the reflectance component in an image and it tries to suppress the contribution of the illumination component, that is the effect of the illumination on the same object, and by using this, we have seen some interesting results: even in areas of very low illumination, where the area is not illuminated properly while taking the image, we have been able to extract some details of the image.

(Refer Slide Time: 3:50)

Now, in today’s lecture or in a number of lectures starting from today, we will talk about image
restoration techniques. So, we will talk about image restoration techniques and we will see what
is the difference between image enhancement and image restoration. We will talk about image
formation process and the degradation model involved in it and we will see the degradation
model and the degradation operation in continuous functions and how it can be formulated in the
discrete domain.

Now, when we have talked about image enhancement, particularly using a low pass filter or using smoothing masks in the spatial domain, we have seen that one of the effects of using a low pass filter or of using a smoothing mask in the spatial domain is that the noise content of the image gets reduced.

The simple reason is that the noise content leads to high frequency components in the displayed image. So, if I can remove or reduce the high frequency components, that also leads to reduction of the noise. Now, this type of reduction of the noise is also a sort of restoration. But these are not usually termed as restoration. Rather, a process which tries to recover or restore an image which has been degraded, using some knowledge of the degradation phenomenon which has degraded the image, is an operation which is known as image restoration.

So, in case of image restoration, the image degradation model is very very important. So, we
have to find out what is the phenomena or what is the model which has degraded the image and
once that model, the degradation model is known; then we have to apply the inverse process to
recover or restore the desired image.

So, this is the difference between image enhancement or simple noise filtering and image restoration. That is, in case of image enhancement or simple noise filtering, we do not make use of any degradation model or we do not bother about the process which has degraded the image. Whereas in case of image restoration, we will talk about the degradation model, we will try to estimate the model that has degraded the image and using that model, we apply the inverse process and try to restore the image.

So, the degradation modeling is very important in case of image restoration and when we try to restore an image, in most of the cases, we define some goodness criteria. Using these goodness criteria, we can find out an optimally restored image which is more or less the same as the original image and we will see later that image restoration operations, as in case of image enhancement, can be applied both in the frequency domain as well as in the spatial domain.

So, first of all, let us see that what is the image degradation model that we will consider in our
subsequent lectures. So, let us see the image degradation model first.

(Refer Slide Time: 7:35)

So here, we assume that our input image is f (x, y). It is a 2 dimensional function as before, and we assume that this input image f (x, y) is degraded by a degradation function H; that is, we have a degradation function H which operates on the input image f (x, y).

Then, an additive noise term, which we represent by eta (x, y), is added to the output of this degradation function and this finally gives us the output image g (x, y). This g (x, y) is the degraded image, and from it we want to recover the original input image f (x, y) using image restoration techniques.

So, for recovering this f (x, y), we have to perform some filtering operation. We will see later that these filters are actually derived using knowledge of the degradation function H, and the output of the filters is our restored image, which we denote f hat (x, y). We call it f hat (x, y) because in most cases we are unable to restore the image exactly; it is very difficult to get the exact image f (x, y). Rather, by using the goodness criterion that we have just mentioned, what we can get is an approximation of the original image f (x, y). That is the reconstructed image f hat (x, y), which is an approximation of the original image f (x, y).

So, the blocks up to obtaining g (x, y) actually represent the process of degradation. In the degradation, we first have a degradation function H which operates on the input image f (x, y); then the output of this degradation function block is added with an additive noise which, in this particular case, we have represented as eta (x, y), and this degradation output added to the additive noise is the degraded image that we actually observe. This degraded image is then filtered using the restoration filters.

So, this g (x, y) is passed through the restoration filters and the filter output is the reconstructed image f hat (x, y) which, as we have just said, is an approximation of the original image f (x, y). This particular block represents the restoration operation and, as we have said, in the process we call image restoration, knowledge of the degradation model is very essential.

So, one of the fundamental tasks, one of the very important tasks in the restoration process, is to estimate the degradation model which has degraded the input image, and later on we will see various techniques of how to estimate the degradation model, that is, how to estimate the degradation function H. We will see in a short while from now that this particular operation, the conversion from f (x, y) to g (x, y), can be represented in the spatial domain as g (x, y) equal to h (x, y) convolved with f (x, y) plus the noise eta (x, y).

So, this is the operation done in the spatial domain, and the corresponding operation in the frequency domain is represented by G (u, v) equal to H (u, v) into F (u, v) plus N (u, v), where H (u, v) is the Fourier transform of h (x, y), F (u, v) is the Fourier transform of the input image f (x, y), N (u, v) is the Fourier transform of the additive noise eta (x, y) and G (u, v) is the Fourier transform of the degraded image g (x, y).

This second operation is the frequency domain operation, and the equivalent operation in the spatial domain is the other one. In the spatial domain, we have represented this operation as a convolution, and we said earlier that a convolution in the spatial domain is equivalent to multiplication in the frequency domain. That is what the second expression, G (u, v) equal to H (u, v) into F (u, v) plus N (u, v), says.

So here, the convolution in the spatial domain is replaced by the multiplication in the frequency domain. These 2 are very important expressions and we will make use of them more or less throughout our discussion of the image restoration process.
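As a minimal numerical sketch (not part of the lecture, and using hypothetical toy arrays), the equivalence of the two expressions can be checked with NumPy: circular convolution of f with h plus noise gives the same g as inverse-transforming H (u, v) F (u, v) + N (u, v).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy image f(x, y), degradation kernel h(x, y) and noise eta(x, y), all on the
# same M x N periodic support (hypothetical values, purely for illustration).
M, N = 8, 8
f = rng.random((M, N))
h = np.zeros((M, N)); h[:2, :2] = 0.25        # a small 2x2 averaging blur
eta = 0.01 * rng.standard_normal((M, N))

# Spatial-domain model: g(x, y) = h(x, y) circularly convolved with f(x, y) + eta(x, y)
g_spatial = np.zeros((M, N))
for x in range(M):
    for y in range(N):
        for m in range(M):
            for n in range(N):
                g_spatial[x, y] += f[m, n] * h[(x - m) % M, (y - n) % N]
g_spatial += eta

# Frequency-domain model: G(u, v) = H(u, v) F(u, v) + N(u, v)
G = np.fft.fft2(h) * np.fft.fft2(f) + np.fft.fft2(eta)
g_frequency = np.real(np.fft.ifft2(G))

print(np.allclose(g_spatial, g_frequency))    # True: the two forms agree
```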

(Refer Slide Time: 14:36)

Now, before we proceed further, let us recapitulate some of the definitions that will be used throughout our discussion on image restoration. Here, we have a degraded image g (x, y) which we now represent as H [f (x, y)] plus eta (x, y), where we assume that H is the degradation operator which operates on the input image f (x, y) and which, when added with the additive noise eta (x, y), gives us the degraded image g (x, y).

Now, if for the time being we neglect the term eta (x, y), that is, we set eta (x, y) equal to 0 for simplicity of our analysis, then what we get is g (x, y) equal to H [f (x, y)] where, as we said, H is the degradation operator.

Now, the first term that we will define is what is known as linearity: what do we mean when we say that this degradation operator H is a linear operator?

(Refer Slide Time: 16:47)

So, for defining linearity, suppose we have 2 functions f 1 (x, y) and f 2 (x, y) and 2 constants k 1 and k 2 . If the relation H [k 1 f 1 (x, y) plus k 2 f 2 (x, y)] equal to k 1 H [f 1 (x, y)] plus k 2 H [f 2 (x, y)] is true for any such functions and constants, then the operator H is said to be a linear operator.

And we know very well from linear system theory that this is nothing but the famous superposition theorem. As per our definition of a linear system, the superposition theorem must hold true if the system is a linear system. Now, using this same equation, if I set k 1 equal to k 2 equal to 1, then the same equation leads to H [f 1 (x, y) plus f 2 (x, y)] equal to H [f 1 (x, y)] plus H [f 2 (x, y)].

Simply, we have replaced k 1 and k 2 by 1, and this is what is known as the additivity property. The additivity property says that the response of the system to the sum of 2 inputs is the same as the sum of their individual responses. So here, we have 2 inputs f 1 (x, y) and f 2 (x, y).

So, if I take the summation of f 1 (x, y) and f 2 (x, y) and then allow H to operate on it, then whatever result we get will be the same as when H operates on f 1 and f 2 individually and we take the sum of those individual responses. These two must be equal for a linear system, and this is what is known as the additivity property.

(Refer Slide Time: 20:42)

Now, again, if I assume that f 2 (x, y) is equal to 0, this gives H [k 1 f 1 (x, y)] equal to k 1 H [f 1 (x, y)], and this is the property which is known as the homogeneity property. So, these are the different properties of a linear system. The system is also called position invariant if certain properties hold.

So, the system will be position invariant or location invariant if H [f (x minus alpha, y minus beta)] is the same as g (x minus alpha, y minus beta), where we have assumed that g (x, y) is equal to H [f (x, y)]. That is, when g (x, y) equals H [f (x, y)], the operator H is said to be position invariant if H [f (x minus alpha, y minus beta)] is equal to g (x minus alpha, y minus beta), and that should be true for any function f (x, y) and any values of alpha and beta.

So, this position invariance property simply says that the response of H at any point in the image should depend solely on the value of the image at that particular point and not on the position of the point in the image, and that is what is given by the expression H [f (x minus alpha, y minus beta)] equal to g (x minus alpha, y minus beta).
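As a small numerical illustration (my own sketch, not from the lecture; the arrays are hypothetical), one can check that an operator H realised as circular convolution with some impulse response satisfies both the superposition property and position invariance.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 16
h = rng.random(M)                 # a 1-D impulse response (hypothetical)

def H(f):
    """The operator H, realised here as circular convolution with h."""
    return np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(f)))

f1, f2 = rng.random(M), rng.random(M)
k1, k2 = 2.0, -0.5

# Linearity (superposition): H[k1 f1 + k2 f2] = k1 H[f1] + k2 H[f2]
print(np.allclose(H(k1 * f1 + k2 * f2), k1 * H(f1) + k2 * H(f2)))   # True

# Position invariance: a shift of the input by alpha shifts the output by alpha
alpha = 3
print(np.allclose(H(np.roll(f1, alpha)), np.roll(H(f1), alpha)))    # True
```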

Now, given these definitions, let us see what will be the degradation model in case of continuous functions.

(Refer Slide Time: 23:54)

So, to look at the degradation model in case of continuous functions, we make use of a mathematical expression seen earlier. If I take a delta function delta (x, y), its definition, as we have seen earlier, is that it is equal to 1 if x equal to 0 and y equal to 0, and it is equal to 0 otherwise.

So, this is the definition of a delta function that we have already used, and we can also use a shifted version of it: delta (x minus x 0 , y minus y 0 ) is equal to 1 if x equal to x 0 and y equal to y 0 , and 0 otherwise.

Now, earlier we have seen that if we have an image, or a 2 dimensional function, f (x, y), and we multiply it with delta (x minus x 0 , y minus y 0 ) and integrate this product over the interval minus infinity to infinity, then the result of the integral will simply be f (x 0 , y 0 ).

So, this says that if I multiply a 2 dimensional function f (x, y) with the delta function delta x
minus x 0 , y minus y 0 and integrate the product over the interval minus infinity to infinity, then
the result will be simply the value of the 2 dimensional function f (x, y) at location (x 0 , y 0 ).

(Refer Slide Time: 26:02)

So, by slightly modifying this particular expression, we can have an equivalent expression: I can formulate the 2 dimensional function f (x, y) as a similar integral operation, in this case taking f (alpha, beta) delta (x minus alpha, y minus beta) d alpha d beta and taking the integral from minus infinity to infinity.

So, we have a mathematical expression which is equivalent to the earlier one, and in this case we can formulate the 2 dimensional function f (x, y) in terms of the value of the function at a particular point (alpha, beta) and the delta function delta (x minus alpha, y minus beta).
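Written compactly in standard notation, the two equivalent forms of this sifting property just described are:

```latex
\iint_{-\infty}^{\infty} f(x,y)\,\delta(x - x_0,\, y - y_0)\,dx\,dy \;=\; f(x_0, y_0),
\qquad
f(x,y) \;=\; \iint_{-\infty}^{\infty} f(\alpha,\beta)\,\delta(x - \alpha,\, y - \beta)\,d\alpha\,d\beta .
```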

Now, for the time being, consider the noise term eta (x, y) to be equal to 0 for simplicity. We have seen earlier that g (x, y) is written as H [f (x, y)] plus eta (x, y); assuming the additive noise term eta (x, y) is 0 or negligible, the degraded image g (x, y) can now be written by replacing f (x, y) with the integral expression above. So, it becomes simply H of the double integral of f (alpha, beta) delta (x minus alpha, y minus beta) d alpha d beta, where the integral is taken from minus infinity to infinity.

So, I get an expression for the degraded image g (x, y) in terms of this integral definition of the function f (x, y), operated on by the degradation operator H. Now, if I apply the linearity and additivity properties of a linear system, this expression gets converted to g (x, y) equal to the double integral, taken from minus infinity to infinity, of H [f (alpha, beta) delta (x minus alpha, y minus beta)] d alpha d beta; that is, the integration is taken outside and the operator H moves inside the integral. This is what we obtain by applying the linearity and additivity property to the earlier expression of the degraded image.

Now, here you find that this term f (alpha, beta), this is independent of the variables x and y. So,
because the term f (alpha, beta) is independent of the variables x and y, the same expression can
now be rewritten in a slightly different form.

(Refer Slide Time: 29:49)

So, that form gives us that g (x, y) can now be written as the same double integral with f (alpha, beta) taken outside the scope of the operator H. The integrand simply becomes f (alpha, beta) times H [delta (x minus alpha, y minus beta)] d alpha d beta, with the integral taken over minus infinity to infinity.

Now, this particular term H of delta (x minus alpha, y minus beta), we can write this as h (x,
alpha, y, beta) and this is nothing but what is known as the impulse response of H. So, this is
what is known as the impulse response. That is the response of the operator H when the input is
an impulse given in the form delta (x minus alpha, y minus beta) and in case of optics, this
impulse response is popularly known as point spread function or PSF.

So, using this impulse response, the same g (x, y) can be written as the double integral of f (alpha, beta) h (x, alpha, y, beta) d alpha d beta, with the integral taken from minus infinity to infinity, and this is what is popularly known as the superposition integral of the first kind. This expression is very important: it simply says that if the impulse response of the operator H is known, then it is possible to find the response of H to any arbitrary input f (alpha, beta).

So, that is what has been done here that using the knowledge of this impulse response h (x,
alpha, y, beta), we have been able to find out the response of this system to an input f (alpha,
beta) and this impulse response is the one which uniquely or completely characterizes a
particular system. So, given any system, if we know what is the impulse response of the system,
then we can find out what will be the response of that system to any other arbitrary function.
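In standard notation, the superposition integral of the first kind just described is:

```latex
g(x,y) \;=\; \iint_{-\infty}^{\infty} f(\alpha,\beta)\,h(x,\alpha,\,y,\beta)\,d\alpha\,d\beta,
\qquad\text{where } h(x,\alpha,\,y,\beta) = H\big[\delta(x-\alpha,\,y-\beta)\big].
```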

(Refer Slide Time: 33:23)

Now, in addition to this, if the operator H is position invariant, then H [delta (x minus alpha, y minus beta)], as per our definition of position invariance, will be the same as h (x minus alpha, y minus beta).

Now, using this position invariance property, we can write the degraded image g (x, y) simply as the double integral of f (alpha, beta) into h (x minus alpha, y minus beta) d alpha d beta, with the integral taken from minus infinity to infinity. If you look at this particular expression, you will find that it is nothing but the convolution of the 2 functions f (x, y) and h (x, y), and that is what we said when we drew our degradation model: the input image f (x, y) is actually convolved with the degradation function h (x, y). So, this is nothing but that convolution operation.

Earlier, we had considered the noise term eta (x, y) to be equal to 0. If I now include the noise term eta (x, y), then our degradation model becomes simply g (x, y) equal to the double integral of f (alpha, beta) h (x minus alpha, y minus beta) d alpha d beta, taken from minus infinity to infinity, plus the noise term eta (x, y).

So, this is the general image degradation model and you will find that here we have assumed that
the degradation function H is linear and position invariant and it is very important to note that
many of the degradation operations which we encounter in reality can be approximated by such
linear space invariant or linear position invariant models.
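Stated compactly in standard notation, the general linear position invariant degradation model just described is:

```latex
g(x,y) \;=\; \iint_{-\infty}^{\infty} f(\alpha,\beta)\,h(x-\alpha,\,y-\beta)\,d\alpha\,d\beta \;+\; \eta(x,y)
\;=\; h(x,y) * f(x,y) \;+\; \eta(x,y).
```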

The advantage is that once a degradation model can be approximated by a linear position invariant model, then the entire mathematical toolkit of linear system theory can be used to find the solution for such an image restoration process. That means we can use all those tools of linear system theory to estimate the restored image f (x, y) from a given degraded image g (x, y), provided we have some knowledge of the degradation function h (x, y) and some knowledge of the noise function eta (x, y).

Now, this formulation is for the continuous case and, as we have said many times, in order to use this mathematical operation for our digital image processing techniques, we have to find a discrete formulation of this mathematical model. So, let us see how we can have an equivalent discrete formulation of this particular degradation model.

(Refer Slide Time: 37:36)

So, to obtain this discrete formulation, for simplicity we will initially consider the 1 dimensional case and later extend it to the 2 dimensional case for digital image processing operations. Again for simplicity, we will initially neglect the contribution of the noise term eta (x, y).

So, in the 1 dimensional case, as for the continuous signal, we have 2 signals: f (x), the input signal, and h (x), the degradation function. For discretization of the same formulation, we have to uniformly sample these 2 functions, and we assume that f (x) is uniformly sampled to give an array of dimension A and h (x) is uniformly sampled to give an array of dimension B.

That means for f (x) in the discrete case, x varies from 0, 1 up to A minus 1 and for h (x), x varies from 0, 1 up to B minus 1. Then we will append additional 0s to f (x) and h (x) to make both of them of the same dimension, equal to say capital M.

So, we make both of them of dimension capital M by appending additional 0s, and we assume that both f (x) and h (x), after adding these 0 terms, become periodic with periodicity capital M. Once we have done this, the same convolution operation that we had in the continuous case can also be written in the discrete case.

So, in the discrete case, the convolution operation is written as follows. After converting both f (x) and h (x) into arrays of dimension M, we represent the new arrays by f e (x), that is the extended f (x), and h e (x), the extended version of h (x).

(Refer Slide Time: 40:53)

And now, in the discrete domain, the convolution can be written as g e (x) equal to the summation of f e (m) h e (x minus m), where m varies from 0 to capital M minus 1 and x assumes values from 0 to capital M minus 1. So, this is the discrete formulation of the convolution equation that we obtained in the continuous signal case.

Now, if you analyze this convolution expression, you will find that it can be written as a matrix operation. In matrix form, the equation will be g equal to some matrix H times f, where the array f is simply f e (0), f e (1), and so on up to f e (capital M minus 1), and the array g similarly is g e (0), g e (1), up to g e (capital M minus 1).

So, just recollect that f e and g e are the names given to the sampled versions of the functions f (x) and g (x) after extending them by appending additional 0's to make them of dimension capital M.

(Refer Slide Time: 43:09)

And, in this particular case, the matrix H will be of dimension capital M by capital M. The elements of H will be like this: the first row is h e (0), h e (minus 1), continuing like this up to h e (minus M plus 1); the second row is h e (1), h e (0), up to h e (minus capital M plus 2); and continuing like this, the last row is h e (capital M minus 1), h e (capital M minus 2), up to h e (0). So, this is the form of the matrix capital H, which is the degradation matrix in this particular case.

And here, you find that the elements of this degradation matrix capital H are actually generated from the degradation function h e (x). Now, remember that we have assumed that h e (x) is periodic with periodicity capital M. If this function is periodic with periodicity capital M, that means h e (x plus capital M) is the same as h e (x).

(Refer Slide Time: 45:07)

So, using this periodicity assumption, the degradation matrix H can be written in a different form. The first row will be h e (0), h e (capital M minus 1), h e (capital M minus 2), up to h e (1); the second row will be h e (1), h e (0), h e (capital M minus 1), ending with h e (2); the third row will be h e (2), h e (1), h e (0), ending with h e (3); and the last row, continuing in the same manner, will be h e (capital M minus 1), h e (capital M minus 2), h e (capital M minus 3), with the last term equal to h e (0).

Now, if you analyze this particular matrix, you will find that this degradation matrix capital H has a very interesting property: the different rows of this matrix are actually generated by rotation of the previous row to the right. If you look at the second row, you will find that it is generated by rotating the first row to the right by 1; similarly, the third row is generated by rotating the second row to the right by 1.

So, in this particular matrix, the different rows are generated by rotating the previous row to the right. This is called a circulant matrix because the rows are generated by circular rotation, and the circularity is complete in the sense that if I rotate the last row to the right, what I get is the first row of the matrix. So, this kind of matrix is known as a circulant matrix.

So, we find that the discrete formulation is also a convolution operation, and in the matrix equation of the degradation model, the degradation matrix H that we obtain is actually a circulant matrix. Now, let us extend this discrete formulation from 1 dimension to 2 dimensions.

(Refer Slide Time: 48:19)

So, let us see what we get in case of 2 dimensional functions, that is, 2 dimensional images. In 2 dimensions, we have the input image function f (x, y) and the degradation function h (x, y), and we assume that f (x, y) is sampled to an array of dimension capital A by capital B and h (x, y) is sampled to an array of dimension capital C by capital D.

Now, as we did in the 1 dimensional case, where the functions f (x) and h (x) were extended by appending additional 0's to make both of them of the same size capital M; in the same manner, here we add additional 0's to both f (x, y) and h (x, y) to get the extended functions f e (x, y) and h e (x, y), both of dimension capital M by capital N. We also assume that f e (x, y) and h e (x, y) are periodic, with period capital M in the x dimension and period capital N in the y dimension.

Now, following a similar procedure, we can obtain a convolution expression in 2 dimensions, which is given by g e (x, y) equal to the double summation of f e (m, n) h e (x minus m, y minus n), where n varies from 0 to capital N minus 1 and m varies from 0 to capital M minus 1.

(Refer Slide Time: 50:53)

And, writing this convolution expression in matrix form and incorporating the noise term eta (x, y), I get a matrix equation of the form g equal to Hf plus n, where the vector f is of dimension capital M into N and is obtained by concatenating the rows of the 2 dimensional function f (x, y); that is, the first N elements of the vector f are the elements of the first row of f (x, y).

Similarly, the vector n is obtained by concatenating the rows of eta (x, y), and the degradation matrix H in this case will be of dimension M into N by M into N. This matrix H has a very interesting form: its first row of blocks is H 0 , H M minus 1 , and so on up to H 1 ; the second row of blocks is H 1 , H 0 , up to H 2 ; and the last row of blocks is H M minus 1 , H M minus 2 , down to H 0 . Each of these terms H j is itself a matrix of dimension N by N, generated from the j'th row of the degradation function h e (x, y).

(Refer Slide Time: 52:54)

That is, we can write the matrix H j with first row h e (j, 0), h e (j, N minus 1), up to h e (j, 1); second row h e (j, 1), h e (j, 0), ending with h e (j, 2); and, continuing like this, the last row will be h e (j, N minus 1), h e (j, N minus 2), with the last element h e (j, 0). So, you find that this matrix H j , which is a component of the degradation matrix capital H, is a circulant matrix as we have defined earlier and, using these blocks, the degradation matrix H itself is also structured in the form of a circulant arrangement. So, this matrix H is what is known as a block circulant matrix.
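The same idea can be checked in 2 dimensions with a short sketch (my own illustration, with hypothetical toy arrays): the block circulant matrix built from h e (x, y), applied to the row-concatenated vector f, gives exactly the 2 dimensional circular convolution.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 4, 3
fe = rng.random((M, N))      # extended image f_e(x, y)
he = rng.random((M, N))      # extended degradation function h_e(x, y)

def circulant(row):
    """N x N circulant matrix H_j built from one row of h_e: element (a, b) = row[(a - b) mod N]."""
    return np.array([[row[(a - b) % N] for b in range(N)] for a in range(N)])

# Block circulant matrix: block (i, j) is H_{(i - j) mod M}, built from row (i - j) mod M of h_e.
H = np.block([[circulant(he[(i - j) % M]) for j in range(M)] for i in range(M)])

# g = H f, with f obtained by concatenating the rows of f_e, equals the 2-D
# circular convolution g_e(x, y) = sum_m sum_n f_e(m, n) h_e(x - m, y - n).
g_vector = (H @ fe.ravel()).reshape(M, N)
g_convolution = np.real(np.fft.ifft2(np.fft.fft2(he) * np.fft.fft2(fe)))
print(np.allclose(g_vector, g_convolution))   # True
```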

(Refer Slide Time: 54:30)

So, in case of a 2 dimensional function, that is, a digital image, we have seen that the degradation model can simply be represented by the expression g equal to H into f plus n, where the vector f is of dimension M into N and the degradation matrix H, of dimension M into N by M into N, is a block circulant matrix whose j'th block is obtained from the j'th row of the degradation function h e (x, y).

So, in our next lecture, we will see what will be the applications of this particular degradation
model to restore an image from its degraded version.

(Refer Slide Time: 55:25)

Now, let us see some questions on this particular lecture. The first question is: what is the difference between image enhancement and image restoration? Second: what is a linear position invariant system? Third: what is the homogeneity property? Fourth: what is a circulant matrix? What is a block circulant matrix? And why does the degradation matrix H become circulant?

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 23
Image Restoration - II
Hello, welcome to the video lecture series on digital image processing. In the last class, we started our discussion on image restoration. We said that there are certain cases where image restoration is necessary, in the sense that in many cases, while capturing or acquiring the image, some distortions appear in the image. For example, if you want to capture a moving object with a camera, then because of the relative movement between the object and the camera, it is possible that the captured image will be blurred, which is known as motion blurring.

There are many other situations; for example, if the camera is not properly focused, then also the image that you get is a distorted image. So, in such situations, what we have to go for is restoration of the image, or recovery of the original image from the distorted image. Regarding this, in the last class we talked about what an image restoration technique is.

(Refer Slide Time: 2:10)

In previous classes, we have talked about image filtering: if the image is contaminated with noise, we have various types of filters, both in the spatial domain and in the frequency domain, to remove that noise. We mentioned in our last class that this kind of noise removal is also a sort of restoration, because there also we are trying to recover the original image from a noisy image.

But conventionally, this kind of simple filtering is not known as restoration. What I mean by restoration is this: we know a degradation model by which the image has been degraded, and on that degraded image some noise has been added. Recovery or restoration of the original image from such a degraded image, using the acquired knowledge of the degradation function or the model by which the image has been degraded, is what is normally known as the restoration process. So, this is the basic difference between restoration and image filtering or image enhancement.

Then, we have seen an image formation process where the degradation is involved and we have
talked about the degradation model in continuous functions as well as its discrete formulation.

(Refer Slide Time: 3:50)

So, in today’s lecture, we will talk about the estimation of degradation models, and we will see that there are basically 3 different techniques for estimating the degradation model. One is simply by observation, that is, by looking at the degraded image we can estimate the degradation function that has degraded the original image. The second approach is through experimentation, where we estimate the degradation model using some experimental setup, and the third approach is by using mathematical modeling techniques.

Now, whichever way we estimate the degradation model, whether by observation, by experimentation or by using mathematical models; once we know the degradation model, we can go for restoration of the original image from the degraded image. So, we will talk about various such restoration techniques.

The first one that we will see is what is called inverse filtering, the second one is minimum mean square error or Wiener filtering, and the third approach is called the constrained least squares filtering approach.

(Refer Slide Time: 5:17)

Now, in our last class, we have seen a diagram like this. So, in this diagram, you see that we
have shown the degradation function. So here, we have an input image f (x, y) which is degraded
by a degradation function H as has been shown in this diagram. So, H is the degradation
function.

So, once we get the degraded image at the output of this degradation function H, a random noise eta (x, y) is added to it and finally we get the degraded image which we call g (x, y). It is this degraded image g (x, y) which is normally available to us, and from it, by using the knowledge of the degradation function H, we have to restore the original image. For that, we make use of restoration filters and, depending upon what kind of restoration filter we use, we have different types of restoration techniques.

Now, in our last class, based on this model, we said that the mathematical expression of this degradation operation can be written in one of these 3 forms. The first one is given by g (x, y) equal to h (x, y) convolved with f (x, y) plus eta (x, y), which is the random noise. Here, the original image f (x, y) and the degradation function h (x, y) are specified in the spatial domain.

So, in spatial domain, the original image f (x, y) is convolved with the degradation function h (x,
y) and then a random noise eta (x, y) is added to that to give you the observed image which in
this case, we are calling as g (x, y). So, this is the operation that has to be done in the spatial
domain and we have seen earlier that a convolution operation in spatial domain is equivalent to
performing multiplication of their corresponding Fourier transformations.

So, if for the spatial domain image f (x, y) the Fourier transform is capital F (u, v), and for the degradation function h (x, y) its Fourier transform is capital H (u, v); then if I multiply capital H (u, v) and capital F (u, v) in the frequency domain and take the inverse transform to obtain the corresponding function in the spatial domain, I will get the same result. That is, convolution in the spatial domain is equivalent to multiplication in the frequency domain, and by applying that convolution theorem we get the second mathematical expression of this degradation model, which is G (u, v) equal to H (u, v) into F (u, v) plus N (u, v), where N (u, v) is the Fourier transform of the random noise eta (x, y) and G (u, v) is the Fourier transform of the degraded image g (x, y).

So, we can either perform this operation in the frequency domain using the frequency coefficients, or perform the same operation directly in the spatial domain. In the last class, we also derived another mathematical expression for the same degradation operation, in the form of a matrix equation, shown here as g equal to H into f plus n, where g is a column vector of dimension M into N for an image of dimension M by N, and f is also a column vector of the same dimension M into N.

The degradation matrix H is of dimension M into N by M into N; there will be M into N rows and M into N columns. So, the dimension of this degradation matrix H is quite high if our input image is of dimension capital M by capital N. Similarly, n is the noise term, and all these terms together give you the degradation expression in the form of a matrix equation.

Now, direct solution using this matrix expression is not an easy task, so we will talk about restoration using the matrix expression a bit later. For the time being, we will talk about some simpler approaches which are a direct fallout of the mathematical expression given in the frequency domain.

Now, note one point: whether we do the operation in the frequency domain, in the spatial domain, or make use of the matrix equation for the restoration operation; in all of these cases, knowledge of the degradation function is essential, because that is what our restoration problem is. That is, we try to restore or recover the original image using acquired knowledge of the degradation function.

So, as we said earlier, estimation of the degradation function which has degraded the image is very essential, and we have 3 different approaches using which we can estimate the degradation function.

(Refer Slide Time: 12:00)

As we have said, there are 3 basic approaches. The first approach is by observation: we observe a given degraded image and, by observing it, we can have an estimate of the degradation function. The second approach is by experimentation: we build an experimental setup using which we can estimate the degradation function that has degraded the image. And the third approach is by mathematical modeling.

So, we can estimate the degradation function using one of these 3 approaches and, whichever degradation function or degradation model we get, using that we try to restore the original image from the observed degraded image. The method of restoring the original image from the degraded image using a degradation function obtained by one of these 3 methods is what is called blind deconvolution.

The reason it is called blind deconvolution is that the degradation model or degradation function obtained using one of these estimation techniques is just an approximation; it is not the actual degradation that has taken place to produce the degraded image. Because it is only an approximation, the process of inverting it to obtain the restored image is known as blind deconvolution. So, we will talk about these estimation approaches one by one.

(Refer Slide Time: 14:30)

The first one that we will talk about is estimation of the degradation function by observation. When we try to estimate a degradation function by observation, no prior knowledge of the degradation function is given; what we have is the degraded image g (x, y), and by looking at this degraded image we have to estimate the degradation function involved.

For doing this, we look at the degraded image and try to identify a region having some simple structure. So, in the complete degraded image, we identify a small region which contains some simple structure; for example, it may be an object boundary where a part of the object as well as a part of the background is present.

Now, after identifying such a region having simple structure, we try to construct an estimate of the original sub-image which, when degraded, would have produced this degraded sub-image. This constructed image should be of the same size and the same structure as the sub-image chosen from the degraded image, and its gray levels should be set by observing the gray levels in the different regions of the chosen degraded sub-image. Once I get this, it is my approximate reconstructed image, say f hat (x, y), and the chosen sub-image is my degraded image, say g hat (x, y).

Once I have this, then, because it is a sub-image, instead of calling it g hat let me call it g s (x, y); I take its Fourier transform to get capital G s (u, v). Similarly, the image that I have formed by observation, my estimate of what the actual sub-image should be, I call f s (x, y), and taking its Fourier transform I get the Fourier coefficients F s (u, v).

Now, our purpose is to have an estimate of the degradation function H s (u, v), which is estimated as G s (u, v) upon F s (u, v). You will notice that in writing this expression we have neglected the noise term. So, in order for this approach to be a logical one, the sub-image should be chosen in a region where the image content is very strong, so as to minimize the effect of the noise on this estimate of H s (u, v).

Now, this H s (u, v) has been estimated over a small sub-region of the degraded image, from our approximation of what the original sub-image should have been; so naturally, H s (u, v) is of smaller size. But for the restoration purpose, we need H (u, v) to be of size capital M by capital N if my original image is of size capital M by capital N. So, the next operation is to extend this H s (u, v) to a full H (u, v), so that it encompasses all the frequency components of that particular image.
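A minimal sketch of this procedure (my own illustration; the sub-image and the hand-built estimate would come from an actual degraded image) could look as follows.

```python
import numpy as np

def estimate_H_by_observation(g_sub, f_sub_estimate, eps=1e-8):
    """Estimate the degradation function over a small sub-image.

    g_sub          : the degraded sub-image g_s(x, y) cut out of the observed image
    f_sub_estimate : the hand-constructed estimate of the undegraded sub-image f_s(x, y)
    Returns H_s(u, v) = G_s(u, v) / F_s(u, v), neglecting the noise term.
    """
    Gs = np.fft.fft2(g_sub)
    Fs = np.fft.fft2(f_sub_estimate)
    return Gs / (Fs + eps)        # eps is an added safeguard against division by zero

# Hypothetical usage:
#   Hs = estimate_H_by_observation(g_sub, f_sub_estimate)
#   then Hs would be extended to the full M x N spectrum before restoration
```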

(Refer Slide Time: 19:21)

Now, let us just look at an example. Here we have shown a degraded sub-image which has been cut out from a bigger degraded image, and by observation we form an estimate of the original: if this is the degraded sub-image, then the original would have been something like the image shown alongside. While constructing this approximate original image, the intensity value in each region is kept similar to the intensity value in the corresponding region of the degraded sub-image.

So, the constructed image is my f s (x, y) and the degraded sub-image is my g s (x, y). From f s (x, y), by taking the Fourier transform, we compute capital F s (u, v), and from g s (x, y), by taking the Fourier transform, we compute G s (u, v).

By combining these 2, I can have an estimate of the degradation function, H s (u, v) equal to G s (u, v) upon F s (u, v). So, this is the method that we can use for estimation by observation. The next technique for estimating the degradation function is by experimentation.

(Refer Slide Time: 21:49)

So, what do we do in case of experimentation? Here, we try to get an imaging setup which is similar to the one with which the degraded image was obtained. The assumption is that if, using this similar imaging setup, I can estimate its degradation function, then the same degradation function also applies to the original one.

So, our purpose here is to find the point spread function, that is, the impulse response, of this imaging setup. As we said during our earlier discussion, it is the impulse response which fully characterizes any particular system: once the impulse response is known, the response of the system to any arbitrary input can be computed from it. Because this imaging setup is similar to the original, we assume that the same impulse response is also valid for the original imaging setup. So, the first operation we have to do is to simulate an impulse.

Now, how do we simulate an impulse? An impulse can be simulated by a very bright spot of light. Because our imaging setup is a camera, we let a bright spot of light, as small as possible, fall on the camera; if the spot is very small, it is equivalent to an impulse, and the image we get using this bright spot as input is the response to that impulse.

So, the image gives the impulse response to an impulse imparted in the form of a bright spot of light, and the intensity of the light tells you the strength of that impulse. From the simulated impulse and the image obtained, I get the impulse response, which uniquely characterizes our imaging setup, and we assume that this impulse response is also valid for the original imaging setup. Now, let us see how this impulse response looks.

(Refer Slide Time: 25:35)

So, that is what has been shown in this particular slide. The left most image is the simulated
impulse. Here you find that at the center, we have a bright spot of light. Of course, this spot is
shown in a magnified form, in reality this spot will be even smaller than this and on the right
hand side, the image that you have got, this is the image which is captured by the camera when
this impulse falls on this camera lens.

So, this is my impulse, simulated impulse and this is what is my impulse response. So, once I
have the impulse and this impulse response; then from this, I can find out what is the degradation
function of this imaging system. Now, we know from our earlier discussion that for a very very
narrow impulse, the Fourier transformation of an impulse is a constant.

(Refer Slide Time: 27:05)

That means, with f (x, y) the input image, in this particular case the impulse, its Fourier transform F (u, v) will be a constant, say a constant A. Our relation is that the spectrum of the observed image, G (u, v), is H (u, v) times F (u, v); and because F (u, v) is just this constant A, from here I straight away get the degradation function H (u, v), which is G (u, v) upon that same constant A.

So, G (u, v) is the Fourier transform of the observed image, that is, of the image obtained as the response to the simulated impulse falling on the camera; A is the Fourier transform of the impulse falling on the lens; and the ratio of these 2, G (u, v) by the constant A, gives us the degradation function of this particular imaging setup.

So, we have obtained the degradation function through an experimental setup: we have an imaging setup and a light source which can simulate an impulse. Using that impulse, we get an image which is the impulse response of this imaging system. We know that the Fourier transform of the impulse is a constant A, as shown here; we obtain the Fourier transform of the response, which is G (u, v); and G (u, v) divided by A gives the degradation function H (u, v) of this particular imaging setup.

So, I get the degradation function, and we assume that the same degradation function is also valid for the actual imaging system. One point should be kept in mind here: the intensity of the light which simulates the impulse should be very high so that the effect of noise is reduced.

If the intensity of the light is not very high, if the light is very feeble, then the noise component will be dominant, and whatever estimate of H (u, v) we get will not be a correct one; in fact, it will be very far from reality.
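A sketch of the corresponding computation (my own illustration; it assumes the captured impulse-response image and the impulse strength A are available from the experiment) is simply:

```python
import numpy as np

def estimate_H_by_impulse(impulse_response_image, impulse_strength_A):
    """Estimate H(u, v) from the experimental setup.

    impulse_response_image : image captured when the small bright spot (the
                             simulated impulse) falls on the camera
    impulse_strength_A     : the constant A, the Fourier transform of the impulse
    Returns H(u, v) = G(u, v) / A.
    """
    G = np.fft.fft2(impulse_response_image)
    return G / impulse_strength_A
```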

(Refer Slide Time: 30:10)

Now, the third approach of this estimation technique as we said that is estimation by
mathematical modeling. Now, this mathematical modeling approach for estimation of the
degradation function has been used for many many years. There are some strong reasons for
using this mathematical approach.

The first is that it provides an insight into the degradation process: once I have a mathematical model for the degradation, I can understand how the degradation arises. The second reason is that such a mathematical model can capture even the atmospheric disturbance which leads to degradation of the image. One such mathematical model, which can model the atmospheric turbulence that degrades the image, is given by the expression H (u, v) equal to e to the power minus k into (u square plus v square) to the power 5 by 6.

So, this is one of the mathematical models of degradation which is capable of modeling the atmospheric turbulence that leads to degradation in the observed image, and here the constant k characterizes the nature of the turbulence.

If the value of k is large, the turbulence is very strong, whereas if the value of k is very low, the turbulence is mild. So, by varying the value of k, we can control the intensity of the turbulence being modeled. Using this, we can generate a number of degraded images, as shown in this particular slide.
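Before turning to the examples on the slide, here is a small sketch of how this turbulence model can be generated and applied (my own illustration; it assumes the frequencies u, v are measured from the centre of the spectrum, which is the usual convention for this model).

```python
import numpy as np

def turbulence_H(M, N, k):
    """Atmospheric turbulence model H(u, v) = exp(-k (u^2 + v^2)^(5/6)),
    with u, v measured from the centre of the M x N spectrum (an assumption of this sketch)."""
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    U, V = np.meshgrid(u, v, indexing="ij")
    return np.exp(-k * (U**2 + V**2) ** (5.0 / 6.0))

def degrade_with_turbulence(f, k):
    """Degrade an image f by multiplying its spectrum with H(u, v)."""
    H = np.fft.ifftshift(turbulence_H(*f.shape, k))   # move the centre back to (0, 0)
    return np.real(np.fft.ifft2(H * np.fft.fft2(f)))

# A larger k models stronger turbulence and therefore a more blurred result.
```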

(Refer Slide Time: 32:50)

So here, on the top left we have the original image. Next to it is a degraded image where the value of k was about 0.00025; in the next one the value of k was about 0.001; and in the last one it was 0.005.

So, in the first degraded image the turbulence is mild; it has been degraded using the same model with a small value of k. In the next one the turbulence is medium, and in the last one the turbulence is strong. If you look closely at these images, you find that all 3 are degraded to some extent: in the last case the degradation is maximum, and in the first case it is minimum.

So, this is the model of degradation which occurs because of turbulence. Now, there are other mathematical degradation models which are obtained from fundamental principles; from basic principles also, we can derive what the degradation function should be.

(Refer Slide Time: 35:01)

So, as one such case, we will discuss estimation of the degradation model from basic principles: we try to find the degradation function when the image is degraded by linear motion. This is a very common situation: if we try to image a fast moving object, in many cases the image that we get is degraded. There is some sort of blurring, known as motion blurring, and it occurs due to the fact that whenever we take a snap of the scene, the shutter of the camera is open for a certain duration of time and, during the period in which the shutter is open, the object is not stationary but moving.

So, considering any particular point in the imaging plane, the light which arrives from the scene does not come from a single point; the light received at a particular point on the imaging sensor is the aggregation of the light reflected from various points in the scene. That tells us what the basic approach should be to estimate the degradation model in case of motion of the scene with respect to the camera.

That is what we are trying to estimate here. We assume that the image f (x, y) undergoes motion, and when f (x, y) undergoes motion there are moving components. So, I assume 2 components x 0 (t) and y 0 (t), which are the time varying displacement components along the x direction and y direction respectively.

So, once the object is moving, the total exposure at any point in the imaging plane can be obtained by an integration operation, where the integration is done over the period during which the shutter remains open. If I assume the shutter remains open for a time duration capital T, then the total exposure at point (x, y), that is, the observation g (x, y), will be of the form: the integral from 0 to capital T of f (x minus x 0 (t), y minus y 0 (t)) dt.

So here, capital T is the duration of time during which the shutter of the camera remains open, x 0 (t) and y 0 (t) are the time varying components along the x direction and y direction respectively, and g (x, y) is the observed blurred image. From this, we have to estimate the degradation function or the blurring function.

(Refer Slide Time: 39:02)

So, once we get g (x, y), our purpose is to obtain its Fourier transform; that means we are interested in the Fourier transform G (u, v) of g (x, y), which, as we know from the Fourier transform equations, is given by the double integral of g (x, y) into e to the power minus j 2 pi (ux plus vy) dx dy, taken from minus infinity to infinity over both x and y.

Using this Fourier transform expression, we can find G (u, v), the Fourier transform of the degraded image g (x, y). Substituting for g (x, y) the expression obtained earlier, G (u, v) becomes the double integral from minus infinity to infinity of [the integral from 0 to T of f (x minus x 0 (t), y minus y 0 (t)) dt] into e to the power minus j 2 pi (ux plus vy) dx dy. If we do some reorganization of this integral equation, we can write G (u, v) in the following form.

(Refer Slide Time: 41:13)

G (u, v) equal to the integral from 0 to capital T of, within brackets, the double integral from minus infinity to infinity of f [x minus x 0 (t), y minus y 0 (t)] e to the power minus j 2 pi (ux plus vy) dx dy, and then dt. So, this is the final expression, obtained by interchanging the order of integration.

(Refer Slide Time: 42:19)

Now, if you look at the inner part, it is nothing but the Fourier transform of a shifted f (x, y), where the shift in the x direction is by x 0 (t) and the shift in the y direction is by y 0 (t). From the translation property of the Fourier transform, we know that shifting a function does not change the magnitude of its Fourier transform; it only introduces a phase term.

So, f (x minus x 0 (t), y minus y 0 (t)) has a Fourier transform which is nothing but F (u, v) e to the power minus j 2 pi [ux 0 (t) plus vy 0 (t)]. This follows from the translation property of the Fourier transform.

(Refer Slide Time: 44:46)

So, using this expression, now the expression for G (u, v) can be written as: G (u, v) will be equal to the integral 0 to capital T of F (u, v) e to the power minus j 2 pi [ux 0 (t) plus vy 0 (t)] dt, and because this term F (u, v) is independent of t, you can take this term F (u, v) outside the integration. So, the final expression that we get is F (u, v) into the integral 0 to capital T of e to the power minus j 2 pi [ux 0 (t) plus vy 0 (t)] dt.

So from this, you find that now if I define my degradation function H (u, v) to be this particular
integration, so if I define H (u, v) to be this; then I get expression for G (u, v) is equal to H (u, v)
into F (u, v). So here, this motion term, the degradation function is given by integration of this
particular expression and in this expression, this x 0 (t) and y 0 (t), they are the motion variables
which are known.

So, if the motion variables are known; then using the values of those motion variables, I can find out what will be the degradation function and using that degradation function, I can go for the degradation model. So, I know H (u, v), I know G (u, v) and from these I can find out F (u, v), the Fourier transform of the restored image.

(Refer Slide Time: 47:15)

Now, in this particular case, if I assume that x 0 (t) is equal to at upon capital T and similarly y 0 (t) is equal to bt upon capital T; that means over a period of capital T, during which the camera shutter is open, in the x direction the movement is by an amount a and in the y direction the movement is by an amount b.

So, by assuming this and computing that integration, we can find that H (u, v) will be given by 1 upon pi (ua plus vb) into sin pi (ua plus vb) into e to the power minus j pi into (ua plus vb). So, this is the degradation function or the blurring function. So now, let us see, using this degradation function, what kind of degradation is actually obtained.
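As a quick illustration, a minimal numpy sketch of this blurring function follows. It uses the standard textbook form, which carries the exposure time T in the numerator (the expression above corresponds to T = 1), evaluates it on a centred frequency grid, and the function name and default values are my own choices rather than anything fixed in the lecture.

```python
import numpy as np

def motion_blur_H(M, N, a=0.1, b=0.1, T=1.0):
    """Frequency-domain degradation function for uniform linear motion:
    H(u, v) = T/(pi*(u*a + v*b)) * sin(pi*(u*a + v*b)) * exp(-j*pi*(u*a + v*b)),
    evaluated on a centred M x N frequency grid."""
    u = np.arange(M) - M // 2                   # centred frequency indices along x
    v = np.arange(N) - N // 2                   # centred frequency indices along y
    U, V = np.meshgrid(u, v, indexing="ij")
    s = U * a + V * b
    # np.sinc(x) = sin(pi*x)/(pi*x), which also handles s == 0 gracefully (H = T there)
    return T * np.sinc(s) * np.exp(-1j * np.pi * s)

# A degraded image could then be simulated as (hedged usage sketch):
#   G = np.fft.fftshift(np.fft.fft2(f)) * motion_blur_H(*f.shape)
#   g = np.real(np.fft.ifft2(np.fft.ifftshift(G)))
```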

(Refer Slide Time: 48:40)

So, here again on the left hand side, we have an original image and on the right hand side, the corresponding blurred image, where the blurring is introduced assuming uniform linear motion and for obtaining this particular blurring, here we have assumed a is equal to 0.1 and b is also equal to 0.1.

So, using these values of a and b, we have obtained a blurring function or degradation function and using this degradation function, we have obtained this type of degraded image. You find that this is quite a common scene; whenever you take the image of a very fast moving object, the kind of degradation that you obtain in the image is similar to this.

Now, the problem is that we have obtained a degradation function. Once I have obtained a degradation function, or an estimated degradation function, then given a blurred image, how do we restore or recover the original image? So, as we have mentioned, there are different types of filtering techniques for restoring the original image from a degraded image. The simplest kind of filtering technique is what is known as inverse filtering.

(Refer Slide Time: 50:20)

Now, the concept of inverse filtering is very simple. Our expression is that G (u, v) that is the
Fourier transform of the degraded image is given by H (u, v) into F (u, v) where H (u, v) is the
degradation function in the frequency domain and F (u, v) is the Fourier transform of the original
image, G (u, v) is the Fourier transform of the degraded image

Now, because this H (u, v) into F (u, v), this is a point by point multiplication. That is for every
value u and v, the corresponding F component and the corresponding H component will be
multiplied together to give you the final matrix which is again in the frequency domain.

Now, from this expression, it is quite obvious that I can have F (u, v) which is given by G (u, v)
upon H (u, v) where this H (u, v) is our degradation function in the frequency domain and G (u,
v), I can always compute by taking the Fourier transformation of the degraded image that is
obtained. So, if I divide the Fourier transformation of the degraded image by the degradation
function in frequency domain; what I get is the Fourier transformation of the original image and
as I said that when I compute this H (u, v), this is just an estimated H (u, v). It will never be
exact.

So, the recovered image that we get is not the actual image but an approximate original image, whose Fourier transform we represent by F hat (u, v). Now here, as we have already said, G (u, v), if I consider the noise term, is given by H (u, v) into F (u, v) plus the noise term N (u, v).

Now from here, if I compute the Fourier transform of the reconstructed image, that will be F hat (u, v) which is equal to G (u, v) upon H (u, v) and from this expression, this is nothing but F (u, v) plus N (u, v) upon H (u, v). So, this expression says that even if H (u, v) is known exactly, a perfect reconstruction may not be possible, because we have seen earlier that in most cases the values of H (u, v) are very very small when the values of u and v are very large.

So, that means for those cases, N (u, v) by H (u, v), this term will be very high; that means the reconstructed image will be dominated by noise and that is what is obtained practically also. So, to avoid this problem, what we have to do is, for reconstruction purposes, instead of considering the entire frequency plane, restrict our reconstruction to the frequency components in the frequency plane which are near the origin. So, if I do that kind of limited reconstruction, the dominance of noise can be avoided. Now, let us see what kind of result we can obtain using this inverse filtering; a small sketch of such a restricted inverse filter is also given below.
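A minimal numpy sketch of this restricted inverse filtering is given below; it assumes the degradation function H is supplied on a centred frequency grid of the same size as the degraded image, and the small epsilon, the default cut-off radius and the function name are my own choices.

```python
import numpy as np

def inverse_filter(g, H, radius=40, eps=1e-8):
    """Inverse filtering F_hat = G / H, keeping only frequency components within
    'radius' of the origin of the centred frequency plane to limit noise amplification."""
    G = np.fft.fftshift(np.fft.fft2(g))          # centred spectrum of the degraded image
    F_hat = G / (H + eps)                        # point-by-point division; eps avoids division by zero
    M, N = g.shape
    U, V = np.meshgrid(np.arange(M) - M // 2,
                       np.arange(N) - N // 2, indexing="ij")
    F_hat[np.sqrt(U ** 2 + V ** 2) > radius] = 0  # discard frequencies far from the origin
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))
```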

(Refer Slide Time: 54:46)

So, this shows an inverse filtering result. We have the original image; in the middle we have the degraded image, which we had already shown; and on the right hand side is the reconstructed image using inverse filtering when all the frequency coefficients are considered for reconstruction. As we have said, as you go away from the (0, 0) component in the frequency domain, that is, as you go away from the origin, the H (u, v) term becomes very negligible; so, it is the noise term which tends to dominate.

So, you find that in this reconstructed image, nothing is visible, whereas if we go for restricted reconstruction, that is, we consider only a few frequency components near the origin as has been shown here, where we have considered only those frequency terms within a radius of 10 from the origin; then this is the reconstructed image and, as is obvious, because the frequency components that we have considered are very limited, the reconstructed image becomes very blurred and that is the property of a low pass filter. This is nothing but a low pass filter and if the cut off frequency is very low, then the reconstructed image has to be very blurred.

In the middle of the bottom row, again we have shown the reconstructed image, but in this case we have increased the cut off frequency; instead of using 10, now we have used a cut off frequency equal to 40 and here, if you compare the original image with this reconstructed image, you find that the reconstruction is quite accurate. If I increase the cut off frequency further, as we said, it is the noise term which is going to dominate; so, on the right most, here we have increased the cut off frequency to 80.

So here, you find that we can observe the reconstructed image but as if the objects are behind a
curtain of noise. That means it is the noise term which is going to dominate as we increase the
cut off frequency of the filter. So, with this, we complete our today’s discussion. Now, let us
come to the questions on today’s lecture.

(Refer Slide Time: 57:17)

So, the first one is: what is the point spread function? The second one: how can you estimate the point spread function of an imaging system? Third question: which degradation function can model atmospheric turbulence? And the fourth question: what problem is faced in applying the inverse filtering method to restore an image degraded by uniform motion?

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 24
Image Restoration - III

Hello, welcome to the video lecture series on digital image processing. For last few classes, we
were discussing about restoration of blurred images.

(Refer Slide Time: 1:15)

So, what we have done in our last class is estimation of degradation models. We have seen that whatever restoration technique we use mainly makes use of the knowledge of the degradation model which degrades the image. So, estimation of the degradation model degrading an image is very important for the restoration operation.

So, in our last class, we have seen 3 methods for estimation of the degradation model. The first
method we discussed is the estimation of the degradation model by observation. The second
technique that we have discussed is the estimation by experimentation and the third technique
that we have discussed is the mathematical modeling of degradation. Then, we have also seen
what should be the corresponding restoration technique and in our last class, we have talked
about the inverse filtering technique and today we will talk about the other restoration technique
which also makes use of the estimated degradation model or the estimated degradation function.

(Refer Slide Time: 2:32)

So, in today’s lecture, we will see the restoration of a motion blurred image using the inverse filtering technique. In our last class, we have seen the inverse filtering technique where the image was degraded by the atmospheric turbulence model. We will also talk about the minimum mean square error or Wiener filtering approach for restoration of a degraded image.

We will also talk about another technique called the constrained least squares filter, where the constrained least squares filter mainly uses the mean and standard deviation of the noise which contaminates the degraded image, and then we will also talk about the restoration techniques where the noise present in the image is a periodic noise.

(Refer Slide Time: 3:33)

So firstly, let us quickly go through what we have done in our last class. We were talking about the estimation of the degradation model because that is a very basic requirement for the restoration operation. So, the first one that we have said is the estimation of the degradation model by observation. In this case, what is given to us is the degraded image and by looking at the degraded image, we have to estimate what the degradation function is.

So here, we have shown one such degraded image and we have said that once a degraded image
is given, we have to look for a part of the image which contains some simpler structure and at the
same time, the energy content, the signal energy content in that part in that sub image should be
very high to reduce the effect of the noise.

So, if you look at this particular degraded picture, you find that this rectangle, it shows an image
region in this degraded image which contains a simple structure. And from this, it appears that
there is a rectangular figure present in this part of the image and there are 2 distinct gray levels;
one is of the object which is towards the black and other one is the background which is a
grayish background.

(Refer Slide Time: 5:13)

So, by having a small sub image from this portion, what I do is I try to manually estimate that
what should be the corresponding original image. So, as shown in this slide; the top part is the
degraded image which is cut out from the image that we have just shown and the bottom part is
the estimated original image.

Now, what we do is: if you take the Fourier transform of the top one, what I get is G (u, v) which, as we said, is the Fourier transformation of the degraded sub image, and the lower one we are assuming to be the original. So, if I take the Fourier transform of this, what I get is F (u, v), that is, the Fourier transform of the original sub image, and obviously here the degradation function in the Fourier domain is given by H (u, v) which is equal to G (u, v) by F (u, v); you remember that in this case, the division operation has to be done point by point. So, this is how the degradation function can be estimated by observation when only the degraded images are available; a small sketch of this computation is given below.
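As a minimal illustration, the point-by-point division can be written as below in numpy. The extraction of the degraded sub image and the manual construction of the estimated original sub image are assumed to have been done already; the epsilon guard and the function name are my own.

```python
import numpy as np

def estimate_H_by_observation(g_patch, f_patch_estimate, eps=1e-8):
    """H_s(u, v) = G_s(u, v) / F_s(u, v), computed point by point over a sub image
    that contains a simple, high-energy structure of the degraded image."""
    G_s = np.fft.fft2(g_patch)            # Fourier transform of the degraded sub image
    F_s = np.fft.fft2(f_patch_estimate)   # Fourier transform of the estimated original sub image
    return G_s / (F_s + eps)
```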

(Refer Slide Time: 6:26)

The other approach for estimation of the degradation function, we have said, is estimation by experimentation. So, our requirement is that for whichever imaging device or imaging setup has been used to record the degraded image, we have a similar imaging setup available in our laboratory for experimentation purposes, and then we try to find out what is the impulse response of that imaging setup. As we have already discussed, it is the impulse response which completely characterizes any system.

So, if we know the impulse response of the system, we can always calculate the response of the system to any type of input signal. So, by experimentation, what we have done is we have taken a similar imaging setup and then simulated an impulse by using a very narrow, strong beam of light, as has been shown in this particular diagram.

So, on the left hand side, what is shown is one such simulated impulse and on the right hand side, what we have is the response to this impulse as recorded by the imaging device. So now, if I take the Fourier transform of the impulse, this is going to give me F (u, v), and if I take the Fourier transform of the response, this is going to give me G (u, v), and you see that because the input, the original, is an impulse, the Fourier transform of an impulse is a constant.

So, if I simply take the Fourier transform of the response, which is the impulse response or in this case the point spread function, then this divided by the corresponding constant will give me the degradation function H (u, v). So, this is how we estimate the degradation function or the degradation model of the imaging setup by experimentation.

(Refer Slide Time: 8:50)

The third approach for obtaining the degradation model is by mathematical modeling. So, in our last class, we have considered 2 such mathematical models. The first mathematical model that we have considered tries to model the degradation corresponding to atmospheric turbulence and the degradation function in the frequency domain is given like this: H (u, v) is equal to e to the power minus k (u square plus v square) to the power 5 by 6, and using this degradation model, we have shown how a degraded image will look; a short sketch of this function is given below.
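A minimal numpy sketch of this turbulence model on a centred frequency grid might look as follows; the default value of k is only a typical 'severe turbulence' setting quoted in the textbook literature, not a value taken from this lecture.

```python
import numpy as np

def turbulence_H(M, N, k=0.0025):
    """Atmospheric turbulence degradation function
    H(u, v) = exp(-k * (u^2 + v^2)^(5/6)) on a centred M x N frequency grid."""
    U, V = np.meshgrid(np.arange(M) - M // 2,
                       np.arange(N) - N // 2, indexing="ij")
    return np.exp(-k * (U ** 2 + V ** 2) ** (5.0 / 6.0))
```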

(Refer Slide Time: 9:31)

So, in this particular case, we have shown 4 images. Here, the top left image is the original one and the other 3 are the degraded images which have been obtained by using the degradation model that we have just said. Now, in that degradation model, e to the power minus k (u square plus v square) to the power 5 by 6, it is the constant k which tells you the degree of the degradation or the intensity of the disturbance. So, a low value of k indicates the disturbance is very low; similarly, a higher value of k indicates that the disturbance is very high.

So here, this image has been obtained with a very low value of k, this image has been obtained
with a very high value of k and this image has been obtained with a medium value of k. So, you
find that depending upon the value of k, how the degradation of the original image changes.

(Refer Slide Time: 10:46)

The second mathematical model that we have considered is the motion blur, or the blurring which is introduced due to motion. And we have said that because the camera shutter is kept open for a finite duration of time, the intensity which is obtained at any point on the imaging sensor is not really coming from a single point of the scene; the intensity at any point on the sensor is actually the integration of the light intensities which fall on that particular point from different points of the moving object, and this integration has to be taken over the duration during which the camera shutter remains open.

So, using that concept, what we have got is the mathematical model for motion blur which we derived in our last class; it is given by H (u, v) equal to the integration zero to T of e to the power minus j 2 pi [ux 0 (t) plus vy 0 (t)] dt, where x 0 (t) indicates the movement along the x direction and y 0 (t) indicates the movement along the y direction.

So, if I assume that x 0 (t) is equal to at upon capital T and y 0 (t) is equal to bt upon capital T, then this particular integral can be computed and we get a final degradation function or degradation model as given in the lower equation.

(Refer Slide Time: 12:28)

And after this, in our last class, we used inverse filtering. Using this motion blurring model, this is the kind of blurring that we obtained: this is the original image and on the right hand side, we have shown the motion blurred image. Then we have seen that once you have the model for the blurring operation, you can employ inverse filtering to restore a blurred image.

(Refer Slide Time: 13:00)

So, in our last class, we have used inverse filtering to restore the images which are blurred by atmospheric turbulence and here again, on the left hand side of the top row, we have shown the original image. So, this is the original image, this is the degraded image, and we have said in the last class that in inverse filtering, the blurring function H (u, v) comes in the denominator.

So, if the value of H (u, v) is very low, then the term G (u, v) upon H (u, v) becomes very high. So, if I consider all the frequency components, that is, all values of (u, v) in H (u, v), for inverse filtering, the result may not always be good. And that is what has been demonstrated here: this image was reconstructed considering all values of u and v and you find that this fully reconstructed image does not contain the information that we want.

So, what we have to do is along with this inverse filtering, we have to employ some sort of low
pass filtering operation so that the higher frequency components or the higher values of u and v
will not be considered for the reconstruction purpose. So here on the bottom row, the left most
image, this shows the reconstructed image where we have considered only the values of u and v
which are within a radius of 10 from the center of the frequency plane and because this is a low
pass filtering operation where the cut off frequency was very very low; so, the reconstructed
image is again very very blurred because many of the low frequency components along with the
high frequency components have also been cut out.

The middle image shows, to some extent, a very good result, where the distance within which the (u, v) values were considered for the reconstruction was taken to be 40. So, here you find that this reconstructed image contains most of the information which was contained in the original image. So, this is a fairly good restoration.

Now, if I increase the distance function, the value of the distance; if I go to 80, that means many
of the high frequency components also we are going to incorporate while restoration. And, you
find that the right most image on the bottom row that is this one where the value of the distance
was equal to 80, this is also a reconstructed image but it appears that the image is behind a
curtain of noise.

So, this clearly indicates that if I go on increasing or if I take more and more u and v values, the
frequency components for restoration using inverse filtering; then the restored image quality is
going to be degraded. It is likely to be dominated by the noise components present in the image.

Now, though this inverse filtering operation works fine for the blurring introduced by the atmospheric turbulence, direct inverse filtering does not give a good result in case of motion blurring.

(Refer Slide Time: 16:28)

So, you find that here we have shown the result of direct inverse filtering in case of motion blur. The top most one is the original image; on the left, we have shown the degraded image; and the right most one, on the bottom, is the restored image obtained using direct inverse filtering, where the blurring considered in this particular case is the motion blur. Now, let us see why this direct inverse filtering does not give a satisfactory result in case of motion blur.

(Refer Slide Time: 17:15)

The reason is, if you look at the motion degradation function, say for example in this particular case, you will find that the degradation function H (u, v) is given by this expression in the frequency domain. Now, this term will be equal to 0 whenever the component (ua plus vb) is an integer.

So, for any integer value of (ua plus vb), the corresponding component H (u, v) will be equal to 0 and for nearly integer values of (ua plus vb), the term H (u, v) is going to be very very low. So, for direct inverse filtering, when we go for dividing G (u, v) by H (u, v) to get the Fourier transformation of the reconstructed image, wherever H (u, v) is very low, near about 0, the corresponding F (u, v) term will be abnormally high and when you take the inverse Fourier transform, that very high value is reflected in the reconstructed image, and that is what gives a reconstructed image as shown in this form.

So, what is the way out? Can’t we use the inverse filtering for restoration of motion blurred
image?

(Refer Slide Time: 18:42)

So, we have attempted a roundabout approach. What we have done is: again we have taken an impulse and tried to find out what will be the point spread function if I employ this kind of motion blur. So, by using the same motion blur function or motion blur model, you blur this impulse and what I get is an impulse response like this, which is the point spread function in this particular case.

Now, once I have this point spread function, then as before, what I do is I take the Fourier transform of this, and this Fourier transformation now gives me G (u, v); and because my input was an impulse, for this impulse F (u, v) is equal to a constant, say something like A. Now, from these two, I can recompute H (u, v), which is given by G (u, v) divided by this constant term A. Obviously, the value of the constant is the same as the intensity of this impulse.

So, if I take a unit impulse, then the value of the constant A will be equal to 1, and in that case, the Fourier transform of the point spread function directly gives me the degradation function H (u, v).
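In numpy, this recomputation is essentially a single Fourier transform; the sketch below assumes the recorded point spread function has already been padded to the full image size, and the function name is my own.

```python
import numpy as np

def H_from_psf(psf, A=1.0):
    """Recompute the degradation function from the recorded point spread function.
    A simulated impulse of strength A has the constant Fourier transform A, so
    H(u, v) is simply the Fourier transform of the PSF divided by A
    (A = 1 for a unit impulse)."""
    return np.fft.fft2(psf) / A
```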

(Refer Slide Time: 20:17)

Now, if I perform the inverse filtering using this recomputed degradation function, we find that the reconstruction result is very good. So, this was the blurred image, and this is the reconstructed image obtained by direct inverse filtering, but here the degradation model was recomputed from the Fourier transformation of the point spread function.

So, though direct inverse filtering using the mathematical model of the motion blur does not give a good result, recomputation of the degradation function gives a satisfactory result. But again, with this inverse filtering approach, the major problem is, as we said, that we have to consider the (u, v) values for reconstruction within a limited domain.

Now, how do you say up to what extent of (u, v) values we should go? That is again image dependent. So, it is not very easy to decide to what extent of frequency components we should consider for the reconstruction of the original image if I go for direct inverse filtering. So, there is another approach, which is the minimum mean square error approach; it is also called the Wiener filtering approach. The Wiener filter tries to reconstruct the degraded image by minimizing an error function. So, it is something like this.

(Refer Slide Time: 21:56)

So, if my original image is f and my reconstructed image is f hat, then the Wiener filtering tries to minimize the error function which is given by the expectation value of f minus f hat square. So, the error value e is given by the expectation value of f minus f hat square, where f is the original undegraded image and f hat is the restored image recovered from the degraded image; so, f minus f hat square gives you the squared error and this Wiener filtering tries to minimize the expectation value of this error.

Now here, our assumption is that the image intensity and the noise intensity are uncorrelated and
using that particular assumption, this Wiener filtering works.

(Refer Slide Time: 23:31)

So here, we will not go into the mathematical details of the derivation, but it can be shown that the frequency domain solution, that is, the solution for which this error function is minimum, is given by F hat (u, v) equal to [H star (u, v) into S f (u, v) divided by (S f (u, v) into H (u, v) square plus S eta (u, v))] into G (u, v), where H star indicates the complex conjugate of H (u, v), G (u, v), as before, is the Fourier transform of the degraded image and F hat (u, v) is the Fourier transform of the reconstructed image. In this particular case, the term S f (u, v) is the power spectrum of the original undegraded image and S eta (u, v) is the noise power spectrum.

(Refer Slide Time: 25:28)

Now, if I simplify this particular expression, I get an expression of this form: F hat (u, v) is equal to [1 upon H (u, v)] into [H (u, v) square upon (H (u, v) square plus S eta (u, v) upon S f (u, v))] into G (u, v). So, this is the expression for the Fourier transform of the reconstructed image when I use Wiener filtering.
I use Wiener filtering.

Now, in this case, you might notice that if the image does not contain any noise, then obviously S eta (u, v), which is the power spectrum of the noise, will be equal to 0 and in that case, this Wiener filter becomes identical with the inverse filter. But if the degraded image also contains additive noise in addition to the blurring, in that case, the Wiener filter and the inverse filter are different.

Now here, you find that this Wiener filter considers the ratio of the power spectrum of the noise to the power spectrum of the original undegraded image. Now, even if I assume that the additive noise contained in the degraded image is a white noise, for which the noise power spectrum will be constant, it is not possible to find out the power spectrum of the original undegraded image. So, for that purpose, what is normally done is that this ratio S eta (u, v) upon S f (u, v), that is, the ratio of the power spectrum of the noise to the power spectrum of the original undegraded image, is taken to be a constant k.

So, if I do this, in that case, the expression for F hat (u, v) comes out to be [1 upon H (u, v)] into [H (u, v) square upon (H (u, v) square plus k)] into G (u, v), where the term k is a constant which has to be adjusted manually for the optimum reconstruction, or for the reconstructed image which appears to be visually best. So, using this expression, let us see what kind of reconstructed image we get; a small sketch of this filter is given below.
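A minimal numpy sketch of this constant-k Wiener filter is shown below. It uses the algebraically equivalent form H star (u, v) divided by (H (u, v) square plus K), assumes H is laid out in the same (unshifted) order as numpy's fft2 output, and the default K is just an arbitrary starting value to be tuned by hand.

```python
import numpy as np

def wiener_filter(g, H, K=0.01):
    """Wiener restoration with the noise-to-signal power ratio replaced by a constant K:
    F_hat = [1/H] * [|H|^2 / (|H|^2 + K)] * G, written here as conj(H)/(|H|^2 + K) * G."""
    G = np.fft.fft2(g)
    H2 = np.abs(H) ** 2
    F_hat = (np.conj(H) / (H2 + K)) * G
    return np.real(np.fft.ifft2(F_hat))
```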

(Refer Slide Time: 29:03)

So here, we have shown the restoration of the degraded image using the Wiener filter. Again, the left hand side of the top row is the original image and the right hand side of the top row gives you the degraded image. The left most image on the bottom row shows you the reconstructed image using inverse filtering where all the frequency components were considered.

The middle one shows the reconstructed image using inverse filtering where only the frequency components within a distance of 40 from the center of the frequency plane have been considered for reconstruction, and the right most one is the one obtained using Wiener filtering; for obtaining this particular reconstructed image, the value of k was manually adjusted for the best appearance.

Now, if you compare these 2, that is, the inverse filtered image with the distance equal to 40 and the Wiener filtered image, you will find that the reconstructed images are more or less the same, but if you look very closely, it may be found that the Wiener filtered image is slightly better than the inverse filtered image. However, visually they appear to be almost the same.

The advantage in case of the Wiener filter is that I do not have to decide what extent of frequency components I have to consider for restoration of the undegraded image. But still, the Wiener filter has got a disadvantage, that is, the manual adjustment of the value of k; and as we have said, the value of k has been used for simplification of the expression, where this constant k is nothing but the ratio of the power spectrum of the noise to the power spectrum of the undegraded original image, and in all cases, taking this ratio to be a constant may not be a justified approach.

(Refer Slide Time: 31:38)

So, we have another kind of filtering operation, which is called constrained least squares filtering. Now, in case of Wiener filtering, the performance depends upon the correct estimation of the value of k, that is, upon how correctly you can estimate the power spectrum of the original undegraded image.

Unlike the Wiener filter, this constrained least squares filter does not make any assumption about the original undegraded image. It makes use of only the noise probability density function, the noise pdf, and mainly it uses the mean of the noise, which we will write as m eta, and the variance of the noise, which we will write as sigma eta square. So, we will see how the reconstruction using this constrained least squares filter approach makes use of these noise parameters, the mean of the noise and the variance of the noise.

To obtain this constrained least squares filter, we will start with the expression that we got in our first class, that is, g equal to Hf plus n. So, you remember that this is an expression which we had derived in the first class which tells us the degradation model for degrading the image, where H is the matrix which is derived from the impulse response h (x, y) and n is the noise vector.

Now here, you will notice that this solution is very sensitive to noise. So, to take care of that, what we do is we define an optimality criterion and the reconstruction has to be done using that criterion; because this degradation matrix H is noise dependent and the solution is very sensitive to noise, the optimality criterion that we will use for reconstruction is the image smoothness.

So, you know from our earlier discussion that the second derivative operation, or the Laplacian operator, tries to enhance the irregularities or discontinuities in the image. So, if we can minimize the Laplacian of the reconstructed image, that will ensure that the reconstructed image will be smooth. So, our optimality criterion in this particular case is given by C equal to the double summation of [del square f (x, y)] square, where y varies from 0 to capital N minus 1 and x varies from 0 to capital M minus 1.

So, our assumption is that the image we are trying to reconstruct, or the blurred image that we have obtained, is of size capital M by capital N. So, our optimality criterion is given by this, where del square f (x, y) is nothing but the Laplacian operation. So, this optimality criterion is Laplacian operator based and our approach will be to minimize this criterion subject to the constraint that g minus H f hat square should be equal to n square, where this f hat is the reconstructed image.

So, we will try to minimize this optimality criterion subject to the constraint that g minus H f hat square is equal to n square, and that is why it is called constrained least squares filtering. Again, without going into the details of the mathematical derivation, we will simply give the frequency domain solution of this particular constrained least squares estimation, where the frequency domain solution is given by F hat (u, v) equal to [H star (u, v) upon (H (u, v) square plus a constant gamma times P (u, v) square)] times G (u, v).

(Refer Slide Time: 36:57)

Again as before, this H star indicates that it is the complex conjugate of H. Here again, we have a constant term gamma, where gamma is to be adjusted so that the specified constraint, that is, g minus H f hat square equal to n square, is met.

So here, this gamma is a scalar constant whose value is to be adjusted so that this particular constraint is maintained, and the quantity P (u, v) is the Fourier spectrum or the Fourier transform of the mask given by (0, minus 1, 0), (minus 1, 4, minus 1), (0, minus 1, 0). So, this is my P (x, y) and this P (u, v) is nothing but the Fourier transformation of this P (x, y), and you can easily identify that this is nothing but the Laplacian operator mask, or the Laplacian mask, that we have already discussed in our earlier discussion.

Now here, for implementation, you have to keep in mind that our image is of size capital M by capital N. So, before we compute the Fourier transformation of P (x, y), which is given in the form of a 3 by 3 mask, we have to pad it with an appropriate number of zeros so that this P (x, y) also becomes an array of dimension capital M by capital N; only after converting it to an array of dimension capital M by capital N can we compute P (u, v), and that P (u, v) has to be used in this particular expression; a small sketch of this construction is given below.
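The numpy sketch below puts these pieces together: the 3 by 3 Laplacian mask is zero padded to the M by N image size, its Fourier transform P (u, v) is computed, and the frequency domain solution above is applied. The default gamma and the placement of the mask in the corner of the padded array are my own illustrative choices.

```python
import numpy as np

def cls_filter(g, H, gamma=0.01):
    """Constrained least squares restoration:
    F_hat = conj(H) / (|H|^2 + gamma * |P|^2) * G, where P is the Fourier transform
    of the zero padded Laplacian mask."""
    M, N = g.shape
    p = np.zeros((M, N))
    p[:3, :3] = np.array([[ 0, -1,  0],
                          [-1,  4, -1],
                          [ 0, -1,  0]])        # 3x3 Laplacian mask, padded with zeros to M x N
    P = np.fft.fft2(p)
    G = np.fft.fft2(g)
    F_hat = (np.conj(H) / (np.abs(H) ** 2 + gamma * np.abs(P) ** 2)) * G
    return np.real(np.fft.ifft2(F_hat))
```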

So, as we said, this gamma has to be adjusted manually for obtaining the optimum result, and the purpose is that the gamma is adjusted so that the specified constraint is maintained. However, it is also possible to automatically estimate the value of gamma by an iterative approach.

(Refer Slide Time: 40:27)

So, for that iterative approach, what we do is we define a residual vector r, where this residual vector is nothing but r equal to g minus H f hat. So, you remember that this g is obtained from the degraded image, the degradation matrix H is obtained from the degradation function, and f hat is the estimated restored image.

Now here, since we have seen earlier that f hat (u, v), and so this f hat in the spatial domain, is a function of gamma; obviously r, which is a function of f hat, will also be a function of gamma. Now, if I define a function phi of gamma which is nothing but r transpose r, that is, the Euclidean norm of r, it can be shown that this function is a monotonically increasing function of gamma.

That means whenever gamma increases, this Euclidean norm of r also increases; if gamma
decreases, the Euclidean norm of r also decreases. And by making use of this property, it is
possible to find out what is the optimum value of gamma within some specified accuracy. So,
our approach in this case, our aim is that we want to estimate the value of gamma such that the
Euclidean norm of r that is r square will be equal to n square plus minus some constant A where
this A is nothing but what is the specified accuracy factor or this gives you the tolerance of
reconstruction.

Now obviously, here you find that if r square is equal to n square, then the specified constraint is exactly met. However, meeting it exactly is very difficult, so you specify some tolerance by giving the accuracy factor a, and we want the value of gamma to be such that the Euclidean norm of r will be within this range. Now, given this background, an iterative algorithm for estimation of the value of gamma can be put like this.

(Refer Slide Time: 43:53)

So, you select an initial value of gamma, then compute phi (gamma), which is nothing but the Euclidean norm of r. Then you terminate the algorithm if r square is equal to eta square plus minus a. If this is not the case, then you proceed to step number 4, where you increase the value of gamma if r square is less than eta square minus a, or you decrease the value of gamma if r square is greater than eta square plus a.

Now, using whatever new value of gamma you get, you recompute the image; for that, the image reconstruction function in the frequency domain is given by the expression stated above, and with this recomputed value of F hat, you go back to step number 2 and you repeat this iteration until the termination condition, that is, r square equal to eta square plus minus a, is met; a small sketch of this loop is given below.
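A compact sketch of this iteration in numpy might look as follows; it reuses the cls_filter function sketched above, and the multiplicative update step and the iteration limit are assumptions of mine rather than values given in the lecture.

```python
import numpy as np

def estimate_gamma(g, H, eta_norm_sq, a, gamma=0.01, step=2.0, max_iter=50):
    """Adjust gamma iteratively until ||r||^2 = ||g - H f_hat||^2 lies within
    eta_norm_sq +/- a, exploiting the fact that ||r||^2 increases monotonically with gamma."""
    f_hat = None
    for _ in range(max_iter):
        f_hat = cls_filter(g, H, gamma)                          # from the earlier sketch
        r = g - np.real(np.fft.ifft2(H * np.fft.fft2(f_hat)))    # residual r = g - H f_hat
        phi = np.sum(r ** 2)                                     # phi(gamma) = ||r||^2
        if eta_norm_sq - a <= phi <= eta_norm_sq + a:
            break                                                # constraint satisfied
        gamma = gamma * step if phi < eta_norm_sq - a else gamma / step
    return gamma, f_hat
```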

(Refer Slide Time: 45:24)

Now, using this kind of approach, we have obtained some reconstructed images. So here, you find that it is the same original image and this is the degraded version of that image. On the bottom row, the left hand side gives you the reconstructed image using the Wiener filtering and, again on the bottom row, on the right hand side, this gives you the reconstructed image which is obtained using the constrained least squares filter.

(Refer Slide Time: 45:53)

The same is shown for the motion degraded image; here, we have also considered some additive noise. So again, on the top row on the left, this is the image which is obtained by direct inverse filtering and you find the prominence of noise in this particular case. The right one is the one which has been obtained by Wiener filtering; here also you find that the amount of noise has been reduced, but still the image is noisy. And the bottom one is the one that has been obtained by using the constrained least squares filtering.

Now, if you look at these 3 images, you will find that the amount of noise is greatly reduced in the bottom one, that is, the restored image which has been obtained by the constrained least squares filtering approach. As we said, the constrained least squares filtering approach makes use of the estimates of the mean of the noise and the standard deviation of the noise, so it is quite expected that the noise performance of the constrained least squares filter will be quite satisfactory, and that is what is observed here: in the image obtained using the constrained least squares filter, the noise has been removed to a great extent, whereas the other reconstructed images cannot remove the noise component to that extent.

However, if you look at this reconstructed image, the reconstruction quality is not that good. So, that clearly says that the optimum reconstructed image that you get using the optimality criterion may not always be visually the best. So, to obtain the visually best image, the best approach is to manually adjust that particular constant gamma.

(Refer Slide Time: 48:10)

Now, as I said, this constrained least squares filtering approach makes use of the noise parameters, that is, the mean of the noise and the variance of the noise; now how can we estimate the mean and variance of the noise from the degraded image itself? It is possible: if you look at a more or less uniform intensity region in the image, that is, if you take a sub image of the degraded image where the intensity is more or less uniform and take the histogram of that, then the nature of the histogram is the same as the probability density function - the pdf - of the noise which contaminates that image.

(Refer Slide Time: 49:02)

So, we can obtain the noise estimate, that is, we can compute the noise term eta square in our expression which is used for this constrained least squares filtering, in this way. We have the noise variance, sigma eta square, which is nothing but 1 upon capital M into capital N into the double summation of [eta (x, y) minus m eta] square, where m eta is the mean of the noise, y varies from 0 to capital N minus 1 and x varies from 0 to capital M minus 1.

And, the noise mean m eta is given by the expression 1 upon capital M into capital N into the double summation of eta (x, y), where again y varies from 0 to capital N minus 1 and x varies from 0 to capital M minus 1.

Now, from this you find that this particular term is nothing but our eta square. So, by making use of this and making use of the mean of the noise, we get that eta square, this noise term, is nothing but capital M into capital N, where M and N are the dimensions of the image, into (sigma eta square plus m eta square). As in the constraint that we have specified, it is the eta square which is used in the constraint term, and it depends only upon sigma eta and m eta; so, this clearly says that this optimum reconstruction is possible if I have the information of the noise variance and the noise mean.

Now, the estimation of the noise variance and the noise mean is very important. If I have only the degraded image, what I will do is I will look at some uniform gray level region within the degraded image and find out the histogram of that particular region; the nature of the histogram is the same as the probability density function of the noise; a small sketch of this estimation is given below.
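A minimal numpy sketch of this estimation from a flat patch is given below. It assumes, as the lecture does, that the noise is zero mean (the patch mean cannot in any case be separated from the true gray level), and it also returns the noise term M into N into (sigma eta square plus m eta square) used in the constraint.

```python
import numpy as np

def noise_parameters(flat_patch):
    """Estimate the noise statistics from a sub image of (nearly) uniform intensity,
    whose histogram has the same shape as the noise pdf."""
    M, N = flat_patch.shape
    m_eta = 0.0                                  # zero-mean noise assumption
    sigma_eta_sq = flat_patch.var()              # variation in a flat patch is attributed to noise
    eta_norm_sq = M * N * (sigma_eta_sq + m_eta ** 2)
    return m_eta, sigma_eta_sq, eta_norm_sq
```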

(Refer Slide Time: 52:12)

So, as has been shown here in this particular diagram, the marked region is taken from one such noisy image, and this is the histogram of that particular region; the histogram tells you the pdf of the noise which contaminates this image.

So, once I have this probability density function, from this I can compute the variance sigma eta square and I can also compute the mean m eta, and in most cases the noise is assumed to be zero mean, so this m eta is equal to 0. So, what is important for us is only this sigma eta square and using this sigma eta square, we can go for the optimum reconstruction of the degraded image.

Now, in many cases, it is also possible that the image is contaminated with periodic noise. So, how do we remove the periodic noise present in the image? You will find that if you take the Fourier transformation of such an image and display it, then, because the noise is periodic, at the corresponding (u, v) locations in the Fourier transformation plane you will get very bright dots, and those dots indicate the frequencies of the periodic noise present in the image.

Then, we can go for a very simple approach: once I know those frequency components, I can go for band reject filtering just to remove that part of the coefficients from the Fourier transform, and with whatever remaining Fourier coefficients we have, if we go for the inverse Fourier transformation of that, we will get the reconstructed image.

(Refer Slide Time: 54:31)

So, as has been shown here, you find that we are taking the same image which we have used a number of times earlier, and if you look closely at the right most image, you will find that it is contaminated with periodic noise. If I take the Fourier transform of this, you find that in the Fourier transform there are a few bright dots at several locations.

So, all these bright dots tell us what the frequencies of the periodic noise which contaminates the image are. So, once I have this information, I can go for an appropriate band reject filtering to filter out that region, that part of the Fourier coefficients, from the Fourier transform.

(Refer Slide Time: 55:37)

So, that is what has been shown next. This is a band reject filter: this is the perspective plot of an ideal band reject filter, and here the band reject filters are shown superimposed on the frequency plane. So, on the left, what I have is an ideal band reject filter and on the right, what I have is the corresponding Butterworth band reject filter.

So, by using this band reject filter, we are removing a band of frequencies from the Fourier
coefficients corresponding to the frequency of the noise. So, after removal of these frequency
components, if I go for inverse Fourier transform, then I am going to get back my reconstructed
image and that is what we get in this particular case.
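As an illustration, one standard textbook form of the Butterworth band reject transfer function can be sketched in numpy as below. The band centre D0 and band width W would in practice be read off from the bright dots in the Fourier spectrum of the noisy image; the defaults in the usage comment are placeholders of mine.

```python
import numpy as np

def butterworth_band_reject(shape, D0, W, n=2):
    """Butterworth band reject filter of order n on a centred frequency grid:
    H(u, v) = 1 / (1 + [D*W / (D^2 - D0^2)]^(2n)), where D is the distance from the origin."""
    M, N = shape
    U, V = np.meshgrid(np.arange(M) - M // 2,
                       np.arange(N) - N // 2, indexing="ij")
    D = np.sqrt(U ** 2 + V ** 2)
    return 1.0 / (1.0 + ((D * W) / (D ** 2 - D0 ** 2 + 1e-8)) ** (2 * n))

# Hedged usage sketch: multiply the centred spectrum of the noisy image by the filter
# and take the inverse Fourier transform (D0 = 32, W = 4 are placeholder values):
#   G = np.fft.fftshift(np.fft.fft2(g))
#   g_clean = np.real(np.fft.ifft2(np.fft.ifftshift(G * butterworth_band_reject(g.shape, 32, 4))))
```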

(Refer Slide Time: 56:35)

So here, you find that on the top left, this is again the original image and on the top right, it is the noisy image contaminated with periodic noise. If I use the ideal band reject filter and then reconstruct, this is the image I get on the bottom left, and if I use the Butterworth band reject filter, then the reconstructed image that I get is on the bottom right.

So, we have talked about the restoration of images using various operations, and the last one that we have discussed is that, if we have an image contaminated with periodic noise, then we can work in the frequency domain, employ a band reject filter to remove those frequency components and then take the inverse Fourier transform to reconstruct the image.

And, here you find that the qualities of the reconstructed images are quite good where we have
used this band reject filter in the frequency domain. So, with this, we complete our discussion on
image restoration. Now, let us have some questions on today’s lecture.

(Refer Slide Time: 57:50)

So, the first question is: what is the advantage of the Wiener filter over the inverse filter? The second question: what is the drawback of the Wiener filter? Third one: under what condition do the Wiener filter and the inverse filter become identical? Fourth one: what is the difference between the Wiener filter and the constrained least squares error filter? And the last question: how can you estimate the noise parameters from a given noisy image or from a given blurred image?

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 25
Image Registration
Hello, welcome to the video lecture series on digital image processing. For our last few classes,
we have talked about different types of image restoration techniques. In today’s lecture, we are
going to talk about another topic which is called image registration.

(Refer Slide Time: 1:19)

So, in our last few lectures, we have talked about the restoration of degraded images. We have seen different techniques for estimation of the degradation model: we have discussed estimation of the degradation model by observation, estimation of the degradation model by experimentation, and the mathematical modeling of degradation. And once you have the degradation model, we have talked about the restoration techniques for restoring a degraded image.

So, among the different restoration techniques, we have talked about the inverse filtering technique, we have talked about the minimum mean square error or Wiener filtering technique, we have talked about the constrained least squares filtering approach and we have also talked about the restoration of an image if the image is contaminated by periodic noise.
So in such cases, we have seen that if we take the Fourier transform of the degraded image, and the image is actually degraded by a periodic noise or a combination of periodic noises, then those noise components, the noise frequencies, appear as very bright spots, very bright dots, in the Fourier transformation or in the frequency plane.

So there, we can apply a band reject filter or sometimes a notch filter to remove those particular frequency components of the Fourier transform and, after performing the band reject operation, whatever is remaining, if we take the inverse Fourier transformation of that, then we get the restored image which is free from the periodic noise. So, in today's lecture, we will talk about the image registration techniques.

(Refer Slide Time: 3:20)

So, we will see what image registration is. Then, when we go for image registration, we have to think of mismatch or match measures; so, we will talk about the different mismatch or match measures, or similarity measures. We will see what the cross correlation between 2 images is and whether this cross correlation can be used as a similarity measure when we go for image registration, and then we will talk about some applications of these image registration techniques with examples.
(Refer Slide Time: 3:58)

So, by image registration what we mean is that registration is a process which makes the pixels in 2 images precisely coincide with the same points in the scene. So, if we have 2 or more images of the same scene, may be acquired with different sensors located at different positions, or may be acquired using the same sensor but at different instants of time; in such cases, if we can find out, for every pixel in one image, the corresponding pixel in the other image or images, that is the process of registration or the process of matching.

And, this has various applications that we will talk about a bit later. So, once registered, the images can be combined or fused; this is called fusion or combination. So, once we have the images from different sensors, may be located at different locations, or may be, if it is a remote sensing image taken through a satellite, where we have images taken in different bands of frequencies; then if we register all those images, those images can be combined or fused so that the fused image becomes richer in information content.

So, once registered, the images can be combined or fused in a way that improves the information extraction process, because the fused image now has a combination of the information from different images or from different bands of frequencies; it will have more information or it will be richer in information content.
(Refer Slide Time: 6:01)

So, this image registration technique has many applications. The first application is the stereo imaging technique: in stereo imaging, what we do is take the images of the same scene or the same object by 2 cameras which are slightly displaced, and there we have assumed that the cameras are otherwise identical; that is, apart from the displacement along a particular axis, say the y axis or the x axis, the features of the cameras are identical, that is, they have the same focal length, same view angle and so on.

So, once I acquire these 2 images, one of which we call the left image and the other the right image; then if I go for point by point correspondence, that is, for a particular point in the left image, if I can find out what is the corresponding point in the right image, then from these 2, I can find out what is the disparity for that particular point location, and if this disparity is obtained for all the points in the image, then from the disparity we can find out what is the depth or the distance of the different object points from the camera. So, in case of stereo imaging, we have to find out this point correspondence or point registration; this is also called point matching.

The other application, as we have just said, is in remote sensing, where the images may be taken by different sensors working in different bands, and the sensors may even be located at different locations. So, the images of the same scene are taken by different sensors working in different bands and at different geometric locations; there also, if we go for image registration, that is, point by point correspondence among the different images, then we can fuse or combine those different images so that the fused image becomes richer in terms of information content. So, the information extraction from such fused images becomes much easier.

In another application, the images may be taken at different times. So, if the images are taken at different times and we can register those images, that is, for a particular point in a given image we can find out the corresponding point in the other image which is taken at some other time instant; then by this registration, we can find out the variation at different point locations in the scene and from these variations, we can extract much information like, say, vegetation growth, or may be land erosion, deforestation or the occurrence of a fire. So, all these different pieces of information can be obtained when we go for registration of images which are taken at different instants of time.

There are other applications like finding a place in a picture where it matches our given pattern.
So here, we have a small image which is known as a pattern or a template and we want to find
out that in another image which is usually of bigger size, where does this template match the
best.

Now, this has various applications like automated navigation, where we want to find out the location of a particular object with respect to a map. So, in such automated navigation applications, this pattern matching or template matching is very important. So, this image registration technique, or these image registration methods, have various other applications which can be exploited once the registration techniques are known.

(Refer Slide Time: 10:16)

Now, to explain the registration techniques, let us take the first example, that of template matching. For this template matching, we take a template of a smaller size; we call this template f, where f is a 2 dimensional image of a smaller size, and we have an image g which is of a bigger size.

So, the problem of template matching is to find out where this template f matches best in the given image g. This f is called a template, or it is also called a pattern. So, our aim is that, given a very big 2 dimensional image g and a template f which is usually of a size smaller than that of g, we want to find out where this template f matches best in the image g. To find out where the template f matches best in the given image g, we have to have some measure of matching, which may be termed a mismatch measure, or the opposite of this, that is, a match measure or a similarity measure. So, we have to take different match or similarity measures to find out where this template f matches best in the given image g.

(Refer Slide Time: 12:21)

So, there are various such match or mismatch measures, and let us see what different measures can be used. We have the given image g and we have the template f. We have to find out the measure of similarity between a region of the image g and the template f. There are various ways in which this similarity can be measured so that we can find out the match between f and g over a certain region, say given by A.

So, one of the similarity measures, the simplest one, is to take the difference of f and g: we take the absolute difference between f and g and find the maximum of this absolute difference, where the maximum is computed over the region A. Another measure is that we again take the absolute difference of f and g and then integrate this absolute difference over the same region A.

Another measure is to take f minus g, the difference between f and g, square it and integrate this over the same region A. You find that in the first case, when it is the difference between f and g, we are talking about the pixel by pixel difference. So, it takes the difference between f and g, takes the absolute value, and the maximum of that is computed over the given region A.

In the second case, it is f minus g, again the absolute value, and this is integrated over the given region A. This is the analog case. If I convert this into digital form, it takes the form of f(i, j) minus g(i, j), where (i, j) is the pixel location; we take the absolute value of this and take the double summation for all i and j in the given region A.

So, you find that this is nothing but what is called the sum of absolute differences between the image g and the template f over the region A. If I convert the third expression into digital form, it becomes f(i, j) minus g(i, j), squared, with the double summation taken over all (i, j) belonging to the region A. So, this is nothing but the sum of squared differences.

So, if I say that f(i, j) minus g(i, j) is the difference between the 2 images, or the error, then this last expression, f minus g squared with a double integration over A, which in the digital domain becomes f(i, j) minus g(i, j) squared summed over all (i, j) in the region A, is equivalent to the sum of squared errors. Now, out of these 3 different measures, it is the last one, the integration of f minus g squared, which is very interesting.
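
As an illustration (an addition for clarity, not part of the lecture), here is a minimal NumPy sketch that evaluates the three measures just described for a template f and an equal-sized image patch g; the array values are hypothetical.

```python
import numpy as np

# Hypothetical template f and an equal-sized patch g taken from the image;
# the region A is simply the full 3 by 3 extent of the template here.
f = np.array([[3., 3., 2.],
              [3., 3., 2.],
              [2., 2., 2.]])
g = np.array([[3., 2., 2.],
              [3., 3., 2.],
              [2., 2., 3.]])

diff = np.abs(f - g)
max_abs_diff = diff.max()            # max over A of |f - g|
sad          = diff.sum()            # sum of absolute differences over A
ssd          = ((f - g) ** 2).sum()  # sum of squared differences (square error)
print(max_abs_diff, sad, ssd)
```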

(Refer Slide Time: 16:58)

So, if I expand this term, f minus g squared with double integration over the region A, then this becomes the double integration of f squared plus the double integration of g squared minus 2 times the double integration of f into g, where all the double integrations are taken over the given region A. Now, you find that for a given template, the double integration of f squared is fixed, and for a given image over the region A, the double integration of g squared is also fixed.

Now, what is this integration of f minus g squared? This is nothing but the sum of the squared differences, which means it gives the degree of mismatch; it is nothing but the mismatch measure. So, wherever this integration of f minus g squared over the region A is minimum, at that particular location f matches the best over that particular region of g.
Now, when I expand this, it becomes the double integration of f squared plus the double integration of g squared minus twice the double integration of f into g, and as I said, for a given template the f squared term is fixed and for a given image and a given region the g squared term is also fixed. That means the integration of f minus g squared will be minimum when the double integration of f into g is maximum.

So, whenever the mismatch measure is minimum, the double integration of f into g over the region A will be maximum. Therefore, when the double integration of f minus g squared is taken as the measure of mismatch, we can take the double integration of f into g over the given region A to be the match measure or the similarity measure.

So, this means that wherever the given template matches best in a particular portion of the given image g, the double integration of f into g over the region A will have its maximum value, and we take this as the similarity measure or the match measure.

(Refer Slide Time: 20:29)

Now, the same conclusion can also be drawn from what is called the Cauchy-Schwarz inequality. This inequality says that the double integration of f into g is less than or equal to the square root of the double integration of f squared times the double integration of g squared, and these 2 sides will be equal only when g is equal to some constant c times f.

So, the Cauchy-Schwarz inequality says that the double integration of f into g will be less than or equal to the square root of the double integration of f squared times the double integration of g squared, and the left hand side and the right hand side will be equal whenever g is equal to c times f. Otherwise, the left hand side will always be less than the right hand side.

So, this also says that whenever f, the template, is similar to a region of the given image g to within a multiplicative constant c, this integration of f into g will take on its maximum value; otherwise, it will be less. If I convert this into the digital case, then the same expression can be written in the form that the double summation of f(i, j) into g(i, j), where i and j belong to the given region A, should be less than or equal to the square root of the double summation of f squared (i, j) times the double summation of g squared (i, j).

Here, both double summations are taken over i and j belonging to the region A, and the left hand side and the right hand side will be equal only when g(i, j) is equal to some constant times f(i, j), and this has to be true for all values of (i, j) within the given region. So, for this template matching problem, we have assumed that f is the given template, g is the given image, and also that the size of f is less than the size of the given image g.
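
A quick numerical check of this discrete Cauchy-Schwarz inequality can be sketched as follows; the arrays are made up for illustration, and the two sides become equal exactly when g is a constant multiple of f.

```python
import numpy as np

def sides(f, g):
    """Left and right hand sides of the discrete Cauchy-Schwarz inequality."""
    lhs = np.sum(f * g)
    rhs = np.sqrt(np.sum(f ** 2) * np.sum(g ** 2))
    return lhs, rhs

f = np.array([[1., 2.], [3., 4.]])
print(sides(f, np.array([[4., 1.], [0., 2.]])))   # lhs strictly less than rhs
print(sides(f, 2.5 * f))                          # g = c*f: lhs equals rhs
```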

(Refer Slide Time: 24:18)

Now again, from the Cauchy-Schwarz inequality, what we get is that the double integration of f(x, y) into g(x plus u, y plus v) dx dy should be less than or equal to the square root of the double integration of f squared (x, y) dx dy times the double integration of g squared (x plus u, y plus v) dx dy.

Now, the reason we are introducing these 2 variables u and v is that whenever we try to match the given pattern f against the given image g, we have to find out the match measure or similarity measure at different locations of g. For this, f has to be shifted to all possible locations in the given image g, and to express the amount of shift given to the pattern f when computing the similarity measure at a particular location, we introduce these 2 shift components u and v. So, the shift along the x direction is u and the shift along the y direction is v.

So here, in this expression, the similarity between the given template f and the image g with shift (u, v) is computed, and it is computed over the given region A. Now, because this f(x, y) is small and the value of f(x, y) is 0 outside the region A, we can replace the left hand side by an integration of the form f(x, y) into g(x plus u, y plus v) dx dy taken from minus infinity to infinity; since f(x, y) is 0 outside the region A, the definite integral over A can be replaced by an integral from minus infinity to infinity. So, this is what we get from the left hand side of the expression.

Now, if you look at this particular expression, that is f(x, y) into g(x plus u, y plus v) dx dy with the double integral taken from minus infinity to infinity, this is nothing but the cross correlation between f and g. Then, if you look at the right hand side, the integral of f squared (x, y) dx dy over A is a constant for a given template, whereas the component g squared (x plus u, y plus v) dx dy is not a constant because its value depends upon the shift u and v.

So, though from the left hand side we have got that this is equivalent to the cross correlation between the functions f and g, this cross correlation cannot be used directly as a similarity measure because the right hand side is not fixed. Though the integral of f squared (x, y) dx dy is fixed, the integral of g squared (x plus u, y plus v) dx dy is not fixed; it depends upon the shift u and v. Because of this, the cross correlation measure cannot be directly used as a similarity measure or a match measure.

(Refer Slide Time: 29:53)

So, what we have to go for is what is called the normalized cross correlation. If we call this cross correlation measure, f(x, y) g(x plus u, y plus v) dx dy integrated from minus infinity to infinity, the cross correlation C fg, then the normalized cross correlation will be given by C fg divided by the square root of the double integral of g squared (x plus u, y plus v) dx dy over the region A.
(Refer Slide Time: 31:01)

So, as we said in the previous slide, the normalized cross correlation is C fg divided by the square root of the double integral of g squared (x plus u, y plus v) dx dy, where the integration is taken over the region A; this is what we are calling the normalized cross correlation.

So, once we consider this normalized cross correlation, you find that it will take its maximum possible value, which is given by the square root of the double integral of f squared (x, y) dx dy taken over the region A (and because this is fixed, the region of integration is not very important), for the particular values of u and v for which the function g becomes some constant c times f.

So, for that particular shift (u, v) where g is equal to some constant c times f, this normalized cross correlation will take its maximum value, and the maximum value of the normalized cross correlation is given by the square root of the double integral of f squared (x, y) dx dy. Now, to illustrate this, let us take an example.
(Refer Slide Time: 33:21)

Say here, this is the image g, which is given in the form of a 2 dimensional array of size 6 by 6, and our template is a 3 by 3 matrix which is given on the right hand side. Now, to find the match location, I have to take the template, shift it to all possible locations in the given image g, and find out the cross correlation or the similarity measure for each particular shift.

So initially, let us put this template at the left most corner. Because our template is 3, 3, 2 - 3, 3, 2 - 2, 2, 2, let us place the center of the template at the location at which we want to find out the similarity measure. Because of this, this 3 will be placed over here, this 2 will go here, this 2 will come here, this 2 will come here, and on the left hand side, the other part of the template will be like this: 2, 3, 3, 3, 2.

So, this will be the position of the template, and at this location we have to find out the similarity value. For that, let us find out C fg, the cross correlation between f and g, for this particular shift. Now, if you compute it here, you find that several elements of the template go beyond the image. So, if I assume that the image values are 0 beyond the boundary of the image, then those elements will not take part in the computation of the cross correlation.

The cross correlation will be computed considering only these 4 elements, and if you compute it, you will find that C fg for this particular shift attains a value of 47, because here it is 40 plus 2 plus 2 plus 3. So, this assumes a value of 47.
Similarly, if I want to find out the cross correlation at this particular location, where the center of the template is placed over here and the other components of the template come like this, you will find that the cross correlation at this location is given by C fg equal to 56.
So, like this, for all possible shifts within this particular image, I have to find out the cross correlation value, and if I complete this cross correlation computation, you will find that I finally get a cross correlation matrix which is like this.

(Refer Slide Time: 36:54)

So, this gives the complete cross correlation matrix when the template is shifted to all possible locations in the given image and the cross correlation value is computed for all such possible shifts. Now, from this you find that the maximum cross correlation value, 107, comes at this particular location, and if I take the cross correlation value to be the similarity measure, then this is where I get the maximum similarity. This gives a false match: it appears that the template matches best at the location shown by this red rectangle. But that is not the case because, checking it visually, we can see that the template actually matches best at this other location. That is why we say that the cross correlation measure cannot be directly used as a similarity measure.
(Refer Slide Time: 38:11)

So, as we have said that we cannot use the cross correlation measure directly as a similarity measure, what we have to do is compute the normalized cross correlation value. For the computation of the normalized cross correlation, we have to compute the component g squared (x plus u, y plus v), take the double summation over the region A and then the square root of that. This is the quantity we have to compute for all possible shifts in the given image g, and we have to normalize the cross correlation with the help of this quantity.

So, let us again take the same location. If I compute this summation of g squared (x plus u, y plus v) over the region A and take its square root, you will find that this value comes out to be about 20.07, because it is nothing but 20 squared plus 1 squared plus 1 squared plus 1 squared (the other elements within this 3 by 3 window are 0 for this shift), and then the square root of this. So, if I compute this component, I get the value 20.07.

So, if I compute this quantity, the square root of the summation of g squared (x plus u, y plus v) over the region A, for all possible u and v, that is, for all possible shifts, then the normalization component that I get is given by this particular matrix.
(Refer Slide Time: 40:12)

So, these normalization coefficients have been computed for all possible values of u and v, and then what we have to do is normalize the cross correlation with these normalization factors.

(Refer Slide Time: 40:34)

So, if I do that normalization, this is my original cross correlation coefficient matrix that I computed earlier, and after the normalization, what I get is the normalized cross correlation coefficient matrix, which comes out like this. Now, you can see the difference: in the original cross correlation matrix, the maximum, 107, was at this location, whereas in the normalized cross correlation matrix, the maximum, 7.48, comes at this other location.
(Refer Slide Time: 41:18)

So now, let us see what happens when we use this normalized cross correlation as the similarity measure. This is the same normalized cross correlation matrix; the maximum, 7.48, comes at this location and, for this maximum, the corresponding matched location of the template within the image is shown over here by this red rectangle.

So now, you find that this is a perfect match: the similarity measure peaks exactly where the template matches. That is possible with the normalized cross correlation but not with the simple cross correlation. So, the simple cross correlation cannot be used as the similarity measure, but we can use the normalized cross correlation as a similarity measure.
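
The whole procedure can be sketched in a few lines of NumPy. This is an illustrative addition: the 6 by 6 image below is a made-up example in the spirit of the slide, not the exact matrix used in the lecture, but it shows the same effect, namely that the raw cross correlation peak lands on a bright outlier while the normalized cross correlation peak lands on the true match.

```python
import numpy as np

def cross_corr_maps(g, f):
    """Raw and normalized cross correlation of a template f over an image g.

    The centre of f is placed at every pixel of g; values of g outside the
    image boundary are taken to be 0, as assumed in the lecture example."""
    fh, fw = f.shape
    ph, pw = fh // 2, fw // 2
    gp = np.pad(g, ((ph, ph), (pw, pw)), mode='constant')
    cc = np.zeros(g.shape)    # raw cross correlation C_fg
    ncc = np.zeros(g.shape)   # normalized cross correlation
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            win = gp[i:i + fh, j:j + fw]
            c = np.sum(f * win)
            n = np.sqrt(np.sum(win ** 2))
            cc[i, j] = c
            ncc[i, j] = c / n if n > 0 else 0.0
    return cc, ncc

# Hypothetical 6 by 6 image: the template is pasted in the lower-left corner
# and one isolated bright pixel elsewhere fools the raw cross correlation.
f = np.array([[3, 3, 2],
              [3, 3, 2],
              [2, 2, 2]], dtype=float)
g = np.zeros((6, 6))
g[3:6, 0:3] = f            # true match location
g[1, 4] = 40.0             # bright outlier pixel

cc, ncc = cross_corr_maps(g, f)
print("raw cross correlation peak at       ", np.unravel_index(cc.argmax(), cc.shape))
print("normalized cross correlation peak at", np.unravel_index(ncc.argmax(), ncc.shape))
print("maximum normalized value:", ncc.max())   # sqrt of the sum of f squared
```

In this sketch, the maximum normalized value equals the square root of the sum of f squared over the template, exactly as derived above for the case g equal to a constant times f.
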
(Refer Slide Time: 42:16)

Now, coming to the application of this, here we have shown the application on a real image. This is an aerial image taken from a satellite, and the smaller image on the left hand side is a part cut out from this original image, which we are using as a template. So, this is our template f, which we want to match against this image g; we want to find out where in the given image g this template f matches the best.

So, for doing that, as we have said earlier, we have to shift this template f to all possible locations in the given image g and, for all such locations, find out the normalized cross correlation; wherever the normalized cross correlation is maximum, that is the location where the template matches the best.

So, let us place this template at different locations in the image and find out the corresponding similarity measure we get. This is the template and this is the given image. If I place the template over here, calling this location 1, shown by this red rectangle, the similarity measure we get is 431.05. If I place it at location 2, which is over here, the similarity measure or normalized cross correlation we get is 462.17.

If I place it at location 3; then at location 3, the similarity measure is 436.94. If I place the
template at location 4, the corresponding similarity measure is coming out to be 635.69. If I
place it at location 5, the corresponding similarity measure is coming out to be 417.1. If I place it
at this location 6, the corresponding similarity measure comes out to be 511.18.

So, you see from these similarity measure values, and of course I cannot show all possible locations on the screen, that if I compute this for all possible locations, the similarity measure comes out to be maximum, 635.69, at location 4 in the given image, which is exactly the location from where this particular template f was cut out. If you look at the picture after placing the template there, you will find that there is almost a perfect match.

So, for a given image and a given template, if I find the normalized cross correlation for all possible shifts (u, v), then the shift (u, v) at which the normalized cross correlation is maximum gives the location where the template matches the best. Obviously, this is a registration problem where we want to find out where a given template matches best in a given image.

(Refer Slide Time: 46:09)

Now, coming to the other applications of registration, you will find that registration is also applicable to the image restoration problem. Earlier, we talked about the image restoration problem, where you have to estimate the degradation model which degrades the image and, by making use of the degradation model, restore the image by different types of operations like inverse filtering, Wiener filtering, constrained least squares estimation and other such techniques.

Now here, we are talking about a kind of degradation which takes the form of a geometric distortion, a distortion introduced by the optical system of the camera. If you take an image of a very large area, you might have noticed that as you move away from the center of the image, the points appear to come closer to each other. That is something which leads to a distortion in the image as points move away from the center of the image. Here, we have shown one such distortion.
(Refer Slide Time: 47:22)

So suppose this is the figure of which we want to take the image, but the image actually comes out in this distorted form; then this distortion which is introduced in the image can be corrected by applying the image registration technique. For doing this, what we have to do is register different points in the expected, undistorted image with the corresponding points in the distorted image.

So, once I can do that kind of registration; in this case, this is a point which corresponds to this particular point. If somehow we can establish the correspondence between this point and this point, between this point and this point, between this point and this point, and between this point and this point, then it is possible to estimate the degradation model.

So here, you find that for estimating this degradation, we have to go for registration. This registration is therefore very important for restoring a degraded image where the degradation is introduced by the camera optics. The kind of restoration that can be applied here is something like this.
(Refer Slide Time: 48:58)

Say I have an original point (x, y) in the original image, and this point after distortion is mapped to a location (x prime, y prime); then what we can do is estimate a polynomial degradation model, in the sense that x prime is a polynomial function of x and y.

So, I write it in this form: x prime is equal to some constant k1 times x plus a constant k2 times y plus a constant k3 times xy plus a constant k4. Similarly, y prime can be written as some constant k5 times x plus k6 times y plus k7 times xy plus k8. From this, you find that if we can estimate these constant coefficients k1 to k8, then for any given point in the original image, we can estimate the corresponding point in the degraded image. For computing these constant coefficients k1 to k8, because there are 8 such unknowns, I have to have 8 equations, and those equations can be obtained from 4 pairs of corresponding points in the 2 images.
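
As a sketch of how these coefficients can be computed (an addition, with made-up correspondence pairs), the 8 equations split into two 4 by 4 linear systems, one for k1 to k4 and one for k5 to k8:

```python
import numpy as np

# Four hypothetical correspondence pairs: (x, y) in the undistorted image and
# the matching (x', y') observed in the distorted image (made-up values).
src = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
dst = np.array([[2.0, 3.0], [98.0, 5.0], [4.0, 95.0], [94.0, 92.0]])

# Each pair gives one equation per coordinate:
#   x' = k1*x + k2*y + k3*x*y + k4
#   y' = k5*x + k6*y + k7*x*y + k8
A = np.column_stack([src[:, 0], src[:, 1], src[:, 0] * src[:, 1], np.ones(4)])

k1_to_k4 = np.linalg.solve(A, dst[:, 0])   # coefficients for x'
k5_to_k8 = np.linalg.solve(A, dst[:, 1])   # coefficients for y'

def distort(x, y):
    """Map a point (x, y) of the undistorted image into the distorted image."""
    basis = np.array([x, y, x * y, 1.0])
    return basis @ k1_to_k4, basis @ k5_to_k8

print(distort(50.0, 50.0))   # where the centre of the undistorted image lands
```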
(Refer Slide Time: 50:58)

So, if I look at this figure, I have to get 4 such correspondence pairs. Once I have four such correspondence pairs, I can generate 8 equations and, using those 8 equations, solve for all the constant coefficients k1 to k8. Once I get them, what I can do is take a particular point in the estimated undistorted image, apply the distortion to it to find the corresponding point in the distorted image, and whatever intensity value is at that location in the distorted image, I simply copy it to my estimated location in the undistorted image. In this way, I can obtain a restored image from the distorted image.

Obviously, while doing this, we will find that there are some locations where I do not get any information directly. That is, for a particular location (p, q) in the estimated undistorted image, when I apply the distortion, the mapped point in the distorted image does not fall exactly on a pixel location. In such cases, we have to go for interpolation techniques to estimate the intensity value at that point in the distorted image, and for that, the different interpolation operations that we have discussed earlier can be used.
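
For completeness, here is a minimal sketch of bilinear interpolation, one of the interpolation techniques that can be used here; the small distorted image and the sample position below are hypothetical.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinearly interpolated intensity of img at the real-valued position
    (x, y), where x indexes columns and y indexes rows (both within bounds)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    a, b = x - x0, y - y0
    return ((1 - a) * (1 - b) * img[y0, x0] + a * (1 - b) * img[y0, x1] +
            (1 - a) * b * img[y1, x0] + a * b * img[y1, x1])

# Hypothetical 3 by 3 distorted image and a non-integer position that a
# backward-mapped point (p, q) of the restored image might land on.
distorted = np.array([[10., 20., 30.],
                      [40., 50., 60.],
                      [70., 80., 90.]])
print(bilinear(distorted, 0.5, 1.25))   # intensity to copy into the restored image
```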

So this is how this image registration technique is also playing a major role in restoration of
distorted images.
(Refer Slide Time: 52:35)

This image registration technique is also very useful, as I said, in image fusion, that is, combining different images, and for that too we have to go for image registration. For example, here we have shown 2 types of images: magnetic resonance (MR) images and CT scan images. Now, the MR image gives you a measure of water content, whereas in the CT X-ray image the bone regions appear as the brightest regions.

So, if we can combine the magnetic resonance image with the CT X-ray image, the fused image that we get contains both kinds of information, the water content as well as the nature of the bone, in the same image. Naturally, information extraction is much easier from the fused image.

Now again, for doing this, the first operation has to be image registration, because the MR images and the CT images, even if they are of the same region, may not be properly aligned, may not be properly scaled, or there may be some distortion between the 2 images.

So, the first operation we have to do is registration; using registration, we determine the transformation that has to be applied to properly align the 2 images, and only after applying that transformation and aligning the 2 images can they be fused properly.

So here, there are 2 MR images and 2 CT images, and you can see that this MR image and this CT image, though they are of the same region, are not properly aligned; similarly, on the bottom row, this MR image and this CT image are not properly aligned.
(Refer Slide Time: 54:40)

So, the first operation we have to do is alignment. In this slide, on the left hand side, what we have is the result after alignment, and on the right hand side, the result after fusion. In this fused image, you find that the green regions show the bone structure, which has been obtained from the CT image, while the other regions contain the information from the MR image. So, this is much more convenient to interpret than looking at the MR image and the CT image separately.

(Refer Slide Time: 55:15)

The other application of this is image mosaicing. Normally, cameras have a very narrow field of view, so using a camera with a narrow field of view, you cannot image a very large area. What can be done is to take smaller images of different regions in the scene and then stitch those smaller images together to give a large field of view image.

So, that is what has been shown here: on the top, there are 2 smaller images, and these 2 images are combined to give the bigger image shown at the bottom. This is the problem which is called image mosaicing, and here again, because there are 2 images, they may be scaled differently and their orientations may be different. So, first we have to go for normalization and alignment, and for this normalization and alignment, again the first operation has to be image registration.

(Refer Slide Time: 56:14)

This shows another mosaicing example, where the bottom image has been obtained from the top 8 images. All these 8 images have been combined properly to give the bottom image, which is the mosaic image that we get. With this, we complete our discussion on image registration. Now, let us have some questions on today's lecture.
(Refer Slide Time: 56:45)

So, the first question is: what is meant by image registration? Second question: define the cross correlation between 2 functions f(x, y) and g(x, y). Third question: can cross correlation be used as a similarity measure? Fourth question: what is normalized cross correlation? Fifth: find the cross correlation of the following 2 one-dimensional functions, f equal to 1, 2, 3, 2, 1 and g equal to 1, 2, 3, 6, 9, 6, 3, 4, 2, 2, 1. Here, these 2 one-dimensional functions f and g are represented as sequences of samples; they are discrete functions, and you have to find the cross correlation of these 2 discrete functions.

Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 26
Colour Image Processing
Hello, welcome to the video lecture series on digital image processing. In our last lecture, we talked about the image registration problem.

(Refer Slide Time: 1:06)

So, we talked about different image match and mismatch measures, we talked about the cross correlation between two images and we also saw some applications of registration techniques. Now, including this image registration technique, whatever we have done till now in our digital image processing course has been mainly based on black and white images; that is, we have not considered any colour image in our discussion. Starting from today and over the coming few lectures, we will talk about colour image processing.
(Refer Slide Time: 1:53)

So today, we are going to introduce the concept of colour image processing. We are going to see what primary and secondary colours are, we are going to talk about colour characteristics, then we will see the chromaticity diagram and how the chromaticity diagram can be used to specify a colour. We will see two colour models: one of them is the RGB or red, green and blue colour model and the other one is the HSI colour model, and we will also see how we can convert from one colour model to another.

Now, first let us talk about why we want colour image processing when we can get information from black and white images themselves. The reason is that colour is a very powerful descriptor, and using the colour information, we can extract the objects of interest from an image very easily, which in some cases is not so easy using a black and white or simple gray level image.

And the second motivation why we go for colour image processing is that human eyes can distinguish between thousands of colours and colour shades, whereas in a black and white or gray scale image, we can distinguish only about two dozen different gray shades.

So, that is the reason colour image processing is a very important topic: firstly because we can distinguish between a larger number of colours, and secondly because we can identify some objects in a colour image very easily which may otherwise be difficult to identify in a simple intensity or gray level image. Now, coming to colour image processing, there are two major areas.
(Refer Slide Time: 4:13)

One of the areas we call full colour processing and the other area is pseudo colour processing. Now, what is meant by full colour processing or pseudo colour processing? When we talk about full colour processing, the images are acquired by a full colour TV camera or by a full colour scanner, and you will find that almost all the colours that we can perceive are present in such images.

So, that is what is meant by a full colour image, and when we process such a full colour image, we take into consideration all the colours which are present in the image. On the other hand, pseudo colour processing is a problem where we try to assign certain colours to ranges of gray levels.

When we take an intensity image, or simply a black and white image, which has intensity levels from say 0 to 255, what we can do is divide this entire intensity range into a number of sub-ranges. For example, I can put the intensity levels from 0 to say 50 in one range and the levels from 50 to 100 in another range, assign one particular colour to the first range and another particular colour to the range 50 to 100, and so on. This pseudo colour image processing is mostly useful for human interpretation.

So, as we said, we can hardly distinguish around two dozen gray shades, so it may not be possible for us to distinguish between 2 gray regions whose intensity values are very near to each other. In such cases, if we go for this pseudo colouring technique, that is, we assign different colours to different ranges of intensity values, then from the same intensity or black and white image we can extract the information much more easily, and this is mainly useful, as I said, for human interpretation purposes.
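
As a toy sketch of this intensity slicing idea (the range boundaries and the colours chosen below are arbitrary, purely for illustration):

```python
import numpy as np

def pseudo_colour(gray):
    """Intensity slicing: assign one colour to each range of gray levels.

    The range boundaries and the RGB colours below are arbitrary choices,
    made up purely for illustration."""
    bounds = [0, 50, 100, 150, 200, 256]
    colours = np.array([[0, 0, 128],     # 0-49    -> dark blue
                        [0, 128, 0],     # 50-99   -> green
                        [255, 255, 0],   # 100-149 -> yellow
                        [255, 128, 0],   # 150-199 -> orange
                        [255, 0, 0]],    # 200-255 -> red
                       dtype=np.uint8)
    out = np.zeros(gray.shape + (3,), dtype=np.uint8)
    for (lo, hi), c in zip(zip(bounds[:-1], bounds[1:]), colours):
        out[(gray >= lo) & (gray < hi)] = c
    return out

gray = np.arange(256, dtype=np.uint16).reshape(16, 16)   # toy gray-level image
print(pseudo_colour(gray).shape)   # (16, 16, 3): one RGB triple per pixel
```
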
Now, what is the problem with colour image processing? The interpretation of colour, as colour is perceived by human beings, is a psycho-physiological problem, and we have not yet fully understood the mechanism by which we really interpret colour. However, though the mechanism is not fully understood, the physical nature of colour can be expressed formally, and our formal expression is supported by experimental results.

Now, the concept of colour is not very new. You know from your school level physics and optics that way back in 1666, it was Newton who discovered the colour spectrum.

(Refer Slide Time: 9:20)

His experimental setup was something like this: you have an optical prism and you pass white light through it, and as the white light passes through the optical prism and comes out on the other side, the light does not remain white any more. Instead, it is broken into a number of colour components, which is known as the spectrum. As shown in this particular diagram, at one end of the spectrum we have violet and at the other end we have red. So, the colour components vary from violet to red, and this was discovered by Newton way back in 1666.

Now, the question is: how do we perceive colour, or how do we say that an object is of a particular colour? We have seen earlier that we see an object because light falls on it, that is, the object is illuminated by some source of light, the light gets reflected from the object and reaches our eye, and only then can we see the object.

Similarly, we perceive the colour depending upon the nature of the light which is reflected by the object surface. Because we have to perceive this nature of the light, we have to look at the spectrum of light, or of the energy, which lies in the visible range, because it is only in the visible range that we are able to perceive any colour.
(Refer Slide Time: 10:23)

So, if you consider the electromagnetic spectrum, as shown here, the complete electromagnetic spectrum ranges from gamma rays to radio frequency waves, and you will find that the visible light spectrum occupies only a very narrow range of frequencies in this entire electromagnetic spectrum. The wavelength of the visible spectrum roughly varies from 400 nanometers at one end to about 700 nanometers at the other end.

So, whenever light falls on an object and the object reflects light of all wavelengths in the visible spectrum in a balanced manner, that is, all the wavelengths are reflected in the appropriate proportion, that object will appear as a white object. Depending upon the predominant wavelength within the visible spectrum, the object will appear to be a coloured object, and the object colour will depend upon the wavelength of light that is predominantly reflected by that particular object surface.

Now, coming to the attributes of light: if we have achromatic light, that is, light which does not contain any colour component, the only attribute which describes that light is its intensity, whereas if it is chromatic light, then as we have just seen, its wavelength within the visible range can vary from roughly 400 nanometers to 700 nanometers. Now, there are basically 3 quantities which describe the quality of light. What are those quantities?
(Refer Slide Time: 12:44)

One of the quantities is what is called radiance, the second quantity is called luminance and the third quantity is called brightness. So, we have these 3 quantities, radiance, luminance and brightness, which basically describe the quality of light. Now, what is radiance? Radiance is the total amount of energy which comes out of a light source, and since it is a total amount of energy, radiance is measured in units of watts.

Luminance, on the other hand, is the amount of energy that is perceived by an observer. So, you can see the difference between radiance and luminance: radiance is the total amount of energy which comes out of a light source, whereas luminance is the amount of energy which is perceived by an observer.

So, while radiance is measured in units of watts, luminance is measured in units of what are called lumens, whereas the third quantity, brightness, is actually a subjective measure and it is practically not possible to measure it. So, though we can measure radiance and luminance, we practically cannot measure brightness.

Now, again coming to coloured light: most of you must be aware that when we talk about colours, we normally talk about 3 primary colours, and we say that the 3 primary colours are red, green and blue.
(Refer Slide Time: 15:04)

So, we consider the primary colours of light to be red, green and blue, and we normally represent them as R, G and B. Now, in the spectrum which was discovered by Newton, there are actually 7 different colours, but out of those 7 colours, we have chosen only these 3, red, green and blue, to be the primary colours, and we assume that by mixing these primary colours in different proportions, we can generate all other colours.

Now, why do we choose these 3 colours to be the primary colours? The reason is that there are cone cells in our eyes which are responsible for colour sensation; there are around 6 to 7 million such cone cells. Out of these 6 to 7 million cone cells, around 65% are sensitive to red light, about 33% sense green light and roughly 2% sense blue light.

So, because of the presence of these 3 different kinds of cone cells in our eyes, which sense the red, green and blue colour components, we consider red, green and blue to be our primary colours, and we assume that by mixing these primary colours in appropriate proportions, we are able to generate all the other different colours.

Accordingly, the CIE standard specified 3 different wavelengths for the 3 primary colours: the CIE specified red to have a wavelength of 700 nanometers, green to have a wavelength of 546.1 nanometers and blue to have a wavelength of 435.8 nanometers. However, the experimental result is slightly different from this. Let us see what the experimental result looks like.
(Refer Slide Time: 18:31)

This diagram shows the sensitivity of those 3 different kinds of cones in our eyes that we have just mentioned. You find that the cones which are sensitive to blue actually respond to wavelengths ranging from around 400 nanometers to 550 nanometers, the cones which are sensitive to green respond to wavelengths ranging from slightly above 400 nanometers to around 650 nanometers, and the cones which are sensitive to red respond to wavelengths starting from 450 nanometers up to around 700 nanometers.

The sensitivity is maximum at a particular wavelength for each type of cone: as shown in this diagram, the blue cones are most sensitive at a wavelength of 445 nanometers, the green cones are most sensitive at a wavelength of 535 nanometers, and the red cones are most sensitive at a wavelength of 575 nanometers.

So, these experimental figures are slightly different from what was specified by the CIE, and one point has to be kept in mind: though the CIE standard specifies red, green and blue to be of certain wavelengths, no single wavelength can specify any particular colour. In fact, from the visible domain of the spectrum that we have just seen, it is quite clear that when we consider two adjacent spectrum colours, there is no clear cut boundary between them; rather, one colour slowly and smoothly merges into the other.

As you can see from the same diagram, whenever we have a transition from, say, green to yellow, we do not have any clear cut boundary between green and yellow. Similarly, whenever there is a transition from yellow to red, the boundary is not clearly defined; we have a smooth transition from one colour to another. So, that clearly says that no single wavelength may be called red, green or blue.
Rather, it is a band of wavelengths which gives you a colour sensation: a band of wavelengths that gives you the green colour sensation, a band of wavelengths that gives you the red colour sensation, and likewise a band of wavelengths that gives you the blue colour sensation. So, having specific wavelengths as the standard does not mean that these fixed RGB components alone, when mixed properly, will generate all other colours.

Rather, we should have the flexibility of allowing the wavelengths of these 3 colours to change as well, because as we have just seen, green actually corresponds to a band of wavelengths, red corresponds to a band of wavelengths and, similarly, blue also corresponds to a band of wavelengths. So, to generate all possible colours, we should allow the wavelengths of these colours, red, green and blue, also to change.

Now, given that red, green and blue are the primary colours, mixing the primary colours generates the secondary colours.

(Refer Slide Time: 23:08)

So, when we mix say red and blue, both of which are primary colours, they generate a colour called magenta, which is a secondary colour. Similarly, if we mix green and blue, this will generate a colour which is called cyan, and if we mix red and green, these 2 generate the colour yellow.

So, as we have said, we consider red, green and blue to be the primary colours, and by mixing the primary colours, we generate the secondary colours. These three colours, magenta, cyan and yellow, are called the secondary colours of light. Now, another important concept here is that of pigments.

As we have said, red, green and blue are the primary colours of light, and if we mix these colours we generate the secondary colours of light, which are magenta, cyan and yellow. When it comes to pigments, a primary colour of a pigment is defined as one that absorbs a primary colour of light and reflects the other two. So, the primary colours of a pigment are the opposite of the primary colours of light: while red, green and blue are the primary colours of light, magenta, cyan and yellow are the primary colours of pigments. So, when it comes to pigments, we consider magenta, cyan and yellow to be the primary colours, and in the same manner, red, green and blue, which are the primary colours of light, will be the secondary colours for pigments.

And, as you have seen, for light, if we mix the primary colours red, green and blue in appropriate proportions, we generate white light; similarly, for pigments, if we mix cyan, magenta and yellow in appropriate proportions, we generate black. So, for pigments, appropriate mixing of the primary colours gives black, whereas for light, appropriate mixing of the primary colours gives white.

Now, what we have discussed so far, the primary colours of light, which are red, green and blue, and the primary colours of pigments, which are magenta, cyan and yellow, are the colour components we consider when we talk about colour reproduction; they are from the hardware point of view. That is, for a camera, a display device, a scanner or a printer, we talk about these primary colour components.

But when we perceive a colour, as human beings, when we look at a colour, we do not really think about how much red, green or blue component that particular colour has. Rather, the way we try to distinguish colours is based on the characteristics called brightness, hue and saturation.

(Refer Slide Time: 27:35)

So, for perception purposes, the characteristics we use are brightness, hue and saturation instead of red, green and blue or cyan, magenta and yellow. Now, let us see what these 3 attributes mean. What is brightness? Brightness is nothing but the achromatic notion of intensity. Just as in the case of a black and white image we talk about intensity, for a colour image there is an achromatic notion of intensity, which is not really intensity, and which we call brightness.

Similarly, hue represents the dominant wavelength in a mixture of colours. When you look at a colour which is a mixture of different primary colours, there will be one dominant wavelength, and the overall sensation of that particular colour will be determined by this dominant wavelength. So, this attribute, hue, indicates the dominant wavelength present in a mixture of colours.

Similarly, for the other term, saturation: whenever we talk about a particular colour, say red, there may be various shades of red. Saturation indicates the purity of that particular colour or, in other words, the amount of white light which has been mixed into that colour to dilute it. These are basically the 3 different attributes which we normally use to distinguish one colour from another.

Now, coming to the spectrum colours: because the spectrum colours are not diluted, no white component is added to a spectrum colour, so spectrum colours are fully saturated. On the other hand, if we take a colour which is not a spectrum colour, say pink, pink is nothing but a mixture of white with red. Red is a fully saturated colour because it is a spectrum colour and there is no white light mixed into it, but if we mix white light with red, the colour generated is pink. So, pink is not fully saturated but red is fully saturated.

(Refer Slide Time: 30:30)

So, we have these 3 concepts for colour perception: hue, saturation and brightness. As we said, brightness indicates the achromatic notion of intensity, whereas hue and saturation give you the colour sensation. So, we say that hue and saturation together indicate the chromaticity of the light, whereas brightness gives you some sensation of intensity.

So, using hue, saturation and brightness, what we are trying to do is separate the brightness part and the chromaticity part. Whenever we try to perceive a particular colour, we normally perceive it in the form of hue, saturation and brightness, whereas from the hardware point of view, it is red, green and blue or magenta, cyan and yellow which are more appropriate for describing the colour.

(Refer Slide Time: 32:03)

Now, the amounts of red, green and blue light which are required to form any particular colour are called the tristimulus values, and they are denoted by capital X, capital Y and capital Z. A colour is then normally specified by what are called its chromatic coefficients, which are obtained as follows: the coefficient for red is given by lower case x equal to capital X divided by capital X plus capital Y plus capital Z, where capital X is the amount of red light, capital Y is the amount of green light and capital Z is the amount of blue light which are to be mixed to form that particular colour. Similarly, the chromatic coefficient for green is lower case y equal to capital Y divided by capital X plus capital Y plus capital Z, and for blue it is lower case z equal to capital Z divided by capital X plus capital Y plus capital Z. So, this lower case x, y and z are called the chromatic coefficients of a particular colour.
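
As a tiny numerical sketch (with made-up tristimulus values):

```python
# Hypothetical tristimulus values: the amounts of red, green and blue light.
X, Y, Z = 30.0, 45.0, 25.0

total = X + Y + Z
x, y, z = X / total, Y / total, Z / total   # chromatic coefficients
print(x, y, z, x + y + z)                   # the coefficients always sum to 1
```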

So, whenever we want to specify a colour, we can specify it by its chromatic coefficients, and from here you find that the sum of the chromatic coefficients, lower case x plus lower case y plus lower case z, is equal to 1; that is, they are represented in normalized form. Just as any colour can be specified by its chromatic coefficients, there is another way in which a colour can be specified, namely with the help of what is known as the CIE chromaticity diagram.

So, a colour can be specified both by its chromatic coefficients as well as it can be specified with
the help of a chromaticity diagram. So, let us see what is this chromaticity diagram.

(Refer Slide Time: 36:54)

So here, we have shown the chromaticity diagram, and you find that it is a colour diagram in a two dimensional space: the horizontal axis represents lower case x and the vertical axis represents lower case y. That means the chromatic coefficient for red is along the horizontal axis and the chromatic coefficient for green is along the vertical axis. Now suppose we want to specify a particular colour; say, for example, I take this particular point here. How can this particular colour be specified?

As we have said, we can specify it by its chromatic coefficients. Two of the chromatic coefficients, x and y, that is, the red and green components, we can read off from the horizontal axis and the vertical axis, and the third component z is obviously given by z equal to 1 minus (x plus y). So, x and y are obtained from the chromaticity diagram, and the coefficient z is obtained simply from the relation x plus y plus z equal to 1. If you study this chromaticity diagram, you will also find that all the spectral colours are represented along the boundary of the diagram.

So, along the boundary, we have all the spectral colours. In this chromaticity diagram, there is a point which is marked as the point of equal energy; at this point, the red, green and blue components are mixed in equal proportion, and this is the CIE standard for white. As for the notion of saturation, all the points on the boundary, being spectrum colours, are fully saturated, and as we move inside the chromaticity diagram, away from the boundary, the colours become less and less saturated.

So, one use of this chromaticity diagram is that we can specify a colour with it; we can find the chromatic coefficients x, y and z using the diagram. Not only that, the chromaticity diagram is also useful for colour mixing. Let us see how. Say, for example, I take 2 colour points within the diagram, one point somewhere here and another point somewhere here, and I join these two points by a straight line like this. Then this straight line indicates all the different colours that I can generate by mixing the colour present at this location and the colour present at this other location.

So, all possible mixtures of these 2 colours create the colours lying on the straight line segment connecting these 2 colour points, and the same is true for 3 points. Instead of taking just these 2 points, if I take a third point somewhere here, then these 3 colour points form a triangle, and by mixing the colour at this location, the colour at this location and the colour at this location in different proportions, I can generate all the colours lying within this triangular region.

So, this chromaticity diagram is very helpful for colour mixing operations. We can also get other information from the chromaticity diagram. As we have said, we have this point of equal energy, which is the CIE standard of white; a colour point on the boundary of the chromaticity diagram is a fully saturated colour, and as we move away from the boundary, the colours become less and less saturated, so that at the point of equal energy, the CIE white, the saturation is 0; that point is not saturated at all.
(Refer Slide Time: 40:28)

Now, if I draw a straight line from any point on the boundary to this point of equal energy, like this, then this line indicates the different shades of that particular saturated colour that we can obtain by mixing white light into it. As we have said, as we mix in white light, the saturation goes on decreasing, that is, we generate different shades of that particular colour.
So, this particular straight line which connects the point of equal energy which is nothing but the
CIE standard of white and a colour on the boundary which is a fully saturated colour, then what I
can get is all the shades of this particular colour is actually lying on this particular straight line
which joins the boundary point to the point of equal energy.

So, you find that using this chromaticity diagram, we can generate different colours and we can find out in what proportion red, green and blue must be mixed to generate any particular colour, and this works for mixing of 2 colours, mixing of 3 colours and so on.
(Refer Slide Time: 42:20)

Now, just to mention, as we have said, we have taken red, green and blue as the primary colours; in this chromaticity diagram, if I mark the green point, the red point and the blue point and join these 3 points by straight lines, what I get is a triangle.

So, using this red, green and blue, I can generate, as we have just discussed, all the colours present within this triangular region. But this triangular region does not encompass the entire chromaticity diagram, because the chromaticity diagram is not itself triangular. So, as we have said, using 3 fixed wavelengths for red, green and blue, we cannot generate all the colours in the visible region, and that is quite obvious from the chromaticity diagram: if we consider only 3 fixed wavelengths to represent the red, green and blue points, those 3 wavelengths form just a triangular region, and no single triangular region can fully cover the chromaticity diagram.

So, as a result, by using fixed wavelengths for red, green and blue as primary colours, we cannot generate all the colours shown in this chromaticity diagram. But still, the chromaticity diagram gives us much useful information: as we said, we can use it for colour mixing, we can find out different shades of a colour, we can specify a colour and so on.
(Refer Slide Time: 44:27)

Now, coming to the next topic, that is colour models: we need a colour model to specify a particular colour in a standard way. What is a colour model? A colour model is actually a space, or we can describe it as a coordinate system, within which any specified colour is represented by a single point.

Now, as we have said, we have 2 ways of describing a colour: one is in terms of the red, green and blue components or the cyan, magenta and yellow components, which is the hardware point of view, and the other, from the perception point of view, considers hue, saturation and brightness. So, considering these 2 aspects, we can have 2 types of colour models. One type is oriented towards hardware, that is towards the colour display, colour scanner or colour printer, and the other type takes care of the human perception aspect; we will see that this second type not only takes care of human perception, it is also useful for application purposes.

So accordingly, we can have a number of different colour models. One colour model is the RGB or red, green, blue model; another is the CMY or cyan, magenta, yellow model, and there is an extension of this, the CMYK model, that is cyan, magenta, yellow and black. The RGB colour model is useful for image displays such as monitors; the CMY or CMYK colour models are useful for image printers. You find that both these colour models are hardware oriented, because both of them specify a colour in terms of primary colour components, either red, green and blue or cyan, magenta and yellow, and as we have said, red, green and blue are the primary colours of light whereas cyan, magenta and yellow are the primary colours of pigments.

The other colour model that we will consider is the HSI colour model, that is hue, saturation and intensity (or brightness). This HSI colour model is application oriented as well as perception oriented, that is, it reflects how we perceive a particular colour, and we have also discussed that the HSI colour model decouples the colour information from the gray scale information.

So, we have said that the I component gives you the gray scale information whereas hue and saturation taken together give you the chromatic information. Because the HSI model decouples the intensity information from the chromatic information, the advantage we get is that many of the algorithms which were developed for gray scale images can also be applied to colour images.

So, all the intensity oriented algorithms, that is the algorithms developed for gray scale images, can also be applied on the intensity or I component of such a colour image.

(Refer Slide Time: 49:47)

Now first, let us discuss the RGB colour model. As we have said, in the RGB case a colour image is represented by 3 primary components, and the primary components are red, green and blue. These are the 3 colour components which, when mixed in appropriate proportions, generate all possible colours.
(Refer Slide Time: 50:44)

So, our RGB colour model is based on a Cartesian coordinate system, and this diagram shows the RGB colour model that we normally use. You find that the red, green and blue components R, G and B are placed along the coordinate axes: the red component is placed at location (1, 0, 0) as shown in this diagram, (0, 1, 0) is the green point and (0, 0, 1) is the blue point.

So, the red point, green point and blue point lie at 3 corners of a cube in this Cartesian coordinate system. Similarly, cyan, magenta and yellow lie at 3 other corners of the cube. Now, let me mention that this model is represented in normalized form; that is, all 3 colour components red, green and blue vary within the range 0 to 1, and similarly cyan, magenta and yellow are also represented in normalized form.

Now, the origin of this colour model, the location (0, 0, 0), represents black and the farthest vertex, (1, 1, 1), represents white, and if I join the origin to this white point by a straight line, this straight line represents the gray scale; we also call it the intensity axis.

So, as you move from the origin, which is black, to the farthest vertex of the cube, which is white, what we generate are different intensity values, and as a result we call this line the intensity axis or the gray scale axis. So, we stop our discussion today. We have just introduced the RGB colour model and we will continue with our discussion in the next class.
(Refer Slide Time: 53:31)

Now, let us have some questions on today's lecture. The first question: how do we perceive colour? The second question: what is the difference between luminance and radiance? The third question: why are red, green and blue generally accepted as primary colours which, when mixed in appropriate proportions, generate other colours? The fourth question: what are pigment primary colours? The fifth: what are meant by hue and saturation?

Thank you.
Digital Image Processing
Prof. P.K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 27
Colour Image Processing – II

Hello, welcome to the video lecture series on digital image processing. In our last class, we have
started our discussion on colour image processing.

(Refer Slide Time: 1:10)

So, in our last class, the topics that we covered are the fundamentals of colour image processing: we have seen what is a primary colour and what is a secondary colour, we have seen the characteristics of different colours, we have seen the chromaticity diagram and its uses, and we had started our discussion on colour models, where we just began the discussion on the RGB colour model.

(Refer Slide Time: 1:44)

Today, we will start our discussion with the colour models. So, we will complete our discussion
on RGB colour model. We will also talk about the HSI or hue saturation and intensity colour
model. We will see how we can convert the colours from one colour model to another colour
model. That is given a colour in the RGB space, how we can convert this to a colour in the HSI
space and similarly, given a colour in the HSI space, how we can convert that to the RGB space.

Then, we will start our discussion on colour image processing techniques. We will talk about pseudo colour image processing, and there we will mainly discuss 2 techniques: one is called intensity slicing and the other is gray level to colour transformation.

(Refer Slide Time: 2:48)

So, let us briefly recapitulate what we did in the last class. We mentioned that the visible spectrum occupies a very narrow portion of the total electromagnetic spectrum, with wavelengths normally varying from 400 nanometres to 700 nanometres. At one end we have violet and at the other end we have the red colour.

(Refer Slide Time: 3:28)

And out of this, we normally take 3 colour components, red, green and blue, as the primary colour components, because we have mentioned that in our eye there are 3 types of cone cells which are responsible for colour sensation. Some cone cells sense light in the red wavelengths, some cone cells sense the green light and some cone cells sense the blue light, and the responses of these cells are combined in different proportions so that we have the sensation of different colours.

And, this is the main reason why we say that red, green and blue; they are the primary colours
and by mixing these 3 primary colours in different proportion, we can generate almost all the
colours in the visible spectrum.

Then, we talked about 2 types of colours: one is the colour of light, the other is the colour of pigment. For the colour of light, when we see any particular object, we see the colour corresponding to the wavelengths of light which get reflected from the object surface. When it comes to a pigment colour, when light falls on it, the pigment absorbs a particular one of the 3 primary colours and reflects the other wavelengths.

So, the primary colours of light are really the secondary colours of pigment, and the secondary colours of light are the primary colours of pigment. Because of this, the colours of light are called additive primaries whereas the colours of the pigments are called subtractive primaries.

(Refer Slide Time: 5:26)

And here, you can see in this particular slide that when the 3 primaries of light, red, green and blue, are mixed: red and green mixed together form yellow light, green and blue mixed together form cyan, and red and blue mixed together form magenta. Red, green and blue, all 3 colours together, form white light.

Similarly, when it comes to pigment primaries: yellow, magenta and cyan, which are the secondary colours of light, are the primary colours of pigment. And you find that when these pigment primaries are mixed together, they form the primary colours of light.

So, yellow and magenta together form red, yellow and cyan together form green, and magenta and cyan together form blue. However, all 3 pigment primaries, yellow, magenta and cyan, mixed together give black. So, by mixing the different primary colours of light or the different primaries of pigment, we can generate all the different colours in the visible spectrum.

(Refer Slide Time: 7:26)

Then, we have also seen the chromaticity diagram and its usefulness. The chromaticity diagram is useful mainly to identify in which proportion the different primary colours are to be mixed to generate any colour. So, if I take 3 points in this chromaticity diagram, one corresponding to the primary green, one for the primary red and the other for the primary blue, then given any point within the chromaticity diagram, I can find out in what proportion red, green and blue are to be mixed.

So, here the horizontal axis gives the red component x and the vertical axis gives the green component y, and the blue component z is then given by z = 1 - (x + y). So, I can find out how much of red, how much of green and how much of blue are to be mixed to generate a colour at any particular location in this chromaticity diagram.

It also tells us all the different possible shades of any of the pure spectrum colours that can be generated by mixing in different amounts of white light. So, you find that we have the point of equal energy, mentioned in the last class, which is white as per the CIE standard.

So, if I take any pure colour on the boundary of this chromaticity diagram and join it to this white point, then all the colours along this line tell us the different shades of this colour that can be generated by adding different amounts of white light to the pure colour.

Then, we started our discussion on colour models and said that a colour model is very useful to specify any particular colour. We started with the RGB colour model and discussed in our last class that the RGB colour model is represented by a Cartesian coordinate system where the 3 primary colours of light, red, green and blue, are placed along the 3 Cartesian coordinate axes.

(Refer Slide Time: 10:34)

So, as per this diagram, we have the red axis, the green axis and the blue axis, and in this Cartesian coordinate system the colour space is represented by a unit cube. When I say it is a unit cube, that means the colours are represented in normalized form. In this unit cube, at the origin of the cube, R, G and B are all equal to 0, so this point represents black.

Similarly, at the farthest vertex from this black point, the red component is equal to 1, the green component is equal to 1 and the blue component is equal to 1; that means all 3 primary colours are mixed in equal proportion, and this point represents white. The red colour is placed at location (1, 0, 0), where the red component is equal to 1 and the green and blue components are equal to 0.

Green is located at location (0, 1, 0) where both red and blue components are equal to 0 and the
green component is equal to 1 and blue is located at the vertex location (0, 0, 1) where both red
and green components are equal to 0 and blue component is equal to 1.

So, (1, 0, 0), (0, 1, 0) and (0, 0, 1) are the locations of the 3 primary colours of light, red, green and blue. And you find that in this cube we have also placed the secondary colours of light, which are basically the primary colours of pigment, that is cyan, magenta and yellow; these 3 colours are placed at 3 other vertices of this unit cube.

Now, from this diagram, if I join the 2 points black at location (0, 0, 0) and white at location (1, 1, 1), the line joining these 2 points represents what is called the gray scale. All the points on this particular line will have different gray shades; they will not exhibit any colour component.

Now, given any specific colour having some proportions of red, green and blue, that colour will be represented by a single point in this unit cube in normalized form, or we can say that the colour is represented by a vector drawn from the origin to the point representing that particular colour. So, this is the RGB colour model, and you find that from this RGB colour model we can also obtain the cyan, magenta and yellow components by a simple transformation. Now, if I look at the different colour shades on the different faces of this colour cube, the shades appear like this.

(Refer Slide Time: 14:19)

So, in this colour cube, you find that the point (1, 0, 0) represents red, and along the horizontal axis the colour varies from red to yellow. Similarly, the point (1, 1, 1) represents the white colour. In this particular case, each of the colour components red, green and blue is represented by 8 bits, that means we have altogether 24 bits to represent a colour in this particular colour model.

So, the total number of colours that can be generated is 2 to the power 24, and you can easily imagine that is a huge number of colours if we assign 8 bits to each of the colour components red, green and blue. But in many cases, what is used is called the safe RGB model. In the safe RGB model, we do not consider all possible colours, that is all 2 to the power 24 different colours; rather, the number of different colours used is 216.

So, these 216 colours can be generated by taking 6 different shades in red, 6 different shades in green and 6 different shades in blue. On the right hand side, we have drawn a safe RGB colour cube. Here, we have 6 different shades for each of the components red, green and blue, and using these 6 shades per component we can generate 6 x 6 x 6 = 216 different colours. These 216 colours are known as safe RGB colours because they can be displayed on any type of colour monitor.

So, you should remember that in case of true RGB, though you can have a total of 2 to the power 24 different colours, all colour displays may not have the provision of displaying all of them, whereas the 216 safe colours can be displayed on almost all colour displays. This is what is called the safe RGB colour model, and the corresponding cube is the safe RGB colour cube.
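As a small aside, the 216 safe colours can be enumerated programmatically. The sketch below assumes that the six shades per channel are the commonly used values 0, 51, 102, 153, 204 and 255; the lecture only states that each channel takes 6 different shades, so these particular values are an assumption.

```python
from itertools import product

# Six levels per channel.  The specific values 0, 51, ..., 255 are the ones
# commonly used for "web-safe" colours and are an assumption here; the
# lecture only states that each channel takes 6 shades.
LEVELS = [0, 51, 102, 153, 204, 255]

safe_rgb = list(product(LEVELS, repeat=3))   # all (R, G, B) triples
print(len(safe_rgb))                         # 6 * 6 * 6 = 216 colours
```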

So, it is quite obvious from this discussion that any colour image will have 3 different colour
components; one colour component for red, one colour component for green and one colour
component for blue.

(Refer Slide Time: 17:33)

So, if I take this particular colour image, you find that the top left image is the colour image and the other 3 are its 3 planes. The red colour component of this colour image is displayed in red, the green colour component in green and the blue colour component in blue.

So here, you find that though we have displayed these 3 components in different colours, that is red, green and blue, they are actually monochrome images, and these black and white images are used to excite the corresponding phosphor dots on the colour screen.

So, the red component will activate the red dots, the green component will activate the green dots and the blue component will activate the blue dots, and when these 3 dots are activated together with different intensities, that gives you the sensation of different colours. So obviously, for any colour image like this, we will have 3 different planes: one plane corresponding to the red component, another corresponding to the green component and a plane corresponding to the blue component.

Now, as we said, red, green and blue are mostly useful for display purposes. But when it comes to colour printing, the model which is used is the CMY model, or cyan, magenta and yellow model. So, for colour printing purposes, we have to talk about the CMY model. However, CMY can very easily be generated from the RGB model.

From the RGB cube that we have drawn and the way the CMY colours cyan, magenta and yellow are placed at different vertices of that cube, it is quite obvious that given any colour specified in the RGB model, we can very easily convert it to the CMY model.

(Refer Slide Time: 20:11)

The conversion is simply like this: given the red, green and blue components of a particular colour, what we want to do is convert this into the CMY space, and the conversion from RGB to CMY is very simple. We simply compute (C, M, Y) = (1, 1, 1) - (R, G, B).
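This subtraction is simple enough to express directly in code. The following is a minimal sketch, assuming normalized components in the range 0 to 1 as discussed above; the function names are just illustrative.

```python
import numpy as np

def rgb_to_cmy(rgb):
    """CMY components of a colour given as normalized (R, G, B) in [0, 1]."""
    return 1.0 - np.asarray(rgb, dtype=float)

def cmy_to_rgb(cmy):
    """Inverse conversion; it is the same subtraction from (1, 1, 1)."""
    return 1.0 - np.asarray(cmy, dtype=float)

print(rgb_to_cmy([1.0, 0.0, 0.0]))   # pure red -> cyan = 0, magenta = 1, yellow = 1
```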

Here, remember that the RGB components are represented in normalized form, and similarly, by this expression, the CMY components that we get will also be in normalized form. As we said earlier, equal amounts of cyan, magenta and yellow should give us the black colour; so, if we mix the 3 pigment primaries cyan, magenta and yellow in equal proportion, we should get black.

But in practice, what we get is not a pure black; this generates a muddy black. So, to take care of this problem, along with C, M and Y, another component is also specified, which is the black component, and when we also specify the black component, we get another colour model, which is the CMYK model.

So, in this CMYK model, cyan, magenta and yellow are the same as in the CMY model, but we are specifying an additional component, black, giving us the CMYK model.

So, in the CMYK model we actually have 4 different components: cyan, magenta, yellow and black. If we are given the RGB components, we can very easily convert them to CMY; similarly, the reverse is also true: given a colour specified in the CMY space, we can very easily convert it to a colour in the RGB space.

(Refer Slide Time: 23:18)

Now, the next colour model that we will consider is the HSI colour model, that is the hue, saturation and intensity model. As we mentioned in our last class, both RGB and CMY or CMYK are actually hardware oriented: the RGB colour model is oriented towards the colour display or colour monitor, and the CMY or CMYK models are oriented towards colour printers. When it comes to human interpretation, we said that we do not really think of a colour in terms of how much red, how much green and how much blue it contains; what we really think of is the prominent colour in that particular specified colour, which is what is known as hue.

Similarly, we have the saturation; it indicates how much a pure spectrum colour is diluted by mixing white light into it. If you mix white light into a pure spectrum colour in different amounts, you get different shades of that particular spectrum colour. And, as we said, I, the intensity, is the achromatic notion of brightness, as in a black and white image.

So, what we have is hue, which tells us the prominent primary or spectrum colour in the specified colour; saturation, which indicates how much white light has been added to a pure spectrum colour to dilute it; and the intensity component, which is the achromatic notion of brightness. Now, the problem is: given a colour in the RGB space, can we convert it to the HSI space?

Now, this HSI model has other importance also, in addition to human interpretation, because the HSI model decouples the intensity information from the colour information. The I component gives you the intensity information, whereas H and S, the hue and saturation, taken together give you the chromatic information. Since we can decouple the chromatic information from the intensity information, many of the image processing algorithms which were developed for black and white or gray scale images can be applied to the intensity component of images specified in the HSI space. So, conversion of an image from the RGB space to the HSI space is very important.

Now, let us see how we can convert a colour specified in the RGB space to a colour in the HSI space. In order to do this, we can reorient the RGB cube in such a way that the black point, the origin of the RGB cube, is kept at the bottom and the white point comes directly above it, as shown here.

(Refer Slide Time: 27:24)

So, you find that it is the same RGB cube; we have simply reoriented the RGB colour cube so that the black point comes at the bottom and the white point comes directly above it. Naturally, as before, the line joining black and white represents the intensity axis: any point on this particular line which joins black and white will not show any colour information, but the points will have different intensities or different gray shades.

Now, once we reorient the RGB cube like this, suppose we have a colour point specified within this RGB cube. For this colour point, our aim is to convert the RGB specification into an HSI specification.

So, as we said, the line joining black and white is the intensity axis. In the HSI space, I can very easily compute the intensity component, because for this point, say X, I can represent it as a vector drawn from black to this particular point, and the intensity component associated with this RGB value is nothing but the projection of this vector on the intensity axis.

So, if I project this vector on the intensity axis, then the length of this projection tells us the intensity component associated with this particular RGB colour point. To get this, we can pass a plane which is perpendicular to the intensity axis and which contains this particular point X.

The point at which this plane cuts the intensity axis represents the intensity associated with the RGB components of this particular point X. So, this is how we can compute the intensity component. Next, how can we compute the saturation? The line joining black and white is the intensity axis, and any point on the intensity axis has only gray shades; it does not have any colour component.

So, you can say that the saturation associated with all the RGB points lying on this intensity axis is equal to 0, and the saturation increases as the point moves away from this intensity axis. Keeping that in mind, we can say that the distance of the point X from the intensity axis tells us the saturation associated with the RGB components of point X. So, we can very easily compute the intensity and saturation corresponding to any point given in the RGB space.
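The geometric idea described here, projection onto the black-to-white diagonal for intensity and distance from that diagonal as an indication of saturation, can be sketched in a few lines of Python. Note that this is only an illustration of the geometry; the distance computed here is not the normalized S of the HSI formulas that will be given shortly.

```python
import numpy as np

def intensity_and_distance(rgb):
    """Geometric illustration of the idea above (normalized RGB in [0, 1]).

    Projects the colour vector onto the unit vector along the black-to-white
    diagonal to get the intensity, and measures the perpendicular distance of
    the point from that diagonal.  The distance only *indicates* saturation;
    it is not the normalized S of the closed-form HSI expressions.
    """
    p = np.asarray(rgb, dtype=float)
    axis = np.ones(3) / np.sqrt(3.0)       # unit vector along the gray axis
    proj_len = p @ axis                    # length of the projection
    foot = proj_len * axis                 # foot of the perpendicular on the axis
    distance = np.linalg.norm(p - foot)    # distance from the intensity axis
    intensity = proj_len / np.sqrt(3.0)    # equals (R + G + B) / 3
    return intensity, distance

print(intensity_and_distance([0.8, 0.2, 0.2]))
```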

Now, the next question is computation of the hue component. So, out of hue saturation and
intensity, we have been able to compute the saturation and intensity very easily. The next
component which is left is the hue component. Now, for hue component computation, the
concept is slightly more complicated.

(Refer Slide Time: 32:00)

Now, in this diagram, we have shown a plane passing through the points black, white and cyan. As we said, the line joining black and white is the intensity axis; the intensity at the black point is equal to 0 and the intensity at the white point is maximum, that is equal to 1, and the other point which defines this particular plane is cyan.

Now, you will appreciate that any point in this particular plane defined by the 3 points black, white and cyan will have the same hue, because, as we said, hue indicates the prominent wavelength of light present in a particular colour, and from our earlier discussion you can easily verify that the colour at any point on this plane can be specified as a linear combination of these 3 points: cyan, white and black.

Now, because white is a balanced colour which contains all the primary components in equal proportion, and black does not contain any colour component, these 2 points, white and black, cannot contribute to the hue of any point lying in this plane.

So, the only point which can contribute to the hue is cyan; for all the points lying in this plane, the hue will be the same as the hue associated with the cyan point. Here, you find that if I rotate this particular plane around the intensity axis by 360 degrees, then I will trace all possible points that can be specified in the RGB colour space, and by rotating this plane by 360 degrees around the intensity axis, I can generate the hues corresponding to every possible RGB point in the RGB colour cube.

(Refer Slide Time: 34:50)

Now, in order to do that what I do is like this; suppose, I take projection of this RGB cube on a
plane which is perpendicular to the intensity axis. So, if I take the projection, then the different
vertices of the cube will be projected on a hexagon as shown in this particular figure.

So, the 2 vertices corresponding to white and black will be projected at the centre of the hexagon; this point will be the projected point for both white and black. The other primary colours of light and the primary colours of pigment will be projected at the different vertices of the hexagon.

So here, you find that if I draw lines joining the centre of the hexagon to all its vertices, then red and green will be separated by an angle of 120 degrees; similarly, green and blue will be separated by 120 degrees, and blue and red will also be separated by 120 degrees. In the same manner, for the secondary colours, yellow and cyan will be separated by 120 degrees, cyan and magenta by 120 degrees and magenta and yellow by 120 degrees.

However, the angle of separation between red and yellow, this is equal to 60 degree. So, if I take
the projection of the RGB cube on a plane which is perpendicular to the intensity axis, then this
is how the projection is going to look like.

(Refer Slide Time: 36:44)

Now, you will find that the projection of the shaded plane that we have seen in our previous slide will be a straight line like this.

(Refer Slide Time: 36:54)

So, for any point specified in the RGB colour space, there will be a corresponding point in our projected plane. That is, for a colour point in the RGB colour space, the plane defined by that colour point, the black point and the white point will be projected as a straight line on this particular plane.

So, as we rotate the plane by 360 degrees around the intensity axis, this particular straight line will also be rotated by 360 degrees around the centre of the hexagon. That is, if I rotate the shaded plane by 360 degrees around the black-and-white axis, the intensity axis, its projection onto this perpendicular plane, which is a straight line, will also rotate by 360 degrees around the centre of the hexagon.

Now, this gives us a hint as to how we can find the hue associated with a particular colour point specified in the RGB colour space. The hue can be computed as the angle between the straight line which is the projection of this shaded plane and one of the primary colours; normally this primary colour is taken to be red and the angle is measured in the anticlockwise direction. Using this convention, red has a hue of 0 degrees, and as we rotate the shaded plane around the intensity axis, the straight line which is its projection also rotates through 360 degrees, and as it rotates, the hue increases.

So, hue is normally measured by the angle between the red axis and the line which is the projection of the shaded plane onto the plane of this hexagon. Now, given this concept of how we can obtain the hue, saturation and intensity components for any colour specified in the RGB colour space, if you follow the geometry of this formulation, you find that we can derive very simple relations to compute the H, S and I components from the R, G and B components.

(Refer Slide Time: 40:30)

So here, the hue component H is given in terms of an angle theta which, as we said, is measured anticlockwise from the direction of red: H = theta if the blue component is less than or equal to the green component, and H = 360 degrees - theta if the blue component is greater than the green component.

So, this is how we can compute the hue component in the HSI model, where the value of theta is given by

theta = cos^(-1) { (1/2) [(R - G) + (R - B)] / [ (R - G)^2 + (R - B)(G - B) ]^(1/2) }

where R, G and B are the red, green and blue components of the colour specified in the RGB space.

So from this R, G and B component, we can compute the value of theta following this expression
and from this theta, you can find out the hue component in the HSI space as hue will be equal to
theta if blue component is less than or equal to green component and hue will be equal to 360
degree minus theta if blue component is greater than green component.

Similarly, following the same geometry, we find that the saturation is given by S = 1 - [3 / (R + G + B)] min(R, G, B) and the intensity is simply I = (R + G + B) / 3. So, from the red, green and blue components, we can very easily find out the hue, saturation and intensity components.
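Putting these expressions together, a minimal Python sketch of the RGB to HSI conversion might look as follows. It assumes normalized r, g, b values in the range 0 to 1 and returns the hue in degrees; the small epsilon added to the denominators is a numerical safeguard, not part of the expressions given in the lecture.

```python
import math

def rgb_to_hsi(r, g, b, eps=1e-12):
    """Convert normalized (r, g, b) in [0, 1] to (H in degrees, S, I)."""
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))

    h = theta if b <= g else 360.0 - theta          # hue measured from red
    i = (r + g + b) / 3.0                           # intensity
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b + eps)  # saturation
    return h, s, i

print(rgb_to_hsi(1.0, 0.0, 0.0))   # pure red -> H = 0, S = 1, I = 1/3
```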

So, just as we have converted from the RGB space to the HSI space, we should also be able to convert any colour specified in the HSI space into components in the RGB space. To do that inverse conversion, the corresponding expressions can be found as follows: whenever we want to convert from HSI to RGB, there are 3 regions of interest.

(Refer Slide Time: 44:08)

One region is called the RG region, where H lies between 0 degrees and 120 degrees; the second is the GB region, where H lies between 120 degrees and 240 degrees; and the third is the BR region, where H lies between 240 degrees and 360 degrees. In the RG region, getting the red, green and blue components is very easy: the blue component is given by B = I (1 - S), where I is the intensity and S is the saturation; the red component is given by R = I [1 + S cos H / cos(60 degrees - H)]; and the green component then follows from the intensity as G = 3I - (R + B).

Similarly, in the GB region, the first operation is to modify H as H = H - 120 degrees, and once we do this, we get the R, G and B components as R = I (1 - S), G = I [1 + S cos H / cos(60 degrees - H)] and, in the same manner, B = 3I - (R + G). In the third sector, the BR region, we first modify H as H = H - 240 degrees, and once we do this modification, the components are G = I (1 - S), B = I [1 + S cos H / cos(60 degrees - H)] and, obviously, R = 3I - (G + B).

So, we find that using these simple expressions, we can convert a colour specified in the RGB space to its colour components in the HSI space and, similarly, a colour specified in the HSI space can easily be converted to colour components in the RGB space.
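A corresponding sketch of the sector-based inverse conversion described above is given below, again assuming H in degrees and normalized S and I; as before, this is only an illustrative implementation of the expressions just discussed.

```python
import math

def hsi_to_rgb(h, s, i):
    """Convert (H in degrees, S, I) back to normalized (R, G, B)."""
    h = h % 360.0
    if h < 120.0:                                  # RG sector
        b = i * (1.0 - s)
        r = i * (1.0 + s * math.cos(math.radians(h)) /
                 math.cos(math.radians(60.0 - h)))
        g = 3.0 * i - (r + b)
    elif h < 240.0:                                # GB sector
        h -= 120.0
        r = i * (1.0 - s)
        g = i * (1.0 + s * math.cos(math.radians(h)) /
                 math.cos(math.radians(60.0 - h)))
        b = 3.0 * i - (r + g)
    else:                                          # BR sector
        h -= 240.0
        g = i * (1.0 - s)
        b = i * (1.0 + s * math.cos(math.radians(h)) /
                 math.cos(math.radians(60.0 - h)))
        r = 3.0 * i - (g + b)
    return r, g, b

print(hsi_to_rgb(0.0, 1.0, 1.0 / 3.0))   # should recover pure red (1, 0, 0)
```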

(Refer Slide Time: 48:25)

So here, in this diagram, we have shown the effects of the different components H, S and I on the colours. In the first row, the first rectangle shows a colour for which hue is equal to 0, intensity is equal to 128 and saturation S is equal to 255. As we said, hue is measured from the red axis, so hue equal to 0 indicates that it is the red colour; the saturation equal to 255, which is the maximum, means it is pure red, and the intensity here is 128.

For the other rectangles in the same row, we have kept hue and intensity constant whereas the saturation is decreased. You find that as we move from left to right, the red gradually becomes milky: at saturation equal to 200, which is less than 255, it appears that some amount of white light has been added to this red, and that becomes very prominent when S is equal to 100 or S is equal to 50, where a large amount of white light has been added to the red.

In the second row, we have kept hue and saturation constant, that is hue equal to 0 and saturation equal to 255, and what we have varied is the intensity or I component. Here you find that as we decrease the I component and move from left to right in the second row, the different rectangles still show the red colour, but the intensity of the red goes on decreasing.

So, note the difference between the first row and the second row: in the first row, it appears that some white light has been mixed with the red, whereas in the second row there is no such appearance of mixing white light; it is the intensity which has been decreased. In the third row, we have kept the intensity and saturation constant, and it is the hue component which has been changed.

So, we have started with hue equal to 0, which is red, and here we really find that as we change the hue component, it is the colour itself which gets changed, unlike in the previous 2 rows. In the first row, the saturation changes as more and more white light is added to the pure colour; in the second row, the intensity changes; but in the third row, where we keep the intensity and saturation the same and vary the hue component, it is the colour itself that changes.

So, here you find that when hue is equal to 100, it is the green colour; when hue is equal to 150, it is the blue colour; and when hue is equal to 50, it is a colour between yellow and green. With this, we have introduced the various colour models: the RGB colour space, the CMY or cyan, magenta and yellow colour space, the CMYK or cyan, magenta, yellow and black colour space, and also the HSI colour space, and we have seen that given the specification of a colour in any of these spaces, we can convert from one space to another. From RGB to CMY the conversion is very easy; we can also convert from RGB to HSI, though that conversion is slightly more complicated.

Now, with this introduction of the colour spaces, next what we will talk about is the colour
image processing.

(Refer Slide Time: 52:52)

So far, we have discussed the representation of colour: given an image and the colours present in it, we have discussed how to represent those colours in the RGB space, the CMY or CMYK space, or the HSI space, and images represented in any of these models can be processed. In colour image processing, we basically have 2 types of processing. One kind is called pseudo colour processing, which is also sometimes known as false colour processing, and the other kind is what is called full colour processing.

In pseudo colour processing, as the name implies, the colours are not the real colours of the image; rather, we try to assign different colours to different intensity values. So, what pseudo colour processing actually does is assign colours to different ranges of gray values based on certain criteria.

Now, what is the purpose of assigning colours to different ranges of gray values? As we have mentioned earlier, in a simple black and white image we can distinguish hardly 2 dozen gray shades, whereas in colour we can distinguish thousands of colour shades. So, given a grayscale or black and white image, if we assign different colours to different ranges of gray values, then the interpretation of those different ranges of gray values is much easier in the pseudo coloured image than in the grayscale image.

So, we will discuss how we can go for pseudo-colouring of a black and white image, and, as we said, this colouring has to be done following some criteria. In the case of full colour processing, as the name indicates, the images are represented in full colour and the processing is also done in the full colour domain.

So, given this introduction to the 2 types of colour processing, pseudo colour processing and full colour image processing, we finish our lecture today and we will continue with this topic in our next class. Now, let us come to some questions on today's lecture.

(Refer Slide Time: 56:39)

The first question is: what is the usefulness of the chromaticity diagram? The second: how can you convert a colour specified in the RGB model to the HSI model? The third question: how does the CMYK specification help in colour printing? The fourth question: which colour is indicated by H equal to 60, that is hue equal to 60? And the fifth question: what is the use of pseudo colouring techniques?

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 28
Colour Image Processing - III
Hello, welcome to the video lecture series on digital image processing. For the last two classes, we have been discussing colour image processing techniques, and we have introduced the concept of colour and the colour models.

(Refer Slide Time: 1:20)

In our last class, we discussed the RGB colour model and, along with it, the CMY and CMYK colour models; we also discussed the HSI colour model. After discussing the HSI colour model, we talked about conversion from one colour model to another, and we briefly noted the difference between pseudo colour and full colour image processing techniques.

(Refer Slide Time: 2:04)

So, in our last class, we said that when we talk about colour image processing techniques, we generally have 2 categories of colour image processing. One is called pseudo colour image processing, also known as false colour processing, and the other category is what is known as full colour image processing.

We have said that the basic purpose of the pseudo colour image processing technique is to assign different colours to different intensity ranges in a black and white image. The purpose, as we have told earlier, is that given a black and white image, the human eye can distinguish only around 2 dozen intensity shades, whereas given a colour image, we can distinguish among thousands of colour shades.

So, given a black and white or intensity image, if we go for the pseudo colour processing technique, that is assign different colours to different ranges of intensity values, then interpretation of such an image is more convenient than interpretation of the ordinary intensity image.

Now, as we said, the purpose of the pseudo colouring technique is to assign different colours to different ranges of intensity values, and the simplest approach in which this pseudo colouring can be done is by making use of intensity slices. What we can do is consider an intensity image to be a 3D surface.

(Refer Slide Time: 5:02)

So, as shown in this particular slide, given an intensity image f(x, y), which is a function of x and y, if we consider the different intensity values at the different locations (x, y) to form a 3 dimensional surface, then we can place planes which are parallel to the image plane, that is parallel to the xy plane.

So, as shown in this particular diagram, we place such a plane at some intensity value, say l_i; at this intensity value l_i, we have placed a plane which is parallel to the xy plane. Now, you find that this particular plane, which is parallel to the xy plane, slices the intensities into 2 different halves.

Once I get these 2 different halves, I can assign different colours to the 2 different sides of this particular plane: on one side I can assign one particular colour, and on the other side I can assign another colour. So, this is the basic technique of pseudo colouring by intensity slicing: you slice the intensity levels and assign different colours to the different slices.

(Refer Slide Time: 6:47)

So, in our case, let us assume that the discrete intensity values in a black and white image vary from 0 to L - 1, so that we have a total of L intensity values in our image. We assume that the intensity level l_0 represents black, that is, the corresponding f(x, y) at locations (x, y) where the intensity is l_0 is equal to 0.

Similarly, we assume that the (L - 1)th intensity level is white, that is, the corresponding pixels f(x, y) have the value L - 1. And let us also assume that we draw P planes perpendicular to the intensity axis; perpendicular to the intensity axis means they are parallel to the image plane, and these planes are placed at the intensity values l_1, l_2, up to l_P. So, the first plane is placed at intensity value l_1, the second plane at intensity value l_2, and in this way the Pth plane is placed at intensity value l_P.

Obviously, in this case, the number of planes P lies between 0 and L - 1, where L is the number of gray level intensities that we have. Once we place these P planes perpendicular to the intensity axis, they divide the intensities into P + 1 intervals. Having divided the intensity range into P + 1 intervals, our colour assignment approach is that the colour assigned to a particular location (x, y) will be equal to c_k; instead of calling this assignment f, let me call it a function h.

(Refer Slide Time: 9:53)

So, the colour assigned to location (x, y), which is h(x, y), will be c_k if the corresponding intensity value f(x, y) at that location lies in the range V_k, where V_k is the intensity interval defined by the planes placed at the levels l_k and l_(k+1). As we said, there are P planes, and these P planes divide our intensity range into P + 1 intervals, which we call V_1, V_2, up to V_(P+1).

So, we assign a colour c_k to a particular location (x, y); that is, we write h(x, y) = c_k if the intensity value f(x, y) at the corresponding location belongs to the interval V_k. By using this simple concept, that is, dividing the intensity range into a number of intervals and assigning to each location in the intensity image a colour determined by the interval in which the intensity at that location lies, what we get is a pseudo coloured image. Let us see some examples of such pseudo coloured images.
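The slicing rule h(x, y) = c_k for f(x, y) in V_k can be sketched very compactly in Python. The sketch below assumes the gray scale image is an 8 bit NumPy array; the particular slicing levels and the colour table in the example are hypothetical and only serve to illustrate the idea.

```python
import numpy as np

def intensity_slice(image, levels, colours):
    """Pseudo-colour a grayscale image by intensity slicing.

    `levels` is the sorted list of slicing levels l_1 < l_2 < ... < l_P and
    `colours` is a list of P + 1 RGB triples, one colour c_k per interval V_k.
    The image is assumed to be a 2-D uint8 NumPy array.
    """
    image = np.asarray(image)
    # For every pixel, find the index k of the interval V_k its gray value falls into.
    k = np.digitize(image, bins=np.asarray(levels))
    lut = np.asarray(colours, dtype=np.uint8)      # (P + 1) x 3 colour table
    return lut[k]                                  # H x W x 3 pseudo-coloured image

# Example with 3 hypothetical slicing levels and 4 colours:
gray = np.arange(256, dtype=np.uint8).reshape(16, 16)
pseudo = intensity_slice(gray, levels=[64, 128, 192],
                         colours=[(0, 0, 255), (0, 255, 0),
                                  (255, 255, 0), (255, 0, 0)])
print(pseudo.shape)   # (16, 16, 3)
```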

(Refer Slide Time: 12:22)

Here, on the left hand side, we have an intensity or black and white image; if I apply the pseudo colouring technique, the pseudo coloured image is as shown on the right hand side. Similarly, the bottom left image is an enhanced version of the same image, and if I apply the pseudo colouring technique to this black and white image, the corresponding pseudo coloured image is given on the right hand side.

So, here you find that interpretation of the pseudo coloured image, or the distinction between different intensity levels in the pseudo coloured image, is much easier than in the corresponding intensity or gray scale image.

(Refer Slide Time: 13:10)

Now, this application is more prominent in this particular diagram. Here again, on the left hand side we have an intensity or gray scale image, and you find that in these regions the intensity values appear to be more or less flat; that means I cannot distinguish between the different intensity levels present in this particular image. Whereas on the right hand side, after pseudo colouring, the different colours which are assigned to different intensity levels in the black and white image clearly tell us the different regions of different intensity values in this particular black and white image.

The other application of the pseudo colouring technique is gray to colour transformation. So far, we have assigned different colours to different intensity intervals. When we go for gray scale to colour transformation, what we have to do is take the intensity or gray scale image, which corresponds to a single plane, and convert it to 3 different planes, that is the R, G and B or red, green and blue planes, and when those red, green and blue planes are combined together, they give the interpretation of a colour image.

(Refer Slide Time: 14:51)

So, that kind of gray to colour transformation can be done by using this type of transformation functions. Here, you find that our input image f(x, y) is an intensity or gray scale image. What we do is transform this gray scale image by 3 different transformations: one corresponds to the red transformation, another to the green transformation and the third to the blue transformation.

The red transformation generates the red plane of the image, given by f_R(x, y); the green transformation generates f_G(x, y), the green plane corresponding to the intensity image f(x, y); and the blue transformation generates f_B(x, y), the blue plane corresponding to the intensity image f(x, y).

So, when these 3 images, f_R(x, y), f_G(x, y) and f_B(x, y), the red, green and blue planes, are combined together and displayed on a colour display, what we get is a pseudo coloured image. But in this case, you find that colours are not assigned to different intensity ranges; rather, the colour of the entire image is decided by the corresponding transformation functions. So, the colour content of the colour image that we generate is determined by the transformation functions that we use.
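A minimal sketch of this gray level to colour transformation is given below. The three transformation functions are passed in as arguments; the particular scaling transformations shown in the example call are the ones used in the illustration discussed next (f_R = f, f_G = 0.33 f, f_B = 0.11 f), and the test image is just a synthetic gradient.

```python
import numpy as np

def gray_to_colour(image, t_red, t_green, t_blue):
    """Build a pseudo-coloured image from three gray-to-colour transformations.

    Each transformation maps the gray image f(x, y) to one colour plane;
    the three planes are then stacked into an RGB image.
    """
    f = np.asarray(image, dtype=float)
    rgb = np.stack([t_red(f), t_green(f), t_blue(f)], axis=-1)
    return np.clip(rgb, 0, 255).astype(np.uint8)

# The scaling transformations used in the example discussed next:
gray = np.arange(256, dtype=np.uint8).reshape(16, 16)
coloured = gray_to_colour(gray,
                          t_red=lambda f: f,           # f_R(x, y) = f(x, y)
                          t_green=lambda f: 0.33 * f,  # f_G(x, y) = 0.33 f(x, y)
                          t_blue=lambda f: 0.11 * f)   # f_B(x, y) = 0.11 f(x, y)
print(coloured.shape)   # (16, 16, 3)
```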

(Refer Slide Time: 16:47)

Now, let us see what kind of colour images we can obtain using this gray scale to colour transformation. In this diagram, on the left hand side, we have an intensity or black and white image which is transformed into a colour image; on the right hand side is the corresponding colour image, and the colour transformations that have been used are as follows. Here, we have used f_R(x, y) = f(x, y), that is, the black and white intensity image is simply copied to the red plane. The green plane is generated as f_G(x, y) = 0.33 f(x, y); that means the intensity value at any location in the original black and white image is divided by 3 and the resulting value is copied to the corresponding location in the green plane.

Similarly, f_B(x, y), the blue plane, is generated by multiplying the intensity image by 0.11, that is, dividing the intensity image by roughly 9. So, by these transformation functions, we have generated f_R, the red component, f_G, the green component, and f_B, the blue component, and when we combine these red, green and blue components, the corresponding colour image which is generated is like this.

Now, here you should remember one point: the colour image that is being generated is a pseudo coloured image. Obviously, it is not a full colour image, and the colour of the original scene is not recovered in this manner. The only purpose is that the different intensity regions will appear as different colours in our coloured image. So, this colouring is again a pseudo colouring; it is not the real colouring.

(Refer Slide Time: 19:07)

Now, we have another example of this pseudo colouring. Here, it is a natural scene where again,
on the left hand side, we have the intensity image or the black and white image, and when we go
for gray scale to colour transformation, the transformations are like this - here, the green
component is the same as the original intensity image. So, we have taken f G (x, y) equal to f (x, y)
where f (x, y) is the original intensity image, the red component is generated as one third of f (x, y)
and the blue component is generated as one ninth of f (x, y).

So, by generating the red, green and blue planes from the original f (x, y) in this manner and
combining them, the corresponding pseudo coloured image that we get is given on the right
hand side. So here, if I compare the earlier image with this one: in our earlier case, the coloured
image was showing more of the red component because there f R was the same as f (x, y)
whereas green and blue were scaled down versions of f (x, y), whereas in this particular case,
our pseudo coloured image appears to be green because here the green plane is the same as
f (x, y) whereas red and blue are taken as scaled down versions of f (x, y).

So, if we change the weightage of the transformation functions for the red, green and blue
planes, the colour appearance will again be different. So, a gray scale image can be converted to
a pseudo coloured image by applying different transformations to the red, green and blue
planes.

(Refer Slide Time: 21:13)

Now, many of you might have seen the x-ray security machines like the ones used in airports.
Here, on the left hand side, we have an x-ray image of a baggage which is screened by an x-ray
machine. If you have looked at the screen which the security people check, you will find that
the image appears in this particular form: the background appears as red and the different
garment bags appear as blue, of course with different shades, whereas there is a particular
region over here which again appears as red.

Now again, this is a pseudo colouring technique which is applied to obtain this kind of image and
the purpose is if you have a pseudo coloured image like this, you can distinguish between
different objects present in this particular image and in this particular case, normally the kind of
transformation functions for red, green and blue which are used are given like this.

(Refer Slide Time: 22:23)

The transformation functions are usually sinusoidal functions. So here, along the horizontal axis
we have the intensity values of the gray scale image, which vary from 0 to the maximum value
capital L minus 1. The top sinusoidal curve shows the red transformation, the middle one shows
the green transformation and the last one shows the blue transformation, and here you find that
each of these curves appears to be a fully rectified sinusoid, shifted from one another by a
certain amount, as if we have given some phase shift to these different sinusoidal curves.

Now, when the transformations are given like this, if you have an intensity value, say, somewhere
here, then the corresponding red component will be generated as this value, the corresponding
green component will be generated as this value and the corresponding blue component will be
generated as this value. So, this particular intensity level will be coded as a colour pixel having
a red component given by this much, a green component given by this much and a blue
component given by this much.

Now, what is done for this pseudo colouring purpose is that you define different bands of input
intensity values and the different bands correspond to different objects. For example, a band
somewhere here is for identification of, say, an explosive, a band somewhere here is for
identification of the garment bags and so on.

So here, you find that if this is the band which is used to detect the explosives, the amount of
red component which is generated by this particular band is the maximum. So, an explosive
will appear to be red, whereas for this other band, which is for the garment bags, the red
component is not as high as this; so a bag will not appear as red as an explosive.

So, different bands of intensity values are identified or specified for different types of items
and, by using this kind of transformation, we can distinguish between the different objects
which are there in the bags. So, by using this pseudo colouring technique, we can give different
colours to different intensity ranges and, as you have just seen, we can convert a gray scale or
intensity image to a colour image. As it is a pseudo coloured image, it will not really have the
exact colour components, but the pseudo colour image gives us the advantage that we can
distinguish between different objects present in the image from their colour appearance.
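
A minimal sketch of such sinusoidal pseudo colouring is given below. The exact frequencies and phase shifts used on the slide are not stated, so the values chosen here are assumptions purely for illustration.

    import numpy as np

    def sinusoid_pseudo_colour(gray, L=256, phases=(0.0, np.pi / 3, 2 * np.pi / 3)):
        # gray: 2D array of intensities in [0, L-1].
        # Each channel is a fully rectified sinusoid of the intensity,
        # shifted in phase with respect to the other channels.
        r = gray.astype(np.float64) / (L - 1)            # normalise to [0, 1]
        channels = [np.abs(np.sin(np.pi * r + p)) for p in phases]
        rgb = np.stack(channels, axis=-1) * (L - 1)
        return rgb.astype(np.uint8)

Changing the phase offsets moves the intensity bands that come out strongly red, green or blue, which is how different bands of intensities can be made to stand out in different colours.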

(Refer Slide Time: 26:04)

The next type of image processing techniques: so these are the 2 different pseudo colouring
techniques that we have discussed. The next kind of the image processing techniques that we
discuss is full colour image processing and as we have said that unlike in case of pseudo colour
techniques, in case of full colour image processing, what we will do is we will consider the
actual colours present in the image and as we have said that as there are different colour models,
a colour image can be specified in different colour models.

For example, a colour image can be specified in RGB colour space, and a colour image can also
be specified in HSI colour space. Now, because we have these different colour components for
any particular colour pixel, we can have 2 different categories of colour image processing; one
category is per colour plane processing. In this category, you process every individual colour
component of the colour image and then combine the different processed components together
to give you the processed colour image.

And, the other type of processing is by using the concept of vectors. So, as we have said that
every colour pixel has 3 colour components, so any colour can be considered as a vector. So, if it
is the colour specified in RGB space, then it is the vector drawn to the point which specifies the
colour from the origin of the RGB colour space.

So, there are 2 kinds of processing; one is per colour plane processing in which case, every plane
is processed independently and then the processed planes are combined together to give you the
processed colour output, and the other category of processing is when all the colour components
are processed together and there the different colours are considered as vectors.

So obviously, the colour at a particular point (x, y), C (x, y), if it is specified in RGB colour
space, will have 3 colour components; one is the red component at location (x, y) given by
R (x, y), another is the green component at location (x, y) given by G (x, y) and the other one is
the blue component at location (x, y) which is given by B (x, y).

So, every colour is represented by vector and the processing is done by considering these
vectors. That means all the colour components are considered together for processing purpose.
So, accordingly we will have two types of colour processing techniques.

(Refer Slide Time: 29:45)

The first kind of processing that we will consider is what we call as colour transformation. Now,
you may recall from our discussion with the gray scale images or black and white images that
where we have defined a number of transformations for enhancement purpose and there we have
defined the transformation as say s is equal to some transformation T of r where r is an intensity
value, intensity at a location in the input image f (x, y) and s is the transformed intensity value in
the corresponding location of the processed image g (x, y) and there the transformation function
was given by S is equal to T (r). Now, we can extend the same idea in our colour processing
techniques.

The extension is like this - in case of an intensity image, we had only one component, that is the
intensity component; in case of a colour image, we have more than one component, that may be
the R, G and B components if the colour is specified in RGB space or the H, S and I components
if the colour is specified in HSI space. Correspondingly, we can extend the transformation in
case of colour as s i is equal to some transformation function T i of r 1 , r 2 upto r n for i equal
to 1, 2 upto n.

So here, we assume that every colour is specified by an n component vector having values r 1 to
r n . s i is a colour component in the processed image G (x, y) and every r i is a colour component
in the input colour image f (x, y). Here, n is the number of components in this colour
specification and T i , that is T 1 to T n , is actually the set of transformations or colour mapping
functions that operate on r i to produce s i .

Now, if we are going for RGB colour space or HSI colour space; then actually, the value of n is
equal to 3 because in all these cases, we have three different components.

(Refer Slide Time: 33:16)

Now, the first application of this colour transformation that we will consider is intensity
modification. As we can represent a colour in different colour models or different colour
spaces, theoretically it is possible that every kind of colour processing can be done in any of
those colour spaces or using any of those colour models.

However, it is possible that some processing, some kind of operation is more convenient in some
colour space but it is less convenient in some other colour space. However, in such cases, we
have to consider the cost of converting the colours from one colour model to another colour
model.

(Refer Slide Time: 34:18)

Say for example, in this particular case, you find that if I have a colour image which is given in
RGB colour space, these are the different colour planes of the same image; this is the red plane,
this is the green plane and this is the blue plane. So, this colour image has these 3 different
colour planes in the RGB model. Similarly, the same image can also be represented in HSI
colour space, where the left most image gives you the hue component, this gives the saturation
component and this gives the intensity component.

Now, from this figure it is quite apparent, as we claimed earlier, that it is the intensity component
in the HSI model which carries the achromatic notion of brightness of the image. So here, you
find that this actually indicates what should be the corresponding black and white image for this
colour image. So, as we can represent a colour image in these different models, it is
theoretically possible that any kind of operation can be performed in any of these models.

(Refer Slide Time: 35:49)

Now, as we said, the first application that we are talking about is intensity modification. This
intensity modification transformation is simply like this – say, G (x, y) is equal to some constant
k times f (x, y), where f (x, y) is the input image and G (x, y) is the processed image, and in this
particular case, if we are going for intensity reduction, the value of k lies between 0 and 1.

Now, as we said, this operation can be done in different colour spaces; so, if we consider the
RGB colour space, then our transformation will be s i equal to the same constant k times r i for
i equal to 1, 2 and 3, where index 1 is used to indicate the red component, index 2 is used to
indicate the green component and index 3 is used to indicate the blue component.

(Refer Slide Time: 36:51)

So, this indicates that all the different colour planes - the red plane, green plane and blue plane -
are to be scaled by the same scale factor, whereas if I do the same transformation in HSI space,
then, as we said, the intensity information is contained only in the I component. So, the only
transformation that will be needed in this particular case is s 3 equal to the constant k times r 3 ,
whereas the other 2 components corresponding to hue and saturation can remain the same.

So, we will have s 1 equal to r 1 , that is, the hue of the processed image will remain the same as
the hue of the input image. We have s 2 equal to r 2 , that is, the saturation of the processed
image will remain the same as the saturation of the input image. Only the intensity component
will be scaled by the scale factor k. If we perform a similar operation in CMY space, then the
equivalent operation in CMY space will be given by s i equal to the constant k times r i plus
(1 minus k), and this has to be done for all i, that is for all the C, M and Y planes.

So, if I compare the operations that we have to do in RGB colour plane, RGB space, the
operation in HSI space and the operation in CMY space; you find that the operation in HSI space
is the minimum of these 3 different spaces because here only the intensity value is to be scaled,
hue and saturation value remain unchanged whereas both in RGB and CMY space, you have to
scale all the three different planes.
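
The two alternatives can be compared with a small sketch. The RGB version scales all three planes, while the HSI version scales only the intensity plane; the helper functions rgb_to_hsi and hsi_to_rgb are assumed to exist and are not shown here, and k = 0.7 is just an example value.

    import numpy as np

    def scale_intensity_rgb(rgb, k=0.7):
        # s_i = k * r_i for i = 1, 2, 3: every plane is scaled by the same factor.
        return np.clip(k * rgb.astype(np.float64), 0, 255).astype(np.uint8)

    def scale_intensity_hsi(rgb, k=0.7):
        # Only the I plane is scaled; H and S are left unchanged.
        # rgb_to_hsi / hsi_to_rgb are assumed helper conversions (I in [0, 1]).
        h, s, i = rgb_to_hsi(rgb)
        return hsi_to_rgb(h, s, np.clip(k * i, 0.0, 1.0))

The HSI version touches one plane instead of three, but, as noted below, the conversion to and from HSI has its own cost.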

However, as we said, though the transformation has minimum complexity in the HSI space, we
also have to consider the complexity of converting from RGB to HSI or from CMY to HSI,
because that conversion also has to be taken into consideration. Now, if I apply this kind of
transformation, then the transformed image that we get is something like this.

(Refer Slide Time: 40:09)

Here, the operation has been done in the HSI space. On the left hand side, we have the input
image and on the right hand side, we have the intensity modified image. So, this is the image for
which the intensity has been modified by a scale factor of around 0.5. So, we find that both the
saturation and hue, they appear to be the same but only the intensity value in this particular case
has been changed. Of course, this equivalent operation can as we said, can also be obtained in
case of RGB plane as well as in CMY plane. But there the transformation operation will take
more computation than the transformation operation in case of HSI plane where we have to scale
only the intensity component keeping the other components intact.

(Refer Slide Time: 41:14)

The next application of this full colour image processing that we will consider is colour
complements. To define colour complements, let us first look at a colour circle.

(Refer Slide Time: 41:42)

So, this is the colour circle. You find that in this particular colour circle, if I take the colour at
any point on the circle, the colour which is located at the diagonally opposite location on the
circle is the complement of that colour. So, as shown in this figure, if I take a colour on this
colour circle, its complement is given by the colour on this side and, similarly in reverse, the
colour on this side has as its complement the colour on the other side.

So, this simply says that hues which are directly opposite to one another on the colour circle are
complements of each other. Now, this colour complement, as we have said, is analogous to the
gray scale negative. When we talked about gray scale or intensity image processing, we also
talked about the negative operation; this colour complement is analogous to that gray scale
negative operation.

So, if I take the same operation which we had used in case of a gray scale image to obtain its
negative and apply that transformation to all the R, G and B planes of a colour image
represented in RGB space, then what I get is the complement of the colour image, which is
truly the negative of the colour image.

(Refer Slide Time: 43:34)

So, these complement colour images can be obtained by a transformation function of this form.
In case of an intensity image, we had a single transformation, but in case of a colour image, I
have to apply this same transformation on all the colour planes; that is, s i equal to T of r i which
is equal to L minus 1 minus r i , for all values of i, that means for i equal to 1, 2 and 3. That is,
for all the colour planes – red, green and blue – I have to apply this same transformation.
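
A minimal sketch of this colour negative, assuming an 8 bit per plane RGB image so that L = 256:

    import numpy as np

    def colour_complement(rgb, L=256):
        # s_i = (L - 1) - r_i applied to every plane (red, green and blue).
        rgb = np.asarray(rgb, dtype=np.uint8)
        return (L - 1) - rgb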

(Refer Slide Time: 44:14)

So, by applying this, I get an image like this. Here, you find that on the left hand side, I have a
colour image and on the right hand side, by applying the same transformation on all the 3
planes, that is the red, green and blue planes, I get a complement image; you find that this is the
same as the
photographic negative of a colour image.

(Refer Slide Time: 44:40)

In the same manner, this is another colour image and if I apply the same transformation to the
red, green and blue components of this particular colour image, then I get the corresponding
negative or complement colour image as shown on the right hand side.

(Refer Slide Time: 45:05)

The next application that we will consider of this full colour image processing is colour slicing.
You will recall that in case of gray scale images, we have said that the application of intensity
slicing is to highlight regions of certain intensity ranges. In the same manner, the application of
colour slicing in case of a colour image is to highlight certain colour ranges, and this is useful
for identification of objects of a certain colour from the background or to differentiate objects of
one colour from objects of some other colour.

The simplest form of colour slicing can be that we can assume that all the colours of interest lies
within the cube of width say W and this cube is centered at a prototypical colour whose
components are given by some vectors say a 1 a 2 and a 3 ; so, as given in this particular diagram.

(Refer Slide Time: 46:23)

So here, I assume that I have this cube of width W and the colours of interest are contained
within this cube, and the center of this cube is at a prototypical colour which is given by the
colour components a 1 , a 2 and a 3 . The simplest type of transformation that we can apply is of
this form: s i is equal to 0.5 if the magnitude of r j minus a j is greater than W by 2 for any value
of j between 1 and 3, and s i is equal to r i otherwise, and this computation has to be done for all
values of i, i equal to 1, 2 and 3.

(Refer Slide Time: 46:45)

So, what it means is that all those colours which lie outside this cube of width W centered at
location (a 1 , a 2 , a 3 ) will be represented by some insignificant colour in which all the red,
green and blue components attain a value of 0.5, whereas inside the cube, the original colour is
retained.
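
A minimal sketch of this cube based colour slicing, assuming colour components normalised to the range [0, 1]; the prototype colour a and the width w used below are arbitrary example values:

    import numpy as np

    def colour_slice(rgb, a=(0.6, 0.2, 0.2), w=0.5):
        # rgb: float array in [0, 1] of shape (H, W, 3).
        # A pixel keeps its colour only if every component lies within w/2
        # of the prototype colour a; otherwise it is set to the neutral 0.5.
        a = np.asarray(a, dtype=np.float64)
        outside = np.any(np.abs(rgb - a) > w / 2.0, axis=-1)   # True outside the cube
        out = rgb.copy()
        out[outside] = 0.5
        return out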

(Refer Slide Time: 48:09)

So, by using this transformation, you find that from this colour image, if I want to extract the
regions which are near to red, then I get all those red regions extracted in the image on the right,
while all other points, where the colour is away from red, have got a gray shade. Now, for this
kind of application, instead of considering all the colours of interest lying within a cube, we can
also consider all the colours to be lying within a sphere centered at location a 1 , a 2 and a 3 .

So here, it is needless to say that the vector at the centre location (a 1 , a 2 , a 3 ) tells you what the
colour of interest is, and the width of the cube or the radius of the sphere, whichever may be the
case, tells us how much variation from this prototype colour we still consider to be of interest.

(Refer Slide Time: 49:18)

The other kind of application of this full colour image processing is the correction of tones or
tone correction. Now again, I can find an analogue in the simple black and white intensity
image, where we have said that an image can be dark, it can be light or bright, or it can be of
low contrast, depending upon the distribution of the intensity values.

In the same manner, for colour images, we define the tone. So, a colour image may have a flat
tone, it may have a light tone or it may have a dark tone and these tones are determined by the
distribution of the intensity values of different RGB components within the image.

(Refer Slide Time: 50:26)

So, let us see how these images look in case of a colour image. Here, we find that on the left we
have shown an image which is flat in nature, in the middle we have an image which has a light
tone and on the extreme right we have an image which has a dark tone.

Now, the question is how we can correct the tone of this colour image? Again we can apply
similar type of transformations as we have done in case of intensity image for contrast
enhancement.

(Refer Slide Time: 51:06)

So, the kind of transformations that can be applied here is something like this. If an image is
flat, the kind of transformation function that we can use for this flat image is of this form. So
here, it is L minus 1, and here also it is L minus 1. If you apply this type of transformation to all
the red, green and blue components of the flat image, what we get is a corrected image.

Similarly, for an image whose tone is light, we can also apply a kind of transformation. Here,
what needs to be done is to make the image appear darker; that will be the corrected image. So,
the kind of transformation that we can apply is something like this; here it is L minus 1, and
here also it is L minus 1. What happens here is that a wide range of intensity values in the input
image is mapped to a narrow range of intensity values in the output image. So, that gives you
the tonal correction for an image which is light.

Similarly, for the image which is dark, the kind of transformation that can be applied here is just
reverse of this. So, the transformation that will apply in this case will have this type of nature.
So, here we will have L minus 1 that is the maximum intensity value, here also we have L minus
1 that is the maximum intensity value.

So here, the kind of operation that we are doing is that a narrow range of intensities in the input
image is mapped to a wide range of intensities in the output image. So, by applying these types
of transformations, we can go for tonal correction of colour images. Of course, the other kind of
transformation, that is histogram based processing, can also be applied to colour images, where
histogram equalization or histogram matching kinds of techniques can be applied on the
different colour planes - the red, green and blue planes of the input colour image - and of
course, in many such cases, it is necessary that after the processing, the processed image that
you get is balanced in terms of colours.
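
The exact correction curves shown on the slides are not specified numerically, so the sketch below uses a simple power law (gamma) curve as one possible stand-in: a gamma greater than 1 darkens a light toned image, while a gamma less than 1 brightens a dark toned image. This choice of curve is an assumption for illustration, not the curve used in the lecture.

    import numpy as np

    def tone_correct(rgb, gamma=1.8, L=256):
        # Apply the same power law curve to the red, green and blue planes.
        # gamma > 1 compresses a light image towards darker values;
        # gamma < 1 expands a dark image towards brighter values.
        r = rgb.astype(np.float64) / (L - 1)
        return ((r ** gamma) * (L - 1)).astype(np.uint8)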

So, all this different colour image processing techniques that we have discussed till now, you
find that they are equivalent to point processing techniques that we have discussed in connection
with our intensity images or black and white images. Now, in case of our intensity image, we
have also discussed another kind of processing technique that is the neighborhood processing
technique.

A similar neighborhood processing technique can also be applied in case of colour images,
where for processing an image, it is not only the colour at a particular pixel location that we
will consider but we also consider the colours at the neighboring pixel locations.

(Refer Slide Time: 54:34)

So, we will talk about 2 such processing operations. The first one that we will consider in this
category is the smoothing operation. For this smoothing, the colour component c bar (x, y) in
the smoothed image will be given by 1 over K times the summation of c (x, y), where this
c (x, y) is actually a vector having 3 components - in RGB space these will be the red, green
and blue components - and this averaging has to be done over all locations which lie in the
neighborhood N x, y of the point (x, y), K being the number of pixels in that neighborhood.

So here, I can simply do this operation in a plane wise manner, where we can write that
c bar (x, y) is nothing but 1 over K into the summation of R (x, y) over the neighborhood N x, y ;
similarly, 1 over K into the summation of G (x, y), again over the neighborhood N x, y ; and
1 over K into the summation of B (x, y), where this summation is again carried out over the
same neighborhood of (x, y). The average of these vectors gives us what is called the smoothed
image.
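
A plane wise sketch of this averaging, using a 5 by 5 neighbourhood as in the result shown next; scipy.ndimage.uniform_filter is used here simply as a convenient box average.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def smooth_colour(rgb, size=5):
        # Average each plane (R, G and B) independently over a size x size
        # neighbourhood; this is the same as averaging the colour vectors.
        out = np.empty_like(rgb, dtype=np.float64)
        for c in range(3):
            out[..., c] = uniform_filter(rgb[..., c].astype(np.float64), size=size)
        return out.astype(np.uint8)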

(Refer Slide Time: 56:31)

So, for the smoothed image in this particular case, you find that on the left hand side we have
the original colour image and on the right hand side we have the smoothed image, where this
smoothing is carried out over a neighborhood of size 5 by 5.

So, as we have done the smoothing operation, in the same manner we can also go for a
sharpening operation, and we have discussed in connection with our intensity images that an
image can be sharpened by using second order derivative operators like the Laplacian operator.
So here again, if I apply the Laplacian operator on all 3 planes - the red plane, green plane and
blue plane - separately and then combine those results, what I get is a sharpened image.
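
A corresponding sketch of plane wise Laplacian sharpening; the 4 neighbour Laplacian mask and the unit sharpening weight below are assumptions chosen only for illustration.

    import numpy as np
    from scipy.ndimage import convolve

    LAPLACIAN = np.array([[0, 1, 0],
                          [1, -4, 1],
                          [0, 1, 0]], dtype=np.float64)

    def sharpen_colour(rgb, weight=1.0):
        # Sharpen by subtracting the Laplacian of each plane from that plane.
        out = np.empty_like(rgb, dtype=np.float64)
        for c in range(3):
            plane = rgb[..., c].astype(np.float64)
            out[..., c] = plane - weight * convolve(plane, LAPLACIAN)
        return np.clip(out, 0, 255).astype(np.uint8)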

(Refer Slide Time: 57:28)

So, by applying that kind of sharpening operation, a sharpened image can appear something like
this. So here, on left hand side we have shown the original image and on the right hand side, you
find that the image is much sharper than the image on the left.

Now, when you come to these neighborhood operations like image smoothing or image
sharpening, the type of operation that we have discussed is per colour plane operation. That is,
every individual colour plane is operated on individually and then those processed colour planes
are combined to give you the processed colour image.

Now, as we said, the same operation can also be done by considering the vectors. Or if I do the
same operation in the HSI colour plane where we can modify only the intensity component
keeping the H and S components unchanged; in such cases, the results that you obtain in the
RGB plane and the result that you obtain in case of HSI plane may be different and I give you as
an exercise to find out why this difference should come. So, with this we finish our discussion on
colour image processing.

(Refer Slide Time: 58:48)

Now, let us see some of the questions on today’s lecture. The first question is what is meant by
complement of a colour? The second question is what is the complement of red? Third question,
find the transformation in HSI colour space to obtain colour negatives? Fourth question, what is
the use of colour slicing? Fifth question, a colour image has light tone, what type of
transformation should be used to correct the tone? Sixth question, do you expect any difference
in output when image smoothing operation is carried out in RGB space and HSI space?

Thank you.

Digital Image Processing
Prof. P.K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 29
Image Segmentation - 1
Hello, welcome to the video lecture series on digital image processing. From today, we are going
to start our discussion on image segmentation.

(Refer Slide Time: 1:06)

In the last class, what we have done is we have discussed about the colour image processing, we
have talked about what is full colour image processing and under full colour image processing
techniques, we have discussed about various topics like colour transformations, colour
complements or colour negatives, we have talked about colour slicing, we have talked about
what is meant by tone of a colour image and how can we go for tone and colour corrections in
case there is any defect in the image tone. Then, we have also talked about processing techniques
like colour image smoothing and we have also talked about how to sharpen a particular colour
image.

Now, you find that till now, the type of discussions that we have done with respect to our digital
image processing; there our intention was to improve the quality of the image as far as the
visualization is concerned. Say for example, we have tried to sharpen the image, we have tried to
enhance the image, we have tried to reduce the noise component in an image. So, in all these
different techniques that we have discussed; our input was an original image and the output was
a processed image. But our aim was that the processing techniques should be such that the output
image is better visually than the input image.

Now today, when we start our discussion on image segmentation, we are going to talk about or
we are going to start our discussion on another domain of image processing which is called
image analysis. So here, our aim will be to extract some information from the images so that
those informations can be used for high level image understanding operation.

(Refer Slide Time: 3:14)

So, in today’s discussion, we will see what is image segmentation, we will talk about what are
different approaches of image segmentation and we will see that image segmentation is mainly
categorized into one of the two categories; the segmentation is either discontinuity based or the
segmentation is region based.

Then, we will talk about different edge detection operations and these edge detection operations
are useful for the discontinuity based image segmentation technique. Then we will see how to
link those edge points which are extracted through different edge detection operators so that we
can get a meaningful edge, and under this linking of edge points,
we will talk about two specific techniques; one is the local processing technique, other one is the
global processing or Hough transformation based technique.

Now, let us see what is meant by image segmentation. By image segmentation, what you mean is
a process of subdividing an image into the constituent parts or objects in the image. So, the main
purpose of subdividing an image into its constituent parts or objects present in the image is that
we can further analyze each of these constituents or each of the objects present in the image once
they are identified or we have subdivided them. So, each of this constituents can be analyzed to
extract some information so that those informations are useful for high level machine vision
applications.
Now, when you say that segmentation is nothing but a process of subdivision of an image into its
constituent parts; a question naturally arises that at which level this subdivision should stop?
That is, what is our level of segmentation? Naturally, the subdivision or the level of subdivision
or the level of segmentation is application dependent. Say for example, suppose we are
interested in detecting the movement of vehicles on a road; so on a busy road, we want to find
out what is the movement pattern of different vehicles, and the image that is given is an aerial
image taken either from a satellite or from a helicopter.

So in this particular case, our interest is to detect the moving vehicles on the road. So, the first
level of segmentation or the first level of subdivision should be to extract the road from those
aerial images and once we identify the roads, then we have to go for further analysis of the road
so that we can identify every individual vehicle on the road and once we have identified the
vehicles, then we can go for vehicle motion analysis.

So here, you find that in this particular application, though an aerial image will contain a large
area; many of the areas will have information from the residential complexes, many of the areas
will have information of water bodies, say for example a sea or river or a pond, many of the
areas will contain information of agricultural lands but our application says that we are not
interested in water bodies, we are not interested in residential areas, neither we are interested in
agricultural lands. But we are only interested in the road segment and once we identify the road
segment, then we have to go further subdivision of the road so that we can identify each and
every vehicle on the road.

So, as I said that our subdivision of an image at the first level should stop after we are able to
extract or identify the road component, the road segments and after that we have to subdivide the
road component to identify the vehicles and we need not go for segmentation of the vehicle in its
constituent parts because that is not of our interest. Similarly, we should not or we need not
segment or analyze the residential complexes or water bodies or agriculture lands for further
subdivision into its constituent parts.

So, as we said that this segmentation or level of subdivision is application dependent; now for
any automated system, what we should have is automatic processes which should be able to
subdivide an image or segment an image to our desired level. So, you will appreciate that image
segmentation is one of the most important task in machine vision applications. At the same time,
image segmentation is also one of the most difficult tasks in this image analysis process and we
will easily appreciate that the success of the image analysis operations or machine vision
applications is highly dependent on the success of the autonomous segmentation of objects or
segmentation of an image.

So, this image segmentation, though it is very difficult but it is a very very important task and
every machine vision application software or system should have a very very robust image
segmentation algorithm. So now, let us see that what are the different image segmentation
algorithms or techniques that we can have.
(Refer Slide Time: 9:40)

Now, as we have just mentioned that image segmentation approaches are mainly of two different
types, so we have two different approaches of image segmentation; one of the approach as we
have just said is the discontinuity based approach and the second approach is what is called
similarity based approach.

In discontinuity based approach, the partition or subdivision of an image is carried out based on
some abrupt changes in intensity levels in an image or abrupt changes in gray levels of an image.
So, in the discontinuity based approach, we are mainly interested in the identification of isolated
points, the identification of lines present in the image or the identification of edges.

In the similarity based approach, the approach is slightly different. Here, what we try to do is we
try to group those pixels in an image which are similar in some sense. So, the simplest approach
under this similarity based technique is what is called thresholding operation. So, by thresholding
what we mean is, as we have already said, that if we have images where every pixel is coded
with 8 bits, then we can have intensities varying from 0 to 255, and we can decide a threshold
following some criterion; say we decide that we will have a threshold level of 128. Then all the
pixels having an intensity value greater than 128 will belong to some region, whereas all the
pixels having intensity values less than 128 will belong to some other region. So, this is the
simplest thresholding operation that can be used for image segmentation purpose.
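
A minimal sketch of this thresholding operation, with the example threshold of 128 mentioned above:

    import numpy as np

    def threshold_segment(gray, t=128):
        # Pixels with intensity greater than t go to one region (label 1),
        # all remaining pixels go to the other region (label 0).
        return (gray > t).astype(np.uint8)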

The other kind of segmentation under this similarity based approach can be a region growing
based approach. The way this region growing approach works is that we start from a particular
pixel in an image and then group all other pixels which are connected to this particular pixel,
that is, the pixels which are adjacent to this particular pixel and which are similar in intensity
value.
So, our approach is that you start from a particular pixel and all other pixels which are adjacent
to this particular pixel and which are similar in some sense; in the simplest cases, similar in some
sense means we say that the intensity value of that adjacent pixel is almost same as the intensity
value of the pixel from where we have started growing the region. So, starting from this
particular pixel, you try to grow the region based on connectivity or based on adjacency and
similarity. So, this is what is the region growing based approach.
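
A compact sketch of this region growing idea, using 4 connectivity and a simple intensity difference test against the seed pixel; the tolerance value is an assumption for illustration.

    import numpy as np
    from collections import deque

    def region_grow(gray, seed, tol=10):
        # Grow a region from the seed pixel, adding adjacent (4-connected)
        # pixels whose intensity is within tol of the seed intensity.
        h, w = gray.shape
        seed_val = int(gray[seed])
        region = np.zeros((h, w), dtype=bool)
        region[seed] = True
        queue = deque([seed])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                    if abs(int(gray[ny, nx]) - seed_val) <= tol:
                        region[ny, nx] = True
                        queue.append((ny, nx))
        return region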

The other approach under this similarity based technique is called region splitting and merging.
So, under this region splitting and merging, what is done is first you split the image into a
number of different components following some criteria and after you have split the image into a
number of smaller size sub images or smaller size components, then you try to merge some of
those sub images which are adjacent and which are similar in some sense.

So, the first operation is to split the image into smaller sub images and then try to merge those
smaller sub images wherever possible to have larger segments. So, these are the
the different segmentation approaches that we can have and in today’s discussion and in
subsequent discussion, we will try to see details of these different techniques. So first, we will
start our discussion on this discontinuity based image segmentation approach.

(Refer Slide Time: 15:45)

So, as we have already said, in the discontinuity based image segmentation approach, our
interest is mainly to identify the isolated points, the edges or the lines present in the image, and
for detection of these kinds of discontinuities, that is for detection of points, lines or edges, the
kind of approach that we will take is the use of a mask.

So, using the masks, we will try to detect isolated points or we will try to detect the lines present
in the image or we will try to detect the edges in the image.
(Refer Slide Time: 16:43)

Now, this use of masks we have discussed earlier in connection with our discussion of image
processing operations like image smoothing, image sharpening, image enhancement and so on.
There we have said that we consider a 3 by 3 neighborhood like this and take a mask of size
3 by 3. So here, on the right hand side, this is a mask of size 3 by 3 having different coefficient
values given as W (minus 1, minus 1), W (minus 1, 0), W (minus 1, 1) and so on, with the center
coefficient of the mask having the value W (0, 0).

Now, in this mask processing operation, what is done is that you shift this mask over the entire
image to calculate a weighted sum of pixels at each particular location. Say for example, if I
place this mask at a location (x, y) in our original image, then using the different mask
coefficients, we try to find out a weighted sum like this - R equal to the summation of W (i, j)
into f (x plus i, y plus j), where i varies from minus 1 to 1 and j varies from minus 1 to 1, and
this quantity we call the value R. The use of this mask, as I have said, we have seen in
connection with image sharpening, where we have taken different values of the coefficients.

In case of image smoothing, we have taken the values of the mask coefficients to be all ones. So,
that leads to an image averaging operation. So, depending upon what are the coefficient values of
this mask that we choose, we can have different types of image processing operations.
(Refer Slide Time: 18:53)

Now here, you find that when I use this mask, then depending upon the nature of the image
around point (x, y), I will have different values of R.

(Refer Slide Time: 19:10)

So, when it comes to isolated point detection, we can use a mask having coefficient values like
this: the center coefficient of the mask will have a value equal to 8 and all other coefficients of
the mask will have a value of minus 1. Now, we compute the value of R at the location (x, y) in
the image where the mask is centered, and we say that a point is detected at that location (x, y)
in the original image if the absolute value of the corresponding R is greater than a certain
threshold T, where this T is a nonnegative threshold.
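
A minimal sketch of this point detection test; scipy.ndimage.correlate computes the weighted sum R at every location, and the threshold value used below is only an example.

    import numpy as np
    from scipy.ndimage import correlate

    POINT_MASK = np.array([[-1, -1, -1],
                           [-1,  8, -1],
                           [-1, -1, -1]], dtype=np.float64)

    def detect_points(gray, t=200):
        # R(x, y) is the weighted sum over the 3 x 3 neighbourhood;
        # a point is flagged wherever |R| exceeds the nonnegative threshold t.
        r = correlate(gray.astype(np.float64), POINT_MASK)
        return np.abs(r) > t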

So, if the absolute value of R computed at the location (x, y) where this mask is centered is
greater than T, where T is a nonnegative threshold, then we say that an isolated point is detected
at the corresponding location (x, y). Similarly, for the detection of lines, the masks can be
something like this.
can be something like this.

(Refer Slide Time: 20:50)

Here, for detection of horizontal lines, you find that we have used a mask whose center or
middle row has all values equal to 1, while the top row and the bottom row have all coefficient
values equal to minus 1, and by moving this mask over the entire image, it detects all those
points which lie on a horizontal line.

Similarly, the other mask which is marked here as 45, if you move this mask over the entire
image, this mask will help to detect all the points in the image which are lying on a line which is
inclined at an angle of 45 degree. Similarly, this mask will help to detect all the points which are
lying on a line which is vertical and similarly this mask will detect all the points lying on a line
which is inclined at an angle of minus 45 degree. Now, for line detection, what is done is you
apply all these masks, all these 4 masks on the image.
(Refer Slide Time: 21:11)

And, if I take a particular mask, say the i'th mask, and any other mask, say the j'th mask, and if
I find that the absolute value of R i , the value computed with the i'th mask, is greater than the
absolute value of R j , the value computed with the j'th mask, for all j not equal to i; this says
that the corresponding point is more likely to be associated with a line in the direction of mask i.

So, as we said, what we are doing is that we take all the 4 masks, apply them on the image and
compute the value of R for each of these masks; now, if for the i'th mask I find that the absolute
value of R i is greater than the absolute value of R j for all j not equal to i, in that case we can
conclude that the particular point at which this is true is more likely to be contained on a line
which is in the direction of mask i.
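
A sketch of this comparison over the four line masks. The commonly used line detection masks, with 2 along the line direction and minus 1 elsewhere, are assumed here; the exact coefficients on the slide may differ.

    import numpy as np
    from scipy.ndimage import correlate

    LINE_MASKS = {
        'horizontal': np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]], dtype=np.float64),
        '+45':        np.array([[-1, -1, 2], [-1, 2, -1], [2, -1, -1]], dtype=np.float64),
        'vertical':   np.array([[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]], dtype=np.float64),
        '-45':        np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]], dtype=np.float64),
    }

    def line_direction(gray):
        # Compute |R_i| for each of the four masks and, at every pixel,
        # report the index of the mask with the strongest response.
        responses = np.stack([np.abs(correlate(gray.astype(np.float64), m))
                              for m in LINE_MASKS.values()])
        return np.argmax(responses, axis=0), list(LINE_MASKS.keys())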

So, these are the 2 approaches; the first one we have said, given a mask which is useful for
identification of isolated points and the second set of masks is useful for detection of points
which are lying on a straight line. Now, let us see that how you can detect an edge in an image.

Now, edge detection is the most commonly used approach for detection of discontinuities in an
image. We say that an edge is nothing but a boundary between 2 regions having distinct
intensity levels or distinct gray levels; it is the boundary between 2 regions in the image which
have distinct intensity levels, as is shown in this particular slide.
(Refer Slide Time: 24:47)

So, here you find that on the top, we have taken 2 typical cases. In the first case, we have shown
a typical image region where we have a transition from a dark region to a brighter region and
then again to a dark region. So, as you move from left to right, you find that you have transitions
from dark to bright, then again to dark and in the next one, we have a transition as we move from
left to right in the horizontal direction, there is a transition of intensity levels from bright to dark
and again to bright.

So, these are the typical scenarios in any intensity image where we will have different regions
having different intensity values and an edge is the boundary between such regions. Now here, in
this particular case, if I try to draw the profile, intensity profile along a horizontal line; you find
that here the intensity profile along a horizontal line will be something like this. So, you have a
transition from dark region to bright region, then from bright region to dark region whereas in
the second case, the transition will be in the other direction; so, bright to dark and again to bright.

So here, you find that we have modeled this transition as a gradual transition, not as an abrupt
transition. The reason is that, because of quantization and because of sampling, almost all the
abrupt transitions in the intensity levels are converted to such gradual transitions.
(Refer Slide Time: 26:51)

So, this is your intensity profile along a horizontal scan line. Now, let us see that if I differentiate
this, if I take the first derivative of this intensity profile; then the first derivative will appear like
this. In the first case, the first derivative of this intensity profile will be something like this and
the first derivative of the second profile will be something like this.

So, you find that the first derivative responds whenever there is a discontinuity in intensity
levels, that is, whenever there is a transition from a brighter intensity to a darker intensity or
wherever there is a transition from a darker intensity to a brighter intensity.

(Refer Slide Time: 27:54)


So, this is what we get with the first derivative. Now, if I take the second derivative, the second
derivative will appear something like this, and in the second case the second derivative will be
just the opposite; it will be of this form. So, you find that the first derivative is positive at the
leading edge whereas it is negative at the trailing edge, and similarly here; and if I take the
second derivative, you find that the second derivative is positive on the darker side of the edge
and negative on the brighter side of the edge, and that can be verified in both the situations - the
second derivative becomes positive on the darker side of the edge but it becomes negative on
the brighter side of the edge.

However, we will appreciate that this second derivative is very very sensitive to the noise present
in the image and that is the reason that the second derivative operators are not usually used for
edge detection operation. But as its nature suggests, we can use these second derivative
operators for extraction of some secondary information; that is, we can use the sign of the
second derivative to determine whether a point is lying on the darker side of the edge or on the
brighter side of the edge and not only that, here you find that there are some zero crossings in the
second derivative and this zero crossing information can be used to exactly identify the location
of an edge whenever there is a gradual transition of the intensity from dark to bright or from
bright to dark.

So, we have seen earlier that the derivative operators are used for image enhancement to
enhance the details present in the image. Now, we see that these derivative operations can also
be used for detection of the edges present in the image. Now, how do we apply these derivative
operations? Here, you find that if I want to apply the first derivative, then the first derivative can
be computed by using the gradient operation.

(Refer Slide Time: 30:46)

So, when I have an image, say f (x, y), I can define the gradient of this image f (x, y) in this
form. The gradient of this image f will be (G x , G y ); obviously, the gradient is a vector, so it
will be (G x , G y ), and this G x is nothing but del f del x and G y is del f del y. So, G x is the
partial derivative of f along the x direction and G y is the partial derivative of f along the y
direction. So, we can find out the gradient of the image f by doing this operation.

Now, for edge detection operation, what we are interested in is the magnitude of the gradient. So,
the magnitude of the gradient; we will write like this - grad f which is nothing but magnitude of
the vector grad f and which is nothing but G x square plus G y square and take the square root of it
and you find here that computation of the magnitude involves squaring the 2 components G x G y
adding them and then finally taking the square root of this addition.

Obviously, squaring and computing the square root; these 2 are computationally intensive
process. So, an approximation of this is taken as magnitude of the gradient to be sum of
magnitude of G x that is gradient in the x direction plus magnitude of G y that is gradient in the y
direction.

So, whether I take this magnitude of the gradient or its approximation, it tells us what the
strength of the edge at location (x, y) is; it does not tell us anything about the direction of the
edge at the point (x, y). So, we also have to compute the direction of the edge, that is, the
direction of the gradient vector.

(Refer Slide Time: 33:24)

And, the direction of the gradient vector at location (x, y) we can write as alpha (x, y) equal to
tan inverse of G y by G x , where G y , as we have said, is the gradient in the y direction and G x is
the gradient in the x direction. Now, you find that this alpha (x, y) tells us the direction of the
gradient, which is a vector; and the edge direction at that point is actually perpendicular to the
direction of the gradient vector.

So, we have the first derivative operators or the gradient operators and using that gradient
operators, we can find out what is the strength of an edge at a particular location (x, y) in the
image and we can also find out what is the direction of the edge at that particular location (x, y)
in the image and there are various ways in which this first derivative operators can be
implemented and here we will show some operators, some masks which can be used to compute
the image gradient.

(Refer Slide Time: 35:02)

So, the first one that we are showing is called a prewitt edge operator. You find that in case of
prewitt edge operator, we have 2 masks; one mask identifies the horizontal edges and the other
mask identifies the vertical edges. So, the mask which finds out the horizontal edges that is
equivalent to having the gradient in the vertical direction and the mask which computes the
vertical edges is equivalent to taking the gradient in the horizontal direction.

So, using these 2 masks, by passing these 2 masks over the intensity image, we can find out the
G x and G y component at different locations in the image and once we compute the G x and G y ,
we can find out what is the strength of an edge at that particular location and what is the
direction of an edge at that particular location.
(Refer Slide Time: 36:06)

The second mask, which is also a first derivative mask, is called the Sobel operator. Here again,
you find that we have 2 different masks; one mask is responsible for computation of the
horizontal edges, the other mask is responsible for computation of the vertical edges. Now, if
you try to compare the Prewitt edge operator and the Sobel edge operator, you find that the
Sobel edge operator gives an averaging effect over the image. So, because the Sobel edge
operator gives an averaging effect, the effect due to the presence of spurious noise in the image
is taken care of to some extent by the Sobel operator, but it is not taken care of by the Prewitt
operator. Now, let us see what kind of results we can have by using these edge detection
operators.
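
Before looking at the results, here is a small sketch of the Sobel gradient computation, giving the edge strength (using the |G x| + |G y| approximation) and the gradient direction at every pixel:

    import numpy as np
    from scipy.ndimage import correlate

    # Sobel masks: one responds to horizontal edges (gradient in y),
    # the other to vertical edges (gradient in x).
    SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)
    SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)

    def sobel_gradient(gray):
        f = gray.astype(np.float64)
        gx = correlate(f, SOBEL_X)
        gy = correlate(f, SOBEL_Y)
        strength = np.abs(gx) + np.abs(gy)      # approximation to the gradient magnitude
        direction = np.arctan2(gy, gx)          # alpha(x, y) = tan^-1(Gy / Gx)
        return strength, direction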

(Refer Slide Time: 37:10)


You find that here we have shown results on a particular image. On the top left, this is our
original image; on the top right is the edge information obtained using the Sobel operator, and
the edge components in this particular case are the horizontal components. The third image is
again obtained using the Sobel operator, but here the edge components are the vertical edge
components, and the fourth one is the result which is obtained by combining the vertical
components and the horizontal components.

So here, if you compare this image with your original image, you find that the different edges
present in the original image have been extracted by using this Sobel edge operator, and by
combining the output of the vertical mask and the output of the horizontal mask, we can
identify the edges which are there in various directions. So, that is what we have got in the
fourth image.

So, the Prewitt operator and the Sobel operator, as we have said, are basically first derivative
operators, and as we have already mentioned, for edge detection operation the kind of derivative
operators which are used are mainly the first derivative operators. Out of these 2 - the Prewitt
and Sobel operators - it is the Sobel operator which is generally preferred, because the Sobel
operator also gives a smoothing effect by which we can reduce the spurious edges which can be
generated because of the noise present in the image. We have also mentioned that we can use
the second derivative operator for edge detection operation, but the disadvantage of the second
derivative operator is that it is very very sensitive to noise.

And secondly, as we have seen, the second derivative operator gives us double edges: for every
transition, double edges are generated by the second derivative operator. So, these are the
reasons why second derivative operators are not normally preferred for edge detection
operation. But the second derivative operators can be used to extract the secondary information.

So, as we have said, by looking at the polarity of the second derivative operator output, we can
determine whether a point lies on the darker side of the edge or on the brighter side of the edge,
and the other information that we can obtain from the second derivative operator comes from
the zero crossing. We have seen that the second derivative operator always gives a zero
crossing between the positive side and the negative side, and the zero crossing points accurately
determine the location of an edge whenever the edge is a smooth edge.

So, those second derivative operators are not normally used for this detection operation but they
can be used for such a secondary information extraction. So, one such second derivative operator
is what is called the Laplacian operator.
(Refer Slide Time: 41:00)

We have seen the use of the Laplacian operator for enhancement of image details. Now, let us
see how this Laplacian operator can be used to help in edge detection operation. As you already
know, the Laplacian of the function f is given by del 2 f del x 2 plus del 2 f del y 2 , where
del 2 f del x 2 is the second derivative in the x direction and del 2 f del y 2 is the second
derivative in the y direction.

(Refer Slide Time: 42:08)

And we have also seen earlier that a mask which implements the second derivative operator is given by this, where we are considering only the horizontal direction and the vertical direction for computation of the second derivative. We have also discussed earlier that if, in addition to the horizontal and vertical directions, we also consider the diagonal directions for computation of the second derivative, then the center element will be equal to 8 and all the diagonal elements will also be equal to minus 1.

So, this is the mask we get if we consider, in addition to the horizontal and vertical directions, the diagonal directions for computation of the second derivative, and we can also have the inverse of this, where all the negative signs become positive and the positive sign becomes negative. This is how we can have a mask for computation of the second derivative, or the Laplacian, of the function f.
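As an illustration (a sketch only; the 3 by 3 mask with centre 4 is assumed to be the usual horizontal-plus-vertical version shown on the slide), these Laplacian masks can be written out directly:

```python
import numpy as np

# Laplacian mask using only the horizontal and vertical neighbours
laplacian_4 = np.array([[ 0, -1,  0],
                        [-1,  4, -1],
                        [ 0, -1,  0]], dtype=float)

# Laplacian mask that also includes the diagonal neighbours
laplacian_8 = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=float)

# The "inverse" masks simply have all the signs flipped
laplacian_4_inv = -laplacian_4
laplacian_8_inv = -laplacian_8
```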

But as we have said, this Laplacian operator is normally not used for the edge detection operation because it is very sensitive to noise and, secondly, it leads to double edges at every transition. It plays a secondary role: to determine whether a point lies on the brighter side or on the darker side of an edge, and to accurately locate the position of an edge.

Now, since the Laplacian operator is very sensitive to noise, to reduce the effect of noise what is done is that the image is first smoothed using a Gaussian operator, and that smoothed image is then operated on by the Laplacian operator. These 2 operations can be combined into a single operator which is called the Laplacian of Gaussian or LOG operator.

(Refer Slide Time: 44:34)

So, the essence of the LOG or Laplacian of Gaussian operator is this: we take a Gaussian operator, which can be represented as h (x, y) is equal to exponent of minus x square plus y square upon twice sigma square. This is a Gaussian operator having a standard deviation of sigma. Now, if we set x square plus y square equal to r square, then the Laplacian of this h, that is del square h, can be written in the form r square minus sigma square upon sigma to the power 4 into exponential of minus r square divided by 2 sigma square. So, as we said, our operation is: first we want to smooth the image using the Gaussian operator, and that smoothed image has to be operated on by the Laplacian operator; if these 2 operations are done one after another, this reduces the effect of the noise present in the image.

However, these 2 operations can be combined into a single Laplacian of Gaussian operation; that means we can operate on the image directly with the Laplacian of a Gaussian, and this gives us an equivalent result.
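As a quick check of this equivalence (a sketch only, assuming SciPy is available and that the image is a floating point NumPy array), the two-step version and the combined LOG filter can be compared directly:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace, gaussian_laplace

def log_two_step(image, sigma):
    """Smooth with a Gaussian first, then apply the Laplacian."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    return laplace(smoothed)

def log_combined(image, sigma):
    """Apply the combined Laplacian of Gaussian operator directly."""
    return gaussian_laplace(image.astype(float), sigma)

# Up to discretisation effects, both functions give the same response,
# which is what makes a single LOG mask a practical operator.
```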

(Refer Slide Time: 46:38)

Now, we find that in this slide, we have shown that this is our Laplacian operator, this is a
Gaussian mask in two dimensions and if I take the Laplacian of this; the Laplacian of the
Gaussian will appear as shown here. Now, this Laplacian of Gaussian can again be represented
in the form of a mask which is called a Laplacian of Gaussian mask. So, if I represent this
Laplacian of Gaussian in the form of a two dimensional mask, the Laplacian of Gaussian mask
appears like this.
(Refer Slide Time: 47:19)

So, here you find that our Laplacian of Gaussian mask, or LOG mask, that we have shown is a 5 by 5 mask. If you compare this with the Laplacian of Gaussian expression plotted as a surface, you find that at x equal to 0 the LOG is maximum positive, then it goes down to its negative extreme, and then moves back towards a value of 0. The same profile is obtained using this particular mask: at the center, the value is maximum positive, which is 16; just away from this it becomes minus 2; then it moves towards 0, becoming minus 1. So, if I apply this Laplacian of Gaussian mask to an image, I can detect the location of the edge points.
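For reference, here is a sketch of that 5 by 5 LOG mask written out explicitly and applied by convolution; the centre 16, the minus 2 and minus 1 values follow the mask described above, while the zero entries at the corners are the usual convention assumed here, and SciPy's convolve is assumed to be available.

```python
import numpy as np
from scipy.ndimage import convolve

# 5 x 5 Laplacian of Gaussian (LOG) mask as described on the slide
log_mask = np.array([[ 0,  0, -1,  0,  0],
                     [ 0, -1, -2, -1,  0],
                     [-1, -2, 16, -2, -1],
                     [ 0, -1, -2, -1,  0],
                     [ 0,  0, -1,  0,  0]], dtype=float)

def log_response(image):
    """Convolve the image with the 5 x 5 LOG mask."""
    return convolve(image.astype(float), log_mask)

# Edge locations are then taken at the zero crossings of this response.
```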

(Refer Slide Time: 48:48)


Coming to the location of the edge points: here in this particular slide, we have shown an image and, on the right hand side, the output that is obtained using the Sobel operator; the bottom one shows the output of the LOG operator.

So here, you find that all these bright edges are actually the locations of the edges present in the original image. This establishes, as we said earlier, that the LOG operator - the Laplacian of Gaussian operator - can determine the location of an edge present in an image. Now, whichever operator we use for detection of edges, whether the first derivative operators or the second derivative operators - and as we said, the second derivative operators are not normally used for the edge detection operation itself but to extract secondary information - the first derivative operators like Sobel should ideally give us all the edge points, that is, every transition from a bright region to a darker region or from a darker region to a brighter region.

But you find that when you take a real image, maybe because of noise or maybe because of non uniform illumination of the scene, when you apply the Sobel operator to it, the edge points that you get are not always connected. So, what we need to do is link the edge points to extract some meaningful edge information. Now, there are usually 2 approaches in which this linking can be done.

(Refer Slide Time: 51:04)

So, for edge linking, we can have 2 approaches; one is the local processing approach and the other one is the global processing approach. Whether we go for local processing or for global processing, our aim is the same: we want to link all those edge points which are similar in some sense so that we can get a meaningful edge description. So first, we will talk about the local processing approach for edge linking.
(Refer Slide Time: 52:00)

So first, let us talk about the local processing technique. In the local processing technique, what is done is that you take an image which is already edge operated; suppose, for the edge operation, the image has already been operated on by the Sobel edge operator. We then consider each and every point in that edge image.

So, let us take a point (x, y) in the image which is already operated on by the Sobel edge operator. Then, we will link to it all other points in that edge image which are in the neighborhood of (x, y) and which are similar to (x, y). When I say that 2 points are similar, we must have some similarity measure.

For this similarity measure, what we use is, first, the strength of the gradient operator response and, second, the direction of the gradient. These two together are taken as the similarity measure to decide whether we will say that 2 points are similar or not.
(Refer Slide Time: 53:48)

So, our operation will be something like this: we take a point, say (x dash, y dash), which is in the neighborhood of some point (x, y) in the image, and we say that these 2 points (x dash, y dash) and (x, y) are similar if grad f (x, y), that is the strength of the gradient at location (x, y), and grad f (x dash, y dash) are very close; that means their difference should be less than or equal to some non negative threshold T.

And, we also said that the directions should be similar; that means alpha (x, y) minus alpha (x dash, y dash) should be less than some angle threshold A. So, whenever we have a point (x dash, y dash) which is in some neighborhood of (x, y) and the points are similar, that means they have similar gradient magnitude values and similar edge orientation angles, we say that these 2 points are similar and they will be linked together, and this operation has to be done for each and every point in the edge detected image to give us a meaningful edge description.
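The following is a minimal sketch of this local linking test, assuming the gradient magnitude and direction arrays have already been computed (for example with the Sobel sketch shown earlier); the neighbourhood radius and the thresholds T and A are illustrative values you would tune for the application.

```python
import numpy as np

def similar(mag, ang, p, q, T=25.0, A=np.deg2rad(15)):
    """Decide whether edge pixels p and q can be linked.

    mag, ang : gradient magnitude and direction arrays
    p, q     : (row, col) pixel coordinates
    T, A     : magnitude and angle thresholds (illustrative values)
    """
    close_strength = abs(mag[p] - mag[q]) <= T
    close_direction = abs(ang[p] - ang[q]) <= A
    return close_strength and close_direction

def link_neighbours(mag, ang, p, radius=1, **kwargs):
    """Return the neighbours of p (within the given radius) that link to p."""
    rows, cols = mag.shape
    r0, c0 = p
    linked = []
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            if (r, c) != p and similar(mag, ang, p, (r, c), **kwargs):
                linked.append((r, c))
    return linked
```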

So, let us stop our discussion at this point today. We will continue with this discussion in our next class. Now, let us see some quiz questions on today's lecture.
(Refer Slide Time: 55:40)

The first question is: what is image segmentation? Second question: what are the basic approaches for segmenting an image? Third question: what is the difference between a line and an edge? Fourth question: why is the second derivative operation not normally used for edge detection? Fifth question: what is the advantage of the Sobel operator over the Prewitt operator? And the last question: what is the LOG operator and what is its use?

Thank you.
Digital Image Processing
Prof. P.K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 30
Image Segmentation - II
Hello, welcome to the video lecture series on digital image processing. In today’s class, we will
continue with our discussion on image segmentation.

(Refer Slide Time: 1:15)

In the last class, we discussed image analysis: we have seen what image analysis is and the role of image segmentation in the image analysis process. We have seen that there are mainly 2 approaches for image segmentation; the first approach is the discontinuity based image segmentation technique and the second approach is the region based image segmentation technique.

What we are discussing now is the discontinuity based image segmentation technique. We have seen that to implement the discontinuity based image segmentation technique, first we have to detect the edges present in the image. An edge is a region where there is a variation either from a low intensity value to a high intensity value or from a high intensity value to a low intensity value.

So in such transition regions, we have to detect the position of an edge and, by this, what is expected is to get the boundary of a particular segment. But the problem we have discussed in this process is that, because of non uniform illumination or because the image is noisy, the boundary points or edge points that we get after the edge detection operation are not continuous.

So, to take care of this problem, what we have to do after we detect the edge points is to link them. We have said that there are 2 approaches for linking the edge points. The first approach, which we discussed in the last class, is the local processing approach, and the second approach, which we will be discussing today, is the global processing approach, which is also called the Hough transformation approach.

(Refer Slide Time: 3:33)

So, after today's lecture, the students will be able to explain and implement the local processing technique for linking the edge points and also the global processing technique; that is, the students will be able to implement the Hough transformation to link the edge points.

(Refer Slide Time: 3:58)

So, let us just recall what we have seen in the last class. We have said that, ideally, the edge detection technique should identify the pixels lying on the boundary between the regions. We say it is the boundary between the regions because we assume that it is a transition region, from a low intensity region to a high intensity region or from a high intensity region to a low intensity region.

But while trying to implement this, it has been found that the edge points, which we expect to be continuous so as to give us a meaningful boundary description of a segment, cannot be obtained as such in practice. This is mainly due to 2 reasons; the first one is non uniform illumination of the scene: if the scene is not uniformly illuminated, that leads to detection of edge points where the boundary points will not be continuous. The second reason for getting non continuous boundary points is the presence of noise; that is, if the image is noisy, then after doing the edge detection operation either the boundary points will not be continuous or there may be some spurious edge points which are not actually boundary points of any of the regions.

So, to tackle this problem, we have to go for linking of the edge points so that after linking, we
get a meaningful description of the boundary of a particular segment.

(Refer Slide Time: 5:48)

So, we have said that there are mainly 2 approaches for the edge linking operation. The first approach is the local processing approach and the second approach is the global processing approach.

(Refer Slide Time: 6:05)

In the local processing approach, what we do is we take an edge detected image that is the image
that we have as an input. This is an image containing only the edge pixels, so we assume that
edge points will be white and all the non edge points will be black and in this edge detected
image, we analyze each pixel in a small neighborhood.

So, for every point (x, y), if that is an edge pixel, we take a small neighborhood of the point (x, y) and we link the other edge points within this neighborhood with the point (x, y) if they are similar in nature.
So, whenever we find that within the neighborhood we have 2 edge points which are similar in nature, we link these edge points and, after linking all such edge points, we get a boundary of pixels that are similar in nature. Basically, what we get is a well defined boundary of a particular segment.

So, when we say that we have to link the edge points which are similar in nature, we must have some similarity measure. Remember that after the edge detection operation, for every edge point we get 2 quantities: one is the boundary strength at that particular edge point and the second quantity is the direction of the edge at that particular edge point.

So, by comparing the boundary strength as well as the direction of the boundary at a point (x, y) and at a point which is in the neighborhood of (x, y), we try to find out whether these 2 points are similar or not. If these 2 points are similar, we simply link them.

(Refer Slide Time: 8:07)

So, for this, we take an edge point (x, y) and find a point (x dash, y dash) which is in the neighborhood of (x, y); that is, we take this point (x, y), consider the point (x dash, y dash) in its neighborhood, and find out the difference of edge strength between them.

We know that f (x, y) is the image function, giving the intensity value at location (x, y) in the image, and grad f (x, y) gives the gradient of the intensity at location (x, y).

So, you compute the gradient at location (x, y) and also the gradient at location (x dash, y dash), and check whether the difference between these 2 is less than or equal to a certain threshold T, where T is a non negative threshold; this tells you whether the strengths are similar or not. At the same time, we also have to check the direction of the edge, given by alpha (x, y) at location (x, y) and alpha (x dash, y dash) at location (x dash, y dash); if the orientation of the edge is also similar, that is the difference is less than or equal to some angle threshold value A, then we consider these 2 points (x, y) and (x dash, y dash) to be linked together.

So, in this particular case, our (x dash, y dash) has to be in the neighborhood of (x, y). So, that is
what is represented by (x dash, y dash) belongs to neighborhood of (x, y). So, this is the local
processing technique.

But, as we said, we are going for linking of the edge points because the edge points are discontinuous, and normally the neighborhood size that is taken is small. So, if (x dash, y dash) is not within the neighborhood of (x, y), for a given definition of the neighborhood, then (x dash, y dash) cannot be linked with the edge point (x, y).

Now, the 2 edge points can be far apart depending upon the amount of noise you have or upon the lighting condition, but we should be able to link those points as well. In such cases, the local processing technique does not help to link the edge points; what we have to go for is the global processing technique.

(Refer Slide Time: 11:21)

And, the global processing technique that we will discuss today is called the Hough transformation, or Hough transform. So, what is this Hough transform? The Hough transform is a mapping from the spatial domain to a parameter space. Let us take an example. Suppose I have this (x, y) coordinate system and a single straight line in it; we know that in the slope intercept form, this straight line is described by an equation which is given by y is equal to mx plus c, where m is the slope of the straight line and c is the intercept value.

Now, for a particular straight line, the values of m and c will be constant. So, I represent them by m 1 and c 1 , indicating that these 2 values - the slope and the intercept - are constant for a particular given straight line in the xy plane. So, you find that this particular straight line is now specified by 2 parameters; one of the parameters is m 1 , which is the slope of the straight line, and the other parameter is c 1 , which is the intercept.

Now, let me map this straight line to the parameter space. Because I have 2 parameters, m and c, that is slope and intercept, our parameter space will also be a 2 dimensional space. So, I draw this mc plane, with the slope m along one direction and the intercept c along another direction, and since for this given straight line y equal to m 1 x plus c 1 the slope m 1 and the intercept c 1 are fixed, this particular straight line will be represented by a single point in the mc plane, and this point is at location (m 1 , c 1 ).

So, you find that when I map a given straight line in the spatial domain to the parameter space, the straight line gets mapped to a single point. Now, let us see what happens if, instead, we are given a particular point in the spatial domain, that is in the xy plane.

(Refer Slide Time: 14:58)

So now, I again have this xy plane and, in the xy plane, I have a single point; let us assume the coordinate of this point is (x 1 , y 1 ). Now, the equation of any straight line in the xy plane, as we have seen earlier, in the slope intercept form is given by y is equal to mx plus c. If this straight line y is equal to mx plus c has to pass through the given point (x 1 , y 1 ), then (x 1 , y 1 ) must satisfy this equation.

So, in effect, I get an equation y 1 is equal to mx 1 plus c, because the line y is equal to mx plus c is passing through the given point (x 1 , y 1 ), and this is the equation that has to be satisfied by all the straight lines that pass through this point (x 1 , y 1 ).

Now, ideally I can have an infinite number of straight lines passing through this given point (x 1 , y 1 ), and for each of these straight lines the value of the slope m and the intercept c will be different. So, if I now map this into our parameter space, that is the mc plane, you will find that m and c become the variables whereas y 1 and x 1 are the constants. So, I can rewrite the equation y 1 equal to mx 1 plus c as c is equal to minus mx 1 plus y 1 ; here, x 1 and y 1 are constants and c and m are variables.

So, if I map this point (x 1 , y 1 ) into our parameter space, that is the mc plane, you will find that c equal to minus mx 1 plus y 1 is now the equation of a straight line; effectively, I get a straight line in the mc plane following the equation c is equal to minus mx 1 plus y 1 . So, we have seen 2 cases: in one case, a straight line in the xy plane is mapped to a point in the mc plane and, in the other case, a point in the xy plane is mapped to a straight line in the mc plane. This is the basis of the Hough transformation, by using which we can link the different edge points which are present in the image domain, which is nothing but the spatial domain or, we can say, our xy plane.

So now, let us see what happens if I have 2 points in our spatial domain or the xy plane.

(Refer Slide Time: 19:34)

So again, I go to our spatial domain or xy plane; this is my x axis and this is my y axis, and suppose I have 2 points - one is say (x i , y i ) and the other point in this spatial domain is (x j , y j ). Now, if I draw the straight line passing through these 2 points (x i , y i ) and (x j , y j ), we know that this straight line will have an equation of the form y equal to, say, m dash x plus c dash.

So, what we have to do by using the Hough transformation is that, given these 2 points (x i , y i ) and (x j , y j ), we have to find out the parameters of the straight line which passes through them.

Now, as we have seen earlier, a point in the xy plane is mapped to a straight line in the mc plane or parameter space. So here, since we have 2 points in the xy plane, they will be mapped to 2 different straight lines in the mc plane. If I draw the mapping in the mc plane, it will be something like this: the first point (x i , y i ) will be mapped to a straight line whose equation is given by c equal to minus mx i plus y i , and the second point will be mapped to another straight line whose equation is given by c equal to minus mx j plus y j .

And, you will find that the point at which these 2 straight lines meet is the one which gives me the values of m and c; it will give me the value of m dash and c dash, and this m dash and c dash are nothing but the parameters of the straight line in the xy plane which passes through the 2 given points (x i , y i ) and (x j , y j ).

Now, if I consider that there are a large number of points lying on the same straight line in the xy plane, each of these points will be mapped to a particular straight line in the mc plane like this, but each of these straight lines will pass through the single point (m dash, c dash) in the parameter space. So, what we have seen is that a single point in the spatial domain, or xy plane, is mapped to a single straight line in the parameter space, that is the mc plane.

So, if I have a set of collinear points in the xy plane, each of these collinear points will be mapped to a straight line in the parameter space or mc plane, but all these straight lines, corresponding to the points which are collinear in the xy plane, will intersect at a single point; this point, in this case, is (m dash, c dash), whose values give us the slope and intercept of the straight line on which all those collinear points in the spatial domain lie. This is the basic essence of using the Hough transformation for linking the edge points.
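As a small worked illustration (a sketch only; the two sample points are made-up values, not from the lecture), the intersection of the two parameter-space lines c = -m x i + y i and c = -m x j + y j can be found by solving the corresponding 2 by 2 linear system, and it indeed returns the slope and intercept of the line through the two points:

```python
import numpy as np

# Two example boundary points (made-up values for illustration)
xi, yi = 1.0, 3.0   # point (x_i, y_i)
xj, yj = 4.0, 9.0   # point (x_j, y_j)

# Each point gives one line in the (m, c) parameter space:
#   c = -m * x + y   which we rewrite as   m * x + c = y
A = np.array([[xi, 1.0],
              [xj, 1.0]])
b = np.array([yi, yj])

m_dash, c_dash = np.linalg.solve(A, b)
print(m_dash, c_dash)   # 2.0 and 1.0: the line y = 2x + 1 passes through both points
```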

Now, how do you compute this Hough transformation? So far, what we have discussed is in the continuous domain; our assumption is that the values of m and c are continuous variables. But in our implementation, since we are considering the digital case, we cannot have continuous values of m and c. So, we have to see how this Hough transformation can really be implemented. For implementation, what we have to do is subdivide this entire mc space into a number of accumulator cells, as has been shown in this particular figure.

(Refer Slide Time: 25:56)

So here, you will find that this mc plane is divided into a number of smaller accumulator cells. We take a range of slopes which is the expected range of slopes in a particular application, from a minimum slope to a maximum slope; this is the minimum slope value, m minimum, and this is the maximum slope value, m maximum.

Similarly for c: this is also subdivided, and the total range lies within an expected minimum value and an expected maximum value; c minimum is the expected minimum value of the intercept c and c maximum is the expected maximum value. Within this range, the space is divided into a number of accumulator cells, forming a 2 dimensional array; let us name this array A, and each of these accumulator cell locations can be indexed by indices i and j.

So, a cell at location, say, (i, j) will have a corresponding accumulator value, say A (i, j), and this ij'th cell corresponds to the parameter values, let us say, m i and c j . In other words, an accumulator cell (i, j), having an accumulator value A (i, j), corresponds to the parameter values m i and c j .

And for implementation, what we do is initialize each of these accumulators to 0; so initially, A (i, j) is set to 0 for every cell, and that is our initialization. Once we have this array of accumulator cells, then what do we do? In the spatial domain, we have a set of boundary points and, in the parameter space, we have a 2 dimensional array of accumulator cells.

(Refer Slide Time: 29:26)

So, we take a single boundary point, say (x k , y k ), in the spatial domain. We have seen earlier that this boundary point (x k , y k ) is mapped to a straight line in the parameter space, that is in the mc plane, and the equation of this straight line is given by c is equal to minus m x k plus y k . What we have to do is find the values of m and c from this particular equation.

Now, in our case, the values of m and c are not continuous but discrete and, as we have said, the ij'th accumulator cell corresponds to the parameter values m i and c j . To solve for the values of m and c, our basic equation is c equal to minus m x k plus y k . What we do is allow the value of m to take all the allowed values as specified in our accumulator array, ranging from m minimum to m maximum, and for each of these values of m, we solve for the corresponding value of c following this particular equation c is equal to minus m x k plus y k .

Now, the value of c that you get by solving this equation for a specific value of m may be a real number, whereas we have to deal with the discrete case. So, whenever we get a value of c which is not one of the allowed values as per our definition of the accumulator cells, this value of c has to be rounded off to the nearest allowed value of c as specified in the accumulator array.

So, if I have say n such possible values of m, I will get n corresponding values of c by solving this equation c equal to minus m x k plus y k . Now, suppose for a particular value of m - since we have already used k for something else, let us call it m p - when I put this value m p in the equation, the corresponding value of c I get after solving and rounding is say c q ; and you remember that we have initialized all our accumulator cells to a value of 0.

So, whenever for a particular value of m p I get an intercept value c q , the operation I do is to increment the corresponding accumulator cell A (p, q) by 1; that is, I make A (p, q) equal to A (p, q) plus 1. This I have to do for all the boundary points in our spatial domain, that is in the xy plane, and for each of these boundary points I have to compute it for every allowed value of m in our parameter space. So, because for every computed pair m p , c q I am incrementing the accumulator cell by 1 and the accumulator cells were initialized to 0, consider what we have at the end of the process, after all the boundary points in the spatial domain have been processed.

For each of these points, for all allowed values of m I find the corresponding allowed values of c, and for each such pair m p , c q I do this incrementing operation on the accumulator cell. So, at the end, if an accumulator cell say A (i, j) contains a value of say capital Q, this indicates that there are Q points lying on a straight line whose equation is given by y is equal to m i x plus c j , because, as we said, the accumulator cell (i, j) corresponds to the slope value m i and the intercept value c j , and for every point, wherever I get a corresponding value of m and c, the corresponding accumulator cell is incremented by 1.
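The accumulation just described can be sketched as follows (illustrative only; the slope and intercept ranges, the number of cells and the list of boundary points are all assumptions you would set for your own application):

```python
import numpy as np

def hough_slope_intercept(points, m_range=(-5.0, 5.0), c_range=(-500.0, 500.0),
                          n_m=100, n_c=100):
    """Accumulate votes in the (m, c) parameter space for a list of (x, y) points."""
    m_values = np.linspace(m_range[0], m_range[1], n_m)   # allowed slope values
    c_values = np.linspace(c_range[0], c_range[1], n_c)   # allowed intercept values
    A = np.zeros((n_m, n_c), dtype=int)                   # accumulator array, all cells 0

    for x_k, y_k in points:
        for p, m_p in enumerate(m_values):
            c = -m_p * x_k + y_k                  # c = -m x_k + y_k
            q = np.argmin(np.abs(c_values - c))   # round to the nearest allowed c
            A[p, q] += 1                          # increment the accumulator cell
    return A, m_values, c_values
```

A cell A[i, j] holding a large count then indicates many boundary points lying close to the line y = m_values[i] * x + c_values[j].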

So, at the end of the process, if a particular accumulator cell A (i, j) contains a value capital Q, this is an indication that in the spatial domain I have Q boundary points which are lying on the straight line y is equal to m i x plus c j . Now, the question is: what is the accuracy of this particular procedure? That is, how accurate is this estimation of m i and c j ? That depends upon how many accumulator cells I have in the accumulator array.

So, if I have a very large number of accumulator cells in the accumulator array, then the accuracy of the computed m and c will be quite high, whereas if I have a small number of accumulator cells, the accuracy of the computed values will be quite low. Now, the question is: how can we use this to detect the straight lines present in the spatial domain?

(Refer Slide Time: 37:29)

Let us consider a case like this. Say, I have an image in the spatial domain and the boundary points of this image are something like this. When I compute the Hough transformation, I get an accumulator array. You will find that on this particular straight line there are 1, 2, 3, 4, 5, 6, 7, 8, that is 8 points; on this straight line there are 5 points; on this part of the straight line there are again 5 points; on this part of the straight line there are 4 points; and if I consider that this part is also a straight line, on it there are 3 points.

So, in our accumulator array, at the end of this Hough transformation operation, I will get one cell with value equal to 8, another cell with value equal to 5, one more cell with value equal to 5, one cell with value equal to 4 and another cell with value equal to 3. If I say that I will consider a straight line to be significant when the corresponding accumulator cell contains a value greater than or equal to 3, then by this process I will be able to detect this straight line, this straight line, this straight line, this straight line, as well as this straight line.

But if I say that I will consider only those straight lines to be significant where the number of collinear points lying on the straight line is greater than or equal to 4, then I will be detecting only this straight line, this straight line, this straight line and this straight line. So again, how many points should lie on a straight line for it to be considered a significant boundary straight line can be varied or tuned depending upon the application by choosing the threshold on the number of points lying on the straight line.

Now, although we are able to find the straight line segments in the mc plane, this particular formulation of the Hough transformation, that is the mapping from the xy domain to the mc parameter plane, has a serious problem. The problem is that in the mc plane we are trying to find the slope and intercept values of the straight line in the spatial domain.

Now, the problem comes when this straight line becomes vertical, that is parallel to the y axis. If the straight line is vertical, then the slope of the straight line, that is the value of m, tends to infinity, and in this formulation we cannot handle a value of m which becomes very large or tends to infinity. So, what should we do to solve this problem? Instead of considering the slope intercept form, we can make use of the normal representation of a straight line.

(Refer Slide Time: 41:45)

So, the normal representation of a straight line is given by the formula rho equal to x cosine theta plus y sine theta. What do we get in case of this normal representation? The line that we get is something like this.

(Refer Slide Time: 42:13)

So here again, I have this straight line in the xy frame. But instead of taking the slope intercept form, where the equation was given by y is equal to mx plus c, I take the normal representation, where the equation of the straight line is given by rho is equal to x cosine theta plus y sine theta.

What is rho? Rho is the length of the perpendicular dropped from the origin of the xy frame onto the straight line, and theta is the angle made by this perpendicular with the x axis. So, the parameters of the straight line defined in this normal form are rho, the length of the perpendicular drawn from the origin to the given straight line, and theta, the angle formed by this perpendicular with the x axis. Unlike in the previous case, where the parameters were the slope m and the intercept c, the parameters now become rho and theta.

And now, when I have these 2 parameters rho and theta, the situation is quite manageable; that is, I do not have a parameter which can take an infinite value. So, let us see, in this particular case, what the maximum values of rho and theta can be.

(Refer Slide Time: 44:19)

We consider the value of theta to range within plus minus 90 degrees, and the value of rho, that is the length of the perpendicular to the straight line from the origin, to lie within plus minus square root of M square plus N square, where M by N is the image size. This is quite obvious because, if I have an image of dimension M by N, that is M rows and N columns, with this as the origin of the image frame, then I cannot draw any straight line for which the value of theta goes beyond the range of plus minus 90 degrees or the value of rho goes beyond plus minus square root of M square plus N square. But what is the difference between our earlier formulation and this formulation?

(Refer Slide Time: 46:01)

Now, in this particular case, our equation of the straight line is taken as rho equal to x cosine theta plus y sine theta, whereas in our earlier case our equation was y is equal to mx plus c. You will recall that, given a single point say (x 1 , y 1 ) in the spatial domain, in the mc parameter plane the corresponding equation becomes c is equal to minus m x 1 plus y 1 , which is again the equation of a straight line.

So, when we take the parameter space to be the mc space, that is when we represent a straight line in the slope intercept form, a point is mapped to a straight line in the parameter space; whereas in this particular case, for the same given point (x 1 , y 1 ), the equation that we get in the parameter space is rho is equal to x 1 cosine theta plus y 1 sine theta.

So here, you find that x 1 and y 1 are constants for the given point (x 1 , y 1 ) in the spatial domain, whereas rho and theta are the variables. So, a particular point in the spatial domain is now mapped to a sinusoidal curve in the parameter domain, or rho theta space. Earlier, q collinear points in the xy plane were mapped to q straight lines in the mc plane, all of which pass through a single point; in this case, the same q collinear points in the xy plane will be mapped to q sinusoidal curves in the rho theta plane, but all those sinusoidal curves will intersect at a single point which gives us the values of rho and theta, the parameters of the straight line on which all those q collinear points lie.

Now, this is the only difference between the mc plane and the rho theta plane. Apart from this, the formulation is exactly the same as before.

(Refer Slide Time: 49:30)

So, for computation, again what we have to do is divide the rho theta space into a number of accumulator cells, as given in this figure. Here again, an accumulator cell (i, j), which will have an accumulator value of say A (i, j), corresponds to the parameters theta i and rho j .

So again, as in our previous formulation, for a given point our equation is rho is equal to x cosine theta plus y sine theta, and for a given point say (x k , y k ) in the spatial domain this becomes rho is equal to x k cosine theta plus y k sine theta.

What we do is allow the variable theta to assume any of the allowed values as given in the accumulator array; theta can take any of the allowed values between the given minimum and maximum, and we solve for the corresponding value of rho. Because the solution for rho may not be one of the allowed values, we have to round off the value of rho to the nearest allowed value on our rho axis.

(Refer Slide Time: 51:49)

So again, as before, at the end of the process, if an accumulator cell say A (i, j) contains a value equal to Q, this means that there are Q collinear points in the spatial domain lying on the straight line which satisfies the equation rho j is equal to x cosine theta i plus y sine theta i .

And again as before, by putting a threshold on the number of points required for a line to be considered significant, I can determine how many straight lines I will extract from the given boundary image to give me a meaningful boundary description. Now let us see, by applying this technique, what kind of result we can get.
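Before looking at the results, the whole rho-theta procedure can be sketched as follows; this is a minimal sketch assuming the input is a binary edge image stored as a NumPy array, and the angular step and the number of rho bins are illustrative choices rather than values from the lecture.

```python
import numpy as np

def hough_rho_theta(edge_image, n_theta=180, n_rho=200):
    """Accumulate votes in the (theta, rho) parameter space for a binary edge image."""
    rows, cols = edge_image.shape
    rho_max = np.hypot(rows, cols)                        # sqrt(M^2 + N^2)
    thetas = np.deg2rad(np.linspace(-90.0, 90.0, n_theta))
    rhos = np.linspace(-rho_max, rho_max, n_rho)
    A = np.zeros((n_theta, n_rho), dtype=int)             # accumulator, initialised to 0

    ys, xs = np.nonzero(edge_image)                       # boundary (edge) points
    for x_k, y_k in zip(xs, ys):
        for i, theta in enumerate(thetas):
            rho = x_k * np.cos(theta) + y_k * np.sin(theta)
            j = np.argmin(np.abs(rhos - rho))             # round to the nearest allowed rho
            A[i, j] += 1
    return A, thetas, rhos

# Cells of A with large counts correspond to significant straight lines;
# a threshold on the count selects how many lines are reported.
```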

(Refer Slide Time: 52:58)

Here, what we have shown is that, as we have said, every point in the spatial domain is mapped to a sinusoidal curve in the parameter domain, the rho theta plane. You find that point 1 over here has been mapped to this line, point 2 has been mapped to the sinusoidal curve given by this, point 3 again has been mapped to this particular sinusoidal curve, point 4 has been mapped to this curve and point 5 has been mapped to this one.

And now, if I want to find the equation of the straight line passing through points 2, 3 and 4, you will find that the 3 corresponding sinusoidal curves meet at this particular point in the rho theta plane. So, the corresponding cell will have a value equal to 3, indicating that there are 3 points lying on the straight line which corresponds to this particular value of theta and this particular value of rho.

So, from here, I can get the parameters of the straight line on which these 3 points 2, 3 and 4 are lying, and the same is true for other cases as well. For example, for 1, 3 and 5: this is the curve for 1, this is the curve for 3 and this is the curve for 5, and all of them meet at this particular point.

So here, whatever the value of theta and rho I get, that is the parameter of the straight line
passing through these points - 1, 3 and 5.

(Refer Slide Time: 54:53)

So, applying this to an image we had shown in one of the previous classes: here we have an image of a brick and, after the edge detection operation, the edge points are given on the right hand side. Now, if I apply the Hough transformation and try to detect the four most significant straight lines, I find these 4 straight lines which are most significant and the boundary which is specified by them.

So, I can always find out what these vertex locations are, and this rectangular region is actually the boundary of this particular object region. With this, we come to the end of today's discussion on the global edge linking operation, where the linking is done by using the Hough transformation; as we have said, this Hough transformation is nothing but a process of mapping from the spatial domain to the parameter space. Now, let us see some of the quiz questions on today's lecture.

(Refer Slide Time: 56:18)

The first question is: what is the Hough transform? Second question: what determines the accuracy of the Hough transform? Third question: what is the difficulty of implementing the Hough transform using the slope intercept representation of a straight line? Fourth: how is this problem solved using the normal representation of a straight line? And the fifth question: is the Hough transform limited to the detection of line segments only?

Thank you.

Digital Image Processing
Prof. P.K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 31
Image Segmentation - III
Hello, welcome to the video lecture series on digital image processing. For the last few lectures, we have been discussing image segmentation and image analysis operations. In our last lecture, we talked about discontinuity based image segmentation.

(Refer Slide Time: 1:21)

We have discussed earlier that there are mainly two approaches to image segmentation: one is discontinuity based image segmentation and the other is similarity based image segmentation. For the last two classes, we have talked about discontinuity based image segmentation, where we have seen that the segmentation is done using the characteristics of the variation of intensity values when there is a variation of intensity from, say, the background to a foreground object.

Under this, we have seen various point, line and edge detection operations which are used in this segmentation process. Here, the basic purpose was that an object is to be described by its enclosing boundary, which is to be obtained using one of these discontinuity based operations, and we have discussed that, though we want the object boundary to be continuous or to have a complete definition, because of noise or maybe because of non uniform illumination, after performing these different edge detection operations the edge points that we get are not normally continuous.

So, to take care of this problem, after the edge detection operation the edge points that we get are to be linked. For that, we have discussed two different approaches. One is the local linking operation, where the edge points in a neighborhood are linked together if we find that those two edge points are similar in nature; as the similarity criteria, we have taken the strength of the gradient or edge operator as well as the direction of the edge at those points.

So, if we find that, within a neighborhood, two edge points have similar edge strength and also similar edge direction, then those two points are linked together as part of the same edge. Here again, the problem is that if the points are not within the small neighborhood which is defined, but are at a larger distance, then this local linking operation does not help. In such cases, what we have to go for is the global edge linking operation.

So, we have discussed a technique, the Hough transform. Using the Hough transform, we have been able to link distant edge points, and this operation is called the global edge linking operation or the global processing technique. Today, we will start our discussion on the other type of segmentation, which is similarity based segmentation.

(Refer Slide Time: 4:42)

Under similarity based segmentation, there are mainly three approaches: one is called the thresholding technique, the second approach is the region growing technique and the third approach is the region splitting and merging technique. Under the thresholding technique, again, we have four different types of thresholding: global thresholding, dynamic or adaptive thresholding, optimal thresholding, and local thresholding.

So, we will discuss these different region based segmentation operations - thresholding, region growing and region splitting and merging - one after another. Let us first start our discussion with the thresholding technique.

(Refer Slide Time: 5:46)

So, first we will discuss the thresholding technique for segmentation. Thresholding is one of the simplest approaches to segmentation. Suppose we have an image which, as we have said earlier, is represented by a two dimensional function f (x, y), and let us assume that this image contains a dark object against a light background. In such cases - a dark object against a light background, or the reverse, a light object against a dark background - you will find that the intensity values are mainly concentrated near two regions, or what we call two modes. One mode will be towards the darker side, towards the lower intensity values, and the other mode will be towards the brighter side, towards the higher intensity values.

Now, let us plot the histogram of such an image; here we are assuming that we have one object, and let us assume that the object is bright and the background is dark. If we plot the histogram of such an image, it will appear something like this: on this axis we put the intensity value z, and on this side our histogram of z. As we said, because we have one bright object placed against a dark background, the intensity values will accumulate so that the histogram becomes a bimodal histogram, where the intensities are concentrated on the dark side as well as on the brighter side.

For such a bimodal histogram, you find that there are two peaks, one peak here and the other peak here, and these two modes or peaks are separated by a deep valley. So, this is the valley, this is one peak and this is the other peak; and as we have assumed that our object is bright and the background is dark, all the pixels which are grouped in the lower intensity region belong to the background and the other group of pixels belongs to the object.

Now, the simplest form of segmentation is this: we choose a threshold value, say T, in this valley region and take the decision that if a pixel at location (x, y) has an intensity value f (x, y) which is greater than T, then we say that this pixel belongs to the object, whereas if f (x, y) is less than or equal to the threshold T, then this pixel belongs to the background.
So, this is our simple decision rule which is to be used for thresholding. What we have to do is choose a threshold in the valley region, and then the segmentation is simply testing each and every pixel to check whether its intensity value is greater than the threshold or less than or equal to it.

If the intensity value is greater than the threshold, we say that the pixel belongs to an object, whereas if the intensity value is less than or equal to the threshold, we say that the pixel belongs to the background.
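In code, this decision rule is a one-line comparison; the sketch below assumes a grayscale NumPy image and an already chosen threshold T (the default value 128 is just a placeholder):

```python
import numpy as np

def threshold_image(f, T=128):
    """Return a binary map: 1 where f(x, y) > T (object), 0 otherwise (background)."""
    return (f > T).astype(np.uint8)
```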

(Refer Slide Time: 10:59)

Now, the situation can be even more general; that is, instead of having a bimodal histogram, we can have multimodal histograms. Our histogram can even be of this form, like this, where this is our pixel intensity z and on this side is the histogram. Here, you find that the histogram has three different modes which are separated by two different valleys. So now, what we can do is choose one threshold, say T 1 , in the first valley region and another threshold, T 2 , in the second valley region.

What this histogram indicates is that there are three different intensity regions, separated by intermediate intensity bands, and those three intensity regions give rise to the three different peaks in the histogram. Here, our decision rule can be something like this: if we find that the intensity value f (x, y) at a pixel location (x, y) is greater than threshold T 2 , then we say that the point (x, y) belongs to, say, object O 2 . So, all the pixels having intensity values greater than T 2 , we say, belong to the object O 2 .

In the other case, if a pixel has an intensity value in this region, that is greater than T 1 and less than or equal to T 2 , then we say that this particular pixel belongs to object O 1 . So, our decision rule will be: if T 1 less than f (x, y) less than or equal to T 2 , then the corresponding pixel (x, y) belongs to object O 1 .

And obviously, the third condition will be that if f (x, y), the intensity value at location (x, y), is less than or equal to threshold T 1 , then we say that the corresponding pixel (x, y) belongs to the background. We can even have histograms with more than three peaks; in such cases also, a similar classification is possible. But what we have to do for this thresholding based segmentation technique is choose proper threshold values.
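The two-threshold decision rule can be written as a sketch like this (assuming a grayscale NumPy image and illustrative threshold values T1 and T2; label 0 is the background, 1 is object O 1 and 2 is object O 2):

```python
import numpy as np

def multilevel_threshold(f, T1, T2):
    """Label pixels as background (0), object O1 (1) or object O2 (2)."""
    labels = np.zeros(f.shape, dtype=np.uint8)   # f <= T1       -> background
    labels[(f > T1) & (f <= T2)] = 1             # T1 < f <= T2  -> object O1
    labels[f > T2] = 2                           # f > T2        -> object O2
    return labels
```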

(Refer Slide Time: 14:32)

Now, the thresholding operation can be considered as an operation that involves testing against a function T of the form T equal to T [x, y, p (x, y), f (x, y)]. That is, thresholding can be viewed as an operation that tests the image pixels against a function T, where T is a function of (x, y), which is the pixel location in the image; of f (x, y), which is the pixel intensity at location (x, y); and of p (x, y), which is some local property in a neighborhood centered at (x, y).

So, in general, this threshold T can be a function of the pixel location, the pixel value as well as a local property within a neighborhood around the pixel location (x, y); this neighborhood property can, for example, be the average intensity value within a neighborhood around pixel (x, y). T can be a function of any combination of these 3 terms and, depending upon the combination, T can be either a global threshold, a local threshold or an adaptive threshold.

(Refer Slide Time: 17:20)

So, in case the threshold T is only a function of f (x, y), we say that the threshold is a global threshold, whereas if T is a function of f (x, y) and the local property p (x, y), then we say that T is a local threshold; and if, in the most general case, T is a function of (x, y), f (x, y) as well as p (x, y), then we say that this threshold T is an adaptive or dynamic threshold.

Whatever the nature of the threshold T - local, global or adaptive - our thresholding operation is this: using this threshold, we want to create a thresholded image, say g (x, y), from our input image f (x, y), and we set the value of g (x, y) equal to 1 if the intensity of the image at that location, that is f (x, y), is greater than the threshold T, and g (x, y) equal to 0 if f (x, y) is less than or equal to the chosen threshold T.

So, the basic aim of this thresholding operation is to create a thresholded image g (x, y) which will be a binary image containing pixel values of either 0 or 1, and this value will be set depending upon whether the intensity f (x, y) at location (x, y) is greater than T or less than or equal to T.

So, if we have a bright object against a dark background, g (x, y) equal to 1 indicates that the corresponding pixel is an object pixel whereas g (x, y) equal to 0 indicates that the corresponding pixel is a background pixel. On the contrary, if we have dark objects against a bright background, we will set g (x, y) equal to 1 if f (x, y) is less than or equal to T, again indicating that a pixel having a value of 1 in the thresholded image belongs to the object, and in such a case we will put g (x, y) equal to 0 if f (x, y) is greater than T, again indicating that a pixel which is equal to 0 in the thresholded image g (x, y) is a background pixel. Now, the question is how to choose this threshold value.

(Refer Slide Time: 21:38)

Now, for that, let us again consider the histogram. We have said that if my histogram is a bimodal histogram of this form - this is our intensity value z and on this side we have h (z) - then, by inspecting this histogram, we can choose a threshold in this deep valley region and, using this threshold, go for the segmentation operation. By doing this, I will show you one particular result.

(Refer Slide Time: 22:15)

Say, for example, in this particular case you find that we have an image where the objects are dark whereas the background is bright. Naturally, in this case, the histogram will be a bimodal histogram, and its nature will be like this. Here, if I choose a threshold T in this region and segment the image using this threshold, then the kind of segmentation that we get is as given here.

(Refer Slide Time: 23:24)

So, here you find, in the second image in the segmented image that your background and object
regions have been clearly separated; even the shadow which is present in the original image that
has been removed in the segmented image. So, though thresholding is a very very simple
operation, if you choose the threshold in the valley region between the two modes of a bimodal
histogram, then this simple segmentation operation can clearly take out the object regions from
the background.

But here what we have done is we have used the histogram to choose the threshold. That is, you
inspect the histogram and, from that inspection, you choose the threshold value. But is it possible
to automate this process? That is, instead of finding the threshold value by looking at the
histogram, can we automatically determine the threshold value which should be used for
segmenting an image? This can be done by using an iterative procedure.

(Refer Slide Time: 24:40)

So, automatic threshold; here again, for detecting this threshold automatically, what we can do is
we can first choose an initial value of the threshold, arbitrarily or by some other means, and using
this initial value of the threshold, we can have a segmentation of the image.

So, when you segment the image using this initial value of threshold, the segmentation operation
basically will partition your histogram into two partitions or the image will be divided into two
groups of pixels. So, you can say that one group of pixels, we term them as group G 1 and the
other group of pixels we term them as group G 2 . So, the pixel intensity values in group G 1 will
be similar and the pixel intensity values in group G 2 will also be similar but these two groups
will be different.

Now, once I separate or partition the image intensities into these groups G 1 and G 2 , the next step
is to compute the mean or average intensity value mu 1 for group G 1
and the average intensity value mu 2 for group of pixels G 2 . So, once I get this mu 1 and mu 2 that
is the average intensity value in the group of pixels G 1 and also the average intensity value for
the group of pixels G 2 , then in the fourth step, what I do is I choose a new threshold T which is
equal to mu 1 plus mu 2 divided by 2.

And, after doing this, you go back to step two and perform the operation, thresholding operation
once again. So, what we are doing is we are choosing an initial value of threshold, using that
initial value of threshold we are thresholding the image, by thresholding what we are doing is we
are separating the intensity values into two groups G 1 and G 2 ; for group G 1 , I find out the
average intensity value mu 1 , for group G 2 , I also find the average intensity value mu 2 , then I find
out a new threshold which is the mean of these two averages that is mu 1 plus mu 2 by 2 and
using this new threshold, I threshold the image again so there by these groups G 1 and G 2 will be
modified and I repeat this process that is thresholding to grouping, then finding out the intensity
averages in the two separate groups, recalculating the threshold; this entire
process will be repeated until and unless I find that the variation in two successive iterations in
the computed value of T is less than some pre specified value.

So, this operation has to continue until you find that the threshold value in the i'th iteration, T i,
and the threshold value in the (i plus 1)'th iteration, T i plus 1, differ by no more than some pre
specified value, say T prime. When I attain this condition, I stop the iteration.

So, here you will find that we do not have to go to the histogram to choose the threshold. Rather,
what we do is we choose some initial value of threshold, then go on modifying this threshold
value iteratively; finally you converge, that is, you come to a situation where in two successive
iterations the value of the threshold does not change much, and at that point whatever
thresholded image you have got is your final thresholded image.
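As an illustrative sketch of this iterative procedure (my own code, not from the lecture; the function name and the stopping tolerance eps are my own choices), in Python with NumPy:

import numpy as np

def iterative_threshold(image, eps=0.5):
    # Step 1: initial threshold, for example the overall mean intensity
    T = image.mean()
    while True:
        # Step 2: partition the pixels into two groups using T
        g1 = image[image > T]    # brighter group
        g2 = image[image <= T]   # darker group
        # Step 3: mean intensity of each group
        mu1 = g1.mean() if g1.size else 0.0
        mu2 = g2.mean() if g2.size else 0.0
        # Step 4: new threshold is the average of the two means
        T_new = 0.5 * (mu1 + mu2)
        # Repeat until the change between two successive iterations is below eps
        if abs(T_new - T) <= eps:
            return T_new
        T = T_new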

(Refer Slide Time: 29:46)

So, using this kind of automatic threshold selection, the kind of result that can be obtained is
something like this. Here you will find that this is one input image and you can identify that this
is a fingerprint image. This is the histogram of that particular image. So obviously, from this
histogram also, I can choose a threshold somewhere here.

But this thresholded output has not been obtained by choosing a threshold from the histogram; it
has been obtained by the automatic threshold selection process, that is, by the iterative procedure,
and it can be observed that the threshold obtained by this automatic process is similar to what you
would choose from the histogram. Here you will find that the threshold you have chosen does not
consider the pixel location or the local neighborhood of the pixel intensity values.

Here, the threshold is a global one; that is, for the entire image you choose one particular
threshold and using that threshold you go for segmenting the image. So, the kind of thresholding
operation that we have done in this particular case is called a global thresholding operation.
Now, you find that this global thresholding will give you a very good result if the illumination
over the scene is uniform. But there may be cases where the scene illumination is non uniform,
and in case of such non uniform illumination, getting a global threshold which will be applicable
over the entire image is very very difficult.

(Refer Slide Time: 31:40)

So, let us take one particular example, say in this particular case; on the top, we have an image
and you can easily find out that for this image if I plot the histogram, the histogram will be as
shown on the right hand side. Clearly, this histogram is a bimodal histogram and there is a valley
in between the two modes. So, these modes are separated by a deep valley. So obviously, for such
a kind of histogram, I can always choose a threshold inside the valley and segment this image
successfully.

But what happens if the illumination is not proper? If the background illumination is not
uniform, then because of this non uniform illumination the image may turn out to be like this,
and whenever I have such an image with poor illumination, the histogram of this image appears
as shown on the right hand side. Here you find that though the histogram appears to be a bimodal
one, the valley is not well defined. So, this simple kind of thresholding operation, or the global
thresholding operation, is likely to fail in this particular case.

So, what should we do for segmenting these kinds of images using the thresholding operation?
Now, one approach is to subdivide this image into a number of smaller sub images, assuming
that in each of these sub images the intensity, or the illumination, is more or less uniform; then
for each of the sub images we can find out a threshold value, and using this threshold value you
can threshold the sub images, and then the combination or union of all of them will give you the
final thresholded output.

(Refer Slide Time: 33:39)

So, let us see what we get in this case. As we said that for these kinds of images where the
illumination is non uniform, if I apply a single global threshold; then the kind of output, the
thresholded output that we are going to get is something like this. So, here you find that the
thresholding has failed miserably whereas if I subdivide this image into a number of sub images
as given on this left hand bottom and then for each of these sub images, I identify the threshold
and using that threshold, you go for segmenting that particular sub image and the thresholded
output that you get is given on this right hand side.

Here, you will find that except for these two, the rest of the sub images have been thresholded
properly. So, at least your result is better than what you get with a global threshold operation.
Now, because we are going for selection of a threshold which is position dependent, since every
sub image has a particular position, this threshold selection becomes an adaptive thresholding
operation. Now, let us try to analyze why this adaptive threshold has not been successful for
these two sub regions.

(Refer Slide Time: 35:17)

So, if I look at the nature of the image; here, if you look at this top image, you will find that in
this top image, here is a boundary where this small portion belongs to the background and this
large portion of the image belongs to the object. Now, if I plot the histogram of this, the
histogram will be something like this; because the number of pixels in the background is very
very small, so the contribution of those pixels to the histogram that is within this region is almost
negligible.

So, instead of becoming a bimodal histogram, the histogram is dominated by a single peak and
that is the reason why this thresholding operation has not given good result for this particular sub
region. So, how to solve this problem? Again, our solution approach is the same: you subdivide
this sub image further into smaller subdivisions, and for each of these smaller subdivisions you
try to find out a threshold and segment that subdivision using this particular threshold. So, if I do
that, you will find that the kind of result that we get is shown here, and here the segmentation
output is quite satisfactory.

So, if the scene illumination is non uniform, then a global threshold is not going to give us a
good result. So, what we have to do is we have to subdivide the image into a number of sub
regions and find out the threshold value for each of the sub regions and segment that sub region
using this estimated threshold value and here, because your threshold value is position
dependent, it depends upon the location of the sub region; so the kind of thresholding that we are
applying in this case is an adaptive thresholding.
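To make this concrete, here is a small sketch (my own illustration, not code from the lecture) of such block-wise adaptive thresholding in Python with NumPy; choose_threshold stands for any per-block threshold selector, for example the iterative procedure sketched earlier:

import numpy as np

def adaptive_block_threshold(image, block_size, choose_threshold):
    # Threshold each sub image with its own threshold; the union of the
    # thresholded blocks forms the final output.
    h, w = image.shape
    g = np.zeros_like(image, dtype=np.uint8)
    for r in range(0, h, block_size):
        for c in range(0, w, block_size):
            block = image[r:r + block_size, c:c + block_size]
            T = choose_threshold(block)          # position-dependent threshold
            g[r:r + block_size, c:c + block_size] = (block > T).astype(np.uint8)
    return g

# Example usage (hypothetical): g = adaptive_block_threshold(img, 64, iterative_threshold)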

Now, in all the thresholding approaches that we have discussed so far, whether global
thresholding or adaptive thresholding, in none of these cases have we talked about the accuracy
of the thresholding, that is, about the error that is involved in the thresholding process. So, we
can go for a kind of thresholding, making use of some statistical property of the image, where
the mean error of the thresholding operation will be minimum.

(Refer Slide Time: 38:08)

So, that is a kind of thresholding operation which is called optimal thresholding. So, what is this
optimal thresholding? Again, let us assume that the image contains two principal gray level
intensity regions; one intensity region corresponding to the object and the other intensity region
corresponding to the background, and we assume that the pixel intensities can be modeled as a
random variable, and this random variable is represented by, say, z.

Now, once we represent the random variable by this z, then the histogram of this particular
image or the normalized histogram can be viewed as a probability density function of this
random variable z. So, the normalized histogram can be viewed as a probability density function
p (z) of this random variable z.

Now, as we have assumed that the image contains two major intensity regions, that is two
dominant intensity values, our histogram is likely to be a bimodal histogram. So, the kind of
histogram that we will get for this image is a bimodal histogram; it will be something like this
and, as we said, the histogram is assumed to be a probability density function of the intensity
variable z.

So, this bimodal histogram can be considered as a combination of two probability density
functions, one for each of the two modes. One of them is, say, the probability density function
p 1 (z), the other one is the probability density function, say, p 2 (z). So, p 1 (z) indicates the
probability density function of the intensities of pixels which belong to, say, the background and
p 2 (z) is the probability density function of the pixel intensity values which belong to, say, the
object.

Now, this overall histogram, that is p (z), can be represented as the combination of p 1 (z) and
p 2 (z). So, this overall p (z) we can write as capital P 1 into p 1 (z) plus capital P 2 into p 2 (z), where
this capital P 1 indicates the probability that a pixel belongs to the background and capital P 2
indicates the probability that a pixel belongs to an object.

So obviously, this capital P 1 plus capital P 2 will be equal to 1. These are the probabilities that a
pixel belongs to either the background or the foreground. So here, our assumption is that we have
a bright object against a dark background, because we are saying that capital P 1 is the probability
that a pixel belongs to the background and capital P 2 is the probability that a pixel belongs to the
foreground, that is the object.

Now, what is our aim in this particular case? Our aim is that we want to determine a threshold T
which will minimize the average segmentation error.

(Refer Slide Time: 42:39)

Now, you find that since this over all probability is modeled as a combination of two different
probabilities; so it is something like this. I can say that I have one probability distribution
function which is given by this and the other probability distribution function is say given by this
so that my overall probability distribution function is of this type, this is my over all probability
distribution function.

So, this blue colour indicates p 2 (z) and the pink colour indicates p 1 (z) and the yellow
colour indicates my overall probability density function that is p (z). So, in this particular case, if
I choose a threshold T somewhere here, so this is my threshold T and I say that if f (x, y) is
greater than T, then (x, y) belongs to object. Now here, you find that though we are taking a hard
decision that if f (x, y) is greater than T, then (x, y) belongs to object but the pixel with intensity
value f (x, y) also has a finite probability; say given by this that it may belong to the background.
So, while taking this decision, we are incorporating some error. The error is the area given by
this probability curve for the region intensity value greater than T.

So, thresholding incorporates an error in two ways: an object point may be classified as a
background point and a background point may be classified as an object point. The error of
classifying an object pixel as a background pixel is given by, say, E 1 (T), because this error is
threshold T dependent; we write this as E 1 (T) is equal to the integral of p 2 (z) dz, where the
integral is taken from minus infinity to T. Similarly, if a background pixel is classified as an
object pixel, then the corresponding error will be given by E 2 (T) is equal to the integral of
p 1 (z) dz, where the integral has to be taken from T to infinity.

So, these give you the two error values. One of them gives the error that you encounter if you
classify a background pixel as an object pixel and the other one if you classify an object pixel as a
background pixel.

(Refer Slide Time: 46:33)

So, from these two error expressions, the overall error probability can now be represented as E
(T) is equal to capital P 2 into E 1 (T) plus capital P 1 into E 2 (T). So, you find that this E 2 was the
probability, it was the error of classifying a background pixel as a foreground pixel and E 1 (T)
was the error of classifying an object pixel as a background pixel and P 1 is the probability that a
pixel belongs to background and capital P 2 is the probability that a pixel belongs to the object.
So, the overall probability of error will be given by this expression capital P 2 into E 1 (T) plus
capital P 1 into E 2 (T).

Now, for minimization of this error, what we have to do is take the derivative del E (T) del T and
equate it to 0. Whatever value of T satisfies this is the one that gives the minimum error. So, if we
put this condition on the above expression; we are not going into the details of the mathematical
derivation, I will just give you the final result. The condition becomes capital P 1 into p 1 (T)
equal to capital P 2 into p 2 (T). So, we are going to get an expression of this form and the solution
of this equation gives the value of T.

So, if we try to solve this, you will find that what I need is the knowledge of this probability
density functions - p 1 (T) and p 2 (T) . So, as we know that in most of the cases, we normally
assume the Gaussian probability density function. So, if I assume that Gaussian probability
density function; in that case, the overall probability density p (z) is represented by capital P 1
divided by square root of 2 pi sigma 1 e to the power minus z minus mu 1 square by 2 sigma 1
square plus capital P 2 by square root of 2 pi sigma 2 e to the power minus z minus mu 2 square by 2 sigma
2 square where mu 1 is the average intensity value of the background region and mu 2 is the
average intensity value of the object region and sigma 1 and sigma 2 , they are the standard
deviations of the intensity values in the background region and the intensity values in the object
region.

So, by assuming this Gaussian probability density function, we get the overall probability density
function as given by this expression.

(Refer Slide Time: 50:33)

And, by assuming this and then from this particular expression, the value of T can now be found
out as the solution for T is given by, solution of this particular equation - AT square plus BT plus
C is equal to 0 where this A is equal to sigma 1 square minus sigma 2 square, B is equal to 2 into
mu 1 sigma 2 square minus mu 2 sigma 1 square and C is given by sigma 1 square mu 2 square
minus sigma 2 square mu 1 square plus 2 sigma 1 square sigma 2 square ln sigma 2 capital P 1 by
sigma 1 capital P 2 and here if we assume that sigma 1 square is equal to sigma 2 square is equal
to say sigma square; then the value of the threshold T comes out to be T is equal to mu 1 plus mu
2 divided by 2, plus sigma square divided by mu 1 minus mu 2, times ln of capital P 2 divided by capital P 1.

So, this is a simple expression for the value of the threshold that we can obtain in this optimal
thresholding operation, and it is optimal in the sense that this value of the threshold gives you
minimum average error. Here again, you find that if the probabilities capital P 1 and capital P 2
are the same, then the value of T will be simply mu 1 plus mu 2 by 2, that is the mean of the
average intensities of the foreground region and the background region.

So, as we said, if we segment the image by estimating a threshold in this way, then the average
error of segmentation will be minimum. That is, a minimum number of background pixels will
be classified as object pixels and a minimum number of object pixels will be classified as
background pixels.
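As an illustrative sketch of the equal-variance case above (my own code, not from the lecture), where mu1 and mu2 are the background and object means, sigma the common standard deviation, and P1 and P2 the two class probabilities:

import numpy as np

def optimal_threshold_equal_variance(mu1, mu2, sigma, P1, P2):
    # T = (mu1 + mu2)/2 + sigma^2 / (mu1 - mu2) * ln(P2 / P1)
    return 0.5 * (mu1 + mu2) + (sigma ** 2 / (mu1 - mu2)) * np.log(P2 / P1)

When P1 equals P2 the logarithm vanishes and T reduces to the midpoint of the two means, as stated above.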

(Refer Slide Time: 53:27)

Now, let us see an example of where this optimal thresholding can give us good results. Let us
take a very complicated case like this. This is a cardioangiogram, an image from cardiac
angiography, in which the purpose is to detect the ventricle boundaries. You find that the image
given here is very very complex, and though we can somehow figure out that there is a boundary
somewhere here, it is not very clear.

So, the approach that was taken is that this image was divided into a number of sub images; for
every sub image the optimal threshold was estimated and then the thresholding was done. For
this optimal thresholding, what was done is that the image was divided into a number of sub
images like this; for each sub image the histogram was computed, and the threshold was
computed only for those sub images which show a bimodal histogram like this, whereas you will
find that if I take a sub image here, it normally shows a unimodal histogram, which is given here.

For these sub images, no threshold was detected. The threshold was detected only for those sub
images which showed a bimodal histogram, and the thresholds for the other sub images were
estimated by interpolation of the thresholds of the regions having bimodal histograms. Then a
second level of interpolation was done to estimate the threshold value at each of the pixel
locations, and after that, for each pixel location, using that particular threshold, the decision
was taken whether the corresponding value in the thresholded image should be equal to 0 or
equal to 1.

So, using this, the thresholded image was obtained, and when the boundary of this thresholded
image is superimposed on the original image, you will find that this one shows the boundary of
the thresholded region. So, this was the estimated boundary, and the boundary points are quite
well estimated in this particular case.

So, with this, we stop this particular lecture on thresholding operations. Now, let us see some of
the quiz questions on today’s lecture.

(Refer Slide Time: 56:07)

The first question is what is meant by global, local and adaptive thresholds? The second
question, how do the relative sizes of object and background regions influence threshold
detection? The third question, if the threshold value is to be chosen automatically using iterative
procedure, how should you choose the initial threshold value? The fourth question, what
approach of thresholding should be used in case of non uniform illumination? And, the last
question, what is the objective of choosing optimal threshold?

Thank you.

Digital Image Processing

Prof. P.K. Biswas

Department of Electronics & Electrical Communication Engineering

Indian Institute of Technology, Kharagpur

Lecture - 32

Image Segmentation – IV

Hello, welcome to the video lecture series on digital image processing. We are discussing about
the image segmentation operations particularly the similarity based image segmentation
operations.

(Refer Slide Time: 1:14)

So, we have seen that in similarity based image segmentation operation, there are mainly 3
approaches. One of them is the thresholding based technique where you can go for either global
thresholding or dynamic or adaptive thresholding or optimal thresholding or local thresholding.
So, in our last class, we have discussed about the global thresholding operation, we have also
discussed about the dynamic or adaptive thresholding operation and we have also discussed
about the optimal thresholding operation and we have seen that in case of global thresholding, a
threshold value is selected where the threshold value depends only on the pixel intensities in the
image.

Whereas, in case of dynamic or adaptive thresholding, it not only depends upon the pixel values,
that is the intensity values of the pixels in the image, it also depends upon the position of the pixel in
the image. So, the threshold for different pixels in the image will be different. In case of optimal
thresholding, we have tried to find out threshold by assuming that the histogram of the image is a
representative of the probability density function of the pixel intensity values.

So there, if you have a bimodal histogram, the bimodal histogram is considered as a combination
of 2 probability density functions and from the probability density functions, we have tried to
estimate the error incurred by performing the threshold operation when a pixel is decided to
belong to an object or to the background.

So, because of the probability distribution function of different intensity values, it is possible that
a pixel which actually belongs to the background may be decided to belong to an object or a
pixel which actually belongs to an object after thresholding; it may be classified to belong to a
background. Now, because of this, there is an amount of error which is incorporated by this
thresholding operation.

So, in case of optimal threshold, what we have done is we have tried to estimate that how much
is the error incorporated if we choose a particular threshold. Then, you choose that value of the
threshold by which your average error will be minimized. There is another kind of
thresholding operation which is the local thresholding operation that we will be discussing today
and we have said that local thresholding operation takes care of the neighborhood property or the
pixel intensity values in the neighborhood of a particular location (x, y).

We will also discuss about the other 2 operations, other 2 segmentation, similarity based
segmentation operations that is region growing technique and region splitting and merging
techniques.

(Refer Slide Time: 4:44)

So, today’s discussion will be concentrated on local threshold operations where we will consider
in addition to the pixel value, the intensity value, its location; we will also consider the local
neighborhood property and the other 2 similarity based segmentation techniques that is region
growing technique and region splitting and merging technique.

(Refer Slide Time: 5:50)

So, first of all let us concentrate on the local thresholding operation. It is now clear that selection
of a good threshold value is very simple if the histogram of the particular image is a bimodal
histogram where the modes are tall, they are narrow and separated by a deep valley and in addition,
the modes are symmetric. That means if we have a histogram like this; so on this side, we put the
pixel intensity values and this side, we put the histogram.

So, if a histogram is of this form, then we can very easily choose a threshold within this valley
region. These are the 2 histogram modes or 2 histogram peaks which are separated widely by a
valley, and within this valley region we can choose a threshold and by using this threshold, we
can segment the image properly. But what happens in most of the cases is that the
histogram is not so clear. It is not so clearly bimodal and this threshold selection also becomes
easy if the histogram is symmetric. That means the area occupied by the object and the area
occupied by the background pixels; they are more or less same.

The problem occurs if I have an image like this; so I have an image and within this image, a
very small number of pixels actually belongs to the object and a large number of pixels belongs
to the background, and when I have an image like this, the resulting histogram will be
something like this. So, this may be the object pixels and the background pixels give rise to a
histogram of this form and here you find that the contribution to the histogram by the object
pixels is almost negligible because the number of pixels belonging to the object is very small
compared to the number of pixels belonging to the background.

So, the bimodal nature of the histogram is not very visible; rather, the histogram is dominated
by a single mode formed by the pixels which belong to the background. Now, how to solve this
problem?

(Refer Slide Time: 9:01)

So, this problem can be solved if, instead of considering all the pixels in the image to produce
the histogram, we can somehow identify the pixels which are either on the boundary or near the
boundary between the object and the background. In a sense, what we are trying to do is that,
given an image with an object inside, we are trying to identify the pixels in a narrow strip around
this boundary.

So, if we consider only these pixels around the boundary to form the histogram, the advantage in
this case is that the histogram will be symmetric. That is, the number of pixels within the object
region and the number of pixels within the background region which are being considered to
form the histogram will be more or less the same. So, our histogram will be symmetric and it will
not be dependent upon the relative sizes of the object and the background regions.

And, the second advantage is that the probability of a pixel belonging to the object and the
probability of a pixel belonging to the background within this narrow strip are almost equal.
Because if I consider the entire image, and in the image the object region is a very small region,
then the probability of a pixel belonging to the object is small compared to the probability of the
pixel belonging to the background.

Whereas, if I consider the pixels within a narrow strip around the object boundary; in that case,
the probability of the pixels belonging to the background and the probability of the pixels
belonging to the object they are almost the same. So, by considering only those pixels around
this narrow strip, I get 2 advantages. One is that the probability of a pixel belonging to the
background and the probability of a pixel belonging to the object, they are nearly equal and
at the same time, the area of the foreground region or the object region and the area of the
background region which is used for computation of the histogram that is also nearly same
making your histogram a symmetrical histogram. And once I have this kind of histogram, then
the thresholding operation is very very simple.

Now, the question is, if I simply use this kind of approach, then I have to know what the object
boundary is, that is the boundary between the object region and the background region. But this
is not easily obtained, because the basic purpose of segmentation is precisely that we are trying
to find out the boundary between the object and the background.

So, this simple approach as it has been presented that we want to consider the pixels lying on the
boundary or the pixels around the boundary; this cannot be used in this simple form because the
boundary itself is not known. That is the one that we are trying to determine. Then what is the
solution?

(Refer Slide Time: 12:50)

So, what is the solution? How do we solve this particular problem? The solution is to use the
image gradient and the image Laplacian. We know that if I have a region something like this, I
can plot the variation of intensity values; so this is the pattern of intensity values in an image.
Obviously, we are putting it in one dimension, the 2 dimensions are now mapped to 1 dimension,
so this is my pixel location, say x, and this is, say, f(x).

So, this is the variation of intensity along the x direction. If I take the gradient of this and as you
know that the gradient is first order derivative operation, so if I compute the gradient of this, the
gradient will appear something like this. So again, this is my x direction and on this side, what I
am putting is del f(x) del x; this is the gradient. Also, if I take the Laplacian, which you know is
the second order derivative operator, the Laplacian will appear in this form.

So, this is the second order derivative. Again on this direction, we are putting x; on this direction,
we are putting del 2 f(x) by del x 2. So, this is f(x), this is gradient and this is Laplacian. So, we
have seen earlier that an estimate of the edge points can be obtained from the gradient operator
and from the Laplacian operator and we have discussed earlier that the Laplacian operator is
affected to a large extent by the presence of noise.

So, the output of the Laplacian operator is not directly used for edge detection purpose but it is
used to provide secondary information. So, what we do is we use the gradient operator output to
determine the position of the edge points, and the output of the Laplacian operator is used to
determine whether a point is lying on the darker side of the edge point or it is lying on the
brighter side of the edge point.

So, as has been shown here that coming to this intensity distribution, you will find that this is the
bright side and this is the dark side and if I compare this Laplacian, you will find that on the
bright side of the edge, the Laplacian becomes negative whereas on the dark side of the edge, the
Laplacian becomes positive. So, by making use of this information, we can say that whether a
point is lying on the dark side of the edge or it is lying on the bright side of the edge.

So, our approach is as follows. We have said that we want to consider, for generation of the
histogram, only those pixels which are lying either on the boundary or on the edge between the
object and the background. That information can be obtained from the output of the gradient,
because for all the pixels which are lying on the boundary or near the boundary, the gradient
magnitude will be quite high. Then, to decide which of these points lies on the dark side and
which lies on the bright side, we can make use of the Laplacian output, where the Laplacian will
be negative if a point is lying on the bright side of the edge and the Laplacian will be positive if
the point lies on the dark side of the edge.

(Refer Slide Time: 17:49)

And, we have seen earlier that in case of an image where the image is modeled as a 2 dimensional
function f(x, y), the gradient of this image that is grad f, magnitude of this is given by magnitude
of G x plus magnitude of G y or square root of G x square plus G y square where this G x is nothing
but partial derivative of f(x, y) with respect to x and G y is nothing but partial derivative of f(x, y)
with respect to y.

So, G x is del f(x, y) del x and G y is del f(x, y) del y and similarly, the Laplacian of this image
that is del square f is given by del 2 f by del x 2 plus del 2 f by del y 2 and we have seen earlier
that to implement these operations in case of a digital image, we can have different types of
differential operators. One of the operators can compute this grad f and the other
operator that is Laplacian operator can compute the Laplacian of the given image f(x, y). So
here, what we are trying to do is we are trying to estimate whether a point is lying on the edge or
the point is within a small region near the edge and then whether the point is lying on the dark
side of the edge or it is lying on the bright side of the edge.

So, if I assume that we have an image where we have dark object against a bright background; in
that case, for the object pixels, the Laplacian near the edge will be positive and for the
background pixel, the Laplacian near the edge will be negative.

(Refer Slide Time: 20:21)

So, simply by making use of this property, what we can do is, from f(x, y), the gradient
magnitude of f and del square f, that is from these 3, I can create an image, say s (x, y), and we
will put s (x, y) equal to 0 if the gradient of f is less than some threshold T, because, as we have
said, on the edge points or the points near the edge the gradient value will be high.

So, if the gradient value is less than some threshold T, we assume that this point does not belong
to an edge, nor is it even within a region near the edge. For such points, we are making s (x, y)
equal to 0. We will put s (x, y) equal to positive if the gradient of f is greater than or equal to T,
indicating that this is an edge point or a point near the edge, and at the same time del square f is
greater than or equal to 0, which indicates that this point is on the dark side of the edge. That
means, in this particular case, since we are assuming that we have dark objects against a bright
background, this is an object point near the object and background boundary. And we will put
s (x, y) equal to negative if it is an edge point or a point near the edge, for which again the
gradient of f will be greater than or equal to T, and the Laplacian, that is del square f, will be less
than 0.

So, what we are doing is we are creating an image s (x, y) which will have values either 0 or
positive or negative. Now, for implementation, what we can do is these 3 symbols – 0, positive
or negative can actually be represented by 3 distinct intensity values.

So, for example; 0 may be represented by 0, positive may be represented by an intensity value
say 128 and negative may be represented by an intensity value say 255. So, 3 distinct intensity
values will represent these 3 different symbols – 0, positive and negative and then what we have
to do is we have to process this intermediate image s (x, y) to find out the object boundaries and
the object regions.

So, here you find that in this representation, if s (x, y) is equal to 0, that represents that the point
does not belong to the boundary between the object and the background; if it is positive, then the
pixel belongs to the object region; if it is negative, then the pixel belongs to the background
region.
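A rough sketch of this construction (my own illustration, not from the lecture), using a Sobel gradient and a Laplacian from scipy.ndimage; the gradient threshold T is assumed to be supplied:

import numpy as np
from scipy import ndimage

def label_image(f, T):
    # Gradient magnitude via Sobel derivatives
    gx = ndimage.sobel(f.astype(float), axis=1)
    gy = ndimage.sobel(f.astype(float), axis=0)
    grad = np.sqrt(gx ** 2 + gy ** 2)
    # Laplacian (second order derivative)
    lap = ndimage.laplace(f.astype(float))
    # s(x, y): 0 away from edges, +1 on the dark (object) side, -1 on the bright side,
    # assuming dark objects against a bright background
    s = np.zeros(f.shape, dtype=np.int8)
    s[(grad >= T) & (lap >= 0)] = 1    # "positive" symbol
    s[(grad >= T) & (lap < 0)] = -1    # "negative" symbol
    return s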

(Refer Slide Time: 24:10)

So, by using this kind of processing, an intermediate image that we can get will be something
like this. So, here you find that we have an image in which each pixel contains one of these 3
symbols, either 0, positive or negative, and here what we have is a dark object against a bright
background, maybe some handwritten characters with an underline, and this intermediate image
can be processed to find out the object
region and the background region.

So, once I get an image of this form, you will find that if I scan the image either along a
horizontal direction or along a vertical direction, then I am going to get a pattern of these 3
symbols. Now, what will be the nature of this pattern?

(Refer Slide Time: 25:16)

Say for example, whenever there is an edge, say I have this image, this intermediate image and I
want to scan the image along a horizontal line from left to right. Now, while scanning this, since
I have assumed that I have dark objects against a bright background; so, whenever there is a
transition from the background region to the object region, then I will get a situation something
like this. I will get a point having a negative label followed by a point having a positive label. So,
a negative followed by a positive, this indicates that I have a transition from background to
object.

Similarly, when I am scanning, I am moving from object to the background region; then the
combination of these 2 symbols will be just opposite. So here, because I am moving from object
region which is dark to the background region which is bright; so the combination of the symbols
that I will get is a positive followed by a negative. So, whenever I get this kind of transition that
is from a positive to a negative, this indicates that I have a transition from object to background.

(Refer Slide Time: 27:26)

So, by making use of this observation, if I scan a particular horizontal line or a vertical line; then
I get a sequence of symbols where the sequence of symbols will be something like this. I will put
this as say star, star, star followed by a negative, followed by a positive and then I will have a 0
or positive followed by positive followed by negative and then again a number of stars.

So, if I scan this intermediate image either along a horizontal line or along a vertical line and if
that particular scan line contains a part of the object; in that case, my scan pattern will be
something like this where this star star, this indicates any combination of 0, positive or negative.

So here, you will find that firstly, I can get any combination of 0, positive or negative and then
whenever I have a transition from the background region to the object region, I will have a
negative followed by a positive and then within the object region, I can have either 0 or positive
symbols. Then, when I am moving from the object region to the background region, I can have a
transition from positive to negative and then again on the rest part of this scan line, I can have
any combination of 0, positive or negative.

(Refer Slide Time: 29:39)

And you find that this is what is actually represented in this particular image. When I move along
any scan line, say for example this particular scan line, you will find that initially I have all 0s,
then I have a negative symbol followed by a positive symbol; then within this, it is either 0 or
positive, then again I will
have a transition from positive to negative, then again I will have a number of 0’s and this is how
it continues.

So, by making use of this particular pattern, I can identify which portion of this scan line belongs
to the object and which portion belongs to the background.

(Refer Slide Time: 30:53)

So, the kind of scan lines or symbols on the scan lines that we have obtained is like this. First, I
have any combination of positive, 0 or negative; then I have negative, positive; then I have either
0 or positive; then I have positive followed by negative and then again I can have any
combination of 0, positive or negative. Here you find that the transition from negative to positive
or from positive to negative indicates the occurrence of edge points, and the inner parenthesis
where I have only 0 or positive symbols actually indicates the object region.

So, for segmentation purposes, what I can do is, when I scan this intermediate image s (x, y)
either along a horizontal line or along a vertical line, then only that part of the scan line which is
represented by this inner parenthesis, all those points, I will make equal to 1 and the rest of the
points on this scan line I will make equal to 0. That gives me a segmented output where, in this
output image, the part belonging to the object is represented by a pixel value equal to 1 and all
the background regions are represented by a pixel value equal to 0.
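A rough sketch of this scan-line labelling (my own illustration, built on the label_image sketch above; the lecture gives no code) could look like this: along each row, pixels are set to 1 between a (negative, positive) transition and the following (positive, negative) transition:

import numpy as np

def segment_from_labels(s):
    # s contains -1, 0, +1 as produced by label_image(); output is a binary image.
    g = np.zeros(s.shape, dtype=np.uint8)
    for r in range(s.shape[0]):
        inside = False
        prev = 0
        for c in range(s.shape[1]):
            cur = s[r, c]
            if prev == -1 and cur == 1:      # background-to-object transition
                inside = True
            elif prev == 1 and cur == -1:    # object-to-background transition
                inside = False
            if inside and cur >= 0:          # interior symbols are 0 or positive
                g[r, c] = 1
            prev = cur
        # (a fuller implementation would also scan columns and combine the results)
    return g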

(Refer Slide Time: 32:59)

So, if I apply this technique on an image, you will find what kind of result that we get. You find
that in this particular case, this is an image, the top part of it is the scanned image of a bank
cheque and here you will find that the signatures and the other figures, they are appearing in a
background and it is not very easy to distinguish which is the object or which is the signature or
figure part and which is really the background part.

And, by making use of this kind of processing, marking all the object-region points with 1 and
the rest with 0, we can clearly segment out the signature part and the figure part, and here you
will find that this kind of output we possibly cannot get by making use of any global thresholding
approach. But here we have used local thresholding, and we call it local thresholding because, to
find out this threshold, what we have made use of is the gradient of the image and the Laplacian
of the image. And the gradient and Laplacian are properties local to a particular pixel location.
So, the kind of thresholding which is inbuilt in this kind of segmentation operation that is what
we call as local thresholding. So, with this we have discussed about the different kind of
thresholding operations. In our earlier class, we have discussed about global thresholding, we
have discussed about the dynamic or adaptive thresholding, we have discussed about the optimal
thresholding and now what we have discussed about is the local thresholding where this local
thresholding operation makes use of the image gradient and image Laplacian and as we said that
this gradient and Laplacian, these are local properties to a particular pixel location.

So, the kind of thresholding which is embedded in this application is nothing but a local
thresholding operation. Though this segmentation operation is obtained by scanning the
intermediate image that is generated, there is no direct thresholding operation involved in it. But
the kind of operation that is embedded in this approach is nothing but what we call as local
thresholding operation. Now, let us go to the other approaches of segmentation.

We have said there are 2 other approaches of similarity based segmentation operations; one of
them is region growing segmentation, the other one is called splitting and merging segmentation
operation. So, first let us talk about the region growing operation.

(Refer Slide Time: 35:51)

Now, what is this region growing segmentation? It is like this: suppose we consider all the pixels
belonging to the image as a set of pixels, say R, and what this region growing or segmentation
operation does is partition this set of pixels R into a number of sub regions, say R 1 , R 2 , R 3 and
so on up to, say, R n .

So, what the segmentation operation is actually doing is partitioning this set of pixels R, which
represents the entire image, into a number of sub images or partitions, that is n number of
partitions R 1 to R n . Now, when I partition my original set R into n number of such partitions
R 1 to R n , this partitioning should follow certain properties.
The properties are if I take the union of all these regions R i union over i, this should give me the
original image R. That means none of the pixels in the image should be left out, it is not that
some pixel is not part of any of the partitions. So, every pixel in the image should be a part of
one of the partitions. The second property is the region R i should be connected and we have
defined earlier that what do we really mean by a connected region.

We have said that given a region, the region will be called connected if I take any 2 points in the
region; then I should be able to find out a path between these 2 points considering only the points
which are already belonging to this region R. So, if every pair of points in this region R i are
connected, then we say that this region R i is connected. So, the second property that this
segmentation or partitioning must follow is that of the n partitions we get, every partition R i
should be connected.

The third property that must be followed is R i intersection R j that should be equal to null for i
not equal to j. That means if I take 2 partitions, say R 1 and R 2 , these R 1 and R 2 should be
disjoint. That means there should not be any common pixels, any common points, in these 2
partitions - R 1 and R 2 . Then, if I define a predicate say P over a region R i that should be true
where this P is a logical predicate defined over the points in set R i in this partition R i . So, for a
single partition R i , this logical predicate P should be true and the last property that must be
followed is predicate over R i union R j that must be equal to false.

So, what does it mean? False for i not equal to j; this actually means that if I define a predicate
over the points belonging to a particular region, then the predicate must be
true for all the points belonging to that particular region and if I take points belonging to 2
different regions R i and R j , then the predicate over this combined set R i union R j must be equal
to false. So, this is what expresses the similarity. That means all the points belonging to a particular
region must be similar and the points belonging to 2 different regions are dissimilar.

So, what does this region growing actually mean? Region growing, as the name implies, is a
procedure which groups pixels or sub regions into a larger region based on some predefined
criterion, and in our case this predefined criterion is the defined predicate. So, we start from a
single point and try to find out the other points that can be grouped into the same group, that is,
the points which follow the same criterion or for all of which the predicate is true. So, this region
growing operation works like this.

(Refer Slide Time: 42:13)

I have an image and in this image, I may select a set of points. So somehow, I select a set of points
like this and each of these points I call a seed point, and then what the region growing operation
tries to do is it tries to grow the region starting from the seed point by incorporating all the points
which are similar to the seed point.

Now, the similarity measure can be of different types. For example, we can say that 2 points are
similar if their intensity values are very close and the points are dissimilar if their intensity values
are widely different and one of the conditions that we have said that the points must be
connected.

(Refer Slide Time: 43:14)

That means coming to this image again, say I have this big image and for region growing, what I
have to do is I have to choose a seed point and our region growing operation will start from the
seed point. So, for this purpose, what I will to do is I can define I can have a 3 by 3
neighborhood around this seed point and since one of the property that this partitions have to
follow; so what I am doing is I am choosing this 3 by 3 neighborhood around the seed point and
since one of the property that this partitions have to follow is that every region or every partition
has to be connected. That means when I start to grow the region starting from the seed point,
then all the points which I will include in the same group or in the same partition, these points
have to be connected. That means I have to start growing this region from the points which are
connected to the seed point.

So here, if I use the concept of 8 connectivity, then the points which are to be put in the same
group as this seed point must belong to this 3 by 3 neighborhood of this particular seed point. So
effectively, what I am doing is once I
choose a seed point, I check the points in its 3 by 3 neighborhood and all the points which I find
are similar to this seed points, those points are put in the same group and then again I start
growing the region from all these new points which are put in the same group.

So effectively, what I am doing is this: call the point which has been selected as seed point s 0 .
Now, from its neighborhood, I may find that the other points which can be put in the same group
as this initial seed point s 0 are, say, s 1 , s 2 and say s 5 . Next, I start growing the region from s 1
itself. I find, within the 3 by 3 neighborhood of s 1 , following the same 8 connectivity, the points
which are similar to s 1 or which are similar to the seed point. And, this similarity can be based on
the intensity difference.

If the intensity difference is small, I say that they are similar; if the intensity difference is high, I
say that they are not similar. So, by this, again I start growing the region from s 1 , I start growing
the region from s 2 , I start growing the region from s 5 and so on, and this process will stop when
no more new points can be included in the same group.

(Refer Slide Time: 46:28)

So effectively, what we are doing is we are selecting a number of seed points in the image
following some criteria. So, we have selected a number of seed points. So, this seed point
selection is application dependent and once you select the seed points; then from the seed points,
we start growing the region in different directions by incorporating more and more points which
are connected as well as similar. And at the end, what we have is a number of regions which are
grown around these seed points.

So, this is the basic region growing operation, and you will find that this basic region growing
operation can be very easily implemented by using some recursive algorithm; a small sketch is
given below. Now, let us see what kind of output or result we can get by using this region
growing segmentation operation.
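Here is a rough sketch of such a region growing step (my own illustration, not from the lecture); it uses an explicit stack instead of recursion, 8 connectivity, and an intensity-difference criterion against the seed value, with tol a hypothetical tolerance parameter:

import numpy as np

def grow_region(image, seed, tol):
    # seed is a (row, col) tuple; pixels within tol of the seed intensity
    # and 8-connected to already accepted pixels join the region.
    h, w = image.shape
    region = np.zeros((h, w), dtype=bool)
    seed_val = float(image[seed])
    stack = [seed]
    region[seed] = True
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and not region[rr, cc]:
                    if abs(float(image[rr, cc]) - seed_val) <= tol:
                        region[rr, cc] = True
                        stack.append((rr, cc))
    return region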

(Refer Slide Time: 47:52)

So, here is an example. This is the x ray image taken from a weld and you will find that in case
of this x ray image, there may be some cracks within the welded region or there may be some
faults within the welded region and these faults can be captured by using an x ray image. So, this
top left one, this is the x ray image of a welded part and the nature of the problem says that
whenever there is a fault, then the faulty regions in the x ray image are going to have very high
intensity values.

So here, on the left hand side, first a simple thresholding based segmentation operation is shown,
where these are the regions having pixel values near an intensity value of 255, that is the
maximum intensity value, and as we said,
these faults usually appear as higher intensity values in the x ray image.

Then, what you do is the seed points are actually selected as all the points in this thresholded
image having a value of 255 after the thresholding operation and then you start the region
growing operation around each of these seed points. So, if I grow the region around each of the
seed points; now when you go for this region growing operation, the region growing operation
has to be done on this original image not on the thresholded image. This thresholding operation
is simply done to select the seed points.

Once you get the seed points, come to the corresponding seed point location in your original x
ray image and grow the region starting from those seed locations within the original x ray image
and this one shows the grown regions. Now, if I superimpose the boundary of these grown
regions on the original x ray image, the superposition output is shown on the bottom
right image.

Here, you will find that these are actually the region boundaries which are superimposed on the
original image. So, you will find that your segmentation operation in this particular case is quite
satisfactory. So, by using this similarity measure and incorporating it along with the region
growing operation, we can have a quite satisfactory segmentation operation. The next type of
segmentation that we said we will discuss is the splitting and merging operation.

(Refer Slide Time: 50:52)

Here again, what we are trying to do is form a segment of all the pixels which are similar in
intensity values, or similar in some sense. Our approach in this
particular case will be that if I have an image say R, first you try to find out whether this entire
image region is similar or not or whether the intensity values are similar. If they are not similar,
then you break this image into quadrants. So, just make 4 partitions of this image. Then you
check each and every partition in this image. If they are similar, if all the pixels within a partition
are similar; you leave it as it is. If it is not similar, then again you partition that particular region.

So initially, suppose, this was region R 0 , this was region say R 1 , this was region say R 2 , this was
region say R 3 ; now this R 1 is not uniform, so I partition that again making it R 10 R 11 R 12 and say
R 13 and you go on doing this partitioning until and unless you come to a partition size which is
the smallest size permissible or you come to a situation where the partitions have become
uniform, so I cannot partition them anymore. And in the process of doing this, what I am doing is
I am having a quadratary representation of the image.

So, in the case of the quadtree representation, if the root node is R, my initial partition gives me 4 nodes - R0, R1, R2 and R3. Then R1 I partition again into R10, R11, R12 and R13. Once such partitioning is completed, then what you do is check all the adjacent partitions to see if they are similar. If they are similar, you merge them together to form a bigger segment. So, this is the concept of the splitting and merging technique for segmentation. Now, let us see this with the help of an example.

(Refer Slide Time: 53:49)

See, I have an image of this form. When you come to this original image, you will find that here I have this background and on this background, I have this object region. This is obviously non uniform. So, what I do is partition it into 4 quadrants; each of them is non uniform, so I have to partition them again.

So, let us take one particular partition as an example; I partition it into four again and here you will find that this particular partition is uniform, so I do not partition it any more. For the rest of the partitions, I have to go for sub partitioning like this. Let us take one of them; this is partitioned again, this is partitioned again, this is partitioned again and so on.

Now at the end, when I find that I cannot do any more partitioning, either because I have reached the minimum partition size or because every partition has become uniform, then I have to look for adjacent partitions which can be combined together to give me a bigger segment. So, that is what I do in this case. Here, you find that this partition, this partition, this partition, this partition and this partition can be grouped together. And then again, this particular group can be combined with this particular partition, it can be combined with this partition, it can be combined with this partition and so on.

So finally, after the splitting operation, the entire object is broken into a number of smaller size partitions, and then in the merging operation, I try to find out the partitions which can be merged together to give me a bigger segment. So, by doing this, at the end of the splitting and merging operation, the different objects can be segmented out from the background.
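
The following is a minimal recursive sketch of the split phase in Python, assuming a grayscale NumPy image and a simple uniformity test based on the intensity range within a block; the uniformity threshold, the minimum block size and the flat list of leaf blocks returned (on which a separate merge pass would operate) are illustrative assumptions, not the exact criteria used in the lecture.

import numpy as np

def split(image, r0, c0, h, w, range_thresh=10, min_size=2):
    """Recursively split the block image[r0:r0+h, c0:c0+w] into quadrants until
    each block is uniform (small intensity range) or of minimum size.
    Returns the leaf blocks of the quadtree as (r0, c0, h, w) tuples."""
    block = image[r0:r0 + h, c0:c0 + w]
    uniform = int(block.max()) - int(block.min()) <= range_thresh
    if uniform or h <= min_size or w <= min_size:
        return [(r0, c0, h, w)]
    h2, w2 = h // 2, w // 2
    leaves = []
    # The four quadrants R0, R1, R2, R3 of the current region
    for dr, dc, hh, ww in ((0, 0, h2, w2), (0, w2, h2, w - w2),
                           (h2, 0, h - h2, w2), (h2, w2, h - h2, w - w2)):
        leaves += split(image, r0 + dr, c0 + dc, hh, ww, range_thresh, min_size)
    return leaves

# Example usage (hypothetical array `img`):
# leaves = split(img, 0, 0, img.shape[0], img.shape[1])
# A merge pass would then combine adjacent leaves whose mean intensities are close.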

So, in brief, we have discussed the different segmentation operations. Initially, we started with discontinuity based segmentation, where we have gone for different edge detection or line detection operations followed by linking, and then we have discussed similarity based segmentation. Under similarity based segmentation, we have discussed various thresholding operations, the region growing operation and lastly, the splitting and merging operations. Now, let us have some quiz questions on today’s lecture.

(Refer Slide Time: 56:40)

The first question: how do the gradient and Laplacian operators help in threshold detection? Second question: why are the thresholds so obtained categorized as local thresholds? Third question: how will the combination of 0, positive and negative values on a scan line look if the line contains multiple object parts? Fourth question: how do we choose the seed points for the region growing operation? And the last question: does the result of the region growing operation depend upon the choice of seed regions?

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 33
Mathematical Morphology - I

Welcome to the video lecture series on digital image processing. Till our last class, we have talked about various image processing techniques, and for the last few lectures, we were talking about image segmentation techniques. As we have seen, image segmentation is nothing but a process of partitioning a given image into a number of sub images or sub partitions, and we have talked about various techniques for partitioning the image into a number of partitions or components.

(Refer Slide Time: 01:39)

So, we have seen that image segmentation techniques can basically be divided into 2 different categories. One of the categories is discontinuity based image segmentation techniques and the other category is similarity based image segmentation techniques. In discontinuity based techniques, what we look for is some abrupt changes in the image gray levels or, if it is a colour image, some abrupt changes in the colour information in the images, and based on these abrupt changes in the intensity values in the images, we partition the images into different components.

So essentially, in discontinuity based image segmentation techniques, what we look for is some isolated points, or we look for some lines, or we look for some edges and similar such discontinuities present in the image. And based on this discontinuity, once we detect some such isolated points or, say, points lying on a line (a line obviously being the boundary between a lighter image region and a darker image region), then by some post processing technique we try to link those points, and after that we get a partitioning of the image into different regions. Naturally, in this case, the partitioning is based on the discontinuity approach.

In the other case, in similarity based techniques, we have seen different types of techniques. The first one that you have seen is thresholding, and we have said that it is one of the simplest approaches for segmentation of an image.

So, in the thresholding based approach, what we have done is, by some means, we try to find out a threshold, and thresholding an image means that the regions with intensity values less than the threshold are put into one partition, one segment, and the image regions where the intensity values are greater than the threshold are put into another region.

So, when I put these image pixels into different regions based on their intensity values compared to the threshold, what we essentially do is partition the image into a number of regions where the intensity values in different regions are different and the intensity values within a given region are similar, in the sense that either they are all greater than the threshold for that particular region or the intensity values of all the pixels in another region are all less than the threshold.

The other kind of segmentation technique based on this similarity measure that we have done is the region growing operation. So, in region growing, what we have done is, again by some means, we try to identify some seed points and then we try to grow the region from the seed points based on some similarity measure. So, when you grow the region, you ascertain that a point will be included into a particular region if its intensity is similar to that region and, at the same time, the point is also connected to that particular region.

So, you start from a number of seed points and from each seed point, you try to grow the region based on similarity of intensity values as well as the connectivity property, and that is how we get an image segmentation based on region growing techniques. The last approach under this category, that is the similarity based approach, that we have seen is the region splitting and merging operation.

So here, you start with the entire image; then, if the pixel intensities over the image are not similar, you partition the image into four different partitions, that is, you make four different quadrants, and then you check for similarity in each of the quadrants. If in any of the quadrants you find that all the pixels have similar intensity values, you do not partition it any more; but a particular quadrant where the image pixel intensities are not similar, you partition again, you sub partition it again into four different quadrants, and this process of splitting continues until we find that a quadrant size is less than a given size or the pixel intensities in each of the quadrants are similar.

Now, while doing so, it is possible that a connected region of similar intensity will be divided into a number of different partitions. So, the first phase which we have done is the splitting, that is, an image is split into a number of quadrants, and this process continues until we find that the image intensities in every quadrant are similar or the quadrant size has reached a minimum specified size. So, after the splitting operation, as a connected region may be split into a number of sub regions, I go for the second phase, which is the merging operation.

So, in the merging phase, what we do is take the neighboring regions which might have been split into two different regions but whose intensity values are similar. So, in the merging phase, you try to merge them in a bottom up fashion, and once these two phases of splitting and merging are complete, you get a partitioning of the image, or a segmentation of the image into different segments, where the intensity values within each segment are similar and the intensity values in different segments are not similar.

So, these are the different segmentation techniques that we have talked about during our last few lectures. Today, what we are going to talk about is something different. That is, we are going to talk about a topic called mathematical morphology and we will see its application in image processing.

(Refer Slide Time: 8:29)

So, in today’s lecture, we will see what is mathematical morphology and we will talk about different morphological image processing techniques. In particular, in today’s lecture, we will try to cover the topics of dilation, erosion, opening and closing operations. We will come to each of these techniques, discuss each of them one by one, and then we will see what are the applications of these techniques in our image processing operations. So firstly, let us see what is morphology.

(Refer Slide Time: 9:09)

Now, morphology is a term which is widely used in the field of biology, and it discusses the shape and structure of different animals, different plants and so on. In our application, we will talk about morphology to discuss the image processing techniques which take into consideration the structure and shape of objects.

So, the image processing techniques based on the structure and shape of the objects are classified as morphological operations, or this is nothing but the application of mathematical morphology in image processing. Now, when we talk about such morphological operations, these have a number of applications.

For example, in one of our earlier lectures, we had shown this example where we have an image of an object; then what we do is we try to separate the object region from the background region, and for separation of the object region from the background region, we have applied a simple thresholding operation. By applying the threshold, the kind of image that we get is shown on the top right corner, and here you find that within the white pixels, which are supposed to belong to the object region, we have a number of black dots.

Now, these black dots we consider to be noise. This noise can be filtered by the morphological operations, and after doing the morphological operations, the filtered image that we get is shown at the bottom. We will see through our discussion today and our subsequent few lectures how such filtering operations can be done by using the mathematical morphological techniques.

(Refer Slide Time: 11:23)

So, there are various applications; one simple application I have shown earlier. Another application of this mathematical morphology is in the preprocessing of images, that is, in filtering; the example that I have just shown is nothing but a filtering operation, because here, if I consider that all the black patches which are present in the segmented image arise because of the presence of noise, then by the morphological operation we are eliminating the effects of that noise by using morphological filters.

So, it is a preprocessing kind of application where the purpose is to filter the noise present in the image. Another application of these image processing techniques can be shape simplification. We can have a situation where we have an object of a very, very complicated structure, but it is possible to break that complicated structure into a number of substructures where each of the substructures will be simple in nature, so that it is easy to describe or easy to quantify the shape attributes of that particular structure.

So, in such cases, these morphological operations will also help us to simplify the shape of a complicated structure. Naturally, in this case, the complicated structure will be broken into a number of substructures where each of the substructures is very, very simple in nature, so that we can describe them very easily. Then, as we said, morphology is a topic which deals with the shape or structure of objects; so, we can use the object shape for segmentation operations as well.

So, segmentation using the object shape is another major application of these morphological techniques. Morphological techniques can also be used for image quantification. By quantification, what I mean is that we can find out the area of the object region, we can find out the perimeter of the object region, we can find out the boundary of the object region and many such related attributes of the object regions, and this is what I mean by object quantification. Later on, we will see that the morphological operations, or morphological transformations, help us to a great extent in the quantification of objects.

Morphological operations can also be used in other cases; for example, enhancing the object structure. By enhancing, what we mean is that, as we have seen in one of our earlier lectures, given a two dimensional shape, in many cases that two dimensional shape can be better described by using the skeleton, and one of the skeletons we have seen is the medial axis. So, these morphological operations are also very, very useful for finding the skeleton of an object. Similarly, these are also useful for the thinning operation, for the thickening operation, for finding the convex hull of a set of points (we will come to what is meant by convex hull later), for object marking, and there are various such applications of the morphological operations.

Naturally, it is not possible to cover the entire range of morphological operations in this particular course, and that is beyond the scope of this course, but we will look at some basic image processing operations which employ morphological techniques.

(Refer Slide Time: 15:27)

So, as we said earlier, morphology commonly denotes a branch of biology; we said that morphology is a very common term in biology, and it deals with the form and structure of animals and plants, whereas in our case, when we mean the use of mathematical morphology in image processing, it is sometimes also called image morphology. By this we mean the mathematical tools which help us to extract image components which are useful for representation and description of region shape, boundary, skeleton, convex hull etc.

Now, whenever we go for such mathematical morphology in image processing, or image morphology, the basic assumption is that an image can be represented by a point set. That is, we can represent an image by a set of points. What do we mean by that?

(Refer Slide Time: 16:34)

Let us look at this particular figure. So, as you find, we can say that this is nothing but a binary image where the shaded regions represent object pixels and the black regions represent the background pixels. So, this is nothing but a binary image. When we talk about this mathematical morphology, initially we shall concentrate on the application of mathematical morphology to binary images, which we also call binary morphology. Later on, towards the end of this topic, we will also see how this binary morphology can be extended to take care of gray level images, and we will call that gray level morphology.

So, as I was saying, the basic assumption in the application of mathematical morphology to image processing is that the images should be represented by point sets, or I should be able to represent an image by a set of points. So, looking at this example, where I have shown a binary image, and as I said, the shaded regions represent the object pixels and the black regions represent the background pixels, you find that this particular binary image can be represented by a set of points.

So, I can write a set X which is nothing but a set of points, the points belonging to the object region. So, in this particular case, if I say that this is my x direction and this is my y direction, you find that the coordinate of this point is nothing but (2, 0). So, this point (2, 0) is a member of this set X. Similarly, the point (2, 1) is also a member of this set X. The point (1, 2) is also a member of this set X, but the point (2, 2) is not a member, because this pixel is a black pixel and it belongs to the background, whereas the point (3, 2), which is this point, is a member of this set. So likewise, I can consider the other points which belong to the object region to be members of this particular set X.

So, you find that this object is now represented by a set of points, where membership of this set means that the corresponding point is an object point. Similarly, the background pixels in this particular image can be represented by X complement, because all the points which do not belong to this point set X belong to the complement of set X, or X complement. So, the set of background pixels is also nothing but a point set, which is the complement of the set X, where set X is the point set representing the object pixels.
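
As a rough illustration of this point set view, here is a small Python sketch that converts a binary image into a set of (x, y) coordinates and back; the coordinate convention (x = column, y = row) and the array names are assumptions made for the example, not a convention fixed by the lecture.

import numpy as np

def image_to_point_set(binary):
    """Return the set of (x, y) coordinates of the foreground (non-zero) pixels."""
    ys, xs = np.nonzero(binary)
    return set(zip(xs.tolist(), ys.tolist()))

def point_set_to_image(points, shape):
    """Rebuild a binary image of the given (rows, cols) shape from a point set."""
    out = np.zeros(shape, dtype=np.uint8)
    for x, y in points:
        if 0 <= y < shape[0] and 0 <= x < shape[1]:
            out[y, x] = 1
    return out

# Example usage: the background point set is simply the complement of X
# with respect to the full pixel grid.
# X = image_to_point_set(binary_img)
# grid = {(x, y) for y in range(binary_img.shape[0]) for x in range(binary_img.shape[1])}
# X_complement = grid - X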

So, this is what I mean by saying that an image should be represented by a point set, and once an image is represented by a point set, all the morphological operations on the image are nothing but set operations on those point sets. So, because these morphological operations will be set operations, let us quickly review some basic set operation techniques.

(Refer Slide Time: 20:59)

So, as we all know, suppose I consider a point set A in the two dimensional space Z square, and a point, say a; because we are considering a two dimensional space, a point in this two dimensional space will be represented by an ordered pair. So, if I represent this point a by an ordered pair (a1, a2), then our first definition is that if this point a is a member of the point set capital A, we write it as a belongs to capital A. So, this says that a belongs to the point set capital A.

Similarly, given another point in this two dimensional space, say b, if b does not belong to the point set A, we write it as b does not belong to the point set A. So, these are some basic set notations. Then, we can also define a subset relation. That is, we say that a set A is a subset of a set B, where both the point sets capital A and capital B are in the two dimensional space Z square, if every element in set A also belongs to the set capital B.

However, the reverse may not be true. If I have an element in set B, that may or may not belong to set capital A. But every element belonging to set capital A must be a member of set capital B. In that case, the point set capital A is a subset of the point set capital B.

We also have the set union operation. So, for 2 sets, capital A and capital B, the union A union B is the set of elements taken from set A and set B. So, if I combine all the elements of set capital A and all the elements of set capital B and take them together, then the set that I get is the union of the two sets capital A and capital B. We can also have the set intersection operation. The intersection of two sets capital A and capital B consists of all the elements which are common to A and B; that is, if an element belongs to set A and it also belongs to set B, then that particular common element will be a member of A intersection B. So, A intersection B really represents the elements which are common to both set A and set B.

Similarly, we can have set complementation. By complementation, what we mean is that the complement of a set capital A is nothing but the set of all the elements b such that b does not belong to set capital A. So, the set of all the elements which are not members of the set A forms the set A complement, and we have seen in our last example that if I represent all the object pixels by a set, say capital X, then all the background pixels will be represented by X complement, because the background pixels are not object pixels and, vice versa, the object pixels are not background pixels.

We can also have the set difference operation. That is, we define the set difference A minus B, which is nothing but the set of all elements, say w, such that w belongs to A and w does not belong to B. So essentially, if we have some common elements in the sets A and B, and if I remove those common elements from set A, then whatever set I get is the set of elements in the set difference A minus B.
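
These basic operations map directly onto Python's built-in set type; the tiny sketch below, using made-up example points, is only meant to show the correspondence.

# Two small point sets in Z square, written as sets of (x, y) tuples.
A = {(2, 0), (2, 1), (1, 2), (3, 2)}
B = {(2, 1), (3, 3)}

union = A | B                 # A union B
intersection = A & B          # A intersection B
difference = A - B            # A minus B

# The complement only makes sense with respect to a finite universe,
# here an assumed 4 x 4 pixel grid.
grid = {(x, y) for x in range(4) for y in range(4)}
A_complement = grid - A

print(union, intersection, difference, sep="\n")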

(Refer Slide Time: 26:59)

Similarly, we can have other set operations, like set reflection. So, what is this reflection? The reflection of a set, say B, is usually represented as B hat, and B hat is nothing but the set of points w such that w is equal to minus b for b belonging to set capital B. So, if I have a set of points which I call capital B, then if I negate all the points belonging to set B, the set of all those negated points is actually the reflection of set B, and this reflection of set B is represented by B hat.

Similarly, we can have a translation operation on a set. So, if we translate by a vector say z, with components (z1, z2), then the translation of a set A by the vector z, which is represented by A with subscript z, is given by the set of all points c such that c is equal to a plus z for a belonging to set capital A.

So, if I shift all the points belonging to set A by this vector z, then the set of all those shifted points forms the set A z, or the set A translated by the vector z. Now, as we are considering point sets, and we know that in a two dimensional space a point can also be represented by a vector, we can also say that a point set is a set of two dimensional vectors. As we have said, all our morphological operations will be based on set operations, and the set operations which are usually used in morphological transformations are the ones we have just briefly described. Now, let us see what we mean by a morphological transformation.
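
A minimal Python sketch of these two operations on point sets, under the same (x, y)-tuple convention assumed earlier:

def reflect(B):
    """Reflection B hat: negate every point of B."""
    return {(-x, -y) for (x, y) in B}

def translate(A, z):
    """Translation A_z: shift every point of A by the vector z = (z1, z2)."""
    zx, zy = z
    return {(x + zx, y + zy) for (x, y) in A}

# Example usage with a made-up structuring element:
# B = {(0, 0), (1, 0)}
# print(reflect(B))            # {(0, 0), (-1, 0)}
# print(translate(B, (2, 3)))  # {(2, 3), (3, 3)}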

(Refer Slide Time: 30:02)

So, we say a morphological transformation, say psi; this morphological transformation psi gives a relation of the image capital X, and as we have just said, this image capital X is nothing but a point set, or a set of points. So, this morphological transformation gives a relation of the image X with another small point set, or another small image, say capital B, which is called a structuring element.

So, in this case, what do we have? We have an image, say capital X, and we have a small structuring element, or a small image, capital B. So, this morphological transformation actually describes or gives a relation between this image X and the small point set capital B which we are calling a structuring element. Now, the way we compute this relation is similar to the way we have done the convolution operation or the way we have done the image registration operation.

So, in the case of the convolution operation, we are given a big image and we are given a small image. There, what we do is take this small image and move it systematically over the big image, and for each such shift, or each such translation, of the small image over the bigger image, you do some computation, which is nothing but the convolution operation.

So here also, for a morphological transformation or a morphological operation, what we will do is translate or move the structuring element systematically over the given image X, and for each translation of the structuring element in the given image, we will perform some operation, where the type of operation will depend upon what type of morphological transformation we want to do.

So firstly, we will talk about some simple morphological transformations, and one of them we call dilation. So, what is dilation? There are two different definitions of the morphological dilation operation. However, we will see that both these definitions lead to the same result, or in other words, the two different definitions are identical. So, what are those definitions?

(Refer Slide Time: 32:58)

So first, what you consider is the dilation operation, and this dilation operation is denoted by a symbol, a cross within a circle. So, this is the symbol which is used to represent the dilation operation, and this dilation operation is defined in terms of set addition or vector addition. So, if I have an image or a point set X and I have a structuring element, say capital B, then, as we said, all the morphological operations actually define a relation of the given set X with the structuring element B; so this dilation operation has to be defined with respect to the structuring element capital B over the image capital X, which is nothing but a point set.

So, we represent X dilation B like this, and this X dilation B is defined as the set of points p, obviously in the two dimensional space (the two dimensional space we have written as Z square), such that p is equal to x plus b, where x belongs to capital X and b belongs to capital B. So, this is our first definition of the dilation operation. What it says is that I have been given two point sets; the first one is capital X, which represents an image, and the second one is capital B, which represents our structuring element. Now, each of these sets is nothing but, as we have said, a set of points, or a set of vectors, in the two dimensional space.

So, our first definition of the dilation operation says that I take every element from the set capital X and every element from the set capital B and add them vectorwise. So, you take the vector addition of every element from set capital X with every element from the set capital B, the structuring element, and the set of resultant vectors that you get gives you the dilation, X dilated with the structuring element B.

The other, alternate definition of the same operation is given like this. We can also define X dilation with B, the structuring element B, in such a way that again it is the set of points p such that, if I take the reflection of B and translate it by the vector p, then the intersection of this with the set X should not be equal to null.

So, I have this structuring element, and we have defined what we mean by the reflection of a set. So, I take the reflection of the structuring element capital B and then translate it by a vector, say p; then I take the intersection of this translated reflection of set B with the set of points capital X, and if this intersection is not null, then the translation vector p is a member of X dilation with capital B. We will see that though we have two different definitions of dilation, both these definitions are equivalent in the sense that they produce identical results. Now first, let us see what we mean by this dilation operation.
what we mean by this dilation operation.

(Refer Slide Time: 37:41)

So, let us take an image, say like this; say I have a binary image which is represented by these points. These are the points which belong to the binary image. So, this is my set of points capital X, and I define a structuring element, say capital B, where this structuring element contains only these two points; obviously, I have to have some coordinate system for both my image as well as the structuring element. So, I assume say this is the origin of my point set capital X and suppose this is the origin of the structuring element B.

So, considering these two origins, now you find that this point set X is nothing but a set of points, where this X can be represented by the point set consisting of points, say (3, 2), (2, 2), (3, 1); similarly, I can consider the other points in it. So, I get the image capital X as a set of points, or a set of vectors. Similarly, this structuring element is also nothing but a point set, where you find that this point set is nothing but the point (0, 0) together with the point (1, 0).

So, as we have seen from the definition of the dilation operation, I take every point from set X and every point from set B and do the vector addition; take a point from set X, take a point from set B, do the vector addition of these two points, and the resultant vector is a member of X dilation B. So, by doing this, when I add all the points in set X to the first vector, or the first point, in our structuring element B, which is nothing but (0, 0), these points will be retained, because every vector from set X, when it is added with (0, 0), remains the same.

So, all the points in set X are also members of X dilation B. Similarly, if I take a point from this set X, say (3, 2), and add to this a vector from our structuring element B, which is (1, 0), the result is nothing but (4, 2). So, this point (4, 2), which is over here, is also a member of X dilation with B. Similarly, when I take this point (2, 2) and add to it the vector (1, 0), what I get is (3, 2). Now, (3, 2) is already a member of X. So, this point remains a member of X dilation with B.

If I consider this point (3, 3) and add to it this vector (1, 0), then my resultant vector becomes (4, 3). So, this resultant vector is also a member of X dilation with B. So, if I continue like this, you find that all the points which were belonging to set capital X remain in capital X dilation with B. In addition to that, I get other additional points because of the addition with this vector (1, 0); these additional points are nothing but these points. So, these are the points which will be added to X dilation with B, and our X dilation B will be the set of all these points. So, this gives us what X dilation with B is.

Now, let us see whether we get the same, identical result if I apply the two different definitions.
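
Following the vector addition definition, a minimal Python sketch of dilation on point sets is given below; it reuses the (x, y)-tuple convention assumed earlier, and the example sets in the comments only approximate the points drawn on the slide.

def dilate(X, B):
    """Dilation by vector addition: every x in X is added to every b in B."""
    return {(x1 + b1, x2 + b2) for (x1, x2) in X for (b1, b2) in B}

# Example usage with small made-up sets:
# X = {(3, 2), (2, 2), (3, 1)}
# B = {(0, 0), (1, 0)}          # structuring element with origin (0, 0)
# print(dilate(X, B))           # X together with all its copies shifted by (1, 0)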

(Refer Slide Time: 43:05)

So, for that, I consider a similar kind of situation, and for a clear explanation, let us take a very simple situation. Again, as before, I consider say this is the origin of the point set X; I am showing two different instances. So, this is my point set X, this is also the point set X, and I consider say this is the origin of our point set B, and I take a single point in our image.

Say, this X contains a single point, and what I do is take a structuring element consisting of only two points like this. So, you will find that this X contains only the point (3, 2), and the structuring element contains two points; one is (minus 1, 0) and the other one is (2, 0). So, these are the two points which belong to the structuring element; they take these values because I have taken the origin at this location. Over here, what I consider is the reflection of B. So, in this case, the reflection of B will be this particular set; the reflection of B will be nothing but (minus 2, 0) and (1, 0), and this set X, again, because it is (3, 2), this is the only point which belongs to set X. So, in the first case, I apply our first definition, that is, dilation as a result of vector addition.

(Refer Slide Time: 45:38)

So, what I have to do is take the point from the point set X and add to it every vector from the point set capital B. By doing that, what I will get in this particular case is that my dilated image will consist of only these green boxes. So, these are the only two points, because if I add (minus 1, 0) to this particular point, I come to this green box, and if I add (2, 0) to this point, I come to this green box. So, these are the two points which will be members of X dilation B following our first definition.

Now, what will happen if I apply the second definition? The second definition is in terms of translation of the reflection of set capital B, and we have said that for all those translations where the translated reflection of capital B intersected with X does not become null, those translations are members of X dilation B.

(Refer Slide Time: 46:53)

So, if I do that, in that case you find that, given this reflection of capital B, if I translate it to this particular location, this point of the translated B will be intersecting with this point in our set capital X. Similarly, if I translate this reflection of B to this particular location, then this particular point in the translated B will be intersecting with this point in our set X. So here again, you find that these are the two points, or two translation vectors, which will be members of X dilation with B.

So now, if you compare this result with this result, you find that we have got identical results in both of these two cases. So, whichever definition we follow, both the definitions are equivalent in the sense that they give us identical results when we talk about the dilation of a given set X with respect to a given structuring element, say capital B.
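
The equivalence of the two definitions can also be checked numerically. The short Python sketch below compares the vector addition form with the reflect, translate and intersect form on the single-point example just discussed; the finite search window for candidate translations is an assumption made only so that the loop terminates.

def dilate(X, B):
    return {(x1 + b1, x2 + b2) for (x1, x2) in X for (b1, b2) in B}

def dilate_by_reflection(X, B, window=range(-10, 11)):
    """Second definition: p belongs to the dilation if the reflected B,
    translated by p, intersects X.  The window bounds are an assumption."""
    B_hat = {(-b1, -b2) for (b1, b2) in B}
    result = set()
    for px in window:
        for py in window:
            shifted = {(b1 + px, b2 + py) for (b1, b2) in B_hat}
            if shifted & X:
                result.add((px, py))
    return result

X = {(3, 2)}
B = {(-1, 0), (2, 0)}
print(dilate(X, B) == dilate_by_reflection(X, B))   # expected: True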

(Refer Slide Time: 48:24)

Now, what will be the application of this dilation operation? Let us take again a particular image like this; say I have an image which consists of these points. Now, in between, I have some noisy points, say something like this. So, these are my object points, and I consider that these black pixels, these three black pixels that I have within this object region, are there because of noise, and suppose I consider a structuring element; so this is my set X and I consider a structuring element, say capital B, which is something like this. So, all these elements are members of the structuring element capital B, and when I have this structuring element, I can consider the origin of the structuring element to be this particular point. So, this is the origin of the structuring element.

Now, you find that if I go for the dilation of this given image with the given structuring element, then whichever definition you follow, whether it is the vector addition definition or the translation of the reflection of B to various locations in our given image X, as we have seen, we get identical outputs; so if I go for the dilation of this image X with our structuring element capital B, then you will find that all these points will be in our dilated output. So, all these points will be in the dilated output, and these noisy points that we had within the object region will also be filled up.

So, a natural application of this dilation operation, as we have seen in one of the applications of segmentation by the thresholding operation, is that we had a number of black spots within the object region, which was supposed to be white. So, for filling those black spots, we can make use of this dilation operation. As we have seen here, all these black pixels, these three black pixels in the object region, can be filled up by dilation with respect to a structuring element like this. So, this output is X dilation with B.
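
In practice, binary dilation of image arrays is usually done with a library routine rather than with explicit point sets; a brief sketch using SciPy is shown below, where the 3 by 3 structuring element and the example array are assumptions made for illustration, not data from the lecture.

import numpy as np
from scipy.ndimage import binary_dilation

# A small binary object with noisy holes (0s) inside it.
img = np.array([[0, 0, 0, 0, 0, 0],
                [0, 1, 1, 1, 1, 0],
                [0, 1, 0, 1, 1, 0],
                [0, 1, 1, 0, 1, 0],
                [0, 1, 1, 1, 1, 0],
                [0, 0, 0, 0, 0, 0]], dtype=bool)

structure = np.ones((3, 3), dtype=bool)   # 3 x 3 structuring element
dilated = binary_dilation(img, structure=structure)

# The internal holes are filled, but the object boundary also grows by one
# pixel, which is the side effect discussed next.
print(dilated.astype(int))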

Now naturally, in this particular case, we have a side effect. It is good that all these internal noisy points have been filled up by white pixels, or object pixels, but at the same time, you find that our original object region boundary was this; this was the object boundary. Now, because of the dilation operation, the boundary gets expanded, so in addition all these points are also included in the object region. So, in effect, the object boundary is expanded. This is naturally a side effect of the dilation operation, and we have other operations by which this boundary expansion can be compensated.

So, let us now talk about another morphological operation, that is the image erosion operation.
(Refer Slide Time: 52:41)

So, as we have seen, in the case of dilation, we have a given image X and a given structuring element, say capital B, and what we are doing is dilating this given image X with respect to the given structuring element B, and this dilation operation is defined in terms of vector addition. In the same manner, we can have an erosion operation, where erosion can be defined in terms of vector subtraction.

As we have seen in the case of dilation, we can have two alternate definitions, though the definitions are equivalent; similarly, in the case of erosion, we can also have two different definitions, and the definitions are equivalent. So, the definition of the erosion operation can be something like this.

(Refer Slide Time: 53:39)

So, again here, we will have a set X and a structuring element B, because all the morphological operations have to be defined with respect to a structuring element. So, we have a point set capital X representing our image and a point set capital B representing the structuring element, and the erosion of the point set X with the structuring element capital B can be defined, as per our first definition, as the set of points p belonging to our two dimensional space Z square such that p plus b belongs to X for every b belonging to the structuring element capital B.

So, this is our first definition of the erosion operation. The second definition of the erosion operation can be that the same operation, X minus B, that is X eroded with B, is defined as the set of points p such that, if I take the structuring element B and translate it by the vector p, the translated structuring element must be a subset of our set X.

So, you find that this second definition says that you take the structuring element, then translate it by a vector, say p, and if this translated structuring element is a subset of the set X, then the corresponding vector p, or the corresponding point p, will be a member of X eroded with capital B; or, in other words, for all those translations of the structuring element capital B for which the structuring element is contained fully within the set of points X, the corresponding translation, or the corresponding point, will be a member of X eroded with B.

So, we will stop our discussion here today. We will consider this erosion operation further, and in our next lecture, we will see some of the properties of the dilation and erosion operations.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 34
Mathematical Morphology - II

Hello, welcome to the video lecture series on digital image processing. Since our last lecture, we have started a discussion on mathematical morphology and the application of mathematical morphology in digital image processing.

(Refer Slide Time: 1:16)

So, in the last class, what we have seen is what morphology is, and we have said that morphology is basically a topic which was used extensively in the field of biology to talk about the structure or shape of animals and plants. In our image processing, we use mathematical morphology as a tool for doing various structure or shape related operations, and we have seen that, to use these morphological operations, or image morphology, we must represent an image in the form of a point set; all the morphological operations are nothing but different set operations, and the kind of set operations that we have to use depends upon what kind of morphological transformation we want to do. A morphological transformation of a given image X describes a relation of the image X with another small point set, say B, and this small point set is what we call a structuring element. So, any mathematical morphological operation, or image morphological operation, is always defined with respect to a given structuring element.

Then, in our last class, we have discussed what dilation is. We have seen how dilation can be implemented, and we have seen how this dilation operation can be used to remove some noisy points within the object region in an image. Then, we had defined another operation which is called erosion, and in today’s lecture, we are going to discuss elaborately what erosion is.

So, in today’s lecture, we will have an elaborate discussion on the morphological operation called erosion. Then, we will see what the properties of the dilation and erosion operations are. When we discussed dilation in our previous class, we said that the dilation operation removes noisy pixels which are present within an object region; at the same time, the simple dilation operation also tries to expand the object region. That is, after performing the dilation operation, the object area gets expanded, whereas we will see today that the erosion operation is just an inverse operation, where, because of the erosion operation, the object area gets reduced along its boundary.

So, in image processing operations, we use a combination of these dilation and erosion operations. In one case, the erosion is done first followed by dilation, which gives us the morphological opening operation, and in the other case, the dilation is done first followed by erosion, which gives us the morphological closing operation.

(Refer Slide Time: 4:42)

We will see with examples what the effects of these morphological opening and morphological closing operations are on a binary image. Now, I talk about binary images because, as we said, initially we will discuss the application of these morphological operations on binary images, and later on we will try to extend the definition, or the use, of morphological operations to grey level images as well. But again, in that case, our assumption will be that a given grey level image can be represented in the form of a point set. Then, we will come to another application of these morphological operations which is called the hit or miss transformation, and this hit or miss transform is actually aimed at detecting an object of a specific shape and size in a given image.

So, if we know what the shape of the object is and what the size of the object is that we are trying to locate within a given image, then we can make use of this morphological hit or miss transform, and we will see that this hit or miss transform is nothing but the basic morphological operations, that is, the erosion and dilation operations, in combination with the other set operations which we have reviewed in our previous lecture.

(Refer Slide Time: 6:25)

So, just to have a recapitulation, let us see what we have done in the case of the dilation operation. So, for the dilation operation, suppose we have an object of this form; suppose this shaded region represents our object region and the black pixels in this figure represent the background region. You find that here, in this case, these pixels within the object region which appear to be black, we assume are there because of the presence of noise; they should actually belong to the object. Along with this, we take a 3 by 3 structuring element like this, where we take the origin at the center of the structuring element.

So, this is our given image X and this is our structuring element B. If we dilate this given image X with this structuring element B, then, as we have said in the last class, after dilation these black pixels inside the object region will turn out to be object pixels, and at the same time, the pixels along the boundary of the object region will also turn out to be white, indicating that these are also converted to object pixels. So, these are the pixels which are actually converted to object pixels.

So, as we have said, when we do this dilation operation, the dilation operation removes the noisy pixels inside the foreground region, or the object region, but at the same time, this dilation operation expands the area of the object region along the boundary. So, this is a side effect of the dilation operation. Then, we have defined another kind of operation which we have called erosion.

(Refer Slide Time: 9:15)

So, let us see how we have defined our erosion operation; so now, we are going to discuss erosion. Again, we assume that we are given an image X, where obviously X is in the form of a point set, and along with the image X, say we are also given a structuring element, say capital B. In the last class, we have just mentioned that the erosion operation, X eroded with the structuring element, is represented by this symbol; it is a circle, and within the circle you have a negative sign.

So, X eroded with B can be defined as the set of points p belonging to the two dimensional space, say Z square, such that p plus b belongs to the set X for every b belonging to the structuring element capital B. So, this is the first definition of the erosion operation that we have stated: it is the set of points p in the two dimensional space Z square such that p plus b, where b is an element of the structuring element capital B, belongs to our image capital X.

We have also said that there is an alternative definition of the erosion operation. As we have seen, for dilation we have two definitions, but both the definitions lead to identical results, meaning that the definitions are equivalent; in the same way, in the case of erosion, we can have two definitions. So, the alternate definition for the erosion operation is given like this.

(Refer Slide Time: 11:38)


Say, X eroded with B, the structuring element B, can be defined as the set of points, or the set of vectors, x (as we said, in the two dimensional space a point and a vector are equivalent, where the vector is drawn from the origin of the coordinate system) such that the structuring element B, translated by the vector x, is a subset of our image capital X. So, this second definition is very interesting, because it simply says, as we have told in our previous lecture, that the way we compute the morphological transformation is similar to the way we compute convolution.

That is, you translate or shift the structuring element over the image in a systematic fashion, and this definition says that for those shifts, or those translations, of the structuring element in the image for which the shifted or translated structuring element is fully contained within the point set capital X, the translations, or the points by which the structuring element has been translated, are part of the erosion.

(Refer Slide Time: 13:20)


So, that is what is meant by this particular definition: it is the set of all the translations such that the structuring element, translated by that particular translation vector, will be a subset of our given image X. Now, let us see what this actually means.

(Refer Slide Time: 13:50)

So now, let us take another binary image; say I have a binary image like this. So, these are the points in our binary image, and I take a structuring element B which is given by these points, and I assume that the local origin is at the center of the structuring element. So, this is the origin of the structuring element, marked in green colour, and what I want to find out is this: this is our given image X, this is the structuring element B, and I want to find out the erosion of X with the structuring element capital B.

Now, if I follow our first definition, which says that this is the set of all the points p, or all the vectors p, such that p plus b belongs to X, where b is an element of the structuring element capital B, then, following this definition, you find that if I consider this particular point and I take a particular vector, say this vector which is (minus 1, 0), from the structuring element, and I add this value of b to this particular point, then after addition you find that this p plus b gets translated to this particular point. So, this point should be a part of the erosion.

But if I consider this particular point in our structuring element B, the point in this case being (1, 0), and I consider this b, then you find that this p plus b, with b equal to (1, 0), gets shifted to this particular location, and this point is not a member of X. So, we infer that, because our definition requires this condition to be true for all b belonging to the structuring element capital B, and though this condition is true for b equal to (minus 1, 0), it is not true for b equal to (1, 0), this point will not be a member of the erosion. If you continue the same operation for all other points in the given image X, you will find that these are the points which will not belong to the erosion; similarly, these points will also not belong to the erosion. So, when the given image X is eroded with this particular structuring element capital B, the eroded output will be given only by these points, and the other points will be removed from the eroded output. So, this is what we get in erosion.
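
A minimal Python point set sketch of erosion, following the second (translation and subset) definition and reusing the (x, y)-tuple convention assumed earlier; restricting the candidate translations to the image points themselves is an assumption that is valid when the structuring element contains its own origin.

def erode(X, B):
    """Erosion: keep every candidate point p such that B translated by p
    is fully contained in X.  Candidates are restricted to the points of X,
    which is valid when the structuring element contains its origin (0, 0)."""
    result = set()
    for (px, py) in X:
        translated = {(bx + px, by + py) for (bx, by) in B}
        if translated <= X:          # subset test
            result.add((px, py))
    return result

# Example usage with made-up sets:
# X = {(2, 2), (3, 2), (4, 2), (3, 3)}
# B = {(-1, 0), (0, 0), (1, 0)}
# print(erode(X, B))               # {(3, 2)}: the only point where B fits inside X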

(Refer Slide Time: 18:16)

Now, let us see how this erosion can be applied in image processing operations. Again, let us take a particular image of this form; say I have an image like this. Then say I have a few pixels over here and I have some distributed pixels like this. So, this is our given image capital X, and I assume that my structuring element is a 3 by 3 structuring element capital B and the center of the structuring element is taken as the origin.

Now, if I erode this image with this particular structuring element, then, following the same operation that we have just done, you will find that this point will be removed from the eroded image, this point will also be removed, all these points will be removed, all these points will be removed, these points will be removed and these points will also be removed from the eroded image. So, your eroded image output will simply be only these 3 pixels.

So, what we have done in this case is that we have taken a binary image where there was an extrusion in the binary image, just one white pixel, and there were a few pixels distributed over the image which actually should have been in the background but which, because of noise, have been converted to object pixels. So, if I erode such an image with a 3 by 3 structuring element, where we have assumed that the origin of the structuring element is the center pixel, then all those distributed points as well as the extrusion get eliminated; but at the same time, we have a side effect. That is, all the boundary pixels of the object region also get eliminated, and because of this, the area of the object region gets reduced, as if the object has been shrunk.

So, we find that this is just an opposite operation, opposite to what we got in the case of dilation. In the case of dilation, the object region gets expanded, whereas in the case of erosion, the object region is reduced. So, these two operations have opposite effects. Naturally, if I want to apply these morphological operations in image processing, I would like to retain the size of the object region. So, if I apply the dilation operation at any point, then the expansion of the object region has to be compensated by a subsequent application of the erosion operation.

Similarly, if I apply the erosion operation first, then, since the object region gets reduced, to compensate for this, the operation has to be followed by the corresponding dilation operation, and this erosion and dilation have to be done with the same structuring element. So, we will see the combination of these two operations when we talk about the morphological opening and morphological closing operations.
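
Expressed with the point set helpers sketched earlier (repeated here so the snippet stands on its own), opening and closing are simply these two compositions; this is a sketch of the standard definitions under the earlier assumptions, not code shown in the lecture.

def dilate(X, B):
    return {(x1 + b1, x2 + b2) for (x1, x2) in X for (b1, b2) in B}

def erode(X, B):
    # As before, restricting erosion candidates to the set itself assumes
    # that the structuring element contains its origin.
    return {p for p in X
            if {(b1 + p[0], b2 + p[1]) for (b1, b2) in B} <= X}

def opening(X, B):
    """Opening: erosion followed by dilation with the same structuring element."""
    return dilate(erode(X, B), B)

def closing(X, B):
    """Closing: dilation followed by erosion with the same structuring element."""
    return erode(dilate(X, B), B)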

Now, before we come to the morphological opening and morphological closing operations, let us
see some of the properties of dilation and erosion operations.

(Refer Slide Time: 22:35)


So first, let us see some properties of dilation; we will talk about some properties of dilation first and then we will see some properties of erosion. The first property of the dilation operation is that the dilation operation is commutative. What do we mean by saying that the dilation operation is commutative? This means that, given an image say capital X and a structuring element say capital B, if I dilate the given image capital X with the structuring element capital B, the result that I get will be the same as if I dilate the structuring element capital B with the given image capital X. So essentially, X dilated with B is the same as B dilated with X.

Now, the proof of this is quite trivial, because, as we have said, we can define this morphological dilation operation by vector addition; so X dilation with B is nothing but the set of all the resultant vectors when we compute x plus b, where x belongs to our image capital X and b belongs to our structuring element capital B. So, this is what X dilated with B is.

Now, if I take the other one, that is B dilated with X, this is nothing but the vector b plus the vector x, again for b belonging to the structuring element capital B and x belonging to the given image capital X. Now obviously, the resultant vector x plus b is the same as the resultant vector b plus x. So, that clearly shows that the dilation operation is commutative, that is, X dilated with B is the same as B dilated with X.
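
As a quick numerical sanity check of commutativity (and of the associativity discussed next), here is a short Python snippet using the point set dilation helper sketched earlier and made-up example sets:

def dilate(X, B):
    return {(x1 + b1, x2 + b2) for (x1, x2) in X for (b1, b2) in B}

X = {(3, 2), (2, 2), (3, 1)}
B = {(0, 0), (1, 0)}
D = {(0, 0), (0, 1)}

print(dilate(X, B) == dilate(B, X))                        # commutativity
print(dilate(X, dilate(B, D)) == dilate(dilate(X, B), D))  # associativity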

(Refer Slide Time: 25:41)


The second property of the dilation operation is that the dilation operation is associative. What do we mean by that? This associative property says that if we have two structuring elements, say a structuring element capital B and a structuring element capital D, then we can consider the image capital X dilated with (capital B dilated with capital D). So, here we have two structuring elements; one structuring element is capital B and the other structuring element is capital D.

So, the operation that we are doing in this case is that first we dilate capital B with the second structuring element capital D; this dilation output gives me a point set, and if I take this resultant point set as a structuring element and then dilate our given image capital X with it, then what do we get? This X dilated with (B dilated with D) is the same as first dilating X with the structuring element capital B and then dilating the result with the second structuring element capital D.

So, this is the associative property of dilation: X dilated with (B dilation D) is the same as (X dilation B) dilated with D, where capital B and capital D are two structuring elements and X is our given image.

Now, the third property, though I should not really call it a property but rather an implementation, is how we can implement the dilation operation more easily. From the definition of the dilation operation, we have said that if I want to dilate a given image capital X with a structuring element capital B, then what I do is take a vector from capital X, take a vector from capital B, do the vector addition of these two vectors, and the resultant vector will belong to the dilation output; and this I have to do for every vector in the image capital X, which is to be added with every vector in the structuring element capital B.

Now, the addition of a vector to a point is nothing but translation of the point by that vector. So, if I
add a vector b from the structuring element capital B to all the points in X, that is the same as
translating X by the vector b. From this simple observation it follows that I can interpret the
dilation operation in another way: X dilated with B is nothing but the given X translated by a
vector b, where b is a vector in our structuring element capital B, and I have to consider all the
vectors present in the structuring element capital B; for every vector in capital B, I get one
translated point set X sub b. So, what I do is take the union of all these translated point sets.

(Refer Slide Time: 30:08)

So, if I take the union of such X sub b for all b belonging to our structuring element capital B, then
what I get is X dilated with the structuring element capital B, and in fact, following this
interpretation, the dilation operation can be implemented very easily. Given a point set capital X,
I translate this point set by every vector in our structuring element capital B, and then I take the
union of all these translated point sets; this union is nothing but X dilated with the structuring
element capital B.

So, this is the third point; I will say that it is an interpretation, and following this
interpretation, our implementation of the dilation operation becomes very easy.
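
As a rough sketch of this interpretation (the array sizes, the names, and the wrap-around behaviour of np.roll are my own simplifications, not part of the lecture), the union of translates can be written directly in numpy:

import numpy as np

def dilate_by_translation(X, B_vectors):
    # Union of copies of the binary image X translated by every vector b in B.
    out = np.zeros_like(X, dtype=bool)
    for dr, dc in B_vectors:
        # np.roll wraps around at the border; acceptable for this toy sketch,
        # a real implementation would pad the image instead.
        out |= np.roll(np.roll(X, dr, axis=0), dc, axis=1)
    return out

X = np.zeros((7, 7), dtype=bool)
X[3, 3] = True                                   # a single object pixel
B = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]   # an assumed cross-shaped B
print(dilate_by_translation(X, B).astype(int))   # prints a small plus shape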

(Refer Slide Time: 31:14)


Then, the fourth property of the dilation operation is translation invariance. What it says is that if
the given image X is translated by a vector, say h, and this translated point set is dilated with the
structuring element capital B, this will be the same as X dilated with capital B with the dilation
output translated by the same vector h.

So, whether I translate the given point set X first and then dilate with the structuring element
capital B, or I first dilate X with capital B and then translate by the same vector h, the output
remains the same. This is what is meant by translation invariance.

(Refer Slide Time: 32:26)

The next property of dilation, which is also very interesting, is that dilation is an increasing
transformation, an increasing transform. What do we mean by this? By an increasing transform
we mean that if we are given two point sets, capital X and capital Y, such that the point set X is a
subset of the point set Y, then if I dilate both point sets X and Y by the same structuring element,
say capital B, X dilated with B will be a subset of Y dilated with B. This is what is meant by an
increasing transform.

So, if one set is a subset of the other, then when both of them are dilated by the same structuring
element, the subset relation will also hold for the corresponding dilations. So, these are some of
the properties of the dilation operation.

(Refer Slide Time: 34:09)

Now, let us see what the properties of the erosion operation are, and then we will see where the
dilation operation and the erosion operation differ. So now, we will see some properties of
erosion. Here again, the first one is an interpretation, as we have seen in the case of dilation: given
a point set X, the erosion of X with a structuring element capital B can be interpreted as X
translated by minus b, taking the intersection of all these translated point sets for every b
belonging to the structuring element capital B.

So, it means that I take every vector b belonging to the structuring element capital B, negate it,
and translate the given image X by that negated vector. I get a translated point set X sub minus b,
and if I take the intersection of all these translated point sets, the point set I get is nothing but my
erosion output, X eroded with B.
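
A matching sketch of this interpretation of erosion (again my own illustration, with the same wrap-around caveat as before): the intersection of copies of X translated by the negated vectors.

import numpy as np

def erode_by_translation(X, B_vectors):
    # Intersection of copies of X translated by -b for every vector b in B.
    out = np.ones_like(X, dtype=bool)
    for dr, dc in B_vectors:
        out &= np.roll(np.roll(X, -dr, axis=0), -dc, axis=1)
    return out

X = np.zeros((7, 7), dtype=bool)
X[2:5, 2:5] = True                               # a 3 by 3 object block
B = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]   # the origin belongs to B
E = erode_by_translation(X, B)
assert X[E].all()                                # the erosion is a subset of X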

Now, following this interpretation, you find that if the origin, that is (0, 0), is a member of the
structuring element B, then it is always true that X eroded with B will be a subset of X. Here
again, the proof of this is very trivial. You can take a very simple example and try it out, and you
will always find that if the local origin of the structuring element is a member of the structuring
element, then whatever erosion output you get when X is eroded with B will always be a subset of
the original point set capital X.

The next property is translation. As in the case of dilation, we have a translation property here,
which says that X translated by a vector h, eroded with the structuring element B, is equal to X
eroded with B translated by the same vector h; whereas X eroded with B translated by the vector
h is equal to X eroded with B translated by the vector minus h. So, these two are the translation
properties of the erosion operation.

(Refer Slide Time: 37:42)

As in the case of dilation, erosion is also an increasing transformation. The increasing
transform property in the case of erosion is like this: again, for two point sets, capital X and
capital Y, if X is a subset of Y, then if I erode both X and Y by the same structuring element
capital B, the relation will be that X eroded with B is also a subset of Y eroded with B.

So, you find that this is similar to the property we have seen in the case of dilation: if X is a
subset of Y, then X dilated with B is a subset of Y dilated with B. Similarly, in this case, if X is a
subset of Y, then X eroded with B will also be a subset of Y eroded with B. The next property is
this: if we have two structuring elements B and D such that D is a subset of B, and a given point
set X is eroded with B and with D separately, then X eroded with B will be a subset of X eroded
with D.

So, you find that in this particular case we have two structuring elements B and D such that the
structuring element D is a subset of the structuring element B. In such a case, if X is eroded with
the structuring element B, which is the superset, the result will be a subset of X eroded with D,
where D is the subset. So, following the properties of erosion and dilation, you can find that
erosion and dilation differ in certain respects.

(Refer Slide Time: 40:40)

In the case of dilation, we have seen that X dilated with B is the same as B dilated with X, whereas in
the case of erosion, X eroded with B is not the same as B eroded with X. So here, the properties of
erosion and dilation differ. The other one was the associative property: in the case of
dilation, we have said that (X dilated with B) dilated with D is the same as X dilated with (B
dilated with D). However, this is not so in the case of erosion; that is, (X eroded with B) eroded with
D is not the same as X eroded with (B eroded with D). So, these are the cases where the properties
of erosion and the properties of dilation are different.

Now, given these properties, we have said earlier that if I apply the dilation operation, the
area of the object region increases; if I apply the erosion operation, the area of the object
region decreases. However, both the dilation and erosion operations are useful to remove the noise
present in the image. The dilation operation removes the noisy pixels within the object
region; at the same time, it increases the object size.

On the contrary, the erosion operation removes the noise present in the background and, while
doing so, it reduces the size of the object region. So, if I apply one, that has to be compensated by
the other. Accordingly, as we have said, we have two different operations; one is called the
closing operation and the other one is called the opening operation.

(Refer Slide Time: 43:07)


So now, let us see what these opening and closing operations are. I take this
particular binary image. Here, you find that apparently this image contains two different object
regions, and there are two noisy regions in this image. One is this one: the interior of this region
should have been all object pixels, but, possibly due to noise, two of the object pixels have turned
out to be background pixels. On the other side, there is another object region where internally we
do not have any noise, but these two object regions are connected by a thin line which is just one
pixel wide. Generally, we can assume that this connection is nothing but the presence of noise.

So now, let us see how we can apply the morphological operations to remove these noises
present in the image. Here again, I assume that my structuring element is a 3 by 3 structuring
element. So, this is my structuring element, capital B, and the center pixel of the structuring
element is taken as the local origin. The operation that I want to perform first
is dilating this given image with the given 3 by 3 structuring element capital B.

So, if I dilate this image with this structuring element capital B, then what is the effect that we
are going to have?

(Refer Slide Time: 45:05)


The effect will be something like this: these internal noisy points, as we have assumed, will be
filled up, but simultaneously the boundary of the object will also be expanded by one pixel all
around. So, this is the dilation output that I am going to have; all these pixels will turn out to be
object pixels.

So, I have achieved the intended operation. That is, I wanted to fill this gap, I wanted to convert
these two background pixels into object pixels, and that has been done; but at the same time, there
is a side effect that the objects have been expanded by one pixel all along the boundary.

(Refer Slide Time: 46:13)


So, to negate this expansion, what I do is erode this dilated image with the same
structuring element. If I erode with the same structuring element, then, as we have said, the
effect of erosion is to shrink the object region all along the boundary; because of this shrinking,
all the additional pixels which were introduced by the dilation operation will be removed.

So, all these pixels will be removed, but the effect that remains is that the internal noisy pixels,
where object pixels had turned into background pixels, have been reconverted into object pixels. So,
the internal noise within the object region has been removed by the dilation operation, and the
expansion of the area has been compensated by the following erosion operation.

(Refer Slide Time: 47:34)

So here, the operation that I have done is: X is first dilated with B and this dilated output is
eroded with the same structuring element B. So first we do the dilation operation followed
by the erosion operation, and this is the operation which is called the morphological closing
operation; the closing operation is represented by a solid dot, written X • B. So here, what we have
done is a morphological closing operation. The effect is quite clear: after the morphological
closing operation, we have removed the internal noise.

(Refer Slide Time: 48:23)


But the other noise, where two different object regions are connected by a thin line which is one
pixel wide, has not been removed. So, in order to remove this, I do the inverse operation. That is,
first I apply the erosion, and by applying the erosion I will have a reduction in the object region.
To compensate for this reduction, I will apply the subsequent dilation operation, and this erosion
and dilation will also be done with respect to the same structuring element.

(Refer Slide Time: 49:02)

So, if I do the erosion, what I get is something like this. By doing the erosion operation, these are
the pixels which are going to be removed: all the boundary pixels will be removed from this image,
this thin line of one pixel width will also be removed, and at the same time these other boundary
pixels are also going to be removed.

So, what I get at the end of this erosion operation is two object regions; one is this, a shrunk
version of the object, and here is another shrunk version of the other object. This shrinking has
happened because we first applied erosion. Now, to compensate for this shrinking, as we said, we
will perform the inverse, the dilation operation, and in this case the dilation will be done with the
same structuring element capital B.

(Refer Slide Time: 50:30)

So, if I do this dilation, then in the output all these regions will be restored, and this internal
noise which was filled up earlier will remain as it is. But what I have removed is the thin line of
one pixel width which was connecting the two object regions; this I have been able to remove.

So, you find that this image has now been separated into two object regions, and the noises
which were present in the image have been removed. So, in this case, what we have done is first
the erosion operation.
(Refer Slide Time: 51:32)

So, X is eroded with B and subsequently dilated with the same structuring element
capital B, and this operation is known as the opening operation; the opening
operation is represented by a small open circle, written X ∘ B. So, you find that we have applied
two different operations; one is the opening operation, the other is the closing operation. By
using the closing operation, we remove the internal noise present in the object region, and by
using the opening operation, we remove the external noise present in the background
region.

So, initially, when we said that these morphological operations are also very useful for noise
removal or for filtering, by filtering what we mean is that we use the opening and closing
operations one after the other, and by using these opening and closing operations we
can remove the noise present in the image.
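
As a hedged sketch of this filtering idea (assuming scipy is available; the 3 by 3 structuring element of all ones mirrors the one used on the slides), closing and opening are just the two compositions of dilation and erosion:

import numpy as np
from scipy import ndimage

B = np.ones((3, 3), dtype=bool)   # 3 by 3 structuring element, origin at the centre

def closing(X, B):
    # Dilation followed by erosion: fills small holes inside the object region.
    return ndimage.binary_erosion(ndimage.binary_dilation(X, structure=B), structure=B)

def opening(X, B):
    # Erosion followed by dilation: removes thin bridges and background noise.
    return ndimage.binary_dilation(ndimage.binary_erosion(X, structure=B), structure=B)

def denoise(X, B):
    # One possible filtering order, as in the example above: close first, then open.
    return opening(closing(X, B), B)

scipy.ndimage also provides binary_opening and binary_closing directly; the explicit compositions are written out here only to match the derivation above.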
(Refer Slide Time: 52:50)

Now, let us talk about another operation, another transformation, which we call the hit or miss
transform. What is this hit or miss transform? The hit or miss transform is a transformation which
is normally used to detect or locate an object of a given shape and size in an image.

(Refer Slide Time: 53:20)

So, let us assume that we have an object, say something like this, a 3 by 3 square object. This is
our object, and the image that we have is something like this: here I have an object region
which is 4 pixels by 3 pixels, and here I have another object region which is 3 pixels by 3 pixels.
Now, to detect this object within the given image, let us assume that this is our image A and this
is the object X that we are looking for.

What we do is embed this X into a larger window, and suppose that window is W. So here, we
have two different sets; one is X and the other is the boundary of X within W, which is
represented by W minus X. Now, the operation that we have to do is this: first we perform A
eroded with X. Here you find that if you erode A with X, the output will be something like this:
there will be two pixels in the eroded output here, and here we will have 1, 2, 3, these pixels, in
the eroded output.

So, this is our first operation, A eroded with X. The next operation is that I take the complement
of A and erode it with W minus X. If I erode A complement with W minus X, then I get only one
pixel, which is this one. Now, if I take the intersection of A eroded with X with A complement
eroded with W minus X, I get a single point, which is only this point, and you find that this point
identifies that the object X is located at this particular location.

So, here you find that even though this other object region is a superset of the object X, it has not
been detected. What we have detected is only the single location where this particular object is
present. So, this operation, that is, A eroded with X intersected with A complement eroded with
W minus X, is what is known as the hit or miss transform, and this hit or miss transform is used
to locate an object of a given shape and size within a given image.

(Refer Slide Time: 57:26)

Now, this operation can be generalized, and the generalized definition is something like this.
Suppose we have a structuring element B which can be broken into two structuring elements B1
and B2. Then, for a given image A, the hit or miss transform of A with B is given by (A eroded
with B1) intersected with (A complement eroded with B2). So, this is the general definition of the
hit or miss transform.
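
A sketch of this definition (the window size and template values below are my assumptions, chosen to match the 3 by 3 square example above; scipy.ndimage also has a ready-made binary_hit_or_miss):

import numpy as np
from scipy import ndimage

def hit_or_miss(A, B1, B2):
    # (A eroded with B1) intersected with (A complement eroded with B2).
    hit = ndimage.binary_erosion(A, structure=B1)
    miss = ndimage.binary_erosion(~A, structure=B2, border_value=1)
    return hit & miss

B1 = np.ones((3, 3), dtype=bool)     # X: the 3 by 3 object we are looking for
B2 = np.ones((5, 5), dtype=bool)     # W minus X: the one-pixel frame around X
B2[1:4, 1:4] = False

For a toy image like the one described above, the result marks only the single location of the isolated 3 by 3 square.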

So, with this, we stop our lecture today. We will continue with our morphological operations in
subsequent classes.

(Refer Slide Time: 58:34)

Now, let us see some of the quiz questions of today’s lecture.

Thank You.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 35
Mathematical Morphology- III

Hello, welcome to the video lecture series on digital image processing. For the last few lectures, we
have been discussing mathematical morphology and its application to image processing
problems.

(Refer Slide Time: 1:14)

So, in the last class, we have seen what is meant by the morphological dilation
operation and what is meant by the morphological erosion operation, and with a
few examples we have illustrated that if I have a binary image and, within the object region of
that binary image, some pixels are treated as background, which may happen because of the
presence of noise, then the morphological dilation operation tries to remove those noisy pixels
which should have been object pixels but which have been converted to background pixels
because of noise.

And we have seen the side effect of this dilation operation: as the dilation operation tries to
remove the noisy pixels within the object area, at the same time it tries to expand the area of the
object region. The reverse operation, that is the morphological erosion operation, tries to remove
the spurious noise present in the background region and, at the same time, as a side effect, the
erosion operation contracts the object region. That means the area of the object region gets
reduced as we apply the erosion operation on a binary image. Then, we have also seen some
properties of the dilation and erosion operations, and we have seen the combination of erosion
and dilation operations which are termed the opening and closing operations.

So, in the case of opening, what we do is this: given a binary image, and we have said that for all the
morphological operations our basic assumption is that an image is represented by a point
set; so, given a binary image or a point set, the opening operation first performs an erosion
operation which is followed by a dilation operation. In the other case, in the closing operation,
first we perform a dilation operation which is followed by the corresponding erosion operation,
and for all these operations, whether we go for opening or closing, we have to use the same
structuring element.

Now, for the application of these opening and closing operations, in our last lecture we had taken an
example where we had shown that suppose you have two different objects; in one of the
objects there was a patch which appeared to be background, and the two objects were joined by a
thin straight line. We assumed that these are because of noise; when two objects are joined by a
thin straight line, that is because of noise.

Similarly, if within the object region I have some pixels which appear to be background pixels,
we also assume that this is because of the presence of noise. So, by using the opening and closing
operations on such an image, we have demonstrated that such noise can be removed, and at the
end what we obtained is two different regions belonging to the two objects.

And then finally, in our last class, we talked about another kind of morphological transformation,
which is termed the hit or miss transform. We have illustrated that the purpose of this hit or miss
transform is to locate an object of a specific shape and size within a given image. In that case,
what we have done is we assumed that the object of the specific shape and size is treated as a
structuring element, and we try to find out the presence of such an object within the given image;
for that we have used a combination of erosion and dilation operations which is termed the hit or
miss transform.

(Refer Slide Time: 6:01)

In today's lecture, we will talk about some further applications of morphological techniques. The
first operation that we will discuss is a very simple operation, which is boundary extraction. Here,
what we will try to do is: given a binary image containing some objects, we are interested in
finding out the boundary of the object region. Earlier, when we discussed the different edge
detection and line detection operations, we found different operators like the Sobel operator, the
Prewitt operator and the Laplacian of Gaussian operator which can be used to detect object
boundaries.

In today's lecture, we will try to address the same problem from the mathematical morphology
point of view. So, we will try to devise a morphological algorithm by which the boundary of an
object region can be detected. The second problem that we will talk about is the region filling
operation. So, if I have just the boundary of an object, we will try to see whether it is possible, by
using the morphological operations, to fill the entire object region which is enclosed by the
boundary pixels.

Then, we will also try to find out an algorithm for extraction of connected components. Here
again, in one of the earlier lectures, we talked about an algorithm for connected component
labelling. There, the problem is that if you have a set of points which are similar in nature and are
connected, then what we try to do is label all such pixels with the same label value, or give a
unique identification number to the entire region which is formed by all those connected pixels
having similar values.

So, in this lecture, we will try to find out an algorithm for extraction of the connected components
in a binary image. Then, we will talk about another algorithm, which is for convex hull extraction.
Now, convex hull is a property; we will define later on what is meant by a set being convex and
what we mean by the convex hull.

And this convex hull is very important; it gives important information for high level image
understanding or object recognition operations, and we will see how we can find the convex hull.

We will see what is meant by a set being convex, and we will devise an algorithm to find the
convex hull of a given image or point set. Then we will also discuss two more algorithms; one is
thinning and the other is the thickening operation. Here again, given a point set, an image, we will
try to find out algorithms for how to thin that image, because in one of our earlier lectures we have
said that the structural information of an object is contained within the skeleton of that object
shape, and for that we had earlier said something about the medial axis transformation.

So, this medial axis of an object region is nothing but a thinned version of the object shape. Here,
we will try to find out how we can thin an object shape, or a point set, by using the morphological
operations, and thickening is of course the inverse or the reverse of the thinning operation. So, we
will also talk about how the thickening of a point set can be done.
So, first of all, we will discuss the boundary extraction operation.

(Refer Slide Time: 10:19)

So, by boundary extraction, what I mean is: suppose we are given an image like this, where my
object region is the set of shaded pixels. This forms my object pixels, the point set, say A, and
what we are interested in is finding out the boundary of this object region. For the given point set
A, I can represent the boundary as beta(A), where beta(A) represents the boundary of the given
point set A, and it can be shown that this beta(A) is nothing but the set A minus A eroded with
some structuring element B.

So, what do we mean by this? Let us assume that we have a structuring element as given here,
and I assume that the origin of the structuring element is the center pixel. The first operation that
we perform is to erode the point set A with this given 3 by 3 structuring element, and you know
that when we erode the given shape with this 3 by 3 structuring element, then, as we have said
earlier, this erosion operation contracts the area of the object region, and by performing this
erosion operation we will find that all these boundary pixels are deleted because of the
contraction. So, all of these pixels are going to be deleted after we erode the point set A with this
particular structuring element.

So now, if I subtract the output of this erosion operation from the original point set A, then,
because all these boundary pixels have been removed by the erosion, after this set difference
operation all the internal pixels, which are exactly the pixels retained by the erosion, are going
to be removed.

So finally, what we are left with, as is obvious in this figure, is the boundary of the object region.
Here we find that the boundary of a given object region can be very easily determined by
performing the erosion of the original point set A with a 3 by 3 structuring element and then
subtracting the eroded image from the original image;
what you are left with is the boundary of the given object region.
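
A one-line sketch of this boundary extractor (assuming scipy; B is the 3 by 3 structuring element of all ones, as on the slide):

import numpy as np
from scipy import ndimage

def boundary(A):
    # beta(A) = A minus (A eroded with B), with B a 3 by 3 block of ones.
    B = np.ones((3, 3), dtype=bool)
    return A & ~ndimage.binary_erosion(A, structure=B)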

So, this is a very simple operation. Now, let us look at the opposite problem: if we are given the
boundary of an object region, how can we fill up the hollow region within the boundary, again by
applying morphological operations? Let us take another example for this operation.

(Refer Slide Time: 15:28)

So here again, let us assume that we have an image something like this. This is our given point
set. Here we find that we have a set of pixels which forms a boundary, and within this boundary
we have a hollow region which is simply represented by white pixels; what we will try to do is
devise an algorithm by which this internal hollow region can be filled up. For performing this
operation, the kind of structuring element that we can use is something like this. Again, we use a
structuring element within a 3 by 3 window, but in this structuring element we do not consider
the diagonal neighbours of the origin; here again we assume that the origin is the center pixel of
the 3 by 3 window.

Now, for the region filling operation, first let us consider any pixel P within this hollow region.
So, I consider a point within the hollow region and call this point P, and our algorithm will be
something like this: first we set this point P equal to 1, and I assign a point set, say X0, where X0
initially contains only the point P; the region filling operation is then performed by iterative
application of dilation operations.

So, the algorithm for this region filling operation can be written in the form of an iterative
algorithm like this: at iteration stage k, Xk is given by Xk minus 1 dilated with our structuring
element B, and this dilated point set is then intersected with the complement of the point set A.

These are the points in our point set A, and obviously in the complement all these points will be
made black and all the other points will be made white. So, if I do this particular operation, let us
see how this algorithm is going to work. Initially, we have assumed that X0 is just the point P,
and next I dilate this X0 following this iteration, that is, Xk is equal to Xk minus 1 dilated with B,
intersected with A complement.

So, if I dilate this X0 with our structuring element B as given over here, you find that these are
the points which will be set equal to 1; this point would also be set equal to 1, but since we take
the intersection with A complement, and in A complement this point is black, this particular
point will be removed by the intersection. So, X1 is all these points.

Now, what will X2 be? X2 is the dilation of all these points, again intersected with A complement.
So, if I give the labels: these are the points which will be made equal to 1 after performing the
dilation in the second iteration and taking the intersection with A complement; these are the
points made equal to 1 after the third iteration and the intersection operation; these after the
fourth iteration; these after the fifth iteration; and these are the points which will be made equal
to 1 after the sixth iteration and the intersection operation.

Now, we find that once I get this, if I perform a further dilation with the same structuring element
followed by the intersection with A complement, this set is not going to change any further. So, I
reach convergence when I find that Xk becomes equal to Xk minus 1. When I get identical point
sets in two subsequent iterations, that is my point of convergence, and at that point of
convergence the Xk that I get contains all the points which have been filled up by this iterative
operation; finally, when I achieve this convergence, the final set is given by Xk union with our
original point set A.

So, at the end, you find that all the points within the boundary have been made equal to 1, and
you will find that such a region filling operation is very useful when we talk about object
description or object representation; we will see that unless we do such a region filling operation,
our object description will not be compact. So, you find that we have
devised a very simple algorithm for performing the region filling operation.
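
A sketch of this iterative region filling (the seed point and the array conventions are my assumptions; scipy.ndimage.binary_fill_holes offers essentially the same operation ready-made):

import numpy as np
from scipy import ndimage

def region_fill(A, seed):
    # A: binary image containing the boundary; seed: a (row, col) inside the hole.
    B = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)     # structuring element without diagonals
    X = np.zeros_like(A, dtype=bool)
    X[seed] = True                            # X0 contains only the point P
    while True:
        # Xk = (Xk-1 dilated with B) intersected with the complement of A
        X_next = ndimage.binary_dilation(X, structure=B) & ~A
        if np.array_equal(X_next, X):         # convergence: Xk equals Xk-1
            break
        X = X_next
    return X | A                              # final result: Xk union A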

The next morphological algorithm that we will discuss is the connected component extraction
algorithm. We have talked about the connected component labelling problem earlier, but here the
same problem will be tackled with the morphological operations.

(Refer Slide Time: 23:37)

So, let us see how this connected component extraction can be done. Here again, we are given a
point set, say A, and suppose Y is a connected component in set A. So, you are given a point set
A and we assume that Y is a connected component in A; what we will try to do is extract all the
points which belong to Y, where Y is the connected component. That means we are trying to
extract all the points of a subset of A which is connected; here, this subset is our connected
component Y.

Here again, we will use a similar kind of iterative algorithm, and now our algorithm will be
something like this: the iterative step is Xk equal to Xk minus 1 dilated with the structuring
element B, and this dilated result is intersected with the original point set A; this operation has to
be computed over the iteration steps, that is, for k equal to 1, 2, 3 and so on.

Finally, as before, we reach convergence, or the algorithm terminates, when we find that Xk
remains the same as Xk minus 1; that is, in two subsequent iterations the result does not change.
When the algorithm terminates, or we reach convergence, we will find that Y is nothing but Xk,
where we said that Y is the connected component in the point set A; by this algorithm, all the
points of the connected component Y will have been included in the point set Xk which we
obtain at the end of the algorithm, when the algorithm converges.
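
A sketch of this component extraction (the seed point is assumed known; with a full 3 by 3 structuring element the grown component is 8-connected, while a cross-shaped element would give a 4-connected one; scipy.ndimage.label does the same job for all components at once):

import numpy as np
from scipy import ndimage

def extract_component(A, seed):
    # Grow Y from a seed point p that is known to lie inside the component.
    B = np.ones((3, 3), dtype=bool)
    X = np.zeros_like(A, dtype=bool)
    X[seed] = True                            # X0 contains only the point P
    while True:
        # Xk = (Xk-1 dilated with B) intersected with A
        X_next = ndimage.binary_dilation(X, structure=B) & A
        if np.array_equal(X_next, X):         # Xk equals Xk-1: Y has been found
            return X
        X = X_next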

(Refer Slide Time: 26:41)

So again, let us take an example for this. As before, let us take a particular set of connected
pixels, something like this. These are the connected pixels, and I assume a 3 by 3 structuring
element B; as before, I assume that the origin of the structuring element is the center point.

So here, our iteration is: Xk equal to Xk minus 1 dilated with our structuring element B,
intersected with our original point set A. Initially, I take a point P, say this is my point P, which
belongs to Y, and I initialize X0 equal to the point P. Now, because X0 is nothing but our point P,
if I dilate X0 with our structuring element B and then intersect the result with our original point
set A, what I am going to get is the point set X1.

So here, this being our initial point P, if I dilate it with this structuring element, all these points
are going to be set to 1. But as I am taking the intersection with our original point set A after the
dilation, the only points which will be in set X1 at the end are these points.

So, these are the points which are going to be in our set X1. If I dilate further and take the
intersection, these are the points which are going to be in set X2; dilating further, these are the
points in set X3; further, these are the points in set X4; further, set X5; and further, set X6.
Further dilation and intersection is not going to change X6 any more.

So, you find that at that point, I get X7 equal to X6. This is our point of convergence, and from
this output you can observe that when I reach this point of convergence, our connected point set
Y is nothing but the point set X7, which of course is the same as the point set X6.

So, you find that when this algorithm converges, all the connected points in set A, which in our
case form the set Y, will have been accumulated in the point set X6, which we have obtained as a
result of this iterative operation of Xk equal to Xk minus 1 dilated with B, intersected with our
original set A.

So, what we have done in this particular case is that we have started from a point belonging to
the set Y and then tried to grow the region starting from that particular point. The next algorithm
that we are going to discuss is what we call the convex hull.

(Refer Slide Time: 32:04)

When we talk about the convex hull, the first thing that we have to see is what is meant by a
convex set, or what we understand by a set being convex. A point set S is said to be convex if,
when I take any pair of points in the set S and join them with a straight line, all the points lying
on this straight line also belong to the set S.

Whereas, if there is any point on the straight line connecting two points belonging to set S which
does not belong to set S, then we say that the set S is not convex.

(Refer Slide Time: 33:27)

So, coming to an example, say I have a situation something like this: I have two sets of points,
one set of points like this and another set of points like this. This is my set S1 and this is S2. You
find that in the point set S1, if I take any pair of points and draw a straight line through that pair
of points, all the points lying on the straight line are within set S1.

But here, in this particular case, if I take these two points and join them with a straight line, you
will find that there is a point on this straight line which does not belong to set S2. So, you will
find that the set S1 is a convex set, we say that set S1 is convex, whereas the set S2 is not convex.

So, given a set S, the convex hull of this set S is the minimal convex set containing S. That is
how we define the convex hull. Given any set S, the set S may or may not be convex. A minimal
point set that contains S and is convex is called the convex hull of set S, and if I say that H is the
convex hull of set S, then H minus S is what is called the convex deficiency. So, the set
difference of the convex hull and the original set of points is called the convex deficiency, and
we will see later that this convex deficiency can be used as one of the descriptors of a given set,
which is useful for high level understanding purposes.

Now, let us see how we can devise an algorithm that, given a point set S, finds the convex hull of
that given set.

(Refer Slide Time: 36:55)

So, let us see what the nature of this algorithm will be. Here, instead of using a single structuring
element, we use a set of structuring elements. I assume that Bi is a structuring element where i
varies from 1 to 4; so I use 4 structuring elements for performing this particular operation.

(Refer Slide Time: 37:30)

So, the 4 structuring elements which are used for determining the convex hull are as shown here.
I call them the structuring elements B1, B2, B3 and B4. So, I have 4 such structuring elements
which will be used to find the convex hull.

Then the algorithm for finding the convex hull is like this: I take a particular structuring element
Bi and, using that particular structuring element Bi, I go for a similar kind of iterative algorithm,
and this iterative algorithm has to be applied for each individual structuring element in our set of
4 structuring elements. The iterative step, or the iterative algorithm, is as follows.

(Refer Slide Time: 38:39)

So, for a particular structuring element, say Bi, we perform an iterative algorithm like this: Xk
superscript i is equal to the hit or miss transform of Xk minus 1 superscript i with the structuring
element Bi, union with our original point set A. Here, A is the original point set, and this iterative
algorithm has to be run independently for each of the structuring elements B1, B2, B3 and B4.
What is the initial condition of this iterative algorithm? The initial condition is X0 superscript i
equal to our original point set A.

Now again, as before, this iterative algorithm with each of the structuring elements converges
when we find that Xk superscript i is equal to Xk minus 1 superscript i, that is, in two subsequent
iterations the output does not change. In that case, our algorithm converges, and the output that I
get at that stage, obtained with the i'th structuring element Bi, I represent as the converged set,
say Di.

Then the convex hull of A, if I represent it as C(A), is given by the union of the Di for i equal to
1 to 4. So effectively, what are we doing? We take 4 different structuring elements, and for each
of the structuring elements we employ an iterative algorithm. In every step of the iterative
algorithm, we perform the hit or miss transform of the output at step k minus 1 with the
structuring element Bi and then take the union of the output of this hit or miss transform with our
given set A; whatever point set we get after the union operation is assigned to the point set Xk.

So, we are generating Xk from the point set Xk minus 1 by applying the hit or miss transform
with one of the structuring elements and subsequently doing the union operation with our
original point set A, and this iteration continues until the algorithm converges; the convergence
criterion is that in two subsequent iterations the result does not change.

Since we have four different structuring elements, I will get four different point sets when the
algorithms converge, and the union of those four point sets is the convex hull of the given point
set A. Now, let us see how this algorithm actually works.
see how this algorithm actually works.

So, as we have said earlier, these are the four different structuring elements which are used for
extraction of the convex hull. Now, to demonstrate the operation of this algorithm, let us take an
image like this.

(Refer Slide Time: 43:28)

So, this is our point set A and these are the structuring elements: I call this the structuring
element B1, this the structuring element B2, this the structuring element B3 and this the
structuring element B4. What we want to do is perform the hit or miss transform of this given
point set A with these four different structuring elements.

So, first of all, let us take the hit or miss transform of the given set A with the structuring element
B1 and see what the nature of the output will be in each of the iteration stages. To demonstrate
this output, let us take different colours: the points added by the hit or miss transform of set A
with the structuring element B1 will be represented in black. So, if I take the hit or miss
transform of set A with the structuring element B1, then you find that in the first iteration these
are the points which are going to be filled, as I have marked with 1.

So, these are the points which will be filled after the first iteration when I do the hit or miss
transform with this particular structuring element B1. At the end of the second iteration, these
are the points which are going to be filled up. At the end of the third iteration, these are the
points which are going to be filled up, and at the end of the fourth iteration, this is the point
which will be filled up. Similarly, when I perform the hit or miss transform of the same set A
with the structuring element B2, you find that at the end of the first iteration this is the point
which is going to be filled up; I represent the output of the hit or miss transform with B2 in pink
colour.

So, this is one point which is going to be filled up at the end of the first iteration, and this is
another point which will be filled up at the end of the first iteration; if I do subsequent iterations
on this point set, you find that I cannot fill up any other point. So, this is where I reach
convergence, and this set, union with my original set A, gives me the output set at the end of
convergence for the hit or miss transform with the structuring element B2.

Now, let us take the structuring element B3, which I represent with red colour. With B3, you find
that at the end of the first iteration, this is one of the points which will be filled up, this is another
point that will be filled up, and this is another point that will be filled up. So, this is what I get at
the end of the first iteration. At the end of the second iteration, this point, this point and this point
will be filled up.

At the end of the third iteration, this point, this point and this point will be filled up. At the end
of the fourth iteration, this point and this point will be filled, and at the end of the fifth iteration,
this is the point which will be filled. So, we find that with the structuring element B3, in
subsequent iterations I cannot fill up any other point by performing the hit or miss transform.

So, we find that when I reach convergence by applying this structuring element B3, all the points
represented in red, along with the original set of points, form the set that I get at the end of
convergence with the structuring element B3. Similarly, when I apply the structuring element
B4, which I represent in blue colour, at the end of the first iteration these points can be filled up,
but when I go for the second iteration I cannot fill up any other point.

So, at the end of convergence, this is the only point that can be added to my original set. Then,
as we have said, when I reach convergence by performing the iterative algorithm independently
with each of the structuring elements, I have to take the union of all those converged sets, and the
point set that I get after performing the union operation is what gives me the convex hull of the
given set A.

So, in this particular case, you find that all these points together form the convex hull of the
given set A, because if I take the union, I get all these different points. Now, you find that this
particular algorithm has a drawback, because when we defined the convex hull, we said that it is
the minimal set containing the set A which is convex; it is the minimal convex set containing the
original set A that is called the convex hull of the given set A.

Now, if you look at this particular set, you find that it is not the minimal set. I can remove these
points from the set, and the resultant set that I get is still a convex set. So, the minimal set is
actually this set, not the set that we have obtained by applying the iterative algorithm.

So, now the question is how we can get the real convex hull, in the sense that the set is minimal.
That can be done by limiting the expansion of the region beyond the horizontal and vertical
limits of the original point set. You find that the horizontal and vertical limits of the original
point set are like this: in the vertical direction, the original point set does not have any point
beyond this, and similarly, in the horizontal direction, it does not have any point beyond this.

So, while performing the iterative steps, if I put a limit so that the region is not allowed to grow
beyond the horizontal and vertical dimensions of the original point set, then what I am going to
get is a convex hull in the true sense, that is, it will be minimal; and of course, not only limiting
the expansion beyond the horizontal and vertical dimensions, if I limit the expansion in the
diagonal dimensions of the original point set as well, then what I get is the convex hull of the
given set A in the true sense, that is, it is minimal and at the same time convex.

So, as we have said, the convex hull is a very important concept which can be used for high level
image understanding operations, because the convex deficiency is the difference between the
convex hull of a given set and the given set itself: for a given set S with convex hull H, the set
difference H minus S gives the convex deficiency. This convex deficiency is one of the important
properties which can be utilized for high level image understanding operations, and we will
discuss those high level image understanding operations in our subsequent lectures.

So now, let us talk about another morphological operation, another morphological algorithm,
which we call thinning.

(Refer Slide Time: 53:42)

So, as we have just said, thinning is an operation which is useful to find the skeleton of a given
object shape, and we have said earlier that this skeleton maintains the structure of the shape, or
the structural property of the shape, and we can obtain an object description from the skeleton
which can again be used for high level image interpretation or image understanding operations.

So now, let us see how we can obtain the skeleton of a given object shape by using the
morphological operations. Here, the thinning operation is defined like this: if I thin a given point
set A with a structuring element, say B, then this thinning operation is defined as A minus the hit
or miss transform of A with the structuring element B. This is the image A thinned with the
structuring element B, and, as you know from set theory, the same expression can be represented
as A intersection with the complement of the hit or miss transform of A with B.

Again, as before, instead of using a single structuring element, we use a set of structuring
elements. So, we will assume that this structuring element B is a set of structuring elements, say
B1, B2, B3, up to Bn, where every structuring element Bi is nothing but a rotated version of the
structuring element Bi minus 1. Then, given the set of structuring elements B1 to Bn, the thinning
of a given set A with B, which in this case is a set of structuring elements, is defined as follows:
first thin the set A with the structuring element B1, thin this result with the structuring element
B2, continue like this, and finally thin the output with the structuring element Bn. So, this
completes one iterative step of the thinning operation.
structuring element Bn. So, this completes one iterative step of the thinning operation.

So, for this given set of structuring elements B1toBn, we are doing successive thinning operations
with different structuring elements present in our set and once you complete one particular
iteration, you have to do this entire operation in a number of iterationsuntil you reach
convergenceand in this particular case, the thinning with aparticular structuring element in our
set of structuring elements follows the same definitions that we have given here.
So, this entire operation, that is, thinning with all the structuring elements present in our set of
structuring elements, is done iteratively over a number of passes until we reach convergence. We
will continue with our discussion on this thinning operation, and we will see some more
morphological operations which are applicable in image processing, in our subsequent lectures.
Now, let us see some of the quiz questions on today's lecture.

(Refer Slide Time: 58:20)

So, the first question is: what structuring element is used for boundary extraction using
morphological operations? The second question: if you implement region filling with a
structuring element containing all the eight neighbours of the origin, then what problem will you
face? The third question: give an example of a structuring element which, when used to erode an
image A, gives an erosion that is not a subset of A. The fourth question: if a structuring element
containing only the four neighbours of the origin, including the origin, is used for connected
component extraction, then what will be the property of such a connected component? You have
to explain with an example.

Thank you.

Digital Image Processing
Prof. P. K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 36
Mathematical Morphology - IV
Hello, welcome to the video lecture series on digital image processing. For the last few classes,
we have been discussing mathematical morphology and the application of mathematical morphology
in digital image processing.

(Refer Slide Time: 1:15)

So, in our last class, we discussed the application of mathematical morphology in image
processing techniques: we talked about the morphological technique for boundary extraction, the
morphological technique for region filling, the extraction of connected components using
morphological operations, and what a convex hull is and how to form the convex hull of a given
point set using morphological operations.

Then we started our discussion on the thinning operation using the morphological operations,
which we will continue today, and we will also discuss the thickening operation along with some
more applications.
(Refer Slide Time: 2:12)

So, in today's lecture, we will complete our discussion on thinning using the morphological
operations. We will discuss thickening, and we will see that thickening is nothing but a dual of
the thinning operation. We will also discuss the morphological techniques to obtain the skeleton
of a given shape; we have said earlier that the skeleton of a given shape is very useful for
describing an object shape, and this description can be used for high level image understanding
and image interpretation purposes.

Then we will extend the discussion: up to skeletonization, whatever we discuss is the application
of morphological techniques to binary images. After that we will see how to extend these
morphological operations to gray level images, which we will call gray level morphology, and
we will see a few of the morphological operations like dilation and erosion, which are nothing
but extensions of binary morphology to gray level morphology, and particular applications like
the top hat transform, which we are going to discuss in today's lecture.
(Refer Slide Time: 3:48)

So, to start with, as we were discussing thinning, which we could not complete in our last lecture, let us just quickly review what we have done in our last class. For thinning, suppose we are given a point set say A and this point set A has to be thinned by the structuring element B. A thinning operation using the morphological transformations can be obtained like this: A thinned with the structuring element B is defined as A minus (A hit-or-miss transform with B), and we know from set operations that this set difference operation can be implemented using set intersection and set complementation.

So, the same definition is equivalent to A intersection with the complement of (A hit-or-miss transform with B). And we have said in our last class that for the thinning operation, instead of using a single structuring element, what we use is a set of structuring elements, and the thinning is performed with the help of that set of structuring elements.

So, in our case, for the thinning operation, we will have a set of structuring elements B, and this set will contain a number of structuring elements; let us call them B1, B2, B3 and so on. Suppose there are n structuring elements, so I will have up to Bn, where all the structuring elements in this set follow a particular property: if I consider a structuring element say Bi in this set, then this Bi is nothing but a rotated version of Bi minus 1.

So, if you rotate the structuring element B1, what I get is the structuring element B2. Similarly, if I rotate the structuring element B2, what I get is the structuring element B3, and so on. So, every structuring element Bi in this particular set of structuring elements is a rotated version of the previous structuring element, that is Bi minus 1.

Now, using this set of structuring elements, the thinning operation has to be performed by applying each of the structuring elements in sequence. So, the thinning of the point set A with the set of structuring elements B has to be performed in this way: first, A has to be thinned with the structuring element B1, this has to be thinned with structuring element B2, and we have to continue like this; then finally, we have to thin with the structuring element Bn. So, look at the operation that we are performing. We take the original point set A, take the structuring element B1 from the set of structuring elements B and thin A with B1; whatever thinned output you get, you thin that output with the structuring element B2, thin this output with the structuring element B3, and so on, and you continue until you thin all the intermediate results with the last structuring element, that is Bn. In this case, each of these thinning operations, that is when we say A is thinned with structuring element B1, follows this particular definition, that is A minus (A hit-or-miss transform with the structuring element B1).

Now, this entire operation forms one pass of an iterative algorithm. So, the way we have to implement it is that this entire operation, that is the thinning starting with the structuring element B1 up to the structuring element Bn, has to be done repeatedly, a number of times, until and unless we find that in 2 subsequent passes the output does not change. That is the stage when the algorithm converges, and the output at that particular instant, when the algorithm converges, gives the thinned version of the point set A with respect to the structuring element or the set of structuring elements B.
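Since this iterative procedure is described above only in words, here is a minimal, hedged sketch of one possible implementation in Python. The helper names thin_once and thin_until_convergence, and the use of scipy.ndimage.binary_hit_or_miss, are illustrative choices rather than anything prescribed in the lecture; each structuring element is assumed to be supplied as a pair of boolean masks, one marking the positions that must be foreground and one marking the positions that must be background, with the don't care positions left out of both masks.

```python
import numpy as np
from scipy import ndimage

def thin_once(A, fg, bg):
    # One thinning step: A minus (A hit-or-miss B), where the structuring
    # element B is described by the foreground mask fg and background mask bg.
    hm = ndimage.binary_hit_or_miss(A, structure1=fg, structure2=bg)
    return A & ~hm

def thin_until_convergence(A, elements):
    # One pass applies B1, B2, ..., Bn in sequence; passes are repeated until
    # a complete pass produces no change, which is the convergence condition.
    A = A.astype(bool)
    while True:
        previous = A.copy()
        for fg, bg in elements:
            A = thin_once(A, fg, bg)
        if np.array_equal(A, previous):
            return A
```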

Now, let us take a particular example. Say here, as we said, the structuring element in this case is a set of structuring elements; so in this particular example, we take say 8 structuring elements starting from B1: so this is structuring element B1, structuring element B2, structuring element B3, structuring element B4, structuring element B5, B6, B7 and B8. So, we consider 8 different structuring elements.

Now, the structuring elements are something like this: the first structuring element B1 consists of these points, and here we represent by a cross the points which are don't care points. So, when we try to find out the hit-or-miss transform of the point set A with structuring element B1, what we look for is a translated version of this structuring element B1 such that, at that translated location, I get a match at all the point locations wherever the points are 1, and at these 3 points the corresponding locations in the point set should be equal to 0, that is background pixels, and we don't care about the condition of these 2 locations where we have put a cross. So, this is the first structuring element B1 in our set of structuring elements.

The next structuring element B2, as we said, is what we get when we rotate B1. So, the structuring element B2 will be like this; this is our structuring element B2 and these are the don't care locations. We don't care about the corresponding locations in our point set A.

Similarly, the structuring element B3 will be like this with these 2 don't care locations, structuring element B4 will be like this with these 2 locations as don't care locations, structuring element B5 will be like this where these 2 locations will be don't care locations, structuring element B6 is this with these 2 being the don't care locations, structuring element B7 is this where these 2 are the don't care locations, and finally the structuring element B8 will take this form with these 2 locations as the don't care locations. For each of these structuring elements, we consider the origin of the structuring element to be the center location; that is, in each of these cases, the origin of the structuring element is the center location, and these are the origins of the different structuring elements.
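Because the exact masks drawn on the slide are not reproduced in the transcript, the arrays below show one commonly used thinning set from the image processing literature as an assumed, representative choice: 1 marks a position that must be foreground, 0 a position that must be background, and -1 a don't care. The remaining six elements are generated as 90 degree rotations of the first two masks, which is how the successive rotations described above are usually realised for discrete 3 by 3 masks, and the resulting elements list is in the form expected by the thin_until_convergence sketch given earlier.

```python
import numpy as np

B1 = np.array([[ 0,  0,  0],
               [-1,  1, -1],
               [ 1,  1,  1]])
B2 = np.array([[-1,  0,  0],
               [ 1,  1,  0],
               [ 1,  1, -1]])

# Build B1..B8 by rotating the two base masks three more times.
masks = []
m1, m2 = B1, B2
for _ in range(4):
    masks.extend([m1, m2])
    m1, m2 = np.rot90(m1), np.rot90(m2)

# (foreground, background) pairs; don't care positions appear in neither mask.
elements = [((m == 1), (m == 0)) for m in masks]
```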

(Refer Slide Time: 10:30)

Now, let us consider a typical image: if we try to thin that particular image with this set of structuring elements, what kind of thinned output are we going to have? So, I take an image like this; say, consider this particular binary image which is to be thinned with this set of structuring elements.

(Refer Slide Time: 14:20)


Obviously, in this case, this binary image is our given set A. Now, the way we have defined the thinning operation is that first we try to thin this point set A with the structuring element B1; whatever output we get, we thin that with the structuring element B2, that output has to be thinned with structuring element B3, and like this we will continue up to the structuring element B8, and this completes one pass. The entire operation has to be done a second time, a third time, a fourth time and so on, until and unless we get convergence, that is, until we reach a situation where no further change in the thinned output is possible.

So, first let us consider the structuring element B1 and try to thin this point set A with the structuring element B1. Now, if you look here, you will find the points at which this particular structuring element B1 will give a positive result for the hit-or-miss transform; at these particular locations: this is 1 location, this is 1 location, this is 1 location, this is 1 location, this is 1 location, this is 1 location and this is another location. So, only at these locations is this particular structuring element B1 going to give positive results. The structuring element B1 will not give a positive result, or a match, anywhere else within this particular image.

So, what we will do, because our thinning operation is defined as A minus (A hit-or-miss transform with B), is remove these points from my original point set A; then, after performing the thinning operation with the structuring element B1, this is the intermediate result that I get.

(Refer Slide Time: 17:27)

Now, if I try to thin this with the structuring element B2, you will find that B2 does not fit anywhere within this particular set. Try to thin with structuring element B3, and you will find that with the structuring element B3, the points that we can remove are only these: this is 1 point, this is 1 point and this is the other point. So, these are the 3 points at which this structuring element B3 gives a match. So, what we do is remove these points from the original point set A.
(Refer Slide Time: 18:02)
So, at the end of the operation, this is what is going to be our intermediate thinned output. Then you try to do the hit-or-miss transform with B4, and you will find that the hit-or-miss transform with B4 in this particular image cannot remove any of the points present within this particular image. Try with B5; with B5, the points that can be removed are these points. So, if I do the hit-or-miss transform with B5, I can remove this particular point, I can remove this particular point, similarly I can remove this particular point, and I can also remove this particular point. So, these are the points which can be removed after performing the thinning operation with our structuring element B5.

So, if I continue like this, you will find that at the end, the points which will remain within this image are these points. So, these are the points which will be remaining when the algorithm converges. Now, if on this I impose the restriction that the thinned output that I get has to be m-connected, then some more points are to be removed from this particular thinned output.

(Refer Slide Time: 19:53)


So, to make it m-connected, I have to remove this point, I also have to remove this point, I also have to remove this particular point, I also have to remove this point, I also have to remove this point, and this point also.

(Refer Slide Time: 20:48)

So, at the end, this particular output that I get, this point set, is the thinned version of my original input point set A, thinned with the set of structuring elements B, and you will find that because this is a skeleton kind of structure, it can be used for obtaining descriptions of a shape, which is useful for high level image understanding operations. Now, as we said, thickening is a dual operation of thinning; so, just as in the case of thinning, the thickening can also be defined in this form.

(Refer Slide Time: 21:30)

So, thickening, which is represented like this, with the structuring element B, is defined as A union with (A hit-or-miss transform with B), and in the same manner, here also B is a set of structuring elements; because B is a set of structuring elements, if I consider the similar type of operation, then this thickening operation can be implemented in the same manner.

So first, since this B is a set of structuring elements, I have to do the thickening operation of A with the structuring element B1, this has to be thickened with structuring element B2, and we continue this way; finally, thickening with structuring element Bn, and this completes our thickening operation.

So, because thickening is a dual of thinning, one way of implementing the thickening operation is that, for the given set A, you first compute A complement. Then what you can do is simply perform the thinning operation of A complement with the set of structuring elements. After performing this thinning operation, if you take the complement of the thinned A complement, what you get is the thickened version of the given point set A.

So, given a point set A, the first operation that we have to perform is to take A complement. Then what I do is form a set, say C, which is nothing but A complement thinned with the set of structuring elements B, and finally, if I take the complement of this set C, then what I get is the thickened version of the set A with the set of structuring elements B, which is nothing but the complement of the point set C. So, you find that this thickening operation can be implemented by thinning A complement.

Now, there may be one problem: while doing this operation, some spurious points may arise in the thickened version of the point set A, and these can be removed by some post processing operation like opening, closing and so on; finally, what we get is the thickened version of the point set A.
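As a small hedged sketch of this duality, and reusing the thin_until_convergence helper and the elements list from the earlier sketches, thickening can be written as a complement, a thinning and another complement; the opening or closing clean-up of spurious points mentioned above is left out here.

```python
def thicken_via_thinning(A, elements):
    # Thickening as the dual of thinning: thin the complement of A with the
    # whole set of structuring elements, then complement the result back.
    A = A.astype(bool)
    C = thin_until_convergence(~A, elements)
    return ~C
```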

So, as we have done thinning, we have said that this thinning operation gives us some sort of skeleton of a 2 dimensional shape. Now, there are morphological techniques which can also be used to find out the skeleton of a given shape.

(Refer Slide Time: 25:03)

Now, let us see how we can implement the skeleton of a given point set A by using the morphological operations. So, given a point set A, what we will now discuss is skeletonization. Given a point set A, we can find out the skeleton of A, which let us represent as S(A). This can be obtained by this operation: S(A) is the union of Sk(A) for k equal to 0 up to say capital M, where this Sk(A) is defined as A eroded k successive times with the structuring element B, minus the opening of that k times eroded set with the same structuring element B.

So, what this operation gives us is a number of sub skeletons, and the union of the sub skeletons gives us the skeleton of the given set A, which we represent by S(A). Now this capital M in this particular case indicates the last iterative step: as you find, when I erode A with kB, this indicates that A is eroded with the structuring element B for k successive times, and this capital M indicates the last iterative step before A erodes to an empty set. So, we can define M as the maximum k such that A eroded with kB is not equal to a null set.

So, this M indicates the maximum number of iterations before our given set A erodes to a null set. Now, as we have found, the skeleton of a given point set A with respect to a structuring element B can be found out by repeated, successive application of erosion with the given structuring element B together with an opening operation with the same structuring element B. Similarly, given the skeleton, we can also reconstruct our original point set A. This can be done in this fashion: given the skeleton of A, when we have all the sub skeletons Sk(A), what we can do is dilate each Sk(A) with the structuring element successively k times, where Sk(A) is a sub skeleton, and take the union of this for k equal to 0 to capital M; and this is what is our original point set A.
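The following is a minimal sketch of these two constructions, assuming the point set A is given as a 0/1 numpy array and B is a small symmetric structuring element such as a 3 by 3 block of ones; the function names are illustrative choices, not taken from the lecture.

```python
import numpy as np
from scipy import ndimage

def skeleton_subsets(A, B):
    # Sk(A) = (A eroded k times by B) minus its opening by B, for k = 0 .. M,
    # where M is the last k before A erodes to the empty set.
    A = A.astype(bool)
    subsets = []
    eroded = A
    while eroded.any():
        opened = ndimage.binary_opening(eroded, structure=B)
        subsets.append(eroded & ~opened)
        eroded = ndimage.binary_erosion(eroded, structure=B)
    return subsets

def skeleton(A, B):
    # S(A): the union of all the sub skeletons.
    return np.logical_or.reduce(skeleton_subsets(A, B))

def reconstruct(subsets, B):
    # A is recovered as the union over k of Sk(A) dilated k times by B.
    out = subsets[0].copy()
    for k, Sk in enumerate(subsets[1:], start=1):
        out |= ndimage.binary_dilation(Sk, structure=B, iterations=k)
    return out
```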

So, let us illustrate this skeletonization, and finally getting back our original point set A from the sub skeletons, with the help of an example. The example is like this: for this skeletonization, the structuring element B we will consider is a 3 by 3 structuring element like this. So, this is our structuring element which we can use for the skeletonization purpose, and in this case, the origin of the structuring element is the center element.

(Refer Slide Time: 29:47)

And now, let us take an image like this; so this is the given image. Now, what we do is, on this side, let us put the iteration number k: so this is k equal to 0, k equal to 1 and k equal to 2, and on this side let us put the successive skeletonization operations. So, the first column represents A eroded k times with B. The next column represents A eroded k times with B, then opened with B. The next column, suppose it represents Sk(A), that is the sub skeleton. The next column represents the skeleton S(A), where we take the union of the sub skeletons Sk(A) with k varying from 0 to capital M.

The next column, let us assume, represents Sk(A) dilated with kB, and the last column, we will see, represents the union of these. So, if I do it like this, if I erode this given point set A with respect to our 3 by 3 structuring element, then after eroding it for the first time, the kind of output that I get is like this. So, this is the output that I will get after eroding this given point set A with our 3 by 3 structuring element once. And if I erode it for a second time, then the output that we will get is this.
In this column, where I perform the opening operation on these successively eroded images, what I get over here is like this; in this particular case, what I will get is this, and here this will be a null set. I will not get any point in this particular case, that is, after eroding it twice, for 2 subsequent operations, and then doing the opening with this structuring element B.

Now, from here, what I get are the intermediate results or sub skeletons, as we have defined the sub skeleton as a difference operation: the difference of A eroded k times with B minus (A eroded k times with B) opened with B. So, if I take the difference of the first column and the second column, then what I get are our sub skeletons. So, in this case, the sub skeleton will be like this; you find that this is nothing but the difference of the element in the first column and the element in the second column.

Similarly, in the second case, the sub skeleton will be like this and in the third case, the sub skeleton will be like this. Now, if you take the union of all these sub skeletons, as we have defined that the final skeleton is the union of all the sub skeletons, the final skeleton will have this particular shape. So, here we will get it like this.

So, you find that this is the element which is actually our skeleton S(A), and here you find that when I get this skeleton, this skeleton is not really connected. So, what we get is a disconnected point set, and that is not unnatural, because when we performed the morphological operations, nowhere did we guarantee connectivity. So, it is quite possible that, given a set, if I try to find out the skeleton of that point set by using morphological operations, then it may lead to a skeleton which is disconnected.

Now, let us see whether, given these sub skeletons, it is possible to find out the original set A by using the reverse process. That is, while computing the skeleton, what we have used is successive erosion; now, in the reverse process, what we use is successive dilation. So, after defining those dilation operations as we have defined earlier, we can now try to reconstruct the original point set A from these sub skeletons.

So here, you find that the fifth column is nothing but the sub skeleton Sk(A) dilated successively k times with the structuring element B. So, if I do this, this particular column will give me this output, and finally the last column is nothing but the union of all these; so if I take the union of all of them, what I get is this. So, if I now compare this output with our original point set A, you find that this is nothing but the point set A which was given.

So, by this type of morphological operation, it is possible to obtain the skeleton of a given shape, and once I have these sub skeletons from the skeletonization operation, then from these sub skeletons it is also possible to obtain the original point set A by applying the inverse operation. So far, all the morphological operations that we have discussed are meant for binary images.

Now, let us see whether we can extend this binary morphology, these morphological operations, to gray level images as well. Obviously, our assumption will be, as we said at the start, that every image should be representable as a point set if we want to apply the morphological operations on the image. So, here also, the gray level image we should be able to represent as a point set, or set of points.

(Refer Slide Time: 40:00)

So, let us assume that we are given a point set A in, say, n dimensional Euclidean space. So, this A represents a point set in n dimensional Euclidean space. Now, for this point set A, the first n minus 1 components, the first n minus 1 coordinates, represent the spatial domain, and the n'th coordinate represents the value of the function. So, this is our basic interpretation. What we are doing is taking a point set A in n dimensional Euclidean space; if I take a point belonging to that set A, then the first n minus 1 coordinates of that particular point represent its location in the spatial domain, and the n'th coordinate, that is the last coordinate, represents the value of the function at that particular location in space.

Say for example, if I take a 3 dimensional point, say (5, 3, 7), or in general if I assume a coordinate system say (x, y, z); now, what does it mean? This means that I have a 2 dimensional plane which is given by (x, y), and z is the value at that location. So, I can represent this as z equal to some function f(x, y). So, you find that the first 2 coordinates, x and y, represent the spatial domain, and the last coordinate, that is the third coordinate, represents the value at that particular location.

So similarly, a gray level image we will represent as a triplet like this, say (5, 3, 7), where this (5, 3) represents a location in a 2 dimensional plane and the last component, 7, represents the value at that particular location. So, when I have a gray level image, this value represents the intensity value or the gray level value at location (5, 3) in the image.

Now, you find that if I have this sort of interpretation of a gray level image, I get some advantage. What is the advantage? The advantage is that now the gray level image can be viewed as a topographic surface, and the intensity value I can assume to represent the height within that topographic surface.

(Refer Slide Time: 43:38)

So, given this sort of interpretation, to extend our morphological operations to gray level images, we try to define 2 different terms. One of the terms we will define is the top surface, and the second term that we will define is what is called an umbra. Now, what is this top surface? We have said that we are assuming a point set A which is in n dimensional Euclidean space, and we have said that the first n minus 1 components represent the spatial domain and the last component, that is the n'th component, represents the value of the function at a point in space.

Now, over here, suppose, if we consider 2 dimensions, I have a set of points something like this: say (1, 1), (1, 2), (1, 5), then say (2, 3), then suppose (2, 7), then say (3, 1), (3, 3) and say (3, 5). Suppose these are the points which are given. Now, if we analyze this point set, I find that for these points, the first component is the same, which is equal to 1; for these points, the first component is again the same, which is equal to 2; and for these points, the first component is again the same, which is equal to 3.

So, as we have said, the first component represents a point in space; coming to our n dimensional space, the first n minus 1 components represent the space. So, what we do is, for each value of these first n minus 1 components, we try to find out the maximum value of the n'th component in set A. As in this case, for these points the maximum value is 5, for this case the maximum value is 7, whereas for this case the maximum value is again 5. So, the top surface consists of these values 5, 7 and 5; that is, the top surface consists of the points (1, 5), (2, 7) and (3, 5). So, these are the points which make up the top surface.

(Refer Slide Time: 47:09)


So, coming to our formal definition, what I have is: for each (n-1)-tuple x, I try to find out the maximum value of the n'th component in the given set A, and that maximum value forms the top surface. So formally, we can define the top surface like this: given a set A, the top surface T[A] at location x, where x is an (n-1)-tuple, is given by the maximum of y such that (x, y) belongs to set A.

So, we find that this (x, y) is an n-tuple, x is an (n-1)-tuple, and we try to find out the maximum y for each (n-1)-tuple, which gives the top surface. For these cases, we also define the region of support, which we denote as F. This region of support is defined like this: it is the set of x belonging to the (n-1)-dimensional space such that, for some y belonging to 1-dimensional Euclidean space, (x, y) belongs to set A. So, we find that we have the region of support and we have the top surface T[A], and with the help of a diagram, we can represent this like this.

(Refer Slide Time: 49:04)


Suppose we have a set of points given like this; suppose this is our x dimension, this is the y dimension and this is the z dimension. The projection of these points onto the (x, y) plane gives us the region of support F, and if I take the maximum of the z values at each point in F, this is what gives us the top surface T[A].

(Refer Slide Time: 49:58)

Now, we define the umbra. So, we are given a region of support say F which is a subset of (n-1)-dimensional Euclidean space; and one more thing, here we find that this top surface is something like a mapping function. We can represent it as a mapping function which maps this region of support F to 1-dimensional Euclidean space. Now, given this top surface, which as we have said is a mapping function f, we can define the umbra of f as U[f], which is obviously a subset of n-dimensional Euclidean space: F is in (n-1)-dimensional Euclidean space, and if I take the Cartesian product with 1-dimensional Euclidean space, what I get is n-dimensional Euclidean space.

So now, this U[f], the umbra of the top surface, is defined as the set of points (x, y), belonging to F Cartesian product with 1-dimensional Euclidean space, where this y is less than or equal to f(x). So, what does it mean? This expression means that the umbra consists of all the points below the top surface, including the top surface itself. So, the top surface gives you the maximum value at a particular location in the region of support, in the spatial domain, and the umbra is everything below the top surface, including the top surface.
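To make the two definitions concrete, here is a tiny hedged sketch in terms of the worked example above; the variable names, and the arbitrary floor used to truncate the (in principle unbounded) umbra, are choices of this illustration rather than part of the formal definition.

```python
# The point set from the example: (first coordinate, value).
A = {(1, 1), (1, 2), (1, 5), (2, 3), (2, 7), (3, 1), (3, 3), (3, 5)}

# Region of support F: the spatial coordinates that occur in A.
F = {x for (x, y) in A}

# Top surface T[A](x): the maximum value y over all (x, y) in A.
T = {x: max(y for (xx, y) in A if xx == x) for x in F}
print(sorted(T.items()))            # [(1, 5), (2, 7), (3, 5)]

# Umbra of the top surface: every (x, y) with y <= T(x); truncated at an
# arbitrary floor of 0 here, since the true umbra extends indefinitely downwards.
floor = 0
U = {(x, y) for x in F for y in range(floor, T[x] + 1)}
```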

(Refer Slide Time: 52:06)

So, after giving these definitions, let us see how the dilation and erosion operations can be defined in the case of gray level images. As we have said, a gray level image is a subset of a 3 dimensional space. So, we define it like this: suppose we are given 2 regions of support, one is F and the other one is K. These regions of support are obviously in (n-1)-dimensional Euclidean space; in the case of a gray level image, this will be a 2 dimensional Euclidean space. We have 2 top surfaces: f, which maps the region of support F into 1-dimensional Euclidean space, and k, which maps the region of support K again to 1-dimensional Euclidean space. Then we can define the dilation of f and k as the top surface of the dilation of the umbra of f with the umbra of k.

So, what we have to do is: given a gray level image, we have to find out its umbra. Similarly, given a structuring element, we have to find out the umbra of the structuring element. Then we have to dilate the umbra of the image with the umbra of the structuring element, and then find out the top surface of this dilated output. So, you take the umbra of the gray level image, you take the umbra of the structuring element, dilate these 2 umbras, and then the top surface of this dilation tells you the dilation of the given gray level image with respect to the given structuring element. In the same manner, the erosion in the case of a gray level image can also be defined, in terms of the top surface of the erosion of the umbra of f with the umbra of k.

So, you find that the extension of the morphological operations from binary images to gray level images is quite simple. What we do is convert the image into its umbra; similarly, you convert the structuring element into its umbra, then take the erosion or dilation of the corresponding umbras, and the top surface of this erosion or dilation gives the corresponding erosion or dilation of the gray level image with respect to the given structuring element.
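As a hedged sketch, for discrete 1-D signals the umbra and top surface construction works out to the max-plus and min-minus formulas implemented below; taking the centre of the structuring function as its origin is an assumption of this illustration, and library routines such as scipy.ndimage.grey_dilation and grey_erosion compute essentially the same thing for images.

```python
import numpy as np

def grey_dilate_1d(f, k):
    # (f dilated by k)(x) = max over z of f(x - z) + k(z), z measured from the origin of k.
    n, m = len(f), len(k)
    half = m // 2
    out = np.full(n, -np.inf)
    for x in range(n):
        for z in range(m):
            src = x - (z - half)
            if 0 <= src < n:
                out[x] = max(out[x], f[src] + k[z])
    return out

def grey_erode_1d(f, k):
    # (f eroded by k)(x) = min over z of f(x + z) - k(z).
    n, m = len(f), len(k)
    half = m // 2
    out = np.full(n, np.inf)
    for x in range(n):
        for z in range(m):
            src = x + (z - half)
            if 0 <= src < n:
                out[x] = min(out[x], f[src] - k[z])
    return out

f = np.array([1.0, 5.0, 3.0, 7.0, 2.0, 2.0, 6.0])
k = np.array([0.0, 1.0, 0.0])       # a small symmetric structuring function
print(grey_dilate_1d(f, k))
print(grey_erode_1d(f, k))
```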

So with this, we stop our discussion on the erosion and dilation operations, our discussion on morphological operations. In this discussion, what we have done is: first we defined the basic morphological operations like erosion and dilation, and we have seen that these basic morphological operations are implemented by different set operations. Our basic assumption is that any form of image on which these morphological operations are to be applied has to be represented as a point set, and then all subsequent applications of these morphological operations, whether for a binary image or for a gray level image, are, as we have seen, nothing but applications of the basic erosion and dilation operations along with a few set operations in different orders.

And the net result of applying these morphological operations is that, given any point set or given an image, we can regularize an object present in the image. By regularization, what I mean is: if there are some noise points in the object region, we can remove them by morphological operations; if there are some noise points in the background region, we can remove them by morphological operations; and if 2 object regions are connected to each other because of some noise, we can remove that as well.

So, these are the various regularization operations that we can perform through morphological operations. Of course, the scope of mathematical morphology is very wide, but we are concentrating on a very narrow application of it in our image processing applications, and the basic purpose of doing this regularization is that we can have a better description of the object regions or the different foreground regions. Now, let us see some questions on today's lecture.
(Refer Slide Time: 57:31)

So, the first question is: does morphological skeletonization guarantee a connected skeleton? Second question: what is the top surface? Third question: what is an umbra? Fourth question: define the gray scale dilation and erosion operations. Fifth: if an image contains light objects against a varying background, what kind of operation do you suggest to detect the object region?

Thank you.
Digital Image Processing

Prof. P. K. Biswas

Department of Electronics & Electrical Communication Engineering

Indian Institute of Technology, Kharagpur

Lecture - 37

Object Representation and Description-1

Hello, welcome to the video lecture series on digital image processing. Up to our last class, we have discussed, or rather completed, one aspect of image processing: the morphological image processing techniques applied to segments, where we have seen that if the segments obtained after the segmentation operation are not regular, then the segments can be regularized by using the morphological image processing operations.

(Refer Slide Time: 1:31)

So, till the last class, what we have done is this: once an image is segmented, we have said that the purpose of image segmentation is to partition an image into a number of regions which we call segments. Some of the segments that we obtain correspond to the objects present in the scene and some of the segments correspond to the background region in the image, and as we have said, because of the presence of noise, the segments that we get may not be regular segments. There may be some noise present in the object region or even some noise present in the background region.

If there is more than one object present in the image, then it may appear, because of the presence of noise, that those different object segments are joined: one may be joined to the other by some spurious thin lines, and these thin lines are also due to the noise. So, what we have seen during our last few lectures is that we can employ morphological techniques to remove such kinds of noise, and after the morphological transformations, the segments that we get are quite regular. For that, we have talked about various morphological operations like dilation, erosion, and other morphological transforms which are developed using the basic morphological transformations like dilation and erosion.

Now, from today's lecture, we will gradually move to another aspect of image processing, which we will call image understanding. What we want to do in the image understanding topic is this: given the segments, or the segmented output of an image, or the regularized segmented output of an image, whether it is possible to understand, or make the computer understand, what object is present in the image, or what object corresponds to a given segment present in the image.

So, in order to do this, in order to make the computer understand the objects present in the image, we must have some proper description of the segments, or some proper description of the objects present in the image, so that these descriptions may be matched against some knowledge present or stored in the computer beforehand. Once you have such a description, which can be generated from the segments present in the image, these descriptions may be matched against the descriptions pre-stored in the computer memory in the form of a model base or a knowledge base, and once we find that a description generated from a particular segment matches against the description of a particular object, the computer can immediately infer that it is object X or object Y which is present in the image, or that, say, segment 1 corresponds to object X which is already there in the knowledge base.

(Refer Slide Time: 5:15)

So, in order to do this, what we have said is that the first thing we have to do is find out a proper representation mechanism; that is, given a segment, how do I represent that particular segment. So here, there are 2 possible representations; the segments can be represented in 1 of 2 possible ways. One of the possible ways is the boundary based representation and the other possible way is the region based representation.

So, in the case of boundary based representation, what we do is take the boundary or contour of the segment and represent the boundary in some form, so that it helps us to generate some description of that boundary; and then finally, using this boundary based description, we can try to match this description against similar descriptions present in the knowledge base. Whereas, in the case of region based description or region based representation, what we are interested in is not only the boundary of the region or boundary of the segment, but also the surface property. So, we can have a region based representation, from that we can have region based descriptors, descriptions generated from the regions, and these descriptors can again be matched against similar descriptors present in the knowledge base to identify a particular object.

So, it is quite clear from this that if we are going for boundary based representation and boundary based description, then what we are interested in is mainly the shape of the object. Whereas if we are also interested in the surface reflectance properties, such as the colour, texture and so on, then simple boundary based representation and boundary based description is not sufficient; what we have to go for is region based representation and region based description.

So, we can have one of these 2 types of representations as well as descriptions: the boundary based representation and boundary based description, and similarly the region based representation and region based description. And finally, once you obtain the descriptors, whether they are boundary based descriptors or region based descriptors, what we have to go for is some matching mechanism which will match these descriptors against similar descriptors present in the knowledge base to identify a particular object.

So, in today's lecture, we will mainly concentrate on the boundary based representation and the different types of descriptors that we can obtain from boundary based representation. So, the first such representation scheme that we shall consider is called the chain code.

(Refer Slide Time: 8:41)

So, as we said, this chain code is a boundary based representation scheme, and from this boundary based representation we can find out what boundary descriptors are possible to obtain. So, let us see what this chain code is. The chain code is something like this: given a boundary in the digital domain, as we are talking about digital image processing techniques, this boundary is nothing but a set of points, a set of discrete boundary points on the object contour.

Now, what we can do is this: once we move from one point on the object contour to a neighbouring point on the same object contour, I can have one of 8 possible moves if I go for 8 connectivity. So, it is simply like this: let us consider a regular grid, and, as we have said earlier, if I assume that I have a center pixel somewhere here, then this center pixel has 8 neighbouring pixels: 4 in the diagonal directions, 2 in the vertical directions and 2 in the horizontal directions.

Since, as we have said, the boundary is nothing but a set of points or set of pixels present in the image, if I assume that this set of pixels is connected, assuming 8 connectivity, then starting from any point on the boundary, if I want to move to the next point on the boundary following some mechanism, either we can move in the clockwise direction or we can move in the anticlockwise direction.

Now, as we move, say, from the i'th point on the boundary to the (i plus 1)'th point on the boundary, I can have only 1 of 8 possible moves, and these possible moves are: I can move in the right direction, or in the diagonal direction to the right top, or in the vertically upward direction, or in the diagonal direction to the left top, or in the horizontal direction to the left, or in the diagonal direction to the left bottom, or in the vertically downward direction, or in the diagonal direction to the right bottom.

Now, you find that as I move from one boundary point to the next boundary point, the displacement is a single step, because I am moving from 1 pixel to 1 of its neighbouring pixels, and at the same time, each of these moves has a specific direction. So, I can represent these moves by numbers: say, for example, a move in the right direction is represented by the number 0, a move in the top right direction is represented by the number 1, a move in the vertically upward direction is represented by the number 2, a move in the top left direction is represented by the number 3, a move in the left direction is represented by the number 4, a move in the left bottom direction is represented by the number 5, a move in the vertically downward direction is represented by the number 6, and a move in the right bottom direction is represented by the number 7.

So, if I represent each of these moves using these 8 distinct numerals, then as we move along the boundary and complete the cycle, that is, when I come back to the starting point from where I began my move, either in the clockwise direction or in the anticlockwise direction, this entire boundary can be represented by a sequence of such moves; and as each of the moves has got a specific number, we can say that this entire boundary is represented by a sequence of such numerals, where the numerals vary from 0 to 7.

So, what I get is a code, a numerical code for the boundary, which we call a chain code. Now, let us see, for a given boundary, how we can obtain a particular chain code. As we have just said, I move from 1 boundary point to the next boundary point. Now, if I simply apply this, there is a problem, because we are considering discrete points representing the object boundary; so, because of the presence of noise, say for example I have 3 consecutive pixels, something like this; let me draw it in a bigger way.

So, these are the 3 consecutive pixels that we have in an image. Now, it may so happen that, because of the discretization process and also because of the presence of noise, the middle point, that is this one, is slightly shifted. So, instead of this middle point being present over here, the middle point may be shifted somewhere here. So, we find that because of this slight amount of noise, the chain code that we get will be totally different. My ideal chain code should have been a move in this direction and a move in this direction, so the chain code should ideally have been 11. But because this point is shifted, due to noise or due to discretization or whatever it may be, what I practically get is a chain code that is a move in the vertically upward direction, which is 2, followed by a move in the right direction, which is code 0.

So, instead of obtaining the chain code 11, what I am getting is the chain code 20, which is totally different from the chain code 11. This is the major disadvantage of trying to find out the chain code from the original boundary points. So, instead of trying to find out the chain code from the original boundary points, what is usually done is that we go for a resampling of the boundary points, a resampling of the discrete points representing the boundary. So, let us see how that is done.

(Refer Slide Time: 16:31)

So suppose I have a set of points representing a boundary, say something like this. So, these are, say, a set of points which represents a boundary. Now, the approach that I am taking is this: instead of trying to find out the chain code from these original boundary points, I am trying to resample these boundary points by placing a grid. So, as shown in this figure, I have a grid which is superimposed on this set of boundary points; then what I do is assign the boundary points to the grid locations based on proximity, that is, a boundary point will be assigned to the grid point which is nearest to it. So, based on that, you find that I can identify a set of grid points which can approximately represent the boundary.

So, what are the grid points that I can have? Now, I mark them in green. So, this is one of the grid points which can be used to represent the boundary, this is one of the grid points which can be used to represent the boundary, this is another grid point, this is another grid point, this is another grid point, this is another grid point, this is another grid point, this is another grid point, similarly this one, this one, say this one. So, these are the different grid points which may be used to represent the boundary.

Now, once I do this, what I do is find out a chain code for these grid points, not for the original boundary locations, not for the original boundary points. So, given this, you find that this particular set of grid points lets the boundary be represented by a code like this. So, I have this move here, I have this move, here I have this move; so this is what will be my chain code representation.

Now, if you remember the chain codes, the codes for the different directions that we had are: this direction was given code 0, this direction was given code 1, a move in this direction was given code 2, a move in this direction was given code 3, a move in this direction was given code 4, a move in this direction was given code 5, this was given code 6 and this was given code 7. So, by using these codes, you find that if I assume that this is my starting point, this move is represented by code 0, this move is represented by code 7, this move is represented by code 6, similarly here it is code 6, here it is 7, here it is 6 again, here it is 4, here it is 5, 4, 3, 4, then 2, 2, 2, 1, 1.

So finally, the chain code representation that I get for this shape, starting from the indicated starting point, will be like this: it will be 0,7,6,6,7,6,4,5,4,3,4,2,2,2,1 and 1. So, this will be the chain code representation of this particular boundary. Again, note that when we obtained this chain code representation, we did not find out the chain code of the original boundary points; what we have done is resample the boundary points by placing a grid, and based on the grid locations we have obtained this chain code representation.
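A small hedged sketch of this step is given below: it turns a closed sequence of resampled grid points into an 8-direction chain code using the numbering from the lecture, 0 for a move to the right and increasing anticlockwise up to 7 for the bottom right move. Treating coordinates as (x, y) with y increasing upwards, and treating the boundary as closed, are assumptions of this sketch.

```python
# Displacement between successive grid points -> direction code.
MOVES = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
         (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points):
    # points: the closed boundary as a list of (x, y) grid points whose
    # successive entries are 8-neighbours; the last point wraps to the first.
    code = []
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        code.append(MOVES[(x1 - x0, y1 - y0)])
    return code

# A 2 by 2 square traced clockwise from its top left corner, matching the
# square example discussed later in the lecture.
square = [(0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0), (0, 1)]
print(chain_code(square))           # [0, 0, 6, 6, 4, 4, 2, 2]
```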

Now, a further advantage of this is that it can also take care of scaling. That is, by varying the grid spacing, we can have the same boundary represented at different scales, and accordingly I will have different chain codes. Now, though the chain code representation appears to be very simple in this particular case, it is not quite so simple, because you find that the chain code that we have obtained here is not rotation invariant.

Though the chain code will be translation invariant, and we can take care of the scaling by a proper grid size, the chain code in this case is not rotation invariant. But, as we said earlier, our purpose of representation is that once we decide on a particular representation, we want to generate some descriptor out of that representation, and these descriptors will be used for the final recognition of the object, or identification of the object.

So naturally, the descriptors that we want to have should ideally be rotation, translation and scaling invariant. But as we have seen in this case, this particular representation, though it is translation invariant, and I can make it scale invariant by having different grid sizes, is not rotation invariant. To illustrate that this chain code representation is not rotation invariant, let us take a very simple figure. Say for example, I take a figure something like this.

(Refer Slide Time: 23:41)

Now, you find that for this, or even for this figure, it again becomes a complicated one; so let me consider an even simpler figure. Suppose I have a figure, a simple boundary of a square, and if you remember, our different directions were something like this: this was 0, 1, 2, 3, 4, 5, 6 and 7.

So, here you find that for this simple square boundary, the chain code, if I start from this starting point, will be 0,0,6,6,4,4, then 2,2. So, this is the chain code which corresponds to this square boundary. Now, what I do is simply rotate this square figure by an angle of 45 degrees. So, what do I get? If I rotate the same figure by an angle of 45 degrees, the corresponding figure, the boundary, will be like this. This is the boundary if I simply rotate this square by an angle of 45 degrees, either in the clockwise direction or the anticlockwise direction.

And, if I want to find out the chain code of this rotated boundary, then let us see what chain code we will have. Again I assume that this is my starting point to find out the chain code; here, the chain code will be 7,7,5,5, then 3,3, then 1,1. So, you find that the chain code of this rotated figure comes out to be like this. It is the same figure, and now if you compare this chain code with this chain code, you find that the chain codes are totally different.

So, it is the same figure which is rotated, and after rotation I get a totally different chain code representation. This clearly illustrates that the kind of chain code that we get in this particular case is not rotation invariant, whereas rotation invariance is our requirement. So, how can we obtain a chain code which is rotation invariant? For a rotation invariant chain code, we do not go for the simple way of chain code extraction or chain code representation; what we go for is what is called a differential chain code. The differential chain code is something like this.

Let us consider this simple example. What we do is this: once I get the first order chain code, that is, the chain code as obtained from this procedure, I take 2 subsequent codes and try to find out, if from the first code I have to move to the second code, how many rotations, let us say in the anticlockwise direction, I have to perform. Because this is a differential chain code, I have to take the difference; and instead of considering the chain code as a string, you consider the chain code as a cycle. That means the first code has to be obtained by rotation from the last code in our chain code string.

So, by applying that, you find that if I want to move from 0 to 0, because the previous code was 0 and the next code is also 0, the number of rotations that I have to perform is 0. So, when I move from this 0 to 0, the number of rotations that we have to perform is 0. When I move from this 0 to 6, so you come to this particular figure, I have to move from 0 to 6; the number of rotations that I have to perform is 1, 2, 3, 4, 5, 6, so I have to perform 6 rotations, and I represent this by 6.

When I go from 6 to 6, the number of rotations that I have to perform is 0. As we move from 6 to 4, this is 6 and this is 4; how many rotations do I have to perform? 1, 2, 3, 4, 5, 6, so 6 rotations again. When I go from 4 to 4, the number of rotations is 0. When I move from 4 to 2, so this is 4 and this is 2, again you find that I have to perform 6 rotations. As we move from 2 to 2, I have to perform 0 rotations. As I move from 2 to, say, 0, this is 2 and this is 0, so you can compute that again I have to do 6 rotations.

So, now this differential chain code becomes 0, then 6, then 0 again, then 6 again, then 0 again, then 6 again, then 0 again, then 6 again. Now, what happens in this particular case if the figure is rotated? Let us see: first I have to move from 7 to 7, so I have to perform 0 rotations. Let me use some other colour. So here, first I have to perform 0 rotations. Then I have to move from 7 to 5; so this is 7, this is 5. Again, in the anticlockwise direction, you find that the number of rotations that one has to perform is 6. From 5 to 5, again 0 rotations; from 5 to 3, this is 5, this is 3, again you can compute that you have to perform 6 rotations; from 3 to 3, again 0; from 3 to 1, this is 3, this is 1, I have to calculate the number of rotations in the anticlockwise direction, and again you find that this will be 6 rotations; from 1 to 1, 0 rotations; from 1 to 7, this is 1, this is 7, and the number of rotations in the anticlockwise direction is again 6.

So, the differential chain code that I get in this particular case is 0,6,0,6,0,6,0,6. Here also, what I have obtained is 0,6,0,6,0,6,0,6. So, you find that instead of taking the direct chain code, if I go for the differential chain code, then whatever the rotation, I get the same chain code representation; and what I am doing is, instead of considering this as a chain, I consider this as a cycle. Similarly, this has to be considered as a cycle.
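A short hedged sketch of this difference operation: with the chain treated as a cycle, counting anticlockwise rotations from one code to the next is simply a difference taken modulo 8, and the move from the last code back to the first is included.

```python
def differential_chain_code(code):
    # Element i counts the anticlockwise rotations from code[i] to the next
    # code; the last code wraps around to the first, so the result is a cycle too.
    n = len(code)
    return [(code[(i + 1) % n] - code[i]) % 8 for i in range(n)]

print(differential_chain_code([0, 0, 6, 6, 4, 4, 2, 2]))   # [0, 6, 0, 6, 0, 6, 0, 6]
print(differential_chain_code([7, 7, 5, 5, 3, 3, 1, 1]))   # the same: rotation invariant
```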

So, this differential chain code will be translation invariant and it will be rotation invariant, and the scale invariance has to be achieved by choosing the grid spacing properly. So, I can have a translation, rotation and scale invariant chain code representation, a differential chain code representation, of any given boundary.

So, up to this point, what we have obtained is the differential chain code representation of a boundary. But what we have said is that we need a description, a description which can be matched against the descriptions present in the knowledge base. So, what kind of descriptor can we obtain from this differential chain code representation?

(Refer Slide Time: 33:40)

So suppose I have a differential chain code representation something like this: say 5,3,2,1,6,3,2, and as I said, instead of considering this as a chain, I consider it as a cycle. So, what kind of descriptor can I obtain from this particular differential chain code representation? The point is that if I consider this as a cycle instead of a chain, then depending upon the starting point, the chain will be different. When I consider this as a chain, depending upon the starting point from where I start writing out the chain code, the chain will be different, but the cycle will remain the same.

So, what I have to do is redefine a starting point on this cycle so that the chain code I get from it has, say, the minimum numerical value. So, you find that from this particular code, if I take my starting point here, then the numerical value of this will be 1,6,3,2,5,3,2. For the original starting point from where this differential chain code has been obtained, the numerical value is 5,3,2,1,6,3,2. If I take the starting point here, say at 2, the corresponding numerical value will be 2,1,6,3,2,5,3.

So, you find that out of these 3 different options that we have considered, this is the one which gives you the minimum numerical value. So, what we have to do is this: once we get the differential chain code, we consider it to be a cycle; once it is a cycle, that means the starting point is open, and I can choose the starting point anywhere I like. So, what I do is choose a starting point such that, if I form the differential chain code from that starting point, the numerical value of the differential chain code will be minimum.

So, once I do that, I have a unique descriptor, a unique description, from the chain code representation or differential chain code representation, which can be used for the recognition or understanding purpose. So, this particular chain code that we have obtained, having the minimum numerical value, is what is called the shape number; this is what is called the shape number of the boundary of the object.
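A minimal sketch of this normalisation: generate every circular rotation of the differential chain code and keep the one with the minimum numerical value; for digit sequences of equal length, comparing the lists lexicographically gives the same ordering as comparing their numerical values.

```python
def shape_number(diff_code):
    # Try every starting point of the cycle and keep the rotation whose digit
    # sequence has the smallest numerical value.
    n = len(diff_code)
    return min(diff_code[i:] + diff_code[:i] for i in range(n))

print(shape_number([5, 3, 2, 1, 6, 3, 2]))    # [1, 6, 3, 2, 5, 3, 2]
```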

So, you can use this shape number to represent, to describe, any given boundary, and as we said earlier, these kinds of descriptors are used for shape understanding or shape recognition. If we are interested in the reflectance properties of the surface, such as colour, texture and so on, then this shape number is not helpful. So, along with the shape number, we have to use some of the descriptors that we will see in our subsequent lectures. This shape number is one of the boundary descriptors that we can use for object understanding or object recognition. There are many other boundary based descriptors.

(Refer Slide Time: 37:35)

So, let us see one more boundary based descriptor, which is obtained by using a polygonal approximation of the boundary. So, what is this polygonal approximation of the boundary? Suppose we are given an arbitrary boundary like this; what we want to do is represent, or approximate, this boundary by a polygon following some criteria. There are numerous polygonal approximation techniques which have been reported in the literature. Obviously, we cannot discuss all of them, but we will consider some simpler polygonal approximation techniques.

So, by polygonal approximation, what I mean is that this given arbitrary boundary, we want to
represent by a polygon. So, a polygon may be like this; say what I do is I represent the boundary
by a polygon of this form. So, this polygon which I have shown in pink colour is a polygonal
approximation of this given boundary. Now, let us see how we can obtain such a kind of
polygonal approximation. One of the polygonal approximations that we will discuss first is what
is called the minimum perimeter polygon.

Here, the concept is something like this; you enclose this original boundary, the arbitrary
boundary, by a set of connected cells. When I enclose this boundary by a set of connected cells,
then those connected cells define 2 sorts of restrictions or 2 sorts of walls; one is the external
wall, the other one is the internal wall. And once I enclose this boundary by such connected
cells, I assume that this boundary is composed of, say, a rubber band and let the rubber band
contract itself within that set of cells.

So, if we allow the rubber band to contract, following the constraints that we have put by using
the connected cells, that is by specifying what is the outer wall and what is the inner wall; then
within that restriction, this rubber band will try to fit itself such that its perimeter will be
minimum. So, the kind of polygon that we get is what is called a minimum perimeter polygon.
Now, let us illustrate this with an example.

(Refer Slide Time: 40:42)

So suppose here, I have a boundary, say something like this, and when I do that, you find that I
have a set of connected cells which enclose this particular boundary. Now, what are those
connected cells? This is 1 cell, this is 1 cell, this is another cell, this is another cell, this is another
cell, this is another one, this is another one, this is another one, this is another one, similarly this,
this, this, this, this. So, these are the connected cells which actually enclose this particular
boundary.

And once we have that, you find that I have a set of inner walls. So, in this particular case, the
set of inner walls is this. So, these are the inner walls and these are the outer walls. So, given this
kind of situation, if I now allow this boundary to contract, what this boundary will do is fit itself
within these connected cells, following the restriction given by the inner wall and the outer wall,
and possibly the kind of boundary representation, the polygonal representation, that I will get in
this particular case will be something like this, as given by this blue colour.

So, this blue coloured one will be a polygonal representation of this given boundary which is in
the pink colour. So, this is the kind of polygonal approximation that we get which we term as the
minimum perimeter polygon. Now, we can have other kinds of polygonal approximation as well.
We can have a polygonal approximation by splitting the boundary.

(Refer Slide Time: 43:51)

Here, the concept is something like this; suppose I have a boundary of arbitrary shape. What I
do is, following some criteria, I choose 2 points on this boundary and split the original boundary
into 2 halves by joining these 2 points with a straight line. So, you find that this boundary has
now been divided into 2 different curves; one on the upper side and one on the lower side. Then
what I do is, for each of these boundary segments, as the boundary has been divided into 2
different boundary segments, I find out a point lying on each boundary segment whose
perpendicular distance from the line joining the end points is maximum.

So, in this particular case, you find that on the left side, this is the point whose perpendicular
distance from the line joining these 2 end points is maximum. Similarly in this case, this is the
point on the boundary whose perpendicular distance from the line joining these 2 end points is
maximum. Now, what I do is I put a threshold on this perpendicular distance. So, if I find that
this maximum perpendicular distance is more than the threshold, then I again split the
corresponding boundary segment.

So, in this particular case, suppose this particular distance is more than the threshold; what I do
is I immediately break this upper segment of the boundary into 2 sub segments. For one segment,
this is the starting point and this is the ending point and for the other segment, this is the starting
point and this is the ending point. Again I do the similar operation; I join a line between these 2
end points, I join a line between these 2 end points, and again I find out the maximum distance of
a point on each of the segments from the line joining the end points. Again, I compare these
maximum distances against the threshold.

Now, in this case, it may so happen that this maximum distance is less than the threshold. So, if
this maximum distance is less than the threshold, I do not break them further. Similarly, in this
particular case; I join here, I join here, I find out what is the maximum distance, I find out what is
the maximum distance.

Now, suppose this maximum distance is less than the threshold but this maximum distance is
greater than the threshold; so what I will do is I will again break this particular boundary segment
into 2 parts, one will be represented by this and the other one will be represented by this. Now, at
this point, suppose these perpendicular distances are less than the threshold, so I can stop my
algorithm there.

So, what I get is that given this arbitrary boundary, this arbitrary boundary has now been
represented by a polygon which is this. So, the polygon shape that I get is this one. So, this is just
a replica of the pink coloured polygon that we have drawn within this arbitrary boundary. So, I
can get a polygonal representation of this particular boundary in this manner. This is what is
called a splitting technique or boundary splitting technique because at every stage, we are
splitting the boundary into 2 halves depending upon whether the maximum distance of a point
lying on the boundary from the line joining the 2 end points of the boundary segment is more
than the threshold or less than the threshold.

So, if it is more than the threshold, we again subdivide that particular segment of the boundary
into 2 halves. If it is less than the threshold, then we do not go for subdivision any more. So,
based on this, we have got a polygonal approximation, a polygonal representation of this
arbitrary given boundary. So, there are various such ways for polygonal approximation.
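
A rough sketch of this boundary splitting idea is given below (illustrative code, not the
lecturer's; the function name, point ordering and threshold are assumptions). A segment is kept
if every point lies within `thresh` of the chord joining its end points; otherwise the segment is
split at the farthest point and the procedure recurses. For a closed boundary, one would first
split it into two halves by a chosen chord, as described above, and run this on each half.

import numpy as np

def split_approx(points, thresh):
    """Approximate an ordered open boundary segment by polygon vertices."""
    pts = np.asarray(points, dtype=float)
    p0, p1 = pts[0], pts[-1]
    dx, dy = p1 - p0
    norm = np.hypot(dx, dy)
    if norm == 0:
        dists = np.linalg.norm(pts - p0, axis=1)
    else:
        # perpendicular distance of every point from the chord p0-p1
        dists = np.abs(dx * (pts[:, 1] - p0[1]) - dy * (pts[:, 0] - p0[0])) / norm
    k = int(np.argmax(dists))
    if dists[k] <= thresh or len(pts) < 3:
        return [tuple(p0), tuple(p1)]        # no further splitting needed
    left = split_approx(pts[:k + 1], thresh)
    right = split_approx(pts[k:], thresh)
    return left[:-1] + right                 # merge, dropping the duplicate split point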

(Refer Slide Time: 48:40)

Say for example, one of the polygonal approximation techniques may be something like this;
start from a particular point, then you trace other points lying on the boundary and go on
representing the 2 end points by a straight line, something like this, and for every such position,
you find out the maximum distance of a point lying on the boundary from this straight line
segment. If this maximum distance reaches the threshold, then I consider this as my next starting
point.

So, I start the same operation from here again, something like this. Again, I find that this
maximum distance touches the threshold; so this becomes my starting point, I start from here,
and so on. So, there are various ways in which a polygonal approximation of a given boundary
can be done and a number of such techniques have been reported in the literature. We will not
discuss all of them but the essence is that once I have a polygonal representation of a boundary,
the polygon captures the basic essence of the shape of the boundary, which is what we want in
order to generate our descriptor.

Now, the question is, once I have a polygonal approximation of an arbitrary closed boundary,
what are the kinds of descriptors that we can obtain from this polygonal approximation?

(Refer Slide Time: 50:12)

So, what we are interested in is, say we have a polygonal approximation of a boundary, so I
have some such polygon, say something like this. Now, what is the descriptor that I can generate
from this polygonal representation? So, I have to find out a description of this polygon. Now,
there is a technique where, to find out this descriptor, people have gone for an autoregressive
model. So, in case of the autoregressive model, what is done is, suppose I consider any of the
vertices in this polygon, so I consider this particular vertex; now if I call this the n'th vertex, it
has a subtended angle, say theta n. Now, if I trace this polygon in, say, the clockwise direction,
then in the autoregressive model, what is assumed is that this angle theta n is represented by a
linear combination of k number of previous angles.

So, what I do in this case is that theta n, the angle subtended at the n'th vertex, is represented by
a linear combination of k number of previous angles. So, I represent it as theta n = sum over
i = 1 to k of alpha i times theta (n minus i). So, I have an autoregressive model like this where I
consider k number of previous angles, the linear combination of which represents this n'th angle.

(Refer Slide Time: 52:30)

So, what I have in this case, if I expand this equation, is theta n = alpha 1 theta (n minus 1) +
alpha 2 theta (n minus 2) + ... + alpha k theta (n minus k). So, you find that here I have k number
of unknowns; the k coefficients alpha 1 to alpha k of this linear equation. To solve for these, I
have to generate k number of equations; so you take k number of such vertices, each of which is
represented by such a linear equation.

So, for these k number of vertices, I can generate k number of linear equations and by solving
those k linear equations, I can solve for alpha i where i varies from 1 to k, and this set of alpha i
can be used as a descriptor which describes the given polygonal shape or, because the polygon is
an approximate representation of our arbitrary boundary, it can be used as a descriptor of the
boundary. So, these are the two types of descriptors that we can use.
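
Below is a possible sketch of how the alpha i coefficients could be estimated (illustrative only;
the assumption is that `angles` holds the vertex angles theta_n in tracing order, and a
least-squares solve is used, since with exactly k equations the system can also be solved directly).

import numpy as np

def ar_descriptor(angles, k):
    """Fit theta_n = sum_{i=1..k} alpha_i * theta_{n-i} and return the alphas."""
    angles = np.asarray(angles, dtype=float)
    rows, rhs = [], []
    for t in range(k, len(angles)):
        rows.append([angles[t - i] for i in range(1, k + 1)])  # theta_{n-1} .. theta_{n-k}
        rhs.append(angles[t])                                  # theta_n
    alpha, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return alpha   # k coefficients used as the polygon / boundary descriptor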
We can also use another kind of descriptor which is known as the signature.

(Refer Slide Time: 54:08)

Now, what is this signature? The signature is a 1-dimensional mapping of a boundary. So, as we
have seen, the boundary is nothing but a 2-dimensional closed curve. So, if it is an object, ideally
I should get a closed curve and that closed curve is a 2-dimensional closed curve; the signature is
a 1-dimensional mapping of this closed curve. So, how do we get it? This is represented by a
function. Say for example, let us say we have a circle like this and I find out the centroid of the
circle. I find out the distance of different points on this closed curve, on this boundary, from the
centroid and if I trace this boundary in, say, the anticlockwise direction, then from the starting
point, I have an angular displacement theta.

So, all these distance values, I represent as a function of theta. So, you find that if it is a circle,
in that case the plot of the distance r(theta) versus theta will be a straight line which is parallel to
the horizontal axis. Whereas, if I have a boundary which is, say, a square or a rectangle like this
and this is the centroid, I compute r(theta) and this is my angle theta; so if I plot r(theta) versus
theta in this particular case, where this is theta and this is r(theta), then what I have is a
representation of this form, like this.

So, what I am doing is that this 2-dimensional boundary is converted to a 1-dimensional
function and this 1-dimensional function, that is r(theta) versus theta, is what is known as the
signature of this boundary. Now obviously, in this particular case, as we said that the descriptors
or the representations have to be translation, rotation and scale invariant; in this case, you find
that this particular representation is translation invariant but it is not scale invariant because if
this particular dimension, from the centroid to this, has a value equal to A, then the maximum
value that we will get is also A.

So, to make it scale invariant, what we can do is normalize this particular signature between 0
and 1 by giving an offset equal to the minimum value and dividing by the maximum value. So,
we can have a normalization between 0 and 1 and that will make this signature scale invariant.

Now, the signature is already translation invariant. But if we want to make it rotation invariant,
then what we have to do is define or identify a starting point from where to start scanning this
boundary, or equivalently which direction will correspond to theta equal to 0. So, to obtain this,
what I have to do is get a unique point on the boundary, if it is obtainable. Say for example, in
case of a circle, I cannot find such a unique point. So, whatever be my starting point or whichever
location corresponds to theta equal to 0, I will always get this particular signature.

But in such cases where the boundary is not uniform, where it is not of circular shape, it is
possible to have a unique point on the boundary and what we can do is choose the 2 furthest
points, that is the pair of points on the boundary whose distance is maximum, and out of those,
whichever is at a larger distance from the centroid, that I can use as a starting point.

Say for example, if I have a shape something like this; you find that these are the 2 points which
are furthest apart and the centroid is somewhere here. The point on this boundary which is
furthest from the centroid is where I can start my scanning. So, this may be my direction of theta
equal to 0. Similarly, I can choose such a point by using the principal eigen axis. So, I can choose
2 points on the boundary which lie on the principal eigen axis and out of these 2, whichever is
furthest from the centroid, I can choose as my starting point or the reference theta equal to 0, and
using this I can have a 1-dimensional mapping of a boundary which gives us a signature.
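
A small illustrative sketch of the centroid-distance signature is given below (assumed code;
`boundary` is taken to be an ordered (N, 2) array of boundary points, and the min/max
normalization is one possible way of getting the scale invariance mentioned above).

import numpy as np

def signature(boundary):
    """Return (theta, r): normalized distance from the centroid as a function
    of the angular position, traced in order of increasing theta."""
    pts = np.asarray(boundary, dtype=float)
    centroid = pts.mean(axis=0)
    diff = pts - centroid
    r = np.linalg.norm(diff, axis=1)             # distance of each point from the centroid
    theta = np.arctan2(diff[:, 1], diff[:, 0])   # angular displacement of each point
    order = np.argsort(theta)                    # anticlockwise trace
    r = r[order]
    r = (r - r.min()) / (r.max() - r.min() + 1e-12)   # normalize to [0, 1] for scale invariance
    return theta[order], r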

So with this, we stop our lecture today. We will continue this topic in our next lecture.

Thank you.

Digital Image Processing
Prof. P.K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 38
Object Representation and Description-II
Hello, welcome to the video lecture series on digital image processing. In our last lecture, we
started our discussion on a particular aspect of image processing techniques, that is we are
moving towards understanding the objects present in an image or we are going for image
interpretation. Now, to start with, we have said that if we are going for image interpretation or
image understanding, basically recognition of the objects present in a scene, then the first
operation that we have to do is to look for a proper representation mechanism for the object
regions and once we have a representation mechanism, then from that representation, we have to
find out a description of the object.

Now, once we get a description of the object, this description can be matched against similar
such descriptions which are stored in the knowledge base in the computer and whichever such
model or description our current description matches, we can say that the object present in the
scene is that kind of object, say it is an object x or object y and so on.

(Refer Slide Time: 2:22)

So initially, in our last lecture, we discussed some representation techniques which are
boundary-based representations and accordingly we found out some descriptions from these
boundary representations. So, the first kind of boundary representation method that we talked
about is the chain code representation and we have also seen that the chain code representation
by itself is not scale independent or rotation independent though it is translation independent.

So, to take care of its dependency on scale as well as its dependency on rotation, what we have
done is we have resampled the boundary points by placing grids of different sizes. So, the
different scales of a region can be taken care of by placing grids of proper spacing. Whereas, to
obtain the rotation invariance, what we have said is that instead of taking the chain code itself,
we can go for the differential chain code and we have said that we can obtain the differential
chain code from the original chain code by considering, given two subsequent codes in the chain
code, how many rotations one has to give either in the clockwise direction or in the
anticlockwise direction to move to the second code from the first code.

And, if I represent the number of rotations, the number of steps of rotation that we have to give,
in the form of a chain, that becomes the differential chain code. And finally, to make the
descriptor independent of the choice of starting point, what we have said is that instead of
considering it as a chain, we can consider it as a cycle and in that cycle, what we have to do is
redefine our starting point for chain code generation so that when I open up that particular cycle
from that starting point, what I get is a numerical value; and we have said that if I follow the
convention that I select the starting point such that the numerical value generated is minimum,
then what I get is a chain code or differential chain code based description of the object shape or
of the object boundary which is a descriptor that can be translation, rotation and scale invariant,
and this particular minimum number that we get is what is called the shape number for that
particular object shape.

Then we have discussed another boundary-based representation technique which is the
polygonal approximation of the boundary. So, we have discussed various polygonal
approximation techniques; how a given boundary can be represented by a polygon. Then we have
talked about a particular description mechanism: if we model the corner angles at the different
vertices of the polygon in the form of an autoregressive model of, say, order k, in that case we
can form a set of linear equations. After solving that set of linear equations, we can solve for the
coefficients of that autoregressive model and this set of coefficients can also be used as a
descriptor which represents that particular polygonal shape and these coefficients can also be
used for matching purposes.
(Refer Slide Time: 6:32)

In today's lecture, we will discuss some more boundary-based descriptors. One of the
boundary-based descriptors we will talk about is what is called the Fourier descriptor; we will
also talk about another descriptor which is called boundary straightness and the other kind of
boundary descriptor that we will talk about is called the bending energy. Then, as we have said,
all these boundary-based descriptors essentially capture the shape information of the object
region.

Now, to capture the shape information of the object region, similar information can also be
obtained not only from the boundary but from the region itself; from the entire region, we can
also capture the shape information of that particular region. So, we will talk about some of the
region-based shape descriptors like eccentricity, elongatedness, rectangularity, compactness,
moments and some other descriptors which are the shape descriptors that can be extracted from
the entire region, not only from the boundary. So, let us first talk about the boundary-based
descriptor which we said is the Fourier descriptor. So, let us see what the Fourier descriptor is.
(Refer Slide Time: 7:58)

So, suppose you have an object boundary, say something like this; now obviously, I will have
the image axes, say the image axes are given by x and y, and this boundary is nothing but a set
of discrete points in this two-dimensional xy space. So, what is this boundary? This boundary is
actually a set of points and let us represent a particular point by S(k). So, this boundary is
nothing but a sequence of such points S(k) in our discrete two-dimensional space xy.

So, this S(k) in two dimensions will have a coordinate which is given by x(k) and y(k). Now, if I
assume this x axis to be the real axis and the y axis to be the imaginary axis, that is I am
interpreting the x axis as the real axis and the y axis as the imaginary axis, then every boundary
point S(k) in this sequence of boundary points can be represented by a complex number which
is given by S(k) = x(k) + j y(k), where j is equal to the square root of minus 1.

So, based on this interpretation that we are interpreting the x axis as the real axis and the y axis
as the imaginary axis, every discrete point on the boundary can be represented by a complex
number. So, if I take the sequence of such boundary points when we trace the boundary either in
the clockwise direction or in the anticlockwise direction, what we get is a sequence of complex
numbers S(k) and suppose we have a total of capital N number of points on this boundary; in
this particular case, our k will vary from 0 to N minus 1. So, I have N complex numbers taken in
a particular sequence which represents the boundary.
(Refer Slide Time: 11:09)

Now, what we want to do is take the discrete Fourier transform, the DFT, of this sequence of
complex numbers S(k) where k varies from 0 to N minus 1. So, as we have noted earlier when
we talked about the discrete Fourier transform, if I take the discrete Fourier transform of these
N sample points, where in this particular case each sample point is a complex number, what I
get is N coefficients which are the Fourier coefficients.

So, I can write that expression in this form: a(u) = (1/N) * sum over k = 0 to N-1 of
S(k) * e^(-j 2 pi u k / N). So, by doing this, you find that I get N complex coefficients a(u); each
of the coefficients a(u) is complex. Now, this set of complex coefficients can be used as a
descriptor which describes this particular given shape or this particular given boundary.

Now obviously, if I take the inverse discrete Fourier transform, the IDFT, of the set of complex
coefficients a(u), then I should get back S(k), the original set of boundary points. So, this inverse
discrete Fourier transform is given by S(k) = sum over u = 0 to N-1 of a(u) * e^(j 2 pi u k / N).
So, this set of a(u), these N complex coefficients, is what is known as the Fourier descriptor of
the boundary points.

So, given a set of boundary points, we consider each of the boundary points as a complex
number; by tracing the boundary in a particular order, either in the clockwise direction or in the
anticlockwise direction, I get a sequence of such complex numbers; I take the discrete Fourier
transform of that sequence and the set of DFT coefficients becomes the Fourier descriptor which
describes the given boundary. And naturally, if I take the inverse discrete Fourier transform of
these Fourier descriptors, then I get back our original boundary points.

(Refer Slide Time: 14:26)


Now, in most of the cases, for representing a shape, we normally do not consider all the N
Fourier descriptors, that is all the coefficients in this Fourier descriptor. Rather, we consider, say,
the first M descriptors for describing the shape where obviously M is less than N.

Now, using these M Fourier descriptors, if I try to reconstruct our original boundary by taking
the inverse discrete Fourier transform, then because we are cutting out some of the Fourier
descriptors, naturally our reconstruction will not be exact but rather what we will get is an
approximate reconstruction of our boundary points. So, those approximate boundary points are
given by S_hat(k) = sum over u = 0 to M-1 of a(u) * e^(j 2 pi u k / N), where k will still vary
from 0 to N minus 1.

So, you find that though we are taking a smaller number of DFT coefficients or a smaller
number of Fourier descriptors for reconstruction of our original boundary points, the number of
boundary points which are being reconstructed will remain the same. But what we will lose is
some of the details of the boundary. Say for example, if I have a set of boundary points which
corresponds to a square; if I consider a very small number of coefficients, may be say one or
two discrete Fourier transform coefficients, in that case our M will become one or two, and after
taking the inverse discrete Fourier transform, I will get N boundary points which is the same as
our original number of boundary points. But now the points will lie on a circular shape like this.

So, I will get back the original shape if I consider all the N Fourier coefficients, the DFT
coefficients, for reconstructing the boundary by the inverse discrete Fourier transform. If I
consider a very small number of discrete Fourier coefficients, say 2, then I will get a circular
shape. If I increase it further, may be to say 10 or so, I will get a shape something like this. So,
in such a case, you find that the details of the corners will be lost but still I can maintain the
basic essence, the primary essence of the shape of our original boundary which was a square,
and it becomes almost a square.

So, by considering a small number of DFT coefficients as Fourier descriptors to represent the
boundary, I can have an approximate representation of the boundary and this gives an
approximate description of our boundary.
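
The whole procedure maps directly onto the FFT; the sketch below is an assumed illustration
(not from the lecture), where the boundary points become complex numbers, np.fft.fft gives the
descriptors a(u), and zeroing all but the first M descriptors before the inverse transform gives
the smoothed reconstruction discussed above.

import numpy as np

def fourier_descriptors(boundary):
    """a(u) = (1/N) * sum_k S(k) exp(-j 2 pi u k / N), with S(k) = x(k) + j y(k)."""
    pts = np.asarray(boundary, dtype=float)
    s = pts[:, 0] + 1j * pts[:, 1]
    return np.fft.fft(s) / len(s)

def reconstruct(a, M):
    """Approximate boundary using only the first M descriptors (the rest set to zero)."""
    a_trunc = np.zeros_like(a)
    a_trunc[:M] = a[:M]
    s_hat = np.fft.ifft(a_trunc) * len(a)    # S_hat(k) = sum_{u<M} a(u) exp(j 2 pi u k / N)
    return np.column_stack([s_hat.real, s_hat.imag])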

So, this Fourier descriptor can be a very important descriptor which can be used for object
recognition purposes. And obviously, if I have two shapes which are widely different, then the
Fourier descriptors for these two widely different shapes are going to be widely different. So, I
can easily discriminate between the descriptors of two or more shapes which are widely
different. So, now let us see what other boundary-based descriptors we can have.

(Refer Slide Time: 18:59)

So, other boundary descriptors; another form of boundary descriptor is what is called boundary
straightness. This boundary straightness is defined based on the concept of boundary curvature.
As we know, in the continuous case, the curvature at a point is given by the rate of change of
slope. So, if you take the derivative of the slope at a particular point, what you get is the
curvature at that particular point. In the discrete case, this boundary straightness is defined as the
ratio of the number of pixels where the boundary direction, the direction of the boundary,
changes abruptly to the total number of boundary points.

So, you find that in our case, we are not going for an exact measurement or exact calculation of
the curvature at a particular point on the boundary. But what we are doing is trying to identify
those points on the boundary where the direction of the boundary changes abruptly, because
those are the points where the curvature is likely to be quite high, and I count the number of
points on the boundary where the direction of the boundary changes abruptly and take the ratio
of this number of points to the total number of boundary points, and this ratio is what is called
the boundary straightness.

So, you find that the smaller this number is, that is the boundary straightness, the more linear
the boundary is going to be, it is more and more straight; whereas if this value is higher, if this
ratio is higher, that means there are more points on the boundary where the curvature is very
high. So, we cannot say that the boundary is straight in such cases. So, this gives a measure of
the straightness of the boundary.

Now, the question is, how to find out which are the pixels where the boundary changes
abruptly? So, suppose these are the set of points which are on a boundary and I take this i'th
pixel where I want to find out whether the boundary changes abruptly at this i'th pixel or not.

So, what I do is I take a distance b and consider two more points; I consider one point, i plus b,
in one direction and I consider another point, i minus b, in the opposite direction. Then what I
do is I draw a straight line passing through i and i plus b and I draw a straight line passing
through i minus b and i, and I try to find out the angle between these two straight lines, say
angle beta.

So, we find that if the boundary changes abruptly at this pixel location i, then the value of this
angle beta will be quite high, whereas if the boundary does not change abruptly at this location
i, then the value of this angle beta will be very low. So, depending upon the value of this angle
beta between these two line segments, one passing through the (i minus b)'th point and the i'th
point and the other one passing through the i'th point and the (i plus b)'th point, I can estimate
whether there is a sharp change of boundary direction at this i'th pixel or not.
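
An illustrative sketch of this measure is given below (assumed code: `boundary` is an ordered
array of boundary pixels, and the values of b and the angle threshold are arbitrary choices).

import numpy as np

def boundary_straightness(boundary, b=3, angle_thresh=np.pi / 4):
    """Ratio of pixels where the boundary direction changes abruptly to all pixels."""
    pts = np.asarray(boundary, dtype=float)
    n = len(pts)
    abrupt = 0
    for i in range(n):
        v1 = pts[i] - pts[(i - b) % n]           # segment from (i - b) to i
        v2 = pts[(i + b) % n] - pts[i]           # segment from i to (i + b)
        denom = np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12
        beta = np.arccos(np.clip(np.dot(v1, v2) / denom, -1.0, 1.0))
        if beta > angle_thresh:                  # abrupt change of direction at pixel i
            abrupt += 1
    return abrupt / n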

So, this way I find out the number of pixels where the boundary changes abruptly and the ratio
of the number of such points to the total number of points gives me a measure of what is called
the boundary straightness. The other form of boundary-based descriptor can be what is called
the bending energy.
(Refer Slide Time: 24:17)

So, what is this concept of bending energy? Suppose we have a steel rod and we want to bend
that steel rod to a given shape. The amount of energy that we have to spend to bend the steel rod
to that given shape is what is called the bending energy. So, considering this, you find that if I
want to bend the steel rod such that the curvature will be more, I have to spend more energy,
whereas if the shape has a curvature which is less, then I have to spend less energy.

So, this bending energy can be computed from the concept of curvature at different points on
the boundary itself. So, at the k'th point, if I say that c(k) is the curvature at point k, then the
bending energy corresponding to that point is the square of this. So, the total bending energy of
the boundary can be computed as: bending energy = sum of c(k) squared, where c(k) is the
curvature at point k, and I have to take this sum over all the points lying on the boundary.

So, this will be for k equal to 1 to capital L, if capital L is the total number of boundary points,
and I normalize this by dividing the measure by L. So, the bending energy becomes
(1/L) * sum over k = 1 to L of c(k) squared. This is my normalized bending energy and this
normalized bending energy, or normally just called bending energy, is also a feature, a descriptor
of the shape of the region based on the boundary. In a similar manner, we can find out many
other descriptors which represent the shape based on the boundary information itself.
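
A compact sketch (assumed, not from the lecture) of the normalized bending energy
BE = (1/L) * sum c(k)^2 is shown below, where the curvature c(k) is estimated from the change
of the tangent direction along a closed boundary given as an ordered (L, 2) array of points.

import numpy as np

def bending_energy(boundary):
    pts = np.asarray(boundary, dtype=float)
    steps = np.diff(np.vstack([pts, pts[:1]]), axis=0)          # closed-contour steps
    ang = np.arctan2(steps[:, 1], steps[:, 0])                  # tangent direction of each step
    turn = np.diff(np.unwrap(np.concatenate([ang, ang[:1]])))   # c(k): change of direction
    return np.mean(turn ** 2)                                   # (1/L) * sum c(k)^2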

So, as we said earlier, the shape information can be described from the boundary of the region;
similar shape information can also be obtained from the region itself or the region area.

(Refer Slide Time: 27:08)


So, let us see what such descriptors are that we can obtain from the shape information of the
region itself. So now, we will talk about region-based shape descriptors. One of the obvious
descriptors is the region area. Now, what is the region area? The region area is nothing but the
total number of pixels belonging to the region. So, it is the total number of pixels belonging to
that particular region.

So obviously, we find that this region area can be a descriptor which is translation invariant or
rotation invariant because if I translate the object or if I rotate the object, the total area or the
total number of pixels within that object will remain the same; whereas the region area, this
descriptor, is obviously not scale independent because if I zoom in on the object, the total
number of pixels belonging to the region will be more, whereas if I zoom out, the total number
of pixels belonging to that particular region will be less.

So, this region area, though it is a feature, cannot by itself be used for object description or
object recognition. The other form of descriptor is called the Euler number; the Euler number is
a simple topological invariant. Now, what is the Euler number? Suppose, I have a region
something like this and within this region I have a number of holes. So, you find that in this
particular case, I have one connected component of the region and within this connected
component, I have a number of holes. So, in this particular case, I have one connected
component and there are 3 holes.

So, the Euler number is defined like this: the Euler number is equal to s minus n, where s is the
total number of connected components of the region and n is the total number of holes in the
region. So, in this particular case, because it has got only one connected region and 3 holes, the
Euler number will be equal to minus 2; whereas if I have an object of this form, say one
connected region and one hole within this connected region, in this particular case, the Euler
number will be equal to 0.

So, as I said, this Euler number is a topologically invariant descriptor. Why is it invariant?
Because if I zoom in on this particular region or zoom out of this particular region, if I translate
the object or if I rotate the object, the Euler number will remain the same. Obviously, the
restriction is that if I zoom it, it should not be stretched so much that the single region breaks
into two different regions, and the viewing angle should not be such that one or more of the
holes is not visible. So, within that constraint, you find that this Euler number gives a descriptor,
a topological descriptor, which is invariant to rotation, translation and scaling.
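
A small sketch of this computation is given below (illustrative code; it assumes a binary region
image and uses scipy's connected-component labelling: s is the number of foreground
components and n the number of background components other than the outer background,
that is, the holes).

import numpy as np
from scipy import ndimage

def euler_number(binary):
    binary = np.pad(np.asarray(binary, dtype=bool), 1)   # ensure an outer background exists
    _, s = ndimage.label(binary)                         # s: connected components of the region
    _, bg = ndimage.label(~binary)                       # components of the background
    n = bg - 1                                           # n: holes (drop the outer background)
    return s - n                                         # Euler number = s - n

region = np.zeros((8, 8), dtype=bool)
region[1:7, 1:7] = True
region[3, 3] = False                                     # one connected region with one hole
print(euler_number(region))                              # -> 0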

The other kind of descriptor which can be used, which is again a region-based descriptor, is the
horizontal and vertical projections.

(Refer Slide Time: 31:29)

Now, what are these horizontal and vertical projections? Suppose, I have some object region,
say something like this, and this is my x axis, the horizontal axis, and this is the vertical axis y.
What I do is I take the projection of this region on the horizontal axis, this becomes the
horizontal projection, and if you take the projection on the horizontal axis, the projection will
become something like this; and I also take the projection of this area on the vertical axis. So, in
this particular case, the projection on the vertical axis will become something like this.

So, this horizontal projection, which I call p_h(i), can be computed by the simple expression
p_h(i) = sum over j of f(i, j); that is, the summation has to be done over the variable j, which
means I have to take the summation along a particular column.

Similarly, this other projection I call the vertical projection, p_v(j), which can be computed in a
similar manner. It will be p_v(j) = sum over i of f(i, j), where now the summation has to be
taken over i. So, you find that depending upon the shape of this particular region, these two
projections, that is the vertical projection and the horizontal projection, are going to be different
and normally these are used as descriptors in case of binary images. This is not normally used in
case of gray level images.
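
For a binary image stored as a numpy array, the two projections are just row and column sums,
as in this small illustration (the array and its contents are made up for the example).

import numpy as np

f = np.zeros((5, 6), dtype=int)
f[1:4, 2:5] = 1                 # a small rectangular binary region

p_h = f.sum(axis=1)             # p_h(i): sum of f(i, j) over j
p_v = f.sum(axis=0)             # p_v(j): sum of f(i, j) over i
print(p_h)                      # [0 3 3 3 0]
print(p_v)                      # [0 0 3 3 3 0]
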
(Refer Slide Time: 34:01)
The other kind of region-based descriptor that can be used is what is called eccentricity. Now,
what is eccentricity? Suppose, I have a region, say something like this; what I do is I find out a
chord of this particular region of maximum length. So, suppose I have this chord which is of
maximum length and I call this chord A, and then I find out another chord of maximum length
of the same region which is perpendicular to the chord of maximum length. So, I call this chord
B and my restriction is that these two chords have to be perpendicular to each other.

So, given such a situation, the ratio of the length of the maximum chord A to the length of chord
B, that is the ratio of these two chords, one being the chord of maximum length and the second
being again a chord of maximum length but perpendicular to the first chord, is what is called the
eccentricity of this particular shape.

So, this eccentricity can also be used as one of the descriptors, one of the shape descriptors, and
as you see, it is also a region-based descriptor. Of course, a similar descriptor can also be
obtained from the boundary of the region.

(Refer Slide Time: 36:06)


Another kind of shape-based descriptor that can be defined is called elongatedness. Now, to
define elongatedness, we can start with something called the minimum bounding rectangle of
the region. Now, what is this? Suppose, I have a region shape of this form, say something like
this, and what I do is I find out a bounding rectangle of this region which is of minimum size;
say such a bounding rectangle can be of this form, say I have a bounding rectangle like this.

Now, in this particular case, the shape I have drawn may not look like a proper rectangle
because the angles have to be 90 degrees. So, I have a bounding rectangle like this and this
bounding rectangle has to be of minimum size. This is what is called the minimum bounding
rectangle. Now, once I have such a minimum bounding rectangle, suppose a is the length of the
larger side of this bounding rectangle and b is the length of the smaller side of this bounding
rectangle; the elongatedness in such a case is defined as the ratio of a to b, that is the length of
the larger side of the minimum bounding rectangle to the length of the smaller side of the
minimum bounding rectangle. This is what is called the elongatedness of this particular shape.

Now, you find that such a simple definition of elongatedness is not valid if my region is a
curved region. For example, if I have a region something like this; for this kind of region, the
elongatedness measure cannot simply be defined by this ratio of a and b. So, in such cases, to
define the elongatedness of this kind of region, what I have to take into consideration is the
thickness of this elongated region along with the length of the elongated region.

So, one way of finding out the elongatedness for such a curved region is like this; it is defined
for such curved regions as area divided by the square of the maximum thickness. So, I find out
the area of this curved region, I find out the maximum thickness, and area divided by maximum
thickness squared gives me a measure of the elongatedness for such a curved region. One way to
find this out is to compute area divided by (2d) squared, where d is the number of erosion steps
that is needed before the region erodes to a null set. So, d is the maximum number of erosion
steps. So, I can define elongatedness for a curved region in this way and this elongatedness is
one of the descriptors which captures the region shape.
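
The erosion-based variant can be sketched as below (assumed code using scipy's binary erosion;
the structuring element, and hence the exact value of d, depends on implementation choices).

import numpy as np
from scipy import ndimage

def elongatedness_curved(binary):
    """Elongatedness = area / (2d)^2, d = erosion steps until the region vanishes."""
    region = np.asarray(binary, dtype=bool)
    area = int(region.sum())
    d = 0
    current = region
    while current.any():
        current = ndimage.binary_erosion(current)
        d += 1
    return area / float((2 * d) ** 2)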

(Refer Slide Time: 40:40)

Another region-based descriptor can be, say, rectangularity. Rectangularity is based on the ratio
of the region area to the area of a bounding rectangle having a particular direction, say k, and we
take the direction for which this ratio is maximum. So, if I say that F_k is the ratio of the region
area to the area of a bounding rectangle having direction k, then the rectangularity is defined
simply as the maximum over k of F_k.

So, as we have said, this F_k is the ratio of the region area to the area of a bounding rectangle
having direction k and the maximum of these ratios is what is called the rectangularity of that
particular region shape; and here, obviously, when I talk about a bounding rectangle having a
direction k, we have to define what the direction of a bounding rectangle is. The direction of a
bounding rectangle is defined as the direction of the larger side of the bounding rectangle.

So, what we do is, given a particular region, I find out bounding rectangles of various
orientations and for each of these orientations, for each of the directions, I find out the ratio of
the region area to the area of the bounding rectangle, and whichever direction gives me the
maximum ratio, that is what is defined as the rectangularity of that particular region.

(Refer Slide Time: 43:57)


The other kind of descriptor that can be used is what is called compactness and this compactness
is defined as the ratio of area to perimeter squared. So, what does it give? We know that a circle
is maximally compact. For a circle, we know that the area is given by pi r squared and the
perimeter is given by 2 pi r, and if I take the square of the perimeter and then take the ratio, this
simply becomes 1 upon 4 pi. So, the maximum value of the compactness is 1 upon 4 pi and the
minimum value of the compactness can be 0.

Suppose, I have a very long region whose thickness is very, very small; in the limiting case,
what we are going to have is an infinite perimeter but a vanishing area. So, if I take the ratio of
area to perimeter squared in such a case, it gives me a value equal to 0. So, these are the two
limiting values of the compactness; one is 1 upon 4 pi, which is the maximum and which we get
for a circular region, and as the region shape deviates from circularity, the compactness becomes
less and less, and in the limiting case the minimum value of the compactness will be 0.
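
A tiny numerical check of these limits (illustrative only): for a circle of radius r, the compactness
evaluates exactly to 1/(4*pi), independent of r.

import numpy as np

r = 10.0
area, perimeter = np.pi * r**2, 2 * np.pi * r
print(area / perimeter**2)      # 0.0795... = 1 / (4*pi), the maximum compactness
print(1 / (4 * np.pi))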

So, these are some simple shape descriptors which can be obtained from the region itself, not
only from the boundary. Now, there are some more shape descriptors which are also obtained
from the region and those are called moments.

(Refer Slide Time: 46:01)


Now, to compute the moments of a region, the interpretation is that the normalized gray level
image represents the probability density of a two-dimensional variable. Now, with this
interpretation that a normalized gray level image tells us the distribution of a two-dimensional
variable, we can find out some statistical properties of the gray levels themselves and these
statistical properties are nothing but moments.

So, a moment of order p plus q is defined like this: we have m_pq which is given by
m_pq = double integral of x^p y^q f(x, y) dx dy, where f(x, y) is the normalized gray level
image and the integral is taken in the limits minus infinity to infinity over y and minus infinity
to infinity over x. So, this is the definition of a moment of order p plus q for a normalized gray
level image f(x, y).

In the discrete case, this expression becomes m_pq = double summation over i and j of
i^p j^q f(i, j), where j varies from minus infinity to infinity and i also varies from minus infinity
to infinity. Obviously, for a finite image, the i and j limits will change accordingly; in this case,
(i, j) is the location of a pixel in the image and f(i, j) is the normalized gray level of that
particular pixel.

Now, you find that the moment as it is defined in this particular case is not translation invariant.
So, to make the moment translation invariant, what we take is what is called the central moment.

(Refer Slide Time: 48:52)


The central moment of order p plus q is defined like this: the central moment mu_pq is defined
as mu_pq = double integral of (x minus x_c)^p (y minus y_c)^q f(x, y) dx dy, and again you take
the double integral in the limits minus infinity to infinity. So, this is what is called the central
moment and again, in the discrete case, this central moment will be expressed as
mu_pq = double summation over i and j of (i minus x_c)^p (j minus y_c)^q f(i, j), where x_c is
nothing but m_10 divided by m_00 and y_c is nothing but m_01 divided by m_00.

So, you find that these two coordinates x_c and y_c specify nothing but the centroid of the
region. That is why it is called the central moment and once we have the central moment, the
central moment becomes translation invariant. But you find that these moments are not rotation
and scale invariant. So, people have tried to find out rotation and scale invariants from these
moments.
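
A short sketch of the raw and central moments for a normalized gray level image is given below
(illustrative code; the choice of i as the first array index and j as the second is an assumption
about the pixel coordinate convention).

import numpy as np

def raw_moment(f, p, q):
    """m_pq = sum_i sum_j i^p * j^q * f(i, j)."""
    i, j = np.indices(f.shape)
    return np.sum((i ** p) * (j ** q) * f)

def central_moment(f, p, q):
    """mu_pq = sum_i sum_j (i - x_c)^p * (j - y_c)^q * f(i, j)."""
    m00 = raw_moment(f, 0, 0)
    xc = raw_moment(f, 1, 0) / m00          # x_c = m_10 / m_00
    yc = raw_moment(f, 0, 1) / m00          # y_c = m_01 / m_00
    i, j = np.indices(f.shape)
    return np.sum(((i - xc) ** p) * ((j - yc) ** q) * f)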

(Refer Slide Time: 51:11)


So, there are 4 different moment invariants which have been suggested and those 4 moment
invariants are defined like this:
I_1 = (mu_20 mu_02 - mu_11^2) / mu_00^4,
I_2 = (mu_30^2 mu_03^2 - 6 mu_30 mu_21 mu_12 mu_03 + 4 mu_30 mu_12^3
+ 4 mu_21^3 mu_03 - 3 mu_21^2 mu_12^2) / mu_00^10.

(Refer Slide Time: 52:51)

So, the third invariant is given by
I_3 = (mu_20 (mu_21 mu_03 - mu_12^2) - mu_11 (mu_30 mu_03 - mu_21 mu_12)
+ mu_02 (mu_30 mu_12 - mu_21^2)) / mu_00^7,
and similarly there is another moment invariant, I_4, which has a much longer expression.
So, we are not going into the details of how these moment invariants are derived but the details,
if any of you is interested, can be obtained from this particular reference. This reference gives
the details of how these four different moment invariants are generated from the moments, and
these are rotation, translation and scale invariant.

So, today we have discussed some of the boundary descriptors, some of the descriptors using
the boundary, as well as some of the shape descriptors using the region itself, and these
descriptors can be used for high level recognition purposes when we try to match these
descriptors with similar such descriptors which are stored in the knowledge base.

So, we stop our discussion here today. In our next class, we will discuss some more region-based
descriptors. What we have discussed today are region-based descriptors, but these region-based
descriptors do not take into account the reflectance properties such as the colour or texture of
the surface. We will talk about other region-based descriptors, for example texture descriptors,
in our next lecture.

So, let us see some of the quiz questions based on today's lecture as well as the lecture that we
have given in our earlier class.

(Refer Slide Time: 56:32)

So, the first question is, what is a chain code? Second question, how can you make a chain
code-based descriptor rotation invariant? Third question, what is the principal eigen axis? Fourth
question, what is a Fourier descriptor? Fifth question, explain how Fourier descriptors help in
object recognition. Sixth question, how can you compute elongatedness for curved objects?
Seventh question, what is compactness? What are its maximum and minimum values? And, the
last question, what does the zero'th order moment represent in binary images?
Thank you.
Digital Image Processing
Prof. P.K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 39
Object Representation and Description -III
Hello, welcome to the video lecture series on digital image processing. For the last 2 lectures,
we have been discussing the last phase of the image understanding process, that is the
representation and description of the regions present in an image.

(Refer Slide Time: 1:19)

Till our last class, for the last 2 lectures, what we have done is we have found out some
descriptors which we have said are shape descriptors and we have said that these shape
descriptors can be obtained either from the boundary of the region or from the region itself. So,
in our last class, we have seen some of the boundary-based descriptors like the Fourier
descriptor, then boundary straightness and bending energy. Similarly, some other shape
descriptors which we have obtained from the region itself; some of them are eccentricity,
elongatedness, rectangularity, compactness, moments etcetera.

In today's lecture, what we are going to discuss is some other descriptors from the region itself.
The descriptors that we will discuss take care of the nature of the surface. You find that all the
other descriptors that we have discussed earlier are shape descriptors which are obtained either
from the boundary of the region or from the interior of the region, but they do not give you any
idea about the nature of the surface; they simply tell you what the shape of the object present in
the scene is. In today's lecture, we are going to derive some descriptors which will tell us about
the nature of the object surface.

(Refer Slide Time: 2:56)

So typically, we will be talking about region descriptors and we will discuss the region
descriptors which are the texture descriptors. Now, as we have said earlier, such region
descriptors which give you an idea about the nature of the surface may be derived from the
colour of the surface or may also be derived from the texture of the surface. The colour
information that can be obtained, we have discussed earlier when we discussed colour image
processing.

Today, what we will discuss is mostly the texture descriptors, that is descriptors which capture
the texture information of the surface. Now, these texture descriptors can be obtained in various
ways. One of the techniques that we will discuss is the histogram moment based texture
descriptor. We will also talk about the co-occurrence matrix based texture descriptor and we
will also discuss the spectral method or the spectral descriptors which are useful for describing
textures.

Now, what is a texture? Texture normally refers to properties of the nature of the object surface
or the structure of the object surface. So, what I mean by texture is better explained with some
examples. So, let us see some example textures.

(Refer Slide Time: 4:34)

So, here you find that we have given 6 different images and each of these images has some
texture. So, as you find, the texture can be of various forms; for example, the pattern on a cloth,
the way in which the threads are weaved, that also forms a texture; if you take the image of a
plane surface of say wood, that also forms a texture; the tree lines, they also form a texture.

So, there are various varieties of texture which are available in nature and various varieties of
texture which can also be generated synthetically. So, the notion of texture, the concept of
texture, is quite obvious intuitively but because of this wide variation of textures, no precise
definition of texture exists. So, what we can say is that texture is something which consists of
mutually related elements.

So, when we say that it is something that consists of mutually related elements, basically what
we are talking about is a group of elements or a group of pixels. So, in some cases, say for
example in this particular case, you find that we can say that these textures contain some
primitive elements and by the repetitive appearance of the primitive elements, either completely
periodic or semi periodic, what I get is a texture image.

These primitive elements, the groups of pixels which form a texture, are called texture elements
or texels. So, as we said, no precise definition of texture exists but the texture provides some
very, very important information about the roughness or the smoothness of a surface. At the
same time, it also gives us some information about the regularity of the surface. So, what are the
ways in which we can obtain the descriptors which describe a particular texture? The texture
descriptors can be obtained in various ways.

(Refer Slide Time: 7:09)

We can have texture descriptions which are broadly categorized into 2 categories. So, one of the
ways in which the texture description can be obtained is statistical; we can have statistical means
to obtain the descriptors of the textures. The other way of obtaining the texture descriptors is
from the spectral domain; that is, if you take the Fourier spectrum of the texture image, then
from the Fourier spectrum we can obtain some descriptors which describe the nature of the
texture. Let us first consider what the statistical descriptors are that we can obtain from a texture
image.
(Refer Slide Time: 08:13)

So, the simplest forms of statistical descriptors are obtained from the gray level histogram of a
texture image and what we do is find out the moments of the gray level histogram. So, let us
assume that in a texture image, z_i is a variable which represents the intensity of different pixels
present in the texture image and suppose p(z_i) represents the intensity histogram or gray level
histogram. So, from this gray level histogram, we can find out moments of different orders and
the texture descriptors which are derived from these moments of the histogram.

(Refer Slide Time: 9:26)

So, given a histogram p(z_i), we can have an n'th order moment. The n'th order moment of the
histogram is defined like this: mu_n(z) = sum over i = 0 to L-1 of (z_i - m)^n p(z_i). So here,
what we are assuming is that the texture image has got capital L number of intensity levels
varying from z_0 to z_(L-1).

So, there are capital L discrete intensity levels and if p(z_i) is the histogram of such a texture
image, then we can obtain an n'th order moment of this texture image from the histogram as
given by this definition. Now, in this particular case, m is nothing but the mean. So, you define
m = sum over i = 0 to L-1 of z_i p(z_i). So, this is the mean intensity value within the texture
image.

Now, we can infer a number of interesting properties from this n'th order histogram moment.
You find that if I take the 0'th order moment, that is mu_0 where the value of n is equal to 0, in
that case (z_i - m)^n becomes equal to 1. So, what we are left with is the summation of p(z_i)
where i varies from 0 to capital L minus 1. So, if I add all these different probability terms, the
net summation becomes equal to 1. So, you find that in this particular case, the 0'th order
moment of this histogram is simply equal to 1.

Similarly, if I take the first order moment, that is mu_1, you will find that mu_1 will be equal to
0; but very important is the second order moment mu_2. So here, following this same
expression, mu_2 is nothing but the sum over i = 0 to L-1 of (z_i - m)^2 p(z_i).

So, from this particular expression, you will find that mu_2 is nothing but the variance, which
we normally write as sigma squared of z. So, the second order moment is basically the variance
of the intensity values present in the image and this variance is a very, very important piece of
information because it tells us the variability or the range of the intensity values which are
present in a given texture image, and from this second order moment, or the variance, we can
derive a very important texture descriptor. So, we can define an important texture descriptor
like this.

(Refer Slide Time: 13:12)

So, R = 1 - 1 / (1 + sigma^2(z)), and as we said, sigma^2(z) is nothing but mu_2, that is the
second order moment. So, what is the importance of this particular descriptor R? You will find
that if I have a completely plane surface, a completely smooth surface of uniform intensity, in
that case the value of sigma^2(z), the variance, will be equal to 0 for a uniform intensity region,
and in such a case, because the variance sigma^2(z) is equal to 0, you will find that this complete
expression R will become equal to 0.

So, if I have an image of uniform intensity, the same intensity value everywhere, the value of R will be equal to 0. But if I have variation in the image, intensity variation in the image; in that case, sigma squared z will take a non-zero value and the more the variation is, the larger the value of sigma squared z becomes. So, as the value of sigma squared z increases, you will find that the quantity 1 upon (1 plus sigma squared z) tends to 0; as you increase sigma squared z more and more, the value of this quantity 1 upon (1 plus sigma squared z) tends to 0.

So, as the surface becomes more rough, in the sense that there is more variation of the intensity values in the image, the value of R tends to 1. So, for a completely uniform surface, we have the value of R equal to 0, whereas for a rough surface, depending upon the degree of roughness or the degree of variation of the intensity values, the value of R increases and it reaches a maximum value equal to 1.

So, it is this particular descriptor R which is very important as it captures the variation of the intensity values in a given texture. So, we have seen the meaning of three different histogram moments: mu_0, the 0'th order histogram moment, which is equal to 1, and mu_1, the first order histogram moment, which is always equal to 0.

So, these 2 histogram moments do not give you any information about the nature of the texture, whereas the second histogram moment mu_2 is equivalent to the variance of the intensity values in the image; from this mu_2, we can generate an important texture descriptor which for a flat surface, that is for a uniform image, will be equal to 0, and as the surface roughness increases, the value of R also increases and it reaches a maximum value of 1 depending upon the degree of roughness of the surface.
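
As a quick illustration, here is a hedged sketch of this roughness descriptor R, reusing the histogram_moments helper sketched above (both function names are mine, not from the lecture):

```python
def roughness_descriptor(image, L=256):
    """R = 1 - 1/(1 + sigma^2): 0 for a uniform image, approaching 1 as roughness grows."""
    _, mu = histogram_moments(image, L=L, n_max=2)   # reuses the earlier sketch
    sigma_sq = mu[2]
    # Note: some texts normalize sigma^2 by (L - 1)^2 so that R stays on a comparable
    # scale across bit depths; the lecture's definition is used directly here.
    return 1.0 - 1.0 / (1.0 + sigma_sq)
```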

(Refer Slide Time: 16:58)

Now, there are other moments, say for example mu_3, that is the third order moment; you will find that this mu_3(z), the third order moment, tells you what is the skewness of the histogram. Similarly, the fourth order moment mu_4 tells us the relative flatness of the histogram.

So normally, among the texture descriptors which are derived from the gray level histogram, the descriptors which are used go up to the fourth order histogram moments; the moments of higher order, like the fifth order or even higher, are normally not used, but if you use them, they will give finer and finer descriptors of the texture.

So, though these histogram moment based descriptors are very simple, they have one problem. The problem is that these histogram moment based descriptors do not provide us any information about the relative position of the pixels. But as we have seen, you will find that in a texture, the intensity value at a position with respect to some other position carries a lot of information.

(Refer Slide Time: 18:29)

So, what is the way in which this relative position information can also be obtained for the texture images? That relative position information also gives you a lot of important descriptors which are useful for describing a texture.

(Refer Slide Time: 19:04)

So, towards that, we will talk about another technique for obtaining the texture descriptors, which are obtained from what is called the co-occurrence matrix. So, as we said, what we are interested in is, along with the intensity values, the relative positions of the points with various intensity values within the texture; so, we have to define an operator.

So, we define an operator, say P, which we call the position operator, and before deriving the co-occurrence matrix, we will generate a matrix, say A, which will be of dimension capital L by capital L and this matrix A will be generated using the constraint which is specified by the position operator. So, we are saying that we will generate a matrix A of dimension capital L by capital L. So, here again our assumption is that the intensity values which are present in the texture image, the gray levels, vary from 0 to L minus 1. So, we will have a gray level of value z_0, we will have a gray level of value z_1, and like this the maximum gray level that we can have is z_(L-1), and in this matrix capital A, which is of size capital L by capital L, a particular element, say a_ij, will indicate the number of times points with intensity value z_j occur at a position determined by P relative to points with intensity z_i.

So, what we are doing is we take a point with a certain intensity value within the texture image, then following our position operator capital P, we come to some other point. So, I find out what the intensity values are; suppose the intensity value at the location that we are considering is z_i and, following the position operator P, the intensity value at the other location where I come is say z_j; so this a_ij indicates how many times such a pair (z_i, z_j), as indicated by the position operator capital P, appears within the given texture.

Obviously, in this particular case, the indices i and j will have values within the range 0 to capital L minus 1, so that I have a matrix A of size capital L by capital L. So, what I mean by this will be better explained with the help of an example.

(Refer Slide Time: 23:06)

So, let us take an example matrix, assuming that it is representative of a particular texture image that we are considering. So, let us take an example like this: we have a given image matrix, say I, which is given by 0 0 0 1 2, 1 1 0 1 1, 2 2 1 0 0, then say 1 1 0 2 0, and then I can have say 0 0 1 0 1.

So suppose, this is our given image and let us assume that the position operator that we are using, say P, indicates 1 pixel to the right. So, from this given image and this position operator, how can we generate our matrix A? So, here you notice that this particular image, or the sample of the image that we have taken, contains three distinct intensity levels.

So, those intensity levels are, say, z_0 equal to 0, so 0 is one intensity level within this image, z_1 equal to 1 and z_2 equal to 2. So, there are 3 distinct intensity levels within this image. So, the matrix A that we calculate will be of dimension 3 by 3. Now, following this position operator, what I have to do is get the pairs of pixels and we have to compare and count, for those pairs of pixels, whatever the pair of intensity values is, how many times that pair appears within our given image.

So firstly, let us assume that z_i is equal to 0 and z_j is also equal to 0. So, here you find that since the position operator P is 1 pixel to the right, I come to a pixel which is having a value z_i equal to 0 and I have to go to the next pixel to the right where the value will also be equal to 0. So, I have to count how many times this (0, 0) pair, one in the horizontal direction at a distance of 1 pixel, appears within the given image. So, here we find that I have one such pair; this is another pair, this is another pair and this is one more pair. So, the number of times this (0, 0) pair appears within our given image is 4. So, when I compute this matrix A, the (0, 0) location A_00 will be equal to 4.

Similarly, how many times will A_01 appear within this image? So, A_01, this is one occurrence; let me use some other colour, so this is one occurrence of A_01, this is another occurrence of (0, 1), this is another occurrence of (0, 1). So, the (0, 1) pair appears within this given image, and this A_01 element will have a value equal to 3.

Similarly, let us see how many times A_02 appears in this image. So, for A_02, we can find that this is the only occurrence of (0, 2) in this particular image. So, this location A_02 will have a value equal to 1. Similarly, I have to check how many times (1, 0) appears in this particular image. So, you will find that here I have a (1, 0) transition, here also I have a (1, 0) transition, and here also I have a (1, 0) pair. So, A_10 will have a value equal to 3.

Similarly A_11: this is one pair, this is one more pair, this is another pair. So, A_11 will also have a value equal to 3. A_12: this is one pair and possibly this is the only (1, 2) pair within this image. So, A_12 will have a value equal to 1. Similarly for A_20: if you come to this particular image, you will find that this is one (2, 0) pair and there is no other occurrence of the (2, 0) pair within this given image. So, A_20 will also be equal to 1.

Similarly A_21: this is the only occurrence of (2, 1) within this image, so this A_21 will also be equal to 1. Similarly A_22: this is the only occurrence of (2, 2), so this element will also be equal to 1. So, as a result, I get my A matrix, which I have just computed, in this particular form.

Now, from this matrix A, we can compute the co-occurrence matrix for this particular image. How do we get the co-occurrence matrix? If I take the total number of occurrences of all these pairs within this image which satisfy the positional restriction given by the positional operator capital P, then you find that the total number of such occurrences within this matrix, say n, in this particular case, if you take the summation of all these elements, will be equal to 19. So, if I divide all the elements of matrix capital A by this total number of occurrences, then what I get is what is called the co-occurrence matrix.

So, in this particular case, our co-occurrence matrix C will be simply given by 1 upon 19 times capital A. So, what I have to do is divide each and every element in the matrix capital A by the total number of occurrences of pairs of intensity values as dictated by our positional operator capital P, and the resultant matrix that I get is our co-occurrence matrix. So, what does this co-occurrence matrix tell us?

(Refer Slide Time: 31:13)

Every element, say C_ij, within this co-occurrence matrix indicates the joint probability that a pair of points satisfying P will have the values z_i and z_j. So, when I have this co-occurrence matrix, every element C_ij within this matrix indicates the joint probability that a pair of points which satisfies our positional operator capital P will have the values z_i and z_j.

So, you find that in the histogram based moments, or the histogram based descriptors that we have generated, we did not have any positional information. But when we have a co-occurrence matrix, in the co-occurrence matrix we have the positional information as well as the intensity relation of the points following a certain positional relation.

So, the question is, once we have defined such a C_ij or once we have decided to obtain a co-occurrence matrix, what is the algorithm following which we can obtain the co-occurrence matrix? So now, let us see what kind of algorithm can be used to obtain this co-occurrence matrix.

(Refer Slide Time: 33:07)

So for this, let us assume that we have our given image I which is of size, say, capital M by capital M. So, this capital M by capital M is our original image and let us assume that the number of discrete intensity levels which are present in the image is, say, L. So, first of all, what we have to do is generate a matrix capital A whose size will be capital L by capital L. So, the algorithm can be written like this: this matrix capital A of size L by L is initialized to 0. So, this is my initialization. What I mean by this is we make every element in the matrix capital A equal to 0.

Then our algorithm can work in this fashion: say, for i equal to 0 to capital M minus 1 do, for j equal to 0 to capital M minus 1 do, begin; and here comes the operation for computation of our matrix capital A. So, for this computation, what I assume is that the position operator P is indicated by a vector because, as we said, the position operator says that, given a particular pixel location, what is the location of the other pixel that we are interested in. So, that can be obtained simply by a vector addition operation.

So, I assume that P is indicated by a vector, by a position vector, whose components are say P_x in the x direction and P_y in the y direction. So, what I have is (i, j), the index of one particular pixel location, and from this I generate the index of another pixel location (r, s) which is nothing but the vector addition (i, j) plus (P_x, P_y). So, this is a vector addition operation.

Then what I have to do is check whether this image point index (r, s), that we have generated by this vector addition operation, is within our image or not. So, I have to check if (r, s), this vector, is less than (M, M). Only then is this (r, s) position within our image, because our image is of dimension capital M by capital M, indexed from 0 to capital M minus 1 in each direction. So, if this condition is satisfied, that means the point (r, s) remains within our image.

So, if this is satisfied, then what I look at is the intensity value at location (i, j) and the intensity value at location (r, s). So, using those intensity values as indices into our matrix capital A, I can increment the corresponding location in matrix capital A by 1. So, our operation will be that if this is true, then A[I(i, j), I(r, s)] has to be incremented by 1. So, this becomes A[I(i, j), I(r, s)] plus 1, and this is where our iteration ends.

So, you find that at the end of the execution of this, the matrix A will contain the number of times a pair of pixels having a pair of intensity values occurs within the image I, where these pairs of pixels follow the position relation as indicated by the position operator or position vector capital P. So, once I get this matrix capital A, then from this matrix capital A, if I add all the elements in the matrix A and divide every element of the matrix capital A by that summation, what I essentially get is our co-occurrence matrix capital C.
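
A minimal NumPy sketch of this algorithm, under the assumption that the image holds integer gray levels 0 to L - 1 and that the position operator is supplied as a displacement vector (p_x, p_y); the function name is illustrative:

```python
import numpy as np

def cooccurrence_matrix(image, L, p):
    """Co-occurrence matrix C of an integer image for position operator p = (p_x, p_y)."""
    rows, cols = image.shape
    A = np.zeros((L, L), dtype=np.int64)          # initialization: every element of A is 0
    px, py = p
    for i in range(rows):
        for j in range(cols):
            r, s = i + px, j + py                 # (r, s) = (i, j) + (p_x, p_y)
            if 0 <= r < rows and 0 <= s < cols:   # keep (r, s) inside the image
                A[image[i, j], image[r, s]] += 1
    return A / A.sum()                            # divide by the total number of pairs

# For example, "one pixel to the right" corresponds to p = (0, 1) if the first
# index is taken as the row and the second index as the column.
```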

(Refer Slide Time: 38:29)

The next operation, to obtain the feature descriptors, the texture descriptors, from this co-occurrence matrix, is to analyze this co-occurrence matrix to generate various kinds of descriptors. So, what are the different types of descriptors that we can obtain from the co-occurrence matrix? The first type of descriptor that we can obtain from the co-occurrence matrix is the maximum probability, which is nothing but the maximum of C_ij over all (i, j).

So, what does this maximum probability indicate? This maximum probability indicates the strongest response to the position vector P of the given image I. So, this will indicate, given a position operator capital P, say one that indicates a position in the horizontal direction, how the texture is responding to that horizontal variation.

The other kind of descriptor that we can obtain from this co-occurrence matrix is what is called the element difference moment of order k, and this element difference moment is defined like this: (i minus j) to the power k times C_ij, where we have to take the summation over i and j. So, this is what is called the element difference moment, and you will find that it will be low if the higher values appear along the main diagonal, because along the main diagonal we have i equal to j; so, if the higher values of C_ij appear along the main diagonal of the co-occurrence matrix, then this k'th order element difference moment will assume a low value.

Similarly, we can have another descriptor which is just the inverse of this, that is C_ij divided by (i minus j) to the power k, where you have to take the summation over i and j, and this gives you just the inverse effect. That means if the larger values of C_ij appear along the main diagonal, then this particular quantity will have a higher value, and if the elements along the main diagonal have low values, then this will also give a lower value.

(Refer Slide Time: 42:15)

The other descriptor, a fourth descriptor, that we can obtain from this co-occurrence matrix C is what is called entropy, and we know that entropy is defined in this manner: we have C_ij, then log of C_ij, and we have to take the summation over i and j. As we know, entropy is nothing but a measure of randomness, so it tells us how random the given texture is.

Similarly, the other texture measure, a fifth measure, that we can obtain from the co-occurrence matrix is what is called uniformity, which is defined as the summation of C_ij squared, where here again you have to take the summation over i and j. So, you will find that this particular value, the uniformity, will be highest when the texture is most uniform; for a constant image, the whole probability mass falls into a single C_ij and the sum of squares reaches its maximum value of 1. So, these are the various texture descriptors that can be obtained from the co-occurrence matrix.
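
The sketch below computes these co-occurrence descriptors with NumPy. It assumes the entropy carries the conventional negative sign and that the i = j terms are skipped in the inverse element difference moment to avoid division by zero; both are common conventions rather than something stated explicitly in the lecture.

```python
import numpy as np

def cooccurrence_descriptors(C, k=2):
    """Maximum probability, element difference moments, entropy and uniformity of C."""
    L = C.shape[0]
    i, j = np.indices((L, L))
    max_probability = C.max()
    element_diff_moment = np.sum((i - j) ** k * C)
    off_diag = i != j                                    # skip i == j (assumed convention)
    inv_diff_moment = np.sum(C[off_diag] / np.abs(i - j)[off_diag] ** k)
    nz = C > 0                                           # treat 0 * log 0 as 0
    entropy = -np.sum(C[nz] * np.log2(C[nz]))            # conventional minus sign
    uniformity = np.sum(C ** 2)
    return dict(max_probability=max_probability,
                element_diff_moment=element_diff_moment,
                inverse_diff_moment=inv_diff_moment,
                entropy=entropy,
                uniformity=uniformity)
```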

And, as we have said in our algorithm, given a texture image, this is how we can compute the co-occurrence matrix of the given texture image. So, these are the different descriptors, the texture descriptors, whether they are derived from the histogram moments or they are derived from the co-occurrence matrix; these are the statistical descriptors that we can have of a given texture.

(Refer Slide Time: 44:29)

And, as we said, along with these statistical descriptors, we can also generate some spectral descriptors. That means we can generate descriptors from the spectral domain. So, to obtain the spectral descriptors, what we have to do is find out the Fourier spectrum of the given image. So firstly, we have to find out the Fourier spectrum.

So, what do we get from this Fourier spectrum? The prominent peaks in the Fourier spectrum tell us what is the principal direction of the texture patterns; that is, whether the principal direction of the texture pattern is horizontal, or vertical, or diagonal and so on.

So, this principal direction of the texture patterns is obtained from the Fourier spectrum, or the prominent peaks in the Fourier spectrum, and another piece of information that we get is from the location of the peaks. So, the location of the peaks within the Fourier spectrum tells you what are the fundamental spatial periods of the patterns. As we had seen earlier, whenever you have a texture, the texture has some spatial variation and this spatial variation can have some periodicity; the periodicity may be strictly defined or it may not be strictly defined.

So, the location of the prominent peaks in the Fourier spectrum gives you information about what are the fundamental periods of the texture pattern. Now, for convenience of computation, what we do is convert this Fourier spectrum into polar coordinates. So, polar coordinates means: I have the Fourier spectrum which is originally in Cartesian coordinates, so I have the directions u and v, and these I would like to convert into polar coordinates.

So, in polar coordinates, for a particular direction theta, if I find out the variation along that particular direction theta, each value of theta will give me a 1 dimensional function which I call S_theta(r), where r is the radial distance. So, if I convert the Fourier spectrum into polar coordinates, then, as we are going for the polar coordinate representation, what I have is the Fourier spectrum represented in the form S_theta(r), and each value of theta gives us a 1 dimensional function which I can represent as S_theta(r).

Similarly, for each value of r, that is along the radial direction, I can have a 1 dimensional function which I can represent as S_r(theta). So, what do these S_theta(r) and S_r(theta) indicate? For S_theta(r), as we are saying, we are moving in a particular direction along the radial axis; so this tells us what is the behavior of the spectrum along a radial direction from the origin. So, as we are moving in this particular direction and, for this particular theta, we are computing this S_theta(r), it tells us what is the behavior of the spectrum along a radial direction as you move from the origin within the spectrum.

Similarly, S_r(theta), because this S_r(theta) is for a particular value of r, indicates what is the behavior of the spectrum if I move along a circle of radius r centered at the origin. So, these are the 2 behaviors of the spectrum that can be obtained if I represent the Fourier spectrum in polar coordinates.

(Refer Slide Time: 49:17)

And, more global descriptors can be obtained from these 2 functions. So, you can have more global descriptors which are given as S(r) equal to the summation of S_theta(r), where you take the summation over theta from 0 to pi, and the other descriptor that we can obtain is S(theta), which we get by summing S_r(theta) where r varies from, say, 1 to capital R, where this capital R is related to the dimension of the image or, equivalently, to the dimension of the Fourier spectral coefficients that we get.

So normally, if I have an image of size capital N by capital N, then the value of R is typically taken as capital N by 2. So here, you find that r is the radial variable and theta is the angular variable. So, by varying the values of r and theta, we can generate two 1 dimensional functions which describe the texture energy content. So, if I vary the value of r, I get one 1 dimensional function which gives us an indication of the variation of the texture energy content as we move along the radial direction globally, and if I vary the value of theta, this gives us the variation of the textural energy content as we move along concentric circles within the spectrum.
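
A rough NumPy sketch of these two 1 dimensional functions, assuming a square N by N image and approximating the polar sums by nearest-neighbour binning of the centered FFT magnitude; all the names here are illustrative:

```python
import numpy as np

def spectral_texture_functions(image, n_theta=180):
    """Approximate S(r) and S(theta) from the Fourier spectrum of a texture image."""
    N = image.shape[0]
    S = np.abs(np.fft.fftshift(np.fft.fft2(image)))       # centered Fourier spectrum
    cy, cx = N // 2, N // 2
    y, x = np.indices(S.shape)
    r = np.round(np.hypot(y - cy, x - cx)).astype(int)     # radial bin of each sample
    theta = np.mod(np.degrees(np.arctan2(y - cy, x - cx)), 180).astype(int)  # 0..179
    R = N // 2                                             # R is typically N / 2
    S_r = np.array([S[r == k].sum() for k in range(1, R + 1)])              # sum over theta
    S_theta = np.array([S[(theta == t) & (r >= 1) & (r <= R)].sum()
                        for t in range(n_theta)])                           # sum over r
    return S_r, S_theta
```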

So, from these 2 functions, the descriptors which are normally generated are: one kind of descriptor is the location of the highest peak; another kind of descriptor can be the mean and variance of both the amplitude and the axial variations; and we can also have the distance between the mean and the highest value of the function.

So, these are the various descriptors that can be obtained from these 2 functions, S(r) and S(theta). Now, what do we mean by this? Let us take a look at an example.

(Refer Slide Time: 52:36)

Suppose, I have a texture image which is given like this; so this shows a typical texture image. So, if I have a texture image of this form, you will find that in the frequency domain, in the Cartesian coordinate domain with axes u and v, there are a few horizontal lines and a few vertical lines, and for those I will have some spectral components at these different locations. Similarly, there are diagonal lines which are oriented at 45 degrees and 135 degrees; so these are the points which correspond to those diagonal patterns.

If I plot S(r) against r, a typical pattern of this 1 dimensional function S(r) that you can obtain is something like this. So, it says that as we move radially from the center of the spectral plane, we are moving into higher and higher frequency components and, as is quite obvious, most images have a higher low frequency energy content and a lower high frequency energy content, and this is the typical pattern that we will get when we plot S(r) versus r. But we get very important information if I plot S(theta) versus theta.

So, if I plot it from 0 to 180 degrees; so somewhere here we have 90 degrees, somewhere here we have 45 degrees, somewhere here we have 135 degrees, and we will find that we will get a pattern something like this. So, it clearly shows that we have the periodic nature of the pattern in the horizontal direction with theta equal to 0, we have the strong periodic nature in the vertical direction, that is theta equal to 90 degrees, and we also have stronger periodic patterns in the 2 diagonal directions, that is theta equal to 45 degrees and theta equal to 135 degrees.

Now, as opposed to this, if I have a texture pattern simply of this form, where I have a horizontal pattern and a vertical pattern, then if I plot S(theta) versus theta for this particular texture pattern, the kind of plot that we will get is something like this. So, this is 180 degrees, this is 90 degrees and this is 0 degrees.

So, here you will find that if I consider this S(theta) plot, from this S(theta) plot I can clearly demarcate between this texture and this kind of texture. So, this S(theta), this 1 dimensional function, provides a very important descriptor which can be used to describe a texture as well as to discriminate among textures, and the various kinds of descriptors, as we have said, are the location of the peaks, similarly the mean and variance, and also the distance of the highest value from the mean value; these are the different types of descriptors that can be obtained from the spectral domain.

So, till now, what we have done is discuss different descriptors: the shape descriptors which are obtained from the boundary, the shape descriptors which are obtained from the region, as well as the texture descriptors which are also region descriptors.

Now, for object classification or object identification, what we have to do is make use of these descriptors. So, if I assume each descriptor to be a scalar quantity and each descriptor to be a component of a particular vector, then if I arrange a few of these descriptors in an ordered manner, what I have is a vector representation of the different descriptors, and these vectors, as we will see in our next class, we will call feature vectors, and based on the feature vectors, we can design some classifier or some recognizer using which we can recognize the object present in the image.

So, we will discuss the object recognition or object understanding problems in our next class. Now, let us see some quiz questions from today's lecture.

(Refer Slide Time: 58:00)

The first question is: what is the significance of the second order histogram moment? The second question: what is a co-occurrence matrix? The third question: how do you measure texture entropy from co-occurrence matrices? The fourth question: a texture has a prominent periodic intensity variation in the vertical direction; what will be the nature of S(theta)?

Thank you.

Digital Image Processing
Prof. P.K. Biswas
Department of Electronics & Electrical Communication Engineering
Indian Institute of Technology, Kharagpur
Lecture - 40
Object Recognition
Hello, welcome to the video lecture series on digital image processing. Now, in today's lecture, we will discuss the final phase of image processing, that is image understanding, which we have termed object recognition.

(Refer Slide Time: 1:14)

So, till our last class, we have seen the different representation and description techniques and we had said that for understanding the images or recognizing the objects present in the images, we have to have a proper representation mechanism so that the objects or the shapes present in the image can be represented properly.

And then, for such a representation scheme, we have to have a proper description; from the represented shapes, we have to generate a proper description so that, using these descriptions, the shapes can be matched against a set of similar descriptions which are kept in the knowledge base, and after this matching, we can identify which particular object it is that we are getting from the image, or we can roughly classify to which class of objects present in the knowledge base of the computer this current object belongs. So, till our last class, we have seen a number of such representation and description techniques.

So firstly, we had obtained some boundary based descriptors and we have seen the different boundary based descriptors such as shape numbers; the shape number is something which is generated from the differential chain code representation of the boundary. We have seen the auto regression coefficients, where for getting these auto regression coefficients, we had to get a polygonal approximation, a polygonal representation, of the boundary. Then the corners or vertices in this polygon are represented by an auto regression model and, by solving a number of linear equations, we can solve for those auto regression coefficients, and this set of auto regression coefficients also acts as a descriptor of the shape.

Then, we have seen the boundary signature and we have said that this boundary signature is nothing but a one dimensional representation of a 2 dimensional boundary. So, there what we have to do is get the centroid of the shape and, from the centroid of the shape, we have to get the distance of the different boundary points, and when you get the distance of the different boundary points, then from the centroid the boundary points have to be traced either in the clockwise direction or in the anticlockwise direction.

And, if I plot this distance against the direction from the centroid of that particular object, then what we get is a 1 dimensional, or 1D, functional representation of the 2 dimensional boundary, and that is what we have called the boundary signature, and we have seen that for different types of shapes we get different types of boundary signatures; these signatures obviously have to be normalized properly so that the values lie between 0 and 1, and these boundary signatures can also be used for recognition of the shape or for shape description purposes.

Then, the other boundary based descriptor that we have obtained is the Fourier descriptor. So, in that case, what we have done is represent the different boundary points, the points lying on the boundary, as complex numbers, and if you trace the boundary either in the clockwise direction or in the anticlockwise direction, then basically what I get is a sequence of complex numbers, and if I take the discrete Fourier transformation of this sequence of complex numbers, then what I get is a set of Fourier coefficients, and in this case, in general, this set of Fourier coefficients is complex in nature.

So, this set of Fourier coefficients also acts as a descriptor which can be used for matching purposes or for recognition purposes. Then we have obtained some more boundary based descriptors like boundary straightness, bending energy and so on. Then we have also talked about some region based shape descriptors. So, among the region based shape descriptors, we have seen different descriptors like eccentricity, where eccentricity, we have said, is nothing but the ratio of the length of the major axis and the length of the minor axis of that particular shape. So, what we have to get is the length of the major axis and the length of the minor axis, and the ratio of these two is what is known as eccentricity.

Then we have seen another region based shape descriptor which we have called elongatedness, and for getting this elongatedness, what you have to obtain is the minimum bounding rectangle; the ratio of the sides of this minimum bounding rectangle is defined as the elongatedness. We have also talked about rectangularity, we have also talked about compactness, and then we have seen some other moment based descriptors and we have seen that there are 4 different moment invariants which can be used as descriptors of the shape.

Then the other kind of descriptor, we have said, is where we want to find out the surface reflectance property of the object present in the image, and we have said that this surface reflectance property may be the colour or the texture feature present on the object surface. So, for these region based descriptors, say for example for textures, we have talked about the histogram based moments; similarly, we have talked about the descriptors obtained from the co-occurrence matrix of the texture and we have also seen the different spectral descriptors of a particular texture. So, all these different types of descriptors, or a set of all these different types of descriptors, can be used for recognition of an object or for understanding the object present in the image.

Now, you find that out of all these, it is the shape number which is slightly different, because the shape number basically gives a chain of different numerical values, say ranging from 0 to 7, and this chain, or cycle, of numerical values actually describes the boundary of the object, whereas for the other kinds of descriptors, we can assume that each of them highlights a particular property of the shape.

So, if I take a set of such descriptors, say if I take n number of such descriptors and put those descriptors in an ordered manner, then what I get is an n dimensional vector, and this n dimensional vector is what is normally called a feature vector; so each of these feature vectors represents a particular object or a particular object shape. So, we will see that for different objects or different object shapes, this feature vector is going to be different. So, the recognition can be done based on these feature vectors, because we assume that for different objects they will be different.

(Refer Slide Time: 8:58)

So, in today's lecture, we will talk about these object recognition problems. So first, we will talk about the recognition technique using the shape number, which is obtained from the boundary of the object, then we will talk about some feature based techniques for object recognition. So, under that, we will talk about the linear discriminant function, we will talk about the minimum distance classifier and we will talk about the optimal statistical classifier.

So, all these different techniques, the linear discriminant function, the minimum distance classifier and the optimal statistical classifier, use the feature vector and, as we have said, the feature vector is nothing but an ordered set of the different features or descriptors that we have obtained earlier. And lastly, we will talk about another approach to recognition which is the neural network based technique, and this neural network based technique also makes use of the feature vectors as the descriptors of different objects or different object shapes. So, the first one that we will talk about today is the shape number based approach.

(Refer Slide Time: 10:28)

So, recognition technique using the shape number: earlier, we have talked about the shape number and the generation of the shape number using 8 directions. So, there we have said that in the 8 directional case, we have identified the different directions of moves like this; so this is the direction which was given as 0, this is 1, this was 2, this was 3, this was 4, this was 5, this was 6 and this direction was 7, and using these 8 directions, we had obtained the chain code, we have obtained the differential chain code and then we have obtained the shape number.

Now, for simplicity of the discussion, today we will talk about another kind of chain code generation which does not use 8 directions but uses 4 different directions. So, today we will talk about the chain code using 4 directions and, in these 4 directions, our direction identifications are like this: this direction will be marked as 0, this direction as 1, this direction as 2 and this direction as 3.

So, for obtaining a differential chain code using these 4 different directions: first, what we do is obtain a chain code using the 4 different directions, then you obtain the differential chain code using the 4 different directions, and we have seen that if I consider this differential chain code to be a cyclic chain, or cyclic set of numbers, then from this differential chain code I can redefine the starting point so that the resulting numerical value will be the minimum, and we have said that that particular numerical value is what we have termed the shape number.

Now, there is another term which is called the order of the shape number. Now, this order of the shape number is nothing but the number of digits which are present in the shape number. So, if I have a shape number something like this, say 0 0 1 3 2 1 0, you find that in this particular shape number there are 1, 2, 3, 4, 5, 6, 7, that is 7 different digits. So, the order of this particular shape number will be 7.

Now, the problem is, if I want to use the shape number as a descriptor and this shape number is to be used for recognizing a particular object, or for saying that a particular object is similar to one of a set of objects which are present in the knowledge base of the computer, then what we have to ensure is that this shape number must be independent of the starting point, I mean whatever starting point we used to generate the shape number. So, this generated shape number must be independent of the starting point and, secondly, the order of the shape number of the object shape which we are trying to recognize and the order of the shape number which is there in our knowledge base, which is stored in the computer, must be the same.

(Refer Slide Time: 14:33)

Now, the problem is something like this: suppose I have an object of this form and I have an object of similar shape but the size is slightly different; so this will give me one particular shape number, say S_1, and this will give me another particular shape number, say S_2. So, our first job is, when we want to generate the shape number of this particular object, say O, in that case the shape number that I have to generate must be starting point independent and, at the same time, whatever the order of the shape number S_1 is, the order of the shape number S_2 must be the same.

So, to obtain this, what we have to do is align the grid in a particular way so that our generated shape number is independent of the starting point and, not only that, we have to decide what should be the grid spacing to generate this particular shape number. So, one way to make it starting point independent is to align the grid in such a way that the grid is aligned with the principal axis of the particular shape. So, if I have an object, say, something like this, then our principal axis is this one.

So, I want to ensure that the grid that I place must be aligned with this principal axis, and this is nothing but, as we have said, the major axis, with the minor axis perpendicular to this major axis. So, our grids must be aligned with this major axis and the minor axis, and then we have to decide what should be the grid spacing, because this is what will decide the order of the shape number that we generate.

Now, since we are discussing the chain code using 4 different directions, you will find that if I specify the order of a particular chain code of a rectangle, what we can do is enclose this particular shape using a rectangle and this rectangle has to be divided into a number of cells. So, you can divide this rectangle into a number of cells like this and this is what will be our grid.

So, I have to decide how many such cells I should have in order to generate the shape number of a particular order. So, for doing this, what we do is consider the eccentricity, which we have defined earlier; so we find out what is the eccentricity of this particular shape and then we generate a rectangle. Suppose we are interested in a shape number of order, say, 18, so we want to make the order n equal to 18. So, we want to obtain a rectangle whose shape number will be of order 18, and you will find that I have a limited number of such possibilities; I can have a limited number of rectangles for which the order of the shape number is equal to 18, and these possibilities are: I can generate a rectangle of size say 2 by 7, or I can have a rectangle of size say 3 by 6, or I can have a rectangle of size say 4 by 5.

So, you will find that for all these different rectangles, the order of the shape number will be equal to 18. Then, suppose the eccentricity of the object shape best matches the eccentricity of the 3 by 6 rectangle; so what we have to do is put a rectangle of size 3 by 6 centered at the middle point of the major axis of the shape, and these grids must be aligned with the major axis and minor axis of that particular shape. And once I have a rectangle of this particular size 3 by 6 fitted onto that particular shape of the matching eccentricity, then, using this form of grid spacing, if I generate the shape number, that particular shape number will be of order 18.

So, depending upon which particular order of shape number we want, we have to generate rectangles of the corresponding size and, using the grid spacing of the corresponding rectangle, we can generate the shape number. Now, our purpose is recognition.

(Refer Slide Time: 19:57)

So, what we have got is a shape number S, and this shape number is so obtained that it is starting point independent and it has some order, say n; that is, the number of digits present in the shape number is equal to n. Now, given 2 objects, I have one object O_1 and another object O_2. For O_1, I have the corresponding shape number S_1 and for O_2, I have the corresponding shape number S_2. Now, when I generate these S_1 and S_2, I make sure that they are of the same order n.

Now, this order n can be varied depending upon the spacing of the grids on which the shape number is calculated, and we have just discussed how I can decide what the grid spacing should be depending upon which particular order of shape number we want to generate.

Now, then comes the concept of degree of similarity. Now, what is this degree of similarity? When we are given these 2 particular shape numbers, say S_1 and S_2, of a given order, then the degree of similarity, say k, is defined as the maximum order for which the shape numbers S_1 and S_2 still match. So, if I generate S_1 of say order 4 and S_2 of order 4, then S_1 of order 4 and S_2 of order 4 may be the same; say, S_1 of order 6 and S_2 of order 6 will be the same; S_1 of order 8 and S_2 of order 8 will also be the same; but I find that S_1 of order 10 is not the same as S_2 of order 10.

In that case, I will say that the degree of similarity between these 2 shape numbers, S_1 and S_2, or between the 2 object shapes O_1 and O_2, is equal to 8. So, I define this degree of similarity as follows: if S_i(A) is the shape number of object shape A of order i and S_i(B) is the shape number of object shape B of the same order i, then S_i(A) and S_i(B) should be the same for all i less than or equal to k, whereas S_i(A) and S_i(B) should not be the same for i greater than k.

So, in such a case, the highest order for which the shape numbers still match is what is called the degree of similarity of these 2 shapes, A and B, and using this degree of similarity, we can find out whether 2 given shapes are similar or not, and you will find that from here, of course, I can define a distance function; say, the distance between shapes A and B can be defined as 1 upon the degree of similarity k, because the more similar the objects are, the smaller the distance between those 2 object shapes should be.
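
A small Python sketch of this matching rule, assuming the shape numbers of each object have already been computed at every admissible order and are stored in dictionaries keyed by order; the helper names and the data layout are illustrative, not from the lecture.

```python
def degree_of_similarity(shape_numbers_a, shape_numbers_b):
    """Largest order k for which the shape numbers of A and B still match."""
    k = 0
    # 4-connected shape numbers exist only for even orders 4, 6, 8, ...
    for order in sorted(set(shape_numbers_a) & set(shape_numbers_b)):
        if shape_numbers_a[order] == shape_numbers_b[order]:
            k = order
        else:
            break                 # the first mismatch ends the run of matching orders
    return k

def shape_distance(shape_numbers_a, shape_numbers_b):
    k = degree_of_similarity(shape_numbers_a, shape_numbers_b)
    return float('inf') if k == 0 else 1.0 / k     # D(A, B) = 1 / k
```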

(Refer Slide Time: 23:45)

So suppose, we are given a number of shapes like this in the database, say something like this, and suppose this is the shape which I want to recognize. So, for recognition purposes, or for matching purposes, what I have to do is first get all the shape numbers; say this is object A, this is object B, this is object C and this is object X, which I want to recognize. So, first I have to start with the lowest order, and for the 4 connected chain code, the lowest order of the shape number is equal to 4.

So, if I find that for the lowest order all these shapes A, B, C and X are similar, then I put them in a single node. So, this is for the fourth order. Then I go for the sixth order; then still I may find that A, B, C and X are similar. Then, at the next higher order, suppose at order 8, I find that B is different but A, C and X are the same. Then, if I go to the next order, B will remain different and here I may find that A and X are similar whereas C is different.

So, this is at order ten and after this, A and X may become different; if I go for, say, the twelfth order, I may find that even A and X become different. But at least up to the tenth order of the shape number, I find that A and X are similar while B and C are different. So, at this point, I infer that this shape X matches best with shape A, which is in our knowledge base. So, this is what is called a decision tree and, using this decision tree, we can recognize a given shape against a set of shapes which are present in the knowledge base.

So, using the shape numbers, using the decision tree and following the concept of the degree of similarity, we can recognize a particular shape using the shape number itself. Now, the next topic that we will discuss is feature based recognition.

(Refer Slide Time: 26:39)

And in this case, as I have said, what I assume is that every object is represented by a set of features, an ordered set of features, which we call a feature vector, and suppose a feature vector, say F, of a particular object contains n number of descriptors or n number of features. Now, each of these features may be the eccentricity, the elongatedness and so on. So, I say that this feature vector F is of dimension n.

Now, if I consider an n dimensional space, a space of n different dimensions, then this feature vector F will be represented by a point in that n dimensional space. Now, you find that if I have a class of similar objects, in that case the feature vectors of every object belonging to that particular class will be similar. So, if I plot all these feature vectors in that n dimensional space, say a particular class of objects contains K number of objects, then I will get K number of feature vectors, and the K points corresponding to these K feature vectors in our n dimensional space will be placed close to one another. So, these points will try to form a cluster in the n dimensional space.

Now, using this concept, that the feature vectors of the objects belonging to a particular class will be very close to each other and the feature vectors of the objects belonging to different classes will form 2 different clusters located at 2 different locations in our n dimensional space, we can design a classifier which can classify an unknown object into one of the known classes. Now, let us see how that can be done.

(Refer Slide Time: 29:15)

For our discussion purpose and for simplicity, let us assume that the dimension of the feature vectors, that is n, is equal to 2. So, if the dimension of the feature vector is equal to 2, that means every feature vector is represented by a vector of this form; it will have 2 components, X_1 and X_2, and every feature vector is nothing but a vector having these 2 components, X_1 and X_2. Now, what I do is plot these points in a 2 dimensional space. Why I am doing this for 2 dimensions is that it is easier for visualization.

So, suppose this horizontal direction represents the first component of the feature vector, X_1, and I take this vertical direction to represent the second component of the feature vector, that is X_2, and suppose we have 2 different classes of objects and I represent the classes of objects by omega_1 and omega_2. So, omega_1 represents one class of objects and omega_2 represents another class of objects.

So, suppose the feature vectors of the objects belonging to class omega_1 are represented by these blue circles, like this, and the feature vectors of all the objects which belong to class omega_2 are represented as green circles, like this. So, here you find that the feature vectors of the objects belonging to class omega_2 form a cluster something like this and the feature vectors of the objects belonging to class omega_1 form a cluster like this.

Now, given such a situation, it is possible that I can find a line separating these 2 regions. Now, design of the classifier means I have to decide what should be this line which demarcates between the feature vectors of the objects belonging to class omega_1 and the feature vectors of the objects belonging to class omega_2. So, this side represents my class omega_1 and this side represents my class omega_2.

Now, that can be obtained very easily because, from our school day mathematics, we know that if we have a straight line, then for the points lying on one side of the straight line I will get a value which is greater than 0 and for the points on the other side of the straight line I get a value which is less than 0. So basically, what I have to do is get an equation of this straight line, say g(X); I call it g(X) such that for any point or any vector, say X... let me use some other notation because otherwise it will be confusing. So, let me represent them by superscripts. So, I say that X superscript 1 is the feature vector of an object belonging to class omega_1 and X superscript 2 is the feature vector of an object belonging to class omega_2.

So, I can assume that for all these points X superscript 1, my g(X superscript 1) should be greater than 0, and g(X superscript 2), for all the objects belonging to class omega_2, should be less than 0. So, once I design this g(X) having this sort of property, then for any unknown feature vector X, if I find that g(X) is greater than 0, I immediately infer that X belongs to class omega_1, whereas if g(X) is less than 0, I immediately infer that X belongs to class omega_2, whereas of course g(X) equal to 0 is the boundary case; that means these are the feature vectors which fall on this particular line. And such a function g(X) is called a discriminant function.

So, I can design a discriminant function which basically divides the entire 2 dimensional space into two halves, one half for all objects belonging to class omega_1 and the other half corresponding to the objects belonging to class omega_2, and for any unknown feature vector X, we just find out the value of g(X). If g(X) becomes greater than 0, we decide that this particular X belongs to class omega_1; if g(X) is less than 0, we decide that the particular X belongs to class omega_2.
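
As a toy illustration of such a two-class linear discriminant, here is a hedged sketch; the weights shown are made up for the example and would in practice be estimated from training feature vectors, for instance with a perceptron or a least squares fit.

```python
import numpy as np

# Hypothetical linear discriminant g(X) = w . X + b for 2-dimensional feature vectors
w = np.array([1.5, -0.8])        # illustrative weights, not from the lecture
b = -0.3                         # illustrative bias

def classify_two_class(x):
    g = np.dot(w, x) + b
    if g > 0:
        return "omega_1"
    elif g < 0:
        return "omega_2"
    return "on the decision boundary"    # g(X) = 0
```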

So, the design consideration is that we have to find out what this g(X) is, and this is what we have to do for a 2 class problem, that means if we have only 2 classes of objects. Now, what happens if we have multiple classes of objects?

(Refer Slide Time: 34:57)

So, suppose we have M number of classes of different objects; in such a case, the usual practice is that for every class, you design a discriminant function. That means I have a set of discriminant functions g_1(X), g_2(X) and so on, up to g_M(X), that is M number of discriminant functions, and in such a case, our decision rule will be something like this.

The decision rule is: if I find that for a feature vector X, g_i(X), where g_i(X) is the discriminant function for class omega_i, is greater than g_j(X) for all j, with j not equal to i obviously; in that case, I decide that this feature vector, or the object having the feature vector X, belongs to class omega_i, and here you find that the decision boundary between these 2 classes, omega_i and omega_j, is given by g_i(X) minus g_j(X) equal to 0, because obviously for all the points lying on the decision boundary, which is the boundary between the classes omega_i and omega_j, the corresponding functional values have to be the same, that is, g_i(X) should be the same as g_j(X). And the usual practice is that you design a discriminant function for every pair of classes; that means we design g_ij(X), which is nothing but g_i(X) minus g_j(X), and here you find that for a feature vector X, I may find that g_ij(X) is greater than 0.

(Refer Slide Time: 37:27)

This clearly says that I must have g_i(X) greater than g_j(X). So, if I find that g_ij(X) is greater than 0, I immediately infer that X belongs to class omega_i, whereas if g_ij(X) is less than 0, I immediately infer that X belongs to class omega_j. So, what we have to do in this case is, for every pair of classes, generate a discriminant function of the form g_ij(X). Now, this particular functional form gives us a basis for designing a number of classifiers.
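
A brief sketch of this multi-class decision rule, assuming the M discriminant functions are supplied as a list of Python callables; the structure is purely illustrative.

```python
def classify_multiclass(x, discriminants):
    """Assign x to the class omega_i whose discriminant g_i(x) is the largest."""
    values = [g(x) for g in discriminants]        # g_1(x), ..., g_M(x)
    return values.index(max(values)) + 1          # 1-based class index i

def pairwise_decision(x, g_i, g_j):
    """Decide between omega_i and omega_j from the sign of g_ij(x) = g_i(x) - g_j(x)."""
    return "omega_i" if g_i(x) - g_j(x) > 0 else "omega_j"
```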

(Refer Slide Time: 38:27)

So, let us discuss one such basic classifier, which is called the minimum distance classifier. So, what is this minimum distance classifier? Suppose I have capital M number of object classes and, for the objects belonging to a particular class, I have a set of feature vectors. So, what I say is that every class, say every i'th class, is represented by a mean feature vector, and the mean feature vector of object class omega_i is obtained as m_i equal to 1 upon N_i times the summation of all the feature vectors X, for all X belonging to class omega_i. So, what it says is I have N_i number of feature vectors of the objects belonging to class omega_i.

So, if I take the mean of all those feature vectors, then the mean vector, that is m_i, is a representative of class omega_i. So, this I do for each and every class. So, when I have capital M number of classes, I will have capital M number of such representative mean vectors and these are the representatives of the different classes. Then, what I do is, given any unknown vector X, I find out the Euclidean norm of the difference between this vector X and these mean representatives of the classes.

So, I just find out D_i(X), which is the norm of X minus m_i. So, this gives me the Euclidean norm, or the distance, of this feature vector X from the representative of the i'th class. So, I will say that this particular feature vector X belongs to that particular class whose representative, the mean vector, has the minimum distance from this particular vector X.

So, when I have representatives like this, say this is m_1, this may be m_2, this may be m_3, this may be m_4; so these are the representatives of 4 different classes, and if I have a feature vector X somewhere here, then if I compute the Euclidean distance of X from each of these different representatives, I find that the Euclidean distance between X and m_3 is the minimum. So, what I say is that this X belongs to the class represented by m_3. Now, let us see what this Euclidean norm means.

(Refer Slide Time: 41:55)

This Euclidean norm is nothing but the norm of X minus m_i, and if I expand its square, it simply becomes X transpose X minus 2 X transpose m_i plus m_i transpose m_i. So, what I am saying is that this gives my D_i(X). So, if this D_i(X) is minimum for a particular value of i, then X is assigned to that particular class, and that is equivalent to having a discriminant function of the form g_i(X) which is nothing but X transpose m_i minus half of m_i transpose m_i, since the X transpose X term is common to all the classes and can be dropped. So, that can be easily obtained from this particular expression.

So, I will say that this X belongs to that particular class for which the discriminant function g_i(X) is maximum; that is, X is assigned to the class for which g_i(X) gives the maximum numerical value. This particular class of classifier is known as a minimum distance classifier because we are assigning the vector X to the class whose representative gives the minimum distance.
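In symbols, the equivalence just described looks like this (a restatement of the steps above, with x^T x dropped because it is the same for every class):

```latex
D_i^2(\mathbf{x}) = \lVert \mathbf{x} - \mathbf{m}_i \rVert^2
  = \mathbf{x}^{\mathsf T}\mathbf{x}
    - 2\,\mathbf{x}^{\mathsf T}\mathbf{m}_i
    + \mathbf{m}_i^{\mathsf T}\mathbf{m}_i,
\qquad
g_i(\mathbf{x}) = \mathbf{x}^{\mathsf T}\mathbf{m}_i
  - \tfrac{1}{2}\,\mathbf{m}_i^{\mathsf T}\mathbf{m}_i .
```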

(Refer Slide Time: 43:31)

Now, following the same approach, we can have a classifier which is called an optimum classifier or optimum statistical classifier. So, what is this optimum statistical classifier? Here, every feature vector X comes from objects belonging to some particular class omega_i, but the classifier does not know from which class this feature vector X has been generated. So, the classifier, based on its decision rule, has to decide to which particular class the feature vector X is to be assigned.

Now suppose the classifier decides that X should belong to omega_j. So, you find that X has been generated from class omega_i, but the classifier has wrongly decided that it belongs to class omega_j. Once there is such a wrong decision, the classifier incurs a loss. We represent this loss by L_ij; that means the loss incurred for taking a decision in favour of class omega_j when the actual class is omega_i. The optimal classifier is designed based on the concept that the average loss of taking a decision will be minimized. So, how can we represent this average loss?

So, you find that this average loss can be written as r_j(X), which is nothing but the summation, for k equal to 1 to M, that is over all the possible classes, of L_kj into P(omega_k given X). What is this L_kj? L_kj is the loss incurred for taking a decision in favour of class omega_j when the actual class is omega_k. And what is this P(omega_k given X)? P(omega_k given X) is the probability of class omega_k given the feature vector X.

(Refer Slide Time: 46:15)

Now, from basic probability theory, we know that P(a given b) can be written as P(a) into P(b given a) divided by P(b). Using this, you will find that r_j(X), the average loss, can be written as 1 upon P(X) into the summation, for k equal to 1 to capital M, where capital M is the total number of classes, of L_kj into P(X given omega_k) multiplied by P(omega_k).

Now, what is this P(X given omega_k)? P(X given omega_k) is nothing but the probability density function of the feature vectors belonging to class omega_k, and capital P(omega_k) is the probability of occurrence of class omega_k. Because this P(X) is common to all the loss functions r_j(X), it can be removed from the expression. So, once I remove this P(X), r_j(X) becomes the summation, for k equal to 1 to capital M, of L_kj into P(X given omega_k) into P(omega_k). As we have said, the job of the classifier is to take that particular decision for which the average loss is minimum; so the classifier will assign the vector X to that particular class i for which this loss function r_i(X) is minimum.
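Collecting the verbal expressions above into one formula (standard Bayes-risk notation, restated for reference), the average loss for deciding in favour of class omega_j is:

```latex
r_j(\mathbf{x})
  = \sum_{k=1}^{M} L_{kj}\, P(\omega_k \mid \mathbf{x})
  = \frac{1}{p(\mathbf{x})}\sum_{k=1}^{M} L_{kj}\,
      p(\mathbf{x} \mid \omega_k)\, P(\omega_k)
  \;\propto\; \sum_{k=1}^{M} L_{kj}\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k),
```

and X is assigned to the class i for which r_i(X) is minimum.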

Now, usually for practical purposes, we assume that this loss function L_kj, that is the loss incurred for deciding in favour of class omega_j when the actual class is omega_k, is equal to 0 for a correct decision and equal to 1 for a wrong decision.

(Refer Slide Time: 49:03)

So, this L_kj, or I can write it as L_ij, is actually written as 1 minus delta_ij; that means I am taking the decision that the object belongs to class omega_j whereas the object actually belongs to class omega_i. In this particular case, delta_ij is equal to 1 whenever i is equal to j, so that the loss function becomes equal to 0, and delta_ij is equal to 0 whenever i is not equal to j, so that the loss function becomes equal to 1. By making this particular modification, you will find that we can write the average loss r_j(X) as P(X) minus P(X given omega_j) into P(omega_j).

So, in this particular case, the feature vector X will be assigned to the class i for which r_i(X) is minimum, or, equivalently, it will be assigned to class omega_i if we find that P(X given omega_i) into P(omega_i) is greater than P(X given omega_j) into P(omega_j) for every other class omega_j. Whenever such a situation occurs, the feature vector X is assigned to class omega_i, and you find that this is equivalent to having a discriminant function of the form g_i(X) which is equal to P(X given omega_i) into P(omega_i).
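With the 0-1 loss L_kj = 1 - delta_kj, the same decision rule can be summarized as follows (again a restatement of the derivation above, not new material):

```latex
r_j(\mathbf{x})
  = \sum_{k=1}^{M} (1-\delta_{kj})\,
    p(\mathbf{x}\mid\omega_k)\,P(\omega_k)
  = p(\mathbf{x}) - p(\mathbf{x}\mid\omega_j)\,P(\omega_j),
\qquad
\text{decide } \omega_i \ \text{if }\;
p(\mathbf{x}\mid\omega_i)\,P(\omega_i) >
p(\mathbf{x}\mid\omega_j)\,P(\omega_j) \;\;\forall\, j \neq i,
```

so the discriminant function is simply g_i(X) = p(X given omega_i) P(omega_i).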

So, to obtain such a discriminant function, I need to have 2 different probability terms. One of them is the probability density of the feature vectors belonging to class omega_i, and the second term that I must have is the probability of occurrence of class omega_i. Only after having these 2 can I design our required discriminant function g_i(X).

To obtain these 2 probability terms, you have to do a lot of experiments. To simplify the matter, in most cases you assume a particular form of probability density function, and as we have seen, for most applications we normally use the Gaussian probability density function.

(Refer Slide Time: 52:10)

So, I assume a Gaussian probability density function, that is a Gaussian PDF, of the form P(X given omega_i) equal to 1 upon (2 pi) to the power n by 2 into the determinant of C_i to the power half, into the exponential of minus half (X minus m_i) transpose C_i inverse (X minus m_i), where m_i is the mean of the vectors X belonging to class omega_i and C_i is the covariance matrix of all the vectors belonging to class omega_i.

From here, it can be deduced that the discriminant function g_i(X) can be written as ln P(omega_i) minus half ln of the determinant of C_i minus half (X minus m_i) transpose C_i inverse (X minus m_i). If we make a few further simplifications, namely that the covariance matrix is the same for all the classes, that is C_i equal to C, and that all the classes are equally probable, then g_i(X) can be further simplified: it simply becomes minus half into (X minus m_i) transpose C inverse (X minus m_i).
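Written out in LaTeX (with the determinant of C_i shown explicitly), the Gaussian density and the resulting discriminant are:

```latex
p(\mathbf{x}\mid\omega_i) =
  \frac{1}{(2\pi)^{n/2}\,\lvert\mathbf{C}_i\rvert^{1/2}}
  \exp\!\Bigl[-\tfrac{1}{2}
      (\mathbf{x}-\mathbf{m}_i)^{\mathsf T}\mathbf{C}_i^{-1}
      (\mathbf{x}-\mathbf{m}_i)\Bigr],
\qquad
g_i(\mathbf{x}) = \ln P(\omega_i) - \tfrac{1}{2}\ln\lvert\mathbf{C}_i\rvert
  - \tfrac{1}{2}(\mathbf{x}-\mathbf{m}_i)^{\mathsf T}\mathbf{C}_i^{-1}
    (\mathbf{x}-\mathbf{m}_i).
```

With C_i = C for all classes and equal priors, only the last quadratic term varies with i.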

Here you find that such a discriminant function again leads to a particular type of minimum distance classifier, and for that minimum distance classifier we have to take this particular function as the distance function; this is what is known as the Mahalanobis distance. So, just by extending the same concept, we get a probabilistic classifier. This is also called an optimal statistical classifier because it tries to minimize the average loss incurred in taking a particular decision. Now, these types of features can also be used to train a neural network, and we can use a neural network for the recognition purpose; the type of neural network which is most common for recognition is what is called a multi-layer feed forward network.
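Here is a minimal Python sketch of a minimum distance classifier that uses the Mahalanobis distance under the stated assumptions (a covariance matrix C shared by all classes and equal priors). The helper names and the pooled-covariance estimate are illustrative choices, not from the lecture.

```python
import numpy as np

def pooled_covariance(feature_vectors_by_class):
    """Estimate a single covariance matrix C shared by all classes."""
    centered = [v - np.mean(v, axis=0) for v in feature_vectors_by_class]
    return np.cov(np.vstack(centered), rowvar=False)

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance (x - m)^T C^{-1} (x - m)."""
    d = x - mean
    return float(d @ cov_inv @ d)

def classify_mahalanobis(x, means, cov):
    """Assign x to the class whose mean is nearest in Mahalanobis distance."""
    cov_inv = np.linalg.inv(cov)
    distances = [mahalanobis_sq(x, m, cov_inv) for m in means]
    return int(np.argmin(distances))

# Illustrative usage with the same toy data structure as before.
class_0 = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]])
class_1 = np.array([[4.0, 4.2], [3.9, 4.1], [4.1, 3.8]])
means = [np.mean(c, axis=0) for c in [class_0, class_1]]
cov = pooled_covariance([class_0, class_1])
print(classify_mahalanobis(np.array([3.7, 4.0]), means, cov))  # expected: 1
```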

(Refer Slide Time: 55:30)

So, here it is something like this: in each of these layers, we have a number of neurons. This is called the input layer and this is the output layer. The number of neurons in the output layer is the same as the number of classes that we have, the number of neurons in the input layer is the same as the dimensionality of the feature vectors, and there are one or more hidden layers. So, these are the hidden layers, and from every layer, the neurons are connected to the neurons of the next layer through some connection weights, something like this.

So, what you can do is train this particular neural network using some feature vectors whose class membership is known, so that these weights are adjusted properly; once the neural network is trained, then for any given unknown feature vector, the neural network will be able to classify that feature vector into one of the M different classes. While training, suppose we feed a feature vector X which belongs to class omega_i; we know that if the input is from class omega_i, I should get an output which is say t_i, but actually I get something other than t_i.

So, in that case, what I have is an error, and based on that error, the error information is propagated backwards to adjust all these connection weights; that is why this is also known as backpropagation learning. That is, when you train the neural network, what you use is the error back propagation concept. We will not go into the details of this neural network approach.
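For completeness, here is a very small, self-contained sketch of the backpropagation idea for a one-hidden-layer feed forward network. It is a toy illustration under simple assumptions (sigmoid units, squared error, full-batch gradient descent); all names and values are made up for the example and are not from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy data: 4 feature vectors (2-D) with one-hot targets t_i for 2 classes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

# Connection weights: input -> hidden and hidden -> output.
W1 = rng.normal(scale=0.5, size=(2, 4))
W2 = rng.normal(scale=0.5, size=(4, 2))
lr = 0.5

for epoch in range(5000):
    # Forward pass through the layers.
    H = sigmoid(X @ W1)              # hidden activations
    Y = sigmoid(H @ W2)              # network outputs
    # Error at the output, propagated backwards to adjust the weights.
    dY = (Y - T) * Y * (1 - Y)       # output-layer delta (squared-error loss)
    dH = (dY @ W2.T) * H * (1 - H)   # hidden-layer delta
    W2 -= lr * H.T @ dY
    W1 -= lr * X.T @ dH

# Predicted class indices after training.
print(np.argmax(sigmoid(sigmoid(X @ W1) @ W2), axis=1))
```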

So, typically, these are the different approaches which can be used for the object recognition purpose. Of course, there are other schemes; for example, we can represent an object in the form of a graph and go for graph matching techniques for recognition of the object. With this, we come to the end of today's lecture.

(Refer Slide Time: 58:07)

Now, let us see some of the questions based on today's lecture. The first question is: what is the order of a shape number? How do you define the degree of similarity between 2 shapes? The mean feature vectors of 4 different object classes are located at the 4 vertices of a unit square; draw the decision boundaries of a minimum distance classifier. Then, define the Mahalanobis distance. Why is the training procedure of a feed forward multi-layer neural network termed back propagation learning? And, what is the difference between supervised learning and unsupervised learning?

So, with this, we come to the end of our video lecture on digital image processing and I hope that
you will find this material quite useful.

Thank you.
