PLANT CLASSIFICATION
OBJECT TRACKING IN VIDEO
Prepared By,
Dr. V. Sathiesh Kumar,
Assistant Professor,
Department of Electronics Engineering,
MIT, Anna University.
Ph: 044-22516238
Email: sathiieesh@gmail.com
www.sathieshkumar.com
Department of Electronics Engineering, MIT 27th and 28th January 2017 Page 1
Two Days Hands-on Training in "Image Processing, Computer Vision and
Machine Learning (IPCVML-2017)"
Objectives:
Step 3: Run the python script (load_display_save.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python load_display_save.py
Step 3: Run the python script (load_display_save.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python load_display_save.py --image new.jpeg
or
$ python load_display_save.py -i new.jpeg
Inference:
This lesson covers what a pixel is, how pixels are used to form an image, and how to access and manipulate pixels in OpenCV.
Objectives:
What is a pixel?
Pixels are the raw building blocks of an image.
Every image consists of a set of pixels.
There is no finer granularity than the pixel.
Normally, a pixel is considered the "color" or the "intensity" of light that appears in a given place in our image.
If we think of an image as a grid, each square in the grid contains a single pixel.
If the image has a resolution of 600 x 450, it is 600 pixels wide and 450 pixels tall.
Overall, there are 600 x 450 = 270,000 pixels in our image.
Most pixels are represented in two ways: grayscale and color.
In a grayscale image, each pixel has a value between 0 and 255, where 0 corresponds to "black" and 255 to "white". The values in between are varying shades of gray: values closer to 0 are darker, and values closer to 255 are lighter.
Color pixels, however, are normally represented in the RGB color space (one value for the Red
component, one for Green, and one for Blue, leading to a total of 3 values per pixel).
Each of the three Red, Green, and Blue components is represented by an integer in the range 0 to 255, which indicates how "much" of the color there is.
Given that each value only needs to be in the range [0, 255], we normally use an 8-bit unsigned integer to represent each color intensity.
We then combine these values into a RGB tuple in the form (red, green, blue).
To construct a white color, we would fill each of the red, green, and blue buckets completely up
(255, 255, 255), since white is the presence of all color.
Then, to create a black color, we would empty each of the buckets out (0, 0, 0), since black is the
absence of color.
To create a pure red color, we would fill the red bucket (and only the red bucket) completely (255, 0, 0).
For your reference, here are some common colors represented as RGB tuples:
Black: (0, 0, 0)
White: (255, 255, 255)
Red: (255, 0, 0)
Green: (0, 255, 0)
Blue: (0, 0, 255)
Aqua: (0, 255, 255)
Fuchsia: (255, 0, 255)
Maroon: (128, 0, 0)
Navy: (0, 0, 128)
Olive: (128, 128, 0)
Purple: (128, 0, 128)
Teal: (0, 128, 128)
Yellow: (255, 255, 0)
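The RGB tuples above can be sketched directly as NumPy pixel values. This is a minimal illustration (the variable names are our own, and the tiny "image" here is made up), showing that a color pixel is just three unsigned 8-bit values:

```python
import numpy as np

# A tiny 1x3 "image" holding red, green, and blue pixels in RGB order.
# Note: OpenCV itself stores the channels as BGR, as covered later.
palette = np.array([[(255, 0, 0),     # red
                     (0, 255, 0),     # green
                     (0, 0, 255)]],   # blue
                   dtype="uint8")

white = np.array([255, 255, 255], dtype="uint8")  # all buckets full
black = np.array([0, 0, 0], dtype="uint8")        # all buckets empty

print(palette.shape)  # (1, 3, 3): 1 row, 3 columns, 3 channels
```

The shape tuple makes the grid analogy concrete: rows, then columns, then the three color channels per pixel.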
The point (0, 0) corresponds to the upper left corner of the image. As we move down and to the
right, both the x and y values increase.
Here we have the letter "I" on a piece of graph paper. We see that we have an 8 x 8 grid with 64 total pixels.
The point at (0, 0) corresponds to the top left pixel in our image, whereas the point (7, 7)
corresponds to the bottom right corner. It is important to note that we are counting from zero rather
than one.
The Python language is zero indexed, meaning that we always start counting from zero.
From there, we are given a tuple representing the Red, Green, and Blue components of the image.
However, it's important to note that OpenCV stores RGB channels in reverse order: while we normally think in terms of Red, Green, and Blue, OpenCV actually stores them in the order of Blue, Green, and Red.
# images are just NumPy arrays. The top-left pixel can be found at (0, 0)
(b, g, r) = image[0, 0]
print "Pixel at (0, 0) - Red: {r}, Green: {g}, Blue: {b}".format(r=r, g=g, b=b)
# now, let's change the value of the pixel at (0, 0) and make it red
image[0, 0] = (0, 0, 255)
(b, g, r) = image[0, 0]
print "Pixel at (0, 0) - Red: {r}, Green: {g}, Blue: {b}".format(r=r, g=g, b=b)
cv2.imshow("Original-RedDot@0,0", image)
# compute the center of the image, which is simply the width and height divided by two
(cX, cY) = (w / 2, h / 2)
# since we are using NumPy arrays, we can apply slicing and grab large chunks in image
# Top left corner
tl = image[0:cY, 0:cX]
cv2.imshow("Top-Left Corner", tl)
# in a similar fashion, let's grab the top-right, bottom-right, and bottom-left corners and display
tr = image[0:cY, cX:w]
br = image[cY:h, cX:w]
bl = image[cY:h, 0:cX]
# now let's make the top-left corner of the original image green
image[0:cY, 0:cX] = (0, 255, 0)
Step 3: Run the python script (getting_and_setting.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python getting_and_setting.py --image new.jpeg
or
$ python getting_and_setting.py -i new.jpeg
Inference:
Objectives:
The main objective of this lesson is to become familiar with the cv2.line, cv2.rectangle and
cv2.circle functions.
Program 3:
Step 1: Write the code in Text Editor
# import the necessary packages
import numpy as np
import cv2
# initialize our canvas as 300x300 pixels with 3 channels (stored BGR in OpenCV) and a black background
canvas = np.zeros((300, 300, 3), dtype="uint8")
# draw a green line from the top-left corner of our canvas to the bottom-right
green = (0, 255, 0)
cv2.line(canvas, (0, 0), (300, 300), green)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)
# draw a 3 pixel thick red line from the top-right corner to the bottom-left
red = (0, 0, 255)
cv2.line(canvas, (300, 0), (0, 300), red, 3)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)
# draw a green 50x50 pixel square, starting at 10x10 and ending at 60x60
cv2.rectangle(canvas, (10, 10), (60, 60), green)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)
# draw another rectangle, this time we'll make it red and 5 pixels thick
cv2.rectangle(canvas, (50, 200), (200, 225), red, 5)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)
# let's draw one last rectangle: blue and filled in by specifying -1 as the thickness
blue = (255, 0, 0)
cv2.rectangle(canvas, (200, 50), (225, 125), blue, -1)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)
# reset our canvas and draw a white circle at the center of the canvas with
# increasing radii - from 25 pixels to 150 pixels
# loop over a number of radius values, starting from 0 and ending at 150, incrementing by
# 25 at each step.
# xrange's stopping value is exclusive, so we specify 175 rather than 150; the last value
# xrange produces is 150, and 175 itself is never included.
canvas = np.zeros((300, 300, 3), dtype="uint8")
(centerX, centerY) = (canvas.shape[1] / 2, canvas.shape[0] / 2)
white = (255, 255, 255)
for r in xrange(0, 175, 25):
    cv2.circle(canvas, (centerX, centerY), r, white)
Step 3: Run the python script (drawing.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python drawing.py
Inference:
Program 4:
Step 1: Write the code in Text Editor
# import the necessary packages
# user-created library "imutils" contains a handful of "convenience" methods to more easily
# perform common tasks like translation, rotation, and resizing (and with less code).
import numpy as np
import argparse
import imutils
import cv2
# now, let's shift the image 50 pixels to the left and 90 pixels up; we
# accomplish this using negative values
M = np.float32([[1, 0, -50], [0, 1, -90]])
shifted = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
cv2.imshow("Shifted Up and Left", shifted)
# finally, let's use our helper function in imutils to shift the image down 100 pixels
shifted = imutils.translate(image, 0, 100)
cv2.imshow("Shifted Down", shifted)
cv2.waitKey(0)
Let us define a "translate" convenience function in the "imutils.py" package that takes care of this for us:
# import the necessary packages
import numpy as np
import cv2
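The body of the translate helper is not reproduced above. As a hedged sketch, here is a pure-NumPy stand-in that produces the same zero-filled shift; the real imutils.translate builds the 2x3 matrix M = [[1, 0, x], [0, 1, y]] and calls cv2.warpAffine, exactly as in the snippet above:

```python
import numpy as np

def translate(image, x, y):
    """Shift `image` x pixels right and y pixels down (negative values
    shift left/up), filling the vacated area with zeros. A NumPy-only
    stand-in for the cv2.warpAffine version described in the text."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    # Compute the overlapping source and destination windows.
    src_x0, dst_x0 = max(0, -x), max(0, x)
    src_y0, dst_y0 = max(0, -y), max(0, y)
    ww, hh = w - abs(x), h - abs(y)
    if ww > 0 and hh > 0:
        out[dst_y0:dst_y0 + hh, dst_x0:dst_x0 + ww] = \
            image[src_y0:src_y0 + hh, src_x0:src_x0 + ww]
    return out

img = np.arange(16, dtype="uint8").reshape(4, 4)
shifted = translate(img, 1, 0)   # one pixel to the right
```

Calling translate(img, 0, 100) would reproduce the "Shifted Down" result from the program above.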
Step 3: Run the python script (translation.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Installing imutils package ($ pip install imutils)
Accessing the gurus virtual environment (imutils is preinstalled)
$ workon gurus
$ python translation.py -i new.jpeg
or
$ python translation.py --image new.jpeg
Inference:
1.4.2: ROTATION
Rotation means rotating an image by some angle θ. Rotation by an angle θ can be defined by constructing a matrix M of the form:
M = [ cos θ   -sin θ ]
    [ sin θ    cos θ ]
Given an (x, y)-Cartesian plane, this matrix can be used to rotate a vector Θ degrees (counter-
clockwise) about the origin.
In this case, the origin is normally the center of the image; however, in practice we can define any
arbitrary (x, y) coordinate as our rotation center.
The rotated image R is then obtained by applying M to the pixel coordinates of the original image I.
However, OpenCV also provides the ability to (1) scale (i.e. resize) an image and (2) provide an
arbitrary rotation center to perform the rotation about.
Our modified rotation matrix M is thus,
M = [  α    β    (1 - α) · cx - β · cy ]
    [ -β    α    β · cx + (1 - α) · cy ]
where α = scale · cos θ, β = scale · sin θ, and (cx, cy) are the (x, y)-coordinates about which the rotation is performed.
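The modified matrix can be built by hand in NumPy directly from the formula and sanity-checked; cv2.getRotationMatrix2D computes this same 2x3 matrix for you, so the function below is only a sketch for understanding (the helper name is ours):

```python
import numpy as np

def rotation_matrix(cx, cy, angle_deg, scale=1.0):
    """Build the 2x3 affine rotation matrix M from the formula above."""
    theta = np.deg2rad(angle_deg)
    alpha = scale * np.cos(theta)
    beta = scale * np.sin(theta)
    return np.array([
        [alpha,  beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])

M = rotation_matrix(0, 0, 90)          # rotate 90 degrees about the origin
p = M @ np.array([1.0, 0.0, 1.0])      # apply M to the point (1, 0)
```

Two useful checks: the rotation center is a fixed point (it maps to itself), and in image coordinates, where y grows downward, the point (1, 0) maps to (0, -1) for a 90-degree rotation.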
Program 5:
Step 1: Write the code in Text Editor
# import the necessary packages
import numpy as np
import argparse
import imutils
import cv2
# grab the dimensions of the image and calculate the center of the image
(h, w) = image.shape[:2]
(cX, cY) = (w / 2, h / 2)
# rotate our image by 45 degrees (counter clockwise rotation), scale value of 1.0
# scale value of 2.0, the image will be doubled in size
# scale value of 0.5, the image will be half the original size
# If you want the entire image to fit into view after the rotation you'll need to modify the width
# and height, denoted as (w, h) in the cv2.warpAffine function.
M = cv2.getRotationMatrix2D((cX, cY), 45, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
cv2.imshow("Rotated by 45 Degrees", rotated)
# rotate our image around an arbitrary point rather than the center
M = cv2.getRotationMatrix2D((cX - 50, cY - 50), 45, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
cv2.imshow("Rotated by Offset & 45 Degrees", rotated)
# finally, let's use our helper function in imutils to rotate the image by 180 degrees (flipping it
# upside down)
rotated = imutils.rotate(image, 180)
cv2.imshow("Rotated by 180 Degrees", rotated)
cv2.waitKey(0)
Let's reduce the amount of code we have to write and define our own custom "rotate" method in the "imutils.py" package.
Step 3: Run the python script (rotation.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Installing imutils package ($ pip install imutils)
Accessing the gurus virtual environment (imutils is preinstalled)
$ workon gurus
$ python rotation.py -i new.jpeg
or
$ python rotation.py --image new.jpeg
Inference:
1.4.3: RESIZING
Scaling, or simply resizing, is the process of increasing or decreasing the size of an image in terms
of width and height.
When resizing an image, it's important to keep in mind the aspect ratio (the ratio of the image's width to its height).
Ignoring the aspect ratio can lead to resized images that look compressed and distorted.
The formal definition of interpolation is "the method of constructing new data points within the range of a discrete set of known points."
In this case, the ―known points‖ are the pixels of our original image.
And the goal of an interpolation function is to take these neighborhoods of pixels and use them to either increase or decrease the size of the image.
In general, it's far more beneficial (and visually appealing) to decrease the size of the image.
This is because the interpolation function simply has to remove pixels from an image.
On the other hand, if we were to increase the size of the image, the interpolation function would have to "fill in the gaps" between pixels that previously did not exist.
Objectives:
The primary objective of this topic is to understand how to resize an image using the OpenCV library.
Interpolation Methods:
The goal of an interpolation function is to examine neighborhoods of pixels and use these neighborhoods to increase or decrease the size of the image without introducing distortions (or at least as few distortions as possible).
The first method is nearest-neighbor interpolation, specified by the cv2.INTER_NEAREST flag. This method is the simplest approach to interpolation. Instead of calculating weighted averages of neighboring pixels or applying complicated rules, this method simply finds the "nearest" neighboring pixel and assumes its intensity value. While this method is fast and simple, the quality of the resized image tends to be quite poor and can lead to "blocky" artifacts.
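Nearest-neighbor interpolation is simple enough to sketch in a few lines of pure NumPy. This is an illustrative toy, not what cv2.resize actually runs internally, and the tiny test image is made up; note how upsampling a 2x2 checkerboard simply copies pixels into blocks:

```python
import numpy as np

def resize_nearest(image, new_w, new_h):
    """Nearest-neighbor resize: each output pixel simply copies the
    closest input pixel. Fast, but prone to 'blocky' artifacts."""
    h, w = image.shape[:2]
    # Map every output coordinate back to its nearest source coordinate.
    ys = np.arange(new_h) * h // new_h
    xs = np.arange(new_w) * w // new_w
    return image[ys][:, xs]

img = np.array([[0, 255],
                [255, 0]], dtype="uint8")
big = resize_nearest(img, 4, 4)   # upsample 2x: each pixel becomes a 2x2 block
```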
Secondly, we have the cv2.INTER_LINEAR method, which performs bilinear interpolation (y = mx + c). OpenCV uses this method by default when resizing images. It takes neighboring pixels and uses this neighborhood to actually calculate what the interpolated value should be (rather than just assuming the nearest pixel value).
Other methods are cv2.INTER_AREA, cv2.INTER_CUBIC and cv2.INTER_LANCZOS4 interpolation
methods.
The cv2.INTER_CUBIC and cv2.INTER_LANCZOS4 methods are slower (since they no longer use simple linear interpolation, instead using splines) and interpolate over square pixel neighborhoods. The cv2.INTER_CUBIC method operates on a 4 x 4 pixel neighborhood and cv2.INTER_LANCZOS4 operates over an 8 x 8 pixel neighborhood.
When increasing (upsampling) the size of an image, consider using cv2.INTER_LINEAR and cv2.INTER_CUBIC. The cv2.INTER_LINEAR method tends to be slightly faster than the cv2.INTER_CUBIC method, but go with whichever one gives the best results for your images.
When decreasing (downsampling) the size of an image, the OpenCV documentation suggests using cv2.INTER_AREA, although this method is very similar to nearest-neighbor interpolation. In either case, decreasing the size of an image (in terms of quality) is always an easier task than increasing it.
Finally, as a general rule, the cv2.INTER_LINEAR interpolation method is recommended as the default for upsampling or downsampling, because it simply provides the highest-quality results at a modest computational cost.
Program 6:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import imutils
import cv2
# we need to keep in mind aspect ratio so the image does not look skewed or distorted
# we calculate the ratio of the new image to the old image.
# Let's make our new image have a width of 150 pixels
# Aspect ratio=width/height
# In order to compute the ratio of the new width to the old width, we simply define our ratio r to
# be the new width (150 pixels) divided by the old width, which we access using image.shape[1]
# Now that we have our ratio, we can compute the new dimensions of the image.
# The height is then computed by multiplying the old height by our ratio and converting it to an
# integer. By performing this operation we are able to preserve the original aspect ratio of the
#image.
r = 150.0 / image.shape[1]
dim = (150, int(image.shape[0] * r))
# The last parameter is our interpolation method, which is the algorithm working behind the
# scenes to handle how the actual image is resized.
resized = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
cv2.imshow("Resized (Width)", resized)
# what if we wanted to adjust the height of the image? We can apply the same concept, again
# keeping in mind the aspect ratio, but instead calculating the ratio based on height -- let's make
# the height of the resized image 50 pixels
# The new width is obtained by multiplying the old width by the ratio, again allowing us to
#maintain the original aspect ratio of the image.
r = 50.0 / image.shape[0]
dim = (int(image.shape[1] * r), 50)
# of course, calculating the ratio each and every time we want to resize an image is a real pain
# let's create a function where we can specify our target width or height, and have it take care of
# the rest for us.
resized = imutils.resize(image, width=100)
or
resized = imutils.resize(image, height=50)
cv2.imshow("Resized via Function", resized)
cv2.waitKey(0)
# increase the size of the image by 3x using the current interpolation method
resized = imutils.resize(image, width=image.shape[1] * 3, inter=method)
cv2.imshow("Method: {}".format(name), resized)
cv2.waitKey(0)
Let's reduce the amount of code we have to write and define our own custom "resize" method in the "imutils.py" package.
# if both the width and height are None, then return the original image
if width is None and height is None:
    return image
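Only the first check of that helper is shown above. The remaining dimension logic can be sketched as a plain-Python function; the name resized_dims is ours (not the real imutils API), and a full resize helper would finish with cv2.resize(image, dim, interpolation=inter):

```python
def resized_dims(shape, width=None, height=None):
    """Given an image shape (h, w, ...), compute the (w, h) tuple that
    preserves the aspect ratio for a target width OR height.
    Returns None when neither is given (caller keeps the original)."""
    (h, w) = shape[:2]
    if width is None and height is None:
        return None
    if width is None:
        r = height / float(h)          # ratio of new height to old height
        return (int(w * r), height)
    r = width / float(w)               # ratio of new width to old width
    return (width, int(h * r))

dim = resized_dims((450, 600), width=150)   # the 600x450 example image
```

For the running 600 x 450 example, a target width of 150 gives (150, 112), matching the ratio computation in Program 6.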
Step 3: Run the python script (resize.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Installing imutils package ($ pip install imutils)
Accessing the gurus virtual environment (imutils is preinstalled)
$ workon gurus
$ python resize.py -i new.jpeg
Inference:
1.4.4: FLIPPING
OpenCV also provides methods to flip an image across its x-axis, its y-axis, or both.
Flipping operations are used less often.
Objectives:
In this lesson you will learn how to horizontally and vertically flip an image using the cv2.flip function.
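cv2.flip takes a flip code: 1 flips horizontally, 0 flips vertically, and -1 flips across both axes. The same results can be previewed with plain NumPy slicing (a sketch for intuition only, using a made-up 2x2 array):

```python
import numpy as np

img = np.array([[1, 2],
                [3, 4]], dtype="uint8")

horizontal = img[:, ::-1]   # like cv2.flip(img, 1): mirror left-right
vertical = img[::-1, :]     # like cv2.flip(img, 0): mirror top-bottom
both = img[::-1, ::-1]      # like cv2.flip(img, -1): flip both axes
```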
Program 7:
Step 3: Run the python script (flipping.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python flipping.py -i new.jpeg
or
$ python flipping.py --image new.jpeg
Inference:
1.4.5: CROPPING
Cropping is the act of selecting and extracting the part of the image we are interested in, commonly called the Region of Interest (or simply, ROI).
When we crop an image, we remove the outer parts of the image that we are not interested in.
Example: In a face detection application, we would want to crop the face from an image.
And if we were developing a Python script to recognize dogs in images, we may want to crop the
dog from the image once we have found it.
Objectives:
Our primary objective is to become very familiar and comfortable using NumPy array slicing to crop
regions from an image.
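Because images are NumPy arrays, cropping an ROI is nothing more than array slicing of the form image[startY:endY, startX:endX]. A minimal sketch (the coordinates and the synthetic "image" below are made up for illustration):

```python
import numpy as np

# A stand-in 100x100 grayscale "image".
image = np.arange(100 * 100, dtype="uint8").reshape(100, 100)

# Crop the ROI spanning y in [20, 60) and x in [30, 80).
# NumPy slicing is [rows, cols], i.e. [y-range, x-range].
roi = image[20:60, 30:80]
```

Note the order: the y-range comes first because NumPy indexes rows before columns, which trips up many beginners.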
Objectives:
1. To familiarize ourselves with image addition and subtraction.
2. To understand the difference between OpenCV and NumPy image arithmetic operations.
Program 9:
# images are NumPy arrays, stored as unsigned 8 bit integers -- this implies that the values of
# our pixels will be in the range [0, 255]; when using functions like cv2.add and cv2.subtract,
# values will be clipped to this range, even if the added or subtracted values fall outside the
# range of [0, 255]. Check out an example:
print "max of 255: " + str(cv2.add(np.uint8([200]), np.uint8([100])))
print "min of 0: " + str(cv2.subtract(np.uint8([50]), np.uint8([100])))
# NOTE: if you use NumPy arithmetic operations on these arrays, the values will wrap
# around (modulo 256) instead of being clipped to the [0, 255] range. This is important
# to keep in mind when working with images.
print "wrap around: " + str(np.uint8([200]) + np.uint8([100]))
print "wrap around: " + str(np.uint8([50]) - np.uint8([100]))
# let's increase the intensity of all pixels in our image by 100 -- we accomplish this by
# constructing a NumPy array that is the same size of our matrix (filled with ones) and the
# multiplying it by 100 to create an array filled with 100's, then we simply add the images
# together; notice how the image is "brighter"
M = np.ones(image.shape, dtype = "uint8") * 100
added = cv2.add(image, M)
cv2.imshow("Added", added)
# similarly, we can subtract 50 from all pixels in our image and make it darker
M = np.ones(image.shape, dtype = "uint8") * 50
subtracted = cv2.subtract(image, M)
cv2.imshow("Subtracted", subtracted)
cv2.waitKey(0)
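The clipping behavior of cv2.add and cv2.subtract can be reproduced in plain NumPy by computing in a wider dtype and clipping back. A sketch, useful for seeing exactly where the two behaviors diverge:

```python
import numpy as np

a = np.uint8([200])
b = np.uint8([100])

# Plain NumPy uint8 arithmetic wraps around: 200 + 100 = 300 -> 300 % 256 = 44.
wrapped = a + b

# Widen to int16 first, then clip back to [0, 255], like cv2.add / cv2.subtract.
clipped = np.clip(a.astype("int16") + b, 0, 255).astype("uint8")
floored = np.clip(np.uint8([50]).astype("int16") - b, 0, 255).astype("uint8")
```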
Step 3: Run the python script (arithmetic.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python arithmetic.py -i new.jpeg
or
$ python arithmetic.py --image new.jpeg
Inference:
Objectives:
By the end of this topic you'll understand the four primary bitwise operations:
1. AND
2. OR
3. XOR
4. NOT
Program 10:
# secondly, let's draw a circle, centered at the center of the image, with a radius of 150 pixels
circle = np.zeros((300, 300), dtype = "uint8")
cv2.circle(circle, (150, 150), 150, 255, -1)
cv2.imshow("Circle", circle)
# A bitwise 'AND' is only True when both rectangle and circle have a value that is 'ON.'
# Simply put, the bitwise AND function examines every pixel in rectangle and circle.
# If both pixels have a value greater than zero, that pixel is turned 'ON' (i.e set to 255 in the
# output image). If both pixels are not greater than zero, then the output pixel is left 'OFF' with a
# value of 0.
# A bitwise 'OR' examines every pixel in rectangle and circle. If EITHER pixel in rectangle or
# circle is greater than zero, then the output pixel has a value of 255, otherwise it is 0.
bitwiseOr = cv2.bitwise_or(rectangle, circle)
cv2.imshow("OR", bitwiseOr)
cv2.waitKey(0)
# The bitwise 'XOR' is identical to the 'OR' function, with one exception: a pixel is
# turned 'ON' only when EXACTLY ONE of rectangle and circle is greater than 0, not both.
bitwiseXor = cv2.bitwise_xor(rectangle, circle)
cv2.imshow("XOR", bitwiseXor)
cv2.waitKey(0)
# Finally, the bitwise 'NOT' inverts the values of the pixels. Pixels with a value of 255 become 0,
# and pixels with a value of 0 become 255.
bitwiseNot = cv2.bitwise_not(circle)
cv2.imshow("NOT", bitwiseNot)
cv2.waitKey(0)
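Because the masks hold only the values 0 and 255, all four truth tables can be verified with NumPy's bitwise functions. A sketch using 1x4 "images" in place of the rectangle and circle:

```python
import numpy as np

a = np.uint8([0, 0, 255, 255])
b = np.uint8([0, 255, 0, 255])

and_ = np.bitwise_and(a, b)   # ON only where both inputs are ON
or_ = np.bitwise_or(a, b)     # ON where either input is ON
xor_ = np.bitwise_xor(a, b)   # ON where exactly one input is ON
not_a = np.bitwise_not(a)     # inverts: 255 -> 0 and 0 -> 255
```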
Step 3: Run the python script (bitwise.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python bitwise.py
Inference:
1.4.8: MASKING
A combination of bitwise operations and masks is used to construct ROIs that are non-rectangular.
This allows us to extract regions from images that are of completely arbitrary shape.
A mask allows us to focus only on the portions of the image that interest us.
Objectives:
Program 11:
# Masking allows us to focus only on parts of an image that interest us. A mask is the same
# size as our image, but has only two pixel values, 0 and 255. Pixels in the original image
# corresponding to mask values of 0 are discarded, while pixels corresponding to mask values
# of 255 are kept. For example, let's construct a rectangular mask that displays only the
# person in the image
mask = np.zeros(image.shape[:2], dtype="uint8")
cv2.rectangle(mask, (0, 90), (290, 450), 255, -1)
cv2.imshow("Mask", mask)
# Apply our mask -- notice how only the person in the image is cropped out
# The first two parameters are the image itself. Obviously, the AND function will be True for all
# pixels in the image; however, the important part of this function is the mask keyword
# argument. By supplying a mask, the cv2.bitwise_and function only examines pixels that are
# "on" in the mask -- in this case, only the pixels that are part of the white rectangle.
masked = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("Mask Applied to Image", masked)
cv2.waitKey(0)
# Now, let's make a circular mask with a radius of 100 pixels and apply the mask again
mask = np.zeros(image.shape[:2], dtype="uint8")
cv2.circle(mask, (145, 200), 100, 255, -1)
masked = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("Mask", mask)
cv2.imshow("Mask Applied to Image", masked)
cv2.waitKey(0)
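What cv2.bitwise_and(image, image, mask=mask) does can be mimicked with NumPy alone. This is a NumPy-only sketch with a made-up flat gray image; in the programs above the masks come from cv2.rectangle and cv2.circle:

```python
import numpy as np

image = np.full((4, 4, 3), 200, dtype="uint8")   # a flat gray "image"
mask = np.zeros((4, 4), dtype="uint8")
mask[1:3, 1:3] = 255                             # keep only the center 2x2

# Keep each pixel where the mask is ON, zero it out where the mask is OFF.
# mask[:, :, None] adds a channel axis so the 2D mask broadcasts over BGR.
masked = np.where(mask[:, :, None] > 0, image, 0)
```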
Step 3: Run the python script (masking.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python masking.py -i new.jpeg
or
$ python masking.py --image new.jpeg
Inference:
Objectives:
By the end of this topic you should understand how to both split and merge channels of an image by
using the cv2.split and cv2.merge functions.
Program 12:
# Load the image and grab each channel: Red, Green, and Blue. It's important to note that
# OpenCV stores an image as NumPy array with its channels in reverse order! When we call
# cv2.split, we are actually getting the channels as Blue, Green, Red!
image = cv2.imread(args["image"])
(B, G, R) = cv2.split(image)
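cv2.split and cv2.merge are equivalent to channel indexing and stacking on the underlying NumPy array. A quick sketch with a tiny synthetic BGR array (the real programs use the loaded image instead):

```python
import numpy as np

image = np.arange(2 * 2 * 3, dtype="uint8").reshape(2, 2, 3)

# cv2.split(image) returns (B, G, R) because OpenCV stores channels as BGR.
B, G, R = image[:, :, 0], image[:, :, 1], image[:, :, 2]

# cv2.merge([B, G, R]) stitches the individual channels back together.
merged = np.dstack([B, G, R])
```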
cv2.destroyAllWindows()
Step 3: Run the python script (splitting_and_merging.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python splitting_and_merging.py -i new.jpeg
or
$ python splitting_and_merging.py --image new.jpeg
Inference:
As you can see from the above figure, we are sliding this kernel from left-to-right and top-to-bottom
along the original image.
At each (x, y)-coordinate of the original image, we stop and examine the neighborhood of pixels surrounding the center of the kernel.
We can take this neighborhood of pixels, convolve them with the kernel, and we get a single output
value.
This output value is then stored in the output image at the same (x, y)-coordinate as the center of the
kernel.
The kernel looks like:

        [ 1  1  1 ]
K = 1/9 [ 1  1  1 ]
        [ 1  1  1 ]
Above we have defined a square 3 x 3 kernel.
Kernels can be an arbitrary size of M x N pixels, provided that both M and N are odd integers.
Why do both M and N need to be odd?
Take a look at our introduction to kernels above — the kernel must have a center (x, y)-coordinate.
In a 3 x 3 kernel, the center is located at (1, 1), assuming a zero-index array of course.
This is exactly why we use odd kernel sizes — to always ensure there is a valid (x, y)-coordinate at
the center of the kernel.
Convolution:
In image processing, convolution requires three components,
1. An input image.
2. A kernel matrix that we are going to apply to the input image.
3. An output image to store the output of the input image convolved with the kernel.
Convolution itself is very easy and it involves the following steps.
1. Select an (x, y)-coordinate from the original image.
2. Place the center of the kernel at this (x, y) coordinate.
3. Multiply each kernel value by the corresponding input image pixel value, and then take the sum of all multiplication operations. (More simply put, we're taking the element-wise multiplication of the input image region and the kernel, then summing the values of all these multiplications into a single value. The sum of these multiplications is called the kernel output.)
4. Use the same (x, y)-coordinate from Step 1, but this time store the kernel output in the same (x, y)-
location as the output image.
Here is an example of convolving (which is normally denoted mathematically as the * operator) a 3x3
region of an image with a 3x3 kernel:
    | −1 0 1 |   |  93 139 101 |   |  −93   0 101 |
O = | −2 0 2 | * |  26 252 196 | = |  −52   0 392 | → sum = 231
    | −1 0 1 |   | 135 230  18 |   | −135   0  18 |
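The worked example above can be checked numerically. The following sketch reproduces the single-coordinate convolution in NumPy (note that, like OpenCV, it does not flip the kernel, so strictly speaking this computes correlation):

```python
import numpy as np

# the 3x3 kernel and 3x3 image region from the worked example above
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])
region = np.array([[93, 139, 101],
                   [26, 252, 196],
                   [135, 230, 18]])

# convolution at one coordinate: element-wise multiply, then sum
output = (kernel * region).sum()
print(output)  # 231
```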
Inference:
Structuring Element:
Well, you can (conceptually) think of a structuring element as a type of kernel or mask.
However, instead of applying a convolution, we are only going to perform simple tests on the pixels.
Just like in image kernels, the structuring element slides from left-to-right and top-to-bottom for each
pixel in the image.
Also just like kernels, structuring elements can be of arbitrary neighborhood
sizes.
For example, let's take a look at the 4-neighborhood and 8-neighborhood of the central (red) pixel
below:
Here we can see that the central pixel (i.e. the red pixel) is located at the center of the neighborhood.
The 4-neighborhood (left) then defines the region surrounding the central pixel as the pixels to the
north, south, east, and west.
The 8-neighborhood (right) then extends this region to include the corner pixels as well.
This is just an example of two simple structuring elements.
But we could also make them arbitrary rectangle or circular structures as well — it all depends on
your particular application.
In OpenCV, we can either use the cv2.getStructuringElement function or NumPy itself to define our
structuring element.
A structuring element behaves similar to a kernel or a mask, but instead of convolving the input
image with our structuring element, we're instead only going to be applying simple pixel tests.
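As a sketch, the two neighborhoods above can be written directly as NumPy masks; cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3)) and cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)) produce equivalent arrays:

```python
import numpy as np

# 4-neighborhood (cross shape): center plus north, south, east, west
four_neigh = np.array([[0, 1, 0],
                       [1, 1, 1],
                       [0, 1, 0]], dtype=np.uint8)

# 8-neighborhood (full square): the 4-neighborhood plus the corner pixels
eight_neigh = np.ones((3, 3), dtype=np.uint8)

print(four_neigh.sum(), eight_neigh.sum())  # 5 9
```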
Types of morphological operations:
1. Erosion
2. Dilation
3. Opening
4. Closing
5. Morphological gradient
6. Black hat
7. Top hat (or "White hat")
Erosion:
Just like water rushing along a river bank erodes the soil, an erosion in an image "erodes" the
foreground object and makes it smaller.
Simply put, pixels near the boundary of an object in an image will be discarded, "eroding" it away.
Erosion works by defining a structuring element and then sliding this structuring element from left-to-
right and top-to-bottom across the input image.
A foreground pixel in the input image will be kept only if ALL pixels inside the structuring element are
> 0. Otherwise, the pixel is set to 0 (i.e., background).
Erosion is useful for removing small blobs in an image or disconnecting two connected objects.
We can perform erosion by using the cv2.erode function.
Dilation:
The opposite of an erosion is a dilation.
Just like an erosion will eat away at the foreground pixels, a dilation will grow the foreground pixels.
Dilations increase the size of foreground objects and are especially useful for joining broken parts of
an image together.
Dilations, just like erosions, also utilize structuring elements: the center pixel p of the structuring
element is set to white if ANY pixel in the structuring element is > 0.
We apply dilations using the cv2.dilate function.
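The ALL/ANY pixel tests described above can be illustrated for a single pixel in plain NumPy (a minimal sketch of the rule itself, not how cv2.erode and cv2.dilate are implemented internally):

```python
import numpy as np

# a 3x3 neighborhood of a binary image, centered on the pixel being tested
neighborhood = np.array([[255, 255, 0],
                         [255, 255, 255],
                         [255, 255, 255]], dtype=np.uint8)

# erosion: the center pixel survives only if ALL pixels under the
# structuring element are foreground (> 0)
eroded_center = 255 if (neighborhood > 0).all() else 0

# dilation: the center pixel becomes foreground if ANY pixel under the
# structuring element is foreground (> 0)
dilated_center = 255 if (neighborhood > 0).any() else 0

print(eroded_center, dilated_center)  # 0 255
```

The single background pixel in the neighborhood is enough to erode the center away, while the same neighborhood easily passes the dilation test.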
Opening:
An opening is an erosion followed by a dilation.
Performing an opening operation allows us to remove small blobs from an image: first an erosion is
applied to remove the small blobs, then a dilation is applied to regrow the size of the original object.
Closing:
The exact opposite to an opening would be a closing.
A closing is a dilation followed by an erosion.
As the name suggests, a closing is used to close holes inside of objects or for connecting
components together.
Performing the closing operation is again accomplished by making a call to cv2.morphologyEx, but
this time we are going to indicate that our morphological operation is a closing by specifying the
cv2.MORPH_CLOSE flag.
Morphological Gradient:
A morphological gradient is the difference between a dilation and an erosion.
It is useful for determining the outline of a particular object in an image.
Black Hat:
The black hat operation is the difference between the closing of the input image and the input image
itself.
In fact, the black hat operator is simply the opposite of the white hat operator.
Program 13:
# close all windows to clean up the screen and initialize the list of kernels sizes
# kernelSizes variable defines the width and height of the structuring element.
cv2.destroyAllWindows()
cv2.imshow("Original", image)
kernelSizes = [(3, 3), (5, 5), (7, 7)]
# loop over the kernels and apply an "opening" operation to the image
# The cv2.getStructuringElement function requires two arguments: the first is the type of
# structuring element (rectangular-cv2.MORPH_RECT or cross shape-cv2.MORPH_CROSS,
# circular structuring element- cv2.MORPH_ELLIPSE) and the second is the size of the
# structuring element
for kernelSize in kernelSizes:
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    opening = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)
    cv2.imshow("Opening: ({}, {})".format(kernelSize[0], kernelSize[1]), opening)
    cv2.waitKey(0)
# loop over the kernels and apply a "closing" operation to the image
for kernelSize in kernelSizes:
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    closing = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
    cv2.imshow("Closing: ({}, {})".format(kernelSize[0], kernelSize[1]), closing)
    cv2.waitKey(0)
# loop over the kernels and apply a "morphological gradient" operation to the image
for kernelSize in kernelSizes:
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    gradient = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)
    cv2.imshow("Gradient: ({}, {})".format(kernelSize[0], kernelSize[1]), gradient)
    cv2.waitKey(0)
Step 3: Run the python script (morphological.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python morphological.py -i new.jpeg
or
$ python morphological.py --image new.jpeg
Program 14:
# construct a rectangular kernel (w, h) and apply a blackhat operation which enables us to find
# dark regions on a light background
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)
# similarly, a tophat (also called a "whitehat") operation will enable us to find light regions on a
# dark background
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, rectKernel)
# show the output images (tophat-light against dark background are clearly displayed)
# (blackhat-dark against light background are clearly displayed)
cv2.imshow("Original", image)
cv2.imshow("Blackhat", blackhat)
cv2.imshow("Tophat", tophat)
cv2.waitKey(0)
Step 3: Run the python script (hats.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python hats.py -i new.jpeg
or
$ python hats.py --image new.jpeg
Inference:
Averaging:
An average filter does exactly what you think it might do — takes an area of pixels surrounding a
central pixel, averages all these pixels together, and replaces the central pixel with the average.
To accomplish our average blur, we'll actually be convolving our image with an M x N normalized filter,
where M and N are both odd integers.
This kernel is going to slide from left-to-right and from top-to-bottom for each and every pixel in our
input image.
The pixel at the center of the kernel is then set to the average of all pixels in the neighborhood.
Let's go ahead and define a 3 x 3 average kernel that can be used to blur the central pixel using its
3 x 3 neighborhood:
          | 1 1 1 |
K = (1/9) | 1 1 1 |
          | 1 1 1 |
Notice how each entry of the kernel matrix is uniformly weighted — we are giving equal weight to all
pixels in the kernel.
An alternative is to give pixels different weights, where pixels farther from the central pixel contribute
less to the average; this method of smoothing is called Gaussian blurring.
As the size of the kernel increases, so will the amount of blurring applied to the image.
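Convolving with this kernel at a single coordinate is just the plain mean of the neighborhood, which a short sketch makes concrete (the pixel values here are arbitrary illustration data):

```python
import numpy as np

# a 3x3 neighborhood of grayscale pixel values
neighborhood = np.array([[10, 20, 30],
                         [40, 50, 60],
                         [70, 80, 90]], dtype=np.float64)

# the normalized 3x3 averaging kernel K = (1/9) * ones
kernel = np.ones((3, 3)) / 9.0

# convolving with K is the same as taking the plain mean of the neighborhood
blurred_center = (neighborhood * kernel).sum()
print(blurred_center)  # 50.0
```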
Gaussian:
Gaussian blurring is similar to average blurring, but instead of using a simple mean, we are now using
a weighted mean, where neighborhood pixels that are closer to the central pixel contribute more
"weight" to the average.
And as the name suggests, Gaussian smoothing is used to remove noise that approximately follows a
Gaussian distribution.
The end result is that our image is less blurred, but more naturally blurred, than using the average
method.
Furthermore, based on this weighting we‘ll be able to preserve more of the edges in our image as
compared to average smoothing.
Just like average blurring, Gaussian smoothing also uses an M x N kernel, where both M and N
are odd integers.
However, since we are weighting pixels based on how far they are from the central pixel, we need an
equation to construct our kernel.
The equation for a Gaussian function in one direction is:

    G(x) = (1 / √(2πσ²)) · e^(−x² / (2σ²))

It then becomes trivial to extend this equation to two directions, one for the x-axis and the other
for the y-axis, respectively:

    G(x, y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))
where x and y are the respective distances to the horizontal and vertical center of the kernel and σ is
the standard deviation of the Gaussian kernel.
When the size of our kernel increases so will the amount of blurring that is applied to our output
image.
However, the blurring will appear to be more "natural" and will preserve edges in our image better
than simple average smoothing.
A Gaussian blur tends to give much nicer results, especially when applied to natural images.
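The 2D equation can be turned into a kernel directly. A sketch, with the kernel normalized so its weights sum to one (cv2.GaussianBlur builds a similar kernel internally):

```python
import numpy as np

def gaussian_kernel(size, sigma):
    # x and y are distances from the kernel center, per the equation above
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    # normalize so the weights sum to 1 (keeps overall image brightness unchanged)
    return kernel / kernel.sum()

k = gaussian_kernel(5, sigma=1.0)
print(round(float(k.sum()), 6))  # 1.0
print(k[2, 2] == k.max())        # True: the center pixel gets the largest weight
```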
Median:
Traditionally, the median blur method has been most effective when removing salt-and-pepper noise.
When applying a median blur, we first define our kernel size K.
Then, as in the average blurring method, we consider all pixels in the neighborhood of size K x K,
where K is an odd integer.
Notice how, unlike average blurring and Gaussian blurring where the kernel size could be
rectangular, the kernel size for the median must be square.
Furthermore (unlike the averaging method), instead of replacing the central pixel with the average of
the neighborhood, we instead replace the central pixel with the median of the neighborhood.
The reason median blurring is more effective at removing salt-and-pepper style noise from an image
is that each central pixel is always replaced with a pixel intensity that exists in the image.
And since the median is robust to outliers, the salt-and-pepper noise will be less influential to the
median than another statistical method, such as the average.
Again, methods such as averaging and Gaussian compute means or weighted means for the
neighborhood — this average pixel intensity may or may not be present in the neighborhood.
But by definition, the median pixel must exist in our neighborhood.
By replacing our central pixel with a median rather than an average, we can substantially reduce
noise.
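The argument above can be seen on a single neighborhood containing one "salt" outlier (illustration values chosen for the example):

```python
import numpy as np

# a 3x3 neighborhood of similar intensities, corrupted by one "salt" pixel (255)
neighborhood = np.array([52, 50, 51,
                         49, 255, 50,
                         53, 51, 48])

# the mean is dragged upward by the outlier...
print(round(float(neighborhood.mean()), 1))  # 73.2
# ...while the median stays at an intensity that actually exists in the image
print(int(np.median(neighborhood)))          # 51
```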
Bilateral:
Thus far, the intention of our blurring methods has been to reduce noise and detail in an image;
however, as a side effect we have tended to lose edges in the image.
In order to reduce noise while still maintaining edges, we can use bilateral blurring.
Bilateral blurring accomplishes this by introducing two Gaussian distributions.
The first Gaussian function only considers spatial neighbors.
That is, pixels that appear close together in the (x, y)-coordinate space of the image.
The second Gaussian then models the pixel intensity of the neighborhood, ensuring that only pixels
with similar intensity are included in the actual computation of the blur.
Intuitively, this makes sense. If pixels in the same (small) neighborhood have a similar pixel value,
then they likely represent the same object.
But if two pixels in the same neighborhood have contrasting values, then we could be examining the
edge or boundary of an object — and we would like to preserve this edge.
Overall, this method is able to preserve edges of an image, while still reducing noise.
The largest downside to this method is that it is considerably slower than its averaging, Gaussian,
and median blurring counterparts.
Program 15:
# load the image, display it, and initialize the list of kernel sizes
image = cv2.imread(args["image"])
cv2.imshow("Original", image)
kernelSizes = [(3, 3), (9, 9), (15, 15)]
# loop over the kernel sizes and apply an "average" blur to the image
# The larger our kernel becomes, the more blurred our image will appear.
for (kX, kY) in kernelSizes:
    blurred = cv2.blur(image, (kX, kY))
    cv2.imshow("Average ({}, {})".format(kX, kY), blurred)
    cv2.waitKey(0)
# loop over the kernel sizes and apply a "Gaussian" blur to the image
# The last parameter in the cv2.GaussianBlur function is our σ, the standard deviation of the
# Gaussian distribution. By setting this value to 0, we are instructing OpenCV to automatically
# compute σ based on our kernel size. In most cases, you'll want to let your σ be computed for you.
for (kX, kY) in kernelSizes:
    blurred = cv2.GaussianBlur(image, (kX, kY), 0)
    cv2.imshow("Gaussian ({}, {})".format(kX, kY), blurred)
    cv2.waitKey(0)
# loop over the kernel sizes (square kernels) and apply a "Median" blur to the image
for k in (3, 9, 15):
    blurred = cv2.medianBlur(image, k)
    cv2.imshow("Median {}".format(k), blurred)
    cv2.waitKey(0)
Step 3: Run the python script (blurring.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python blurring.py -i new.jpeg
or
$ python blurring.py --image new.jpeg
Inference:
Program 16:
# load the image, display it, and construct the list of bilateral filtering parameters that we are
# going to explore. These parameters correspond to the diameter, sigmaColor, and sigmaSpace
# of the bilateral filter, respectively.
image = cv2.imread(args["image"])
cv2.imshow("Original", image)
params = [(11, 21, 7), (11, 41, 21), (11, 61, 39)]
# loop over the parameter sets and apply bilateral filtering to the image
for (diameter, sigmaColor, sigmaSpace) in params:
    blurred = cv2.bilateralFilter(image, diameter, sigmaColor, sigmaSpace)
    cv2.imshow("Bilateral ({}, {}, {})".format(diameter, sigmaColor, sigmaSpace), blurred)
    cv2.waitKey(0)
Step 3: Run the python script (bilateral.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python bilateral.py -i new.jpeg
or
$ python bilateral.py --image new.jpeg
Inference:
OBJECTIVES
1. Understand the role lighting conditions play in the development of a successful computer vision
system.
2. Discuss the four primary color spaces you'll encounter in computer vision: RGB, HSV, L*a*b*, and
grayscale (which isn't technically a color space, but is used in many computer vision applications).
LIGHTING CONDITIONS
Every computer vision algorithm, application, and system ever developed (and yet to be developed)
depends on the quality of the images input to the system.
We'll certainly be able to make our systems more robust to poor lighting conditions, but we'll never
be able to fully overcome an image that was captured under inferior conditions.
Lighting can mean the difference between success and failure of your computer vision algorithm.
Your lighting conditions should satisfy three primary goals:
1. High Contrast: Maximize the contrast between the Regions of Interest in your image (i.e. the
"objects" you want to detect, extract, classify, manipulate, etc. should have sufficiently high
contrast from the rest of the image so they are easily detectable).
2. Generalizable: Your lighting conditions should be consistent enough that they work well from
one "object" to the next.
3. Stable: Having stable, consistent, and repeatable lighting conditions is the holy grail of computer
vision application development. However, it's often hard (if not impossible) to guarantee; this
is especially true if we are developing computer vision algorithms that are intended to work in
outdoor lighting conditions. As the time of day changes, clouds roll in over the sun, and rain
starts to pour, our lighting conditions will obviously change.
However, this is not exactly the friendliest color space for developing computer vision
applications.
In fact, its primary use is to display colors on a monitor.
But despite how unintuitive the RGB color space may be, nearly all images you'll work with will be
represented (at least initially) in the RGB color space.
HSV MODEL: The HSV color space transforms the RGB color space, remodeling it as a cylinder rather
than a cube.
As we saw in the RGB section, the "white" or "lightness" of a color is an additive combination of the
Red, Green, and Blue components.
But now in the HSV color space, the lightness is given its own separate dimension.
Let's define what each of the HSV components represents:
Hue: Which "pure" color. For example, all shades and tones of the color "red" will have the same
Hue.
Saturation: How much the color is diluted with white. A fully saturated color would be "pure," as in
"pure red," while a color with zero saturation would be pure white.
Value: The Value allows us to control the lightness of our color. A Value of zero would indicate pure
black, whereas increasing the value would produce lighter colors.
It‘s important to note that different computer vision libraries will use different ranges to represent
each of the Hue, Saturation, and Value components.
However, in the case of OpenCV, images are represented as 8-bit unsigned integer arrays. Thus,
the Hue value is defined on the range [0, 179] (for a total of 180 possible values, since [0, 359] is not
possible for an 8-bit unsigned array); the Hue is actually a degree (θ) on the HSV color cylinder. Both
Saturation and Value are defined on the range [0, 255].
The value controls the actual lightness of our color, while both Hue and Saturation define the actual
color and shade.
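The [0, 179] Hue range can be sanity-checked with the standard-library colorsys module, which reports hue as a fraction of 360°; OpenCV stores that angle divided by two so it fits in 8 bits:

```python
import colorsys

# colorsys returns (h, s, v) with h in [0, 1], i.e. a fraction of 360 degrees
h, s, v = colorsys.rgb_to_hsv(0.0, 1.0, 0.0)  # pure green

degrees = h * 360.0              # 120 degrees on the color wheel
opencv_hue = round(degrees / 2)  # OpenCV stores Hue as degrees / 2 to fit 8 bits
print(opencv_hue)                # 60
```

So a pure green pixel ends up with a Hue of 60 in an OpenCV HSV image, not 120.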
The HSV color space is used heavily in computer vision applications — especially if we are
interested in tracking the color of some object in an image.
It's far, far easier to define a valid color range using HSV than it is using RGB.
L*a*b* MODEL:
While the RGB color space is easy to understand (especially when you're first getting started in
computer vision), it's non-intuitive when defining exact shades of a color or specifying a particular
range of colors.
On the other hand, the HSV color space is more intuitive but does not do the best job in representing
how humans see and interpret colors in images.
For example, let's compute the Euclidean distance between the colors red and green; red and
purple; and red and navy in the RGB color space:
>>> import math
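The example above is cut off in the handout; a sketch of how it would continue, assuming the standard RGB triplets red = (255, 0, 0), green = (0, 255, 0), purple = (128, 0, 128), and navy = (0, 0, 128):

```python
import math

def euclidean(c1, c2):
    # straight-line distance between two RGB triplets
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

red, green = (255, 0, 0), (0, 255, 0)
purple, navy = (128, 0, 128), (0, 0, 128)

print(round(euclidean(red, green), 2))   # 360.62
print(round(euclidean(red, purple), 2))  # 180.31
print(round(euclidean(red, navy), 2))    # 285.32
```

Note that these numeric gaps do not track how different the color pairs actually look to us; that mismatch between RGB distance and human perception is the motivation for the L*a*b* color space.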
GRAYSCALE:
Simply the grayscale representation of an RGB image.
The grayscale representation of an image is often referred to as "black and white," but this is not
technically correct.
Grayscale images are single channel images with pixel values in the range [0, 255] (i.e. 256 unique
values).
True black and white images are called binary images and thus only have two possible values: 0 or
255 (i.e. only 2 unique values).
Be careful when referring to a grayscale image as black and white, to avoid this ambiguity.
However, converting an RGB image to grayscale is not as straightforward as you may think.
Biologically, our eyes are more sensitive and thus perceive more green and red than blue.
Thus, when converting to grayscale, each RGB channel is not weighted uniformly, like this:
Y = 0.333·R + 0.333·G + 0.333·B
Instead, we weight each channel differently to account for how much of each color we perceive:
Y = 0.299·R + 0.587·G + 0.114·B
Again, due to the cones and receptors in our eyes, we are able to perceive nearly twice as much
green as red.
And similarly, we notice over twice as much red as blue.
Thus, we make sure to account for this when converting from RGB to grayscale.
The grayscale representation of an image is often used when we have no use for color (such as in
detecting faces or building object classifiers where the color of the object does not matter).
Discarding color thus allows us to save memory and be more computationally efficient.
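The weighted formula can be checked directly; these are the classic ITU-R BT.601 luma weights, which is also what cv2.cvtColor uses for cv2.COLOR_BGR2GRAY:

```python
def rgb_to_gray(r, g, b):
    # perceptual weights: green contributes most, blue least; the weights sum to 1
    return 0.299 * r + 0.587 * g + 0.114 * b

print(round(rgb_to_gray(255, 255, 255), 3))  # 255.0: white stays white
print(round(rgb_to_gray(255, 0, 0), 3))      # 76.245: pure red maps to a fairly dark gray
```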
Program 17:
import cv2
Step 3: Run the python script (colorspaces.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python colorspaces.py -i new.jpeg
or
$ python colorspaces.py --image new.jpeg
Inference:
OBJECTIVES:
1. Be able to define what thresholding is.
2. Understand simple thresholding and why a threshold value T must be manually provided.
3. Grasp Otsu's thresholding method.
4. Comprehend the importance of adaptive thresholding and why it's useful in situations where lighting
conditions cannot be controlled.
WHAT IS THRESHOLDING?
Thresholding is the binarization of an image.
In general, we seek to convert a grayscale image to a binary image, where the pixels are either 0 or
255.
A simple thresholding example would be selecting a threshold value T, and then setting all pixel
intensities less than T to zero, and all pixel values greater than T to 255.
In this way, we are able to create a binary representation of the image.
Normally, we use thresholding to focus on objects or areas of particular interest in an image.
SIMPLE THRESHOLDING:
Applying simple thresholding methods requires human intervention.
We must manually specify a threshold value T.
All pixel intensities greater than T are set to 255, and all pixel intensities below T are set to 0.
We could also apply the inverse of this binarization by setting all pixels below T to 255 and all
pixel intensities greater than T to 0.
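The pixel test itself is a one-liner in NumPy; a sketch on a handful of pixel values (cv2.threshold applies the same test to an entire image):

```python
import numpy as np

gray = np.array([12, 200, 47, 255, 90, 150], dtype=np.uint8)
T = 128

# pixels above T become white (255), pixels at or below T become black (0)
binary = np.where(gray > T, 255, 0).astype(np.uint8)
print(binary.tolist())  # [0, 255, 0, 255, 0, 255]
```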
OTSU's METHOD:
But in real-world conditions where we do not have any a priori knowledge of the lighting conditions,
we can automatically compute an optimal value of T using Otsu's method.
Otsu's method assumes that our image contains two classes of pixels: the background and the
foreground.
Furthermore, Otsu's method makes the assumption that the grayscale histogram of our image's pixel
intensities is bi-modal, which simply means that the histogram has two peaks.
A histogram is simply a tabulation (a "counter") of the number of times each pixel value appears in the
image.
Based on the grayscale histogram, Otsu's method then computes an optimal threshold value T such
that the within-class variance of the background and foreground pixels is minimal (equivalently, the
between-class variance is maximal).
However, Otsu's method has no a priori knowledge of which pixels belong to the foreground and which
pixels belong to the background; it is simply trying to optimally separate the peaks of the histogram.
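A minimal, unoptimized sketch of the idea: score every candidate T by the separation it produces between the two classes (Otsu's criterion can be phrased as maximizing this between-class variance, which is equivalent to minimizing the within-class variance). cv2.threshold with the cv2.THRESH_OTSU flag computes the same result far more efficiently from the histogram:

```python
import numpy as np

def otsu_threshold(pixels):
    # exhaustively score every candidate threshold T on a flat grayscale array
    best_t, best_score = 0, -1.0
    for t in range(1, 255):
        back, fore = pixels[pixels < t], pixels[pixels >= t]
        if back.size == 0 or fore.size == 0:
            continue
        wb, wf = back.size / pixels.size, fore.size / pixels.size
        # between-class variance: weighted squared distance of the class means
        score = wb * wf * (back.mean() - fore.mean()) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# a clearly bi-modal "image": a dark background peak and a bright foreground peak
pixels = np.array([20] * 50 + [25] * 50 + [200] * 30 + [210] * 30)
T = otsu_threshold(pixels)
print(25 < T <= 200)  # True: T lands between the two peaks
```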
It's also important to note that Otsu's method is an example of global thresholding, implying that a
single value of T is computed for the entire image.
In some cases, having a single value of T for an entire image is perfectly acceptable; but in other
cases, this can lead to sub-par results.
Otsu's method has two main limitations. The first is that it assumes a bi-modal distribution of the
grayscale pixel intensities of our input image. If this is not the case, then Otsu's method can return
sub-par results.
Secondly, Otsu's method is a global thresholding method.
In situations where lighting conditions are semi-stable and the objects we want to segment have
sufficient contrast from the background, we might be able to get away with Otsu's method.
But when the lighting conditions are non-uniform, such as when different parts of the image are
illuminated more than others, we can run into some serious problems. And when that's the case, we'll
need to rely on adaptive thresholding.
ADAPTIVE THRESHOLDING:
For simple images with controlled lighting conditions, a single value of T is not a problem.
But for situations when the lighting is non-uniform across the image, having only a single value of T
can seriously hurt our thresholding performance.
Simply put, having just one value of T may not suffice.
In order to overcome this problem, we can use adaptive thresholding, which considers small
neighborhoods of pixels and then finds an optimal threshold value T for each neighborhood.
This method allows us to handle cases where there may be dramatic ranges of pixel intensities and
the optimal value of T may change for different parts of the image.
In adaptive thresholding, sometimes called local thresholding, our goal is to statistically examine the
pixel intensity values in the neighborhood of a given pixel p.
The general assumption that underlies all adaptive and local thresholding methods is that smaller
regions of an image are more likely to have approximately uniform illumination. This implies that local
regions of an image will have similar lighting, as opposed to the image as a whole, which may have
dramatically different lighting for each region.
However, choosing the size of the pixel neighborhood for local thresholding is absolutely crucial.
The neighborhood must be large enough to cover sufficient background and foreground pixels,
otherwise the value of T will be more or less irrelevant.
But if we make our neighborhood value too large, then we completely violate the assumption that
local regions of an image will have approximately uniform illumination.
Again, if we supply a very large neighborhood, then our results will look very similar to global
thresholding using the simple thresholding or Otsu‘s methods.
In practice, tuning the neighborhood size is (usually) not that hard of a problem.
You'll often find that there is a broad range of neighborhood sizes that provide you with adequate
results; it's not like finding an optimal value of T that could make or break your thresholding output.
So as I mentioned above, our goal in adaptive thresholding is to statistically examine local regions of
our image and determine an optimal value of T for each region — which begs the question: Which
statistic do we use to compute the threshold value T for each region?
It is common practice to use either the arithmetic mean or the Gaussian mean of the pixel intensities
in each region (other methods do exist, but the arithmetic mean and the Gaussian mean are by far the
most popular).
In the arithmetic mean, each pixel in the neighborhood contributes equally to computing T.
And in the Gaussian mean, pixel values farther away from the (x, y)-coordinate center of the region
contribute less to the overall calculation of T.
The general formula to compute T is thus:
T = mean(I_L) − C
where the mean is either the arithmetic or Gaussian mean, I_L is the local sub-region of the image I,
and C is a constant which we can use to fine-tune the threshold value T.
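For a single neighborhood, the formula is easy to sketch (cv2.adaptiveThreshold with cv2.ADAPTIVE_THRESH_MEAN_C applies this at every pixel; the region values and C here are arbitrary illustration data):

```python
import numpy as np

# a 5x5 local sub-region I_L of a grayscale image, and the tuning constant C
region = np.arange(25, dtype=np.float64).reshape(5, 5) * 10  # values 0..240
C = 10

# the local threshold for this neighborhood: arithmetic mean minus C
T = region.mean() - C
print(T)  # 110.0

# the center pixel of the region is then tested against this local T
center = region[2, 2]
print(center > T)  # True
```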
Program 18:
# load the image, convert it to grayscale, and apply Gaussian blurring with a 7x7 kernel.
# Applying Gaussian blurring helps remove some of the high-frequency edges in the image that
# we are not concerned with and allows us to obtain a more "clean" segmentation.
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
cv2.imshow("Image", image)
# apply basic thresholding -- the first parameter is the image we want to threshold, the second
# value is our threshold check
# if a pixel value is greater than our threshold (in this case, T=200), we set it to be BLACK,
# otherwise it is WHITE.
# Our third argument is the output value applied during thresholding. Any pixel intensity p that is
# greater than T is set to zero and any p that is less than T is set to the output value.
# The function then returns a tuple of 2 values: the first, T, is the threshold value. In the case of
# simple thresholding, this value is trivial since we manually supplied the value of T in the first
# place. But in the case of Otsu's thresholding, where T is dynamically computed for us, it's nice
# to have that value. The second returned value is the thresholded image itself.
(T, threshInv) = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY_INV)
cv2.imshow("Threshold Binary Inverse", threshInv)
# using normal thresholding (rather than inverse thresholding), we can change the last
# argument in the function to make the coins black rather than white.
(T, thresh) = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY)
cv2.imshow("Threshold Binary", thresh)
Step 3: Run the python script (simple_thresholding.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python simple_thresholding.py -i coins01.png
or
$ python simple_thresholding.py --image coins01.png
Inference:
Program 19:
# load the image, convert it to grayscale, and apply Gaussian blurring with a 7x7 kernel
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
cv2.imshow("Image", image)
# apply Otsu's automatic thresholding -- Otsu's method automatically determines the best
# threshold value `T` for us
# T=0. Remember that Otsu's method is going to automatically compute the optimal value of T
# for us. We could technically specify any value we wanted for this argument; however, I like to
# supply a value of 0 as a type of "don't care" parameter.
# The third argument is the output value of the threshold, provided the given pixel passes the
# threshold test.
# The last argument is one we need to pay extra special attention to. Previously, we had
# supplied values of cv2.THRESH_BINARY or cv2.THRESH_BINARY_INV depending on what
# type of thresholding we wanted to perform. But now we are passing in a second flag that is
# logically OR'd with the previous method. Notice that this method is cv2.THRESH_OTSU,
# which corresponds to Otsu's thresholding method.
(T, threshInv) = cv2.threshold(blurred, 0, 255,
cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
cv2.imshow("Threshold", threshInv)
print "Otsu's thresholding value: {}".format(T)
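Otsu's method itself is straightforward to sketch without OpenCV: it tries every candidate threshold and keeps the one that maximizes the between-class variance of the two resulting pixel populations. The function below is an illustrative re-implementation, not the cv2 code path:

```python
import numpy as np

def otsu_threshold(gray):
    # histogram of the 256 possible grayscale values, as probabilities
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    hist /= hist.sum()
    levels = np.arange(256)
    best_T, best_var = 0, -1.0
    for T in range(1, 256):
        w0, w1 = hist[:T].sum(), hist[T:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:T] * hist[:T]).sum() / w0  # class means
        mu1 = (levels[T:] * hist[T:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
        if between > best_var:
            best_var, best_T = between, T
    return best_T

# two well-separated pixel populations: the threshold lands between them
gray = np.array([20] * 50 + [220] * 50, dtype="uint8")
T = otsu_threshold(gray)
```

Any threshold between the two populations separates them equally well here; cv2.threshold with the cv2.THRESH_OTSU flag performs an equivalent search internally.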
Step 3: Run the python script (otsu_thresholding.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python otsu_thresholding.py -i coins01.png
or
$ python otsu_thresholding.py --image coins01.png
Inference:
Program 20:
# instead of manually specifying the threshold value, we can use adaptive thresholding to
# examine neighborhoods of pixels and adaptively threshold each neighborhood -- in this
# example, we'll calculate the mean value of the neighborhood area of 25 pixels and threshold
# based on that value; finally, our constant C is subtracted from the mean calculation (in this
# case 15)
# second parameter is the maximum value assigned by the thresholding (here, 255)
# third argument is the adaptive thresholding method. Here we supply a value of
# cv2.ADAPTIVE_THRESH_MEAN_C to indicate that we are using the arithmetic mean of the
# local pixel neighborhood to compute our threshold value of T. We could also supply a value of
# cv2.ADAPTIVE_THRESH_GAUSSIAN_C to indicate we want to use the Gaussian average
# The fourth value to cv2.adaptiveThreshold is the threshold method, again just like in the
# Simple Thresholding and Otsu's Method sections. Here we pass in a value of
# cv2.THRESH_BINARY_INV to indicate that any pixel value that passes the threshold test will
# have an output value of 0. Otherwise, it will have a value of 255.
# The fifth parameter is our pixel neighborhood size. Here you can see that we'll be computing
# the mean grayscale pixel intensity value of each 25x25 sub-region in the image to compute
# our threshold value T.
# The final argument to cv2.adaptiveThreshold is the constant C, which lets us fine-tune our
# threshold value.
thresh = cv2.adaptiveThreshold(blurred, 255,
cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 25, 15)
cv2.imshow("OpenCV Mean Thresh", thresh)
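The same mean-minus-C rule can be sketched in pure NumPy. The helper name and the toy 5x5 image below are hypothetical, and cv2.adaptiveThreshold has its own border-handling details, but the per-pixel logic is the same:

```python
import numpy as np

def adaptive_mean_threshold_inv(gray, block, C, max_val=255):
    # pad the image so every pixel has a full block x block neighborhood
    pad = block // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    out = np.zeros(gray.shape, dtype="uint8")
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            # local threshold T = mean of the neighborhood minus the constant C
            T = padded[y:y + block, x:x + block].mean() - C
            # THRESH_BINARY_INV convention: pixels above T become 0,
            # everything else becomes max_val
            out[y, x] = 0 if gray[y, x] > T else max_val
    return out

# a dark blob on a uniform background: only the blob survives as foreground
gray = np.full((5, 5), 100, dtype="uint8")
gray[2, 2] = 20
binary = adaptive_mean_threshold_inv(gray, block=3, C=10)
```

Because each pixel is compared against its own local mean, this handles lighting gradients that defeat a single global threshold.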
Step 3: Run the python script (adaptive_thresholding.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python adaptive_thresholding.py -i license_plate.png
or
$ python adaptive_thresholding.py --image license_plate.png
Inference:
LESSON 1.10: GRADIENTS
We will be using gradients for detecting edges in images, which allows us to find contours and
outlines of objects in images.
We use them as inputs for quantifying images through feature extraction — in fact, highly successful
and well-known image descriptors such as Histogram of Oriented Gradients (HoG) and Scale-Invariant
Feature Transform (SIFT) are built upon image gradient representations.
Gradient images are even used to construct saliency maps, which highlight the subjects of an image.
OBJECTIVES
1. Define what an image gradient is.
2. Compute changes in direction of an input image.
3. Define both gradient magnitude and gradient orientation.
4. Learn how to compute gradient magnitude and gradient orientation.
5. Approximate the image gradient using Sobel and Scharr kernels.
6. Learn how to use the cv2.Sobel function to compute image gradient representations in OpenCV.
IMAGE GRADIENTS
The main application of image gradients lies within edge detection.
Edge detection is the process of finding edges in an image, which reveals structural information
regarding the objects in an image.
Edges could therefore correspond to:
1. Boundaries of an object in an image.
2. Boundaries of shadowing or lighting conditions in an image.
3. Boundaries of "parts" within an object.
So how do we go about finding the edges in an image?
The first step is to compute the gradient of the image. Formally, an image gradient is defined as a
directional change in image intensity. At each pixel of the input (grayscale) image, a gradient measures
the change in pixel intensity in a given direction. By estimating the direction or orientation along with the
magnitude (i.e. how strong the change in direction is), we are able to detect regions of an image that
look like edges.
In the image above we examine the 3x3 neighborhood surrounding the central pixel.
Our x values run from left to right, and our y values from top to bottom.
In order to compute any changes in direction we'll need the north, south, east, and west pixels.
If we denote our input image as I, then we define the north, south, east, and west pixels using the
following notation:
North: I(x, y-1)  South: I(x, y+1)  East: I(x+1, y)  West: I(x-1, y)
Again, these four values are critical in computing the changes in image intensity in both the x and y
direction.
To demonstrate this, let's compute the vertical change or the y-change by taking the difference
between the north and south pixels:
Gy = I(x, y-1) - I(x, y+1)
Similarly, we can compute the horizontal change or the x-change by taking the difference between
the east and west pixels:
Gx = I(x+1, y) - I(x-1, y)
So now we have Gx and Gy, which represent the change in image intensity for the central pixel in
both the x and y direction.
So now the big question becomes: what do we do with these values?
To answer that, we'll need to define two new terms: the gradient magnitude and the gradient
orientation.
The gradient magnitude is used to measure how strong the change in image intensity is. The
gradient magnitude is a real-valued number that quantifies the "strength" of the change in intensity.
The gradient orientation is used to determine in which direction the change in intensity is
pointing. As the name suggests, the gradient orientation will give us an angle Θ that we can use to
quantify the direction of the change.
On the left we have a 3x3 region of an image where the top half of the image is white and the bottom
half of the image is black. The gradient orientation is thus equal to Θ=90°.
And on the right we have another 3x3 neighborhood of an image, where the upper triangular region
is white and the lower triangular region is black. Here we can see the change in direction is equal to
Θ=45°.
But how do we actually go about computing the gradient orientation and magnitude?
3x3 neighborhood of an image:
Here we can see that the central pixel is marked in red. The next step in determining the gradient
orientation and magnitude is actually to compute the changes in gradient in both the x and y direction.
Using both Gx and Gy, we can apply some basic trigonometry to compute the gradient magnitude G
and orientation Θ:
Inspecting this triangle you can see that the gradient magnitude G is the hypotenuse of the triangle.
Therefore, all we need to do is apply the Pythagorean theorem and we'll end up with the gradient
magnitude:
G = √(Gx² + Gy²)
The gradient orientation can then be computed from the arctangent of Gy over Gx:
Θ = arctan2(Gy, Gx) × (180/π)
The arctan2 function gives us the orientation in radians, which we then convert to degrees by
multiplying by the ratio 180/π.
Let's go ahead and manually compute G and Θ so we can see how the process is done:
In the above image we have an image where the upper-third is white and the bottom two-thirds is
black.
Using the equations for Gx and Gy, we arrive at:
Gx = 0 - 0 = 0 and Gy = 255 - 0 = 255
G = √(0² + 255²) = 255
As for our gradient orientation:
Θ = arctan2(255, 0) × (180/π) = 90°
Sure enough, the gradient of the central pixel is pointing up as verified by the Θ=90°.
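The same check can be done in a couple of lines of NumPy, mapping the formulas directly onto np.arctan2:

```python
import numpy as np

# gradient components from the worked example: white top, black bottom
Gx, Gy = 0.0, 255.0

G = np.sqrt(Gx ** 2 + Gy ** 2)           # gradient magnitude
theta = np.degrees(np.arctan2(Gy, Gx))   # gradient orientation in degrees

print(G, theta)
```

The call prints a magnitude of 255 and an orientation of 90 degrees, matching the manual computation above.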
Another example:
In this particular image we can see that the lower-triangular region of the neighborhood is white while
the upper-triangular neighborhood is black. Computing both Gx and Gy we arrive at:
Gx = 0 - 255 = -255 and Gy = 0 - 255 = -255
which gives G = √((-255)² + (-255)²) ≈ 360.6 and Θ = arctan2(-255, -255) × (180/π) = -135°.
Gx = [ -3    0    3 ]
     [ -10   0   10 ]
     [ -3    0    3 ]

Gy = [ -3  -10   -3 ]
     [  0    0    0 ]
     [  3   10    3 ]
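As a sketch, these Scharr kernels can be applied at a single pixel by multiplying them element-wise against the 3x3 neighborhood and summing. The neighborhood below is hypothetical; note that OpenCV's derivative kernels use an intensity-increasing-downward sign convention for Gy, opposite to the north-minus-south difference used earlier:

```python
import numpy as np

# Scharr kernels approximating the horizontal and vertical gradients
Kx = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=float)
Ky = np.array([[-3, -10, -3], [0, 0, 0], [3, 10, 3]], dtype=float)

# hypothetical 3x3 neighborhood: white on top, black bottom row
patch = np.array([[255, 255, 255],
                  [255, 255, 255],
                  [0, 0, 0]], dtype=float)

Gx = (Kx * patch).sum()   # horizontal change: zero here (left = right)
Gy = (Ky * patch).sum()   # vertical change: negative (bright above, dark below)
```

A full image gradient is just this computation repeated at every pixel, which is what cv2.Sobel (with the cv2.CV_64F output type and ksize=-1 for Scharr) performs efficiently.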
Program 21:
# load the image, convert it to grayscale, and display the original image
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)
# the `gX` and `gY` images are now of the floating point data type, so we need to take care to
# convert them back to an unsigned 8-bit integer representation so other OpenCV functions can
# utilize them
gX = cv2.convertScaleAbs(gX)
gY = cv2.convertScaleAbs(gY)
# combine the Sobel X and Y representations into a single image, weighting each gradient
# representation equally.
sobelCombined = cv2.addWeighted(gX, 0.5, gY, 0.5, 0)
Step 3: Run the python script (sobel.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python sobel.py -i bricks.png
or
$ python sobel.py --image bricks.png
Inference:
Experiment 22: Gradient orientation and magnitude in OpenCV
The end goal of this program will be to:
(1) compute the gradient orientation and magnitude, and then
(2) only display the pixels in the image that fall within the range minΘ <= Θ <= maxΘ.
Program 22:
# load the image, convert it to grayscale, and display the original image
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)
# find all pixels that are within the upper and low angle boundaries
# the following lines handle selecting image coordinates that are greater than the lower angle
# minimum. The first argument to the np.where function is the condition that we want to test;
# again, we are looking for indexes that are greater than the minimum supplied angle. The
# second argument is the array that we want to check, which is our orientations array.
# And the final argument that we supply is the value used when the check does not pass. In the
# case that the orientation is less than the minimum angle requirement, we'll set that particular
# value to -1.
idxs = np.where(orientation >= args["lower_angle"], orientation, -1)
# The second argument is the idxs list returned by previous line since we are looking for
# orientations that pass both the upper and lower orientation test.
# The idxs now contains the coordinates of all orientations that are greater than the minimum
# angle and less than the maximum angle. Using this list, we construct a mask, all coordinates
# that have a corresponding value of > -1 are set to 255 (i.e. foreground). Otherwise, they are
# left as 0 (i.e. background).
idxs = np.where(orientation <= args["upper_angle"], idxs, -1)
mask = np.zeros(gray.shape, dtype="uint8")
mask[idxs > -1] = 255
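The two-step np.where filtering can be exercised on a toy orientation array; the angle bounds of 45 and 135 degrees here are hypothetical stand-ins for the command line arguments:

```python
import numpy as np

# hypothetical per-pixel gradient orientations in degrees
orientation = np.array([[10.0, 60.0],
                        [90.0, 170.0]])
lower_angle, upper_angle = 45.0, 135.0

# keep orientations above the lower bound, flag everything else with -1
idxs = np.where(orientation >= lower_angle, orientation, -1)
# then keep only those that also fall below the upper bound
idxs = np.where(orientation <= upper_angle, idxs, -1)

# build the mask: surviving coordinates become foreground (255)
mask = np.zeros(orientation.shape, dtype="uint8")
mask[idxs > -1] = 255
```

Only the 60- and 90-degree pixels fall inside [45, 135], so only those two positions end up white in the mask.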
Step 3: Run the python script (mag_orientation.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python mag_orientation.py -i coins02.png
or
$ python mag_orientation.py --image coins02.png
Inference:
OBJECTIVES:
1. What the Canny edge detector is and how it is used.
2. The basic steps of the Canny edge detector.
3. How to use the cv2.Canny function to detect edges in images.
4. How to extend the Canny edge detector to create the auto_canny, a zero parameter edge detector.
Instead, we want to apply edge detection to find the structure and outline of the objects in the image
so we can further process them.
Program 23:
# we almost always want to apply edge detection to a single-channel, grayscale image;
# this ensures that there will be less noise during the edge detection process.
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
Step 3: Run the python script (canny.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python canny.py -i coins01.png
or
$ python canny.py --image coins01.png
Inference:
Library: imutils.py
Program 24:
# load the image, convert it to grayscale, and blur it slightly to remove high frequency noise
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (3, 3), 0)
# apply Canny edge detection using a wide threshold, tight threshold, and automatically
# determined threshold
wide = cv2.Canny(blurred, 10, 200)
tight = cv2.Canny(blurred, 225, 250)
auto = imutils.auto_canny(blurred)
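imutils.auto_canny derives both Canny thresholds from the median pixel intensity of the blurred image. The threshold-selection step (the part that makes the detector "zero parameter") can be sketched as below; the default sigma of 0.33 follows the common imutils convention:

```python
import numpy as np

def auto_canny_thresholds(gray, sigma=0.33):
    # take the median of the single-channel pixel intensities, then set
    # the lower and upper Canny thresholds a fixed percentage around it
    v = np.median(gray)
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    return lower, upper

# a hypothetical uniform image with median intensity 100
gray = np.full((10, 10), 100, dtype="uint8")
lower, upper = auto_canny_thresholds(gray)
```

The resulting (lower, upper) pair would then be passed to cv2.Canny, so no per-image hand tuning of the hysteresis thresholds is needed.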
Step 3: Run the python script (auto_canny.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python auto_canny.py -i teacup.jpg
or
$ python auto_canny.py --image teacup.jpg
Inference:
OBJECTIVES:
1. Find and detect the contours of objects in images.
2. Extract objects from images using contours, masks and cropping.
Program 25:
cv2.imshow("Original", image)
# find all contours in the image and draw ALL contours on the image
#The cv2.findContours function is destructive to the input image (meaning that it manipulates it)
# so if you intend on using your input image again, be sure to clone it using the copy() method
# prior to passing it into cv2.findContours.
# We'll instruct cv2.findContours to return a list of all contours in the image by passing in the
# cv2.RETR_LIST flag.
# This flag will ensure that all contours are returned. Other methods exist, such as returning only
# the external most contours, which we'll explore later.
# Finally, we pass in the cv2.CHAIN_APPROX_SIMPLE flag. If we did not specify this flag and
# instead used cv2.CHAIN_APPROX_NONE, we would be storing every single (x, y)-coordinate
# along the contour. In general, this is not advisable. It's substantially slower and takes up
# significantly more memory. By compressing our horizontal, vertical, and diagonal segments
# into only end-points we are able to reduce memory consumption significantly without any
# substantial loss in contour accuracy.
# Finally, the cv2.findContours function returns a tuple of 2 values.
# The first value is the contours themselves. These contours are simply the boundary points of
# the outline along the object.
# The second value is the hierarchy of the contours, which contains information on the topology
# of the contours. Often we are only interested in the contours themselves and not their actual
# hierarchy (i.e. one contour being contained in another) so this second value is usually ignored.
# We then draw our found contours. The first argument we pass in is the image we want to draw
# the contours on. The second parameter is our list of contours we found using the
# cv2.findContours function.
# The third parameter is the index of the contour inside the cnts list that we want to draw.
# If we wanted to draw only the first contour, we could pass in a value of 0. If we wanted to draw
# only the second contour, we would supply a value of 1. Passing in a value of -1 for this
# argument instructs the cv2.drawContours function to draw all contours in the list.
# Finally, the last two arguments to the cv2.drawContours function are the color of the contour
# (green) and the thickness of the contour line (2 pixels).
(cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
clone = image.copy()
cv2.drawContours(clone, cnts, -1, (0, 255, 0), 2)
print "Found {} contours".format(len(cnts))
cv2.destroyAllWindows()
# find only external contours and ignore the oval region inside the orange rectangle.
# re-clone the image and close all open windows
clone = image.copy()
cv2.destroyAllWindows()
# find contours in the image, but this time, keep only the EXTERNAL contours in the image.
# Specifying cv2.RETR_EXTERNAL flag instructs OpenCV to return only the external most
# contours of each shape in the image, meaning that if one shape is enclosed in another, then
# the contour is ignored.
(cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(clone, cnts, -1, (0, 255, 0), 2)
print "Found {} EXTERNAL contours".format(len(cnts))
Step 3: Run the python script (finding_and_drawing_contours.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python finding_and_drawing_contours.py -i images/basic_shapes.png
or
$ python finding_and_drawing_contours.py --image images/basic_shapes.png
Inference:
OBJECTIVES:
You should be able to compute various properties of objects using contours, including:
1. Centroid/Center of Mass
2. Area and Perimeter
3. Bounding boxes and Rotated Bounding Boxes
4. Minimum enclosing circles
5. Fitting an ellipse
CONTOUR PROPERTIES:
1. CENTROID/CENTER OF MASS:
The "centroid" or "center of mass" is the center (x, y)-coordinate of an object in an image.
This (x, y)-coordinate is actually calculated based on the image moments, which are based on the
weighted average of the (x, y) coordinates/pixel intensity along the contour.
Moments allow us to use basic statistics to represent the structure and shape of an object in an
image.
The centroid calculation itself is actually very straightforward: it's simply the mean (i.e. average)
position of all (x, y)-coordinates along the contour of the shape.
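For a binary mask, the raw image moments m00, m10, and m01 reduce to plain sums over pixel intensities, so the centroid can be sketched directly in NumPy (cv2.moments computes the same quantities from the contour itself; the toy mask below is hypothetical):

```python
import numpy as np

# a 5x5 white square drawn into an otherwise black mask
mask = np.zeros((9, 9), dtype="uint8")
mask[2:7, 3:8] = 255

ys, xs = np.nonzero(mask)
vals = mask[ys, xs].astype(float)

m00 = vals.sum()           # zeroth moment: total intensity
m10 = (xs * vals).sum()    # first moment in x
m01 = (ys * vals).sum()    # first moment in y

cX, cY = m10 / m00, m01 / m00   # centroid = intensity-weighted mean position
```

For a uniform square, the centroid lands exactly at its geometric center, just as the cX = m10/m00 and cY = m01/m00 lines in Program 26 compute it.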
5. FITTING AN ELLIPSE:
Fitting an ellipse to a contour is much like fitting a rotated rectangle to a contour.
Under the hood, OpenCV is computing the rotated rectangle of the contour. And then it's taking the
rotated rectangle and computing an ellipse to fit in the rotated region.
Program 26:
cv2.imshow("Centroids", clone)
cv2.waitKey(0)
clone = image.copy()
# compute the center of the contour and draw the contour number
M = cv2.moments(c)
cX = int(M["m10"] / M["m00"])
cY = int(M["m01"] / M["m00"])
cv2.putText(clone, "#%d" % (i + 1), (cX - 20, cY), cv2.FONT_HERSHEY_SIMPLEX,
1.25, (255, 255, 255), 4)
cv2.imshow("Ellipses", clone)
cv2.waitKey(0)
Step 3: Run the python script (contour_properties.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python contour_properties.py -i images/more_shapes.png
or
$ python contour_properties.py --image images/more_shapes.png
Inference:
OBJECTIVES:
We are going to build on our simple contour properties and expand them to more advanced contour
properties, including:
1. Aspect ratio
2. Extent
3. Convex hull
4. Solidity
1. ASPECT RATIO:
The aspect ratio is simply the ratio of the bounding box width to the bounding box height:
aspect ratio = bounding box width / bounding box height
Shapes with an aspect ratio < 1 have a height that is greater than the width; these shapes will
appear more "tall" and elongated. For example, most digits and characters on a license plate
have an aspect ratio that is less than 1 (since most characters on a license plate are taller than they are
wide).
And shapes with an aspect ratio > 1 have a width that is greater than the height. The license plate
itself is an example of an object that will have an aspect ratio greater than 1, since the width of a
physical license plate is always greater than the height.
Finally, shapes with an aspect ratio = 1 (plus or minus some ϵ of course), have approximately the
same width and height. Squares and circles are examples of shapes that will have an aspect ratio of
approximately 1.
2. EXTENT:
The extent of a shape or contour is the ratio of the contour area to the bounding box area:
extent = shape area / bounding box area
Recall that the area of an actual shape is simply the number of pixels inside the contoured region.
On the other hand, the rectangular area of the contour is determined by its bounding box, therefore:
bounding box area = bounding box width x bounding box height
In all cases the extent will be at most 1; this is because the number of pixels inside the contour
cannot possibly be larger than the number of pixels in the bounding box of the shape.
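Both properties are a few lines of arithmetic once the shape's pixels and bounding box are known; the toy mask below is hypothetical and uses no cv2 calls:

```python
import numpy as np

# a filled rectangle: 5 pixels tall, 12 pixels wide
mask = np.zeros((20, 20), dtype="uint8")
mask[5:10, 4:16] = 255

ys, xs = np.nonzero(mask)
w = xs.max() - xs.min() + 1   # bounding box width
h = ys.max() - ys.min() + 1   # bounding box height

aspect_ratio = w / float(h)        # > 1: wider than tall
extent = len(xs) / float(w * h)    # a filled rectangle exactly fills its box
```

A filled rectangle gives an extent of exactly 1.0; any concavity or hole in the shape pushes the extent below 1.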
3. CONVEX HULL:
A convex hull is almost like a mathematical rubber band.
More formally, given a set of X points in the Euclidean space, the convex hull is the smallest possible
convex set that contains these X points.
In the example image below, we can see the rubber band effect of the convex hull in action:
On the left we have our original shape. And in the center we have the convex hull of the original
shape. Notice how the rubber band has been stretched around all extreme points of the shape,
leaving no extra space along the contour; thus the convex hull is the minimum enclosing polygon of
all points of the input shape, which can be seen on the right.
Another important aspect of the convex hull that we should discuss is the
convexity.
Convex curves are curves that appear to be "bulged out". If a curve is not bulged
out, then we say it has a convexity defect.
The gray outline of the hand in the image above is our original shape.
The red line is the convex hull of the hand. And the black arrows, such as in
between the fingers, are where the convex hull is "bulged in" rather than "bulged out".
Whenever a region is "bulged in", such as in between the fingers in the hand image above, we call it
a convexity defect.
4. SOLIDITY:
The solidity of a shape is the contour area divided by the area of the convex hull:
solidity = contour area / convex hull area
Again, it's not possible to have a solidity value greater than 1.
The number of pixels inside a shape cannot possibly outnumber the number of pixels in the convex
hull, because by definition, the convex hull is the smallest possible set of pixels enclosing the shape.
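Solidity can be made concrete with a pure-Python sketch: Andrew's monotone chain algorithm builds the convex hull, and the shoelace formula supplies both areas. The L-shaped polygon below is a hypothetical stand-in for a contour:

```python
def convex_hull(points):
    # Andrew's monotone chain: build lower and upper hull chains
    pts = sorted(set(points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    # shoelace formula for a simple polygon given as ordered (x, y) vertices
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# a non-convex L-shape: area 12, while its hull has area 14
shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
solidity = polygon_area(shape) / polygon_area(convex_hull(shape))
```

The notch in the L is exactly what lowers the solidity below 1; a convex shape such as a rectangle would score 1.0 because it coincides with its own hull.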
# compute the convex hull of the contour, then use the area of the original contour and
# the area of the convex hull to compute the solidity
hull = cv2.convexHull(c)
hullArea = cv2.contourArea(hull)
solidity = area / float(hullArea)
# initialize the char variable to indicate the character that we are looking at; in this case, we
# initialize it to be a ? indicating that the character is unknown.
char = "?"
# The letter X has four large and obvious convexity defects, one for each of the four V's that
# form the X. On the other hand, the O has nearly no convexity defects, and the ones that it has
# are substantially less dramatic than the letter X. Therefore, the letter O is going to have a
# larger solidity than the letter X.
# if the solidity is high, then we are examining an `O`
if solidity > 0.9:
char = "O"
Step 3: Run the python script (tictactoe.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python tictactoe.py
Inference:
# load the Tetris block image, convert it to grayscale, and threshold the image
# to create a binary image, where the background pixels are black and the foreground pixels
# (i.e. the Tetris blocks) are white.
image = cv2.imread("images/tetris_blocks.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 225, 255, cv2.THRESH_BINARY_INV)[1]
# find external contours in the thresholded image and allocate a NumPy array with the same
# shape as our input image
(cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
hullImage = np.zeros(gray.shape[:2], dtype="uint8")
# compute the area of the contour along with the bounding box to compute the aspect
# ratio
area = cv2.contourArea(c)
(x, y, w, h) = cv2.boundingRect(c)
# compute the aspect ratio of the contour, which is simply the width divided by the height
# of the bounding box
# the aspect ratio of a shape will be < 1 if the height is greater than the width. The
# aspect ratio will be > 1 if the width is larger than the height. And the aspect ratio will
# be approximately 1 if the width and height are equal.
# used for discriminating the square and rectangle pieces
aspectRatio = w / float(h)
# use the area of the contour and the bounding box area to compute the extent
extent = area / float(w * h)
# compute the convex hull of the contour, then use the area of the original contour and
# the area of the convex hull to compute the solidity
hull = cv2.convexHull(c)
hullArea = cv2.contourArea(hull)
solidity = area / float(hullArea)
# visualize the original contours and the convex hull and initialize the name of the shape
cv2.drawContours(hullImage, [hull], -1, 255, -1)
# Now that we have computed all of our contour properties, let's define the actual rules
# and if statements that will allow us to discriminate between the various Tetris blocks:
# if the aspect ratio is approximately one, then the shape is a square
if aspectRatio >= 0.98 and aspectRatio <= 1.02:
shape = "SQUARE"
Step 3: Run the python script (contour_properties_2.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python contour_properties_2.py
Inference:
CONTOUR APPROXIMATION
Contour approximation is an algorithm for approximating a curve with a reduced set
of points. This algorithm is commonly known as the Ramer-Douglas-Peucker
algorithm, or simply the split-and-merge algorithm.
The general assumption of this algorithm is that a curve can be approximated by a series of short line
segments.
And we can thus approximate a given number of these line segments to reduce the number of points
it takes to construct a curve.
Overall, the resulting approximated curve consists of a subset of points that were defined by the
original curve.
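The Ramer-Douglas-Peucker idea can be sketched in a few lines of pure Python: keep the endpoints of the curve, find the interior point farthest from the chord joining them, and recurse only if that distance exceeds a tolerance epsilon. The curve below is a hypothetical noisy diagonal:

```python
import math

def rdp(points, epsilon):
    # Ramer-Douglas-Peucker: recursively simplify a polyline
    if len(points) < 3:
        return points
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0

    # find the interior point farthest from the chord between the endpoints
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        d = abs(dy * (px - x1) - dx * (py - y1)) / norm
        if d > dmax:
            dmax, idx = d, i

    if dmax > epsilon:
        # split at the farthest point and simplify each half
        left = rdp(points[:idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right
    # every interior point is within epsilon of the chord: drop them all
    return [points[0], points[-1]]

# a noisy staircase along a straight diagonal collapses to its endpoints
curve = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.05), (4, 4)]
print(rdp(curve, epsilon=0.5))  # [(0, 0), (4, 4)]
```

cv2.approxPolyDP implements the same algorithm, with epsilon typically expressed as a small percentage of the contour perimeter.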
OBJECTIVES:
1. Understand (at a very high level) the process of contour approximation.
2. Apply contour approximation to distinguish between circles and squares.
3. Use contour approximation to find "documents" in images.
Program 27: From the image given below, to detect only the rectangles, while ignoring the
circles/ellipses.
# A rectangle has 4 sides. And a circle has no sides. Or, in this case, since we need to
# represent a circle as a series of points: a circle is composed of many tiny line
# segments — far more than the 4 sides that compose a rectangle. So if we approximate
# the contour and then examine the number of points within the approximated contour,
# we'll be able to determine if the contour is a square or not. Once we have the
# approximated contour, we check the len (i.e. the length, or number of entries in the list)
# to see how many vertices (i.e. points) our approximated contour has. If our
# approximated contour has four vertices, we can mark it as a rectangle.
if len(approx) == 4:
# draw the outline of the contour and draw the text on the image
cv2.drawContours(image, [c], -1, (0, 255, 255), 2)
(x, y, w, h) = cv2.boundingRect(approx)
cv2.putText(image, "Rectangle", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX,
0.5, (0, 255, 255), 2)
# show the output image
cv2.imshow("Image", image)
cv2.waitKey(0)
Step 3: Run the python script (approx_simple.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python approx_simple.py
Inference:
# How do we discard all this noise and find only the receipt outline?
# It is a two-step process: the first step is to sort the contours by their size, keeping only the
# largest ones, and the second step is to apply contour approximation.
# find contours in the image and sort them from largest to smallest, keeping only the largest ones
# we have only the 7 largest contours in the image
(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:7]
# show the difference in number of vertices between the original and approximated contours
print("original: {}, approx: {}".format(len(c), len(approx)))
# if the approximated contour has 4 vertices, then we have found our rectangle
if len(approx) == 4:
# draw the outline on the image
cv2.drawContours(image, [approx], -1, (0, 255, 0), 2)
Step 3: Run the python script (approx_realworld.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python approx_realworld.py
Inference:
The original receipt contour had over 279 points prior to approximation — that original shape was by no
means a rectangle! However, by applying contour approximation we were able to sift through all the
noise and reduce those 279 points down to 4 points. And since our 4 points formed a rectangle, we can
thus label the region as our receipt.
SORTING CONTOURS
OpenCV does not provide a built-in function or method to perform the actual sorting of contours.
OBJECTIVES:
1. Sort contours according to their size/area, along with a template to follow to sort contours by any
other arbitrary criteria.
2. Sort contoured regions from left-to-right, right-to-left, top-to-bottom, and bottom-to-top using only a
single function.
Program 28:
# Defining our sort_contours function which will enable us to sort our contours.
# Function takes two arguments. The first is cnts, the list of contours that we want to sort,
# The second is the sorting method, which indicates the direction in which we are going to sort
# our contours (i.e. left-to-right, top-to-bottom, etc.).
def sort_contours(cnts, method="left-to-right"):
    # initialize the reverse flag and sort index
    # These variables simply indicate the sorting order (ascending or descending) and the
    # index of the bounding box we are going to use to perform the sort
    reverse = False
    i = 0
    # handle if we need to sort in reverse (right-to-left or bottom-to-top)
    if method == "right-to-left" or method == "bottom-to-top":
        reverse = True
    # handle if we are sorting against the y-coordinate rather than the x-coordinate of the
    # bounding box
    if method == "top-to-bottom" or method == "bottom-to-top":
        i = 1
    # construct the list of bounding boxes and sort them from top to bottom
    # first compute the bounding boxes of each contour, which is simply the starting (x, y)
    # coordinates of the bounding box followed by the width and height
    # The boundingBoxes enable us to sort the actual contours. Using this code we are able
    # to sort both the contours and bounding boxes.
    boundingBoxes = [cv2.boundingRect(c) for c in cnts]
    (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
        key=lambda b: b[1][i], reverse=reverse))
    # return the list of sorted contours and bounding boxes
    return (cnts, boundingBoxes)
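The same sorting logic can be demonstrated without OpenCV by operating directly on (x, y, w, h) bounding boxes; the sort_boxes helper below is a hypothetical stand-in for sort_contours:

```python
def sort_boxes(boxes, method="left-to-right"):
    # Same idea as sort_contours, but on plain (x, y, w, h) tuples:
    # choose x or y as the sort key, and reverse the order for
    # right-to-left / bottom-to-top sorting.
    reverse = method in ("right-to-left", "bottom-to-top")
    i = 1 if method in ("top-to-bottom", "bottom-to-top") else 0
    return sorted(boxes, key=lambda b: b[i], reverse=reverse)

boxes = [(40, 5, 10, 10), (5, 30, 10, 10), (20, 15, 10, 10)]
print(sort_boxes(boxes))                   # sorted by x-coordinate
print(sort_boxes(boxes, "top-to-bottom"))  # sorted by y-coordinate
```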
# find contours in the accumulated image, keeping only the largest ones
# to sort them according to their size by using a combination of the Python sorted function and
# the cv2.contourArea method — this allows us to sort our contours according to their area (i.e.
# size) from largest to smallest.
(cnts, _) = cv2.findContours(accumEdged.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]
orig = image.copy()
cv2.waitKey(0)
Step 3: Run the python script (sort_contours.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python sort_contours.py --image images/lego_blocks_1.png --method "top-to-bottom"
Inference:
Histograms capture the frequency distribution of a set of data. It turns out that
examining these frequency distributions is a very nice way to build simple image processing techniques,
along with very powerful machine learning algorithms.
OBJECTIVES:
1. What is a histogram?
2. How to compute a histogram in OpenCV.
3. How to compute a grayscale histogram of an image.
4. Write code to extract a "flattened" RGB histogram from an image.
5. Extract multi-dimensional color histograms from an image.
What is a histogram?
A histogram represents the distribution of pixel intensities (whether color or grayscale) in an image.
It can be visualized as a graph (or plot) that gives a high-level intuition of the intensity (pixel value)
distribution.
We are going to assume an RGB color space in this example, so these pixel values will be in the range
of 0 to 255.
When plotting the histogram, the x-axis serves as our "bins".
If we construct a histogram with 256 bins, then we are effectively counting the number of times each
pixel value occurs.
In contrast, if we use only 2 (equally spaced) bins, then we are counting the number of times a pixel is
in the range [0, 127] or [128, 255].
The number of pixels binned to the x-axis value is then plotted on the y-axis.
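The binning described above can be sketched in plain Python, including the normalization to percentages used later; the histogram helper is illustrative, and cv2.calcHist does the same counting in optimized code:

```python
def histogram(pixels, bins=256, levels=256):
    # Count how many pixel values fall into each of `bins` equally
    # spaced bins, then normalize the counts into percentages.
    width = levels / float(bins)
    counts = [0] * bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    total = float(len(pixels))
    return [c / total for c in counts]

# With only 2 bins, values are counted into [0, 127] and [128, 255].
print(histogram([0, 10, 130, 200, 255], bins=2))  # -> [0.4, 0.6]
```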
In the figure given below, we have plotted a histogram with 256 bins along the x-axis and the
percentage of pixels falling into each bin along the y-axis.
Examining the histogram, note that there are three primary peaks.
The first peak in the histogram is around x=20, where we see a sharp spike in the number of pixels;
clearly there is some sort of object in the image that has a very dark value.
We then see a much slower rising peak in the histogram, which starts to ascend around x=50 and
finally ends its descent around x=120. This region probably refers to a background region of the image.
Finally, we see there is a very large number of pixels in the range x=220 to x=245. It's hard to say
exactly what this region is, but it must dominate a large portion of the image.
By simply examining the histogram of an image, you get a general understanding regarding the
contrast, brightness, and intensity distribution.
Program 29:
# normalize the histogram by dividing the raw frequency counts for each bin of the
# histogram by the sum of the counts; this leaves us with the percentage of each bin rather than
# the raw count of each bin.
hist /= hist.sum()
plt.ylabel("% of Pixels")
plt.plot(hist)
plt.xlim([0, 256])
plt.show()
Step 3: Run the python script (grayscale_histogram.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python grayscale_histogram.py -i grayscale-histogram_total_pixels.jpg
Inference:
Program 30:
# grab the image channels, initialize the tuple of colors and the figure
# OpenCV reverses this order to BGR
# Now we move on to multi-dimensional histograms and take into consideration two channels at
# a time. For example: "How many pixels have a Red value of 10 AND a Blue value of 30?"
# "How many pixels have a Green value of 200 AND a Red value of 130?" By using the
# conjunctive AND, we are able to construct multi-dimensional histograms.
# let's move on to 2D histograms -- reduce the number of bins in the histogram from 256 to 32
fig = plt.figure()
# our 2D histogram could only take into account 2 out of the 3 channels in the image so now let's
# build a 3D color histogram (utilizing all channels) with 8 bins in each direction -- we can't plot
# the 3D histogram, but the theory is exactly like that of a 2D histogram, so we'll just show the
# shape of the histogram
hist = cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
print("3D histogram shape: {}, with {} values".format(hist.shape, hist.flatten().shape[0]))
Step 3: Run the python script (color_histograms.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python color_histograms.py -i color_histograms_flattened.jpg
Inference:
HISTOGRAM EQUALIZATION
Histogram equalization improves the contrast of an image by "stretching" the distribution of pixels.
Consider a histogram with a large peak at the center of it.
Applying histogram equalization will stretch the peak out toward the edges of the histogram, thus
improving the global contrast of the image.
Histogram equalization is applied to grayscale images.
This method is useful when an image contains foregrounds and backgrounds that are both dark or
both light.
It tends to produce unrealistic effects in photographs; however, it is normally useful when enhancing
the contrast of medical or satellite images.
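A minimal sketch of the underlying mapping, assuming the classic CDF-based formulation (cv2.equalizeHist implements this in optimized form; the helper name is illustrative):

```python
def equalize(pixels, levels=256):
    # Build the intensity histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Compute the cumulative distribution function (CDF).
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    # Map each intensity through the scaled CDF, stretching the
    # occupied range out to [0, levels - 1].
    cdf_min = next(c for c in cdf if c > 0)
    scale = (levels - 1) / float(max(len(pixels) - cdf_min, 1))
    lut = [round((c - cdf_min) * scale) for c in cdf]
    return [lut[p] for p in pixels]

# Four intensities clustered around 100 get spread across 0..255.
print(equalize([100, 101, 102, 103]))  # -> [0, 85, 170, 255]
```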
Program 31:
# show our images -- notice how the contrast of the second image has been stretched
cv2.imshow("Original", image)
cv2.imshow("Histogram Equalization", eq)
cv2.waitKey(0)
Step 3: Run the python script (equalize.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python equalize.py --image histogram_equalization.jpg
Inference:
Program 32:
# The mask defaults to None if we do not have a mask for the image.
def plot_histogram(image, title, mask=None):
# grab the image channels, initialize the tuple of colors and the figure
chans = cv2.split(image)
colors = ("b", "g", "r")
plt.figure()
plt.title(title)
plt.xlabel("Bins")
plt.ylabel("# of Pixels")
# construct a mask for our image -- our mask will be BLACK for regions to IGNORE and WHITE
# for regions to EXAMINE
# We define our mask as a NumPy array, with the same width and height as our beach image.
# Then draw a white rectangle starting from point (60, 290) to point (210, 390).
# This rectangle will serve as our mask — only pixels in our original image belonging to the
# masked region will be considered in the histogram computation.
mask = np.zeros(image.shape[:2], dtype="uint8")
cv2.rectangle(mask, (60, 290), (210, 390), 255, -1)
cv2.imshow("Mask", mask)
# compute a histogram for our image, but we'll only include pixels in the masked region
plot_histogram(image, "Histogram for Masked Image", mask=mask)
Step 3: Run the python script (histogram_with_mask.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python histogram_with_mask.py
Inference:
For the masked image, most red pixels fall in the range [10, 25], indicating that red pixels contribute
very little to our image. This makes sense, since our ocean and sky are blue. Green pixels are then
present, but these are toward the lighter end of the distribution, which corresponds to the green foliage
and trees. Finally, our blue pixels fall in the brighter range and are obviously our blue ocean and sky.
Most importantly, compare our masked color histograms to the unmasked color histograms. Notice how
dramatically different the color histograms are. By utilizing masks, we are able to apply our computation
only to the specific regions of the image that interest us — in this example, we simply wanted to
examine the distribution of the blue sky and ocean.
OBJECTIVES:
1. Review the classical two-pass algorithm used for connected-component analysis.
2. Apply connected-component analysis to detect characters and blobs in a license plate image.
Then, in the second pass, the connected-component analysis algorithm loops over the labels
generated from the first pass and merges any regions together that share connected labels.
In the second pass, we check whether the label of the current pixel is a root of the union-find
structure.
If so, then we can proceed on to the next step — the label of the current pixel already has the
smallest possible value based on how it is connected to its neighbors.
Otherwise, we follow the tree until we reach a root in the structure. Once we have reached a root, we
assign the value at the root to the current pixel.
By applying this second pass, we can connect blobs that have different label values but are actually
part of the same blob.
The key to efficiency is to use the union-find data structure for tree traversal when examining label
values.
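The two passes can be sketched in plain Python; for brevity this illustrative version uses 4-connectivity rather than the 8-connectivity used in the license plate example, and the function names are assumptions:

```python
def label_components(grid):
    # Two-pass connected-component labeling. Pass 1 assigns provisional
    # labels and records equivalences in a union-find parent table;
    # pass 2 replaces every label with the root of its tree.
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:        # background pixel, ignored
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:  # start a new provisional label
                parent[next_label] = next_label
                labels[r][c] = next_label
                next_label += 1
            else:                      # inherit the smallest neighbor label
                labels[r][c] = min(l for l in (up, left) if l)
                if up and left:
                    union(up, left)
    for r in range(rows):              # second pass: resolve to roots
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels

# A "U" shape first receives two provisional labels; the second pass
# merges them into a single blob.
print(label_components([[1, 0, 1], [1, 0, 1], [1, 1, 1]]))
# -> [[1, 0, 1], [1, 0, 1], [1, 1, 1]]
```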
On the left, you can see an image of a license plate, and on the right, we can see the thresholded
binary image of the license plate.
Our goal is to use connected-component analysis to label each of the white ―blobs‖ in the license
plate and then analyze each of these blobs to determine which regions are license plate characters and
which ones can be discarded.
Program 33:
# extract the Value component from the HSV color space and apply adaptive thresholding to
# reveal the characters on the license plate
V = cv2.split(cv2.cvtColor(plate, cv2.COLOR_BGR2HSV))[2]
thresh = threshold_adaptive(V, 29, offset=15).astype("uint8") * 255
thresh = cv2.bitwise_not(thresh)
# perform connected component analysis on the thresholded images and initialize the mask to
# hold only the "large" components we are interested in
# we make a call to the label method of measure, which performs our actual connected-
# component labeling. The label method requires a single argument, which is our binary thresh
# image that we want to extract connected-components from. We'll also supply neighbors=8 to
# indicate we want to perform connected-component analysis with 8-connectivity. Finally, the
# optional background parameter indicates that all pixels with a value of 0 should be considered
# background and ignored by the label method.
# The label method returns labels, a NumPy array with the same dimensions as our thresh
# image. Each (x, y)-coordinate inside labels is either 0 (indicating that the pixel is background
# and can be ignored) or a value > 0, which indicates that it is part of a connected-component.
# Each unique connected-component in the image has a unique label inside labels.
labels = measure.label(thresh, neighbors=8, background=0)
mask = np.zeros(thresh.shape, dtype="uint8")
print("[INFO] found {} blobs".format(len(np.unique(labels))))
# Now that we have the labels, we can loop over them individually and analyze each one to
# determine if it is a license plate character or not.
# loop over the unique components
for (i, label) in enumerate(np.unique(labels)):
# if this is the background label, ignore it
if label == 0:
print("[INFO] label: 0 (background)")
continue
# otherwise, construct the label mask to display only connected components for the
# current label
# However, in the case we are examining a foreground label, we construct a labelMask
# with the same dimensions as our thresh image. We then set all (x, y)-coordinates in
# labelMask that belong to the current label in labels to white — here, we are simply
# drawing the current blob on the labelMask image.
# Finally, we need to determine if the current blob is a license plate character or not. For
# this particular problem, this filtering is actually quite simple — all we need to do is use
# the cv2.countNonZero to count the number of non-zero pixels in the labelMask and
# then make a check to see if numPixels falls inside an acceptable range to ensure that
# the blob is neither too small nor too big. Provided that numPixels passes this test, we
# accept the blob as being a license plate character.
print("[INFO] label: {} (foreground)".format(i))
labelMask = np.zeros(thresh.shape, dtype="uint8")
labelMask[labels == label] = 255
numPixels = cv2.countNonZero(labelMask)
# if the number of pixels in the component is sufficiently large, add it to our mask of
# "large" blobs
if numPixels > 300 and numPixels < 1500:
mask = cv2.add(mask, labelMask)
Step 3: Run the python script (connected_components_labeling.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python connected_components_labeling.py
Inference:
Note: In versions of scikit-image <= 0.11.X, the background label was originally -1. However, in newer
versions of scikit-image (such as >= 0.12.X), the background label is 0. Make sure you check which
version of scikit-image you are using and update the code to use the correct background label as this
can affect the output of the script.
How to quantify and abstractly represent an image using only a list of numbers?
The process of quantifying an image is called feature extraction.
The process of feature extraction governs the rules, algorithms, and methodologies we use to
abstractly quantify the contents of an image using only a list of numbers, called a feature vector.
Feature vectors are normally real-, integer-, or binary-valued.
Image descriptors and feature descriptors govern how an image is abstracted and quantified, while
feature vectors are the output of descriptors and used to quantify the image. Taken as a whole, this
process is called feature extraction.
Reasons to extract the features from the image are:
1. to compare the images for similarity;
2. to rank images in search results when building an image search engine;
3. to use when training an image classifier to recognize the contents of an image.
OBJECTIVES:
To learn about:
1. Feature vector
2. Image descriptor
3. Feature descriptor
FEATURE VECTOR
Feature vectors are used to represent a variety of properties of an image, such as the shape, color, or
texture of an object in an image. They can also combine various properties.
A feature vector could jointly represent shape and color. Or it could represent texture and shape. Or it
could represent all three!
The general process of extracting a feature vector from an image is shown below:
IMAGE DESCRIPTOR:
An image descriptor is an algorithm and methodology that governs how an input image is globally
quantified and returns a feature vector abstractly representing the image contents.
Global — this implies that we will be examining the entire image to compute the feature vector.
[Figure: Input Image → Apply Image Descriptor → feature vector, e.g. [0.51, 0.42, 0.96, ...]]
Examples of image descriptors are color channel statistics, color histograms, Local Binary
Patterns, etc.
One of the primary benefits of image descriptors is that they tend to be much simpler than feature
descriptors.
The feature vectors derived from image descriptors can be immediately passed to a classifier
to recognize the contents of an image, or used to build an image search engine.
Image descriptors are not robust to changes in rotation, translation and viewpoints.
FEATURE DESCRIPTORS:
Feature descriptor is an algorithm and methodology that governs how an input region of an image is
locally quantified.
A feature descriptor accepts a single input image and returns multiple feature vectors.
Examples of feature descriptors are SIFT, SURF, ORB, BRISK, BRIEF, and FREAK.
Feature descriptors tend to be much more powerful than our basic image descriptors since they take
into account the locality of regions in an image and describe them separately.
As you'll see later in this section, feature descriptors also tend to be much more robust to changes in
the input image, such as translation, orientation (i.e. rotation), and changes in viewpoint.
In most cases the feature vectors extracted using feature descriptors are not directly applicable to
building an image search engine or constructing an image classifier in their current state (the exception
being keypoint matching/spatial verification, which we detail when identifying the covers of books).
This is because each image is now represented by multiple feature vectors rather than just one.
To remedy this problem, we construct a bag-of-visual-words, which takes all the feature vectors of
an image and constructs a histogram, counting the number of times similar feature vectors occur in an
image.
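The bag-of-visual-words counting step can be sketched as follows, assuming a pre-computed codebook of cluster centers; both the helper name and the toy codebook are illustrative:

```python
import math

def bovw_histogram(descriptors, codebook):
    # Assign each local feature vector to its nearest codeword
    # (by Euclidean distance) and count how often each one occurs.
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda j: math.dist(d, codebook[j]))
        hist[nearest] += 1
    return hist

codebook = [(0.0, 0.0), (10.0, 10.0)]   # toy "visual words"
features = [(0.5, 0.2), (9.0, 9.5), (10.2, 10.1)]
print(bovw_histogram(features, codebook))  # -> [1, 2]
```

Whatever the number of feature vectors per image, the resulting histogram always has one entry per codeword, giving each image a single fixed-length representation.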
OBJECTIVES:
1. Learn how to extract color channel statistic feature vectors from images.
2. Apply color channel statistics and the Euclidean distance to rank images for similarity.
The color channel image descriptor can be broken down into three steps:
Step 1: Separate the input image into its respective channels. For an RGB image, we want to examine
each of the Red, Green, and Blue channels, independently.
Step 2: Compute various statistics for each channel, such as mean, standard deviation, skew, and
kurtosis.
Step 3: Concatenate the statistics together to form a "list" of statistics for each color channel — this
becomes our feature vector.
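These three steps can be sketched in plain Python (channel_stats and euclidean are hypothetical helpers; the workshop script relies on NumPy and SciPy's dist.euclidean instead):

```python
import math

def channel_stats(channels):
    # Steps 1-3: for each color channel, compute the mean and standard
    # deviation, then concatenate them into one feature vector.
    features = []
    for ch in channels:
        mean = sum(ch) / float(len(ch))
        std = math.sqrt(sum((v - mean) ** 2 for v in ch) / len(ch))
        features.extend([mean, std])
    return features

def euclidean(p, q):
    # Square-root of the sum of squared differences between the vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Toy 2-pixel "images" given as [R, G, B] channel value lists.
reddish_a = channel_stats([[200, 210], [10, 20], [5, 15]])
reddish_b = channel_stats([[190, 205], [15, 25], [10, 10]])
bluish    = channel_stats([[5, 10], [20, 30], [220, 240]])
# The two reddish images lie much closer together in feature space.
print(euclidean(reddish_a, reddish_b) < euclidean(reddish_a, bluish))  # -> True
```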
Program 34:
# the Euclidean distance simply takes the sum of squared differences between each entry in
# the p and q vectors, and finally takes the square-root of this sum.
# A larger Euclidean distance implies that the two points are farther away from each other in a
# Euclidean space. A smaller Euclidean distance implies that the two points are closer
# together in a Euclidean space, with a distance of 0 implying that the points are identical.
from scipy.spatial import distance as dist
from imutils import paths
import numpy as np
import cv2
# extract the mean and standard deviation from each channel of the BGR image, then
# update the index with the feature vector
# In this case, our feature vector consists of the means and standard deviations,
# allowing us to characterize the color distribution of our images.
# display the query image and grab the sorted keys of the index dictionary
# we'll be using the trex_01.png image as our query image — all other images in our dataset
# (i.e. trex_02.png, trex_03.png, and trex_04.png) will be compared to trex_01.png.
query = cv2.imread(imagePaths[0])
cv2.imshow("Query (trex_01.png)", query)
keys = sorted(index.keys())
# load the current image and compute the Euclidean distance between the query image (i.e. the
# 1st image) and the current image
# we use the dist.euclidean function to compute the Euclidean distance between the query image
# feature vector and the feature vectors in our dataset. As mentioned above, similar images
# will have a smaller Euclidean distance, whereas less similar images will have
# a larger Euclidean distance.
image = cv2.imread(imagePaths[i])
d = dist.euclidean(index["trex_01.png"], index[k])
# display the distance between the query image and the current image
cv2.putText(image, "%.2f" % (d), (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
0.75, (0, 255, 0), 2)
cv2.imshow(k, image)
Step 3: Run the python script (color_channel_stats.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python color_channel_stats.py
Inference:
OBJECTIVES:
1. Learn how histograms can be used as image descriptors.
2. Apply k-means clustering to cluster color histogram features.
COLOR HISTOGRAMS:
Color histogram counts the number of times a given pixel intensity occurs in an image.
Using a color histogram we can express the actual distribution or "amount" of each color in an image.
The counts for each color/color range are used as feature vectors.
If we decided to utilize a 3D color histogram with 8 bins per channel, we could represent any
image of any size using only 8 x 8 x 8 = 512 bins, or a feature vector of 512-d.
The size of an image has no effect on our output color histogram — although it's wise to resize large
images to more manageable dimensions to increase the speed of the histogram computation.
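The fixed-length property can be sketched in plain Python, assuming 8 bins per channel (the helper is illustrative; cv2.calcHist with [8, 8, 8] bins produces the same 512-d feature vector far faster):

```python
def color_histogram_3d(pixels, bins=8, levels=256):
    # Every (r, g, b) pixel falls into exactly one of bins**3 joint bins;
    # the flattened counts form a fixed-length feature vector no matter
    # how many pixels the image contains.
    width = levels // bins
    hist = [0] * (bins ** 3)
    for (r, g, b) in pixels:
        idx = (min(r // width, bins - 1) * bins
               + min(g // width, bins - 1)) * bins + min(b // width, bins - 1)
        hist[idx] += 1
    return hist

hist = color_histogram_3d([(255, 0, 0), (250, 4, 3), (0, 0, 255)])
print(len(hist))                      # -> 512
print(sum(1 for c in hist if c > 0))  # -> 2
```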
k-means is a clustering algorithm.
The goal of k-means is to partition n data points into k clusters.
Each of the n data points will be assigned to the cluster with the nearest mean.
The mean of each cluster is called its "centroid" or "center".
Applying k-means yields k separate clusters of the original n data points.
Data points inside a particular cluster are considered to be "more similar" to each other than data
points that belong to other clusters.
In this particular program, we will be clustering the color histograms extracted from the images in our
dataset — but in reality, you could be clustering any type of feature vector.
Histograms that belong to a given cluster will be more similar in color distribution than
histograms belonging to a separate cluster.
One caveat of k-means is that we need to specify the number of clusters we want to generate ahead
of time.
There are algorithms that automatically select the optimal value of k.
For the time being, we'll be manually supplying a value of k=2 to separate the two classes of images.
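A bare-bones version of the algorithm, for intuition only (the scikit-learn KMeans class used in the next program adds smarter initialization and convergence checks; the helper name and toy points are illustrative):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    # Plain k-means: repeatedly assign every point to its nearest
    # centroid, then move each centroid to the mean of its cluster.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda idx: math.dist(p, centroids[idx]))
            clusters[j].append(p)
        for j, cluster in enumerate(clusters):
            if cluster:
                centroids[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # -> [3, 3]
```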
Program 35: Before we can cluster the vacation photo dataset into two distinct groups, we first need to
extract color histograms from each of the 10 images in the dataset. With that in mind, let's go ahead
and define the directory structure of this project:
|--- example
| |--- __init__.py
| |--- descriptors
| | |---- __init__.py
| | |--- labhistogram.py
|--- cluster_histograms.py
First we'll be defining our image descriptor inside the descriptors sub-module of the example
package. And inside the descriptors sub-module, we'll create a LabHistogram class to extract color
histograms from images in the L*a*b* color space:
# Save it as labhistogram.py
# Define image descriptors as classes rather than functions
# import the necessary packages
import cv2
class LabHistogram:
def __init__(self, bins):
# store the number of bins for the histogram
self.bins = bins
import numpy as np
import argparse
import cv2
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True, help="path to the input dataset directory")
ap.add_argument("-k", "--clusters", type=int, default=2, help="# of clusters to generate")
args = vars(ap.parse_args())
data.append(hist)
# Now that we have all of our color features extracted, we can cluster the feature vector using
# the k-means algorithm. We initialize k-means using the supplied number of clusters via
# command line argument. And a call to clt.fit_predict not only performs the actual clustering,
# but performs the prediction as to which histogram (and thus which associated image) belongs
# to which of the 2 clusters.
clt = KMeans(n_clusters=args["clusters"])
labels = clt.fit_predict(data)
#print labels
# Now that we have our color histograms clustered, we need to grab the unique IDs for each
# cluster. This is handled by making a call to np.unique, which returns the unique values inside
# a list. For each unique label, we need to grab the image paths that belong to the cluster. And
# for each of the images that belong to the current cluster, we load and display the image to our
# screen.
# loop over the unique labels
for label in np.unique(labels):
# grab all image paths that are assigned to the current label
labelPaths = imagePaths[np.where(labels == label)]
# loop over the image paths that belong to the current label
for (i, path) in enumerate(labelPaths):
# load the image and display it
image = cv2.imread(path)
cv2.imshow("Cluster {}, Image #{}".format(label + 1, i + 1), image)
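The grouping step above relies on imagePaths being a NumPy array so that fancy indexing works. The same np.unique/np.where pattern can be checked with toy labels and paths (fabricated values, not the actual dataset):

```python
import numpy as np

# toy stand-ins: cluster labels from k-means and the matching image paths
labels = np.array([0, 1, 0, 1, 1])
imagePaths = np.array(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"])

groups = {}
for label in np.unique(labels):
    # np.where gives the indices assigned to this cluster; fancy indexing
    # then pulls out the matching paths (this is why imagePaths must be a
    # NumPy array rather than a plain Python list)
    group = imagePaths[np.where(labels == label)]
    groups[int(label)] = [str(p) for p in group]

print(groups)  # {0: ['a.jpg', 'c.jpg'], 1: ['b.jpg', 'd.jpg', 'e.jpg']}
```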
Step 3: Run the python script (cluster_histograms.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python cluster_histograms.py --dataset dataset
Inference:
However, being able to capture details at a small scale also is the biggest drawback of the
algorithm — we cannot capture details at varying scales, only the fixed 3 x 3 scale.
To handle this, an extension to the original LBP implementation was proposed to handle variable
neighborhood sizes.
To account for variable neighborhood sizes, two parameters were introduced:
1. The number of points p in a circularly symmetric neighborhood to consider (thus removing the
reliance on a square neighborhood).
2. The radius of the circle r, which allows us to account for different scales.
It‘s also important to keep in mind the effect of both the radius r and the number of points p.
The more points p you sample, the more patterns you can encode, but at the same time you increase
your computational cost.
# initialize the local binary patterns descriptor and initialize the index dictionary where the image
# filename is the key and the features are the value
# define a dictionary called index , where the key to the dictionary is the unique shirt image
# filename and the value is the extracted LBPs. We‘ll be using this dictionary to store our
# extracted feature and aid us in comparing the query image to our dataset.
index = {}
radius=8
numPoints=24
# compute the Local Binary Pattern representation of the image, and then use the LBP
# representation to build the histogram of patterns
lbp = feature.local_binary_pattern(image,numPoints, radius, method="uniform")
(hist, _) = np.histogram(lbp.ravel(), bins=range(0, numPoints + 3),
range=(0, numPoints + 2))
# normalize the histogram
hist = hist.astype("float")
hist /= (hist.sum() + eps)
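The uniform LBP method emits p + 2 distinct output values, which is why the histogram above uses numPoints + 2 bins. A quick check with fabricated LBP codes (toy values standing in for the output of feature.local_binary_pattern):

```python
import numpy as np

eps = 1e-7
numPoints = 24

# fabricated LBP codes; the "uniform" method yields values in [0, numPoints + 1]
codes = np.array([0, 1, 1, 24, 25, 25])
(hist, _) = np.histogram(codes.ravel(), bins=range(0, numPoints + 3),
                         range=(0, numPoints + 2))

# normalize so the histogram sums to (approximately) one
hist = hist.astype("float")
hist /= (hist.sum() + eps)

print(hist.shape)  # (26,) -- i.e. numPoints + 2 bins
```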
# load the query image and extract Local Binary Patterns from it
query = cv2.imread(args["query"])
queryFeatures = describe(cv2.cvtColor(query, cv2.COLOR_BGR2GRAY))
Step 3: Run the python script (search_shirts.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python search_shirts.py --dataset shirts --query queries/query_01.jpg
Inference:
Implementing this descriptor requires dividing the image into small connected regions called cells,
and then for each cell, computing a histogram of oriented gradients for the pixels within each cell.
We can then accumulate these histograms across multiple cells to form our feature vector.
[Figure: the gX and gY gradient images]
Now that we have our gradient images, we can compute the final gradient magnitude representation
of the image:
|G| = sqrt(gX^2 + gY^2)
[Figure: the combined gradient magnitude]
Finally, the orientation of the gradient for each pixel in the input image can then be computed by:
θ = tan^(-1)(gY / gX)
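The gradient magnitude and orientation computations map directly onto NumPy (toy gradient values; arctan2 is used so the signs of gX and gY are handled correctly, and the angles are folded into the unsigned [0, 180) range):

```python
import numpy as np

gX = np.array([[3.0, 0.0]])
gY = np.array([[4.0, 1.0]])

# gradient magnitude: |G| = sqrt(gX^2 + gY^2)
magnitude = np.sqrt(gX ** 2 + gY ** 2)

# gradient orientation, folded into the unsigned [0, 180) range
orientation = np.degrees(np.arctan2(gY, gX)) % 180

# magnitudes: 5.0 and 1.0; orientations: ~53.13 and 90 degrees
print(magnitude, orientation)
```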
Given both |G| and θ, we can now compute a histogram of oriented gradients, where the bin of the
histogram is based on θ and the contribution or weight added to a given bin of the histogram is based
on |G|.
Now, for each of the cells in the image, we need to construct a histogram of oriented gradients using
our gradient magnitude |G| and orientation θ mentioned above.
But before we construct this histogram, we need to define our number of orientations.
The number of orientations controls the number of bins in the resulting histogram.
The gradient angle is either within the range [0, 180] (unsigned) or [0, 360] (signed).
Finally, each pixel contributes a weighted vote to the histogram; the weight of the vote is simply the
gradient magnitude |G| at the given pixel.
import imutils
import cv2
# find contours in the edge map, keeping only the largest one, presumed to be the car logo
(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
c = max(cnts, key=cv2.contourArea)
# take the largest contour region, compute the bounding box, and extract the ROI.
# extract the logo of the car and resize it to a canonical width and height
# having various widths and heights for your image can lead to HOG feature vectors of different
# sizes — in nearly all situations this is not the intended behavior that you want.
# Remember, our extracted feature vectors are supposed to characterize and represent the
# visual contents of an image. And if our feature vectors are not the same dimensionality, then
# they cannot be compared for similarity. And if we cannot compare our feature vectors for
# similarity, we are unable to compare our two images at all.
# Because of this, when extracting HOG features from a dataset of images, you‘ll want to define
# a canonical, known size that each image will be resized to. In many cases, this means that
# you‘ll be throwing away the aspect ratio of the image. Normally, destroying the aspect ratio of
# an image should be avoided — but in this case we are happy to do it, because it ensures (1)
# that each image in our dataset is described in a consistent manner, and (2) each feature
# vector is of the same dimensionality.
# our logo is resized to a known, predefined 200 x 100 pixels
(x, y, w, h) = cv2.boundingRect(c)
logo = gray[y:y + h, x:x + w]
logo = cv2.resize(logo, (200, 100))
# Finally, given the HOG feature vector, we then update our data matrix and labels list with the
# feature vector and car make, respectively.
data.append(H)
labels.append(make)
# Given our data and labels we can now train our classifier
# To recognize and distinguish the difference between our five car brands, we are going to use
# scikit-learns KNeighborsClassifier.
# The k-nearest neighbor classifier is a type of "lazy learning" algorithm where nothing is
# actually "learned". Instead, the k-Nearest Neighbor (k-NN) training phase simply accepts a set
# of feature vectors and labels and stores them -- that's it! Then, when it is time to classify a
# new feature vector, it accepts the feature vector, computes the distance to all stored feature
# vectors (normally using the Euclidean distance, but any distance metric or similarity metric can
# be used), sorts them by distance, and returns the top k "neighbors" to the input feature vector.
# From there, each of the k neighbors votes as to what it thinks the label of the classification is.
# In our case, we are simply passing the HOG feature vectors and labels to our k-NN algorithm
# and asking it to report back the closest logo to our query features using k=1 neighbors.
print("[INFO] training classifier...")
model = KNeighborsClassifier(n_neighbors=1)
model.fit(data, labels)
print("[INFO] evaluating...")
# extract Histogram of Oriented Gradients from the test image and predict the make of
# the car
# call to our k-NN classifier, passing in our HOG feature vector for the current testing
# image and asking the classifier what it thinks the logo is.
(H, hogImage) = feature.hog(logo, orientations=9, pixels_per_cell=(10, 10),
cells_per_block=(2, 2), transform_sqrt=True, visualise=True)
pred = model.predict(H.reshape(1, -1))[0]
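The lazy-learning behaviour described in the comments above can be sketched as a tiny pure-NumPy k-NN (toy feature vectors and hypothetical make labels, not the actual HOG data):

```python
import numpy as np

def knn_predict(data, labels, query, k=1):
    # compute the Euclidean distance from the query to every stored vector
    dists = np.linalg.norm(data - query, axis=1)
    # take the labels of the k nearest neighbours and let them vote
    nearest = np.argsort(dists)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# "training" is just storing the vectors and labels
data = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = ["audi", "audi", "ford"]

print(knn_predict(data, labels, np.array([4.5, 5.2])))  # ford
```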
Step 3: Run the python script (recognize_car_logos.py ) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python recognize_car_logos.py --training car_logos --test test_images
Inference: Of course, this approach only worked, because we had a tight cropping of the car logo. If we
had described the entire image of a car, it is very unlikely that we would have been able to correctly
classify the brand. But again, that‘s something we can resolve when we get to the Custom Object
Detector, specifically sliding windows and image pyramids.
Program 38:
Step 3: Run the python script (detect_fast.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_fast.py
Inference:
HARRIS:
Harris detector is one of the most common corner detectors that you‘ll encounter in the computer
vision world.
It is quite fast (though not as fast as the FAST keypoint detector), but marks regions as corners
more accurately.
The Harris keypoint detector is heavily rooted in linear algebra; however, the most intuitive way to
understand the detector is to take a look at the following figure:
On the left, we have the original image that we want to detect keypoints on.
The middle image represents the gradient magnitude in the x direction. Finally, the right image
represents the gradient magnitude in the y direction.
Here, we have a simple 2x2 pixel region.
The top-left and bottom-right pixels are black, and the top-right and bottom-left
pixels are white.
At the center of these pixels, we thus have a corner (denoted as the red
circle).
So how can we algorithmically define this region as a corner?
Simple! We'll just take the summation of the gradient values in the region in
both the x and y direction, respectively: (Gx)^2 and (Gy)^2.
If both these values are sufficiently "large", then we can define the region as a
corner.
This process is done for every pixel in the input image.
This method works because the region enclosed inside the red circle will have
a high number of both horizontal and vertical gradients.
To extend this method to arbitrary corners, we first need to (1) compute the
gradient magnitude representation of an image, and (2) then use these gradient
magnitude representations to construct a matrix M:

M = [ sum(Gx^2)    sum(Gx*Gy) ]
    [ sum(Gx*Gy)   sum(Gy^2)  ]
Now that M is defined, we can take the eigenvalue decomposition of the matrix, leaving us with a
"score" indicating the "cornerness" (i.e., a value to quantify how much of a "corner" the region is):

R = det(M) - k (trace(M))^2 = λ1 λ2 - k (λ1 + λ2)^2

where λ1 and λ2 are the eigenvalues of M and k is a small sensitivity constant (typically k ≈ 0.04).
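The cornerness score can be sketched directly in NumPy (np.gradient standing in for the Sobel gradients a real implementation would use; the toy checkerboard patch is a fabricated corner):

```python
import numpy as np

def harris_response(patch, k=0.04):
    # gradients of the (grayscale, float) patch
    gy, gx = np.gradient(patch)
    # entries of the structure matrix M, summed over the whole patch
    Ixx, Iyy, Ixy = (gx * gx).sum(), (gy * gy).sum(), (gx * gy).sum()
    # Harris score: det(M) - k * trace(M)^2
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - k * trace ** 2

# checkerboard-style corner: strong gradients in both x and y
corner = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [1, 1, 0, 0],
                   [1, 1, 0, 0]], dtype=float)
flat = np.zeros((4, 4))

print(harris_response(corner) > harris_response(flat))  # True
```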
Program 39:
Step 3: Run the python script (detect_harris.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_harris.py
Inference:
The Harris detector found 453 corners in our image, most of which correspond to the corners of the
keyboard, the corners on the author text, and the corners on the book logo.
Program 40:
Step 3: Run the python script (detect_dog.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_dog.py
Inference:
Using the DoG detector, we have found 660 keypoints on the book. Notice how the keypoints have
varying size — this is due to the octaves that we have formed to detect local minima and maxima.
For each of the 16 windows, we compute the gradient magnitude and orientation, just like we did for
the HOG descriptor.
Given the gradient magnitude and orientation, we next construct an 8-bin histogram for each of the 4
x 4 pixel windows:
The amount added to each bin is dependent on the magnitude of the gradient.
However, we are not going to use the raw magnitude of the gradient.
Instead, we are going to utilize Gaussian weighting.
The farther the pixel is from the keypoint center, the less it contributes to the overall histogram.
Finally, the third step of SIFT is to collect all 16 of these 8-bin orientation histograms and concatenate
them together:
Given that we have 16 of these histograms, our feature vector is thus 16 x 8 = 128-d.
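The concatenation step can be sketched with fabricated histogram data (random toy values, not real SIFT output):

```python
import numpy as np

# 16 spatial windows, each with an 8-bin orientation histogram (toy data)
rng = np.random.default_rng(0)
histograms = [rng.random(8) for _ in range(16)]

# concatenating the 16 histograms yields the final 128-d SIFT descriptor
descriptor = np.concatenate(histograms)
print(descriptor.shape)  # (128,)
```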
Program 41:
# load the input image, convert it to grayscale, detect keypoints, and then
# extract local invariant descriptors
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kps = detector.detect(gray)
(kps, descs) = extractor.compute(gray, kps)
# show the shape of the keypoints and local invariant descriptors array
print("[INFO] # of keypoints detected: {}".format(len(kps)))
print("[INFO] feature vector shape: {}".format(descs.shape))
Step 3: Run the python script (extract_sift.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python extract_sift.py
Inference:
Again, it‘s important to note that unlike global image descriptors such as Local Binary
Patterns, Histogram of Oriented Gradients, or Haralick texture (where we have only one feature vector
extracted per image), local descriptors return N feature vectors per image, where N is the number of
detected keypoints. This implies that given N detected keypoints in our input image, we‘ll obtain N x
128-d feature vectors after applying SIFT.
(viewpoint, scale, deformation, occlusion, illumination, background clutter, and intra-class variation) and
still be able to detect the presence of the object, even under less-than-ideal circumstances.
Program 42: Write a code to perform actual object recognition, specifically recognizing stop signs in
images.
The CALTECH-101 dataset is a very popular benchmark dataset for object detection and has been
used by many researchers, academics, and computer vision developers to evaluate their object
detection algorithms.
The dataset includes 101 categories, spanning a diverse range of objects including elephants,
bicycles, soccer balls, and even human brains, just to name a few.
When you download the CALTECH-101 dataset, you‘ll notice that it includes both an images and
annotations directory.
For each image in the dataset, an associated bounding box (i.e., the (x, y)-coordinates of the object)
is provided.
Our goal is to take both the images and the bounding boxes (i.e., the annotations) and train a
classifier to detect the presence of a given object in an image.
We are presented with not only the labels of the images, but also annotations corresponding to
the bounding box surrounding each object.
We‘ll take these bounding boxes, extract features from them, and then use these features to build our
object detector.
# --class : This is the path to our specific CALTECH-101 class that we want to train an object
# detector for. For this example, we‘ll be using the stop sign class.
#--annotations : For each image in the dataset, we also have the corresponding bounding boxes
# for each object in the image; the --annotations switch specifies the path to our bounding
# boxes directory for the specific class we are training on.
#--output : After our model has been trained, we would like to dump it to file — this is the path to
# our output classifier.
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--class", required=True,help="Path to the CALTECH-101 class images")
ap.add_argument("-a", "--annotations", required=True, help="Path to the CALTECH-101 class
annotations")
ap.add_argument("-o", "--output", required=True, help="Path to the output detector")
args = vars(ap.parse_args())
# grab the default training options for our HOG + Linear SVM detector,
# initialize the images list to store the images we are using to train our classifier as well as
# initialize the boxes list to store the bounding boxes for each of the images.
print("[INFO] gathering images and bounding boxes...")
options = dlib.simple_object_detector_training_options()
images = []
boxes = []
# loop over the annotations and add each annotation to the list of bounding boxes
bb = [dlib.rectangle(left=long(x), top=long(y), right=long(w), bottom=long(h))
for (y, h, x, w) in annotations]
boxes.append(bb)
Step 3: Run the python script (train_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python train_detector.py --class stop_sign_images --annotations stop_sign_annotations
--output output/stop_sign_detector.svm
Program 43: We gathered 11 images of stop signs from Google that our classifier has not been trained on.
Step 3: Run the python script (test_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_detector.py --detector output/stop_sign_detector.svm --testing stop_sign_testing
Inference:
Utilizing an image pyramid allows us to find objects at different scales of an image.
At the bottom of the pyramid, we have the original image at its original size (in terms of width and
height).
At each subsequent layer, the image is resized (sub-sampled) and optionally smoothed via Gaussian
blurring.
The image is progressively sub-sampled until some stopping criterion is met, which is normally a
minimum size being reached, indicating that no further sub-sampling needs to take place.
When combined with the second component of our "scanner", the sliding window, we can find objects
in images at various locations.
As the name suggests, a sliding window ―slides‖ from left to right and top to bottom of each scale in
the image pyramid.
Again, by leveraging both image pyramids and sliding windows together, we are able to detect
objects at various locations and scales in an image.
Program 44:
# if the resized image does not meet the supplied minimum size, then stop constructing the
# pyramid
if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
break
# loop over the layers of the image pyramid and display them
for (i, layer) in enumerate(pyramid(image, scale=args["scale"])):
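A minimal version of the pyramid helper assumed by the loop above can be sketched as follows (a nearest-neighbour resize is used here so the sketch stays dependency-free; real code would use cv2.resize or imutils.resize):

```python
import numpy as np

def resize_width(image, width):
    # nearest-neighbour resize to a target width, preserving aspect ratio
    h, w = image.shape[:2]
    new_h = int(h * (width / float(w)))
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(width) * w // width
    return image[rows][:, cols]

def pyramid(image, scale=1.5, minSize=(30, 30)):
    # yield the original image first
    yield image
    while True:
        # compute the new width for the next layer and sub-sample
        width = int(image.shape[1] / scale)
        image = resize_width(image, width)
        # stop once the layer falls below the supplied minimum size
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break
        yield image

layers = list(pyramid(np.zeros((90, 90)), scale=1.5, minSize=(30, 30)))
print([l.shape for l in layers])  # [(90, 90), (60, 60), (40, 40)]
```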
Step 3: Run the python script (test_pyramid.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_pyramid.py --image florida_trip.png --scale 1.5
Inference:
Program 45:
# The first is the image that we are going to loop over. The second argument is the stepSize.
# The stepSize indicates how many pixels we are going to "skip" in both the (x, y) direction.
# Normally, we would not want to loop over each and every pixel of the image (i.e. stepSize=1)
# as this would be computationally prohibitive if we were applying an image classifier at each
# window. In practice, it's common to use a stepSize of 4 to 8 pixels. Remember, the smaller
# your step size is, the more windows you'll need to examine. The larger your step size is, the
# fewer windows you'll need to examine; note, however, that while this will be computationally
# more efficient, you may miss detecting objects in images if your step size becomes too large.
# The last argument, windowSize , defines the width and height (in terms of pixels) of the
# window we are going to extract from our image .
def sliding_window(image, stepSize, windowSize):
# slide a window across the image
for y in xrange(0, image.shape[0], stepSize):
for x in xrange(0, image.shape[1], stepSize):
# yield the current window
# returns a tuple containing the x and y coordinates of the sliding
# window, along with the window itself.
yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])
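A quick check on the generator above (range used in place of xrange so the sketch also runs on Python 3; a toy 100 x 100 array stands in for a real image):

```python
import numpy as np

def sliding_window(image, stepSize, windowSize):
    # slide a window across the image, top-to-bottom and left-to-right
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            # yield the (x, y) coordinates along with the window itself
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

# a 100 x 100 image with a 25-pixel step gives 4 x 4 = 16 window positions
windows = list(sliding_window(np.zeros((100, 100)), stepSize=25,
                              windowSize=(50, 50)))
print(len(windows))  # 16
```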
# load the input image and unpack the command line arguments
image = cv2.imread(args["image"])
(winW, winH) = (args["width"], args["height"])
# This is where we would process the window, extract hog features, and
# apply a machine learning classifier to perform object detection
# since we do not have a classifier yet, let's just draw the window
clone = layer.copy()
cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
cv2.imshow("Window", clone)
# normally we would leave out this line, but let's pause execution
# of our script so we can visualize the window
cv2.waitKey(1)
time.sleep(0.025)
Step 3: Run the python script (test_sliding_window.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_sliding_window.py --image car.jpg --width 96 --height 36
Inference:
To run imglab , you need to supply two command line arguments over two separate commands:
The first is your output annotations file which will contain the bounding boxes you will manually draw
on each of the images in your dataset.
The second argument is the dataset path which contains the list of images in your dataset.
For this lesson, we‘ll be using a subset of the MIT + CMU Frontal Images dataset as our training data,
followed by a subset of the CALTECH Web Faces dataset for testing.
First, let‘s initialize our annotations file with a list of images in the dataset path:
$ ./imglab -c ~/Desktop/faces_annotations.xml ~/Desktop/faces
From there, we can start the annotation process by using the following command:
$ ./imglab ~/Desktop/faces_annotations.xml
As you can see, the imglab GUI is displayed to my
screen, along with the images in my dataset of faces.
To draw a bounding box surrounding each object in my
dataset, I simply select an image, hold the shift key on my
keyboard, and drag-and-draw the bounding rectangle,
then release my mouse.
Note: It‘s important to label all examples of objects in an
image; otherwise, dlib will implicitly assume that regions
not labeled are regions that should not be detected (i.e.,
hard-negative mining applied during extraction time).
Finally, if there is an ROI that you are unsure about and
want to be ignored entirely during the training process,
simply double click the bounding box and press the i key. This will cross out the bounding box and
mark it as "ignored".
While annotating a dataset of images is a time consuming and tedious task, you should nonetheless
take your time and take special care to ensure the images are properly labeled with their
respective bounding boxes.
Remember, machine learning algorithms are only as good as their input data — if you put garbage in,
you‘ll only get garbage out. But if you take the time to properly label your images, you‘ll get much better
results.
Program 46:
# grab the default training options for the HOG + Linear SVM detector, then
# train the detector -- in practice, the `C` parameter should be cross-validated
# we define the options to our dlib detector. The most important argument to set here is C , the
#―strictness‖ of our SVM. In practice, this value needs to be cross-validated and grid-searched
# to obtain optimal accuracy.
print("[INFO] training detector...")
options = dlib.simple_object_detector_training_options()
options.C = 1.0
options.num_threads = 4
options.be_verbose = True
dlib.train_simple_object_detector(args["xml"], args["detector"], options)
Step 3: Run the python script (train_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python train_detector.py --xml face_detector/faces_annotations.xml --detector
face_detector/detector.svm
Program 47:
for b in boxes:
    (startX, startY, endX, endY) = (b.left(), b.top(), b.right(), b.bottom())
    cv2.rectangle(image, (startX, startY), (endX, endY), (0, 255, 0), 2)
Step 3: Run the python script (test_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_detector.py --detector f
Inference:
To demonstrate how to build a home surveillance and motion detection system capable of running
in real-time on your Raspberry Pi.
This motion detection system will monitor a particular area of your house (such as the front door for
motion).
When activity occurs, the frame that best captures and characterizes the motion (according to a
criteria we‘ll define later) will be written to disk.
Once the frame has been written to disk, it becomes easy to apply any other type of API integration,
such as uploading the image to an online server, texting ourselves a picture of the intruder, or
uploading the image to Dropbox.
Background subtraction is critical in many computer vision applications.
Applications of background subtraction include counting the number of cars passing through a toll
booth and counting the number of people walking in and out of a store.
The background of our video stream is assumed to be largely static and unchanging over consecutive
frames of a video.
Therefore, if we can model the background, we can monitor it for substantial changes.
If there is a substantial change, we can detect it; this change normally corresponds to motion in
our video.
Now obviously in the real-world this assumption can easily fail.
Due to shadowing, reflections, lighting conditions, and any other possible change in the environment,
our background can look quite different in various frames of a video.
And if the background appears to be different, it can throw our algorithms off.
That's why the most successful background subtraction/foreground detection systems utilize fixed,
mounted cameras in controlled lighting conditions.
Program 48:
# if the frame could not be grabbed, then we have reached the end of the video
if not grabbed:
break
# compute the absolute difference between the current frame and first frame
frameDelta = cv2.absdiff(firstFrame, gray)
thresh = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)[1]
# dilate the thresholded image to fill in holes, then find contours on thresholded image
thresh = cv2.dilate(thresh, None, iterations=2)
(cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
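The differencing-and-thresholding step above can be sketched without OpenCV (np.where standing in for cv2.threshold, toy 2 x 2 frames standing in for real video frames):

```python
import numpy as np

# toy frames: one pixel changes between the first frame and the current one
first = np.array([[10, 10], [10, 10]], dtype=np.uint8)
frame = np.array([[10, 200], [10, 10]], dtype=np.uint8)

# absolute difference (cast to int first to avoid uint8 wrap-around)
delta = np.abs(first.astype(int) - frame.astype(int)).astype(np.uint8)

# binary threshold at 25: changed pixels become 255, everything else 0
mask = np.where(delta > 25, 255, 0).astype(np.uint8)
print(mask)  # only the changed pixel is marked
```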
# compute the bounding box for the contour, draw it on the frame and update the text
(x, y, w, h) = cv2.boundingRect(c)
cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
text = "Occupied"
Step 3: Run the python script (motion_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python motion_detector.py --video videos/example_01.mp4
Inference:
VIOLA-JONES ALGORITHM
Viola and Jones focus on detecting faces in images, but the framework can be used to train detectors for arbitrary "objects," such as cars, buildings, kitchen utensils, and even bananas.
Recall when we discussed image kernels and how we slid a small matrix across our image from left-to-right and top-to-bottom, computing an output value for each center pixel of the kernel?
Well, it turns out that this sliding window approach is also extremely useful in the context of detecting objects in an image.
In the figure above, we can see that we are sliding a fixed size window across our
image at multiple scales.
At each of these phases, our window stops, computes some features, and then
classifies the region as Yes, this region does contain a face, or No, this region does
not contain a face.
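The sliding-window mechanism itself is easy to sketch; the image size, window size, and step below are arbitrary toy values:

```python
def sliding_window(width, height, win=3, step=2):
    """Yield the top-left (x, y) of each fixed-size window position."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield (x, y)

# every stop a 3x3 window makes on a 7x5 image with a step of 2
print(list(sliding_window(7, 5)))
# -> [(0, 0), (2, 0), (4, 0), (0, 2), (2, 2), (4, 2)]
```

In a full detector, each of these stops (repeated at multiple image scales) is where features are computed and classified.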
For each of the stops along the sliding window path, five rectangular features are
computed.
To obtain features for each of these five rectangular areas, we simply subtract the sum of pixels under the white region from the sum of pixels under the black region.
Interestingly enough, these features have real importance in the context of face detection:
1. Eye regions tend to be darker than cheek regions.
2. The nose region is brighter than the eye region.
Therefore, given these five rectangular regions and their
corresponding difference of sums, we are able to form features
that can classify parts of a face.
Then, for an entire dataset of features, we use
the AdaBoost algorithm to select which ones correspond to facial regions of an image.
However, as you can imagine, using a fixed sliding window and sliding it across every (x, y)-
coordinate of an image, followed by computing these Haar-like features, and finally performing the
actual classification can be computationally expensive.
To combat this, Viola and Jones introduced the concept of cascades or stages.
At each stop along the sliding window path, the window must pass a series of tests where each
subsequent test is more computationally expensive than the previous one.
If any one test fails, the window is automatically discarded.
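The reason these rectangular features stay cheap even with so many window positions is the integral image: after one pass over the image, the sum of any rectangle can be read off with four array lookups. Below is a minimal sketch on a toy 4x4 image (the two-rectangle feature shown is illustrative):

```python
def integral(img):
    """Build an integral image with an extra zero row/column of padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle whose top-left is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
ii = integral(img)

# two-rectangle Haar-like feature: left half minus right half
feature = rect_sum(ii, 0, 0, 2, 4) - rect_sum(ii, 2, 0, 2, 4)
print(feature)  # -> -16
```

Because each rect_sum is constant time, a feature's cost does not depend on the size of the rectangles involved.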
Program 49:
import cv2
Step 3: Run the python script (detect_faces.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_faces.py --face cascades/haarcascade_frontalface_default.xml --image
images/messi.png
Inference:
Program 50:
# if a video path was not supplied, grab the reference to the webcam
if not args.get("video", False):
    camera = cv2.VideoCapture(0)

# keep looping
while True:
    # grab the current frame
    (grabbed, frame) = camera.read()

    # if we are viewing a video and we did not grab a frame, then we have
    # reached the end of the video
    if args.get("video") and not grabbed:
        break

    # resize the frame, convert it to grayscale, and detect faces in the frame
    frame = imutils.resize(frame, width=400)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faceRects = detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5,
        minSize=(30, 30), flags=cv2.cv.CV_HAAR_SCALE_IMAGE)
Step 3: Run the python script (detect_faces_video.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_faces_video.py --face cascades/haarcascade_frontalface_default.xml
Inference:
The semantic gap is the difference between how a human perceives the contents of an image versus
how an image can be represented in a way a computer can understand and process.
Again, a quick visual examination of the two photos above can reveal the difference between the two
species of animals.
But in reality, the computer has no idea that there are animals in the image to begin with.
To make this point more clear, take a look at this photo of a tranquil beach below:
We might go about describing the image as follows:
Spatial: The sky is at the top of the image and the sand/ocean are at
the bottom.
Color: The sky is dark blue, the ocean water is lighter than the sky, while the sand is tan.
Texture: The sky has a relatively uniform pattern, while the sand is very coarse.
So how do we encode all this information in a way that a computer can understand?
The answer is to use various forms of image descriptors and deep learning methods.
By using image descriptors and deep learning we can actually extract and quantify regions of an
image.
Some descriptors are used to encode spatial information.
Others quantify the color of an image.
And other features are used to characterize texture.
Finally, based on these characterizations of the image, we can apply machine learning to "teach" our computers what each type of image "looks like."
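As a concrete (hypothetical) example of a color descriptor, the sketch below characterizes an image by the mean and standard deviation of each of its three channels, producing a 6-dimensional feature vector:

```python
import math

def color_stats(pixels):
    """Mean and standard deviation per channel -> 6-dim feature vector."""
    n = len(pixels)
    feats = []
    for c in range(3):
        vals = [p[c] for p in pixels]
        mean = sum(vals) / float(n)
        var = sum((v - mean) ** 2 for v in vals) / float(n)
        feats.extend([mean, math.sqrt(var)])
    return feats

# a toy 2x2 "image" as a flat list of (B, G, R) pixels
pixels = [(10, 20, 30), (10, 20, 30), (10, 20, 30), (10, 20, 30)]
print(color_stats(pixels))  # -> [10.0, 0.0, 20.0, 0.0, 30.0, 0.0]
```

A real descriptor would be computed over actual pixel data, but the output is the same kind of object: a fixed-length vector a classifier can consume.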
CHALLENGES
If the semantic gap were not enough of a problem, we also have to handle variations in how an image or an object in an image appears.
For example, we have viewpoint variation:
Despite the change in viewpoint, our system should still be able to detect and label the presence of the dog in both images.
Just as challenging as the deformations and occlusions mentioned above, we also need to handle
changes in illumination. Take a look at the following image of a coffee cup captured in standard lighting
and low lighting:
The image on the left was photographed with standard overhead
lighting.
And the image on the right was captured with very little lighting.
We are still examining the same coffee cup — but based on the
lighting conditions the cup looks dramatically different.
UNSUPERVISED LEARNING
In contrast to supervised
learning, unsupervised learning has no labels
associated with the input data, and thus we cannot
correct our model if it makes an incorrect
prediction.
Thus, most unsupervised learning methods are
focused on deducing structure present in the input
data.
SEMI-SUPERVISED LEARNING
So what happens if we only have some of the labels associated with our data and no labels for the others?
Is there a way that we can apply some hybrid
of supervised and unsupervised learning and
still be able to classify each of our data points?
It turns out the answer is yes — we just need
to apply semi-supervised learning.
Our semi-supervised learning algorithm
would take the known pieces of data, analyze
them, and then try to label each of the
unlabeled data points for use as extra training data.
This process can then repeat for many iterations as the semi-supervised algorithm learns the "structure" of the data to make more accurate predictions and generate more reliable training data.
The overall goal here is to generate more training data, which the algorithm can use to make itself "smarter".
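One round of this self-training loop can be sketched with a 1-nearest-neighbor labeling rule (the 1-D features and labels below are invented for illustration):

```python
def nearest_label(x, labeled):
    """Return the label of the labeled point closest to feature x."""
    return min(labeled, key=lambda p: abs(p[0] - x))[1]

labeled = [(1.0, "cat"), (9.0, "dog")]   # (feature, label) training data
unlabeled = [1.5, 8.5]

# label each unlabeled point, then fold it in as extra training data
for x in unlabeled:
    labeled.append((x, nearest_label(x, labeled)))

print(labeled[-2:])  # -> [(1.5, 'cat'), (8.5, 'dog')]
```

Each newly labeled point becomes training data for the next round, which is exactly how the algorithm grows its training set over many iterations.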
It’s extremely important that the training set and testing set are independent of each other and
do not overlap!
If you use your testing set as part of your training data then your classifier has an unfair advantage, since it has already seen the testing examples before and "learned" from them.
Instead, you must keep this testing set entirely separate from your training process and use it only to
evaluate your classifier.
Step 5: Evaluation
Last, we need to evaluate our trained classifier.
For each of the feature vectors in our testing set, we present them to our classifier and ask it to predict what it thinks the label of the image is.
We then tabulate the predictions of the classifier for each point in the testing set.
Finally, these classifier predictions are compared to the ground-truth label from our testing set.
The ground-truth labels represent what the category actually is.
From there, we can compute the number of predictions our classifier got right and compute aggregate
reports such as precision, recall, and f-measure, which are used to quantify the performance of our
classifier as a whole.
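For a single class, those aggregate numbers reduce to a few counts over the tabulated predictions; a minimal sketch (the labels here are made up):

```python
def precision_recall_f1(truth, preds, positive):
    """Compute precision, recall, and F1 for one class from predictions."""
    tp = sum(1 for t, p in zip(truth, preds) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(truth, preds) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(truth, preds) if t == positive and p != positive)
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

truth = ["cat", "cat", "dog", "dog"]   # ground-truth labels
preds = ["cat", "dog", "dog", "dog"]   # classifier predictions
p, r, f = precision_recall_f1(truth, preds, "cat")
print(p, r)  # -> 1.0 0.5
```

Tools such as scikit-learn's classification_report (used in the programs below) compute these same quantities for every class at once.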
We can see that each of these sets of data points are grouped relatively close together in our n-
dimensional space.
This implies that the distance between two red dots is much smaller than the distance between a red
dot and a blue dot.
However, in order to apply the k-Nearest Neighbor classifier, we first need to select a distance metric
or a similarity function.
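A common choice is the Euclidean distance; with it, the whole k-NN prediction rule fits in a few lines (the 2-D points below stand in for the red and blue dots in the figure):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, data, k=3):
    """Vote among the labels of the k nearest training points."""
    neighbors = sorted(data, key=lambda d: euclidean(query, d[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

data = [((0, 0), "red"), ((1, 0), "red"), ((0, 1), "red"),
        ((5, 5), "blue"), ((6, 5), "blue"), ((5, 6), "blue")]
print(knn_predict((0.5, 0.5), data))  # -> red
```

Program 51 below does the same thing at scale with scikit-learn's KNeighborsClassifier on the MNIST digits.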
Program 51:
# take the MNIST data and construct the training and testing split, using 75% of the
# data for training and 25% for testing
(trainData, testData, trainLabels, testLabels) = train_test_split(np.array(mnist.data),
mnist.target, test_size=0.25, random_state=42)
# now, let's take 10% of the training data and use that for validation
(trainData, valData, trainLabels, valLabels) = train_test_split(trainData, trainLabels,
test_size=0.1, random_state=84)
# initialize the values of k for our k-Nearest Neighbor classifier along with the
# list of accuracies for each value of k
kVals = range(1, 30, 2)
accuracies = []
# loop over various values of `k` for the k-Nearest Neighbor classifier
for k in kVals:
    # train the k-Nearest Neighbor classifier with the current value of `k`
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(trainData, trainLabels)
i = np.argmax(accuracies)
print("k=%d achieved highest accuracy of %.2f%% on validation data" % (kVals[i],
accuracies[i] * 100))
# re-train our classifier using the best k value and predict the labels of the test data
model = KNeighborsClassifier(n_neighbors=kVals[i])
model.fit(trainData, trainLabels)
predictions = model.predict(testData)
# show a final classification report demonstrating the accuracy of the classifier for each of the
# digits
print("EVALUATION ON TESTING DATA")
print(classification_report(testLabels, predictions))
# convert the image from a 64-dim array to an 8 x 8 image compatible with OpenCV,
# then resize it to 32 x 32 pixels so we can see it better
image = image.reshape((8, 8)).astype("uint8")
image = exposure.rescale_intensity(image, out_range=(0, 255))
image = imutils.resize(image, width=32, inter=cv2.INTER_CUBIC)
Step 3: Run the python script (mnist_demo.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python mnist_demo.py
Inference:
LOGISTIC REGRESSION
Let's consider a simple two-class classification problem, where we want to predict if a given image contains a cat or a dog.
We'll assign cats to have a label of 0 and dogs to have a label of 1.
Let's denote this set of labels as L = {0, 1}.
We'll also assume that we have extracted a set of (arbitrary) feature vectors from our dataset of images to characterize the contents of each image.
We'll call this set of feature vectors F.
Given our set of labels L and feature vectors F, we would like to
create a mathematical function that takes a feature vector as an input
and then returns a value of 0 or 1 (corresponding to the cat or dog
prediction).
If we were to plot this function, it would look something like this:
Where we extract the following feature vector:
This x value is then passed through our sigmoid function where the output is
constrained such that:
Any output from s(x) that is >= 0.5 will be classified as 1 (dog), and anything < 0.5 will be classified as 0 (cat).
This seems simple enough.
But the big question lies in defining this weight vector w.
What are the best weight values for w?
And how do we go about finding them?
To answer that, let's go back to the input to the sigmoid function x:
Again, v is our input feature vector, and w are the weights associated with each entry in the feature
vector.
Our goal is to find the values of w that make our classifier as accurate as possible; and in order to find appropriate values of w, we'll need to apply gradient ascent/descent.
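A minimal gradient-descent sketch on made-up 1-D data shows the idea (the learning rate, epoch count, and data are all illustrative):

```python
import math

def sigmoid(x):
    """The logistic sigmoid, squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# toy (feature, label) pairs: label 0 for cats, 1 for dogs
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(200):
    for v, y in data:
        p = sigmoid(w * v + b)      # current prediction
        w -= lr * (p - y) * v       # gradient step for the weight
        b -= lr * (p - y)           # gradient step for the bias

print(all((sigmoid(w * v + b) >= 0.5) == (y == 1) for v, y in data))  # -> True
```

Each pass nudges w and b in the direction that shrinks the prediction error, which is exactly the update rule described above.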
where the x_i are the feature vectors associated with our training data.
Based on this error E, we can then update our weight vector w via:
Program 52:
# grab a small subset of the Labeled Faces in the Wild dataset, then construct
# the training and testing splits (note: if this is your first time running this
# script it may take awhile for the dataset to download -- but once it has downloaded
# the data will be cached locally and subsequent runs will be substantially faster)
print("[INFO] fetching data...")
dataset = datasets.fetch_lfw_people(min_faces_per_person=70, funneled=True, resize=0.5)
(trainData, testData, trainLabels, testLabels) = train_test_split(dataset.data, dataset.target,
test_size=0.25, random_state=42)
Step 3: Run the python script (train_and_test.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python train_and_test.py
Inference:
LINEAR SEPARABILITY
In order to explain SVMs, we should first start with the concept of linear separability.
A set of data is linearly separable if we can draw a straight line that clearly separates all data points
in class #1 from all data points belonging to class #2:
In the figures above, we have two classes of
data represented by blue squares and red
circles, respectively.
In Plot A (left) and Plot B (center), we can clearly draw a (straight) line through the space that cleanly places all blue squares on one side of the line and all red circles on the other.
These plots are examples of data points that are linearly separable.
However, in Plot C (right), this is not the case.
Here, we see four groupings of data points.
The blue squares are present at the top-left and bottom-right of the plot, whereas the red circles are at the top-right and bottom-left regions (this is known as the XOR [exclusive OR] problem).
Regardless of whether we have a line, a plane, or a hyperplane, this separation is our decision boundary — the boundary we use to decide if a data point is a blue square or a red circle.
All data points for a given class will lie on one side of the decision boundary, and all data points for the second class on the other.
Given our decision boundary, I am more confident that the
highlighted square is indeed a square, because it is farther away from the
decision boundary than the circle is.
This all makes sense, but how do we come up with this decision boundary?
For example, all three plots below can separate the two classes of data — is one of these separations better than the others?
Program 53:
# construct the training and testing split by taking 75% of data for training and 25% for testing
(trainData, testData, trainLabels, testLabels) = train_test_split(X, y, test_size=0.25,
random_state=42)
# train the linear SVM model, evaluate it, and show the results
print("[RESULTS] SVM w/ Linear Kernel")
model = SVC(kernel="linear")
model.fit(trainData, trainLabels)
print(classification_report(testLabels, model.predict(testData)))
print("")
# train the SVM + poly. kernel model, evaluate it, and show the results
print("[RESULTS] SVM w/ Polynomial Kernel")
model = SVC(kernel="poly", degree=2, coef0=1)
model.fit(trainData, trainLabels)
print(classification_report(testLabels, model.predict(testData)))
Step 3: Run the python script (classify.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python classify.py
Inference:
K-MEANS ALGORITHM
The k-means algorithm is used to find k clusters in a dataset, where the number of clusters k is a user-supplied value.
Each cluster is represented by a single data point called the centroid.
The centroid is defined as the mean (average) of all data points belonging to the cluster and is thus
simply the center of the cluster:
Here, we can see three clusters of data with the centroids highlighted as white X's.
A visual inspection of this figure reveals that
the X mark for each cluster is the average of all data
points belonging to the cluster.
The pseudo-code for k-means is quite simple:
Step 1: Start off by selecting k random data
points from your input dataset —
these k random data points are your initial
centroids.
Step 2: Assign each data point in the dataset
to the nearest centroid. This requires
computing the distance from each data point to
each centroid (using a distance metric such as
the Euclidean distance) and assigning the data
point to the cluster with the smallest distance.
Step 3: Recalculate the position of all centroids by computing the average of all data points in
the cluster.
Step 4: Repeat Steps 2 and 3 until all cluster assignments are stable (i.e. not flipping back and
forth) or some stopping criterion has been met (such as a maximum number of iterations).
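The four steps above can be sketched directly in Python; the 1-D points, iteration count, and seed below are toy choices:

```python
import random

def kmeans(points, k, iters=20, seed=42):
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)          # Step 1: random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # Step 2: assign to nearest centroid
            i = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[i].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]   # Step 3: recompute
                     for i, c in enumerate(clusters)]
    return sorted(centroids)                   # Step 4 is bounded here by `iters`

points = [1.0, 1.1, 0.9, 10.0, 10.1, 9.9]
print([round(c, 2) for c in kmeans(points, 2)])  # -> [1.0, 10.0]
```

For real image data (such as the color clustering in Program 54), the same loop runs over multi-dimensional pixel vectors instead of scalars.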
Program 54:
# convert the canvas to grayscale, threshold it, and detect contours in the image
gray = cv2.cvtColor(canvas, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
thresh = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)[1]
(cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# loop over the indexes of the current cluster and draw them
for j in np.where(clt.labels_ == i)[0]:
    cv2.drawContours(mask, [cnts[j]], -1, 255, -1)
Step 3: Run the python script (cluster_colors.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python cluster_colors.py
Inference:
Program 55:
# if a video path was not supplied, grab the reference to the webcam
if not args.get("video", False):
    camera = cv2.VideoCapture(0)

# keep looping
while True:
    # grab the current frame
    (grabbed, frame) = camera.read()

    # if we are viewing a video and we did not grab a frame, then we have
    # reached the end of the video
    if args.get("video") and not grabbed:
        break

    # resize the frame, blur it, and convert it to the HSV color space
    frame = imutils.resize(frame, width=600)
    blurred = cv2.GaussianBlur(frame, (11, 11), 0)
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)

    # only draw the enclosing circle and text if the radius meets a minimum size
    if radius > 10:
        cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 255), 2)
        cv2.putText(frame, colorName, (cX, cY),
            cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 255), 2)
cv2.imshow("Frame", frame)
key = cv2.waitKey(1) & 0xFF
Step 3: Run the python script (track.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python track.py --video BallTracking_01.mp4
Inference:
Program 56:
import cv2
import numpy as np
matches.append((m[0].trainIdx, m[0].queryIdx))
# load the query image, convert it to grayscale, and extract keypoints and descriptors
queryImage = cv2.imread(args["query"])
gray = cv2.cvtColor(queryImage, cv2.COLOR_BGR2GRAY)
(queryKps, queryDescs) = describe(gray)
Step 3: Run the python script (search.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python search.py --db books.csv --covers covers --query queries/query01.png
Inference:
Program 57:
import h5py
i, j = 0, 0
k = 0
# loop over the images
for training_name in train_labels:
    dir = os.path.join(train_path, training_name)
    current_label = training_name
    k = 1

    for x in range(1, 81):
        file = dir + "/" + str(x) + ".jpg"

        # read the image and resize it
        image = cv2.imread(file)
        image = cv2.resize(image, fixed_size)

        # Global Features
        fv_meanstddev = fd_meanstddev(image)
        fv_haralick = fd_haralick(image)

        # show status
        print "Processed Image: {} in {}".format(k, training_name)

        i += 1
        k += 1

    j += 1

scaler = MinMaxScaler(feature_range=(0, 1))
rescaled_features = scaler.fit_transform(global_features)
h5f_data.close()
h5f_label.close()
print "[STATUS] Training Features and Labels saved.."
Step 4: Run the python script (global.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python global.py
# prepare the models
models = []
models.append(('LR', LogisticRegression(random_state=9)))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier(random_state=9)))
models.append(('RF', RandomForestClassifier(n_estimators=num_trees, random_state=9)))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(random_state=9)))

results = []
names = []
scoring = "accuracy"

global_features_string = h5f_data['dataset_1']
global_labels_string = h5f_label['dataset_1']
global_features = np.array(global_features_string)
global_labels = np.array(global_labels_string)

h5f_data.close()
h5f_label.close()
Step 7: Run the python script (train_test.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python train_test.py
Inference: