
Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)"

PLANT CLASSIFICATION
OBJECT TRACKING IN VIDEO
IDENTIFYING THE COVER OF THE BOOK

Prepared By,
Dr. V. Sathiesh Kumar,
Assistant Professor,
Department of Electronics Engineering,
MIT, Anna University.
Ph: 044-22516238
Email: sathiieesh@gmail.com
www.sathieshkumar.com


CHAPTER I: BASICS OF COMPUTER VISION

LESSON 1.1: INTRODUCTION TO OPENCV

Objectives:

1. Load an image off disk using the cv2.imread function.
2. Display the image on your screen using cv2.imshow.
3. Write your image back to disk in a different image file format using cv2.imwrite.
4. Use the command line to execute Python scripts.

Experiment 1: Loading, displaying, and writing an image to hard disk.


i. Using the file location address in python script
ii. Using argument parsing from terminal window

Program 1.1: Using the file location address in python script


Step 1: Write the code in Text Editor
# import the necessary packages
import cv2

# load the image and show some basic information on it


image = cv2.imread("new.jpeg")
print "width: %d pixels" % (image.shape[1])
print "height: %d pixels" % (image.shape[0])
print "channels: %d" % (image.shape[2])

# show the image and wait for a keypress


cv2.imshow("Image", image)
cv2.waitKey(0)

# save the image -- OpenCV handles converting file types automatically


cv2.imwrite("newimage.jpg", image)

Step 2: Save the code as "load_display_save1.py"

Step 3: Run the python script (load_display_save1.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python load_display_save1.py

Program 1.2: Using argument parsing from terminal window


Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import cv2


# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show some basic information on it


image = cv2.imread(args["image"])
print "width: %d pixels" % (image.shape[1])
print "height: %d pixels" % (image.shape[0])
print "channels: %d" % (image.shape[2])

# show the image and wait for a keypress


cv2.imshow("Image", image)
cv2.waitKey(0)

# save the image -- OpenCV handles converting file types automatically


cv2.imwrite("newimage.jpg", image)

Step 2: Save the code as "load_display_save.py"

Step 3: Run the python script (load_display_save.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python load_display_save.py --image new.jpeg
or
$ python load_display_save.py -i new.jpeg

Inference:


LESSON 1.2: IMAGE BASICS

In this lesson, you will learn what a pixel is, how pixels are used to form an image, and how to access and manipulate pixels in OpenCV.

Objectives:

1. Have a full understanding of what a "pixel" is.
2. Understand the coordinate system of an image.
3. Have the ability to access the Red, Green, and Blue (RGB) values of a pixel.
4. Be able to set the RGB values of a pixel.
5. Have a gentle introduction to extracting regions from an image.

What is a pixel?
 Pixels are the raw building blocks of an image.
 Every image consists of a set of pixels.
 There is no finer granularity than the pixel.
 Normally, a pixel is considered the "color" or the "intensity" of light that appears in a given place in our image.
 If we think of an image as a grid, each square in the grid contains a single pixel.
 If the image has a resolution of 600 x 450, it is 600 pixels wide and 450 pixels tall.
 Overall, there are 600 x 450 = 270,000 pixels in our image.
 Most pixels are represented in two ways: grayscale and color.
 In a grayscale image, each pixel has a value between 0 and 255, where 0 corresponds to "black" and 255 corresponds to "white". The values in between are varying shades of gray: values closer to 0 are darker and values closer to 255 are lighter.

 Color pixels, however, are normally represented in the RGB color space (one value for the Red
component, one for Green, and one for Blue, leading to a total of 3 values per pixel).

 Each of the three Red, Green, and Blue colors is represented by an integer in the range 0 to 255, which indicates how "much" of the color there is.
 Given that the pixel value only needs to be in the range [0, 255], we normally use an 8-bit unsigned integer to represent each color intensity.
 We then combine these values into an RGB tuple in the form (red, green, blue); a short illustrative sketch follows the list of common colors below.
 To construct a white color, we would fill each of the red, green, and blue buckets completely up
(255, 255, 255), since white is the presence of all color.
 Then, to create a black color, we would empty each of the buckets out (0, 0, 0), since black is the
absence of color.


 To create a pure red color, we would fill only the red bucket completely (255, 0, 0).
 For your reference, here are some common colors represented as RGB tuples:
Black: (0, 0, 0)
White: (255, 255, 255)
Red: (255, 0, 0)
Green: (0, 255, 0)
Blue: (0, 0, 255)
Aqua: (0, 255, 255)
Fuchsia: (255, 0, 255)
Maroon: (128, 0, 0)
Navy: (0, 0, 128)
Olive: (128, 128, 0)
Purple: (128, 0, 128)
Teal: (0, 128, 128)
Yellow: (255, 255, 0)
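As a quick illustration of the grayscale and RGB representations described above, here is a minimal sketch (added for illustration; the array sizes and values are arbitrary) that builds a tiny grayscale image and a tiny color image directly as NumPy arrays and prints a few pixel values. Note that OpenCV itself stores the channels of a loaded image in BGR order, which is discussed in the next experiment.

# import the necessary packages
import numpy as np

# a 2 x 2 grayscale image: one 8-bit value per pixel (0 = black, 255 = white)
gray = np.array([[0, 64],
                 [128, 255]], dtype="uint8")
print "grayscale pixel at row 0, column 0: %d" % gray[0, 0]
print "grayscale pixel at row 1, column 1: %d" % gray[1, 1]

# a 2 x 2 color image: three 8-bit values (one per channel) for each pixel,
# written here as (red, green, blue) tuples purely for illustration
color = np.array([[(255, 0, 0), (0, 255, 0)],
                  [(0, 0, 255), (255, 255, 255)]], dtype="uint8")
print "color pixel at row 0, column 0: %s" % str(tuple(color[0, 0]))
print "image shape (rows, columns, channels): %s" % str(color.shape)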

Overview of the Coordinate System

 The point (0, 0) corresponds to the upper left corner of the image. As we move down and to the
right, both the x and y values increase.
 As an example, imagine the letter "I" drawn on a piece of graph paper: an 8 x 8 grid with 64 total pixels.
 The point at (0, 0) corresponds to the top left pixel in our image, whereas the point (7, 7)
corresponds to the bottom right corner. It is important to note that we are counting from zero rather
than one.
 The Python language is zero indexed, meaning that we always start counting from zero.
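One practical detail that follows from this: because OpenCV represents images as NumPy arrays (indexed row-first), the point (x, y) from the discussion above is accessed in code as image[y, x]. A minimal sketch (added for illustration; the array contents are arbitrary):

# import the necessary packages
import numpy as np

# a small 3 x 4 single-channel "image" (3 rows tall, 4 columns wide)
image = np.arange(12, dtype="uint8").reshape(3, 4)
print image

# the pixel at coordinate (x=3, y=1) is accessed as image[y, x]
(x, y) = (3, 1)
print "pixel at (x=%d, y=%d): %d" % (x, y, image[y, x])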

Experiment 2: Accessing and Manipulating Pixels


 Remember, OpenCV represents images as NumPy arrays.
 Conceptually, we can think of this representation as a matrix, as discussed in Overview of the
Coordinate System section above.
 In order to access a pixel value, we just need to supply the x and y coordinates of the pixel we are
interested in.


 From there, we are given a tuple representing the Red, Green, and Blue components of the image.
 However, it's important to note that OpenCV stores RGB channels in reverse order.
 While we normally think in terms of Red, Green, and Blue, OpenCV actually stores them in the
order of Blue, Green, and Red.

Program 2: Getting and setting the pixel values


Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, grab its dimensions, and show it


image = cv2.imread(args["image"])
(h, w) = image.shape[:2]
cv2.imshow("Original", image)

# images are just NumPy arrays. The top-left pixel can be found at (0, 0)
(b, g, r) = image[0, 0]
print "Pixel at (0, 0) - Red: {r}, Green: {g}, Blue: {b}".format(r=r, g=g, b=b)

# now, let's change the value of the pixel at (0, 0) and make it red
image[0, 0] = (0, 0, 255)
(b, g, r) = image[0, 0]
print "Pixel at (0, 0) - Red: {r}, Green: {g}, Blue: {b}".format(r=r, g=g, b=b)
cv2.imshow("Original-RedDot@0,0", image)

# compute the center of the image, which is simply the width and height divided by two
(cX, cY) = (w / 2, h / 2)

# since we are using NumPy arrays, we can apply slicing and grab large chunks of the image
# Top left corner
tl = image[0:cY, 0:cX]
cv2.imshow("Top-Left Corner", tl)

# in a similar fashion, let's grab the top-right, bottom-right, and bottom-left corners and display
tr = image[0:cY, cX:w]
br = image[cY:h, cX:w]
bl = image[cY:h, 0:cX]


cv2.imshow("Top-Right Corner", tr)


cv2.imshow("Bottom-Right Corner", br)
cv2.imshow("Bottom-Left Corner", bl)

# now let's make the top-left corner of the original image green
image[0:cY, 0:cX] = (0, 255, 0)

# Show our updated image


cv2.imshow("Updated", image)
cv2.waitKey(0)

Step 2: Save the code as "getting_and_setting.py"

Step 3: Run the python script (getting_and_setting.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python getting_and_setting.py --image new.jpeg
or
$ python getting_and_setting.py -i new.jpeg

Inference:


LESSON 1.3: DRAWING

 What if we wanted to draw a single line? Or a circle?


 NumPy does not provide that type of functionality; it's only a numerical processing library.
 Luckily, OpenCV provides convenient, easy-to-use methods to draw shapes on an image.

Objectives:
 The main objective of this lesson is to become familiar with the cv2.line, cv2.rectangle and
cv2.circle functions.

Experiment 3: Drawing Shapes-Define images manually using NumPy arrays.

Program 3:
Step 1: Write the code in Text Editor
# import the necessary packages
import numpy as np
import cv2

# initialize our canvas as a 300x300 with 3 channels, RGB with a black background
canvas = np.zeros((300, 300, 3), dtype="uint8")

# draw a green line from the top-left corner of our canvas to the bottom-right
green = (0, 255, 0)
cv2.line(canvas, (0, 0), (300, 300), green)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# draw a 3 pixel thick red line from the top-right corner to the bottom-left
red = (0, 0, 255)
cv2.line(canvas, (300, 0), (0, 300), red, 3)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# draw a green 50x50 pixel square, starting at 10x10 and ending at 60x60
cv2.rectangle(canvas, (10, 10), (60, 60), green)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# draw another rectangle, this time we'll make it red and 5 pixels thick
cv2.rectangle(canvas, (50, 200), (200, 225), red, 5)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)


# let's draw one last rectangle: blue and filled in by specifying -1 as the thickness
blue = (255, 0, 0)
cv2.rectangle(canvas, (200, 50), (225, 125), blue, -1)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# reset our canvas and draw white circles at the center of the canvas with
# increasing radii - from 25 pixels to 150 pixels
# loop over a number of radius values, starting from 0 and ending at 150, incrementing by
# 25 at each step.
# the stop value of the xrange function is exclusive; therefore, we specify a stopping value of
# 175 so that the output of xrange stops at 150 and does not include 175.
canvas = np.zeros((300, 300, 3), dtype="uint8")
(centerX, centerY) = (canvas.shape[1] / 2, canvas.shape[0] / 2)
white = (255, 255, 255)
for r in xrange(0, 175, 25):
    cv2.circle(canvas, (centerX, centerY), r, white)

# show our work of art


cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# draw 25 random circles


# In order to draw a random circle, we need to generate three values: the radius of the circle,
# the color of the circle, and the pt (the (x, y) coordinate) of where the circle will be drawn.
for i in xrange(0, 25):
    # randomly generate a radius size between 5 and 200, generate a random
    # color, and then pick a random point on our canvas where the circle will be drawn
    radius = np.random.randint(5, high=200)
    color = np.random.randint(0, high=256, size=(3,)).tolist()
    pt = np.random.randint(0, high=300, size=(2,))

    # draw our random circle
    cv2.circle(canvas, tuple(pt), radius, color, -1)

# Show our masterpiece


cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# load the image


image = cv2.imread("new.jpeg")

# draw a circle (two filled in circles) and a rectangle


cv2.circle(image, (168, 188), 90, (0, 0, 255), 2)


cv2.circle(image, (150, 164), 10, (0, 0, 255), -1)
cv2.circle(image, (192, 174), 10, (0, 0, 255), -1)
cv2.rectangle(image, (134, 200), (186, 218), (0, 0, 255), -1)

# show the output image


cv2.imshow("Output", image)
cv2.waitKey(0)

Step 2: Save the code as "drawing.py"

Step 3: Run the python script (drawing.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python drawing.py

Inference:


LESSON 1.4: BASIC OPERATIONS ON IMAGE


1.4.1: TRANSLATION
 Translation is the shifting of an image along the x and y axis.
 Using translation, we can shift an image up, down, left, or right, along with any combination of the
above.
 Mathematically, we define a translation matrix M that we can use to translate an image:
M = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}
where t_x is the number of pixels we will shift the image left or right. Negative values of t_x will shift the image to the left and positive values will shift the image to the right.
Likewise, t_y is the number of pixels we will shift the image up or down. Negative values of t_y will shift the image up and positive values will shift the image down.
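As a quick worked check (added here for illustration), applying M to a pixel written in homogeneous coordinates (x, y, 1) gives the shifted location:

\begin{bmatrix} x' \\ y' \end{bmatrix} = M \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x + t_x \\ y + t_y \end{bmatrix}

For example, with t_x = 25 and t_y = 50 (the values used in Program 4 below), the pixel at (100, 100) moves to (125, 150).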

Experiment 4: Translation Operation

Program 4:
Step 1: Write the code in Text Editor
# import the necessary packages
# user-created library "imutils" contains a handful of "convenience" methods to more easily
# perform common tasks like translation, rotation, and resizing (and with less code).
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it


image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# NOTE: Translating (shifting) an image is given by a NumPy matrix in the form:


# [[1, 0, shiftX], [0, 1, shiftY]]
# You simply need to specify how many pixels you want to shift the image in the X and Y directions.
# Let's translate the image 25 pixels to the right and 50 pixels down.
# Now that we have our translation matrix defined, the actual translation takes place using the
# cv2.warpAffine function. The first argument is the image we wish to shift and the second
# argument is our translation matrix M. Finally, we manually supply the dimensions (width and
# height) of our image as the third argument.
M = np.float32([[1, 0, 25], [0, 1, 50]])


shifted = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))


cv2.imshow("Shifted Down and Right", shifted)

# now, let's shift the image 50 pixels to the left and 90 pixels up, we
# accomplish this using negative values
M = np.float32([[1, 0, -50], [0, 1, -90]])
shifted = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
cv2.imshow("Shifted Up and Left", shifted)

# finally, let's use our helper function in imutils to shift the image down 100 pixels
shifted = imutils.translate(image, 0, 100)
cv2.imshow("Shifted Down", shifted)
cv2.waitKey(0)

Let us define a "translate" convenience function in the "imutils.py" package that takes care of this for us:
# import the necessary packages
import numpy as np
import cv2

def translate(image, x, y):
    # define the translation matrix and perform the translation
    M = np.float32([[1, 0, x], [0, 1, y]])
    shifted = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

    # return the translated image
    return shifted

Step 2: Save the code as "translation.py"

Step 3: Run the python script (translation.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Installing imutils package ($ pip install imutils)
Accessing the gurus virtual environment (imutils is preinstalled)
$ workon gurus
$ python translation.py -i new.jpeg
or
$ python translation.py --image new.jpeg

Inference:


1.4.2: ROTATION
 Rotating an image by some angle Θ.
 Rotation by an angle Θ, can be defined by constructing a matrix M in the form:
M = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
 Given an (x, y)-Cartesian plane, this matrix can be used to rotate a vector Θ degrees (counter-
clockwise) about the origin.
 In this case, the origin is normally the center of the image; however, in practice we can define any
arbitrary (x, y) coordinate as our rotation center.
 From the original image I, the rotated image R is then obtained by applying M to the (x, y)-coordinates of each pixel in I.
 However, OpenCV also provides the ability to (1) scale (i.e. resize) an image and (2) provide an
arbitrary rotation center to perform the rotation about.
 Our modified rotation matrix M is thus,
M = \begin{bmatrix} \alpha & \beta & (1 - \alpha) \cdot c_x - \beta \cdot c_y \\ -\beta & \alpha & \beta \cdot c_x + (1 - \alpha) \cdot c_y \end{bmatrix}
where \alpha = scale \cdot \cos\theta, \beta = scale \cdot \sin\theta, and (c_x, c_y) are the (x, y)-coordinates that the rotation is performed about.
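As a quick sanity check (added here for illustration), substituting the rotation center (c_x, c_y) itself into this matrix shows that the center stays fixed, exactly as expected for a rotation about that point:

x' = \alpha c_x + \beta c_y + (1 - \alpha) c_x - \beta c_y = c_x
y' = -\beta c_x + \alpha c_y + \beta c_x + (1 - \alpha) c_y = c_y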

Experiment 5: Rotation Operation

Program 5:
Step 1: Write the code in Text Editor
# import the necessary packages
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it


image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# grab the dimensions of the image and calculate the center of the image
(h, w) = image.shape[:2]
(cX, cY) = (w / 2, h / 2)


# rotate our image by 45 degrees (counter clockwise rotation), scale value of 1.0
# scale value of 2.0, the image will be doubled in size
# scale value of 0.5, the image will be half the original size
# If you want the entire image to fit into view after the rotation, you'll need to modify the width
# and height, denoted as (w, h), passed to the cv2.warpAffine function.
M = cv2.getRotationMatrix2D((cX, cY), 45, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
cv2.imshow("Rotated by 45 Degrees", rotated)

# rotate our image by -90 degrees (clock wise rotation by 90 degree)


M = cv2.getRotationMatrix2D((cX, cY), -90, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
cv2.imshow("Rotated by -90 Degrees", rotated)

# rotate our image around an arbitrary point rather than the center
M = cv2.getRotationMatrix2D((cX - 50, cY - 50), 45, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
cv2.imshow("Rotated by Offset & 45 Degrees", rotated)

# finally, let's use our helper function in imutils to rotate the image by 180 degrees (flipping it
# upside down)
rotated = imutils.rotate(image, 180)
cv2.imshow("Rotated by 180 Degrees", rotated)
cv2.waitKey(0)

Let's reduce the amount of code we have to write and define our own custom "rotate" method in the "imutils.py" package.

def rotate(image, angle, center=None, scale=1.0):
    # grab the dimensions of the image
    (h, w) = image.shape[:2]

    # if the center is None, initialize it as the center of the image
    if center is None:
        center = (w / 2, h / 2)

    # perform the rotation
    M = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(image, M, (w, h))

    # return the rotated image
    return rotated


Step 2: Save the code as "rotation.py"

Step 3: Run the python script (rotation.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Installing imutils package ($ pip install imutils)
Accessing the gurus virtual environment (imutils is preinstalled)
$ workon gurus
$ python rotation.py -i new.jpeg
or
$ python rotation.py --image new.jpeg

Inference:


1.4.3: RESIZING
Scaling, or simply resizing, is the process of increasing or decreasing the size of an image in terms
of width and height.
When resizing an image, it's important to keep in mind the aspect ratio (which is the ratio of the width of the image to the height of the image).
Ignoring the aspect ratio can lead to resized images that look compressed and distorted.
The formal definition of interpolation is "the method of constructing new data points within the range of a discrete set of known points."
In this case, the "known points" are the pixels of our original image.
The goal of an interpolation function is to take these neighborhoods of pixels and use them to either increase or decrease the size of the image.
In general, it's far more beneficial (and visually appealing) to decrease the size of the image.
This is because the interpolation function simply has to remove pixels from an image.
On the other hand, if we were to increase the size of the image, the interpolation function would have to "fill in the gaps" between pixels that previously did not exist.

Objectives:
The primary objective of this topic is to understand how to resize an image using the OpenCV library.

Interpolation Methods:
The goal of an interpolation function is to examine neighborhoods of pixels and use these neighborhoods to increase or decrease the size of the image without introducing distortions (or at least as few distortions as possible).
The first method is nearest-neighbor interpolation, specified by the cv2.INTER_NEAREST flag. This method is the simplest approach to interpolation. Instead of calculating weighted averages of neighboring pixels or applying complicated rules, this method simply finds the "nearest" neighboring pixel and takes on its intensity value. While this method is fast and simple, the quality of the resized image tends to be quite poor and can lead to "blocky" artifacts.
Secondly, we have the cv2.INTER_LINEAR method, which performs bilinear interpolation (y = mx + c). OpenCV uses this method by default when resizing images. It takes neighboring pixels and uses this neighborhood to actually calculate what the interpolated value should be (rather than just assuming the nearest pixel value).
Other methods are cv2.INTER_AREA, cv2.INTER_CUBIC and cv2.INTER_LANCZOS4 interpolation
methods.
The cv2.INTER_CUBIC and cv2.INTER_LANCZOS4 methods are slower (since they no longer use simple linear interpolation and instead use splines) and interpolate over larger square pixel neighborhoods. The cv2.INTER_CUBIC method operates on a 4 x 4 pixel neighborhood and cv2.INTER_LANCZOS4 operates over an 8 x 8 pixel neighborhood.

So which interpolation method should you be using?


In general, cv2.INTER_NEAREST is quite fast, but does not provide the highest quality results. So in
very resource-constrained environments, consider using nearest neighbor interpolation.


 When increasing (up sampling) the size of an image, consider using cv2.INTER_LINEAR and
cv2.INTER_CUBIC. The cv2.INTER_LINEAR method tends to be slightly faster than the
cv2.INTER_CUBIC method, but go with whichever one gives you the best results for your images.
 When decreasing (down sampling) the size of an image, the OpenCV documentation suggests using cv2.INTER_AREA (although this method is very similar to nearest-neighbor interpolation). In either case, decreasing the size of an image (in terms of quality) is always an easier task than increasing the size of an image.
 Finally, as a general rule, the cv2.INTER_LINEAR interpolation method is recommended as the default for up sampling or down sampling, because it provides high-quality results at a modest computational cost.

Experiment 6: Image resizing

Program 6:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it


image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# we need to keep in mind aspect ratio so the image does not look skewed or distorted
# we calculate the ratio of the new image to the old image.
# Let's make our new image have a width of 150 pixels
# Aspect ratio=width/height
# In order to compute the ratio of the new height to the old height, we simply define our ratio r to
# be the new width (150 pixels) divided by the old width, which we access using image.shape[1]
# Now that we have our ratio, we can compute the new dimensions of the image.
# The height is then computed by multiplying the old height by our ratio and converting it to an
# integer. By performing this operation we are able to preserve the original aspect ratio of the
#image.
r = 150.0 / image.shape[1]
dim = (150, int(image.shape[0] * r))

# perform the actual resizing of the image


# The last parameter is our interpolation method, which is the algorithm working behind the
# scenes to handle how the actual image is resized.
resized = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
cv2.imshow("Resized (Width)", resized)

# what if we wanted to adjust the height of the image? We can apply the same concept, again
# keeping in mind the aspect ratio, but instead calculating the ratio based on height -- let's make
# the height of the resized image 50 pixels
# The new width is obtained by multiplying the old width by the ratio, again allowing us to
#maintain the original aspect ratio of the image.
r = 50.0 / image.shape[0]
dim = (int(image.shape[1] * r), 50)

# perform the resizing


resized = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
cv2.imshow("Resized (Height)", resized)
cv2.waitKey(0)

# of course, calculating the ratio each and every time we want to resize an image is a real pain
# let's create a function where we can specify our target width or height, and have it take care of
# the rest for us.
resized = imutils.resize(image, width=100)
# or, equivalently, resize by specifying the target height instead:
# resized = imutils.resize(image, height=50)
cv2.imshow("Resized via Function", resized)
cv2.waitKey(0)

# construct the list of interpolation methods


methods = [
("cv2.INTER_NEAREST", cv2.INTER_NEAREST),
("cv2.INTER_LINEAR", cv2.INTER_LINEAR),
("cv2.INTER_AREA", cv2.INTER_AREA),
("cv2.INTER_CUBIC", cv2.INTER_CUBIC),
("cv2.INTER_LANCZOS4", cv2.INTER_LANCZOS4)]

# loop over the interpolation methods


for (name, method) in methods:

# increase the size of the image by 3x using the current interpolation method
resized = imutils.resize(image, width=image.shape[1] * 3, inter=method)
cv2.imshow("Method: {}".format(name), resized)
cv2.waitKey(0)


Let's reduce the amount of code we have to write and define our own custom "resize" method in the "imutils.py" package.

def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    # initialize the dimensions of the image to be resized and grab the image size
    dim = None
    (h, w) = image.shape[:2]

    # if both the width and height are None, then return the original image
    if width is None and height is None:
        return image

    # check to see if the width is None
    if width is None:
        # calculate the ratio of the height and construct the dimensions
        r = height / float(h)
        dim = (int(w * r), height)

    # otherwise, the height is None
    else:
        # calculate the ratio of the width and construct the dimensions
        r = width / float(w)
        dim = (width, int(h * r))

    # resize the image
    resized = cv2.resize(image, dim, interpolation=inter)

    # return the resized image
    return resized

Step 2: Save the code as "resize.py"

Step 3: Run the python script (resize.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Installing imutils package ($ pip install imutils)
Accessing the gurus virtual environment (imutils is preinstalled)
$ workon gurus
$ python resize.py -i new.jpeg

Inference:


1.4.4: FLIPPING
 OpenCV also provides methods to flip an image across its x or y axis or even both.
 Flipping operations are used less often.

Objectives:
 In this lesson you will learn how to horizontally and vertically flip an image using the cv2.flip function.

Experiment 7: Image flipping

Program 7:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help = "Path to the image")
args = vars(ap.parse_args())

# load the image and show it


image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# flip the image horizontally (code/flag=1)


flipped = cv2.flip(image, 1)
cv2.imshow("Flipped Horizontally", flipped)

# flip the image vertically (code/flag=0)


flipped = cv2.flip(image, 0)
cv2.imshow("Flipped Vertically", flipped)

# flip the image along both axes (code/flag=-1)


flipped = cv2.flip(image, -1)
cv2.imshow("Flipped Horizontally & Vertically", flipped)
cv2.waitKey(0)

Step 2: Save the code as "flipping.py"

Step 3: Run the python script (flipping.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment


$ workon gurus
$ python flipping.py -i new.jpeg
or
$ python flipping.py --image new.jpeg

Inference:


1.4.5: CROPPING
 Cropping is the act of selecting and extracting the Region of Interest (or simply, ROI), which is the
part of the image we are interested in.
 When we crop an image, we want to remove the outer parts of the image that we are not interested
in. This is commonly called selecting our Region of Interest, or more simply, our ROI.
 Example: In a face detection application, we would want to crop the face from an image.
 And if we were developing a Python script to recognize dogs in images, we may want to crop the
dog from the image once we have found it.

Objectives:
 Our primary objective is to become very familiar and comfortable using NumPy array slicing to crop
regions from an image.

Experiment 8: Image cropping


Program 8:
Step 1: Write the code in Text Editor
# import the necessary packages
import cv2

# load the image and show it


image = cv2.imread("florida_trip.png")
cv2.imshow("Original", image)

# cropping an image is accomplished using simple NumPy array slices of the form image[startY:endY, startX:endX]


# let's crop the face from the image
face = image[85:250, 85:220]
cv2.imshow("Face", face)
cv2.waitKey(0)

# ...and now let's crop the entire body


body = image[90:450, 0:290]
cv2.imshow("Body", body)
cv2.waitKey(0)

Step 2: Save the code as "crop.py"


Step 3: Run the python script (crop.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python crop.py
Inference:


1.4.6: IMAGE ARITHMETIC


 In this lesson you'll learn how to add and subtract images, along with two important differences you
need to understand regarding arithmetic operations in OpenCV and Python.
 In reality, image arithmetic is simply matrix addition.
 Suppose we were to add the following two matrices:
\begin{bmatrix} 9 & 3 & 2 \\ 4 & 1 & 4 \end{bmatrix} + \begin{bmatrix} 0 & 9 & 4 \\ 7 & 9 & 4 \end{bmatrix} = \begin{bmatrix} 9 & 12 & 6 \\ 11 & 10 & 8 \end{bmatrix}
 So it's obvious at this point that we all know basic arithmetic operations like addition and subtraction.
 But when working with images, we need to keep in mind the limits of our color space and data type.
 For example, RGB images have pixels that fall within the range [0, 255].
 What happens if we are examining a pixel with intensity 250 and we try to add 10 to it?
 Under normal arithmetic rules, we would end up with a value of 260.
 However, since RGB images are represented as 8-bit unsigned integers, 260 is not a valid value.
 So what should happen? Should we perform a check of some sort to ensure no pixel falls outside the range of [0, 255], thus clipping all pixels to have a minimum value of 0 and a maximum value of 255? Or do we apply a modulus operation and "wrap around"? Under modulus rules, adding 10 to 255 would simply wrap around to a value of 9.
 Which way is the "correct" way to handle image additions and subtractions that fall outside the range of [0, 255]?
 The answer is that there is no correct way; it simply depends on how you are manipulating your pixels and what you want the desired results to be.
 However, be sure to keep in mind that there is a difference between OpenCV and NumPy addition.
 NumPy will perform modulus arithmetic and "wrap around."
 OpenCV, on the other hand, will perform clipping and ensure pixel values never fall outside the range [0, 255].
 Do you want all values to be clipped if they fall outside the range [0, 255]? Then use OpenCV's built-in methods for image arithmetic.
 Do you want modulus arithmetic operations and have values wrap around if they fall outside the range of [0, 255]? Then simply add and subtract the NumPy arrays as you normally would.

Objectives:
1. To familiarize ourselves with image addition and subtraction.
2. To understand the difference between OpenCV and NumPy image arithmetic operations.

Experiment 9: Arithmetic operation performed in an Image

Program 9:

Step 1: Write the code in Text Editor


# import the necessary packages
import numpy as np
import argparse
import cv2


# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it


image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# images are NumPy arrays, stored as unsigned 8 bit integers -- this implies that the values of
# our pixels will be in the range [0, 255]; when using functions like cv2.add and cv2.subtract,
# values will be clipped to this range, even if the added or subtracted values fall outside the
# range of [0, 255]. Check out an example:
print "max of 255: " + str(cv2.add(np.uint8([200]), np.uint8([100])))
print "min of 0: " + str(cv2.subtract(np.uint8([50]), np.uint8([100])))

# NOTE: if you use NumPy arithmetic operations on these arrays, the values will wrap around
# (modulo arithmetic) instead of being clipped to the [0, 255] range. This is important to keep
# in mind when working with images.
print "wrap around: " + str(np.uint8([200]) + np.uint8([100]))
print "wrap around: " + str(np.uint8([50]) - np.uint8([100]))

# let's increase the intensity of all pixels in our image by 100 -- we accomplish this by
# constructing a NumPy array that is the same size as our image (filled with ones), then
# multiplying it by 100 to create an array filled with 100's; we then simply add the images
# together -- notice how the image is "brighter"
M = np.ones(image.shape, dtype = "uint8") * 100
added = cv2.add(image, M)
cv2.imshow("Added", added)

# similarly, we can subtract 50 from all pixels in our image and make it darker
M = np.ones(image.shape, dtype = "uint8") * 50
subtracted = cv2.subtract(image, M)
cv2.imshow("Subtracted", subtracted)
cv2.waitKey(0)

Step 2: Save the code as "arithmetic.py"

Step 3: Run the python script (arithmetic.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python arithmetic.py -i new.jpeg


or
$ python arithmetic.py --image new.jpeg

Inference:


1.4.7: BITWISE OPERATIONS


 What happens if our ROI is non-rectangular?
 What would you do then?
 A combination of bitwise operations and masking can help us extract non-rectangular ROIs from an image with ease.
 Bitwise operations operate in a binary manner and are represented as grayscale images.
 A given pixel is turned "off" if it has a value of zero, and it is turned "on" if the pixel has a value greater than zero.

Objectives:
 By the end of this topic you'll understand the four primary bitwise operations:
1. AND
2. OR
3. XOR
4. NOT

Experiment 10: Bitwise operation performed in an Image

Program 10:

Step 1: Write the code in Text Editor


# import the necessary packages
import numpy as np
import cv2

# first, let's draw a rectangle


# initialize our rectangle image as a 300 x 300 NumPy array
# then draw a 250 x 250 white rectangle at the center of the image. -1 thickness (completely
#filled)
rectangle = np.zeros((300, 300), dtype = "uint8")
cv2.rectangle(rectangle, (25, 25), (275, 275), 255, -1)
cv2.imshow("Rectangle", rectangle)

# secondly, let's draw a circle, centered at the center of the image, with a radius of 150 pixels
circle = np.zeros((300, 300), dtype = "uint8")
cv2.circle(circle, (150, 150), 150, 255, -1)
cv2.imshow("Circle", circle)

# A bitwise 'AND' is only True when both rectangle and circle have a value that is 'ON.'
# Simply put, the bitwise AND function examines every pixel in rectangle and circle.
# If both pixels have a value greater than zero, that pixel is turned 'ON' (i.e set to 255 in the
# output image). If both pixels are not greater than zero, then the output pixel is left 'OFF' with a
# value of 0.


bitwiseAnd = cv2.bitwise_and(rectangle, circle)


cv2.imshow("AND", bitwiseAnd)
cv2.waitKey(0)

# A bitwise 'OR' examines every pixel in rectangle and circle. If EITHER pixel in rectangle or
# circle is greater than zero, then the output pixel has a value of 255, otherwise it is 0.
bitwiseOr = cv2.bitwise_or(rectangle, circle)
cv2.imshow("OR", bitwiseOr)
cv2.waitKey(0)

# The bitwise 'XOR' is identical to the 'OR' function, with one exception: both rectangle and
# circle are not allowed to BOTH have values greater than 0.
bitwiseXor = cv2.bitwise_xor(rectangle, circle)
cv2.imshow("XOR", bitwiseXor)
cv2.waitKey(0)

# Finally, the bitwise 'NOT' inverts the values of the pixels. Pixels with a value of 255 become 0,
# and pixels with a value of 0 become 255.
bitwiseNot = cv2.bitwise_not(circle)
cv2.imshow("NOT", bitwiseNot)
cv2.waitKey(0)

Step 2: Save the code as "bitwise.py"

Step 3: Run the python script (bitwise.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python bitwise.py

Inference:


1.4.8: MASKING
A combination of both bitwise operations and masks is used to construct ROIs that are non-rectangular.
This allows us to extract regions from images that are of completely arbitrary shape.
A mask allows us to focus only on the portions of the image that interests us.

Objectives:

1. Leverage masks to extract rectangular regions from images, similar to cropping.
2. Leverage masks to extract non-rectangular and arbitrarily shaped regions from images, which basic cropping cannot accomplish.

Experiment 11: Masking

Program 11:

Step 1: Write the code in Text Editor


# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and display it


image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# Masking allows us to focus only on parts of an image that interest us. A mask is the same
# size as our image, but has only two pixel values, 0 and 255. Pixels with a value of 0 are
# ignored in the original image, and mask pixels with a value of 255 are allowed to be kept. For
# example, let's construct a rectangular mask that displays only the person in the image
mask = np.zeros(image.shape[:2], dtype="uint8")
cv2.rectangle(mask, (0, 90), (290, 450), 255, -1)
cv2.imshow("Mask", mask)

# Apply our mask -- notice how only the person in the image is cropped out
# The first two parameters are the image itself. Obviously, the AND function will be True for all
# pixels in the image; however, the important part of this function is the mask keyword
# argument. By supplying a mask, the cv2.bitwise_and function only examines pixels that are


# "on" in the mask. In this case, only pixels that are part of the white rectangle.
masked = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("Mask Applied to Image", masked)
cv2.waitKey(0)

# Now, let's make a circular mask with a radius of 100 pixels and apply the mask again
mask = np.zeros(image.shape[:2], dtype="uint8")
cv2.circle(mask, (145, 200), 100, 255, -1)
masked = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("Mask", mask)
cv2.imshow("Mask Applied to Image", masked)
cv2.waitKey(0)

Step 2: Save the code as "masking.py"

Step 3: Run the python script (masking.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python masking.py -i new.jpeg
or
$ python masking.py --image new.jpeg

Inference:


1.4.9: SPLITTING AND MERGING CHANNELS


As we know, an image is represented by three components: a Red, Green, and Blue channel.
"How do I access each individual Red, Green, and Blue channel of an image?"
Since images in OpenCV are internally represented as NumPy arrays, accessing each individual
channel can be accomplished in multiple ways.
However, we'll be focusing on the two main methods that you should be using: cv2.split and cv2.merge.

Objectives:
By the end of this topic you should understand how to both split and merge channels of an image by
using the cv2.split and cv2.merge functions.

Experiment 12: Splitting and Merging the color channels in an image

Program 12:

Step 1: Write the code in Text Editor


# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# Load the image and grab each channel: Red, Green, and Blue. It's important to note that
# OpenCV stores an image as NumPy array with its channels in reverse order! When we call
# cv2.split, we are actually getting the channels as Blue, Green, Red!
image = cv2.imread(args["image"])
(B, G, R) = cv2.split(image)

# show each channel individually


cv2.imshow("Red", R)
cv2.imshow("Green", G)
cv2.imshow("Blue", B)
cv2.waitKey(0)

# merge the image back together again


merged = cv2.merge([B, G, R])
cv2.imshow("Merged", merged)
cv2.waitKey(0)


cv2.destroyAllWindows()

# visualize each channel in color


zeros = np.zeros(image.shape[:2], dtype = "uint8")
cv2.imshow("Red", cv2.merge([zeros, zeros, R]))
cv2.imshow("Green", cv2.merge([zeros, G, zeros]))
cv2.imshow("Blue", cv2.merge([B, zeros, zeros]))
cv2.waitKey(0)

Step 2: Save the code as "splitting_and_merging.py"

Step 3: Run the python script (splitting_and_merging.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python splitting_and_merging.py -i new.jpeg
or
$ python splitting_and_merging.py --image new.jpeg

Inference:


LESSON 1.5: KERNELS


 If we think of an image as a big matrix, then we can think of a kernel or convolution matrix as a tiny
matrix that is used for blurring, sharpening, edge detection, and other image processing functions.
 Essentially, this tiny kernel sits on top of the big image and slides from left to right and top to bottom, applying a mathematical operation at each (x, y)-coordinate in the original image.
 We can also use convolution to extract features from images and build very powerful deep learning
systems.

 We slide this kernel from left-to-right and top-to-bottom along the original image.
 At each (x, y)-coordinate of the original image we stop and examine the neighborhood of image
pixels located at the center of the image kernel.
 We can take this neighborhood of pixels, convolve them with the kernel, and we get a single output
value.
 This output value is then stored in the output image at the same (x, y)-coordinate as the center of the
kernel.
 The kernel looks like:
K = \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
 Above we have defined a square 3 x 3 kernel.
 Kernels can be an arbitrary size of M x N pixels, provided that both M and N are odd integers.
 Why do both M and N need to be odd?
 Take a look at our introduction to kernels above — the kernel must have a center (x, y)-coordinate.
 In a 3 x 3 kernel, the center is located at (1, 1), assuming a zero-index array of course.
 This is exactly why we use odd kernel sizes — to always ensure there is a valid (x, y)-coordinate at
the center of the kernel.


Convolution:
 In image processing, convolution requires three components,
1. An input image.
2. A kernel matrix that we are going to apply to the input image.
3. An output image to store the output of the input image convolved with the kernel.
 Convolution itself is very easy and it involves the following steps.
1. Select an (x, y)-coordinate from the original image.
2. Place the center of the kernel at this (x, y) coordinate.
3. Multiply each kernel value by the corresponding input image pixel value, and then take the sum of all multiplication operations. (More simply put, we're taking the element-wise multiplication of the input image region and the kernel, then summing the values of all these multiplications into a single value. The sum of these multiplications is called the kernel output.)
4. Use the same (x, y)-coordinate from Step 1, but this time store the kernel output in the same (x, y)-
location as the output image.
 Here is an example of convolving (which is normally denoted mathematically as the * operator) a 3 x 3 region of an image with a 3 x 3 kernel:
O = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * \begin{bmatrix} 93 & 139 & 101 \\ 26 & 252 & 196 \\ 135 & 230 & 18 \end{bmatrix} = \sum \begin{bmatrix} -93 & 0 & 101 \\ -52 & 0 & 392 \\ -135 & 0 & 18 \end{bmatrix} = 231
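This lesson does not include an accompanying program, so here is a minimal sketch (added for illustration; the file name "new.jpeg" follows the earlier experiments and is just an example) showing how the 3 x 3 averaging kernel K defined above can be applied to an image with cv2.filter2D, which performs this sliding-window operation for us:

# import the necessary packages
import numpy as np
import cv2

# load the image and convert it to grayscale
image = cv2.imread("new.jpeg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# define the 3 x 3 averaging kernel K from above (each element is 1/9)
K = np.ones((3, 3), dtype="float32") / 9.0

# convolve the grayscale image with the kernel; -1 means the output image
# has the same depth (data type) as the input image
smoothed = cv2.filter2D(gray, -1, K)

# show the original grayscale image and the convolved (blurred) result
cv2.imshow("Gray", gray)
cv2.imshow("Convolved with K", smoothed)
cv2.waitKey(0)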

Inference:


LESSON 1.6: MORPHOLOGICAL OPERATIONS


 Morphological operations are simple transformations applied to binary or grayscale images.
 We normally apply morphological operations to binary images.
 More specifically, we apply morphological operations to shapes and structures inside of images.
We can use morphological operations to increase the size of objects in images as well as decrease
them.
We can also utilize morphological operations to close gaps between objects as well as open them.
 Morphological operations "probe" an image with a structuring element. This structuring element defines the neighborhood to be examined around each pixel. Based on the given operation and the size of the structuring element, we are able to adjust our output image.

Structuring Element:
 Well, you can (conceptually) think of a structuring element as a type of kernel or mask.
 However, instead of applying a convolution, we are only going to perform simple tests on the pixels.
 Just like in image kernels, the structuring element slides from left-to-right and top-to-bottom for each
pixel in the image.
 Also just like kernels, structuring elements can be of arbitrary neighborhood
sizes.
For example, consider the 4-neighborhood and 8-neighborhood of a central pixel:
 Here the central pixel is located at the center of the neighborhood.
 The 4-neighborhood (left) then defines the region surrounding the central pixel as the pixels to the
north, south, east, and west.
 The 8-neighborhood (right) then extends this region to include the corner pixels as well.
 This is just an example of two simple structuring elements.
 But we could also make them arbitrary rectangular or circular structures as well; it all depends on your particular application.
 In OpenCV, we can either use the cv2.getStructuringElement function or NumPy itself to define our structuring element (a short sketch of both options follows the list below).
 A structuring element behaves similarly to a kernel or a mask, but instead of convolving the input image with our structuring element, we are only going to be applying simple pixel tests.
 Types of morphological operations:
1. Erosion
2. Dilation
3. Opening
4. Closing
5. Morphological gradient
6. Top hat (or "White hat")
7. Black hat
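As referenced above, here is a minimal sketch (added for illustration; the shapes and sizes are arbitrary examples) of the two ways to define a structuring element:

# import the necessary packages
import numpy as np
import cv2

# option 1: let OpenCV build common shapes for us
rect = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
cross = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
print "rectangular 3x3 structuring element (8-neighborhood):"
print rect
print "cross-shaped 3x3 structuring element (4-neighborhood):"
print cross

# option 2: define the same 4-neighborhood (cross) structuring element directly with NumPy
manual_cross = np.array([[0, 1, 0],
                         [1, 1, 1],
                         [0, 1, 0]], dtype="uint8")
print "manually defined cross:"
print manual_cross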


Erosion:
 Just like water rushing along a river bank erodes the soil, an erosion in an image "erodes" the foreground object and makes it smaller.
 Simply put, pixels near the boundary of an object in an image will be discarded, "eroding" it away.
 Erosion works by defining a structuring element and then sliding this structuring element from left-to-
right and top-to-bottom across the input image.
 A foreground pixel in the input image will be kept only if ALL pixels inside the structuring element are
> 0. Otherwise, the pixels are set to 0 (i.e. background).
 Erosion is useful for removing small blobs in an image or disconnecting two connected objects.
 We can perform erosion by using the cv2.erode function.

Dilation:
 The opposite of an erosion is a dilation.
 Just like an erosion will eat away at the foreground pixels, a dilation will grow the foreground pixels.
 Dilations increase the size of foreground object and are especially useful for joining broken parts of
an image together.
 Dilations, just as an erosion, also utilize structuring elements — a center pixel p of the structuring
element is set to white if ANY pixel in the structuring element is > 0.
 We apply dilations using the cv2.dilate function.

Opening:
 An opening is an erosion followed by a dilation.
 Performing an opening operation allows us to remove small blobs from an image: first an erosion is
applied to remove the small blobs, then a dilation is applied to regrow the size of the original object.

Closing:
 The exact opposite to an opening would be a closing.
 A closing is a dilation followed by an erosion.
 As the name suggests, a closing is used to close holes inside of objects or for connecting
components together.
 Performing the closing operation is again accomplished by making a call to cv2.morphologyEx, but
this time we are going to indicate that our morphological operation is a closing by specifying the
cv2.MORPH_CLOSE flag.

Morphological Gradient:
 A morphological gradient is the difference between the dilation and erosion.
 It is useful for determining the outline of a particular object in an image.

Top Hat/White Hat:


 A top hat (also known as a white hat) morphological operation is the difference between the original
input image and the opening.
 A top hat operation is used to reveal bright regions of an image on dark backgrounds.
 Up until this point we have only applied morphological operations to binary images.


 But we can also apply morphological operations to grayscale images as well.


 In fact, both the top hat/white hat and the black hat operators are more suited for grayscale images
rather than binary ones.

Black Hat:
 The black hat operation is the difference between the closing of the input image and the input image
itself.
 In fact, the black hat operator is simply the opposite of the white hat operator.
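Program 13 below demonstrates erosion, dilation, opening, closing, and the morphological gradient, but not the top hat and black hat operators. Here is a minimal sketch of those two (added for illustration; the input file "new.jpeg" and the 13 x 5 rectangular kernel are just example choices):

# import the necessary packages
import cv2

# load the image and convert it to grayscale
image = cv2.imread("new.jpeg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# define a rectangular structuring element (the size is just an example)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))

# top hat (white hat): difference between the original image and its opening,
# revealing bright regions on a dark background
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)

# black hat: difference between the closing of the image and the image itself,
# revealing dark regions on a bright background
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)

cv2.imshow("Original", gray)
cv2.imshow("Top hat", tophat)
cv2.imshow("Black hat", blackhat)
cv2.waitKey(0)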

Experiment 13: Morphological operations in an image

Program 13:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and convert it to grayscale


image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

# apply a series of erosions


# The for loop controls the number of times, or iterations, we are going to apply the erosion.
# As the number of erosions increases, the foreground object will start to "erode" and disappear.
# The cv2.erode function takes two required arguments and a third optional one.
# The first argument is the image that we want to erode -- in this case, our grayscale image.
# The second argument is the structuring element. If this value is None, then a 3x3
# structuring element, identical to the 8-neighborhood structuring element, will be used.
# Of course, you could supply your own custom structuring element here instead of None.
# The last argument is the number of times the erosion is going to be performed.
for i in xrange(0, 3):
eroded = cv2.erode(gray.copy(), None, iterations=i + 1)
cv2.imshow("Eroded {} times".format(i + 1), eroded)
cv2.waitKey(0)

# close all windows to clean up the screen


cv2.destroyAllWindows()
cv2.imshow("Original", image)

# apply a series of dilations


# In the cv2.dilate function, the first argument is the image we want to dilate; the second is our
# structuring element, which when set to None is a 3x3 8-neighborhood structuring element;
# the final argument is the number of dilations we are going to apply.
for i in xrange(0, 3):
dilated = cv2.dilate(gray.copy(), None, iterations=i + 1)
cv2.imshow("Dilated {} times".format(i + 1), dilated)
cv2.waitKey(0)

# close all windows to clean up the screen and initialize the list of kernels sizes
# kernelSizes variable defines the width and height of the structuring element.
cv2.destroyAllWindows()
cv2.imshow("Original", image)
kernelSizes = [(3, 3), (5, 5), (7, 7)]

# loop over the kernels and apply an "opening" operation to the image
# The cv2.getStructuringElement function requires two arguments: the first is the type of
# structuring element (rectangular: cv2.MORPH_RECT, cross-shaped: cv2.MORPH_CROSS,
# or elliptical/circular: cv2.MORPH_ELLIPSE) and the second is the size of the
# structuring element.
for kernelSize in kernelSizes:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
opening = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)
cv2.imshow("Opening: ({}, {})".format(kernelSize[0], kernelSize[1]), opening)
cv2.waitKey(0)

# close all windows to clean up the screen


cv2.destroyAllWindows()
cv2.imshow("Original", image)

# loop over the kernels and apply a "closing" operation to the image
for kernelSize in kernelSizes:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
closing = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
cv2.imshow("Closing: ({}, {})".format(kernelSize[0], kernelSize[1]), closing)
cv2.waitKey(0)

# close all windows to clean up the screen


cv2.destroyAllWindows()
cv2.imshow("Original", image)


# loop over the kernels and apply a "morphological gradient" operation to the image
for kernelSize in kernelSizes:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
gradient = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)
cv2.imshow("Gradient: ({}, {})".format(kernelSize[0], kernelSize[1]), gradient)
cv2.waitKey(0)

Step 2: Save the code as "morphological.py"

Step 3: Run the python script (morphological.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python morphological.py -i new.jpeg
or
$ python morphological.py --image new.jpeg

Experiment 14: To detect the license plate region in a car

Program 14:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and convert it to grayscale


image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# construct a rectangular kernel (w, h) and apply a blackhat operation which enables us to find
# dark regions on a light background
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)

# similarly, a tophat (also called a "whitehat") operation will enable us to find light regions on a
# dark background
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, rectKernel)


# show the output images (tophat-light against dark background are clearly displayed)
# (blackhat-dark against light background are clearly displayed)
cv2.imshow("Original", image)
cv2.imshow("Blackhat", blackhat)
cv2.imshow("Tophat", tophat)
cv2.waitKey(0)

Step 2: Save the code as "hats.py"

Step 3: Run the python script (hats.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python hats.py -i new.jpeg
or
$ python hats.py --image new.jpeg

Inference:


LESSON 1.7: SMOOTHING AND BLURRING


Blurring happens when a camera takes a picture out of focus.
Sharper regions in the image lose their detail.
The goal here is to use a low-pass filter to reduce the amount of noise and detail in an image.
Practically, this means that each pixel in the image is mixed in with its surrounding pixel intensities.
This ―mixture‖ of pixels in a neighborhood becomes our blurred pixel.
In fact, smoothing and blurring is one of the most common pre-processing steps in computer vision
and image processing.
Many image processing and computer vision functions, such as thresholding and edge detection,
perform better if the image is first smoothed or blurred. By doing so, we are able to reduce the amount
of high frequency content, such as noise and edges (i.e. the ―detail‖ of an image).
By reducing the detail in an image we can more easily find objects that we are interested in.
Furthermore, this allows us to focus on the larger structural objects in the image.
Types of Blurring:
1. averaging,
2. Gaussian blurring,
3. median filtering
4. bilateral filtering

Averaging:
An average filter does exactly what you think it might do — takes an area of pixels surrounding a
central pixel, averages all these pixels together, and replaces the central pixel with the average.
 To accomplish our average blur, we'll actually be convolving our image with an MxN normalized filter
where both M and N are odd integers.
This kernel is going to slide from left-to-right and from top-to-bottom for each and every pixel in our
input image.
The pixel at the center of the kernel is then set to be the average of all other pixels surrounding it.
Let's go ahead and define a 3x3 average kernel that can be used to blur the region surrounding the
central pixel:

        [ 1 1 1 ]
K = 1/9 [ 1 1 1 ]
        [ 1 1 1 ]
Notice how each entry of the kernel matrix is uniformly weighted — we are giving equal weight to all
pixels in the kernel.
An alternative is to give pixels different weights, where pixels farther from the central pixel contribute
less to the average; this method of smoothing is called Gaussian blurring.
As the size of the kernel increases, so does the amount by which the image is blurred.
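
To make the connection between this kernel and OpenCV concrete, here is a short optional sketch (not one of the numbered programs; "new.jpeg" is just the sample image used elsewhere in this handout). It builds the normalized 3x3 kernel with NumPy and applies it with cv2.filter2D; the result should be essentially identical to calling cv2.blur with a (3, 3) kernel.

# build the 3x3 normalized averaging kernel by hand and convolve it with the image
import cv2
import numpy as np

image = cv2.imread("new.jpeg")        # any test image from this handout

# K = 1/9 * [[1, 1, 1], [1, 1, 1], [1, 1, 1]] -- every neighbor weighted equally
K = np.ones((3, 3), dtype="float32") / 9.0

manual = cv2.filter2D(image, -1, K)   # convolve with our own kernel
builtin = cv2.blur(image, (3, 3))     # OpenCV's built-in average blur

cv2.imshow("filter2D average", manual)
cv2.imshow("cv2.blur average", builtin)
cv2.waitKey(0)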

Gaussian:
Gaussian blurring is similar to average blurring, but instead of using a simple mean, we are now using
a weighted mean, where neighborhood pixels that are closer to the central pixel contribute more
―weight‖ to the average.


And as the name suggests, Gaussian smoothing is used to remove noise that approximately follows a
Gaussian distribution.
The end result is that our image is less blurred, but more naturally blurred, than using the average
method.
Furthermore, based on this weighting we‘ll be able to preserve more of the edges in our image as
compared to average smoothing.
Just like an average blurring, Gaussian smoothing also uses a kernel of MxN, where both M and N
are odd integers.
However, since we are weighting pixels based on how far they are from the central pixel, we need an
equation to construct our kernel.
The equation for a Gaussian function in one direction is:

G(x) = (1 / sqrt(2πσ²)) e^(-x² / (2σ²))

And it then becomes trivial to extend this equation to two directions, one for the x-axis and the other
for the y-axis, respectively:

G(x, y) = (1 / (2πσ²)) e^(-(x² + y²) / (2σ²))

where x and y are the respective distances to the horizontal and vertical center of the kernel and σ is
the standard deviation of the Gaussian kernel.
When the size of our kernel increases so will the amount of blurring that is applied to our output
image.
However, the blurring will appear to be more ―natural‖ and will preserve edges in our image better
than simple average smoothing.
A Gaussian blur tends to give much nicer results, especially when applied to natural images.
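
To see this weighted mean concretely, the short sketch below (an optional aside, not one of the numbered programs) prints the Gaussian weights for a 7x7 kernel using OpenCV's cv2.getGaussianKernel helper; the weights peak at the center and fall off toward the edges, and the 2-D kernel is simply the outer product of the 1-D kernel with itself.

# inspect the Gaussian weights OpenCV derives for a 7x7 kernel
import cv2
import numpy as np

g = cv2.getGaussianKernel(7, 0)   # ksize=7; sigma=0 lets OpenCV derive sigma from the size
print(g.flatten())                # 1-D weights: largest at the center, smallest at the ends

G2d = np.outer(g, g)              # the corresponding 7x7 2-D kernel
print(G2d.sum())                  # ~1.0, i.e. a normalized weighted mean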

Median:
Traditionally, the median blur method has been most effective when removing salt-and-pepper noise.
When applying a median blur, we first define our kernel size.
Then, as in the averaging blurring method, we consider all pixels in the neighborhood of size KxK
where K is an odd integer.
Notice how, unlike average blurring and Gaussian blurring where the kernel size could be
rectangular, the kernel size for the median must be square.
Furthermore (unlike the averaging method), instead of replacing the central pixel with the average of
the neighborhood, we instead replace the central pixel with the median of the neighborhood.
The reason median blurring is more effective at removing salt-and-pepper style noise from an image
is that each central pixel is always replaced with a pixel intensity that exists in the image.
And since the median is robust to outliers, the salt-and-pepper noise will be less influential to the
median than another statistical method, such as the average.
Again, methods such as averaging and Gaussian compute means or weighted means for the
neighborhood — this average pixel intensity may or may not be present in the neighborhood.
But by definition, the median pixel must exist in our neighborhood.
By replacing our central pixel with a median rather than an average, we can substantially reduce
noise.
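
A tiny, made-up numeric example makes the point: consider a 3x3 neighborhood whose true intensities are all close to 12, with a single "salt" pixel of 255 in the middle.

# mean versus median on a neighborhood corrupted by one salt pixel (illustrative values)
import numpy as np

neighborhood = np.array([10, 12, 11, 13, 255, 12, 10, 11, 12])

print(neighborhood.mean())      # ~38.4 -- pulled upward by the outlier, and this
                                # value does not occur anywhere in the patch
print(np.median(neighborhood))  # 12.0 -- an intensity that actually exists in the patch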


The median blur is by no means a ―natural blur‖ like Gaussian smoothing.


However, for damaged images or photos captured under highly sub-optimal conditions, a median blur
can really help as a preprocessing step prior to passing the image along to other methods, such as
thresholding and edge detection.

Bilateral:
Thus far, the intention of our blurring methods has been to reduce noise and detail in an image;
however, as a side effect we have tended to lose edges in the image.
In order to reduce noise while still maintaining edges, we can use bilateral blurring.
Bilateral blurring accomplishes this by introducing two Gaussian distributions.
The first Gaussian function only considers spatial neighbors.
That is, pixels that appear close together in the (x, y)-coordinate space of the image.
The second Gaussian then models the pixel intensity of the neighborhood, ensuring that only pixels
with similar intensity are included in the actual computation of the blur.
Intuitively, this makes sense. If pixels in the same (small) neighborhood have a similar pixel value,
then they likely represent the same object.
But if two pixels in the same neighborhood have contrasting values, then we could be examining the
edge or boundary of an object — and we would like to preserve this edge.
Overall, this method is able to preserve edges of an image, while still reducing noise.
The largest downside to this method is that it is considerably slower than its averaging, Gaussian,
and median blurring counterparts.

Experiment 15: To study the effects of different types of blurring

Program 15:

Step 1: Write the code in Text Editor

# import the necessary packages


import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, display it, and initialize the list of kernel sizes
image = cv2.imread(args["image"])
cv2.imshow("Original", image)
kernelSizes = [(3, 3), (9, 9), (15, 15)]

# loop over the kernel sizes and apply an "average" blur to the image
# The larger our kernel becomes, the more blurred our image will appear.
for (kX, kY) in kernelSizes:


blurred = cv2.blur(image, (kX, kY))


cv2.imshow("Average ({}, {})".format(kX, kY), blurred)
cv2.waitKey(0)

# close all windows to cleanup the screen


cv2.destroyAllWindows()
cv2.imshow("Original", image)

# loop over the kernel sizes and apply a "Gaussian" blur to the image
# The last parameter of the cv2.GaussianBlur function is σ, the standard deviation of the
# Gaussian distribution. By setting this value to 0, we are instructing OpenCV to automatically
# compute σ based on our kernel size. In most cases, you'll want to let σ be computed for you.
for (kX, kY) in kernelSizes:
blurred = cv2.GaussianBlur(image, (kX, kY), 0)
cv2.imshow("Gaussian ({}, {})".format(kX, kY), blurred)
cv2.waitKey(0)

# close all windows to clean-up the screen


cv2.destroyAllWindows()
cv2.imshow("Original", image)

# loop over the kernel sizes (square kernels) and apply a "Median" blur to the image
for k in (3, 9, 15):
blurred = cv2.medianBlur(image, k)
cv2.imshow("Median {}".format(k), blurred)
cv2.waitKey(0)

Step 2: Save the code as "blurring.py"

Step 3: Run the python script (blurring.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python blurring.py -i new.jpeg
or
$ python blurring.py --image new.jpeg

Inference:


Experiment 16: To study the effects of bilateral blurring

Program 16:

Step 1: Write the code in Text Editor

# import the necessary packages


import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, display it, and construct the list of bilateral filtering parameters that we are
# going to explore. These parameters correspond to the diameter, σcolor and σspace of the bilateral
# filter, respectively.
image = cv2.imread(args["image"])
cv2.imshow("Original", image)
params = [(11, 21, 7), (11, 41, 21), (11, 61, 39)]

# loop over the diameter, sigma color, and sigma space


# the larger the diameter, the more pixels will be included in the blurring computation
# A larger value for σcolor means that more colors in the neighborhood will be considered when
# computing the blur. If we let σcolor get too large in respect to the diameter, then we essentially
# have broken the assumption of bilateral filtering —that only pixels of similar color should
# contribute significantly to the blur.
# Finally, we need to supply the space standard deviation (σspace). A larger value of σspace means
# that pixels farther out from the central pixel will influence the blurring calculation,
# provided that their colors are similar enough.
# apply bilateral filtering and display the image
for (diameter, sigmaColor, sigmaSpace) in params:
blurred = cv2.bilateralFilter(image, diameter, sigmaColor, sigmaSpace)
title = "Blurred d={}, sc={}, ss={}".format(diameter, sigmaColor, sigmaSpace)
cv2.imshow(title, blurred)
cv2.waitKey(0)

Step 2: Save the code as "bilateral.py"

Step 3: Run the python script (bilateral.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus


$ python bilateral.py -i new.jpeg


or
$ python bilateral.py --image new.jpeg

Inference:


LESSON 1.8: LIGHTING AND COLOR SPACES

OBJECTIVES
1. Understand the role lighting conditions play in the development of a successful computer vision
system.
2. Discuss the four primary color spaces you‘ll encounter in computer vision: RGB, HSV, L*a*b*, and
grayscale (which isn‘t technically a color space, but is used in many computer vision applications).

LIGHTING CONDITIONS
 Every single computer vision algorithm, application, and system ever developed, and that ever will be
developed, depends on the quality of the images input to the system.
 We‘ll certainly be able to make our systems more robust in relation to poor lighting conditions, but
we‘ll never be able to overcome an image that was captured under inferior conditions.
 Lighting can mean the difference between success and failure of your computer vision algorithm.
 Lighting conditions should have three primary goals:
1. High Contrast: Maximize the contrast between the Regions of Interest in your image (i.e. the
―objects‖ you want to detect, extract, classify, manipulate, etc. should have sufficiently high
contrast from the rest of the image so they are easily detectable).
2. Generalizable: Your lighting conditions should be consistent enough that they work well from
one ―object‖ to the next.
3. Stable: Having stable, consistent, and repeatable lighting conditions is the holy grail of computer
vision application development. However, it‘s often hard (if not impossible) to guarantee — this
is especially true if we are developing computer vision algorithms that are intended to work in
outdoor lighting conditions. As the time of day changes, clouds roll in over the sun, and rain
starts to pour, our lighting conditions will obviously change.

COLOR SPACES AND COLOR MODELS


 A color space is just a specific organization of colors that allows us to consistently represent and
reproduce colors.
 A color model, on the other hand, is an abstract method of numerically representing colors in the
color space.
 As we know, RGB pixels are represented as a 3-integer tuple of a Red, Green, and Blue value.

RGB MODEL: Red, Green, and Blue components of an image.


 To define a color in the RGB color model, all we need to do is define the amount of Red, Green, and
Blue contained in a single pixel.
 Each Red, Green, and Blue channel can have values defined in the range [0,255] (for a total of 256
―shades‖), where 0 indicates no representation and 255 demonstrates full representation.
 The RGB color space is an example of an additive color space: the more of each color is added, the
brighter the pixel becomes and the closer it comes to white.
 Adding red and green leads to yellow.
 Adding red and blue yields magenta.
 And adding all three red, green, and blue together we create white.
 Since an RGB color is defined as a 3-valued tuple, with each value in the range [0, 255], we can thus
think of the cube containing 256 x 256 x 256 = 16,777,216 possible colors, depending on how much
Red, Green, and Blue we place into each bucket.


 However, this is not exactly the most friendly color space for developing computer vision based
applications.
 In fact, its primary use is to display colors on a monitor.
 But despite how unintuitive the RGB color space may be, nearly all images you‘ll work with will be
represented (at least initially) in the RGB color space.
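
As a quick, optional illustration of additive mixing (remember that OpenCV orders the channels as BGR rather than RGB), the sketch below builds three solid-color patches and adds them together:

# additive mixing of solid BGR patches
import numpy as np
import cv2

red = np.zeros((100, 100, 3), dtype="uint8")
green = np.zeros((100, 100, 3), dtype="uint8")
blue = np.zeros((100, 100, 3), dtype="uint8")
red[:] = (0, 0, 255)      # BGR for pure red
green[:] = (0, 255, 0)    # BGR for pure green
blue[:] = (255, 0, 0)     # BGR for pure blue

cv2.imshow("Red + Green = Yellow", cv2.add(red, green))
cv2.imshow("Red + Blue = Magenta", cv2.add(red, blue))
cv2.imshow("Red + Green + Blue = White", cv2.add(cv2.add(red, green), blue))
cv2.waitKey(0)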

HSV MODEL: The HSV color space transforms the RGB color space, remodeling it as a cylinder rather
than a cube.

 As we saw in the RGB section, the ―white‖ or ―lightness‖ of a color is an additive combination of each
Red, Green, and Blue component.
 But now in the HSV color space, the lightness is given its own separate dimension.
 Let‘s define what each of the HSV components are:
Hue: Which "pure" color. For example, all shades and tones of the color "red" will have the same
Hue.
Saturation: How ―white‖ the color is. A fully saturated color would be ―pure,‖ as in ―pure red.‖ And a
color with zero saturation would be pure white.
Value: The Value allows us to control the lightness of our color. A Value of zero would indicate pure
black, whereas increasing the value would produce lighter colors.
 It‘s important to note that different computer vision libraries will use different ranges to represent
each of the Hue, Saturation, and Value components.
 However, in the case of OpenCV, images are represented as 8-bit unsigned integer arrays. Thus,
the Hue value is defined on the range [0, 179] (for a total of 180 possible values, since [0, 359] is not
possible for an 8-bit unsigned array) — the Hue is actually a degree (Θ) on the HSV color cylinder. And
both saturation and value are defined on the range [0, 255].
 The value controls the actual lightness of our color, while both Hue and Saturation define the actual
color and shade.


 The HSV color space is used heavily in computer vision applications — especially if we are
interested in tracking the color of some object in an image.
 It‘s far, far easier to define a valid color range using HSV than it is RGB.
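
A minimal sketch of what this looks like in practice is given below (optional; the lower and upper bounds are only illustrative and would need to be tuned for a real object, and "new.jpeg" is just the sample image used elsewhere in this handout). Pixels whose HSV values fall inside the range become white in the mask produced by cv2.inRange, and everything else becomes black.

# track a rough "red-ish" color range in HSV
import cv2
import numpy as np

image = cv2.imread("new.jpeg")
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# example bounds: H in [0, 10], reasonably saturated and bright pixels only
lower = np.array([0, 100, 100], dtype="uint8")
upper = np.array([10, 255, 255], dtype="uint8")

mask = cv2.inRange(hsv, lower, upper)
cv2.imshow("Mask", mask)
cv2.imshow("Tracked", cv2.bitwise_and(image, image, mask=mask))
cv2.waitKey(0)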

L*a*b* MODEL:
 While the RGB color space is easy to understand (especially when you‘re first getting started in
computer vision), it‘s non-intuitive when defining exact shades of a color or specifying a particular range
of colors.
 On the other hand, the HSV color space is more intuitive but does not do the best job in representing
how humans see and interpret colors in images.
 For example, let‘s compute the Euclidean distance between the colors red and green; red and
purple; and red and navy in the RGB color space:
>>> import math
>>> red_green = math.sqrt(((255 - 0) ** 2) + ((0 - 255) ** 2) + ((0 - 0) ** 2))
>>> red_purple = math.sqrt(((255 - 128) ** 2) + ((0 - 0) ** 2) + ((0 - 128) ** 2))
>>> red_navy = math.sqrt(((255 - 0) ** 2) + ((0 - 0) ** 2) + ((0 - 128) ** 2))
>>> red_green, red_purple, red_navy
(360.62445840513925, 180.31361568112376, 285.3226244096321)

 What do these distance values actually represent?


 Is the color red somehow more perceptually similar to purple rather than green?
 The answer is a simple no — even though we have defined our color spaces on objects like a cube
and a cylinder, these distances are actually quite arbitrary and there is actually no way to "measure" the
perceptual difference in color between various colors in the RGB and HSV color spaces.
 That is where the L*a*b* color space comes in — its goal is to mimic the methodology in which
humans see and interpret color.
 This means that the Euclidean distance between two arbitrary colors in the L*a*b* color space has
actual perceptual meaning.
 The addition of perceptual meaning makes the L*a*b* color space less intuitive and harder to grasp
than RGB and HSV, but it is heavily used in computer vision.
 Essentially, the L*a*b* color space is a 3-axis system:


 Where we define each channel below:


L-channel: The ―lightness‖ of the pixel. This value goes up and down the vertical axis, white to
black, with neutral grays at the center of the axis.
a-channel: Originates from the center of the L-channel and defines pure green on one end of the
spectrum and pure red on the other.
b-channel: Also originates from the center of the L-channel, but is perpendicular to the a-channel.
The b-channel defines pure blue at one end of the spectrum and pure yellow at the other.
 Again, while the L*a*b* color space is less intuitive and harder to understand than HSV and RGB, it
is heavily used in computer vision. And since the distance between colors has actual
perceptual meaning, it allows us to overcome various lighting condition problems. It also serves as a
powerful color image descriptor.
 Similar to our HSV example, we have the L*-channel which is dedicated to displaying how light a
given pixel is. The a* and b* then determine the shade and color of the pixel.
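
As an optional follow-up to the RGB distance example above, the sketch below converts the same four colors to L*a*b* (note that OpenCV scales the 8-bit L*a*b* channels to [0, 255]) and recomputes the Euclidean distances. Because L*a*b* is designed around perceptual difference, these distances are a more meaningful measure of how different the colors actually look.

# Euclidean distances between colors, measured in L*a*b* instead of RGB
import cv2
import numpy as np

def to_lab(bgr):
    # convert a single BGR color to L*a*b* via a 1x1 "image"
    pixel = np.uint8([[bgr]])
    return cv2.cvtColor(pixel, cv2.COLOR_BGR2LAB)[0, 0].astype("float")

red = to_lab((0, 0, 255))       # RGB (255, 0, 0), written in BGR order
green = to_lab((0, 255, 0))
purple = to_lab((128, 0, 128))
navy = to_lab((128, 0, 0))      # RGB (0, 0, 128), written in BGR order

print(np.linalg.norm(red - green))
print(np.linalg.norm(red - purple))
print(np.linalg.norm(red - navy))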

GRAYSCALE:
 Simply the grayscale representation of a RGB image.
 The grayscale representation of an image is often referred to as ―black and white,‖ but this is not
technically correct.
 Grayscale images are single channel images with pixel values in the range [0, 255] (i.e. 256 unique
values).
 True black and white images are called binary images and thus only have two possible values: 0 or
255 (i.e. only 2 unique values).
 Be careful when referring to grayscale image as black and white to avoid this ambiguity.
 However, converting an RGB image to grayscale is not as straightforward as you may think.
 Biologically, our eyes are more sensitive and thus perceive more green and red than blue.
 Thus when converting to grayscale, each RGB channel is not weighted uniformly, like this:
Y=0.333xR+0.333xG+0.333xB
 Instead, we weight each channel differently to account for how much color we perceive of each:
Y=0.299xR+0.587xG+0.114xB
 Again, due to the cones and receptors in our eyes, we are able to perceive nearly 2x the amount of
green than red.
 And similarly, we notice over twice the amount of red than blue.
 Thus, we make sure to account for this when converting from RGB to grayscale.
 The grayscale representation of an image is often used when we have no use for color (such as in
detecting faces or building object classifiers where the color of the object does not matter).
 Discarding color thus allows us to save memory and be more computationally efficient.
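
To see that the weighted formula above is really what the conversion does, the optional sketch below computes the grayscale image by hand and compares it to cv2.cvtColor (again, "new.jpeg" is just the sample image used elsewhere in this handout); the two results should agree up to rounding.

# weighted RGB-to-grayscale conversion by hand versus cv2.cvtColor
import cv2
import numpy as np

image = cv2.imread("new.jpeg")
(B, G, R) = cv2.split(image.astype("float"))     # OpenCV stores channels as BGR

manual = (0.299 * R + 0.587 * G + 0.114 * B).astype("uint8")
builtin = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

cv2.imshow("Manual grayscale", manual)
cv2.imshow("cv2.cvtColor grayscale", builtin)
cv2.waitKey(0)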

Experiment 17: To study about different color spaces

Program 17:

Step 1: Write the code in Text Editor

# import the necessary packages


import argparse


import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the original image and display it (RGB)


image = cv2.imread(args["image"])
cv2.imshow("RGB", image)

# loop over each of the individual channels and display them


for (name, chan) in zip(("B", "G", "R"), cv2.split(image)):
cv2.imshow(name, chan)

# wait for a keypress, then close all open windows


cv2.waitKey(0)
cv2.destroyAllWindows()

# convert the image to the HSV color space and show it


# specify the cv2.COLOR_BGR2HSV flag to indicate that we want to convert from BGR to HSV.
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
cv2.imshow("HSV", hsv)

# loop over each of the individual channels and display them


for (name, chan) in zip(("H", "S", "V"), cv2.split(hsv)):
cv2.imshow(name, chan)

# wait for a keypress, then close all open windows


cv2.waitKey(0)
cv2.destroyAllWindows()

# convert the image to the L*a*b* color space and show it


lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
cv2.imshow("L*a*b*", lab)

# loop over each of the individual channels and display them


for (name, chan) in zip(("L*", "a*", "b*"), cv2.split(lab)):
cv2.imshow(name, chan)

# wait for a keypress, then close all open windows


cv2.waitKey(0)
cv2.destroyAllWindows()


# show the original and grayscale versions of the image


gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)
cv2.imshow("Grayscale", gray)
cv2.waitKey(0)

Step 2: Save the code as "colorspaces.py"

Step 3: Run the python script (colorspaces.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python colorspaces.py -i new.jpeg
or
$ python colorspaces.py --image new.jpeg

Inference:


LESSON 1.9: THRESHOLDING


Thresholding is one of the most common (and basic) segmentation techniques in computer vision and
it allows us to separate the foreground (i.e. the objects that we are interested in) from the background
of the image.
Thresholding comes in many forms:
1. Simple thresholding: where we manually supply parameters to segment the image — this works
extremely well in controlled lighting conditions where we can ensure high contrast between the
foreground and background of the image.
2. Otsu’s thresholding, which attempts to be more dynamic and automatically computes the optimal
threshold value based on the input image.
3. Adaptive thresholding, which, instead of trying to threshold an image globally using a single value,
breaks the image down into smaller pieces and thresholds each of these pieces separately
and individually.

OBJECTIVES:
1. Be able to define what thresholding is.
2. Understand simple thresholding and why a thresholding value T must be manually provided.
3. Grasp Otsu‘s thresholding method.
4. Comprehend the importance of adaptive thresholding and why it‘s useful in situations where lighting
conditions cannot be controlled.

WHAT IS THRESHOLDING?
Thresholding is the binarization of an image.
In general, we seek to convert a grayscale image to a binary image, where the pixels are either 0 or
255.
A simple thresholding example would be selecting a threshold value T, and then setting all pixel
intensities less than T to zero, and all pixel values greater than T to 255.
In this way, we are able to create a binary representation of the image.
Normally, we use thresholding to focus on objects or areas of particular interest in an image.

SIMPLE THRESHOLDING:
Applying simple thresholding methods requires human intervention.
We must specify a threshold value T.
All pixel intensities greater than T are set to 255.
And all pixel intensities less than or equal to T are set to 0.
We could also apply the inverse of this binarization by setting all pixels greater than T to 0 and all
pixel intensities below T to 255.
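
A tiny optional sketch of this definition on a made-up 3x3 "image" with T = 128:

# simple (and inverse) binarization of a toy grayscale array
import numpy as np

gray = np.array([[ 12,  45, 200],
                 [220,  30, 180],
                 [ 90, 240,  10]], dtype="uint8")
T = 128

binary = np.where(gray > T, 255, 0).astype("uint8")      # standard binarization
binary_inv = np.where(gray > T, 0, 255).astype("uint8")  # the inverse version
print(binary)
print(binary_inv)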

OTSU's METHOD:
But in real-world conditions where we do not have any a priori knowledge of the lighting conditions,
we actually automatically compute an optimal value of T using Otsu‘s method.
Otsu‘s method assumes that our image contains two classes of pixels: the background and the
foreground.
Furthermore, Otsu's method makes the assumption that the grayscale histogram of the pixel
intensities of our image is bi-modal, which simply means that the histogram has two peaks.


A histogram is simply a tabulation or a "counter" of the number of times each pixel value appears in the
image.
Based on the grayscale histogram, Otsu's method then computes an optimal threshold value T such
that the within-class variance of the background and foreground pixels is minimal (equivalently, the
variance between the two classes is maximal).
However, Otsu‘s method has no a priori knowledge of what pixels belong to the foreground and which
pixels belong to the background — it‘s simply trying to optimally separate the peaks of the histogram.
It‘s also important to note that Otsu‘s method is an example of global thresholding — implying that a
single value of T is computed for the entire image.
In some cases, having a single value of T for an entire image is perfectly acceptable — but in other
cases, this can lead to sub-par results.
There are two main limitations. The first is that Otsu's method assumes a bi-modal distribution of the
grayscale pixel intensities of our input image. If this is not the case, then Otsu's method can return
sub-par results.
Secondly, Otsu‘s method is a global thresholding method.
In situations where lighting conditions are semi-stable and the objects we want to segment have
sufficient contrast from the background, we might be able to get away with Otsu‘s method.
But when the lighting conditions are non-uniform — such as when different parts of the image are
illuminated more than others — we can run into some serious problems. And when that's the case, we'll
need to rely on adaptive thresholding.
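
For readers who want to see the idea behind Otsu's method spelled out, here is a minimal NumPy sketch (an optional aside; Program 19 below simply uses OpenCV's own implementation via the cv2.THRESH_OTSU flag). It sweeps every candidate threshold T and keeps the one that maximizes the between-class variance, which is equivalent to minimizing the within-class variance.

# a bare-bones Otsu threshold search over a uint8 grayscale image
import numpy as np

def otsu_threshold(gray):
    # normalized grayscale histogram (256 bins -> probabilities)
    hist = np.bincount(gray.ravel(), minlength=256).astype("float")
    prob = hist / hist.sum()
    levels = np.arange(256, dtype="float")

    best_T, best_between = 0, 0.0
    for T in range(1, 256):
        w0, w1 = prob[:T].sum(), prob[T:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:T] * prob[:T]).sum() / w0  # mean of the "dark" class
        mu1 = (levels[T:] * prob[T:]).sum() / w1  # mean of the "bright" class
        between = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
        if between > best_between:
            best_between, best_T = between, T
    return best_T

Calling otsu_threshold on the blurred grayscale coin image should return a value close to the T printed by Program 19.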

ADAPTIVE THRESHOLDING:
For simple images with controlled lighting conditions, a single value of T is not a problem.
But for situations when the lighting is non-uniform across the image, having only a single value of T
can seriously hurt our thresholding performance.
Simply put, having just one value of T may not suffice.
In order to overcome this problem, we can use adaptive thresholding, which considers small
neighborhoods of pixels and then finds an optimal threshold value T for each neighborhood.
This method allows us to handle cases where there may be dramatic ranges of pixel intensities and
the optimal value of T may change for different parts of the image.
In adaptive thresholding, sometimes called local thresholding, our goal is to statistically examine the
pixel intensity values in the neighborhood of a given pixel p.
The general assumption that underlies all adaptive and local thresholding methods is that smaller
regions of an image are more likely to have approximately uniform illumination. This implies that local
regions of an image will have similar lighting, as opposed to the image as a whole, which may have
dramatically different lighting for each region.
However, choosing the size of the pixel neighborhood for local thresholding is absolutely crucial.
The neighborhood must be large enough to cover sufficient background and foreground pixels,
otherwise the value of T will be more or less irrelevant.
But if we make our neighborhood value too large, then we completely violate the assumption that
local regions of an image will have approximately uniform illumination.
Again, if we supply a very large neighborhood, then our results will look very similar to global
thresholding using the simple thresholding or Otsu‘s methods.
In practice, tuning the neighborhood size is (usually) not that hard of a problem.


You‘ll often find that there is a broad range of neighborhood sizes that provide you with adequate
results — it‘s not like finding an optimal value of T that could make or break your thresholding output.
So as I mentioned above, our goal in adaptive thresholding is to statistically examine local regions of
our image and determine an optimal value of T for each region — which begs the question: Which
statistic do we use to compute the threshold value T for each region?
It is common practice to use either the arithmetic mean or the Gaussian mean of the pixel intensities
in each region (other methods do exist, but the arithmetic mean and the Gaussian mean are by far the
most popular).
In the arithmetic mean, each pixel in the neighborhood contributes equally to computing T.
And in the Gaussian mean, pixel values farther away from the (x, y)-coordinate center of the region
contribute less to the overall calculation of T.
The general formula to compute T is thus:
T = mean(I_L) - C
where the mean is either the arithmetic or Gaussian mean, I_L is the local sub-region of the image I, and
C is some constant which we can use to fine-tune the threshold value T.
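
A small optional sketch of this formula for a single, hypothetical 5x5 local region (adaptive thresholding simply repeats this computation for every neighborhood in the image):

# T = mean(I_L) - C for one made-up local region
import numpy as np

I_L = np.array([[200, 205, 210, 198, 202],
                [199, 201, 204, 207, 203],
                [ 40,  35,  42, 205, 206],
                [ 38,  41,  39, 208, 209],
                [ 36,  40,  37, 210, 211]], dtype="float")
C = 15

T = I_L.mean() - C            # arithmetic-mean variant of the formula
center = I_L[2, 2]            # the pixel this neighborhood is centered on
print(T)
print(255 if center > T else 0)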

Experiment 18: To study about simple thresholding technique

Program 18:

Step 1: Write the code in Text Editor

# import the necessary packages


import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, convert it to grayscale, and apply a Gaussian blur with a 7x7 kernel.
# Applying Gaussian blurring helps remove some of the high frequency edges in the image that
# we are not concerned with and allows us to obtain a cleaner segmentation.
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
cv2.imshow("Image", image)

# apply basic thresholding -- the first parameter is the image we want to threshold, the second
# value is our threshold check
# if a pixel value is greater than our threshold (in this case, T=200), we set it to be BLACK,
# otherwise it is WHITE.
# Our third argument is the output value applied during thresholding. Any pixel intensity p that is


# greater than T is set to zero and any p that is less than T is set to the output value.
# The function then returns a tuple of 2 values: the first, T, is the threshold value. In the case of
# simple thresholding, this value is trivial since we manually supplied the value of T in the first
# place. But in the case of Otsu‘s thresholding where T is dynamically computed for us, it‘s nice
# to have that value. The second returned value is the threshold image itself.
(T, threshInv) = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY_INV)
cv2.imshow("Threshold Binary Inverse", threshInv)

# using normal thresholding (rather than inverse thresholding), we can change the last
# argument in the function to make the coins black rather than white.
(T, thresh) = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY)
cv2.imshow("Threshold Binary", thresh)

# finally, we can visualize only the masked regions in the image


# we perform masking by using the cv2.bitwise_and function. We supply our original input
# image as the first two arguments, and then our inverted thresholded image as our mask.
# Remember, a mask only considers pixels in the original image where the mask is greater than
# zero.
cv2.imshow("Output", cv2.bitwise_and(image, image, mask=threshInv))
cv2.waitKey(0)

Step 2: Save the code as "simple_thresholding.py"

Step 3: Run the python script (simple_thresholding.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python simple_thresholding.py -i coins01.png
or
$ python simple_thresholding.py --image coins01.png

Inference:

Experiment 19: To study about Otsu's thresholding technique

Program 19:

Step 1: Write the code in Text Editor


# import the necessary packages


import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, convert it to grayscale, and apply a Gaussian blur with a 7x7 kernel
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
cv2.imshow("Image", image)

# apply Otsu's automatic thresholding -- Otsu's method automatically determines the best
# threshold value `T` for us
# T=0, Remember that Otsu‘s method is going to automatically compute the optimal value of T
# for us. We could technically specify any value we wanted for this argument; however, I like to
# supply a value of 0 as a type of ―don‘t care‖ parameter.
# The third argument is the output value of the threshold, provided the given pixel passes the
# threshold test.
# The last argument is one we need to pay extra special attention to. Previously, we had
# supplied values of cv2.THRESH_BINARY or cv2.THRESH_BINARY_INV depending on what
# type of thresholding we wanted to perform. But now we are passing in a second flag that is
# logically OR‘d with the previous method. Notice that this method is cv2.THRESH_OTSU,
# which obviously corresponds to Otsu‘s thresholding method.
(T, threshInv) = cv2.threshold(blurred, 0, 255,
cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
cv2.imshow("Threshold", threshInv)
print "Otsu's thresholding value: {}".format(T)

# finally, we can visualize only the masked regions in the image


cv2.imshow("Output", cv2.bitwise_and(image, image, mask=threshInv))
cv2.waitKey(0)

Step 2: Save the code as "otsu_thresholding.py"

Step 3: Run the python script (otsu_thresholding.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python otsu_thresholding.py -i coins01.png


or
$ python otsu_thresholding.py --image coins01.png

Inference:

Experiment 20: To study about Adaptive thresholding technique

Program 20:

Step 1: Write the code in Text Editor

# import the necessary packages


# computer vision + image processing library, scikit-image (http://scikit-image.org/).
# the scikit-image implementation of adaptive thresholding is preferred over the OpenCV one,
# since it is less verbose and more Pythonic than the OpenCV one
from skimage.filters import threshold_adaptive
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, convert it to grayscale, and blur it slightly


image = cv2.imread(args["image"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(image, (5, 5), 0)
cv2.imshow("Image", image)

# instead of manually specifying the threshold value, we can use adaptive thresholding to
# examine neighborhoods of pixels and adaptively threshold each neighborhood -- in this
# example, we'll calculate the mean value of a 25x25 pixel neighborhood and threshold
# based on that value; finally, our constant C is subtracted from the mean calculation (in this
# case 15)
# second parameter is the output threshold
# third argument is the adaptive thresholding method. Here we supply a value of
# cv2.ADAPTIVE_THRESH_MEAN_C to indicate that we are using the arithmetic mean of the


# local pixel neighborhood to compute our threshold value of T. We could also supply a value of
# cv2.ADAPTIVE_THRESH_GAUSSIAN_C to indicate we want to use the Gaussian average
# The fourth value to cv2.adaptiveThreshold is the threshold method, again just like in the
# Simple Thresholding and Otsu‘s Method sections. Here we pass in a value of
# cv2.THRESH_BINARY_INV to indicate that any pixel value that passes the threshold test will
# have an output value of 0. Otherwise, it will have a value of 255.
# The fifth parameter is our pixel neighborhood size. Here you can see that we‘ll be computing
# the mean grayscale pixel intensity value of each 25x25 sub-region in the image to compute
# our threshold value T.
# The final argument to cv2.adaptiveThreshold is the constant C which lets us fine tune our
# threshold value.
thresh = cv2.adaptiveThreshold(blurred, 255,
cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 25, 15)
cv2.imshow("OpenCV Mean Thresh", thresh)

# the scikit-image adaptive thresholding, it just feels a lot more "Pythonic"


# supply a value of 29 for our 29x29 pixel neighborhood we are going to inspect
# The offset parameter is equivalent to our C parameter
# The threshold_adaptive function defaults to the Gaussian mean of the local region, but we
# could also use the arithmetic mean, median, or any other custom statistic by adjusting the
# optional method argument
# The threshold_adaptive function actually returns our segmented objects as black appearing
# on a white background, so to fix this, we just take the bitwise NOT.
thresh = threshold_adaptive(blurred, 29, offset=5).astype("uint8") * 255
thresh = cv2.bitwise_not(thresh)
cv2.imshow("scikit-image Mean Thresh", thresh)
cv2.waitKey(0)

Step 2: Save the code as "adaptive_thresholding.py"

Step 3: Run the python script (adaptive_thresholding.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python adaptive_thresholding.py -i license_plate.png
or
$ python adaptive_thresholding.py --image license_plate.png

Inference:


LESSON 1.10:GRADIENTS
 We will be using gradients for detecting edges in images, which allows us to find contours and
outlines of objects in images.
 We use them as inputs for quantifying images through feature extraction — in fact, highly successful
and well-known image descriptors such as Histogram of Oriented Gradients (HoG) and Scale-Invariant
Feature Transform (SIFT) are built upon image gradient representations.
 Gradient images are even used to construct saliency maps, which highlight the subjects of an image.

OBJECTIVES
1. Define what an image gradient is.
2. Compute changes in direction of an input image.
3. Define both gradient magnitude and gradient orientation.
4. Learn how to compute gradient magnitude and gradient orientation.
5. Approximate the image gradient using Sobel and Scharr kernels.
6. Learn how to use the cv2.Sobel function to compute image gradient representations in OpenCV.

IMAGE GRADIENTS
 The main application of image gradients lies within edge detection.
 Edge detection is the process of finding edges in an image, which reveals structural information
regarding the objects in an image.
 Edges could therefore correspond to:
1. Boundaries of an object in an image.
2. Boundaries of shadowing or lighting conditions in an image.
3. Boundaries of ―parts‖ within an object
So how do we go about finding the edges in an image?
 The first step is to compute the gradient of the image. Formally, an image gradient is defined as a
directional change in image intensity. At each pixel of the input (grayscale) image, a gradient measures
the change in pixel intensity in a given direction. By estimating the direction or orientation along with the
magnitude (i.e. how strong the change in direction is), we are able to detect regions of an image that
look like edges.

 In the image above we examine the 3x3 neighborhood surrounding the central pixel.
 Our x values run from left to right, and our y values from top to bottom.
 In order to compute any changes in direction we‘ll need the north, south, east, and west pixels.
 If we denote our input image as I, then we define the north, south, east, and west pixels using the
following notation:
North: I(x,y-1) South: I(x,y+1) East: I(x+1,y) West: I(x-1,y)
 Again, these four values are critical in computing the changes in image intensity in both the x and y


direction.
 To demonstrate this, let‘s compute the vertical change or the y-change by taking the difference
between the north and south pixels:
Gy= I(x,y-1)- I(x,y+1)
 Similarly, we can compute the horizontal change or the x-change by taking the difference between
the east and west pixels:
Gx= I(x+1,y)- I(x-1,y)
 So now we have Gx and Gy, which represent the change in image intensity for the central pixel in
both the x and y direction.
 So now the big question becomes: what do we do with these values?
 To answer that, we‘ll need to define two new terms — the gradient magnitude and the gradient
orientation.
 The gradient magnitude is used to measure how strong the change in image intensity is. The
gradient magnitude is a real-valued number that quantifies the ―strength‖ of the change in intensity.
 While the gradient orientation is used to determine in which direction the change in intensity is
pointing. As the name suggests, the gradient orientation will give us an angle or Θ that we can use to
quantify the direction of the change.

 On the left we have a 3x3 region of an image where the top half of the image is white and the bottom
half of the image is black. The gradient orientation is thus equal to Θ=90°.
 And on the right we have another 3x3 neighborhood of an image, where the upper triangular region
is white and the lower triangular region is black. Here we can see the change in direction is equal to
Θ=45°.
 But how do we actually go about computing the gradient orientation and magnitude?
 3x3 neighborhood of an image:

Here we can see that the central pixel is marked in red. The next step in determining the gradient
orientation and magnitude is actually to compute the changes in gradient in both the x and y direction.
Using both Gx and Gy , we can apply some basic trigonometry to compute the gradient magnitude ,
and orientation Θ:


Inspecting this triangle you can see that the gradient magnitude is the hypotenuse of the triangle.
Therefore, all we need to do is apply the Pythagorean theorem and we‘ll end up with the gradient
magnitude:
G = sqrt(Gx² + Gy²)
The gradient orientation can then be computed from Gy and Gx using the arctangent:
θ = arctan2(Gy, Gx) × (180/π)
The arctan2 function gives us the orientation in radians, which we then convert to degrees by
multiplying by the ratio of 180/π.
Let's go ahead and manually compute G and Θ so we can see how the process is done:

In the above image we have an image where the upper-third is white and the bottom two-thirds is
black.
Using the equations for Gx and Gy, we arrive at:
Gx=0-0=0 and Gy=255-0=255
G = sqrt(0² + 255²) = 255
As for our gradient orientation: θ = arctan2(255, 0) × (180/π) = 90°
Sure enough, the gradient of the central pixel is pointing up as verified by the Θ=90°.
Another example:

In this particular image we can see that the lower-triangular region of the neighborhood is white while
the upper-triangular neighborhood is black. Computing both Gx and Gy we arrive at:
Gx=0-255=-255 and Gy=0-255=-255


G = sqrt((-255)² + (-255)²) ≈ 360.62
As for our gradient orientation: θ = arctan2(-255, -255) × (180/π) = -135°

Sure enough, our gradient is pointing down and to the left at an angle of -135°.
Of course, we have only computed our gradient orientation and magnitude for two unique pixel
values: 0 and 255.
Normally you would be computing the orientation and magnitude on a grayscale image where the
valid range of values would be [0, 255].
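
For readers who want to check the arithmetic, the two worked examples above can be verified in a few lines of NumPy:

# quick numeric check of the two worked examples
import numpy as np

# first neighborhood: north = 255, south = east = west = 0
Gx, Gy = 0.0, 255.0
print(np.sqrt(Gx ** 2 + Gy ** 2))        # 255.0
print(np.degrees(np.arctan2(Gy, Gx)))    # 90.0

# second neighborhood: Gx = Gy = -255
Gx, Gy = -255.0, -255.0
print(np.sqrt(Gx ** 2 + Gy ** 2))        # ~360.62
print(np.degrees(np.arctan2(Gy, Gx)))    # -135.0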

SOBEL AND SCHARR KERNELS


Now that we have learned how to compute gradients manually, let‘s look at how we can approximate
them using kernels, which will give us a tremendous boost in speed.
The Sobel method uses two kernels: one for detecting horizontal changes in direction and the other for
detecting vertical changes in direction:

     [ -1  0  +1 ]
Gx = [ -2  0  +2 ]
     [ -1  0  +1 ]

     [ -1  -2  -1 ]
Gy = [  0   0   0 ]
     [ +1  +2  +1 ]

Given the input image neighborhood below, let's compute the Sobel approximation to the gradient:

         [  93  139  101 ]
I(i,j) = [  26  252  196 ]
         [ 135  230   18 ]

Therefore,

Gx = (-1×93 + 0×139 + 1×101) + (-2×26 + 0×252 + 2×196) + (-1×135 + 0×230 + 1×18)
   = (-93 + 101) + (-52 + 392) + (-135 + 18) = 231

Gy = (-1×93 - 2×139 - 1×101) + (0×26 + 0×252 + 0×196) + (1×135 + 2×230 + 1×18)
   = (-93 - 278 - 101) + (135 + 460 + 18) = 141

Given these values of Gx and Gy, it is then trivial to compute the gradient magnitude G and
orientation Θ:

G = sqrt(231² + 141²) ≈ 270.63
θ = arctan2(141, 231) × (180/π) ≈ 31.4°
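
The same check can be done for the Sobel worked example with a few lines of NumPy (element-wise multiplication of the kernel with the neighborhood, followed by a sum):

# quick numeric check of the Sobel example
import numpy as np

Kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype="float")
Ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype="float")
I = np.array([[93, 139, 101], [26, 252, 196], [135, 230, 18]], dtype="float")

Gx = (Kx * I).sum()                      # 231.0
Gy = (Ky * I).sum()                      # 141.0
print(np.sqrt(Gx ** 2 + Gy ** 2))        # ~270.63
print(np.degrees(np.arctan2(Gy, Gx)))    # ~31.4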
We could also use the Scharr kernel instead of the Sobel kernel which may give us better
approximations to the gradient,


     [  -3   0   +3 ]
Gx = [ -10   0  +10 ]
     [  -3   0   +3 ]

     [ -3  -10  -3 ]
Gy = [  0    0   0 ]
     [ +3  +10  +3 ]

Experiment 21: To study about Sobel kernels

Program 21:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, convert it to grayscale, and display the original image
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

# compute gradients along the X and Y axis, respectively


# The Scharr kernel can be done in the exact same manner, only using the cv2.Scharr function
gX = cv2.Sobel(gray, ddepth=cv2.CV_64F, dx=1, dy=0)
gY = cv2.Sobel(gray, ddepth=cv2.CV_64F, dx=0, dy=1)

# the `gX` and `gY` images are now of the floating point data type, so we need to take care to
# convert them back to an unsigned 8-bit integer representation so other OpenCV functions can
# utilize them
gX = cv2.convertScaleAbs(gX)
gY = cv2.convertScaleAbs(gY)

# combine the sobel X and Y representations into a single image, weighting each gradient
# representation equally.
sobelCombined = cv2.addWeighted(gX, 0.5, gY, 0.5, 0)


# show our output images


cv2.imshow("Sobel X", gX)
cv2.imshow("Sobel Y", gY)
cv2.imshow("Sobel Combined", sobelCombined)
cv2.waitKey(0)

Step 2: Save the code as "sobel.py"

Step 3: Run the python script (sobel.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python sobel.py -i bricks.png
or
$ python sobel.py --image bricks.png

Inference:

Experiment 22: Gradient orientation and magnitude in OpenCV

The end goal of this program will be to
(1) compute the gradient orientation and magnitude, and then
(2) only display the pixels in the image that fall within the range minΘ<=Θ<=maxΘ.

Program 22:

Step 1: Write the code in Text Editor


# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments


# Our script will require three command line arguments. The first is the --image, which is the
# path to where our image resides on disk. The second is the --lower-angle, or the smallest
# gradient orientation angle we are interested in detecting. Similarly, we define the final
# argument as --upper-angle, which is the largest gradient orientation angle that we want to
# detect. We default these min and max angles to 175° and 180° respectively, but you can
# change them to whatever you like when executing the script.
ap = argparse.ArgumentParser()


ap.add_argument("-i", "--image", required=True, help="Path to the image")


ap.add_argument("-l", "--lower-angle", type=float, default=175.0,help="Lower orientation angle")
ap.add_argument("-u", "--upper-angle", type=float,default=180.0,help="Upper orientation angle")
args = vars(ap.parse_args())

# load the image, convert it to grayscale, and display the original image
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

# compute gradients along the X and Y axis, respectively.


# However, unlike the previous section, we are not going to display the gradient images to our
# screen, thus we do not have to convert them back into the range [0, 255] or use the
# cv2.addWeighted function to combine them together.
gX = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
gY = cv2.Sobel(gray, cv2.CV_64F, 0, 1)

# compute the gradient magnitude and orientation, respectively


mag = np.sqrt((gX ** 2) + (gY ** 2))
orientation = np.arctan2(gY, gX) * (180 / np.pi) % 180

# find all pixels that are within the upper and low angle boundaries
# following lines handles selecting image coordinates that are greater than the lower angle
# minimum. The first argument to the np.where function is the condition that we want to test
# again, we are looking for indexes that are greater than the minimum supplied angle. The
# second argument is the array that we want to check — this is obviously our orientations array.
# And the final argument that we supply is the value if the check does not pass. In the case that
# the orientation is less than the minimum angle requirement, we‘ll set that particular value to -1.
idxs = np.where(orientation >= args["lower_angle"], orientation, -1)

# The second argument is the idxs list returned by previous line since we are looking for
# orientations that pass both the upper and lower orientation test.
# The idxs now contains the coordinates of all orientations that are greater than the minimum
# angle and less than the maximum angle. Using this list, we construct a mask, all coordinates
# that have a corresponding value of > -1 are set to 255 (i.e. foreground). Otherwise, they are
# left as 0 (i.e. background).
idxs = np.where(orientation <= args["upper_angle"], idxs, -1)
mask = np.zeros(gray.shape, dtype="uint8")
mask[idxs > -1] = 255

# show the images


cv2.imshow("Mask", mask)
cv2.waitKey(0)


Step 2: Save the code as "mag_orientation.py"

Step 3: Run the python script (mag_orientation.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python mag_orientation.py -i coins02.png
or
$ python mag_orientation.py --image coins02.png

Inference:


LESSON 1.11: EDGE DETECTION


The Canny edge detector is arguably the most well known and the most used edge detector in all of
computer vision and image processing.

OBJECTIVES:
1. What the Canny edge detector is and how it is used.
2. The basic steps of the Canny edge detector.
3. How to use the cv2.Canny function to detect edges in images.
4. How to extend the Canny edge detector to create the auto_canny, a zero parameter edge detector.

EDGE DETECTION-CANNY EDGE DETECTOR:


As we discovered in the previous lesson, the gradient magnitude and orientation allow us to reveal
the structure of objects in an image.
However, for the process of edge detection, the gradient magnitude is extremely sensitive to noise.
Hence, we‘ll have to use the image gradients as building blocks to create a more robust method to
detect edges — the Canny edge detector.
The Canny edge detector is a multi-step algorithm used to detect a wide range of edges in images.
The algorithm itself was introduced by John F. Canny in 1986.
More formally, an edge is defined as a discontinuity in pixel intensity, or more simply, a sharp
difference or change in pixel values.
Types of Edges:
1. Step Edge: A step edge forms when there is an abrupt change in pixel intensity from one side
of the discontinuity to the other. These types of edges tend to be easy to detect.
2. Ramp Edge: A ramp edge is like a step edge, only the change in pixel intensity is not
instantaneous. Instead, the change in pixel value occurs over a short, but finite, distance.
3. Ridge Edge: A ridge edge is similar to two ramp edges bumped right up against one another.
Think of it as driving up and down a large hill or mountain. In the context of
edge detection, a ridge edge occurs when the image intensity abruptly changes, but then returns to
the initial value after a short distance.
4. Roof Edge: Unlike the ridge edge, where there is a short, finite plateau at the top of the edge,
the roof edge has no such plateau. Instead, we slowly ramp up on either side of the edge, reach a
single pinnacle at the top, and then fall back down the other side.
 Steps involved in Canny Edge Detection Algorithm:
1. Applying Gaussian smoothing to the image to help reduce noise.
2. Computing the Gx and Gy image gradients using the Sobel kernel.
3. Applying non-maxima suppression to keep only the local maxima of gradient magnitude pixels that
are pointing in the direction of the gradient.
4. Defining and applying the T_upper and T_lower thresholds for hysteresis thresholding.

Step 1: Gaussian smoothing


Smoothing an image allows us to ignore much of the detail and instead focus on the actual structure.
This also makes sense in the context of edge detection — we are not interested in the actual detail of
the image.


Instead, we want to apply edge detection to find the structure and outline of the objects in the image
so we can further process them.

Step 2: Gradient orientation and magnitude


We can compute the gradient orientation and magnitude.
However, as we have seen, the gradient magnitude is quite susceptible to noise and does not make
for the best edge detector. We need to add two more steps on to the process to extract better edges.

Step 3: Non-maxima Suppression


It‘s simply an edge thinning process.
After computing our gradient magnitude representation, the edges themselves are still quite noisy and
blurred, but in reality there should only be one edge response for a given region, not a whole clump of
pixels reporting themselves as edges.
To remedy this, we can apply edge thinning using non-maxima suppression.
To apply non-maxima suppression we need to examine the gradient magnitude G and orientation Θ
at each pixel in the image and,
1. Compare the current pixel to the 3x3 neighborhood surrounding it.
2. Determine in which direction the orientation is pointing:
1. If it‘s pointing towards the north or south, then examine the north and south
magnitude.
2. If the orientation is pointing towards the east or west, then examine the east and west
pixels.
3. If the center pixel magnitude is greater than both the pixels it is being compared to, then
preserve the magnitude. Otherwise, discard it.
Some implementations of the Canny edge detector round the value of Θ to either 0°, 45°, 90° or 135°,
and then use the rounded angle to compare not only the north, south, east, and west pixels, but also
the corner top-left, top-right, bottom-right, and bottom-left pixels as well.
Example 1: But let's keep things simple and view an example of applying non-maxima suppression
for an angle of Θ = 90°.
- Given that our gradient orientation is pointing north, we need to examine both the north and
  south pixels.
- The central pixel value of 93 is greater than the south value of 26, so we'll discard the 26.
- However, examining the north pixel we see that the value is 162; we'll keep this value of 162
  and suppress (i.e. set to 0) the value of 93, since 93 < 162.
Example 2: Applying non-maxima suppression for when Θ = 180°.
- Notice how the central pixel is less than both the east and west pixels.
- According to our non-maxima suppression rules above (rule #3), we need to discard the pixel
  value of 93 and keep the east and west values of 104 and 139, respectively.
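The logic above can be summarized in a short sketch. This is only an illustrative helper (it is not how OpenCV implements Canny internally), and it assumes mag and orientation are 2D NumPy arrays holding the gradient magnitude and the orientation in degrees, in the range [0, 180):

# a minimal illustrative sketch of non-maxima suppression
import numpy as np

def non_max_suppression(mag, orientation):
    output = np.zeros_like(mag)
    for y in range(1, mag.shape[0] - 1):
        for x in range(1, mag.shape[1] - 1):
            angle = orientation[y, x]
            # orientation near 90 degrees points north/south: compare the north and south pixels
            if 45 <= angle < 135:
                neighbors = (mag[y - 1, x], mag[y + 1, x])
            # otherwise the orientation points east/west: compare the east and west pixels
            else:
                neighbors = (mag[y, x - 1], mag[y, x + 1])
            # keep the center pixel only if it is the local maximum along the gradient direction
            if mag[y, x] >= max(neighbors):
                output[y, x] = mag[y, x]
    return output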


Step 4: Hysteresis thresholding


Even after applying non-maxima suppression, we may need to remove regions of an image that are
not technically edges, but still responded as edges after computing the gradient magnitude and
applying non-maximum suppression.
To ignore these regions of an image, we need to define two thresholds: T_upper and T_lower.
Any gradient value G > T_upper is sure to be an edge.
Any gradient value G < T_lower is definitely not an edge, so we immediately discard these regions.
And any gradient value that falls into the range T_lower < G < T_upper needs to undergo additional tests:
1. If the particular gradient value is connected to a strong edge (i.e. G > T_upper), then mark the
pixel as an edge.
2. If the gradient pixel is not connected to a strong edge, then discard it.
Hysteresis thresholding is actually better explained visually:
- At the top of the graph we can see that A is a sure edge, since A > T_upper.
- B is also an edge, even though B < T_upper, since it is connected to a sure edge, A.
- C is not an edge since C < T_upper and it is not connected to a strong edge.
- Finally, D is not an edge since D < T_lower, so it is automatically discarded.
If the threshold range is too wide, then we will get many false edges instead of being able to find just
the structure and outline of an object in an image.
Similarly, if the threshold range is too tight, we will not find many edges at all and could be at risk of
missing the structure/outline of the object entirely.
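The hysteresis rules above can also be restated as a small illustrative helper. This is not an OpenCV function; the connected_to_strong_edge flag is assumed to come from a separate connectivity check:

# a purely illustrative sketch of the hysteresis decision for a single gradient value G
def hysteresis_decision(G, T_lower, T_upper, connected_to_strong_edge):
    if G > T_upper:
        return True          # sure edge
    if G < T_lower:
        return False         # definitely not an edge
    # G falls between the thresholds: keep it only if it touches a strong edge
    return connected_to_strong_edge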

Experiment 23: Canny Edge Detection using OpenCV.

Program 23:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, convert it to grayscale, and blur it slightly


# While Canny edge detection can be applied to an RGB image by detecting edges in each of
# the separate Red, Green, and Blue channels separately and combining the results back


# together, we almost always want to apply edge detection to a single channel, grayscale image
# this ensures that there will be less noise during the edge detection process.
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# show the original and blurred images


cv2.imshow("Original", image)
cv2.imshow("Blurred", blurred)

# compute a "wide", "mid-range", and "tight" threshold for the edges.


# supply the Tlower and Tupper thresholds, respectively
wide = cv2.Canny(blurred, 10, 200)
mid = cv2.Canny(blurred, 30, 150)
tight = cv2.Canny(blurred, 240, 250)

# show the edge maps


cv2.imshow("Wide Edge Map", wide)
cv2.imshow("Mid Edge Map", mid)
cv2.imshow("Tight Edge Map", tight)
cv2.waitKey(0)

Step 2: Save the code as "canny.py"

Step 3: Run the python script (canny.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python canny.py -i coins01.png
or
$ python canny.py --image coins01.png

Inference:


AUTOMATICALLY TUNING EDGE DETECTION PARAMETERS


As we saw in the section above, the Canny edge detector requires two parameters: an upper and
lower threshold used during the hysteresis step.
The problem becomes determining these lower and upper thresholds.
What are the optimal values for the thresholds?
This question is especially important when you are processing multiple images with different contents
captured under varying lighting conditions.
The actual auto_canny function is already defined for us inside my imutils library.

Library: imutils.py

# import the necessary packages (needed for np.median and cv2.Canny below)
import numpy as np
import cv2

def auto_canny(image, sigma=0.33):
    # compute the median of the single channel pixel intensities.
    # An optional argument, sigma, can be used to vary the percentage thresholds that are
    # determined based on simple statistics.
    # Unlike the mean, the median is less sensitive to outlier pixel values inside the image,
    # thus making it a more stable and reliable statistic for automatically tuning threshold
    # values.
    v = np.median(image)

    # apply automatic Canny edge detection using the computed median.
    # We take this median value and construct two thresholds, lower and upper. These
    # thresholds are constructed based on the +/- percentages controlled by sigma.
    # A lower value of sigma indicates a tighter threshold, whereas a larger value of sigma
    # gives a wider threshold. In general, you will not have to change this sigma value often.
    # Simply select a single, default sigma value and apply it to your entire dataset of images.
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    edged = cv2.Canny(image, lower, upper)

    # return the edged image
    return edged
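A minimal usage sketch, assuming imutils has been imported and blurred is a blurred grayscale image as in the experiment below; the alternative sigma value shown here is only an illustration:

# a minimal usage sketch of auto_canny with the default and a tighter sigma
auto_default = imutils.auto_canny(blurred)                # sigma defaults to 0.33
auto_tight = imutils.auto_canny(blurred, sigma=0.10)      # tighter band around the median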

Experiment 24: Canny Edge Detection using imutils.py library.

Program 24:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import imutils
import cv2


# construct the argument parse and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to the image")
args = vars(ap.parse_args())

# load the image, convert it to grayscale, and blur it slightly to remove high frequency noise
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (3, 3), 0)

# apply Canny edge detection using a wide threshold, tight threshold, and automatically
# determined threshold
wide = cv2.Canny(blurred, 10, 200)
tight = cv2.Canny(blurred, 225, 250)
auto = imutils.auto_canny(blurred)

# show the images


cv2.imshow("Original", image)
cv2.imshow("Wide", wide)
cv2.imshow("Tight", tight)
cv2.imshow("Auto", auto)
cv2.waitKey(0)

Step 2: Save the code as "auto_canny.py"

Step 3: Run the python script (auto_canny.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python auto_canny.py -i teacup.jpg
or
$ python auto_canny.py --image teacup.jpg

Inference:


LESSON 1.12: CONTOURS


So up until this point, we have been able to apply methods like thresholding and edge detection to
detect the outlines and structures of objects in images.
However, now that we have the outlines and structures of the objects in images, the big question is:
How do we find and access these outlines?
The answer is contours.

OBJECTIVES:
1. Find and detect the contours of objects in images.
2. Extract objects from images using contours, masks and cropping.

FINDING AND DRAWING CONTOURS


 Contours are simply the outlines of an object in an image.
 If the image is simple enough, we might be able to get away with using the grayscale image as an
input.
 But for more complicated images, we must first find the object by using methods such as edge
detection or thresholding — we are simply seeking a binary image where white pixels correspond to
objects in an image and black pixels as the background. There are many ways to obtain a binary image
like this, but the most used methods are edge detection and thresholding.
For better accuracy you‘ll normally want to utilize a binary image rather than a grayscale image.
Once we have this binary or grayscale image, we need to find the outlines of the objects in the image.
This is actually a lot easier than it sounds thanks to the cv2.findContours function.
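As a minimal sketch (using the basic_shapes.png image from this lesson), one common way to obtain such a binary image is simple thresholding; Otsu's method is used here only as an example, and edge detection or a fixed threshold value would also work:

# a minimal sketch: producing a binary image before passing it to cv2.findContours
import cv2
image = cv2.imread("images/basic_shapes.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]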

Experiment 25: Finding and drawing contours in OpenCV.

Program 25:

Step 1: Write the code in Text Editor


# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and convert it to grayscale


image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# show the original image


cv2.imshow("Original", image)

# find all contours in the image and draw ALL contours on the image
#The cv2.findContours function is destructive to the input image (meaning that it manipulates it)
# so if you intend on using your input image again, be sure to clone it using the copy() method
# prior to passing it into cv2.findContours.
# We‘ll instruct cv2.findContours to return a list of all contours in the image by passing in the
# cv2.RETR_LIST flag.
# This flag will ensure that all contours are returned. Other methods exist, such as returning only
# the external most contours, which we‘ll explore later.
# Finally, we pass in the cv2.CHAIN_APPROX_SIMPLE flag. If we did not specify this flag and
# instead used cv2.CHAIN_APPROX_NONE, we would be storing every single (x, y)-coordinate
# along the contour. In general, this is not advisable. It's substantially slower and takes up
# significantly more memory. By compressing our horizontal, vertical, and diagonal segments
# into only end-points we are able to reduce memory consumption significantly without any
# substantial loss in contour accuracy.
# Finally, the cv2.findContours function returns a tuple of 2 values.
# The first value is the contours themselves. These contours are simply the boundary points of
# the outline along the object.
# The second value is the hierarchy of the contours, which contains information on the topology
# of the contours. Often we are only interested in the contours themselves and not their actual
# hierarchy (i.e. one contour being contained in another) so this second value is usually ignored.
# We then draw our found contours. The first argument we pass in is the image we want to draw
# the contours on. The second parameter is our list of contours we found using the
# cv2.findContours function.
# The third parameter is the index of the contour inside the cnts list that we want to draw.
# If we wanted to draw only the first contour, we could pass in a value of 0. If we wanted to draw
# only the second contour, we would supply a value of 1. Passing in a value of -1 for this
# argument instructs the cv2.drawContours function to draw all contours in the list.
# Finally, the last two arguments to the cv2.drawContours function is the color of the contour
# (green), and the thickness of the contour line (2 pixels).
(cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
clone = image.copy()
cv2.drawContours(clone, cnts, -1, (0, 255, 0), 2)
print "Found {} contours".format(len(cnts))

# show the output image


cv2.imshow("All Contours", clone)
cv2.waitKey(0)

# it‘s important to explore how to access each individual contour


# re-clone the image and close all open windows
clone = image.copy()


cv2.destroyAllWindows()

# loop over the contours individually and draw each of them


# By using the built-in Python enumerate function we are also able to get the index of each
# contour along with the contour itself.
# Notice that, a value of -1 for contour index value (indicating that I want to draw all contours)
# and then wrapping the contour c as a list.
# In general, if you want to draw only a single contour, I would get in the habit of always
# supplying a value of -1 for contour index and then wrapping your single contour c as a list.
for (i, c) in enumerate(cnts):
print "Drawing contour #{}".format(i + 1)
cv2.drawContours(clone, [c], -1, (0, 255, 0), 2)
cv2.imshow("Single Contour", clone)
cv2.waitKey(0)

# find only external contours and ignore the ovular region inside the orange rectangle.
# re-clone the image and close all open windows
clone = image.copy()
cv2.destroyAllWindows()

# find contours in the image, but this time, keep only the EXTERNAL contours in the image.
# Specifying cv2.RETR_EXTERNAL flag instructs OpenCV to return only the external most
# contours of each shape in the image, meaning that if one shape is enclosed in another, then
# the contour is ignored.
(cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(clone, cnts, -1, (0, 255, 0), 2)
print "Found {} EXTERNAL contours".format(len(cnts))

# show the output image


cv2.imshow("All Contours", clone)
cv2.waitKey(0)

# using both contours and masks together.


# what if we wanted to access just the blue rectangle and ignore all other shapes?
# How would we do that?
# The answer is that we loop over the contours individually, draw a mask for the contour, and
# then apply a bitwise AND.
# re-clone the image and close all open windows
clone = image.copy()
cv2.destroyAllWindows()


# loop over the contours individually


for c in cnts:
# construct a mask by drawing only the current contour
# create an empty NumPy array with the same dimensions of our original image.
# empty NumPy array will serve as the mask for the current shape that to be examined
# draw the contour on the mask. Notice how I only supplied a value of 255 (white) for the
# color here — but isn‘t this incorrect? Isn‘t white represented as (255, 255, 255)?
# White is represented by (255, 255, 255), but only if we are working with a RGB image.
# In this case we are working with a mask that has only a single (grayscale) channel
# thus only need to supply a value of 255 to get white.
mask = np.zeros(gray.shape, dtype="uint8")
cv2.drawContours(mask, [c], -1, 255, -1)

# show the images


# A bitwise AND is true only if both input pixels are greater than zero.
cv2.imshow("Image", image)
cv2.imshow("Mask", mask)
cv2.imshow("Image + Mask", cv2.bitwise_and(image, image, mask=mask))
cv2.waitKey(0)

Step 2: Save the code as "finding_and_drawing_contours.py"

Step 3: Run the python script (finding_and_drawing_contours.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python finding_and_drawing_contours.py -i images/basic_shapes.png
or
$ python finding_and_drawing_contours.py --image images/basic_shapes.png

Inference:


SIMPLE CONTOUR PROPERTIES

OBJECTIVES:
You should be able to compute various properties of objects using contours, including:
1. Centroid/Center of Mass
2. Area and Perimeter
3. Bounding boxes and Rotated Bounding Boxes
4. Minimum enclosing circles
5. Fitting an ellipse

CONTOUR PROPERTIES:

1. CENTROID/CENTER OF MASS:
The "centroid" or "center of mass" is the center (x, y)-coordinate of an object in an image.
This (x, y)-coordinate is calculated from the image moments, which are weighted averages of the
(x, y)-coordinates/pixel intensities along the contour.
Moments allow us to use basic statistics to represent the structure and shape of an object in an
image.
The centroid calculation itself is very straightforward: it is simply the mean (i.e. average)
position of all (x, y)-coordinates along the contour of the shape.
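In terms of the image moments returned later by cv2.moments, the centroid works out to:

    cX = M10 / M00
    cY = M01 / M00

where M00 is the zeroth moment (the contour area) and M10, M01 are the first-order moments in x and y.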

2. AREA AND PERIMETER:


The area of the contour is the number of pixels that reside inside the contour outline.
Similarly, the perimeter (sometimes called arc length) is the length of the contour.

3. BOUNDING BOXES AND ROTATED BOUNDING BOXES:


A bounding box is an upright rectangle that "bounds" and "contains" the entire contoured
region of the image.
However, it does not consider the rotation of the shape.
A bounding box consists of four components: the starting x-coordinate of the box, then the starting y-
coordinate of the box, followed by the width and height of the box.
Computing the rotated bounding box requires two OpenCV functions: cv2.minAreaRect and
cv2.cv.BoxPoints.
In general, you‘ll want to use standard bounding boxes when you want to crop a shape from an image
And you‘ll want to use rotated bounding boxes when you are utilizing masks to extract regions from
an image.

4. MINIMUM ENCLOSING CIRCLES:


Just as we can fit a rectangle to a contour, we can also fit a circle

5. FITTING AN ELLIPSE:
Fitting an ellipse to a contour is much like fitting a rotated rectangle to a contour.


Under the hood, OpenCV is computing the rotated rectangle of the contour. And then it‘s taking the
rotated rectangle and computing an ellipse to fit in the rotated region.

Experiment 26: Contour properties in OpenCV.

Program 26:

Step 1: Write the code in Text Editor


# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and convert it to grayscale


image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# find external contours in the image


(cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
clone = image.copy()

# loop over the contours


for c in cnts:
# compute the moments of the contour which can be used to compute the
# centroid or "center of mass" of the region
# Using the cv2.moments function we are able to compute the center (x, y)-coordinate of
# the shape the contour represents.
# This function returns a dictionary of moments with the keys of the dictionary as the
# moment number and the values as the of the actual moment
M = cv2.moments(c)
cX = int(M["m10"] / M["m00"])
cY = int(M["m01"] / M["m00"])

# draw the center of the contour on the image


cv2.circle(clone, (cX, cY), 10, (0, 255, 0), -1)

# show the output image


cv2.imshow("Centroids", clone)
cv2.waitKey(0)
clone = image.copy()

# loop over the contours again


for (i, c) in enumerate(cnts):
# compute the area and the perimeter of the contour
area = cv2.contourArea(c)

# True flag indicates whether or not the contour is ―closed‖.


# A contour is considered closed if the shape outline is continuous and there are no
# ―holes‖ along the outline. In most cases, you‘ll be setting this flag to True, indicating
# that your contour has no gaps.
perimeter = cv2.arcLength(c, True)
print "Contour #%d -- area: %.2f, perimeter: %.2f" % (i + 1, area, perimeter)

# draw the contour on the image


cv2.drawContours(clone, [c], -1, (0, 255, 0), 2)

# compute the center of the contour and draw the contour number
M = cv2.moments(c)
cX = int(M["m10"] / M["m00"])
cY = int(M["m01"] / M["m00"])
cv2.putText(clone, "#%d" % (i + 1), (cX - 20, cY), cv2.FONT_HERSHEY_SIMPLEX,
1.25, (255, 255, 255), 4)

# show the output image


cv2.imshow("Contours", clone)
cv2.waitKey(0)

# clone the original image


clone = image.copy()

# loop over the contours


for c in cnts:
# fit a bounding box to the contour
(x, y, w, h) = cv2.boundingRect(c)
cv2.rectangle(clone, (x, y), (x + w, y + h), (0, 255, 0), 2)

# show the output image


cv2.imshow("Bounding Boxes", clone)
cv2.waitKey(0)
clone = image.copy()


# loop over the contours


for c in cnts:
# fit a rotated bounding box to the contour and draw a rotated bounding box
# The cv2.minAreaRect function takes our contour and returns a tuple with 3 values.
# The first value of the tuple is the starting (x, y)-coordinates of the rotated bounding
# box. The second value is the width and height of the bounding box. And the final value
# is our Θ, or angle of rotation of the shape.
# pass the output of cv2.minAreaRect to the cv2.cv.BoxPoints function which converts
# the (x, y)-coordinates, width and height, and angle of rotation into a set of coordinates
# points.
box = cv2.minAreaRect(c)
box = np.int0(cv2.cv.BoxPoints(box))
cv2.drawContours(clone, [box], -1, (0, 255, 0), 2)

# show the output image


cv2.imshow("Rotated Bounding Boxes", clone)
cv2.waitKey(0)
clone = image.copy()

# loop over the contours


for c in cnts:
# fit a minimum enclosing circle to the contour
# returns the (x, y)-coordinates of the center of circle along with the radius of the circle.
((x, y), radius) = cv2.minEnclosingCircle(c)
cv2.circle(clone, (int(x), int(y)), int(radius), (0, 255, 0), 2)

# show the output image


cv2.imshow("Min-Enclosing Circles", clone)
cv2.waitKey(0)
clone = image.copy()

# loop over the contours


for c in cnts:
# to fit an ellipse, our contour must have at least 5 points
# if a contour has less than 5 points, then an ellipse cannot be fit to the rotated rectangle
# region.
if len(c) >= 5:
# fit an ellipse to the contour
ellipse = cv2.fitEllipse(c)
cv2.ellipse(clone, ellipse, (0, 255, 0), 2)

# show the output image


cv2.imshow("Ellipses", clone)
cv2.waitKey(0)

Step 2: Save the code as " contour_properties.py"

Step 3: Run the python script (contour_properties.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python contour_properties.py -i images/more_shapes.png
or
$ python contour_properties.py --image images/more_shapes.png

Inference:

ADVANCED CONTOUR PROPERTIES


Advanced contour properties allow us to discriminate between and recognize various shapes in
images.
Advanced contour properties: aspect ratio, extent, convex hull, and solidity.

OBJECTIVES:
We are going to build on our simple contour properties and expand them to more advanced contour
properties, including:
1. Aspect ratio
2. Extent
3. Convex hull
4. Solidity

1. ASPECT RATIO:
The aspect ratio is simply the ratio of the width of a shape's bounding box to its height:
aspect ratio = bounding box width / bounding box height
Shapes with an aspect ratio < 1 have a height that is greater than the width; these shapes will
appear to be more "tall" and elongated. For example, most digits and characters on a license plate
have an aspect ratio that is less than 1 (since most characters on a license plate are taller than they are
wide).


And shapes with an aspect ratio > 1 have a width that is greater than the height. The license plate
itself is an example of an object that will have an aspect ratio greater than 1, since the width of a
physical license plate is always greater than its height.
Finally, shapes with an aspect ratio = 1 (plus or minus some ϵ of course), have approximately the
same width and height. Squares and circles are examples of shapes that will have an aspect ratio of
approximately 1.

2. EXTENT:
The extent of a shape or contour is the ratio of the contour area to the bounding box area:
extent = shape area / bounding box area
Recall that the area of an actual shape is simply the number of pixels inside the contoured region.
On the other hand, the rectangular area of the contour is determined by its bounding box, therefore:
bounding box area = bounding box width x bounding box height
In all cases the extent will be less than 1, because the number of pixels inside the contour
cannot possibly be larger than the number of pixels in the bounding box of the shape.

3. CONVEX HULL:
A convex hull is almost like a mathematical rubber band.
More formally, given a set of points X in Euclidean space, the convex hull is the smallest possible
convex set that contains all of the points in X.
In the example image below, we can see the rubber band effect of the convex hull in action:

On the left we have our original shape. And in the center we have the convex hull of original shape.
Notice how the rubber band has been stretched to around all extreme points of the shape, but leaving
no extra space along the contour — thus the convex hull is the minimum enclosing polygon of all points
of the input shape, which can be seen on the right.
Another important aspect of the convex hull that we should discuss is convexity.
Convex curves are curves that appear to be "bulged out". If a section of a curve is not bulged out,
we call that region a convexity defect.
The gray outline of the hand in the image above is our original shape.
The red line is the convex hull of the hand. And the black arrows, such as those in between the
fingers, mark where the convex hull is "bulged in" rather than "bulged out".
Whenever a region is "bulged in", such as in between the fingers in the hand image above, we call it a convexity defect.

4. SOLIDITY:
The solidity of a shape is the area of the contour area divided by the area of the convex hull:
solidity = contour area / convex hull area
Again, it‘s not possible to have a solidity value greater than 1.
The number of pixels inside a shape cannot possibly outnumber the number of pixels in the convex
hull, because by definition, the convex hull is the smallest possible set of pixels enclosing the shape.
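As a minimal sketch, assuming c is a single contour returned by cv2.findContours, the four properties described above can be computed as follows (the same computations appear in the case studies below):

# a minimal sketch: aspect ratio, extent, and solidity for one contour `c`
area = cv2.contourArea(c)                       # number of pixels inside the contour
(x, y, w, h) = cv2.boundingRect(c)              # upright bounding box
aspectRatio = w / float(h)                      # aspect ratio = width / height
extent = area / float(w * h)                    # extent = contour area / bounding box area
hull = cv2.convexHull(c)                        # convex hull of the contour
solidity = area / float(cv2.contourArea(hull))  # solidity = contour area / hull area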

How do we put these contour properties to work for us?


Case study 1: Distinguishing between X's and O's
Case study 2: Identifying Tetris blocks

Case study 1: Distinguishing between X's and O's


Write a Python script that leverages computer vision and contour properties to recognize the X‘s and
O‘s on the board. Using this script, you could then take the output and feed it into a tic-tac-toe solver to
give you the optimal set of steps to play the game.
Let‘s get started by recognizing the X‘s and O‘s on a tic-tac-toe board.
Tic-tac-toe is a two player game.
One player is the “X‖ and the other player is the ―O‖.
Players alternate turns placing their respective X‘s and O‘s on the board,
with the goal of getting three of their symbols in a row, either horizontally,
vertically, or diagonally.
It‘s very simple game to play, common among young children who are first
learning about competitive games.
Interestingly, tic-tac-toe is a solvable game.
When played optimally, you are guaranteed at best to win, and at worst to draw (i.e. tie).
Case study Program 1:

Step 1: Write the code in Text Editor


# import the necessary packages
import cv2

# load the tic-tac-toe image and convert it to grayscale


image = cv2.imread("images/tictactoe.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# find all contours on the tic-tac-toe board


(cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
# loop over the contours


for (i, c) in enumerate(cnts):


# compute the area of the contour along with the bounding box to compute the aspect
# ratio
# The cv2.contourArea is not giving us the area=width x height area of the contour.
# Instead, it‘s giving us the number of pixels that reside inside the contour
area = cv2.contourArea(c)
(x, y, w, h) = cv2.boundingRect(c)

# compute the convex hull of the contour, then use the area of the original contour and
# the area of the convex hull to compute the solidity
hull = cv2.convexHull(c)
hullArea = cv2.contourArea(hull)
solidity = area / float(hullArea)

# initializing char variable to indicate the character that we are looking at — in this case, we
# initialize it to be a ? indicating that the character is unknown.
char = "?"

# The letter X has four large and obvious convexity defects — one for each of the four V‘s that
# form the X. On the other hand, the O has nearly no convexity defects, and the ones that it has
# are substantially less dramatic than the letter X. Therefore, the letter O is going to have a
# larger solidity than the letter X.
# if the solidity is high, then we are examining an `O`
if solidity > 0.9:
char = "O"

# otherwise, if the solidity it still reasonably high, we are examining an `X`


elif solidity > 0.5:
char = "X"

# if the character is not unknown, draw it


if char != "?":
cv2.drawContours(image, [c], -1, (0, 255, 0), 3)
cv2.putText(image, char, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.25,
(0, 255, 0), 4)

# show the contour properties


print "%s (Contour #%d) -- solidity=%.2f" % (char, i + 1, solidity)

# show the output image


cv2.imshow("Output", image)
cv2.waitKey(0)


Step 2: Save the code as "tictactoe.py"

Step 3: Run the python script (tictactoe.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python tictactoe.py

Inference:

Case study 2: Identifying Tetris Blocks


Using aspect ratio, extent, convex hull, and solidity in conjunction with each other to perform our brick
identification.

The aqua piece is known as a Rectangle.


The blue and orange blocks are called L-pieces.
The yellow shape is obviously a Square.
And the green and red bricks on the bottom are called Z-pieces.
Our goal here is to extract contours from each of these shapes and then identify which shape each of
the blocks are.

Case study Program 2:

Step 1: Write the code in Text Editor


# import the necessary packages
import numpy as np
import cv2

# load the Tetris block image, convert it to grayscale, and threshold the image


# to create a binary image, where the background pixels are black and the foreground pixels
# (i.e. the Tetris blocks) are white.
image = cv2.imread("images/tetris_blocks.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 225, 255, cv2.THRESH_BINARY_INV)[1]

# show the original and thresholded images


cv2.imshow("Original", image)
cv2.imshow("Thresh", thresh)

# find external contours in the thresholded image and allocate a NumPy array with the same
# shape as our input image
(cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
hullImage = np.zeros(gray.shape[:2], dtype="uint8")

# loop over the contours


for (i, c) in enumerate(cnts):

# compute the area of the contour along with the bounding box to compute the aspect
# ratio
area = cv2.contourArea(c)
(x, y, w, h) = cv2.boundingRect(c)

# compute the aspect ratio of the contour, which is simply the width divided by the height
# of the bounding box
# the aspect ratio of a shape will be < 1 if the height is greater than the width. The
# aspect ratio will be > 1 if the width is larger than the height. And the aspect ratio will
# be approximately 1 if the width and height are equal.
# used for discriminating the square and rectangle pieces
aspectRatio = w / float(h)

# use the area of the contour and the bounding box area to compute the extent
extent = area / float(w * h)

# compute the convex hull of the contour, then use the area of the original contour and
# the area of the convex hull to compute the solidity
hull = cv2.convexHull(c)
hullArea = cv2.contourArea(hull)
solidity = area / float(hullArea)

# visualize the original contours and the convex hull and initialize the name of the shape
cv2.drawContours(hullImage, [hull], -1, 255, -1)


cv2.drawContours(image, [c], -1, (240, 0, 159), 3)


shape = ""

# Now that we have computed all of our contour properties, let‘s define the actual rules
# and if statements that will allow us to discriminate between the various if Tetris blocks:
# if the aspect ratio is approximately one, then the shape is a square
if aspectRatio >= 0.98 and aspectRatio <= 1.02:
shape = "SQUARE"

# if the width is 3x longer than the height, then we have a rectangle


elif aspectRatio >= 3.0:
shape = "RECTANGLE"

# if the extent is sufficiently small, then we have a L-piece


elif extent < 0.65:
shape = "L-PIECE"

# if the solidity is sufficiently large enough, then we have a Z-piece


elif solidity > 0.80:
shape = "Z-PIECE"

# draw the shape name on the image


cv2.putText(image, shape, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,(240, 0, 159), 2)

# show the contour properties


print "Contour #%d -- aspect_ratio=%.2f, extent=%.2f, solidity=%.2f" % (
i + 1, aspectRatio, extent, solidity)

# show the output images


cv2.imshow("Convex Hull", hullImage)
cv2.imshow("Image", image)
cv2.waitKey(0)

Step 2: Save the code as "contour_properties_2.py"

Step 3: Run the python script (contour_properties_2.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python contour_properties_2.py


Inference:

CONTOUR APPROXIMATION
Contour approximation is an algorithm for reducing the number of points in a curve by replacing it
with a reduced set of points, hence the name approximation. This algorithm is commonly known as the
Ramer-Douglas-Peucker algorithm, or simply the split-and-merge algorithm.
The general assumption of this algorithm is that a curve can be approximated by a series of short line
segments.
And we can thus approximate a given number of these line segments to reduce the number of points
it takes to construct a curve.
Overall, the resulting approximated curve consists of a subset of points that were defined by the
original curve.
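As a minimal sketch, assuming c is a contour returned by cv2.findContours, the effect of the ε tolerance can be seen by approximating the same contour with two different percentages of its perimeter:

# a minimal sketch: how epsilon (a percentage of the perimeter) controls the approximation
peri = cv2.arcLength(c, True)
loose = cv2.approxPolyDP(c, 0.05 * peri, True)   # larger epsilon: more points discarded
tight = cv2.approxPolyDP(c, 0.01 * peri, True)   # smaller epsilon: more points kept
print("original: %d points, 5%% approx: %d, 1%% approx: %d" % (len(c), len(loose), len(tight)))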

OBJECTIVES:
1. Understand (at a very high level) the process of contour approximation.
2. Apply contour approximation to distinguish between circles and squares.
3. Use contour approximation to find ―documents‖ in images.

Experiment 27: Using contour approximation in OpenCV.

Program 27: From the image given below, detect only the rectangles while ignoring the
circles/ellipses.


Step 1: Write the code in Text Editor


# import the necessary packages
import cv2

# load the circles and squares image and convert it to grayscale


image = cv2.imread("images/circles_and_squares.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# find contours in the image


(cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
# loop over the contours
for c in cnts:
# approximate the contour
# First, we need to compute the actual perimeter of the contoured region. And once we
# have the length of the perimeter, we can use it to approximate it by making a call to
# cv2.approxPolyDP. Here we are telling OpenCV that we want a special ε value to be
# 1% of the original contour perimeter.
# To control the level of tolerance for the approximation, we need to define a ε value. In
# practice, we define this relative to the perimeter of the shape we are examining.
# Commonly, we‘ll define as some percentage (usually between 1-5%) of the original
# contour perimeter. This because the internal contour approximation algorithm is
# looking for points to discard. The larger the ε value is, the more points will be
# discarded. Similarly, the smaller the ε value is, the more points will be kept. It‘s very
# clear that an ε value that will work well for some shapes will not work well for others
# (larger shapes versus smaller shapes, for instance). This means that we can‘t simply
# hardcode an ε value into our code — it must be computed dynamically based on the
# individual contour. Thus, we define ε relative to the perimeter length so we understand
# how large the contour region actually is. Doing this ensures that we achieve a
# consistent approximation for all shapes inside the image.
# And like I mentioned above, it‘s typical to use roughly 1-5% of the original contour
# perimeter length for a value of ε. Anything larger, and you‘ll be over-approximating
# your contour to almost a single straight line. Similarly, anything smaller and you won‘t
# be doing much of an actual approximation.
peri = cv2.arcLength(c, True)
approx = cv2.approxPolyDP(c, 0.01 * peri, True)

# A rectangle has 4 sides. And a circle has no sides. Or, in this case, since we need to
# represent a circle as a series of points: a circle is composed of many tiny line
# segments — far more than the 4 sides that compose a rectangle. So if we approximate
# the contour and then examine the number of points within the approximated contour,
# we‘ll be able to determine if the contour is a square or not. Once we have the
# approximated contour, we check the len (i.e. the length, or number of entries in the list)


# to see how many vertices (i.e. points) our approximated contour has. If our
# approximated contour has a four vertices, we can thus mark it as a rectangle.
if len(approx) == 4:
# draw the outline of the contour and draw the text on the image
cv2.drawContours(image, [c], -1, (0, 255, 255), 2)
(x, y, w, h) = cv2.boundingRect(approx)
cv2.putText(image, "Rectangle", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX,
0.5, (0, 255, 255), 2)
# show the output image
cv2.imshow("Image", image)
cv2.waitKey(0)

Step 2: Save the code as "approx_simple.py"

Step 3: Run the python script (approx_simple.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python approx_simple.py

Inference:

Case study 3: Contour approximation to an actual real world problem


Our goal is to utilize contour approximation to find the sales receipt in the following image
As you can see from the image above, our receipt is not exactly laying flat.
It has some folds and wrinkles in it. So it‘s certainly not a perfect rectangle.
Which leads us to the question:
If the receipt is not a perfect rectangle, how are we going to find the actual
receipt in the image?
A receipt looks like a rectangle, after all — even though it‘s not a perfect
rectangle. So if we apply contour approximation and look for rectangle-like
regions,


Case study program 3: Using contour approximation in OpenCV.

Step 1: Write the code in Text Editor


# import the necessary images
import cv2

# load the receipt image, convert it to grayscale, and detect edges


image = cv2.imread("images/receipt.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edged = cv2.Canny(gray, 75, 200)

# show the original image and edged map


cv2.imshow("Original", image)
cv2.imshow("Edge Map", edged)

# we need to discard all this noise and find only the receipt outline?
# It is a two-step process. The first step is to sort the contours by their size, keeping only the largest
# ones and the second step is to apply contour approximation.
# find contours in the image and sort them from largest to smallest, keeping only the largest ones
# we have only the 7 largest contours in the image
(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:7]

# loop over the contours


for c in cnts:
# approximate the contour and initialize the contour color
peri = cv2.arcLength(c, True)
approx = cv2.approxPolyDP(c, 0.01 * peri, True)

# show the difference in number of vertices between the original and approximated contours
print "original: {}, approx: {}".format(len(c), len(approx))

# if the approximated contour has 4 vertices, then we have found our rectangle
if len(approx) == 4:
# draw the outline on the image
cv2.drawContours(image, [approx], -1, (0, 255, 0), 2)

# show the output image


cv2.imshow("Output", image)
cv2.waitKey(0)

Step 2: Save the code as "approx_realworld.py"


Step 3: Run the python script (approx_realworld.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python approx_realworld.py

Inference:
The original receipt contour had over 279 points prior to approximation — that original shape was by no
means a rectangle! However, by applying contour approximation we were able to sift through all the
noise and reduce those 279 points down to 4 points. And since our 4 points formed a rectangle, we can
thus label the region as our receipt.

SORTING CONTOURS
OpenCV does not provide a built-in function or method to perform the actual sorting of contours.

OBJECTIVES:
1. Sort contours according to their size/area, along with a template to follow to sort contours by any
other arbitrary criteria.
2. Sort contoured regions from left-to-right, right-to-left, top-to-bottom, and bottom-to-top using only a
single function.
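As a minimal sketch of the first objective, sorting contours by their area (largest first) only needs Python's built-in sorted function with cv2.contourArea as the key, exactly as used later in Program 28:

# a minimal sketch: sort contours by area, largest first
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)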

Experiment 28: Sorting contours in OpenCV.

Program 28:

Step 1: Write the code in Text Editor


# import the necessary packages
import numpy as np
import argparse
import cv2

# Defining our sort_contours function which will enable us to sort our contours.
# Function takes two arguments. The first is cnts, the list of contours that the we want to sort,
# The second is the sorting method, which indicates the direction in which we are going to sort
# our contours (i.e. left-to-right, top-to-bottom, etc.).
def sort_contours(cnts, method="left-to-right"):
# initialize the reverse flag and sort index
# These variables simply indicate the sorting order (ascending or descending) and the
# index of the bounding box we are going to use to perform the sort


# If we are sorting from right-to-left or bottom-to-top, we‘ll need to sort in descending


# order, according to the location of the contour in the image
reverse = False
i=0

# handle if we need to sort in reverse


if method == "right-to-left" or method == "bottom-to-top":
reverse = True

# handle if we are sorting against the y-coordinate rather than the x-coordinate of the
# bounding box
if method == "top-to-bottom" or method == "bottom-to-top":
i=1

# construct the list of bounding boxes and sort them from top to bottom
# first compute the bounding boxes of each contour, which is simply the starting (x, y)
# coordinates of the bounding box followed by the width and height
# The boundingBoxes enable us to sort the actual contours. Using this code we are able
# to sort both the contours and bounding boxes.
boundingBoxes = [cv2.boundingRect(c) for c in cnts]
(cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
key=lambda b:b[1][i], reverse=reverse))

# return the list of sorted contours and bounding boxes


return (cnts, boundingBoxes)

# helper function to draw contour ID numbers on our actual image


def draw_contour(image, c, i):
# compute the center of the contour area and draw a circle representing the center
M = cv2.moments(c)
cX = int(M["m10"] / M["m00"])
cY = int(M["m01"] / M["m00"])

# draw the contour number on the image


cv2.putText(image, "#{}".format(i + 1), (cX - 20, cY), cv2.FONT_HERSHEY_SIMPLEX,
1.0, (255, 255, 255), 2)

# return the image with the contour number drawn on it


return image

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the input image")


ap.add_argument("-m", "--method", required=True, help="Sorting method")


args = vars(ap.parse_args())

# load the image and initialize the accumulated edge image


image = cv2.imread(args["image"])
accumEdged = np.zeros(image.shape[:2], dtype="uint8")

# loop over the blue, green, and red channels, respectively


for chan in cv2.split(image):
# blur the channel (to remove high frequency noise), extract edges from it, and
# accumulate the set of edges for the image
chan = cv2.medianBlur(chan, 11)
edged = cv2.Canny(chan, 50, 200)
accumEdged = cv2.bitwise_or(accumEdged, edged)

# show the accumulated edge map


cv2.imshow("Edge Map", accumEdged)

# find contours in the accumulated image, keeping only the largest ones
# to sort them according to their size by using a combination of the Python sorted function and
# the cv2.contourArea method — this allows us to sort our contours according to their area (i.e.
# size) from largest to smallest.
(cnts, _) = cv2.findContours(accumEdged.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]
orig = image.copy()

# loop over the (unsorted) contours and draw them


for (i, c) in enumerate(cnts):
orig = draw_contour(orig, c, i)

# show the original, unsorted contour image


cv2.imshow("Unsorted", orig)

# sort the contours according to the provided method


(cnts, boundingBoxes) = sort_contours(cnts, method=args["method"])

# loop over the (now sorted) contours and draw them


for (i, c) in enumerate(cnts):
draw_contour(image, c, i)

# show the output image


cv2.imshow("Sorted", image)


cv2.waitKey(0)

Step 2: Save the code as "sort_contours.py"

Step 3: Run the python script (sort_contours.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python sort_contours.py --image images/lego_blocks_1.png --method "top-to-bottom"

Inference:


LESSON 1.13: HISTOGRAMS


Histograms are prevalent in nearly every aspect of computer vision.
We use grayscale histograms for thresholding.
We use histograms for white balancing.
We use color histograms for object tracking in images, such as with the CamShift algorithm.
We use color histograms as features, including color histograms in multiple dimensions.
And in an abstract sense, we use histograms of image gradients to form the HOG and SIFT
descriptors.
Even the extremely popular bag-of-visual-words representation used in image search engines and
machine learning is a histogram as well.

So why are histograms so useful?

Because histograms capture the frequency distribution of a set of data. And it turns out that
examining these frequency distributions is a very nice way to build simple image processing techniques
— along with very powerful machine learning algorithms.

OBJECTIVES:
1. What is a histogram?
2. How to compute a histogram in OpenCV.
3. How to compute a grayscale histogram of an image.
4. Write code to extract a "flattened" RGB histogram from an image.
5. Extract multi-dimensional color histograms from an image.

What is a histogram?
A histogram represents the distribution of pixel intensities (whether color or grayscale) in an image.
It can be visualized as a graph (or plot) that gives a high-level intuition of the intensity (pixel value)
distribution.
We are going to assume a RGB color space in this example, so these pixel values will be in the range
of 0 to 255.
When plotting the histogram, the X-axis serves as our "bins".
If we construct a histogram with 256 bins, then we are effectively counting the number of times each
pixel value occurs.
In contrast, if we use only 2 (equally spaced) bins, then we are counting the number of times a pixel is
in the range [0, 127] or [128, 255].
The number of pixels binned to the x-axis value is then plotted on the y-axis.
In the figure given below, we have plotted a histogram with 256-bins along the x-axis and the
percentage of pixels falling into the given bins along the y-axis.
Examining the histogram, note that there are three primary peaks.
The first peak in the histogram is around x=20, where we see a sharp spike in the number of pixels;
clearly there is some sort of object in the image that has a very dark value.
We then see a much slower rising peak in the histogram, where we start to ascend around x=50 and
finally end the descent around x=120. This region probably refers to a background region of the image.


Finally, we see there is a very large number of pixels in the range x=220 to x=245. It's hard to say
exactly what this region is, but it must dominate a large portion of the image.

By simply examining the histogram of an image, you get a general understanding regarding the
contrast, brightness, and intensity distribution.

Using OpenCV to compute histograms:


We will be using the cv2.calcHist function to build our histograms.
cv2.calcHist(images, channels, mask, histSize, ranges)
 images: This is the image that we want to compute a histogram for.
 channels: A list of indexes, where we specify the index of the channel we want to compute a
histogram for. To compute a histogram of a grayscale image, the list would be [0]. To compute a
histogram for all three red, green, and blue channels, the channels list would be [0, 1, 2].
 mask: If a mask is provided, a histogram will be computed for masked pixels only. If we do not have a
mask or do not want to apply one, we can just provide a value of None.
 histSize: This is the number of bins we want to use when computing a histogram. Again, this is a list,
one for each channel we are computing a histogram for. The bin sizes do not all have to be the same.
Here is an example of 32 bins for each channel: [32, 32, 32].
 ranges: The range of possible pixel values. Normally, this is [0, 256] (this is not a typo: the ending
range of the cv2.calcHist function is non-inclusive, so you'll want to provide a value of 256 rather than
255) for each channel, but if you are using a color space other than RGB (such as HSV), the ranges
might be different.
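As a quick illustration of how these arguments fit together, here is a minimal sketch (the filename image.png is only a placeholder) that computes a grayscale histogram and a 2D Green-Blue histogram with the same function:

# import the necessary packages
import cv2

# load an example image and convert a copy to grayscale (placeholder filename)
image = cv2.imread("image.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# grayscale histogram: one channel ([0]), no mask, 256 bins, values in [0, 256)
grayHist = cv2.calcHist([gray], [0], None, [256], [0, 256])

# 2D histogram over the Green and Blue channels with 32 bins per channel
chans = cv2.split(image)
gbHist = cv2.calcHist([chans[1], chans[0]], [0, 1], None, [32, 32], [0, 256, 0, 256])

# the shapes reflect the histSize values supplied above
print("grayscale histogram shape: {}".format(grayHist.shape))   # (256, 1)
print("2D G-B histogram shape: {}".format(gbHist.shape))        # (32, 32)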


Experiment 29: Grayscale histogram.

Program 29:

Step 1: Write the code in Text Editor


# import the necessary packages
from matplotlib import pyplot as plt
import argparse
import cv2

# Construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, convert it to grayscale, and show it


image = cv2.imread(args["image"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

# construct a grayscale histogram


# A grayscale image has only one channel, so we have a value of [0] for channels.
# We don't have a mask, so we set the mask value to None.
# We will use 256 bins in our histogram, and the possible values range from 0 to 255.
hist = cv2.calcHist([image], [0], None, [256], [0, 256])

# plot the histogram


plt.figure()
plt.title("Grayscale Histogram")
plt.xlabel("Bins")
plt.ylabel("# of Pixels")
plt.plot(hist)
plt.xlim([0, 256])

# normalize the histogram, simply dividing the raw frequency counts for each bin of the
# histogram by the sum of the counts, this leaves us with the percentage of each bin rather than
# the raw count of each bin.
hist /= hist.sum()

# plot the normalized histogram


plt.figure()
plt.title("Grayscale Histogram (Normalized)")
plt.xlabel("Bins")


plt.ylabel("% of Pixels")
plt.plot(hist)
plt.xlim([0, 256])
plt.show()

Step 2: Save the code as "grayscale_histogram.py"

Step 3: Run the python script (grayscale_histogram.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python grayscale_histogram.py -i grayscale-histogram_total_pixels.jpg

Inference:

Experiment 30: Color histogram.

Program 30:

Step 1: Write the code in Text Editor


# import the necessary packages
from matplotlib import pyplot as plt
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it


image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# grab the image channels, initialize the tuple of colors and the figure
# OpenCV reverses this order to BGR


# We then initialize a tuple of strings representing the colors.


chans = cv2.split(image)
colors = ("b", "g", "r")
plt.figure()
plt.title("'Flattened' Color Histogram")
plt.xlabel("Bins")
plt.ylabel("# of Pixels")

# loop over the image channels


# we start looping over each of the channels in the image.
# Then, for each channel we compute a histogram
for (chan, color) in zip(chans, colors):
# create a histogram for the current channel and plot it
hist = cv2.calcHist([chan], [0], None, [256], [0, 256])
plt.plot(hist, color = color)
plt.xlim([0, 256])

# Now we move on to multi-dimensional histograms and take into consideration two channels at
# a time. For example, "How many pixels have a Red value of 10 AND a Blue value of 30?"
# "How many pixels have a Green value of 200 AND a Red value of 130?" By using the
# conjunctive AND, we are able to construct multi-dimensional histograms.
# let's move on to 2D histograms -- reduce the number of bins in the histogram from 256 to 32
fig = plt.figure()

# plot a 2D color histogram for green and blue


# if we used 256 bins for each dimension in a 2D histogram, our resulting histogram would have
# 65,536 separate pixel counts. Not only is this wasteful of resources, it's not practical. Most
# applications use somewhere between 8 and 64 bins when computing multi-dimensional
# histograms. We are using 32 bins instead of 256.
# In cv2.calcHist function, we are passing in a list of two channels: the Green and Blue.
ax = fig.add_subplot(131)
hist = cv2.calcHist([chans[1], chans[0]], [0, 1], None, [32, 32], [0, 256, 0, 256])
p = ax.imshow(hist, interpolation="nearest")
ax.set_title("2D Color Histogram for G and B")
plt.colorbar(p)

# plot a 2D color histogram for green and red


ax = fig.add_subplot(132)
hist = cv2.calcHist([chans[1], chans[2]], [0, 1], None, [32, 32], [0, 256, 0, 256])
p = ax.imshow(hist, interpolation="nearest")
ax.set_title("2D Color Histogram for G and R")
plt.colorbar(p)


# plot a 2D color histogram for blue and red


ax = fig.add_subplot(133)
hist = cv2.calcHist([chans[0], chans[2]], [0, 1], None, [32, 32], [0, 256, 0, 256])
p = ax.imshow(hist, interpolation="nearest")
ax.set_title("2D Color Histogram for B and R")
plt.colorbar(p)

# finally, let's examine the dimensionality of one of the 2D histograms


print "2D histogram shape: %s, with %d values" % (hist.shape, hist.flatten().shape[0])

# our 2D histogram could only take into account 2 out of the 3 channels in the image so now let's
# build a 3D color histogram (utilizing all channels) with 8 bins in each direction -- we can't plot
# the 3D histogram, but the theory is exactly like that of a 2D histogram, so we'll just show the
# shape of the histogram
hist = cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
print "3D histogram shape: %s, with %d values" % (hist.shape, hist.flatten().shape[0])

# Show our plots


plt.show()

Step 2: Save the code as "color_histograms.py"

Step 3: Run the python script (color_histograms.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python color_histograms.py -i color_histograms_flattened.jpg

Inference:

HISTOGRAM EQUALIZATION
Histogram equalization improves the contrast of an image by "stretching" the distribution of pixels.
Consider a histogram with a large peak at the center of it.
Applying histogram equalization will stretch this peak out toward the ends of the intensity range, thus
improving the global contrast of the image.
Histogram equalization is applied to grayscale images.
This method is useful when an image contains foregrounds and backgrounds that are both dark or
both light.


It tends to produce unrealistic effects in photographs; however, it is normally useful when enhancing
the contrast of medical or satellite images.

Experiment 31: Histogram Equalization.

Program 31:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and convert it to grayscale


image = cv2.imread(args["image"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# apply histogram equalization to stretch the contrast of our image


eq = cv2.equalizeHist(image)

# show our images -- notice how the contrast of the second image has been stretched
cv2.imshow("Original", image)
cv2.imshow("Histogram Equalization", eq)
cv2.waitKey(0)

Step 2: Save the code as "equalize.py"

Step 3: Run the python script (equalize.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python equalize.py --image histogram_equalization.jpg

Inference:


HISTOGRAMS AND MASKS:


Masks can be used to focus on only regions of an image that interest us.
We are now going to construct a mask and compute color histograms for only the masked region.
First, we need to define a convenience function to plot our histograms and save us from writing
repetitive lines of code.

Experiment 32: Histogram and Masks.

Program 32:

Step 1: Write the code in Text Editor


# import the necessary packages
from matplotlib import pyplot as plt
import numpy as np
import cv2

# The mask defaults to None if we do not have a mask for the image.
def plot_histogram(image, title, mask=None):
# grab the image channels, initialize the tuple of colors and the figure
chans = cv2.split(image)
colors = ("b", "g", "r")
plt.figure()
plt.title(title)
plt.xlabel("Bins")
plt.ylabel("# of Pixels")

# loop over the image channels


for (chan, color) in zip(chans, colors):
# create a histogram for the current channel and plot it
hist = cv2.calcHist([chan], [0], mask, [256], [0, 256])
plt.plot(hist, color=color)
plt.xlim([0, 256])

# load the beach image and plot a histogram for it


image = cv2.imread("beach.png")
cv2.imshow("Original", image)
plot_histogram(image, "Histogram for Original Image")

# construct a mask for our image -- our mask will be BLACK for regions to IGNORE and WHITE
# for regions to EXAMINE
# We define our mask as a NumPy array, with the same width and height as our beach image.
# Then draw a white rectangle starting from point (60, 290) to point (210, 390).


# This rectangle will serve as our mask — only pixels in our original image belonging to the
# masked region will be considered in the histogram computation.
mask = np.zeros(image.shape[:2], dtype="uint8")
cv2.rectangle(mask, (60, 290), (210, 390), 255, -1)
cv2.imshow("Mask", mask)

# what does masking our image look like?


# To visualize our mask, we apply a bitwise AND to the beach image.
masked = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("Applying the Mask", masked)

# compute a histogram for our image, but we'll only include pixels in the masked region
plot_histogram(image, "Histogram for Masked Image", mask=mask)

# show our plots


plt.show()

Step 2: Save the code as "histogram_with_mask.py"

Step 3: Run the python script (histogram_with_mask.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python histogram_with_mask.py

Inference:
For the masked image, most red pixels fall in the range [10, 25], indicating that red pixels contribute
very little to our image. This makes sense, since our ocean and sky are blue. Green pixels are then
present, but these are toward the lighter end of the distribution, which corresponds to the green foliage
and trees. Finally, our blue pixels fall in the brighter range and are obviously our blue ocean and sky.
Most importantly, compare our masked color histograms to the unmasked color histograms. Notice how
dramatically different the color histograms are. By utilizing masks, we are able to apply our computation
only to the specific regions of the image that interest us — in this example, we simply wanted to
examine the distribution of the blue sky and ocean.


LESSON 1.14: CONNECTED-COMPONENT LABELING


Connected-component labeling (also known as connected-component analysis, blob extraction, or
region labeling) is an algorithmic application of graph theory that is used to determine the connectivity
of "blob"-like regions in a binary image.
We often use connected-component analysis in the same situations that contours are used; however,
connected-component labeling can often give us a more granular filtering of the blobs in a binary
image.
When using contour analysis, we are often restricted by the hierarchy of the outlines (i.e. one contour
contained within another), but with connected-component analysis we can more easily segment and
analyze these structures.
Once we have extracted the blob using connected-component labeling, we can still apply contour
properties to quantify the region.
A great example usage of connected-component analysis is to compute the connected-components
of a binary (i.e. threshold) license plate image and filter the blobs based on their properties, such as
width, height, area, solidity etc.

OBJECTIVES:
1. Review the classical two-pass algorithm used for connected-component analysis.
2. Apply connected-component analysis to detect characters and blobs in a license plate image.

THE CLASSICAL APPROACH:


The classical connected-component analysis was introduced by Rosenfeld and Pfaltz in their 1966
article.
It's important to note that we only apply connected-component analysis to binary or threshold images.
If presented with an RGB or grayscale image, we first need to threshold it based on some criterion in
a manner that can segment the background from the foreground, leaving us with "blobs" in the image
that we can examine.
Once we have obtained the binary version of the image, we can proceed to analyze the components.
The actual algorithm consists of two passes.
In the first pass, the algorithm loops over each individual pixel. For each center pixel p, the west and
north pixels are checked. This type of check is called 4-connectivity (left).
Based on the west and north pixel labels, a label is assigned to the current center pixel p.
You might be wondering why only two pixels are being checked if we want to check the pixels
surrounding p for 4-connectivity.
The reason is because we are looping over each pixel individually and always checking the west and
north pixels.
By repeating this process over the entire image, one row at a time, each pixel will actually be checked
for 4-connectivity.
8-connectivity can also be performed by checking the west, north-west, north, and north-east pixels
(right).


Then, in the second pass, the connected-component analysis algorithm loops over the labels
generated from the first pass and merges any regions together that share connected labels.

THE FIRST PASS:


In the first pass of our connected-component analysis algorithm, every pixel is checked.
For the sake of this example, we'll use 4-connectivity (but we could just as easily use 8-connectivity)
and check the west and north pixels of the central pixel p:
Step 1:
The first step is to check if we care about the central pixel p or not:
1. If the central pixel is a background pixel (normally a value of 0, indicating black),
we ignore it and move to the next pixel.
2. If it is a foreground pixel, or if we have moved to a pixel that is in the foreground,
we proceed to Steps 2 and 3.
Steps 2 and 3:
If we have reached this step, then we must be examining a foreground pixel, so we
grab the north and west pixels, denoted as N and W, respectively:
Now that we have N and W, there are two possible situations:
1. Both N and W are background pixels, so there are no labels associated with these
pixels. In this case, create a new label (normally by incrementing a unique label counter) and store the
label value in the center pixel p. Then move on to Steps 4 and 5.
2. N and/or W are not background pixels. If this is the case, we can proceed to Steps 4 and 5, since at
least one pixel already has a label associated with it.
Steps 4 and 5:
All we need to do is set the center pixel p by taking the minimum of the non-background label values: p = min(N, W)
Step 6:
Suppose that, in the following figure, the north pixel has label X and the west pixel has label Y:
Even though these pixels have two separate labels, we know they are actually connected and part of
the same blob.
To indicate that the X and Y labels are part of the same component, we can leverage the union-find
data structure to indicate that X is a child of Y.
We'll insert a node in our union-find structure to indicate that X is a child of Y and that the pixels are
actually connected even though they have different label values.
The second pass of our connected-components algorithm will leverage the union-find structure to
connect any blobs that have different labels but are actually part of the same blob.
Step 7:
Continue to the next pixel and repeat the process beginning with Step 1.

THE SECOND PASS:


The second pass of the connected-components labeling algorithm is much simpler than the first one.
We start off by looping over the image once again, one pixel at a time.
For each pixel, we check if the label of the current pixel is a root (i.e. the top of the tree) in the
union-find data structure.
If so, then we can proceed on to the next step — the label of the current pixel already has the
smallest possible value based on how it is connected to its neighbors.
Otherwise, we follow the tree until we reach a root in the structure. Once we have reached a root, we
assign the value at the root to the current pixel.
By applying this second pass, we can connect blobs that have different label values but are actually
part of the same blob.
The key to efficiency is to use the union-find data structure for tree traversal when examining label values.
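To make the two passes concrete, below is a small, self-contained sketch of the classical algorithm on a toy binary NumPy array, using 4-connectivity and a simple dictionary-based union-find. It is only an illustration of the idea described above, not the scikit-image implementation used in the experiment that follows.

# import the necessary packages
import numpy as np

def two_pass_label(binary):
    # output label image, union-find parent table, and the next label to assign
    labels = np.zeros(binary.shape, dtype=int)
    parent = {}
    nextLabel = 1

    def find(x):
        # follow the tree upwards until we reach a root
        while parent[x] != x:
            x = parent[x]
        return x

    def union(a, b):
        # connect the two trees so that both labels share a single root
        (rootA, rootB) = (find(a), find(b))
        if rootA != rootB:
            parent[max(rootA, rootB)] = min(rootA, rootB)

    # first pass: assign provisional labels based on the north and west neighbors
    for y in range(binary.shape[0]):
        for x in range(binary.shape[1]):
            # ignore background pixels
            if binary[y, x] == 0:
                continue

            # grab the labels of the north and west neighbors (0 means unlabeled)
            north = labels[y - 1, x] if y > 0 else 0
            west = labels[y, x - 1] if x > 0 else 0

            if north == 0 and west == 0:
                # neither neighbor has a label, so create a new one
                labels[y, x] = nextLabel
                parent[nextLabel] = nextLabel
                nextLabel += 1
            else:
                # take the minimum of the non-background neighbor labels...
                neighbors = [l for l in (north, west) if l > 0]
                labels[y, x] = min(neighbors)

                # ...and record that the two labels are part of the same blob
                if north > 0 and west > 0 and north != west:
                    union(north, west)

    # second pass: replace each provisional label with the root of its tree
    for y in range(binary.shape[0]):
        for x in range(binary.shape[1]):
            if labels[y, x] > 0:
                labels[y, x] = find(labels[y, x])

    return labels

# a tiny binary image containing two separate blobs
binary = np.array([
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0]])
print(two_pass_label(binary))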

APPLYING CONNECTED-COMPONENT ANALYSIS TO LICENSE PLATE IMAGES:


Oddly enough, given that OpenCV is the de facto computer vision library, you would think that it has an
easy way to perform connected-component analysis; unfortunately, it does not.
Luckily, we have the scikit-image (http://scikit-image.org/) library which comes with a dead-simple
method to perform connected component labeling.
Even if OpenCV had a connected-component analysis function, I don't think it would be as
straightforward and easy to use as the one provided by scikit-image.
Let's start by taking a look at the problem we are going to be solving using connected-component
labeling:

On the left, you can see an image of a license plate, and on the right, we can see the threshold binary
image of the license plate.
Our goal is to use connected-component analysis to label each of the white "blobs" in the license
plate and then analyze each of these blobs to determine which regions are license plate characters and
which ones can be discarded.

Experiment 33: Connected component labeling in OpenCV.

Program 33:

Step 1: Write the code in Text Editor


# import the necessary packages
# the measure module contains our connected-component analysis method
from __future__ import print_function
from skimage.filters import threshold_adaptive
from skimage import measure
import numpy as np
import cv2


# load the license plate image from disk


plate = cv2.imread("license_plate.png")

# extract the Value component from the HSV color space and apply adaptive thresholding to
# reveal the characters on the license plate
V = cv2.split(cv2.cvtColor(plate, cv2.COLOR_BGR2HSV))[2]
thresh = threshold_adaptive(V, 29, offset=15).astype("uint8") * 255
thresh = cv2.bitwise_not(thresh)

# show the images


cv2.imshow("License Plate", plate)
cv2.imshow("Thresh", thresh)

# perform connected component analysis on the thresholded images and initialize the mask to
# hold only the "large" components we are interested in
# we make a call to the label method of measure, which performs our actual connected-
# component labeling. The label method requires a single argument, which is our binary thresh
# image that we want to extract connected-components from. We'll also supply neighbors=8 to
# indicate we want to perform connected-component analysis with 8-connectivity. Finally, the
# optional background parameter indicates that all pixels with a value of 0 should be considered
# background and ignored by the label method.
# The label method returns labels, a NumPy array with the same dimension as our thresh
# image. Each (x, y)-coordinate inside labels is either 0 (indicating that the pixel is background
# and can be ignored) or a value > 0, which indicates that it is part of a connected-component.
# Each unique connected-component in the image has a unique label inside labels.
labels = measure.label(thresh, neighbors=8, background=0)
mask = np.zeros(thresh.shape, dtype="uint8")
print("[INFO] found {} blobs".format(len(np.unique(labels))))

# Now that we have the labels, we can loop over them individually and analyze each one to
# determine if it is a license plate character or not.
# loop over the unique components
for (i, label) in enumerate(np.unique(labels)):
# if this is the background label, ignore it
if label == 0:
print("[INFO] label: 0 (background)")
continue
# otherwise, construct the label mask to display only connected components for the
# current label
# However, in the case we are examining a foreground label, we construct a labelMask
# with the same dimensions as our thresh image. We then set all (x, y)-coordinates in
# labelMask that belong to the current label in labels to white — here, we are simply
# drawing the current blob on the labelMask image.


# At Last, we need to determine if the current blob is a license plate character or not. For
# this particular problem, this filtering is actually quite simple — all we need to do is use
# the cv2.countNonZero to count the number of non-zero pixels in the labelMask and
# then make a check to see if numPixels falls inside an acceptable range to ensure that
# the blob is neither too small nor too big. Provided that numPixels passes this test, we
# accept the blob as being a license plate character.
print("[INFO] label: {} (foreground)".format(i))
labelMask = np.zeros(thresh.shape, dtype="uint8")
labelMask[labels == label] = 255
numPixels = cv2.countNonZero(labelMask)

# if the number of pixels in the component is sufficiently large, add it to our mask of
# "large" blobs
if numPixels > 300 and numPixels < 1500:
mask = cv2.add(mask, labelMask)

# show the label mask


cv2.imshow("Label", labelMask)
cv2.waitKey(0)

# show the large components in the image


cv2.imshow("Large Blobs", mask)
cv2.waitKey(0)

Step 2: Save the code as "connected_components_labeling.py"

Step 3: Run the python script (connected_components_labeling.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python connected_components_labeling.py

Inference:
Note: In versions of scikit-image <= 0.11.X, the background label was originally -1. However, in newer
versions of scikit-image (such as >= 0.12.X), the background label is 0. Make sure you check which
version of scikit-image you are using and update the code to use the correct background label as this
can affect the output of the script.
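One defensive option (a small sketch, not part of the original script) is to check the installed scikit-image version at runtime and choose the background label accordingly; the version parsing here is deliberately rough.

# import the necessary packages
import skimage

# report the installed scikit-image version
print("scikit-image version: {}".format(skimage.__version__))

# a rough check of the major/minor version numbers, which is enough to
# distinguish the <= 0.11.x behavior (background label of -1) from the
# >= 0.12.x behavior (background label of 0)
(major, minor) = skimage.__version__.split(".")[:2]
backgroundLabel = -1 if (int(major), int(minor)) < (0, 12) else 0
print("[INFO] assuming a background label of {}".format(backgroundLabel))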


CHAPTER 2: IMAGE DESCRIPTOR

LESSON 2.0: INTRODUCTION

How to quantify and abstractly represent an image using only a list of numbers?
The process of quantifying an image is called feature extraction.
The process of feature extraction governs the rules, algorithms, and methodologies we use to
abstractly quantify the contents of an image using only a list of numbers, called a feature vector.
Feature vectors are normally real-, integer-, or binary-valued.
Image descriptors and feature descriptors govern how an image is abstracted and quantified, while
feature vectors are the output of descriptors and used to quantify the image. Taken as a whole, this
process is called feature extraction.
Reasons to extract the features from the image are:
1. to compare the images for similarity;
2. to rank images in search results when building an image search engine;
3. to use when training an image classifier to recognize the contents of an image.

OBJECTIVES:
To learn about:
1. Feature vector
2. Image descriptor
3. Feature descriptor

FEATURE VECTOR
Feature vectors are used to represent a variety of properties of an image, such as the shape, color, or
texture of an object in an image. They can also combine various properties.
A feature vector could jointly represent shape and color. Or it could represent texture and shape. Or it
could represent all three!
The general process of extracting a feature vector from an image is shown below:

Both image descriptors and feature descriptors output feature vectors.


Given an N x M pixel image, we input it to our image descriptor, and a d-dimensional feature vector
pops out at the end of the image descriptor.
The value of d is the length, or the number of entries inside the list. For example, a feature vector
with 128 entries is called 128-dimensional, or simply 128-d for short.
The algorithms and methodologies used to extract feature vectors are called image descriptors
and feature descriptors.
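As a small illustration of this idea (a sketch only, using the color channel statistics descriptor covered in a later lesson and a placeholder filename), the length of the flattened output is the dimensionality d of the feature vector:

# import the necessary packages
import numpy as np
import cv2

# load an input image (the filename is only a placeholder)
image = cv2.imread("example.png")

# a simple descriptor: the per-channel mean and standard deviation
(means, stds) = cv2.meanStdDev(image)
features = np.concatenate([means, stds]).flatten()

# for a 3-channel image this yields a 6-dimensional (6-d) feature vector
print("feature vector: {}".format(features))
print("dimensionality d = {}".format(features.shape[0]))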


IMAGE DESCRIPTOR:
An image descriptor is an algorithm and methodology that governs how an input image is globally
quantified and returns a feature vector abstractly representing the image contents.
Global — this implies that we will be examining the entire image to compute the feature vector.

Input image -> Apply Image Descriptor -> [0.51, 0.42, 0.96, ....]

Examples of image descriptors include color channel statistics, color histograms, and Local Binary
Patterns.
One of the primary benefits of image descriptors is that they tend to be much simpler than feature
descriptors.
The feature vectors derived from image descriptors can be immediately passed down to the classifier
to recognize the contents of an image, or used to build an image search engine.
Image descriptors are not robust to changes in rotation, translation and viewpoints.

FEATURE DESCRIPTORS:
A feature descriptor is an algorithm and methodology that governs how an input region of an image is
locally quantified.
A feature descriptor accepts a single input image and returns multiple feature vectors.
Examples of feature descriptors are SIFT, SURF, ORB, BRISK, BRIEF, and FREAK.
Feature descriptors tend to be much more powerful than our basic image descriptors since they take
into account the locality of regions in an image and describe them separately.
As you'll see later in this section, feature descriptors also tend to be much more robust to changes in
the input image, such as translation, orientation (i.e. rotation), and changes in viewpoint.
 In most cases the feature vectors extracted using feature descriptors are not directly applicable to
building an image search engine or constructing an image classifier in their current state (the exception
being keypoint matching/spatial verification, which we detail when identifying the covers of books).
This is because each image is now represented by multiple feature vectors rather than just one.
To remedy this problem, we construct a bag-of-visual-words, which takes all the feature vectors of
an image and constructs a histogram, counting the number of times similar feature vectors occur in an
image.
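The sketch below is only meant to illustrate that histogram-building step, not a full bag-of-visual-words pipeline: the feature vectors are random stand-ins for real descriptor output, the vocabulary is a k-means model fitted to them, and the resulting histogram counts how often each visual word occurs in one image.

# import the necessary packages
from sklearn.cluster import KMeans
import numpy as np

# pretend these are feature vectors pooled from the entire training set
# (in practice they would come from a feature descriptor such as SIFT or ORB)
np.random.seed(42)
trainingFeatures = np.random.rand(500, 32)

# build the visual vocabulary by clustering the pooled feature vectors
vocabSize = 16
vocab = KMeans(n_clusters=vocabSize, random_state=42).fit(trainingFeatures)

# pretend these are the feature vectors extracted from a single image
imageFeatures = np.random.rand(40, 32)

# quantize each feature vector to its nearest visual word, then count how many
# times each word occurs -- this histogram is the bag-of-visual-words representation
words = vocab.predict(imageFeatures)
(bovw, _) = np.histogram(words, bins=np.arange(vocabSize + 1))
print("bag-of-visual-words histogram: {}".format(bovw))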


LESSON 2.1: IMAGE DESCRIPTOR-COLOR CHANNEL STATISTICS

OBJECTIVES:
1. Learn how to extract color channel statistic feature vectors from images.
2. Apply color channel statistics and the Euclidean distance to rank images for similarity.

COLOR CHANNEL STATISTICS:


Compute mean and standard deviation for each channel of an image, to quantify and represent the
color distribution of an image.
Therefore, if two images have similar mean and standard deviations, we can assume that these
images have similar color distributions:

The color channel image descriptor can be broken down into three steps:
Step 1: Separate the input image into its respective channels. For an RGB image, we want to examine
each of the Red, Green, and Blue channels, independently.
Step 2: Compute various statistics for each channel, such as mean, standard deviation, skew, and
kurtosis.
Step 3: Concatenate the statistics together to form a ―list‖ of statistics for each color channel — this
becomes our feature vector.

Experiment 34: Color channel statistics

Program 34:


Step 1: Write the code in Text Editor


# import the necessary packages
# the distance sub-module of SciPy contains many distance metrics and similarity functions that
# we can use to compute the distance/similarity between two feature vectors
# In this particular example we'll be using the Euclidean distance, which is pretty much the de
# facto standard when it comes to computing the distance between two points in a Euclidean
# space. Given two input vectors, p and q:
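# (the equation figure is not reproduced in this handout; the standard definition of the
# Euclidean distance between p and q is)
#
#     d(p, q) = sqrt( (p_1 - q_1)^2 + (p_2 - q_2)^2 + ... + (p_n - q_n)^2 )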

# the Euclidean distance simply takes the sum of squared difference between each entry in
# the p and q vectors, and finally takes the square-root of this sum.
# A larger Euclidean distance implies that the two points are farther away from each other in a
# Euclidean space. A smaller Euclidean distance implies that the two points are closer
# together in a Euclidean space, with a distance of 0 implying that the points are identical.
from scipy.spatial import distance as dist
from imutils import paths
import numpy as np
import cv2

# grab the list of image paths from our "dinos" directory.


# The "dinos" directory contains the four images of the T-Rex
# initialize the index to store the image filename and feature vector
# Python dictionary (basically a Hash Table) called index .
# It‘s very common to use dictionaries/Hash Tables when extracting features from images.
# This is because each input image is unique; therefore, we can use a unique key (such as the
# filename or UUID) as the key to our dictionary.
# As for the value of the dictionary, that will simply be our feature vector.
# Again, by using a dictionary data structure we are able to use the (unique) image filename as
# the key and the feature vector extracted from the image as the value.
imagePaths = sorted(list(paths.list_images("dinos")))
index = {}

# loop over the image paths


for imagePath in imagePaths:
# load the image and extract the filename
image = cv2.imread(imagePath)
filename = imagePath[imagePath.rfind("/") + 1:]

# extract the mean and standard deviation from each channel of the BGR image, then
# update the index with the feature vector
# In this case, our feature vector consists of the means and standard deviations,
# allowing us to characterize the color distribution of our images.


(means, stds) = cv2.meanStdDev(image)


features = np.concatenate([means, stds]).flatten()
index[filename] = features

# display the query image and grab the sorted keys of the index dictionary
# we'll be using the trex_01.png image as our query image -- all other images in our dataset
# (i.e. trex_02.png, trex_03.png, and trex_04.png) will be compared to trex_01.png.
query = cv2.imread(imagePaths[0])
cv2.imshow("Query (trex_01.png)", query)
keys = sorted(index.keys())

# loop over the filenames in the dictionary


for (i, k) in enumerate(keys):
# if this is the query image, ignore it
# If the current image in the loop is our query image, we simply ignore it and continue looping.
if k == "trex_01.png":
continue

# load the current image and compute the Euclidean distance between the query image (i.e. the
# 1st image) and the current image
# the dist.euclidean function to compute the Euclidean distance between the query image
# feature vector and the feature vectors in our dataset. As I mentioned above, similar images
# will have a smaller Euclidean distance, whereas less similar images will have
#a larger Euclidean distance.
image = cv2.imread(imagePaths[i])
d = dist.euclidean(index["trex_01.png"], index[k])
# display the distance between the query image and the current image
cv2.putText(image, "%.2f" % (d), (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
0.75, (0, 255, 0), 2)
cv2.imshow(k, image)

# wait for a keypress


cv2.waitKey(0)

Step 2: Save the code as "color_channel_stats.py"

Step 3: Run the python script (color_channel_stats.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python color_channel_stats.py

Inference:


LESSON 2.2: IMAGE DESCRIPTOR-COLOR HISTOGRAMS


Unlike the mean and standard deviation which attempt to summarize the pixel intensity distribution, a
color histogram explicitly represents it.
 In fact, a color histogram is the color distribution.
The assumption is that images with similar color distributions contain similar visual contents.
In this example, we're going to take a small dataset of images, but instead of ranking them, we are going
to cluster and group them into two distinct classes using color histograms.

OBJECTIVES:
1. Learn how histograms can be used as image descriptors.
2. Apply k-means clustering to cluster color histogram features.

COLOR HISTOGRAMS:
Color histogram counts the number of times a given pixel intensity occurs in an image.
Using a color histogram we can express the actual distribution or "amount" of each color in an image.
The counts for each color/color range are used as feature vectors.
 If we decided to utilize a 3D color histogram with 8 bins per channel, we could represent any
image of any size using only 8 x 8 x 8 = 512 bins, or a feature vector of 512-d.
The size of an image has no effect on our output color histogram, although it's wise to resize large
images to a more manageable dimension to increase the speed of the histogram computation.
k-means is a clustering algorithm.
k-means is to partition n data points into k clusters.
Each of the n data points will be assigned to a cluster with the nearest mean.
The mean of each cluster is called its "centroid" or "center".
Applying k-means yields k separate clusters of the original n data
points.
Data points inside a particular cluster are considered to be "more similar" to each other than data
points that belong to other clusters.
In this particular program, we will be clustering the color histograms
extracted from the images in our dataset — but in reality, you could be
clustering any type of feature vector.
Histograms that belong to a given cluster will be more similar in color distribution than
histograms belonging to a separate cluster.
One caveat of k-means is that we need to specify the number of clusters we want to generate ahead
of time.
There are algorithms that automatically select the optimal value of k.
For the time being, we'll be manually supplying a value of k=2 to separate the two classes of images; a short toy sketch of the clustering step is shown below.
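As a quick toy sketch of the clustering step itself (the 2D points below are made up purely for illustration), scikit-learn's KMeans partitions the n data points into k clusters and reports which cluster each point was assigned to:

# import the necessary packages
from sklearn.cluster import KMeans
import numpy as np

# two obvious groups of 2D points (made-up data for illustration)
data = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

# cluster the points into k=2 groups
clt = KMeans(n_clusters=2, random_state=42)
labels = clt.fit_predict(data)

# each data point is assigned to the cluster with the nearest centroid
print("cluster assignments: {}".format(labels))
print("cluster centers:\n{}".format(clt.cluster_centers_))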

Experiment 35: Color Histogram

Program 35: Before we can cluster the vacation photo dataset into two distinct groups, we first need to
extract color histograms from each of the 10 images in the dataset. With that in mind, let's go ahead
and define the directory structure of this project:


|--- example
| |--- __init__.py
| |--- descriptors
| | |---- __init__.py
| | |--- labhistogram.py
|--- cluster_histograms.py
First we'll be defining our image descriptor inside the descriptors sub-module of the example
package. And inside the descriptors sub-module, we'll create a LabHistogram class to extract color
histograms from images in the L*a*b* color space:
# Save it as labhistogram.py
# Define image descriptors as classes rather than functions
# import the necessary packages
import cv2

class LabHistogram:
def __init__(self, bins):
# store the number of bins for the histogram
self.bins = bins

def describe(self, image, mask=None):


# convert the image to the L*a*b* color space, compute a 3D histogram,
# and normalize it
# the Euclidean distance between two colors in the L*a*b*
# has perceptual and noticeable meaning. And since the k-means clustering
# algorithm assumes a Euclidean space, we will get better clusters by using the
# L*a*b* color space than RGB or HSV.
# If we did not normalize, then images with the exact same contents but different
# sizes would have dramatically different histograms.
# Instead, by normalizing our histogram we ensure that the width and height of
# our input image has no effect on the output histogram.
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
hist = cv2.calcHist([lab], [0, 1, 2], mask, self.bins,
[0, 256, 0, 256, 0, 256])
hist = cv2.normalize(hist).flatten()

# return the histogram


return hist

Step 1: Write the code in Text Editor


# import the necessary packages
from example.descriptors.labhistogram import LabHistogram
from sklearn.cluster import KMeans
from imutils import paths


import numpy as np
import argparse
import cv2

def describe(image, mask=None):


# convert the image to the L*a*b* color space, compute a histogram, and normalize it
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
hist = cv2.calcHist([lab], [0, 1, 2], mask, [8,8,8],[0, 256, 0, 256, 0, 256])
hist = cv2.normalize(hist).flatten()
# return the histogram
return hist

# construct the argument parse and parse the arguments


# --dataset : This is the path to the directory containing photos that we are going to cluster.
# --clusters : As I mentioned above, we need to supply the value of k — the number of clusters
# to generate — to the k-means algorithm before we can actually cluster our images.
# In this case, we'll default to k=2 since we are only trying to separate images into two separate
# groups.

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True, help="path to the input dataset directory")
ap.add_argument("-k", "--clusters", type=int, default=2,help="# of clusters to generate")
args = vars(ap.parse_args())

# initialize the image descriptor along with the image matrix


# instantiate our LabHistogram image descriptor, indicating that we are utilizing 8 bins per L*,
# a*, and b* channels respectively in our 3D histogram. Using 8 bins per channel will yield us a
# feature vector of 8 x 8 x 8 = 512-d.
# initialize a list, data, to store the color histograms extracted from our image. Unlike the
# previous lesson on color channel statistics, we do not need a dictionary datatype since we are
# not comparing and ranking images — just clustering and grouping them together.
desc = LabHistogram([8, 8, 8])
data = []

# grab the image paths from the dataset directory


imagePaths = list(paths.list_images(args["dataset"]))
imagePaths = np.array(sorted(imagePaths))

# loop over the input dataset of images


for imagePath in imagePaths:
# load the image, describe the image, then update the list of data
image = cv2.imread(imagePath)
hist = describe(image)


data.append(hist)

# Now that we have all of our color features extracted, we can cluster the feature vector using
# the k-means algorithm. We initialize k-means using the supplied number of clusters via
# command line argument. And a call to clt.fit_predict not only performs the actual clustering,
# but performs the prediction as to which histogram (and thus which associated image) belongs
# to which of the 2 clusters.
clt = KMeans(n_clusters=args["clusters"])
labels = clt.fit_predict(data)

#print labels
# Now that we have our color histograms clustered, we need to grab the unique IDs for each
# cluster. This is handled by making a call to np.unique, which returns the unique values inside
# a list. For each unique label, we need to grab the image paths that belong to that cluster. And
# for each of the images that belong to the current cluster, we load and display the image to our
# screen.
# loop over the unique labels
for label in np.unique(labels):
# grab all image paths that are assigned to the current label
labelPaths = imagePaths[np.where(labels == label)]

# loop over the image paths that belong to the current label
for (i, path) in enumerate(labelPaths):
# load the image and display it
image = cv2.imread(path)
cv2.imshow("Cluster {}, Image #{}".format(label + 1, i + 1), image)

# wait for a keypress and then close all open windows


cv2.waitKey(0)
cv2.destroyAllWindows()

Step 2: Save the code as "cluster_histograms.py"

Step 3: Run the python script (cluster_histograms.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python cluster_histograms.py --dataset dataset

Inference:


LESSON 2.3: LOCAL BINARY PATTERNS (LBP)


Local Binary Patterns are used to characterize the texture and pattern of an image/object in an
image.
LBPs compute a local representation of texture.
This local representation is performed by comparing each pixel with its surrounding neighborhood of
pixel values.
LBPs are implemented in both mahotas and scikit-image.
Both these implementations work well; however, I prefer the scikit-image implementation which is (1)
easier to use and (2) implements recent extensions to LBPs which further improves rotation invariance,
leading to higher accuracy and smaller feature vector sizes.
The first step in constructing a LBP texture descriptor is to
convert the image to grayscale.
For each pixel in the grayscale image, we select a
neighborhood of size r surrounding the center pixel.
A LBP value is then calculated for this center pixel and stored
in an output 2D array with the same width and height as our
input image.
For example, consider an 8-pixel neighborhood surrounding a center pixel, and threshold the center
pixel against its neighborhood of 8 pixels.
If the intensity of the center pixel is greater-than-or-equal to its neighbor, then we set the value to 1;
otherwise, we set it to 0.
With 8 surrounding pixels, we have a total of 2^8 = 256 possible combinations of LBP codes.
The 8-bit binary neighborhood of the central pixel is converted into a decimal representation.
The calculated value is stored in an output array with the same width and height as the original image.
A LBP is considered to be uniform if it has at most two 0-1 or 1-0 transitions.
 For example, the patterns 00001000 (2 transitions) and 10000000 (1 transition) are both considered
uniform patterns since they contain at most two 0-1 or 1-0 transitions.
01010010 (6 transitions), on the other hand, is not considered a uniform pattern since it has six 0-1 or 1-0 transitions.
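To make the definition of a uniform pattern concrete, here is a tiny helper (an illustration only, not part of scikit-image) that counts the 0-1 and 1-0 transitions in a bit string exactly as in the examples above:

def count_transitions(pattern):
    # count the number of adjacent 0-1 or 1-0 transitions in the bit string,
    # counted left to right just as in the examples above
    return sum(1 for (a, b) in zip(pattern, pattern[1:]) if a != b)

def is_uniform(pattern):
    # a pattern is uniform if it contains at most two such transitions
    return count_transitions(pattern) <= 2

for pattern in ("00001000", "10000000", "01010010"):
    print("{}: {} transitions, uniform: {}".format(
        pattern, count_transitions(pattern), is_uniform(pattern)))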
There are two primary benefits of this original LBP algorithm proposed by Ojala et al.
The first benefit is that examining the simple 3 x 3 neighborhood is extremely fast and efficient — it
only requires a simple thresholding test and very quick bit operations.
The second benefit is that, working at such a small scale, we are able to capture extremely fine-grained
details in the image.


However, being able to capture details at a small scale is also the biggest drawback of the
algorithm: we cannot capture details at varying scales, only the fixed 3 x 3 scale.
To handle this, an extension to the original LBP implementation was proposed to handle variable
neighborhood sizes.
To account for variable neighborhood sizes, two parameters were introduced:
1. The number of points p in a circularly symmetric neighborhood to consider (thus removing
relying on a square neighborhood).
2. The radius of the circle r, which allows us to account for different scales.

It's also important to keep in mind the effect of both the radius r and the number of points p.
The more points p you sample, the more patterns you can encode, but at the same time you increase
your computational cost.
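A short sketch (using scikit-image, with a placeholder filename and example parameter values) shows how the number of points p controls the length of the resulting feature vector, since the uniform method yields p + 2 histogram bins:

# import the necessary packages
from skimage import feature
import numpy as np
import cv2

# load an image and convert it to grayscale (the filename is only a placeholder)
gray = cv2.cvtColor(cv2.imread("example.png"), cv2.COLOR_BGR2GRAY)

# compute uniform LBPs at two different scales
for (p, r) in ((8, 1), (24, 8)):
    lbp = feature.local_binary_pattern(gray, p, r, method="uniform")

    # the "uniform" method produces p + 2 possible values, so the histogram
    # (and therefore the feature vector) has p + 2 entries
    (hist, _) = np.histogram(lbp.ravel(), bins=np.arange(0, p + 3))
    print("p={}, r={} -> {}-d feature vector".format(p, r, len(hist)))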

Experiment 36: Local Binary Pattern (LBP)

Program 36: Mini fashion search engine

Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
from imutils import paths
import numpy as np
import argparse
import cv2
from skimage import feature

# construct the argument parse and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True, help="path to the dataset of shirt images")
ap.add_argument("-q", "--query", required=True, help="path to the query image")
args = vars(ap.parse_args())

# initialize the local binary patterns descriptor and initialize the index dictionary where the image
# filename is the key and the features are the value
# define a dictionary called index, where the key to the dictionary is the unique shirt image
# filename and the value is the extracted LBPs. We'll be using this dictionary to store our
# extracted features and aid us in comparing the query image to our dataset.
index = {}
radius=8
numPoints=24

def describe(image, eps=1e-7):


# compute the Local Binary Pattern representation of the image, and then use the LBP
# representation to build the histogram of patterns
lbp = feature.local_binary_pattern(image,numPoints, radius, method="uniform")
(hist, _) = np.histogram(lbp.ravel(), bins=range(0, numPoints + 3),
range=(0, numPoints + 2))
# normalize the histogram
hist = hist.astype("float")
hist /= (hist.sum() + eps)

# return the histogram of Local Binary Patterns


return hist

# loop over the shirt images


# We simply loop over the images, extract the LBPs, and update the index dictionary.
for imagePath in paths.list_images(args["dataset"]):
# load the image, convert it to grayscale, and describe it
image = cv2.imread(imagePath)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
hist = describe(gray)

# update the index dictionary


filename = imagePath[imagePath.rfind("/") + 1:]
index[filename] = hist

# load the query image and extract Local Binary Patterns from it
query = cv2.imread(args["query"])
queryFeatures = describe(cv2.cvtColor(query, cv2.COLOR_BGR2GRAY))

# show the query image and initialize the results dictionary


cv2.imshow("Query", query)
results = {}

# loop over the index


for (k, features) in index.items():
# compute the chi-squared distance between the current features and the query
# features, then update the dictionary of results
# The chi-squared distance is an excellent choice for this problem as it's well suited for
# comparing histograms. Smaller distance indicates higher similarity.
d = 0.5 * np.sum(((features - queryFeatures) ** 2) / (features + queryFeatures + 1e-10))
results[k] = d
# sort the results
# keeping the 3 most similar results
results = sorted([(v, k) for (k, v) in results.items()])[:3]


# loop over the results


for (i, (score, filename)) in enumerate(results):
# show the result image
print("#%d. %s: %.4f" % (i + 1, filename, score))
image = cv2.imread(args["dataset"] + "/" + filename)
cv2.imshow("Result #{}".format(i + 1), image)
cv2.waitKey(0)

Step 2: Save the code as "search_shirts.py"

Step 3: Run the python script (search_shirts.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python search_shirts.py --dataset shirts --query queries/query_01.jpg

Inference:


LESSON 2.4: HISTOGRAM OF ORIENTED GRADIENTS (HOG)


HOG descriptors are mainly used in computer vision and machine learning for object detection.
However, we can also use HOG descriptors for quantifying and representing both shape and texture.
HOG has five stages namely,
1. Normalizing the image prior to description.
2. Computing gradients in both the x and y directions.
3. Obtaining weighted votes in spatial and orientation cells.
4. Contrast normalizing overlapping spatial cells.
5. Collecting all Histograms of Oriented gradients to form the final feature vector.
The most important parameters for the HOG descriptor are orientations, pixels_per_cell, and
cells_per_block.
These three parameters (along with the size of the input image) effectively control the dimensionality
of the resulting feature vector.
In most real-world applications, HOG is used in conjunction with a Linear SVM to perform object
detection.
The reason HOG is utilized so heavily is because local object appearance and shape can be
characterized using the distribution of local intensity gradients.
However, since HOG captures local intensity gradients and edge directions, it also makes for a good
texture descriptor.
The HOG descriptor returns a real-valued feature vector.
HOG is implemented in both OpenCV and scikit-image.
The OpenCV implementation is less flexible than the scikit-image implementation, and thus we will
primarily use the scikit-image implementation.
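As a minimal sketch of the scikit-image interface (the filename, the 128 x 128 size, and the parameter values are only common examples, not a recommendation), the three parameters above are passed directly to the hog function and, together with the input size, determine the length of the returned feature vector:

# import the necessary packages
from skimage import feature
import cv2

# load an image, convert it to grayscale, and resize it to a fixed size
# (the filename and the 128 x 128 size are only placeholders)
gray = cv2.cvtColor(cv2.imread("example.png"), cv2.COLOR_BGR2GRAY)
gray = cv2.resize(gray, (128, 128))

# compute the HOG feature vector: 9 orientation bins, 8 x 8 pixel cells,
# and 2 x 2 cell blocks
H = feature.hog(gray, orientations=9, pixels_per_cell=(8, 8),
    cells_per_block=(2, 2))

# for a 128 x 128 input this works out to 15 x 15 blocks x (2 x 2 x 9) = 8100 values
print("HOG feature vector length: {}".format(H.shape[0]))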

How do HOG descriptors work?


The cornerstone of the HOG descriptor algorithm is that appearance of an object can be modeled by
the distribution of intensity gradients inside rectangular regions of an image:


Implementing this descriptor requires dividing the image into small connected regions called cells,
and then for each cell, computing a histogram of oriented gradients for the pixels within each cell.
We can then accumulate these histograms across multiple cells to form our feature vector.

Step 1: NORMALIZING THE IMAGE PRIOR TO DESCRIPTION


This normalization step is entirely optional, but in some cases this step can improve performance of
the HOG descriptor. There are three main normalization methods that we can consider:
1. Gamma/power law normalization: In this case, we take the log(p) of each pixel p in the input image.
2. Square-root normalization: Here, we take the sqrt(p) of each pixel p in the input image. Square-root
normalization compresses the input pixel intensities far less than gamma normalization.
3. Variance normalization: A slightly less used form of normalization is variance normalization. Here,
we compute both the mean µ and standard deviation σ of the input image. All pixels are mean-centered
by subtracting the mean from the pixel intensity, and then normalized by dividing by the standard
deviation: p' = (p - µ) / σ.
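A compact NumPy sketch of these three options (operating on a float copy of a grayscale image; the filename is a placeholder) might look like this:

# import the necessary packages
import numpy as np
import cv2

# load a grayscale image and convert it to a float array (placeholder filename)
gray = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE).astype("float64")

# gamma/power law (log) normalization -- add 1 to avoid taking log(0)
logNorm = np.log(gray + 1)

# square-root normalization, which compresses intensities less aggressively
sqrtNorm = np.sqrt(gray)

# variance normalization: mean-center, then divide by the standard deviation
varNorm = (gray - gray.mean()) / gray.std()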

Step 2: GRADIENT COMPUTATION


The first actual step in the HOG descriptor is to compute the image gradient in both
the x and y direction.
We will apply a convolution operation to obtain the gradient images:
Gx = I * Dx and Gy = I * Dy
where I is the input image, Dx is our filter in the x-direction, and Dy is our filter in the y-direction.
Now that we have our gradient images, we can compute the final gradient magnitude representation
of the image:
|G| = √(Gx² + Gy²)
Finally, the orientation of the gradient for each pixel in the input image can then be computed by:
θ = arctan(Gy / Gx)


Given both |G| and θ, we can now compute a histogram of oriented gradients, where the bin of the
histogram is based on θ and the contribution or weight added to a given bin of the histogram is based
on |G|.
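As a minimal sketch of this step (the call to cv2.Sobel and the test image name are assumptions here, not part of the lesson's code):

# minimal sketch: per-pixel gradient magnitude |G| and orientation θ
import cv2

gray = cv2.cvtColor(cv2.imread("next.png"), cv2.COLOR_BGR2GRAY)

# gradients in the x and y directions
gX = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
gY = cv2.Sobel(gray, cv2.CV_64F, 0, 1)

# |G| = sqrt(gX^2 + gY^2) and θ = arctan(gY / gX), reported here in degrees
(magnitude, orientation) = cv2.cartToPolar(gX, gY, angleInDegrees=True)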

Step 3: WEIGHTED VOTES IN EACH CELL


Now that we have our gradient magnitude and orientation representations, we need to divide our
image into cells and blocks.
A "cell" is a rectangular region defined by the number of pixels that belong in each cell.
For example, if we had a 140 x 140 image and defined our pixels_per_cell as 4 x 4, we would thus
have 35 x 35 = 1225 cells:

If we defined our pixels_per_cell as 28 x 28, we would have 5 x 5 = 25 total cells:

Now, for each of the cells in the image, we need to construct a histogram of oriented gradients using
our gradient magnitude |G| and orientation θ mentioned above.
But before we construct this histogram, we need to define our number of orientations.
The number of orientations controls the number of bins in the resulting histogram.
The gradient angle is either within the range [0, 180] (unsigned) or [0, 360] (signed).
Finally, each pixel contributes a weighted vote to the histogram; the weight of the vote is simply the
gradient magnitude |G| at the given pixel.
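A toy sketch of one such cell histogram (the 4 x 4 cell and its random values are made up purely for illustration):

# toy sketch: a 9-bin, unsigned [0, 180) orientation histogram for one cell,
# where each pixel votes with a weight equal to its gradient magnitude
import numpy as np

orientation = np.random.uniform(0, 180, (4, 4))   # per-pixel angles (degrees)
magnitude = np.random.uniform(0, 255, (4, 4))     # per-pixel gradient magnitudes

(hist, edges) = np.histogram(orientation, bins=9, range=(0, 180),
    weights=magnitude)
print(hist)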


Step 4: CONTRAST NORMALIZATION OVER BLOCKS


To account for changes in illumination and contrast, we can normalize the gradient values locally.
This requires grouping the "cells" together into larger, connected "blocks".
It is common for these blocks to overlap, meaning that each cell contributes to the final feature vector
more than once.
Here is an example where we have taken an input region of an image, computed a gradient histogram
for each cell, and then locally grouped the cells into overlapping blocks.
For each of the cells in the current block, we concatenate their corresponding gradient histograms,
followed by either L1 or L2 normalizing the entire concatenated feature vector.
Finally, after all blocks are normalized, we take the resulting histograms, concatenate them, and treat
them as our final feature vector.
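A toy sketch of one block's contribution (2 x 2 cells with 9-bin histograms, L2-normalized; the values are random placeholders):

# toy sketch: concatenate the four 9-bin cell histograms of a 2 x 2 block,
# then L2-normalize (eps guards against division by zero)
import numpy as np

cell_hists = np.random.uniform(0, 1, (2, 2, 9))
block = cell_hists.flatten()
block = block / np.sqrt(np.sum(block ** 2) + 1e-7)

print(block.shape)   # (36,) -- one normalized block of the final feature vector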

Step 5: COLLECTING ALL HISTOGRAMS OF ORIENTED GRADIENTS TO FORM THE FINAL FEATURE VECTOR

Experiment 37: Histogram Of Oriented Gradients (HOG)

Program 37: Identifying car logos using HOG descriptors


Dataset: Our car logo dataset consists of five brands of vehicles: Audi, Ford, Honda, Subaru,
and Volkswagen.
The goal of this project is to:
1. Extract HOG features from our training set to characterize and quantify each car logo.
2. Train a machine learning classifier to distinguish between each car logo.
3. Apply a classifier to recognize new, unseen car logos.

Step 1: Write the code in Text Editor


# import the necessary packages
from sklearn.neighbors import KNeighborsClassifier
from skimage import exposure
from skimage import feature
from imutils import paths
import argparse


import imutils
import cv2

# construct the argument parse and parse command line arguments


ap = argparse.ArgumentParser()
ap.add_argument("-d", "--training", required=True, help="Path to the logos training dataset")
ap.add_argument("-t", "--test", required=True, help="Path to the test dataset")
args = vars(ap.parse_args())

# initialize the data matrix and labels


# initialize data and labels , two lists that will hold the HOG features and car brand name for
# each image in our training set, respectively.
print "[INFO] extracting features..."
data = []
labels = []

# loop over the image paths in the training set


# image path looks like this: car_logos/audi/audi_01.png
# we are able to extract the make of the car by splitting the path and extracting the second sub-
# directory name, or in this case audi
for imagePath in paths.list_images(args["training"]):
# extract the make of the car
make = imagePath.split("/")[-2]

# load the image, convert it to grayscale, and detect edges


image = cv2.imread(imagePath)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edged = imutils.auto_canny(gray)

# find contours in the edge map, keeping only the largest one, presumed to be the car logo
(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
c = max(cnts, key=cv2.contourArea)

# take the largest contour region, compute the bounding box, and extract the ROI.
# extract the logo of the car and resize it to a canonical width and height
# having various widths and heights for your image can lead to HOG feature vectors of different
# sizes — in nearly all situations this is not the intended behavior that you want.
# Remember, our extracted feature vectors are supposed to characterize and represent the
# visual contents of an image. And if our feature vectors are not the same dimensionality, then
# they cannot be compared for similarity. And if we cannot compare our feature vectors for
# similarity, we are unable to compare our two images at all.


# Because of this, when extracting HOG features from a dataset of images, you‘ll want to define
# a canonical, known size that each image will be resized to. In many cases, this means that
# you‘ll be throwing away the aspect ratio of the image. Normally, destroying the aspect ratio of
# an image should be avoided — but in this case we are happy to do it, because it ensures (1)
# that each image in our dataset is described in a consistent manner, and (2) each feature
# vector is of the same dimensionality.
# our logo is resized to a known, predefined 200 x 100 pixels
(x, y, w, h) = cv2.boundingRect(c)
logo = gray[y:y + h, x:x + w]
logo = cv2.resize(logo, (200, 100))

# extract Histogram of Oriented Gradients from the logo


H = feature.hog(logo, orientations=9, pixels_per_cell=(10, 10), cells_per_block=(2, 2),
transform_sqrt=True)

# Finally, given the HOG feature vector, we then update our data matrix and labels list with the
# feature vector and car make, respectively.
data.append(H)
labels.append(make)

# Given our data and labels we can now train our classifier
# To recognize and distinguish the difference between our five car brands, we are going to use
# scikit-learns KNeighborsClassifier.
# The k-nearest neighbor classifier is a type of ―lazy learning‖ algorithm where nothing is
# actually ―learned‖. Instead, the k-Nearest Neighbor (k-NN) training phase simply accepts a set
# of feature vectors and labels and stores them — that‘s it! Then, when it is time to classify a
# new feature vector, it accepts the feature vector, computes the distance to all stored feature
# vectors (normally using the Euclidean distance, but any distance metric or similarity metric can
# be used), sorts them by distance, and returns the top k ―neighbors‖ to the input feature vector.
# From there, each of the k neighbors vote as to what they think the label of the classification is.
# In our case, we are simply passing the HOG feature vectors and labels to our k-NN algorithm
# and ask it to report back what is the closest logo to our query features using k=1 neighbors.
print "[INFO] training classifier..."
model = KNeighborsClassifier(n_neighbors=1)
model.fit(data, labels)
print "[INFO] evaluating..."

# loop over the test dataset


for (i, imagePath) in enumerate(paths.list_images(args["test"])):
# load the test image, convert it to grayscale, and resize it to the canonical size
image = cv2.imread(imagePath)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
logo = cv2.resize(gray, (200, 100))


# extract Histogram of Oriented Gradients from the test image and predict the make of
# the car
# call to our k-NN classifier, passing in our HOG feature vector for the current testing
# image and asking the classifier what it thinks the logo is.
(H, hogImage) = feature.hog(logo, orientations=9, pixels_per_cell=(10, 10),
cells_per_block=(2, 2), transform_sqrt=True, visualise=True)
pred = model.predict(H.reshape(1, -1))[0]

# visualize the HOG image


hogImage = exposure.rescale_intensity(hogImage, out_range=(0, 255))
hogImage = hogImage.astype("uint8")
cv2.imshow("HOG Image #{}".format(i + 1), hogImage)

# draw the prediction on the test image and display it


cv2.putText(image, pred.title(), (10, 35), cv2.FONT_HERSHEY_SIMPLEX, 1.0,
(0, 255, 0), 3)
cv2.imshow("Test Image #{}".format(i + 1), image)
cv2.waitKey(0)

Step 2: Save the code as " recognize_car_logos.py "

Step 3: Run the python script (recognize_car_logos.py ) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python recognize_car_logos.py --training car_logos --test test_images

Inference: Of course, this approach only worked because we had a tight cropping of the car logo. If we
had described the entire image of a car, it is very unlikely that we would have been able to correctly
classify the brand. But again, that's something we can resolve when we get to the Custom Object
Detector, specifically sliding windows and image pyramids.


LESSON 2.5: KEYPOINT DETECTORS


FAST:
FAST is used to detect corners in images and it is most applicable to real-time applications or
resource constrained devices.
The idea behind the FAST keypoint detector is that, for a pixel to be considered a "corner", there must
be at least n contiguous pixels along a circular perimeter with radius r that are all either brighter or
darker than the center pixel by a threshold t.
Here, we are considering a circle of 16 pixels (which corresponds to a radius of r=3) surrounding the
center pixel. For this center pixel p to be considered a keypoint, there must be n contiguous pixels that
are brighter or darker than the central pixel by some threshold t.
In practice, it is common to select a radius of r=3 pixels, which corresponds to a circle of 16 pixels. It is
also common to choose n, the number of contiguous pixels, to be either n=9 or n=12.
As a worked example, consider whether the center pixel p should be marked as a keypoint or not.
The center pixel p has a grayscale intensity value of p=32. For this pixel to be considered a keypoint, it
must have n=12 contiguous pixels along the boundary of the circle that are all either brighter than p + t
or darker than p - t. Let's assume that t=16 for this example.
As we can see, there are only 8 contiguous pixels that are darker (marked with green rectangles, all
others as red rectangles) than the center pixel; thus, the center pixel is not a keypoint.
Let's take a look at another example: here, we can see there are n=14 contiguous pixels that are lighter
than the center pixel. Thus, this pixel p is indeed a keypoint.
Even though the FAST keypoint detector is very simple, it is still heavily
used in the computer vision world, especially for real-time applications.
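A toy sketch of this segment test, using made-up circle intensities that mirror the p=32, t=16 example above:

# toy sketch of the FAST segment test: are there at least n contiguous circle
# pixels all brighter than p + t or all darker than p - t? (values are made up)
def fast_segment_test(p, circle, t=16, n=12):
    brighter = [v > p + t for v in circle]
    darker = [v < p - t for v in circle]

    # double the list so a contiguous run may wrap around the circle boundary
    for flags in (brighter, darker):
        run = 0
        for f in flags + flags:
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False

circle = [40, 38, 37, 36, 35, 39, 41, 10, 11, 12, 9, 8, 10, 11, 13, 45]
print(fast_segment_test(32, circle))   # False -- only 8 contiguous darker pixels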


Experiment 38: FAST Keypoint Detection

Program 38:

Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
import numpy as np
import cv2

# load the image and convert it to grayscale


image = cv2.imread("next.png")
orig = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect FAST keypoints in the image


detector = cv2.FeatureDetector_create("FAST")
kps = detector.detect(gray)
print("# of keypoints: {}".format(len(kps)))

# loop over the keypoints and draw them


for kp in kps:
r = int(0.5 * kp.size)
(x, y) = np.int0(kp.pt)
cv2.circle(image, (x, y), r, (0, 255, 255), 2)

# show the image


cv2.imshow("Images", np.hstack([orig, image]))
cv2.waitKey(0)

Step 2: Save the code as " detect_fast.py"

Step 3: Run the python script (detect_fast.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_fast.py
Inference:


HARRIS:
The Harris detector is one of the most common corner detectors that you'll encounter in the computer
vision world.
It is quite fast (not as fast as the FAST keypoint detector), but more accurately marks regions as
corners.
The Harris keypoint detector is heavily rooted in linear algebra; however, the most intuitive way to
understand the detector is to take a look at the following figure:

On the left, we have the original image that we want to detect keypoints on.
The middle image represents the gradient magnitude in the x direction. Finally, the right image
represents the gradient magnitude in the y direction.
Here, we have a simple 2 x 2 pixel region.
The top-left and bottom-right pixels are black, and the top-right and bottom-left pixels are white.
At the center of these pixels, we thus have a corner (denoted as the red circle).
So how can we algorithmically define this region as a corner?
Simple! We'll just take the summation of the squared gradient values in the region in both the x and y
direction, respectively: Σ(Gx)² and Σ(Gy)².
If both these values are sufficiently "large", then we can define the region as a corner.
This process is done for every pixel in the input image.
This method works because the region enclosed inside the red circle will have a high number of both
horizontal and vertical gradients.
To extend this method to arbitrary corners, we first need to (1) compute the gradient magnitude
representation of an image, and (2) then use these gradient magnitude representations to construct a
matrix M:

M = [ Σ(Gx)²     Σ(GxGy) ]
    [ Σ(GxGy)    Σ(Gy)²  ]


Now that M is defined, we can take the eigenvalue decomposition of the matrix, leaving us with a
"score" indicating the "cornerness" (i.e. a value to quantify and score how much of a "corner" the
region is):

R = det(M) - k (trace(M))² = λ1 λ2 - k (λ1 + λ2)²

where det(M) = λ1 λ2, trace(M) = λ1 + λ2, k is a small empirical constant, and λ1 and λ2 are the
eigenvalues of the matrix M.
Again, this process is done for each and every pixel in the input image.
So now that we have these eigenvalues, how do we "know" if a region is actually a corner or not?
We can use the following list of possible values to help us determine if a region is a keypoint or not:
1. If |R| is small, which happens when both λ1 and λ2 are small, then we are examining a "flat" region
of the image. Thus, the region is not a keypoint.
2. If R < 0, which happens when λ1 >> λ2 or λ2 >> λ1, then the region is an "edge". Again, the region
is not a keypoint.
3. The only time the region can be considered a keypoint is when R is large, which corresponds to λ1
and λ2 both being large and approximately equal. If this holds, then the region is indeed a keypoint.

The following graphic will help depict this idea:
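In code, a minimal sketch of this cornerness score might look like the following (the Sobel gradients, the 3 x 3 averaging window, and the sensitivity constant k = 0.04 are all assumptions used for illustration):

# minimal sketch: Harris cornerness score R = det(M) - k * (trace(M))^2
import cv2

gray = cv2.cvtColor(cv2.imread("next.png"), cv2.COLOR_BGR2GRAY).astype("float64")
gX = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
gY = cv2.Sobel(gray, cv2.CV_64F, 0, 1)

# the entries of M, averaged (box-filtered) over a 3 x 3 neighborhood
Sxx = cv2.boxFilter(gX * gX, -1, (3, 3))
Syy = cv2.boxFilter(gY * gY, -1, (3, 3))
Sxy = cv2.boxFilter(gX * gY, -1, (3, 3))

# R is large and positive only where both eigenvalues of M are large
R = (Sxx * Syy - Sxy ** 2) - 0.04 * ((Sxx + Syy) ** 2)
print("strongest corner response: {:.2f}".format(R.max()))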

Experiment 39: Harris Keypoint Detection

Program 39:

Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
import numpy as np
import cv2

# load the image and convert it to grayscale


image = cv2.imread("next.png")
orig = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


# detect Harris keypoints in the image


detector = cv2.FeatureDetector_create("HARRIS")
kps = detector.detect(gray)
print("# of keypoints: {}".format(len(kps)))

# loop over the keypoints and draw them


for kp in kps:
r = int(0.5 * kp.size)
(x, y) = np.int0(kp.pt)
cv2.circle(image, (x, y), r, (0, 255, 255), 2)

# show the image


cv2.imshow("Images", np.hstack([orig, image]))
cv2.waitKey(0)

Step 2: Save the code as " detect_harris.py"

Step 3: Run the python script (detect_harris.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_harris.py

Inference:
The Harris detector found 453 corners in our image, most of which correspond to the corners of the
keyboard, the corners on the author text, and the corners on the book logo.


DoG (Difference of Gaussian):


The DoG detector is used to detect "blob"-like regions in an image.
These blobs could be corners, edges, or combinations of the two.
This DoG detector is commonly called the SIFT keypoint detector; however, this is technically
incorrect.
The keypoint detector itself is called the Difference of Gaussian, or DoG.
The actual image descriptor takes the DoG keypoints and generates a feature vector for each one of
them. This image descriptor is called SIFT.
However, what really sets SIFT apart from other keypoint detectors (at the time, at least) was the
notion of scale space, where we wish to recognize an object (in this case, a book) no matter how close
or far away it appears:
Notice as we get farther and farther away from the book, the object appears to be smaller.
Conversely, the closer we get to the book, the larger the object appears to be.
The question becomes: how do we detect repeatable keypoints on these images, even as the
viewpoint, scale, and angle change?
The answer is by utilizing scale spaces inside the DoG keypoint detector, which allows us to find
"interesting" and repeatable regions of an image, even as the scale changes.

Step 1: Scale space images


The first step of the DoG keypoint detector is to generate the scale space images.
Here, we take the original image and create progressively blurred versions of it. We then halve the
size of the image and repeat.
Here's an example of a set of scale space images: images that are of the same size (columns) are
called octaves. Here, we detail five octaves of the image. Each octave has four images, with each
image in the octave becoming progressively more blurred using a Gaussian kernel.


Step 2: Difference of Gaussians


The second step, and how the DoG keypoint detector got its name, is to take the Difference of
Gaussians.
Here, we take two consecutive images in the octave and subtract them from each other. We then
move to the next two consecutive images in the octave and repeat the process.
Here is an example of constructing the Difference of Gaussians representation:
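A rough sketch of that subtraction for a single DoG layer (the image name and the σ values are assumptions, not taken from the lesson):

# rough sketch: one Difference of Gaussians layer, built by subtracting two
# consecutive blur levels of the same octave
import cv2

gray = cv2.cvtColor(cv2.imread("next.png"), cv2.COLOR_BGR2GRAY)

blur1 = cv2.GaussianBlur(gray, (0, 0), 1.6)          # sigma = 1.6
blur2 = cv2.GaussianBlur(gray, (0, 0), 1.6 * 1.6)    # the next, blurrier level

# subtract as floats so negative responses are not clipped away
dog = blur2.astype("float64") - blur1.astype("float64")
cv2.imshow("DoG layer", cv2.convertScaleAbs(dog))
cv2.waitKey(0)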

Step 3: Finding local maxima and minima


Now that we have constructed the difference of Gaussians, we can move on to the third step, which is
finding local maxima and minima in the DoG images.
So now for each pair of Difference of Gaussian images, we are going to detect local minima and
maxima.
Consider the pixel marked X in the following figure, along with its 8 surrounding neighbors.
This pixel X can be considered a "keypoint" if its pixel intensity value is larger or smaller than all of its
8 surrounding neighbors.
Furthermore, we'll apply this check to the layer above and below, so a total of 26 checks are made.
Again, if the pixel X is greater than or less than all 26 of its neighbors, then it can be considered a
keypoint.
Finally, we collect all pixels located as maxima and minima across all octaves and mark these as
keypoints.
The DoG detector is very good at detecting repeatable keypoints across images, even with substantial
changes to viewpoint angle. However, the biggest drawback of DoG is that it's not very fast and not
suitable for real-time applications.
Remember, OpenCV calls the DoG keypoint detector SIFT, but realize that it's actually DoG under the
hood.

Experiment 40: SIFT Keypoint Detection

Program 40:

Step 1: Write the code in Text Editor


# import the necessary packages


from __future__ import print_function
import numpy as np
import cv2

# load the image and convert it to grayscale


image = cv2.imread("next.png")
orig = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect Difference of Gaussian keypoints in the image


detector = cv2.FeatureDetector_create("SIFT")
kps = detector.detect(gray)
print("# of keypoints: {}".format(len(kps)))

# loop over the keypoints and draw them


for kp in kps:
r = int(0.5 * kp.size)
(x, y) = np.int0(kp.pt)
cv2.circle(image, (x, y), r, (0, 255, 255), 2)

# show the image


cv2.imshow("Images", np.hstack([orig, image]))
cv2.waitKey(0)

Step 2: Save the code as " detect_dog.py"

Step 3: Run the python script (detect_dog.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_dog.py

Inference:
Using the DoG detector, we have found 660 keypoints on the book. Notice how the keypoints have
varying size — this is due to the octaves that we have formed to detect local minima and maxima.


LOCAL INVARIANT DESCRIPTORS: SIFT


Local feature descriptors are broken down into two phases.
The first phase identifies interesting, salient regions of an image that should be described and
quantified. These regions are called keypoints and may correspond to edges, corners, or "blob"-like
structures of an image.
After identifying the set of keypoints in an image, we then need to extract and quantify the
region surrounding each keypoint. The feature vector associated with a keypoint is called
a feature or local feature since only the local neighborhood surrounding the keypoint is included in the
computation of the descriptor.
Now that we have keypoints detected using DoG, we can move on to the stage where we
actually describe and quantify the region of the image surrounding the keypoint.
The SIFT feature description algorithm requires a set of input keypoints.
Then, for each of the input keypoints, SIFT takes the 16 x 16 pixel region surrounding the center pixel
of the keypoint region.
From there, we divide the 16 x 16 pixel region into sixteen 4 x 4 pixel windows.

For each of the 16 windows, we compute the gradient magnitude and orientation, just like we did for
the HOG descriptor.
Given the gradient magnitude and orientation, we next construct an 8-bin histogram for each of the 4
x 4 pixel windows:

The amount added to each bin is dependent on the magnitude of the gradient.
However, we are not going to use the raw magnitude of the gradient.
Instead, we are going to utilize Gaussian weighting.
The farther the pixel is from the keypoint center, the less it contributes to the overall histogram.
Finally, the third step of SIFT is to collect all 16 of these 8-bin orientation histograms and concatenate
them together:


Given that we have 16 of these histograms, our feature vector is thus 16 x 8 = 128-d.
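A toy sketch of that bookkeeping (random gradients, and the Gaussian weighting is omitted for brevity):

# toy sketch: sixteen 4 x 4 windows from a 16 x 16 patch, one 8-bin orientation
# histogram per window (weighted by magnitude), concatenated into a 128-d vector
import numpy as np

orientation = np.random.uniform(0, 360, (16, 16))
magnitude = np.random.uniform(0, 1, (16, 16))

descriptor = []
for i in range(0, 16, 4):
    for j in range(0, 16, 4):
        (hist, _) = np.histogram(orientation[i:i + 4, j:j + 4], bins=8,
            range=(0, 360), weights=magnitude[i:i + 4, j:j + 4])
        descriptor.extend(hist)

print(len(descriptor))   # 16 windows x 8 bins = 128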

Experiment 41: SIFT Feature Descriptor

Program 41:

Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
import argparse
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# initialize the keypoint detector and local invariant descriptor


# Just pass in the name SIFT to the cv2.DescriptorExtractor_create function, and OpenCV will
# instantiate the object for us.
detector = cv2.FeatureDetector_create("SIFT")
extractor = cv2.DescriptorExtractor_create("SIFT")

# load the input image, convert it to grayscale, detect keypoints, and then
# extract local invariant descriptors
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kps = detector.detect(gray)
(kps, descs) = extractor.compute(gray, kps)

# show the shape of the keypoints and local invariant descriptors array
print("[INFO] # of keypoints detected: {}".format(len(kps)))
print("[INFO] feature vector shape: {}".format(descs.shape))

Step 2: Save the code as " extract_sift.py"

Step 3: Run the python script (extract_sift.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python extract_sift.py --image next.png

Inference:
Again, it‘s important to note that unlike global image descriptors such as Local Binary
Patterns, Histogram of Oriented Gradients, or Haralick texture (where we have only one feature vector


extracted per image), local descriptors return N feature vectors per image, where N is the number of
detected keypoints. This implies that given N detected keypoints in our input image, we‘ll obtain N x
128-d feature vectors after applying SIFT.


CHAPTER 3: BUILDING YOUR OWN CUSTOM OBJECT DETECTOR


LESSON 3.0: OBJECT DETECTION
Image classification algorithms can only give us a global labeling and categorization of an image.
They cannot provide local labeling of the image and tell us where the stop sign is, where the railroad
is, etc.
For a more granular classification of an image, such as identifying each of the "objects" in an image,
we need to perform object detection.
An object can be a chair, a person, or even a glass of water.
In general, any physical entity with a semi-rigid structure (meaning the object is not overly deformable
and cannot dramatically alter its shape) can be considered an "object".
Here are a few examples of objects from CALTECH-101, a
popular 101-category object detection benchmark dataset.

Object detection is used in your everyday life, whether you realize it or not. For example, detecting the
presence of faces in images is a form of object detection.
A face represents a rigid, predictable object structure and pattern: two eyes, two ears on either side, a
nose below the eyes, lips below the nose, and a chin below the lips.
Since nearly all faces share these traits, we thus have a common structure and pattern that we can
detect in images.
Face detection is used all the time, but you're probably most familiar with the implementation in your
digital camera or smartphone, where face detection can be used to perform auto-focusing to ensure
the face(s) are clear in the shot.
We can also use object detection in security systems, where we track people in video feeds and
monitor their activity.
Another great real-world implementation of object detection is automated vehicle parking garages,
where computer vision techniques can be used to detect if a parking spot is open or not.
A good object detector should be robust to changes in these properties (viewpoint, scale, deformation,
occlusion, illumination, background clutter, and intra-class variation) and still be able to detect the
presence of the object, even under less-than-ideal circumstances.

Experiment 42: Training your own object detector

Program 42: Write code to perform actual object recognition, specifically recognizing stop signs in
images.
The CALTECH-101 dataset is a very popular benchmark dataset for object detection and has been
used by many researchers, academics, and computer vision developers to evaluate their object
detection algorithms.
The dataset includes 101 categories, spanning a diverse range of objects including elephants,
bicycles, soccer balls, and even human brains, just to name a few.
When you download the CALTECH-101 dataset, you‘ll notice that it includes both an images and
annotations directory.
For each image in the dataset, an associated bounding box (i.e. the (x, y)-coordinates of the object) is
provided.
Our goal is to take both the images and the bounding boxes (i.e. the annotations) and train a classifier
to detect the presence of a given object in an image.
We are presented with not only the labels of the images, but also annotations corresponding to
the bounding box surrounding each object.
We‘ll take these bounding boxes, extract features from them, and then use these features to build our
object detector.

Step 1: Write the code in Text Editor


# import the necessary packages
# The loadmat function from scipy . The annotations/bounding boxes for the CALTECH-101
# dataset are actually .mat files which are Matlab files, similar to .cpickle files for Python and
# NumPy — they are simply the serialized bounding boxes for each image.
# We‘ll also be using scikit-image instead of OpenCV for our train_detector.py script, mainly
# because we‘ll also be using dlib which is built to play nice with scikit-image.
from __future__ import print_function
from imutils import paths
from scipy.io import loadmat
from skimage import io
import argparse
import dlib

# construct the argument parse and parse the arguments


# Looking at our argument parsing section, you‘ll notice that we need three switches:


# --class : This is the path to our specific CALTECH-101 class that we want to train an object
# detector for. For this example, we‘ll be using the stop sign class.
#--annotations : For each image in the dataset, we also have the corresponding bounding boxes
# for each object in the image — the --annotations switch specifies the path to our bounding
# boxes directly for the specific class we are training on.
#--output : After our model has been trained, we would like to dump it to file — this is the path to
# our output classifier.
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--class", required=True,help="Path to the CALTECH-101 class images")
ap.add_argument("-a", "--annotations", required=True, help="Path to the CALTECH-101 class
annotations")
ap.add_argument("-o", "--output", required=True, help="Path to the output detector")
args = vars(ap.parse_args())

# grab the default training options for our HOG + Linear SVM detector,
# initialize the images list to store the images we are using to train our classifier as well as
# initialize the boxes list to store the bounding boxes for each of the images.
print("[INFO] gathering images and bounding boxes...")
options = dlib.simple_object_detector_training_options()
images = []
boxes = []

# loop over the image paths


for imagePath in paths.list_images(args["class"]):
# extract the image ID from the image path and load the annotations file
imageID = imagePath[imagePath.rfind("/") + 1:].split("_")[1]
imageID = imageID.replace(".jpg", "")
p = "{}/annotation_{}.mat".format(args["annotations"], imageID)
annotations = loadmat(p)["box_coord"]

# loop over the annotations and add each annotation to the list of bounding boxes
bb = [dlib.rectangle(left=long(x), top=long(y), right=long(w), bottom=long(h))
for (y, h, x, w) in annotations]
boxes.append(bb)

# add the image to the list of images


images.append(io.imread(imagePath))

# train the object detector


print("[INFO] training detector...")
detector = dlib.train_simple_object_detector(images, boxes, options)

# dump the classifier to file


print("[INFO] dumping classifier to file...")


detector.save(args["output"])

# visualize the results of the detector


win = dlib.image_window()
win.set_image(detector)
dlib.hit_enter_to_continue()

Step 2: Save the code as " train_detector.py"

Step 3: Run the python script (train_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python train_detector.py --class stop_sign_images --annotations stop_sign_annotations
--output output/stop_sign_detector.svm

Experiment 43: Testing your object detector

Program 43: We gathered 11 images of stop signs from Google that our classifier has not been trained on.

Step 1: Write the code in Text Editor


# import the necessary packages
from imutils import paths
import argparse
import dlib
import cv2

# construct the argument parse and parse the arguments


# In order to run our test_detector.py script, we‘ll need two switches.
# The first is the --detector , which is the path to our trained stop sign detector
# The second switch, --testing , is the path to the directory containing our stop sign images for
# testing.
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--detector", required=True, help="Path to trained object detector")
ap.add_argument("-t", "--testing", required=True, help="Path to directory of testing images")
args = vars(ap.parse_args())

# load the object detector from disk


detector = dlib.simple_object_detector(args["detector"])

# loop over the testing images


# For each of these testing images, we load it from disk and then use our classifier to detect the
# presence of stop signs. Our detector returns us a list of boxes, corresponding to the (x, y)-
# coordinates of the detected stop signs.


for testingPath in paths.list_images(args["testing"]):


# load the image and make predictions
# It‘s important to note that we take special care to convert our image from the BGR
# color space to the RGB color space before calling our detector . Remember, in
# our train_detector.py script, we used scikit-image which represents images in
# RGB order. But now that we are using OpenCV (which represents images in BGR
# order), we need to convert from BGR to RGB before calling the dlib object detector.
image = cv2.imread(testingPath)
boxes = detector(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# loop over the bounding boxes and draw them


for b in boxes:
(x, y, w, h) = (b.left(), b.top(), b.right(), b.bottom())
cv2.rectangle(image, (x, y), (w, h), (0, 255, 0), 2)

# show the image


cv2.imshow("Image", image)
cv2.waitKey(0)

Step 2: Save the code as " test_detector.py"

Step 3: Run the python script (test_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_detector.py --detector output/stop_sign_detector.svm --testing stop_sign_testing

Inference:


LESSON 3.1: IMAGE PYRAMIDS


Object detection systems are not easy to build — there are many components and moving parts that
we need to put into place.
To create our own custom object detection framework, we‘ll need to implement (at a bare minimum)
the following pipeline.
As we can see from the flowchart above, the first step when creating our custom object detection
framework is to implement the "scanning" functionality, which will enable us to find objects in images
at various sizes and locations.
This "scanner" can be broken into two components:
• Component #1: An image pyramid.
• Component #2: A sliding window.
An image pyramid is simply a multi-scale representation of an image.

Utilizing an image pyramid allows us to find objects in images at different scales of an image.
At the bottom of the pyramid, we have the original image at its original size (in terms of width and
height).
At each subsequent layer, the image is resized (sub-sampled) and optionally smoothed via Gaussian
blurring.
The image is progressively sub-sampled until some stopping criterion is met, which is normally a
minimum size being reached, indicating that no further sub-sampling needs to take place.
When combined with the second component of our ―scanner‖ the sliding window, we can find objects
in images at various locations.
As the name suggests, a sliding window ―slides‖ from left to right and top to bottom of each scale in
the image pyramid.


Again, by leveraging both image pyramids and sliding windows together, we are able to detect
objects at various locations and scales in an image.

Experiment 44: Image Pyramids

Program 44:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to the input image")
ap.add_argument("-s", "--scale", type=float, default=1.5, help="scale factor size")
args = vars(ap.parse_args())

# load the input image


# The second argument is the scale , which controls how much the image is resized at each
# layer. A small scale yields more layers in the pyramid, while a larger scale yields less layers.
# Next, we define the minSize , which is the minimum required width and height of the layer.
image = cv2.imread(args["image"])
def pyramid(image, scale=1.5, minSize=(30, 30)):
# yield the original image
yield image

# keep looping over the pyramid


while True:
# compute the new dimensions of the image and resize it
w = int(image.shape[1] / scale)
image = imutils.resize(image, width=w)

# if the resized image does not meet the supplied minimum size, then stop constructing the
# pyramid
if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
break

# yield the next image in the pyramid


yield image

# loop over the layers of the image pyramid and display them
for (i, layer) in enumerate(pyramid(image, scale=args["scale"])):


cv2.imshow("Layer {}".format(i + 1), layer)


cv2.waitKey(0)

Step 2: Save the code as " test_pyramid.py"

Step 3: Run the python script (test_pyramid.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_pyramid.py --image florida_trip.png --scale 1.5

Inference:


LESSON 3.2: SLIDING WINDOWS


Sliding windows play an integral role in object classification, as they allow us to localize
exactly where in an image an object resides.
Utilizing both a sliding window and an image pyramid, we are able to detect objects in images at
various scales and locations.
In the context of computer vision and machine learning, a sliding window is a rectangular region of
fixed width and height that "slides" across an image, such as in the following figure.
For each of these windows, we would take the window region, extract features from it, and apply an
image classifier to determine if the window contains an object that interests us (in this case, a face).
Combined with image pyramids, we can create image classifiers that can recognize objects at varying
scales and locations in an image.
These techniques, while simple, play an absolutely critical role in object detection and image
classification.

Experiment 45: Sliding Windows

Program 45:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to the input image")
ap.add_argument("-w", "--width", type=int, help="width of sliding window")
ap.add_argument("-t", "--height", type=int, help="height of sliding window")
ap.add_argument("-s", "--scale", type=float, default=1.5, help="scale factor size")
args = vars(ap.parse_args())

# The first is the image that we are going to loop over. The second argument is the stepSize .
# The stepSize indicates how many pixels we are going to ―skip‖ in both the (x, y) direction.
# Normally, we would not want to loop over each and every pixel of the image (i.e. stepSize=1 )
# as this would be computationally prohibitive if we were applying an image classifier at each
# window. In practice, it‘s common to use a stepSize of 4 to 8 pixels. Remember, the smaller
# your step size is, the more windows you‘ll need to examine. The larger your step size is, the
# less windows you‘ll need to examine — note, however, that while this will be computationally
# more efficient, you may miss detecting objects in images if your step size becomes too large.


# The last argument, windowSize , defines the width and height (in terms of pixels) of the
# window we are going to extract from our image .
def sliding_window(image, stepSize, windowSize):
# slide a window across the image
for y in xrange(0, image.shape[0], stepSize):
for x in xrange(0, image.shape[1], stepSize):
# yield the current window
# returns a tuple containing the x and y coordinates of the sliding
# window, along with the window itself.
yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

# the image pyramid generator from test_pyramid.py (Lesson 3.1) is needed here
# so that the sliding window can be run over every layer of the pyramid
def pyramid(image, scale=1.5, minSize=(30, 30)):
    # yield the original image, then progressively smaller layers
    yield image
    while True:
        w = int(image.shape[1] / scale)
        image = imutils.resize(image, width=w)
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break
        yield image

# load the input image and unpack the command line arguments
image = cv2.imread(args["image"])
(winW, winH) = (args["width"], args["height"])

# loop over the image pyramid


for layer in pyramid(image, scale=args["scale"]):
# loop over the sliding window for each layer of the pyramid
for (x, y, window) in sliding_window(layer, stepSize=32, windowSize=(winW, winH)):
# if the current window does not meet our desired window size, ignore it
if window.shape[0] != winH or window.shape[1] != winW:
continue

# This is where we would process the window, extract hog features, and
# apply a machine learning classifier to perform object detection
# since we do not have a classifier yet, let's just draw the window
clone = layer.copy()
cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
cv2.imshow("Window", clone)

# normally we would leave out this line, but let's pause execution
# of our script so we can visualize the window
cv2.waitKey(1)
time.sleep(0.025)

Step 2: Save the code as " test_sliding_window.py"

Step 3: Run the python script (test_sliding_window.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_sliding_window.py --image car.jpg --width 96 --height 36
Inference:


LESSON 3.3: TRAINING YOUR CUSTOM OBJECT DETECTOR


So far in this module, we have learned how to train a HOG + Linear SVM object detector on datasets
that already have labeled bounding boxes.
But what if we wanted to train an object detector on our own datasets that do not provide bounding
boxes?
How do we go about labeling our images and obtaining these bounding boxes?
And once we have these image annotations, how do we train our object detector?
In the remainder of this lesson, we‘ll be addressing each of these questions, starting by examining
dlib‘s imglab tool, which we can use to annotate our images by drawing bounding boxes surrounding
objects in our dataset.

Compiling and using the imglab tool


$ cd dlib-18.18/tools/imglab
$ mkdir build
$ cd build
$ cmake ..
$ cmake --build . --config Release

To run imglab , you need to supply two command line arguments over two separate commands:
• The first is your output annotations file, which will contain the bounding boxes you will manually draw
on each of the images in your dataset.
• The second argument is the dataset path, which contains the list of images in your dataset.

For this lesson, we‘ll be using a subset of the MIT + CMU Frontal Images dataset as our training data,
followed by a subset of the CALTECH Web Faces dataset for testing.

First, let‘s initialize our annotations file with a list of images in the dataset path:
$ ./imglab -c ~/Desktop/faces_annotations.xml ~/Desktop/faces

From there, we can start the annotation process by using the following command:
$ ./imglab ~/Desktop/faces_annotations.xml
As you can see, the imglab GUI is displayed on my screen, along with the images in my dataset of
faces.
To draw a bounding box surrounding each object in my dataset, I simply select an image, hold the shift
key on my keyboard, drag-and-draw the bounding rectangle, then release my mouse.
Note: It's important to label all examples of objects in an image; otherwise, dlib will implicitly assume
that regions not labeled are regions that should not be detected (i.e., hard-negative mining applied
during extraction time).
Finally, if there is an ROI that you are unsure about and want to be ignored entirely during the training
process, simply double click the bounding box and press the i key. This will cross out the bounding box
and mark it as "ignored".
While annotating a dataset of images is a time consuming and tedious task, you should nonetheless
take your time and take special care to ensure the images are properly labeled with their
respective bounding boxes.
Remember, machine learning algorithms are only as good as their input data — if you put garbage in,
you‘ll only get garbage out. But if you take the time to properly label your images, you‘ll get much better
results.

Experiment 46: Training your own custom object detector

Program 46:

Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
import argparse
import dlib

# construct the argument parser and parse the arguments


# Our train_detector.py script requires two command line arguments: the --xml path to where
# our face annotations live, followed by the --detector , the path to where we will store our
# trained classifier.
ap = argparse.ArgumentParser()
ap.add_argument("-x", "--xml", required=True, help="path to input XML file")
ap.add_argument("-d", "--detector", required=True, help="path to output director")
args = vars(ap.parse_args())

# grab the default training options for the HOG + Linear SVM detector, then
# train the detector -- in practice, the `C` parameter should be cross-validated
# we define the options to our dlib detector. The most important argument to set here is C , the
#―strictness‖ of our SVM. In practice, this value needs to be cross-validated and grid-searched
# to obtain optimal accuracy.
print("[INFO] training detector...")
options = dlib.simple_object_detector_training_options()
options.C = 1.0
options.num_threads = 4
options.be_verbose = True
dlib.train_simple_object_detector(args["xml"], args["detector"], options)

# show the training accuracy


print("[INFO] training accuracy: {}".format(
dlib.test_simple_object_detector(args["xml"], args["detector"])))


# load the detector and visualize the HOG filter


detector = dlib.simple_object_detector(args["detector"])
win = dlib.image_window()
win.set_image(detector)
dlib.hit_enter_to_continue()

Step 2: Save the code as " train_detector.py "

Step 3: Run the python script (train_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python train_detector.py --xml face_detector/faces_annotations.xml --detector
face_detector/detector.svm

Experiment 47: Testing our custom object detector

Program 47:

Step 1: Write the code in Text Editor


# import the necessary packages
from imutils import paths
import argparse
import dlib
import cv2

# construct the argument parser and parse the arguments


# two required command line arguments here: the path to our custom object --detector ,
# followed by the path to our --testing directory.
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--detector", required=True, help="Path to trained object detector")
ap.add_argument("-t", "--testing", required=True, help="Path to directory of testing images")
args = vars(ap.parse_args())

# load the detector from disk


detector = dlib.simple_object_detector(args["detector"])

# loop over the testing images


for testingPath in paths.list_images(args["testing"]):
# load the image and make predictions
image = cv2.imread(testingPath)
boxes = detector(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# loop over the bounding boxes and draw them


for b in boxes:
(x, y, w, h) = (b.left(), b.top(), b.right(), b.bottom())
cv2.rectangle(image, (x, y), (w, h), (0, 255, 0), 2)

# show the image


cv2.imshow("Image", image)
cv2.waitKey(0)

Step 2: Save the code as " test_detector.py "

Step 3: Run the python script (test_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_detector.py --detector face_detector/detector.svm --testing <path to testing images>

Inference:


CHAPTER 4: WORKING WITH RASPBERRY Pi

LESSON 4.1: HOME SURVEILLANCE AND MOTION DETECTION

In this lesson, we demonstrate how to build a home surveillance and motion detection system capable
of running in real time on your Raspberry Pi.
This motion detection system will monitor a particular area of your house (such as the front door for
motion).
When activity occurs, the frame that best captures and characterizes the motion (according to criteria
we'll define later) will be written to disk.
Once the frame has been written to disk, it becomes easy to apply any other type of API integration,
such as uploading the image to an online server, texting ourselves a picture of the intruder, or
uploading the image to Dropbox.
Background subtraction is critical in many computer vision applications. Applications of background
subtraction include counting the number of cars passing through a toll booth and counting the number
of people walking in and out of a store.
The background of our video stream is largely static and unchanging over consecutive frames of a
video. Therefore, if we can model the background, we can monitor it for substantial changes.
If there is a substantial change, we can detect it; this change normally corresponds to motion in our
video.
Now, obviously, in the real world this assumption can easily fail.
Due to shadowing, reflections, lighting conditions, and any other possible change in the environment,
our background can look quite different in various frames of a video.
And if the background appears to be different, it can throw our algorithms off.
That's why the most successful background subtraction/foreground detection systems utilize fixed,
mounted cameras in controlled lighting conditions.
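As a bare-bones sketch of that idea (the two image file names are placeholder assumptions, and both images are assumed to be the same size):

# bare-bones sketch: model the background with one frame, then flag substantial
# per-pixel changes in a later frame (the same differencing used in Program 48)
import cv2

background = cv2.cvtColor(cv2.imread("background.png"), cv2.COLOR_BGR2GRAY)
current = cv2.cvtColor(cv2.imread("current.png"), cv2.COLOR_BGR2GRAY)

delta = cv2.absdiff(background, current)
motionMask = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)[1]
print("changed pixels: {}".format(cv2.countNonZero(motionMask)))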

Experiment 48: Basic motion detection and tracking

Program 48:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import datetime
import imutils
import time
import cv2

# construct the argument parser and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video", help="path to the video file")


ap.add_argument("-a", "--min-area", type=int, default=500, help="minimum area size")


args = vars(ap.parse_args())

# if the video argument is None, then we are reading from webcam


if args.get("video", None) is None:
camera = cv2.VideoCapture(0)
time.sleep(0.25)

# otherwise, we are reading from a video file


else:
camera = cv2.VideoCapture(args["video"])

# initialize the first frame in the video stream


firstFrame = None

# loop over the frames of the video


while True:
# grab the current frame and initialize the occupied/unoccupied text
(grabbed, frame) = camera.read()
text = "Unoccupied"

# if the frame could not be grabbed, then we have reached the end of the video
if not grabbed:
break

# resize the frame, convert it to grayscale, and blur it


frame = imutils.resize(frame, width=500)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)

# if the first frame is None, initialize it


if firstFrame is None:
firstFrame = gray
continue

# compute the absolute difference between the current frame and first frame
frameDelta = cv2.absdiff(firstFrame, gray)
thresh = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)[1]

# dilate the thresholded image to fill in holes, then find contours on thresholded image
thresh = cv2.dilate(thresh, None, iterations=2)
(cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)


# loop over the contours


for c in cnts:
# if the contour is too small, ignore it
if cv2.contourArea(c) < args["min_area"]:
continue

# compute the bounding box for the contour, draw it on the frame and update the text
(x, y, w, h) = cv2.boundingRect(c)
cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
text = "Occupied"

# draw the text and timestamp on the frame

cv2.putText(frame, "Room Status: {}".format(text), (10, 20),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
cv2.putText(frame, datetime.datetime.now().strftime("%A %d %B %Y %I:%M:%S%p"),
(10, frame.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.35, (0, 0, 255), 1)

# show the frame and record if the user presses a key


cv2.imshow("Security Feed", frame)
cv2.imshow("Thresh", thresh)
cv2.imshow("Frame Delta", frameDelta)
key = cv2.waitKey(1) & 0xFF

# if the `q` key is pressed, break from the loop


if key == ord("q"):
break

# cleanup the camera and close any open windows


camera.release()
cv2.destroyAllWindows()

Step 2: Save the code as " motion_detector.py"

Step 3: Run the python script (motion_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python motion_detector.py --video videos/example_01.mp4

Inference:


LESSON 4.2: FACE DETECTION IN IMAGES

VIOLA-JONES ALGORITHM
Viola and Jones focus on detecting faces in images, but the framework can be used to train detectors
for arbitrary "objects", such as cars, buildings, kitchen utensils, and even bananas.
Recall when we discussed image kernels and how we slid a small matrix across our image from left-
to-right and top-to-bottom, computing an output value for each center pixel of the kernel?
Well, it turns out that this sliding window approach is also extremely useful in the context of detecting
objects in an image
In the figure above, we can see that we are sliding a fixed size window across our
image at multiple scales.
At each of these phases, our window stops, computes some features, and then
classifies the region as Yes, this region does contain a face, or No, this region does
not contain a face.
For each of the stops along the sliding window path, five rectangular features are
computed.
To obtain features for each of these five rectangular areas, we
simply subtract the sum of pixels under the white region from the
sum of pixel from the black region.
Interestingly enough, these features have actual real
importance in the context of face detection:
1. Eye regions tend to be darker than cheek regions.
2. The nose region is brighter than the eye region.
Therefore, given these five rectangular regions and their
corresponding difference of sums, we are able to form features
that can classify parts of a face.
Then, for an entire dataset of features, we use
the AdaBoost algorithm to select which ones correspond to facial regions of an image.
However, as you can imagine, using a fixed sliding window and sliding it across every (x, y)-
coordinate of an image, followed by computing these Haar-like features, and finally performing the
actual classification can be computationally expensive.
To combat this, Viola and Jones introduced the concept of cascades or stages.
At each stop along the sliding window path, the window must pass a series of tests where each
subsequent test is more computationally expensive than the previous one.
If any one test fails, the window is automatically discarded.
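
To make the rectangle-feature computation described above concrete, below is a minimal sketch (not the OpenCV implementation used in the experiments; the toy image, window position, and rectangle split are made-up values) of evaluating a single two-rectangle Haar-like feature with an integral image:

# minimal sketch: evaluate one two-rectangle Haar-like feature using an integral
# image (summed-area table); the window location and rectangle split are arbitrary
import numpy as np

def rect_sum(ii, x, y, w, h):
    # sum of the pixels inside the rectangle (x, y, w, h) via the integral image
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

image = np.random.randint(0, 256, (24, 24)).astype("float64")   # toy grayscale window
ii = np.zeros((25, 25))
ii[1:, 1:] = image.cumsum(axis=0).cumsum(axis=1)                # integral image, zero-padded

# two-rectangle feature: sum under the top ("black") half minus the bottom ("white") half
(x, y, w, h) = (6, 4, 8, 12)
feature = rect_sum(ii, x, y, w, h // 2) - rect_sum(ii, x, y + h // 2, w, h // 2)
print("feature value: {}".format(feature))

In practice you do not compute these features yourself: the pre-trained cascades shipped with OpenCV (used in the experiments below) already encode them.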

Experiment 49: Face Detection in Images

Program 49:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse


import cv2

# construct the argument parse and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-f", "--face", required=True, help="Path to where the face cascade resides")
ap.add_argument("-i", "--image", required=True, help="Path to where the image file resides")
args = vars(ap.parse_args())

# load the image and convert it to grayscale


image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# load the face detector and detect faces in the image


detector = cv2.CascadeClassifier(args["face"])
faceRects = detector.detectMultiScale(gray,scaleFactor=1.05, minNeighbors=5,
minSize=(30, 30), flags = cv2.cv.CV_HAAR_SCALE_IMAGE)
print "I found %d face(s)" % (len(faceRects))

# loop over the faces and draw a rectangle around each


for (x, y, w, h) in faceRects:
cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

# show the detected faces


cv2.imshow("Faces", image)
cv2.waitKey(0)

Step 2: Save the code as "detect_faces.py"

Step 3: Run the python script (detect_faces.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_faces.py --face cascades/haarcascade_frontalface_default.xml --image
images/messi.png

Inference:


Experiment 50: Face Detection in Video

Program 50:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-f", "--face", required=True, help="path to where the face cascade resides")
ap.add_argument("-v", "--video", help="path to the (optional) video file")
args = vars(ap.parse_args())

# load the face detector (the cascade path is supplied via the --face switch used in Step 3)
detector = cv2.CascadeClassifier(args["face"])

# if a video path was not supplied, grab the reference to the webcam
if not args.get("video", False):
camera = cv2.VideoCapture(0)

# otherwise, grab a reference to the video file


else:
camera = cv2.VideoCapture(args["video"])

# keep looping
while True:
# grab the current frame
(grabbed, frame) = camera.read()

# if we are viewing a video and we did not grab a frame, then we have
# reached the end of the video
if args.get("video") and not grabbed:
break

# resize the frame, convert it to grayscale, and detect faces in the frame
frame = imutils.resize(frame, width=400)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faceRects = detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5,
minSize=(30, 30), flags=cv2.cv.CV_HAAR_SCALE_IMAGE)

# loop over the faces and draw a rectangle around each


for (x, y, w, h) in faceRects:
cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)


# show the frame to our screen


cv2.imshow("Frame", frame)
key = cv2.waitKey(1) & 0xFF

# if the 'q' key is pressed, stop the loop


if key == ord("q"):
break

# clean up the camera and close any open windows


camera.release()
cv2.destroyAllWindows()

Step 2: Save the code as "detect_faces_video.py"

Step 3: Run the python script (detect_faces_video.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_faces_video.py --face cascades/haarcascade_frontalface_default.xml

Inference:


CHAPTER 5: IMAGE CLASSIFICATION AND MACHINE LEARNING


LESSON 5.1: IMAGE CLASSIFICATION
In order to understand the contents of an image, we must apply image classification, which is the task
of applying computer vision and machine learning algorithms to extract meaning from an image.
This could be as simple as assigning a label to what the image contains, or it could be as advanced
as interpreting the contents of an image and returning a human-readable sentence.

WHAT IS IMAGE CLASSIFICATION?


Image classification, at the very core, is the task of assigning a label to an image from a pre-
defined set of categories.
Practically, this means that given an input image, our task is to analyze the image and return a label
that categorizes the image.
This label is (almost always) from a pre-defined set. It is very rare that we see "open-ended" classification problems where the list of labels is infinite.
For example, let's assume that our set of possible categories includes: categories = {cat, cow, dog, horse, wolf}.
Then we present the following image to our classification system. Our goal here is to take this input image and assign a label to it from our categories set — in this case, dog.
Our classification system could also assign multiple labels to the image via probabilities, such as dog: 95%, wolf: 55%, cat: 3%, horse: 0%, cow: 0%.
More formally, given our input image of W x H pixels, with 3 channels, Red, Green, and Blue, respectively, our goal is to take the W x H x 3 = N pixels and figure out how to accurately classify the contents of the image.

THE SEMANTIC GAP


Take a look at the two photos below:
But all a computer sees is two big matrices of
pixels:

The semantic gap is the difference between how a human perceives the contents of an image versus
how an image can be represented in a way a computer can understand and process.
Again, a quick visual examination of the two photos above can reveal the difference between the two
species of animals.
But in reality, the computer has no idea that there are animals in the image to begin with.
To make this point more clear, take a look at this photo of a tranquil beach below.
We might go about describing the image as follows:

 Spatial: The sky is at the top of the image and the sand/ocean are at
the bottom.


 Color: The sky is dark blue, the ocean water is lighter than the sky, while the sand is tan.
 Texture: The sky has a relatively uniform pattern, while the sand is very coarse.
So how do we encode all this information in a way that a computer can understand?
The answer is to use various forms of image descriptors and deep learning methods.
By using image descriptors and deep learning we can actually extract and quantify regions of an
image.
Some descriptors are used to encode spatial information.
Others quantify the color of an image.
And other features are used to characterize texture.
Finally, based on these characterizations of the image, we can apply machine learning to "teach" our
computers what each type of image "looks like."

CHALLENGES
If the semantic gap was not enough of a problem, we also have to handle variations in how an image or an object in an image appears.
For example, we have viewpoint variation, where the object can be oriented/rotated in multiple dimensions with respect to how the object is photographed and captured.
No matter the angle in which we capture our Raspberry Pi, it's still a Raspberry Pi.
We also have to account for scale variation. Ever order a tall, grande, or venti cup of coffee from Starbucks? Technically, they are all the same thing — a cup of coffee. But they are all different sizes of a cup of coffee.
Furthermore, that same venti coffee will look dramatically different when it is photographed up close and when it is captured from farther away.
Our image classification methods must be able to tolerate these types of scale variations.
Our image classification system should also be able to handle occlusions, where large parts of the object we want to classify are hidden from view in the image.

On the left we have a picture of a dog. And on the right we have a picture of the same dog, but notice how the dog is resting underneath the covers, occluded from our view.
The dog is still clearly in both images — she's just more visible in one image than the other.
Our image classification algorithms should still be able to detect and label the presence of the dog in both images.
Just as challenging as the deformations and occlusions mentioned above, we also need to handle
changes in illumination. Take a look at the following image of a coffee cup captured in standard lighting
and low lighting:
The image on the left was photographed with standard overhead
lighting.
And the image on the right was captured with very little lighting.
We are still examining the same coffee cup — but based on the
lighting conditions the cup looks dramatically different.

LESSON 5.2: MACHINE LEARNING-SUPERVISED LEARNING


Suppose we want to build a spam filter that automatically classifies emails. You need a training set consisting of the emails themselves along with their labels, in this case: spam or not-spam.
Given this data, you can analyze the text (i.e. the distribution of words) of the email and utilize the spam/not-spam labels to teach a machine learning classifier what words occur in a spam email or not.
This example of creating a spam filter system is an example of supervised learning. Supervised learning is arguably the most well known and studied type of machine learning.
Given our training data, a model (or "classifier") is created through a training process where predictions are made on the input data and then corrected when the predictions are wrong.
This training process continues until the model achieves some desired stopping criterion, such as a low error rate or a maximum number of training iterations.
Common supervised learning algorithms include Logistic Regression, Support Vector Machines, and Random Forests.
The first column of our spreadsheet is the label associated with a particular image. The remaining six columns correspond to our feature vector — in this case the mean and standard deviation of each RGB color channel.
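
As a rough sketch of how those six feature-vector columns could be computed for one image (the file name here is a placeholder, and OpenCV loads images in BGR channel order), the same mean/standard-deviation idea also reappears later in the plant-classification module:

# minimal sketch: build the 6-dimensional color feature vector described above
# (per-channel mean and standard deviation); "example.png" is a placeholder path
import cv2
import numpy as np

image = cv2.imread("example.png")
(means, stds) = cv2.meanStdDev(image)               # per-channel mean and std (B, G, R)
features = np.concatenate([means, stds]).flatten()  # [meanB, meanG, meanR, stdB, stdG, stdR]
print(features)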

UNSUPERVISED LEARNING
 In contrast to supervised learning, unsupervised learning has no labels associated with the input data, and thus we cannot correct our model if it makes an incorrect prediction.
Thus, most unsupervised learning methods are focused on deducing structure present in the input data.


SEMI-SUPERVISED LEARNING
So what happens if we only have some of the labels associated with our data and no labels for the others?
Is there a way that we can apply some hybrid of supervised and unsupervised learning and still be able to classify each of our data points? It turns out the answer is yes — we just need to apply semi-supervised learning.
Our semi-supervised learning algorithm would take the known pieces of data, analyze them, and then try to label each of the unlabeled data points for use as extra training data.
This process can then repeat for many iterations as the semi-supervised algorithm learns the "structure" of the data to make more accurate predictions and generate more reliable training data.
The overall goal here is to generate more training data, which the algorithm can use to make itself "smarter".

THE IMAGE CLASSIFICATION PIPELINE

Step 1: Gathering your dataset


The first thing we need is our initial dataset.
We need the images themselves as well as the labels associated with each image.
These labels could come from a finite set of categories, such as: categories = {cat, cow, dog, horse,
wolf}.
Furthermore, the number of images for each category should be fairly uniform (i.e. roughly the same).
If we have twice the number of cat images as dog images, and five times the number of horse images as cat images, then our classifier will become naturally biased toward "overfitting" to these heavily-represented categories.
In order to fix this problem, we normally sample our dataset so that each category is represented equally.

Step 2: Splitting our dataset


Now that we have our initial dataset, we need to split it into two parts: a training set and a testing set.
A training set is used by our classifier to "learn" what each category looks like by making predictions on the input data and then being corrected when the predictions are wrong.
After the classifier has been trained, we can then evaluate its performance on a testing set.


It's extremely important that the training set and testing set are independent of each other and do not overlap!
If you use your testing set as part of your training data, then your classifier has an unfair advantage, since it has already seen the testing examples before and "learned" from them.
Instead, you must keep this testing set entirely separate from your training process and use it only to evaluate your classifier.

Step 3: Feature extraction


Now that we have our data splits defined, we need to extract features to abstractly quantify and
represent each image.
Common choices of features include:
 Color Histograms
 Histogram of Oriented Gradients
 Local Binary Patterns
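
As one concrete, hedged example of this step, the short sketch below computes a flattened 3D color histogram with cv2.calcHist; the bin counts and image path are arbitrary choices, not values prescribed by this module:

# minimal sketch of one possible feature-extraction step: a flattened 3D color
# histogram with 8 bins per channel (bin count and image path are arbitrary)
import cv2

image = cv2.imread("example.png")
hist = cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8],
                    [0, 256, 0, 256, 0, 256])
hist = cv2.normalize(hist, hist).flatten()          # 8 x 8 x 8 = 512-dimensional feature vector
print("feature vector size: {}".format(len(hist)))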

Step 4: Training your classifier


Given the feature vectors associated with the training data, we can now train our classifier.
The goal here is for our classifier to "learn" how to recognize each of the categories in our label data. When the classifier makes a mistake, it learns from this mistake and improves itself.
So how does the actual "learning" work? Well, that depends on each individual algorithm.
Support Vector Machines work in high-dimensional spaces, seeking an optimal hyperplane to separate the categories.
Decision Trees and Random Forest classifiers look for optimal splits in the data based on entropy.
Meanwhile, algorithms such as k-Nearest Neighbor perform no actual "learning" because they simply rely on the distance between feature vectors in an n-dimensional space to make predictions.

Step 5: Evaluation
Last, we need to evaluate our trained classifier.
For each of the feature vectors in our testing set, we present them to our classifier and ask it to predict what it thinks the label of the image is.
We then tabulate the predictions of the classifier for each point in the testing set.
Finally, these classifier predictions are compared to the ground-truth label from our testing set.
The ground-truth labels represent what the category actually is.
From there, we can compute the number of predictions our classifier got right and compute aggregate
reports such as precision, recall, and f-measure, which are used to quantify the performance of our
classifier as a whole.

K-NEAREST NEIGHBOR CLASSIFICATION


The k-Nearest Neighbor classifier is by far the simplest image classification algorithm.
In fact, it's so simple that it doesn't actually "learn" anything! Instead, this algorithm simply relies on the distance between feature vectors.
Simply put, the k-NN algorithm classifies unknown data points by finding the most common class among the k closest examples.
Here, we can see three categories of images, denoted as red, blue, and green dots, respectively.
We can see that each of these sets of data points are grouped relatively close together in our n-dimensional space.
This implies that the distance between two red dots is much smaller than the distance between a red dot and a blue dot.
However, in order to apply the k-Nearest Neighbor classifier, we first need to select a distance metric or a similarity function.
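
A common choice is the Euclidean distance. The minimal sketch below (with made-up 2D feature vectors and labels) shows the core k-NN idea: compute the distance from a query point to every training point, then take a majority vote among the k closest ones:

# minimal sketch of k-NN: Euclidean distance plus a majority vote among the
# k closest training examples (the feature vectors and labels are made up)
import numpy as np
from collections import Counter

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

trainData = np.array([[1.0, 2.0], [1.2, 1.9], [8.0, 8.5], [7.9, 8.1]])
trainLabels = ["daisy", "daisy", "sunflower", "sunflower"]
query = np.array([7.5, 8.0])

k = 3
distances = [euclidean(query, x) for x in trainData]
neighbors = [trainLabels[i] for i in np.argsort(distances)[:k]]
print(Counter(neighbors).most_common(1)[0][0])      # -> "sunflower"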

Here, we have a dataset of three types of flowers — sunflowers, daisies, and pansies — and we have plotted them according to the size and lightness of their petals.
Now, let's insert a new, unknown flower and try to classify it using only a single neighbor (i.e. k=1).
Here, we have found the "nearest neighbor" to our test flower, indicated by k=1. And according to the label of the nearest flower, it's a daisy.
Let's try another "unknown flower", this time using k=3. This time, we have found two sunflowers and one daisy in the top three results.
Since the sunflower category has the largest number of votes, we'll classify this unknown flower as a sunflower.


Experiment 51: Recognizing handwritten digits using MNIST

Program 51:

Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
from sklearn.cross_validation import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn import datasets
from skimage import exposure
import numpy as np
import imutils
import cv2

# load the handwritten digits dataset bundled with scikit-learn (an 8x8, MNIST-style dataset)


mnist = datasets.load_digits()

# take the MNIST data and construct the training and testing split, using 75% of the
# data for training and 25% for testing
(trainData, testData, trainLabels, testLabels) = train_test_split(np.array(mnist.data),
mnist.target, test_size=0.25, random_state=42)

# now, let's take 10% of the training data and use that for validation
(trainData, valData, trainLabels, valLabels) = train_test_split(trainData, trainLabels,
test_size=0.1, random_state=84)

# show the sizes of each data split


print("training data points: {}".format(len(trainLabels)))
print("validation data points: {}".format(len(valLabels)))
print("testing data points: {}".format(len(testLabels)))

# initialize the values of k for our k-Nearest Neighbor classifier along with the
# list of accuracies for each value of k
kVals = range(1, 30, 2)
accuracies = []
# loop over various values of `k` for the k-Nearest Neighbor classifier
for k in xrange(1, 30, 2):
# train the k-Nearest Neighbor classifier with the current value of `k`
model = KNeighborsClassifier(n_neighbors=k)
model.fit(trainData, trainLabels)

# evaluate the model and update the accuracies list


score = model.score(valData, valLabels)
print("k=%d, accuracy=%.2f%%" % (k, score * 100))
accuracies.append(score)

# find the value of k that has the largest accuracy


i = np.argmax(accuracies)
print("k=%d achieved highest accuracy of %.2f%% on validation data" % (kVals[i],
accuracies[i] * 100))

# re-train our classifier using the best k value and predict the labels of the test data
model = KNeighborsClassifier(n_neighbors=kVals[i])
model.fit(trainData, trainLabels)
predictions = model.predict(testData)

# show a final classification report demonstrating the accuracy of the classifier for each of the
# digits
print("EVALUATION ON TESTING DATA")
print(classification_report(testLabels, predictions))

# loop over a few random digits


for i in np.random.randint(0, high=len(testLabels), size=(5,)):
# grab the image and classify it
image = testData[i]
prediction = model.predict(image.reshape(1, -1))[0]

# convert the image from a 64-dim array to an 8 x 8 image compatible with OpenCV,
# then resize it to 32 x 32 pixels so we can see it better
image = image.reshape((8, 8)).astype("uint8")
image = exposure.rescale_intensity(image, out_range=(0, 255))
image = imutils.resize(image, width=32, inter=cv2.INTER_CUBIC)

# show the prediction


print("I think that digit is: {}".format(prediction))
cv2.imshow("Image", image)
cv2.waitKey(0)

Step 2: Save the code as " mnist_demo.py"

Step 3: Run the python script (mnist_demo.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python mnist_demo.py

Inference:


LOGISTIC REGRESSION
Let's consider a simple two-class classification problem, where we want to predict if a given image contains a cat or a dog.
We'll assign cats to have a label of 0 and dogs to have a label of 1. Let's denote this set of labels as L = {0, 1}.
We'll also assume that we have extracted a set of (arbitrary) feature vectors from our dataset of images to characterize the contents of each image. We'll call this set of feature vectors F.
Given our set of labels L and feature vectors F, we would like to create a mathematical function that takes a feature vector as an input and then returns a value of 0 or 1 (corresponding to the cat or dog prediction).
To perform classification using Logistic Regression, we multiply each of our feature vector values by a weight and take the sum: x = w · v, where v is our input feature vector and w are the weights associated with each entry in the feature vector.
This value x is then passed through the sigmoid function s(x) = 1 / (1 + e^(-x)), whose output is constrained to lie between 0 and 1.
Any output from s(x) that is >= 0.5 will be classified as 1 (dog) and anything < 0.5 will be classified as 0 (cat).
This seems simple enough, but the big question lies in defining the weight vector w. What are the best weight values for w? And how do we go about finding them?
Our goal is to find the values of w that make our classifier as accurate as possible, and in order to find appropriate values of w, we'll need to apply gradient ascent/descent.
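
Before turning to how the weights are learned, here is a rough sketch of the scoring step itself; the feature values and weights below are made up purely for illustration:

# minimal sketch of the Logistic Regression scoring step: weighted sum of the
# feature vector followed by the sigmoid (feature values and weights are made up)
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v = np.array([0.2, 0.5, 0.9])           # hypothetical feature vector
w = np.array([1.5, -2.0, 0.7])          # hypothetical weight vector
x = np.dot(w, v)                        # x = w . v
score = sigmoid(x)
label = 1 if score >= 0.5 else 0        # 1 = dog, 0 = cat (labels defined above)
print("score={:.3f}, label={}".format(score, label))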

GRADIENT ASCENT AND DESCENT


To perform gradient ascent for Logistic Regression, we:
1. Extract feature vectors from all images in our dataset.
2. Initialize all weight entries w to 1.
3. Loop N times (or until convergence)
1. Calculate the gradient of the entire dataset.
2. Update the weight entries based on the current values of w, the gradient, and the learning rate α.
4. Return weights w.


The gradient is our error E on the training data: E = y - s(F · w), where F contains the feature vectors associated with our training data and y their labels.
Based on this error E, we can then update our weight vector w via: w = w + α · F^T · E, where α is the learning rate.
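
The sketch below is one standard, vectorized way to realize the four steps above (the data, learning rate, and iteration count are made up, and the exact form in the original notes may differ slightly):

# minimal sketch of gradient ascent for Logistic Regression, following the steps
# listed above; F holds the training feature vectors, y the binary labels (made up)
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

F = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.2], [0.8, 0.1]])
y = np.array([0, 0, 1, 1])
w = np.ones(F.shape[1])                 # step 2: initialize all weight entries to 1
alpha = 0.1                             # learning rate

for _ in range(100):                    # step 3: loop N times
    predictions = sigmoid(F.dot(w))     # step 3.1: predictions over the whole dataset
    error = y - predictions             # the error E drives the gradient
    w = w + alpha * F.T.dot(error)      # step 3.2: update the weights

print(w)                                # step 4: return the weights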

Experiment 52: Applying Logistic Regression for image classification

Program 52:

Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
import numpy as np
import imutils
import cv2

# grab a small subset of the Labeled Faces in the Wild dataset, then construct
# the training and testing splits (note: if this is your first time running this
# script it may take awhile for the dataset to download -- but once it has downloaded
# the data will be cached locally and subsequent runs will be substantially faster)
print("[INFO] fetching data...")
dataset = datasets.fetch_lfw_people(min_faces_per_person=70, funneled=True, resize=0.5)
(trainData, testData, trainLabels, testLabels) = train_test_split(dataset.data, dataset.target,
test_size=0.25, random_state=42)

# train the model and show the classification report


print("[INFO] training model...")
model = LogisticRegression()
model.fit(trainData, trainLabels)
print(classification_report(testLabels, model.predict(testData),
target_names=dataset.target_names))

# loop over a few random images


for i in np.random.randint(0, high=testLabels.shape[0], size=(10,)):
# grab the image and the name, then resize the image so we can better see it
image = testData[i].reshape((62, 47))
name = dataset.target_names[testLabels[i]]
image = imutils.resize(image.astype("uint8"), width=image.shape[1] * 3,
inter=cv2.INTER_CUBIC)

# classify the face


prediction = model.predict(testData[i].reshape(1, -1))[0]
prediction = dataset.target_names[prediction]


print("[PREDICTION] predicted: {}, actual: {}".format(prediction, name))


cv2.imshow("Face", image)
cv2.waitKey(0)

Step 2: Save the code as " train_and_test.py"

Step 3: Run the python script (train_and_test.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python train_and_test.py

Inference:

SUPPORT VECTOR MACHINES

LINEAR SEPARABILITY

In order to explain SVMs, we should first start with the concept of linear separability.
A set of data is linearly separable if we can draw a straight line that clearly separates all data points
in class #1 from all data points belonging to class #2:
In the figures above, we have two classes of data represented by blue squares and red circles, respectively.
In Plot A (left) and Plot B (center), we can clearly draw a (straight) line through the space that cleanly places all blue squares on one side of the line and all red circles on the other. These plots are examples of data points that are linearly separable.
However, in Plot C (right), this is not the case. Here, we see four groupings of data points.
The blue squares are present at the top-left and bottom-right of the plot, whereas the red circles are at the top-right and bottom-left region (this is known as the XOR [exclusive OR] problem).
Regardless of whether we have a line, plane, or hyperplane, this separation is our decision boundary — or the boundary we use to decide if a data point is a blue square or a red circle. All data points for a given class will lie on one side of the decision boundary, and all data points for the second class on the other.
Given our decision boundary, I am more confident that the highlighted square is indeed a square, because it is farther away from the decision boundary than the circle is.
This all makes sense, but how do we come up with this decision boundary? For example, all 3 plots below can separate the two classes of data — is one of these separations better than the other?
The actual reason why Plot C is the best separation is because the margin between the circles and squares is the largest.
In order to find this maximum-margin separating hyperplane, we can frame the problem as an optimization problem using support vectors, or data points that lie closest to the decision boundary.
Here, we have highlighted the data points that are our support vectors. Using these support vectors, we can maximize the margin of the hyperplane, thus separating the two classes of data in an optimal way.
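
As a small, hedged illustration (separate from the XOR experiment below, and using made-up, linearly separable points), we can train a linear SVM with scikit-learn and inspect which training points it selected as support vectors:

# minimal sketch: fit a linear SVM on made-up, linearly separable 2D points and
# inspect the support vectors (the points lying closest to the decision boundary)
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],    # class 0 (bottom-left cluster)
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])   # class 1 (top-right cluster)
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear")
model.fit(X, y)

print(model.support_vectors_)           # the points that define the maximum margin
print(model.predict([[2.0, 2.0]]))      # -> [0]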

Experiment 53: Support vector machine for image classification

Program 53:

Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report


from sklearn.svm import SVC


import numpy as np

# generate the XOR data


tl = np.random.uniform(size=(100, 2)) + np.array([-2.0, 2.0])
tr = np.random.uniform(size=(100, 2)) + np.array([2.0, 2.0])
br = np.random.uniform(size=(100, 2)) + np.array([2.0, -2.0])
bl = np.random.uniform(size=(100, 2)) + np.array([-2.0, -2.0])
X = np.vstack([tl, tr, br, bl])
y = np.hstack([[1] * len(tl), [-1] * len(tr), [1] * len(br), [-1] * len(bl)])

# construct the training and testing split by taking 75% of data for training and 25% for testing
(trainData, testData, trainLabels, testLabels) = train_test_split(X, y, test_size=0.25,
random_state=42)

# train the linear SVM model, evaluate it, and show the results
print("[RESULTS] SVM w/ Linear Kernel")
model = SVC(kernel="linear")
model.fit(trainData, trainLabels)
print(classification_report(testLabels, model.predict(testData)))
print("")

# train the SVM + poly. kernel model, evaluate it, and show the results
print("[RESULTS] SVM w/ Polynomial Kernel")
model = SVC(kernel="poly", degree=2, coef0=1)
model.fit(trainData, trainLabels)
print(classification_report(testLabels, model.predict(testData)))

Step 2: Save the code as " classify.py"

Step 3: Run the python script (classify.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python classify.py

Inference:


K-MEANS ALGORITHM
The k-means algorithm is used to find k clusters in a dataset, where the number of clusters k is a user
supplied value.
Each cluster is represented by a single data point called the centroid.
The centroid is defined as the mean (average) of all data points belonging to the cluster and is thus
simply the center of the cluster:
Here, we can see three clusters of data with the centroids highlighted as white X's.
A visual inspection of this figure reveals that the X mark for each cluster is the average of all data points belonging to the cluster.
The pseudo-code for k-means is quite simple:
 Step 1: Start off by selecting k random data points from your input dataset — these k random data points are your initial centroids.
 Step 2: Assign each data point in the dataset to the nearest centroid. This requires computing the distance from each data point to each centroid (using a distance metric such as the Euclidean distance) and assigning the data point to the cluster with the smallest distance.
 Step 3: Recalculate the position of all centroids by computing the average of all data points in the cluster.
 Step 4: Repeat Steps 2 and 3 until all cluster assignments are stable (i.e. not flipping back and forth) or some stopping criterion has been met (such as a maximum number of iterations). A minimal sketch of these steps is shown below.
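
Here is a minimal NumPy sketch of those four steps on made-up 2D points; it is only an illustration, and the scikit-learn KMeans implementation (used in the experiment below) is what you would normally reach for:

# minimal sketch of the k-means pseudo-code above on made-up 2D data points
import numpy as np

np.random.seed(42)
data = np.vstack([np.random.randn(20, 2) + c for c in [(0, 0), (5, 5), (0, 5)]])
k = 3

# Step 1: pick k random data points as the initial centroids
centroids = data[np.random.choice(len(data), k, replace=False)]

for _ in range(10):                                   # Step 4: repeat (fixed iterations here)
    # Step 2: assign each point to its nearest centroid (Euclidean distance)
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Step 3: recompute each centroid as the mean of its assigned points
    # (keep the old centroid if a cluster happens to receive no points)
    centroids = np.array([data[labels == i].mean(axis=0) if np.any(labels == i)
                          else centroids[i] for i in range(k)])

print(centroids)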

Experiment 54: Clustering objects by color using k-means

Program 54:

Step 1: Write the code in Text Editor


# import the necessary packages
from sklearn.cluster import KMeans
import numpy as np
import random
import cv2

# initialize the list of color choices


colors = [
# shades of red, green, and blue
(138, 8, 8), (180, 4, 4), (223, 1, 1), (255, 0, 0), (250, 88, 88),
(8, 138, 8), (4, 180, 4), (1, 223, 1), (0, 255, 0), (46, 254, 46),
(11, 11, 97), (8, 8, 138), (4, 4, 180), (0, 0, 255), (46, 46, 254)]

# initialize the canvas


canvas = np.ones((400, 600, 3), dtype="uint8") * 255

# loop over the canvas


for y in xrange(0, 400, 20):


for x in xrange(0, 600, 20):
# generate a random (x, y) coordinate, radius, and color for the circle
(dX, dY) = np.random.randint(5, 10, size=(2,))
r = np.random.randint(5, 8)
color = random.choice(colors)[::-1]

# draw the circle on the canvas


cv2.circle(canvas, (x + dX, y + dY), r, color, -1)

# pad the border of the image


canvas = cv2.copyMakeBorder(canvas, 5, 5, 5, 5, cv2.BORDER_CONSTANT,
value=(255, 255, 255))

# convert the canvas to grayscale, threshold it, and detect contours in the image
gray = cv2.cvtColor(canvas, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
thresh = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)[1]
(cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# initialize the data matrix


data = []

# loop over the contours


for c in cnts:
# construct a mask from the contour
mask = np.zeros(canvas.shape[:2], dtype="uint8")
cv2.drawContours(mask, [c], -1, 255, -1)
features = cv2.mean(canvas, mask=mask)[:3]
data.append(features)

# cluster the color features


clt = KMeans(n_clusters=3)
clt.fit(data)
cv2.imshow("Canvas", canvas)

# loop over the unique cluster identifiers


for i in np.unique(clt.labels_):
# construct a mask for the current cluster
mask = np.zeros(canvas.shape[:2], dtype="uint8")

# loop over the indexes of the current cluster and draw them
for j in np.where(clt.labels_ == i)[0]:
cv2.drawContours(mask, [cnts[j]], -1, 255, -1)

# show the output image for the cluster


cv2.imshow("Cluster", cv2.bitwise_and(canvas, canvas, mask=mask))
cv2.waitKey(0)


Step 2: Save the code as " cluster_colors.py"

Step 3: Run the python script (cluster_colors.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python cluster_colors.py

Inference:


CHAPTER 6: CASE STUDIES


LESSON 6.1: OBJECT TRACKING IN VIDEO
The primary goal is to learn how to detect and track objects in video streams based primarily on their
color.
While defining an object in terms of color boundaries is not always possible, whether due to lighting
conditions or variability of the object(s), being able to use simple color thresholding methods allows us
to easily and quickly perform object tracking.
Let's take a look at a single frame of the video file we will be processing. As you can see, we have two balls in this image: a blue one and a green one.
We'll be writing code that can track each of these balls separately as they move around the video stream.
A color will be considered "green" if its HSV values fall inside the green range defined in the program below (roughly H in [29, 64], S in [86, 255], V in [6, 255]).
Similarly, a color will be considered "blue" if its HSV values fall inside the blue range defined below (roughly H in [57, 151], S in [68, 255], V in [0, 255]).

Experiment 55: Object Tracking in Video

Program 55:

Step 1: Write the code in Text Editor


# import the necessary packages
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video", help="path to the (optional) video file")
args = vars(ap.parse_args())

# define the color ranges


colorRanges = [
((29, 86, 6), (64, 255, 255), "green"),
((57, 68, 0), (151, 255, 255), "blue")]

# if a video path was not supplied, grab the reference to the webcam
if not args.get("video", False):
camera = cv2.VideoCapture(0)


# otherwise, grab a reference to the video file


else:
camera = cv2.VideoCapture(args["video"])

# keep looping
while True:
# grab the current frame
(grabbed, frame) = camera.read()
# if we are viewing a video and we did not grab a frame, then we have
# reached the end of the video
if args.get("video") and not grabbed:
break
# resize the frame, blur it, and convert it to the HSV color space
frame = imutils.resize(frame, width=600)
blurred = cv2.GaussianBlur(frame, (11, 11), 0)
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# loop over the color ranges


for (lower, upper, colorName) in colorRanges:
# construct a mask for all colors in the current HSV range, then
# perform a series of dilations and erosions to remove any small
# blobs left in the mask
mask = cv2.inRange(hsv, lower, upper)
mask = cv2.erode(mask, None, iterations=2)
mask = cv2.dilate(mask, None, iterations=2)

# find contours in the mask


(cnts, _) = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)

# only proceed if at least one contour was found


if len(cnts) > 0:
# find the largest contour in the mask, then use it to compute
# the minimum enclosing circle and centroid
c = max(cnts, key=cv2.contourArea)
((x, y), radius) = cv2.minEnclosingCircle(c)
M = cv2.moments(c)
(cX, cY) = (int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"]))

# only draw the enclosing circle and text if the radius meets a minimum size
if radius > 10:
cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 255), 2)
cv2.putText(frame, colorName, (cX, cY),
cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 255), 2)

# show the frame to our screen


cv2.imshow("Frame", frame)
key = cv2.waitKey(1) & 0xFF

# if the 'q' key is pressed, stop the loop


if key == ord("q"):
break

# clean up the camera and close any open windows


camera.release()
cv2.destroyAllWindows()

Step 2: Save the code as "track.py"

Step 3: Run the python script (track.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python track.py --video BallTracking_01.mp4

Inference:


LESSON 6.2: IDENTIFYING THE COVERS OF BOOKS


Before we can identify the cover of a book in an image, we first need to create our dataset.
I have manually constructed a dataset of 50 book cover images pulled from various sources such as
eBay, Google, and Amazon.com. A sample of these images can be seen below:
I have also created a corresponding books.csv file, a database containing meta-information on each book, including the unique image filename, author, and book title.

Experiment 56: IDENTIFYING THE COVERS OF BOOKS

Program 56:

Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
import argparse
import glob
import csv


import cv2
import numpy as np

# construct the argument parse and parse the arguments


ap = argparse.ArgumentParser()
ap.add_argument("-d", "--db", required=True, help = "path to the book database")
ap.add_argument("-c", "--covers", required=True, help = "path to the directory that contains our
book covers")
ap.add_argument("-q", "--query", required=True, help = "path to the query book cover")
args = vars(ap.parse_args())

# initialize the cover descriptor and cover matcher


dad = cv2.FeatureDetector_create("SIFT")
des = cv2.DescriptorExtractor_create("SIFT")
coverPaths = glob.glob(args["covers"] + "/*.png")

def search(queryKps, queryDescs):


# initialize the dictionary of results
results = {}
# loop over the book cover images
for coverPath in coverPaths:
# load the query image, convert it to grayscale, and extract
# keypoints and descriptors
cover = cv2.imread(coverPath)
gray = cv2.cvtColor(cover, cv2.COLOR_BGR2GRAY)
(kps, descs) = describe(gray)

# determine the number of matched, inlier keypoints,


# then update the results
score = match(queryKps, queryDescs, kps, descs)
results[coverPath] = score

# if matches were found, sort them


if len(results) > 0:
results = sorted([(v, k) for (k, v) in results.items() if v > 0], reverse = True)

# return the results


return results

def match(kpsA, featuresA, kpsB, featuresB, ratio=0.7, minMatches=50):


# compute the raw matches and initialize the list of actual matches
matcher = cv2.DescriptorMatcher_create("BruteForce")
rawMatches = matcher.knnMatch(featuresB, featuresA, 2)
matches = []

# loop over the raw matches


for m in rawMatches:
# ensure the distance is within a certain ratio of each other
if len(m) == 2 and m[0].distance < m[1].distance * ratio:


matches.append((m[0].trainIdx, m[0].queryIdx))

# check to see if there are enough matches to process


if len(matches) > minMatches:
# construct the two sets of points
ptsA = np.float32([kpsA[i] for (i, _) in matches])
ptsB = np.float32([kpsB[j] for (_, j) in matches])

# compute the homography between the two sets of points


# and compute the ratio of matched points
(_, status) = cv2.findHomography(ptsA, ptsB, cv2.RANSAC, 4.0)

# return the ratio of the number of matched keypoints


# to the total number of keypoints
return float(status.sum()) / status.size

# no matches were found


return -1.0

def describe(image, useKpList=True):


# detect keypoints in the image and extract local invariant descriptors
kps = dad.detect(image)
(kps, descs) = des.compute(image, kps)

# if there are no keypoints or descriptors, return None


if len(kps) == 0:
return (None, None)

# check to see if the keypoints should be converted to a NumPy array


if useKpList:
kps = np.int0([kp.pt for kp in kps])

# return a tuple of the keypoints and descriptors


return (kps, descs)

# initialize the database dictionary of covers


db = {}

# loop over the database


for l in csv.reader(open(args["db"])):
# update the database using the image ID as the key
db[l[0]] = l[1:]

# load the query image, convert it to grayscale, and extract keypoints and descriptors
queryImage = cv2.imread(args["query"])
gray = cv2.cvtColor(queryImage, cv2.COLOR_BGR2GRAY)
(queryKps, queryDescs) = describe(gray)

# try to match the book cover to a known database of images


results = search(queryKps, queryDescs)

# show the query cover


cv2.imshow("Query", queryImage)

# check to see if no results were found


if len(results) == 0:
print("I could not find a match for that cover!")
cv2.waitKey(0)

# otherwise, matches were found


else:
# loop over the results
for (i, (score, coverPath)) in enumerate(results):
# grab the book information
(author, title) = db[coverPath[coverPath.rfind("/") + 1:]]
print("{}. {:.2f}%% : {} - {}".format(i + 1, score * 100, author, title))

# load the result image and show it


result = cv2.imread(coverPath)
cv2.imshow("Result", result)
cv2.waitKey(0)

Step 2: Save the code as "search.py"

Step 3: Run the python script (search.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python search.py --db books.csv --covers covers --query queries/query01.png

Inference:


LESSON 6.3: PLANT CLASSIFICATION


COMBINING GLOBAL FEATURE DESCRIPTORS
Plant species are found in millions of variations in nature.
There are many types of plants around the world with common image characteristics such as color, texture and shape.
These three features are the most important to consider when it comes to plant classification.
As we know, there are two types of feature descriptors, namely global descriptors and local descriptors; in this module we will discuss applying global feature descriptors to classify plant species.
We will use the FLOWER17 benchmark dataset provided by the University of Oxford.
In this dataset, there are 17 flower species with 80 images per class.
The objective of this module is to combine Global Feature descriptors namely Haralick Texture
descriptor and Color Channel Statistics descriptor which describes the overall image in terms of texture
and color.
This module combines the two feature vectors to a single global feature vector that describes the
entire image.

Experiment 57: PLANT CLASSIFICATION USING GLOBAL FEATURE DESCRIPTORS

Program 57:

Step 1: Download the dataset and set up the project folder


Download the dataset from this website
(https://drive.google.com/open?id=0B5vzV2BJz_52bXhnWVhMb0RwN1k) and put it inside a folder
named "dataset". The entire project folder setup should be as follows, otherwise the program will not
work.

- Plant-Classification (main folder)


o dataset (folder)
 train (folder)
 all the flower species folders + images
o output (folder)
 data.h5
 labels.h5
o global.py
o train_test.py

Step 2: Write the code in Text Editor


from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import numpy as np
import mahotas
import cv2
import os


import h5py

print"[STATUS] Loaded imports.."

# Load configuration parameters


fixed_size=tuple((500,500))
training_path="dataset/train"
num_trees=150
test_size=0.10
seed=9
print "[STATUS] Loaded config.."

# Feature Descriptor - {Mean, Standard Deviation} {6}


def fd_meanstddev(image, mask=None):
(mean, stds) =cv2.meanStdDev(cv2.cvtColor(image, cv2.COLOR_BGR2HSV))
colorStats=np.concatenate([mean, stds]).flatten()
return colorStats

# Feature Descriptor - {Haralick Texture} {13}


def fd_haralick(image):
gray= cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
haralick=mahotas.features.haralick(gray).mean(axis=0)
return haralick

# GLOBAL FEATURE EXTRACTION


print "[STATUS] Training started.."

# training images path


train_path=training_path

# get the training labels


train_labels=os.listdir(train_path)
train_labels.sort()
print(train_labels)

# empty lists to hold data and labels


labels= []
global_features= []

i, j =0, 0
k =0
# loop over the images
for training_name in train_labels:


dir=os.path.join(train_path, training_name)
current_label=training_name
k =1
for x in range(1,81):
file=dir+"/"+str(x) +".jpg"
# read the image and resize it
image= cv2.imread(file)
image= cv2.resize(image, fixed_size)

# Global Features
fv_meanstddev=fd_meanstddev(image)
fv_haralick=fd_haralick(image)

# Feature vector concatenation


global_feature=np.hstack([fv_meanstddev, fv_haralick])

# update the list of labels and features


labels.append(current_label)
global_features.append(global_feature)

print "Feature size: {}".format(global_feature.shape)

# show status
print "Processed Image: {} in {}".format(k, training_name)
i+=1
k +=1
j +=1

print "Feature vector size {}".format(np.array(global_features).shape)


print "Training Labels {}".format(np.array(labels).shape)

# encode the target labels


targetNames=np.unique(labels)
le=LabelEncoder()
target=le.fit_transform(labels)

scaler=MinMaxScaler(feature_range=(0, 1))
rescaled_features=scaler.fit_transform(global_features)

print "Target Labels: {}".format(target)


print "Target Labels shape: {}".format(target.shape)

# save the feature vector using HDF5


h5f_data =h5py.File('output/data.h5', 'w')


h5f_data.create_dataset('dataset_1', data=np.array(rescaled_features))

h5f_label =h5py.File('output/labels.h5', 'w')


h5f_label.create_dataset('dataset_1', data=np.array(target))

h5f_data.close()
h5f_label.close()
print "[STATUS] Training Features and Labels saved.."

Step 3: Save the code as "global.py"

Step 4: Run the python script (global.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python global.py

Step 5:Write the code in Text Editor


# Organize imports
import h5py
import numpy as np
import os
import glob
import cv2
from matplotlib import pyplot
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.externals import joblib
import mahotas

# Load configuration params


fixed_size=tuple((500,500))
training_path="dataset/train"
num_trees=150
test_size=0.10
seed=9
print"[STATUS] Loaded config.."


# Prepare MODELS
models= []
models.append(('LR', LogisticRegression(random_state=9)))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier(random_state=9)))
models.append(('RF', RandomForestClassifier(n_estimators=num_trees, random_state=9)))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(random_state=9)))

results= []
names= []
scoring="accuracy"

# Feature Descriptor - {Haralick Texture} {13}


def fd_haralick(image):
gray= cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
haralick=mahotas.features.haralick(gray).mean(axis=0)
return haralick

# Feature Descriptor - {Mean, Standard Deviation} {6}


def fd_meanstddev(image, mask=None):
(mean, stds) =cv2.meanStdDev(cv2.cvtColor(image, cv2.COLOR_BGR2HSV))
colorStats=np.concatenate([mean, stds]).flatten()
return colorStats

# Load Training features and Labels


h5f_data =h5py.File('output/data.h5', 'r')
h5f_label =h5py.File('output/labels.h5', 'r')

global_features_string= h5f_data['dataset_1']
global_labels_string= h5f_label['dataset_1']

global_features=np.array(global_features_string)
global_labels=np.array(global_labels_string)

h5f_data.close()
h5f_label.close()

print"[STATUS] Features shape: {}".format(global_features.shape)


print"[STATUS] Labels shape: {}".format(global_labels.shape)
print"[STATUS] Training started.."


# TRAINING THE CLASSIFIER


# construct the training and testing split
# training = 90%
# testing = 10%

train_labels= ["bluebell", "buttercup", "coltsfoot", "cowslip", "crocus", "daffodil", "daisy", "dandelion",


"fritillary", "iris", "lilyvalley", "pansy","snowdrop", "sunflower", "tigerlily","tulip", "windflower"]
(trainDataGlobal, testDataGlobal, trainLabelsGlobal, testLabelsGlobal) = train_test_split(
np.array(global_features), np.array(global_labels), test_size=0.10, random_state=9)

print"[STATUS] Splitted train and test data - Global"


print"Train data : {}".format(trainDataGlobal.shape)
print"Test data : {}".format(testDataGlobal.shape)
print"Train labels: {}".format(trainLabelsGlobal.shape)
print"Test labels : {}".format(testLabelsGlobal.shape)

# 10-fold cross validation


for name, model in models:
kfold=KFold(n_splits=10, random_state=7)
cv_results=cross_val_score(model, trainDataGlobal, trainLabelsGlobal, cv=kfold,
scoring=scoring)
results.append(cv_results)
names.append(name)
msg="%s: %f (%f)"% (name, cv_results.mean(), cv_results.std())
print(msg)

# boxplot algorithm comparison


fig=pyplot.figure()
fig.suptitle('Algorithm Comparison')
ax=fig.add_subplot(111)
pyplot.boxplot(results)
ax.set_xticklabels(names)
pyplot.show()

Step 6: Save the code as "train_test.py"

Step 7: Run the python script (train_test.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python train_test.py
Inference:

