Learning OpenCV 3 Application Development

Ebook, 563 pages, 5-hour read


About this ebook

This is the perfect book for anyone who wants to dive into the exciting world of image processing and computer vision. This book is aimed at programmers with a working knowledge of C++. Prior knowledge of OpenCV or Computer Vision/Machine Learning is not required.
Language: English
Release date: Dec 19, 2016
ISBN: 9781784399139


    Book preview

    Learning OpenCV 3 Application Development - Samyak Datta

    Table of Contents

    Learning OpenCV 3 Application Development

    Credits

    About the Author

    About the Reviewer

    www.PacktPub.com

    Why subscribe?

    Preface

    What this book covers 

    Who this book is for 

    What you need for this book 

    Conventions 

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    1. Laying the Foundation

    Digital image basics

    Pixel intensities

    Color depth and color spaces

    Color channels

    Introduction to the Mat class

    Exploring the Mat class: loading images

    Exploring the Mat class - declaring Mat objects

    Spatial dimensions of an image

    Color space or color depth

    Color channels

    Image size

    Default initialization value

    Digging inside Mat objects

    Traversing Mat objects

    Continuity of the Mat data matrix

    Image traversals

    Image enhancement

    Lookup tables

    Linear transformations

    Identity transformation

    Negative transformation

    Logarithmic transformations

    Log transformation

    Exponential or inverse-log transformation

    Summary

    2. Image Filtering

    Neighborhood of a pixel

    Image averaging

    Image filters

    Image averaging in OpenCV

    Blurring an image in OpenCV

    Gaussian smoothing

    Gaussian function and Gaussian filtering

    Gaussian filtering in OpenCV

    Using your own filters in OpenCV

    Image noise

    Vignetting

    Implementing Vignetting in OpenCV

    Summary

    3. Image Thresholding

    Binary images

    Image thresholding basics

    Image thresholding in OpenCV

    Types of simple image thresholding

    Binary threshold

    Inverted binary threshold

    Truncate

    Threshold-to-zero

    Inverted threshold-to-zero

    Adaptive thresholding

    Morphological operations

    Erosion and dilation

    Erosion and dilation in OpenCV

    Summary

    4. Image Histograms

    The basics of histograms

    Histograms in OpenCV

    Plotting histograms in OpenCV

    Color histograms in OpenCV

    Multidimensional histograms in OpenCV

    Summary

    5. Image Derivatives and Edge Detection

    Image derivatives

    Image derivatives in two dimensions

    Visualizing image derivatives with OpenCV

    The Sobel derivative filter

    From derivatives to edges

    The Sobel detector - a basic framework for edge detection

    The Canny edge detector

    Image noise and edge detection

    Laplacian - yet another edge detection technique

    Blur detection using OpenCV

    Summary

    6. Face Detection Using OpenCV

    Image classification systems

    Face detection

    Haar features

    Integral image

    Integral image in OpenCV

    AdaBoost learning

    Cascaded classifiers

    Face detection in OpenCV

    Controlling the quality of detected faces

    Gender classification

    Working with real datasets

    Summary

    7. Affine Transformations and Face Alignment

    Exploring the dataset

    Running face detection on the dataset

    Face alignment - the first step in facial analysis

    Rotating faces

    Image cropping - basics

    Image cropping for face alignment

    Face alignment - the complete pipeline

    Summary

    8. Feature Descriptors in OpenCV

    Introduction to the local binary pattern

    A basic implementation of LBP

    Variants of LBP

    What does LBP capture?

    Applying LBP to aligned facial images

    A complete implementation of LBP

    Putting it all together - the main() function

    Summary

    9. Machine Learning with OpenCV

    What is machine learning

    Supervised and unsupervised learning

    Revisiting the image classification framework

    k-means clustering - the basics

    k-means clustering - the algorithm

    k-means clustering in OpenCV

    k-nearest neighbors classifier - introduction

    k-nearest neighbors classifier - algorithm

    What k to use

    k-nearest neighbors classifier in OpenCV

    Some problems with kNN

    Some enhancements to kNN

    Support vector machines (SVMs) - introduction

    Intuition into the workings of SVMs

    Non-linear SVMs

    SVM in OpenCV

    Using an SVM as a gender classifier

    Overfitting

    Cross-validation

    Common evaluation metrics

    The P-R curve

    Some qualitative results

    Summary

    A. Command-line Arguments in C++

    Introduction to command-line arguments

    Parsing command-line arguments

    Summary

    Learning OpenCV 3 Application Development


    Learning OpenCV 3 Application Development

    Copyright © 2016 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: December 2016

    Production reference: 1131216

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78439-145-4

    www.packtpub.com

    Credits

    About the Author

    Samyak Datta has a bachelor's and a master's degree in Computer Science from the Indian Institute of Technology, Roorkee. He is a computer vision and machine learning enthusiast. His first contact with OpenCV was in 2013, when he was working on his master's thesis, and since then, there has been no looking back. He has contributed to OpenCV's GitHub repository. Over the course of his undergraduate and master's degrees, Samyak has had the opportunity to engage with both industry and research. He worked with Google India and Media.net (Directi) as a software engineering intern, where he was involved with projects ranging from machine learning and natural language processing to computer vision. As of 2016, he is working at the Center for Visual Information Technology (CVIT) at the International Institute of Information Technology, Hyderabad.

    About the Reviewer

    Nikolaus Gradwohl was born in 1976 in Vienna, Austria, and always wanted to become an inventor like Gyro Gearloose. When he got his first Atari, he figured out that being a computer programmer was the closest he could get to that dream. He has written programs for nearly anything that can be programmed, ranging from 8-bit microcontrollers to mainframes, for a living. In his free time, he collects programming languages and operating systems.

    He is the author of the book Processing 2: Creative Coding Hotshot. You can see some of his work on his blog at http://www.local-guru.net/.

    www.PacktPub.com

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www.packtpub.com/mapt

    Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Preface

    The mission of this book is to explain to a novice the steps involved in building and deploying an end-to-end application in the domain of computer vision using OpenCV/C++. The book starts with instructions for installing the library and ends with the reader having developed an application that does something tangible and useful in computer vision/machine learning. All concepts included in the text have been selected because of their frequent use and relevance in practical computer vision-based projects. To avoid being overly theoretical, the description of each concept is accompanied by the development of end-to-end applications. The projects are explained and developed step by step over the entire course of the text, as the relevant theoretical concepts are introduced. This will help the readers grasp the practical applications of the concept under study while not losing sight of the big picture.

    What this book covers 

    Chapter 1, Laying the Foundation, lays the foundation for the basic data structures and containers in OpenCV, for example, the Mat, Rect, Point, and Scalar objects. It also explains the need for each of them as a separate data type by offering an insight into possible use cases. A major portion of the chapter focuses on how OpenCV uses the Mat object to store images so that your code can access them, the basics of scanning images using the Mat object, and the concept of the R, G, and B color channels. After the readers are comfortable working with images, we introduce the concept of pixel-based image transformations and show examples of simple code that achieves contrast/brightness modification using simple Mat object traversals.

    Chapter 2, Image Filtering, progresses to slightly advanced pixel traversal algorithms. We introduce the concepts of masks and image filtering and talk about the standard filters such as box filter, median filter, and Gaussian blur in OpenCV. Also, we will mathematically explain the concepts of convolution and correlation and show how a generic filter can be written using OpenCV’s filter2D method. As a practical use case, we implement a popular and interesting image manipulation technique called the Vignette filter.

    Chapter 3, Image Thresholding, talks about image thresholding, yet another process that frequently comes up in the solution to most computer vision problems. In this chapter, the readers come to understand that the algorithms in this domain basically involve operations that produce a binary image from a grayscale one. The different variants, such as binary and inverted binary, that are available as part of the OpenCV imgproc module are explained in detail. The chapter also briefly touches upon the concepts of erosion and dilation (morphological transformations) as steps that take place after thresholding.

    Chapter 4, Image Histograms, discusses aggregating pixel values into image histograms and covers histogram-related operations such as calculating and displaying histograms, along with the concepts of color and multidimensional histograms. 

    Chapter 5, Image Derivatives and Edge Detection, focuses on other types of information that can be extracted from the pixels in an image. The readers are introduced to the concept of image derivatives, and the discussion then moves on to their application in edge detection algorithms. A demonstration of the edge detection methods in OpenCV, for example, Sobel, Canny, and Laplacian, is presented. As a small practical use case of the Laplacian, we demonstrate how it can be used to detect the blurriness of images.

    Chapter 6, Face Detection Using OpenCV, talks about one of the most popular and ubiquitous problems in the computer vision community--detecting objects, specifically faces, in images. The main motivation of this chapter is to take a look at the Haar cascade classifier algorithm, which is used to detect faces, and then go on to show how this complex algorithm can be run using a single line of OpenCV code. At this point, we introduce our programming project of predicting the gender of a person from a facial image. The reader comes to realize that any system involving the analysis of facial images (be it face recognition or gender, age, or ethnicity prediction) requires accurate and robust face detection as its first step.

    Chapter 7, Affine Transformations and Face Alignment, serves as a natural successor to the preceding chapter. After detecting faces in images, the readers are taught about the post-processing steps that are undertaken--geometric (affine) transformations such as resizing, cropping, and rotating images. The need for such transformations is explained. All images in the gender classification project have to go through this step before feature extraction.

    Chapter 8, Feature Descriptors in OpenCV, introduces the notion of feature descriptors. The readers realize that in order to infer meaningful information from images, one must construct appropriate features from the pixel values. In other words, images have to be converted into feature vectors before feeding them into a machine learning algorithm. To that end, the chapter also goes on to introduce the concept of local binary pattern as a feature and also talks about the details of implementation. We demonstrate the process of calculating the LBP features from the cropped and aligned facial images that we obtained in the previous chapters.

    Chapter 9, Machine Learning with OpenCV, teaches the readers different classifiers and machine learning algorithms available as part of the OpenCV library and how they can be used to make meaningful predictions from data. The readers will witness the learning algorithms accept the feature vectors that they had computed in the previous chapters as input and make intelligent predictions as outputs. All the steps involved with using a learning algorithm--training, testing, cross validation, selection of hyper-parameters and evaluation of results--will be covered in detail.

    Appendix, Command-line Arguments in C++, talks about command-line arguments in C++ and how to make the best possible use of them while writing OpenCV programs.

    Who this book is for 

    Learning OpenCV 3 Application Development is the perfect book for anyone who wants to dive into the exciting world of image processing and computer vision. This book is aimed at programmers with a working knowledge of C++. Prior knowledge of OpenCV is not required. The book also doesn't assume any prior understanding of computer vision/machine learning algorithms, as all concepts are explained from basic principles, although familiarity with them will help. By the end of this book, readers will have first-hand experience of building and deploying computer vision applications using OpenCV and C++ by following the detailed, step-by-step tutorials. They will begin to appreciate the power of OpenCV in simplifying seemingly challenging and intensive tasks such as contrast enhancement in images, edge detection, face detection, and classification.

    What you need for this book 

    The book assumes a basic, working knowledge of C++. However, prior knowledge of computer vision, image processing, or machine learning is not assumed. You will need an OpenCV 3.1 installation on your system to run the sample programs spread across the various chapters of this book. Setup and installation details are covered at the start of the book.

    Conventions 

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We can include other contexts through the use of the include directive.

    A block of code is set as follows:

    typedef Size_<int> Size2i;

    typedef Size2i Size;

    New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Clicking the Next button moves you to the next screen.

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the example code

    You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

    You can download the code files by following these steps:

    Log in or register to our website using your e-mail address and password.

    Hover the mouse pointer on the SUPPORT tab at the top.

    Click on Code Downloads & Errata.

    Enter the name of the book in the Search box.

    Select the book for which you're looking to download the code files.

    Choose from the drop-down menu where you purchased this book from.

    Click on Code Download.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR / 7-Zip for Windows

    Zipeg / iZip / UnRarX for Mac

    7-Zip / PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-OpenCV-3-Application-Development. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Downloading the color images of this book

    We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/LearningOpenCV3ApplicationDevelopment_ColorImages.pdf.

    Errata

    Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

    To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

    Piracy

    Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

    Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

    We appreciate your help in protecting our authors and our ability to bring you valuable content.

    Questions

    If you have a problem with any aspect of this book, you can contact us at questions@packtpub.com, and we will do our best to address the problem.

    Chapter 1. Laying the Foundation

    Computer vision is a field of study that associates itself with processing, analyzing, and understanding images. It essentially tries to mimic what the human brain does with images captured by our retina. These tasks are easy for human beings but are not so trivial for a computer. In fact, some of them are computationally so challenging that they are open research problems in the computer vision community. This book is designed to help you learn how to develop applications in OpenCV/C++. So, what is an ideal place to start learning about computer vision? Well, images form an integral component in any vision-based application. Images are everywhere! Everything that we do in the realm of computer vision boils down to performing operations on images.

    This chapter will teach you the basics of images. We will discuss some common jargon that comes up frequently when we talk about images in the context of computer vision. You will learn about the pixels that make up images, the difference between color spaces and color channels, and the difference between grayscale and color images. Knowing about images is fine, but how do we transfer our knowledge about images to the domain of a programming language? That's where OpenCV comes in.

    OpenCV is an open source computer vision and machine learning library. It provides software programmers with an infrastructure for developing computer vision-based applications. The library has its own mechanisms for the efficient storage, processing, and retrieval of images. This will be a major topic of discussion in this chapter. We will learn about the different data structures that the OpenCV developers have made available to users. We will try to get a glimpse of the possible use cases for each of the data structures that we discuss. Although we try to cover as many data structures as possible, for starters, this chapter will focus on one particular data structure of paramount importance, which lies at the core of everything that we can possibly do with the library: the Mat object.

    At the most basic level, the Mat object is what actually stores the two-dimensional grid of pixel intensity values that represents an image in the digital realm. The aim of this chapter is to equip the readers with sufficient knowledge regarding the inner workings of the Mat object so that they are able to write their first OpenCV program that does some processing on images by manipulating the underlying Mat objects in the code. This chapter will teach you different ways to traverse your images by going over the pixels one by one and applying some basic modifications to the pixel values. The end result will be some cool and fascinating effects on your images that will remind you of the filters in Instagram or any other popular image manipulation app.

    We will be embarking on our journey to learn something new. The first steps are always exciting! Most of this chapter will be devoted to developing and strengthening your basics--the very same basics that will, in the near future, allow us to take up challenging tasks in computer vision, such as face detection, facial analysis, and facial gender recognition.

    In this chapter, we will cover the following topics:

    Some basic concepts from the realm of digital images: pixels, pixel intensities, color depth, color spaces, and channels

    A basic introduction to the Mat class in OpenCV and how the preceding concepts are implemented in code using Mat objects

    A simple traversal of the Mat class objects, which will allow us to access as well as process the pixel intensity values of the image, one by one

    Finally, as a practical application of what we will learn during the chapter, we also present the implementations of some image enhancement techniques that use the pixel traversal concepts

    Digital image basics

    Digital images are composed of a two-dimensional grid of pixels. These pixels can be thought of as the most fundamental and basic building blocks of images. When you view an image, either in its printed form on paper or in its digital format on computer screens, televisions, and mobile phones, what you see is a dense cluster of pixels arranged in a two-dimensional grid of rows and columns. Our eyes are of course not able to differentiate one individual pixel from its neighbor, and hence, images appear continuous to us. But, in reality, every image is composed of thousands and sometimes millions of discrete pixels.

    Every single one of these pixels carries some information, and the sum total of all this information makes up the entire image and helps us see the bigger picture. Some of the pixels are light, some are dark. Each of them is colored with a different hue. There are grayscale images, which are commonly known as black and white images. We will avoid the use of the latter phrase because in image processing jargon, black and white refers to something else altogether. It does not take an expert to deduce that colored images hold a lot more visual detail than their grayscale counterparts.

    So, what pieces of information do these individual, tiny pixels store that enable them to create the images that they are a part of? How does a grayscale image differ from a colored one? Where do the colors come from? How many of them are there? Let's answer all these questions one by one.

    Pixel intensities

    There are countless sophisticated instruments that aid us in the process of acquiring images from nature. At the most basic level, they work by capturing light rays as they enter through the aperture of the instrument's lens and fall on a photographic plate. Depending on the orientation, illumination, and other parameters of the photo-capturing device, the amount of light that falls on each spatial coordinate of the film differs. This variation in the intensity of light falling on the film is encoded as pixel values when the image is stored in a digital format. Therefore, the information stored by a pixel is nothing more than a quantitative measure of the intensity of light that illuminated that particular spatial coordinate while the image was being captured. What this essentially means is that any image that you see, when represented digitally, is reduced to a two-dimensional grid of values where each pixel in the image is assigned a numerical value that is directly proportional to the intensity of light falling on that pixel in the natural image.

    Color depth and color spaces

    Now we come to the issue of encoding light intensity in pixel values. If you have studied a programming language before, you might be aware that the range and the type of values that you can store in any data structure are closely linked to the data type. A single bit can represent two values: 0 and 1. Eight bits (also known as a byte) can accommodate 256 different values. Going further along, an int data type (represented using 32 bits on most architectures) has the capacity to represent roughly 4.29 billion different entries. Extending the same logic to digital images, the range of values that can be used to represent the pixel intensities depends on the data type we select for storing the image. In the world of image processing, the term color space or color depth is used in place of data type.

    The most common and simplest color space for representing images uses 8 bits for the value of each pixel. This means that each pixel can take any value between 0 and 255 (inclusive). Images represented this way are called grayscale images. By convention, 0 represents black, 255 represents white, and each of the other values between 0 and 255 stands for a different shade of gray. The following figure demonstrates such an 8-bit color space. As we move from left to right in the figure, the grayscale values gradually change from 0 to 255:

    So, if we have a grayscale image, such as the following one, then to a digital medium it is merely a matrix of values--where each element of the matrix is a grayscale value between 0 (black) and 255 (white). This grid of pixel intensity values is shown for a tiny sub-section of the image (a portion of one of the wing mirrors of the car).

    Color channels

    We have seen that using 8 bits is sufficient to represent grayscale images in digital media. But how do we represent colors? This brings us to the concept of color channels. A majority of the images that you come across are colored as opposed to grayscale. In the grayscale image we just saw, each pixel is associated with a single intensity value (between 0 and 255). For color images, each pixel has three values or components: the red (R), green (G), and blue (B) components. It is a well-known fact that all possible colors can be represented as a combination of the R, G, and B components, and hence, the triplet of intensity values at each pixel is sufficient to represent the color at that pixel.
