Jaiveer
INDUSTRIAL TRAINING REPORT
on
Face And Eye Detection Using Machine Learning
Submitted as a part of course curriculum for
Bachelor of Technology
in
Submitted by
D R . A P J A B D U L K A L A M T E C H N I C A L U N I V E R S I T Y,
LUCKNOW
2018-19
S. No. Contents
1. Introduction (Organization and Project)
2. Synopsis
3. Overview of E-Shopping
4. Number of Modules
5. Hardware Requirements
6. System Analysis
Introduction
System Study
Needs of the System
System Planning
Preliminary investigation
Information Gathering
Structured Analysis
Feasibility Study
Implementation
The implementation process can be split into two main stages:
1. The classifier training stage
2. The application development stage
During the first stage, the classifier was trained on the preprocessed training data. This was done in a Jupyter notebook (titled "Create and freeze graph"), and can be further divided into the following steps:
1. Load both the training and validation images into memory, preprocessing them as described in the previous section.
2. Implement helper functions, such as get_batch(...).
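The get_batch(...) helper named above is not reproduced in this report; a minimal sketch of what such a helper might look like, assuming the training data is held in NumPy arrays, is:

```python
import numpy as np

def get_batch(images, labels, batch_size, step):
    """Hypothetical reimplementation of the notebook's get_batch(...) helper:
    return the `step`-th batch of images and labels, wrapping around the
    dataset when the end is reached."""
    start = (step * batch_size) % len(images)
    idx = np.arange(start, start + batch_size) % len(images)
    return images[idx], labels[idx]
```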
Machine learning is a type of technology that aims to learn from experience. For example, as a
human, you can learn how to play chess simply by observing other people playing chess.
In the same way, computers are programmed by providing them with data from which
they learn and are then able to predict future elements or conditions.
Let's say, for instance, that you want to write a program that can tell whether a certain type of
fruit is an orange or a lemon. You might find it easy to write such a program and it will
give the required results, but you might also find that the program doesn't work effectively
for large datasets. This is where machine learning comes into play.
There are various steps involved in machine learning:
1. Collection of data
2. Filtering of data
3. Analysis of data
4. Algorithm training
5. Testing of the algorithm
6. Using the algorithm for future predictions
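As an illustration (not part of the original project), steps 4 to 6 can be sketched with scikit-learn's built-in iris dataset and a k-nearest-neighbor classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Steps 1-3: a ready-made, already-cleaned dataset stands in for
# collection, filtering and analysis.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier().fit(X_train, y_train)  # step 4: training
accuracy = clf.score(X_test, y_test)                # step 5: testing
prediction = clf.predict(X_test[:1])                # step 6: prediction
```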
ORGANIZATION
Machine learning uses different kinds of algorithms to find patterns, and these
algorithms are classified into two groups:
supervised learning
unsupervised learning
Supervised Learning
Supervised learning is the science of training a computer to recognize elements
by giving it sample data. The computer then learns from it and is able to predict
future datasets based on the learned data.
For example, you can train a computer to filter out spam messages based on
past information.
Supervised learning involves two phases: training and testing. Commonly used supervised learning algorithms include:
decision trees
support vector machines
naive Bayes
k-nearest neighbor
linear regression
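The spam-filter example above can be sketched with a naive Bayes classifier; the four training messages below are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win money now", "cheap prize win",      # past spam
            "meeting at noon", "see you at lunch"]   # past legitimate mail
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(messages), labels)

# Classify a new, unseen message based on the learned word counts.
pred = clf.predict(vec.transform(["win a cheap prize"]))
```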
Unsupervised Learning
Unsupervised learning is when you train your machine with only a set of
inputs. The machine will then be able to find a relationship between the
input data and any other you might want to predict. Unlike in supervised
learning, where you present a machine with some data to train on,
unsupervised learning is meant to make the computer find patterns or
relationships between different datasets.
Common unsupervised learning techniques are clustering and association, with algorithms such as:
k-means clustering
hierarchical clustering
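A minimal sketch of k-means clustering (the points below are made up) shows the machine grouping inputs with no labels at all:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two visually obvious groups of 2-D points, but no labels are given.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
# km.labels_ assigns each point to one of the two discovered clusters.
```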
FACE DETECTION
The problem of face recognition is all about face detection. This is a fact that seems quite
bizarre to new researchers in this area. However, before face recognition is possible, one
must be able to reliably find a face and its landmarks. This is essentially a segmentation
problem, and in practical systems most of the effort goes into solving this task. In fact, the actual recognition based on features extracted from these facial landmarks is only a minor last step.
Figure 1.1: A successful face detection in an image with a frontal view of a human face.
Most face detection systems attempt to extract a fraction of the whole face, thereby
eliminating most of the background and other areas of an individual's head such as hair
that are not necessary for the face recognition task. With static images, this is often done by running a window across the image. The face detection system then judges if a face is present inside the window (Brunelli and Poggio, 1993). Unfortunately, with static images there is
a very large search space of possible locations of a face in an image.
Most face detection systems use an example-based learning approach to decide whether or not a face is present in the window at that given instant (Sung and Poggio, 1994; Sung, 1995). A neural network or some other classifier is trained using supervised learning with 'face' and 'non-face' examples, thereby enabling it to classify an image (a window, in a face detection system) as a 'face' or 'non-face'. Unfortunately, while it is relatively easy to find face examples, how would one find a representative sample of images which represent non-faces (Rowley et al., 1996)? Therefore, face detection systems using example-based learning need thousands of 'face' and 'non-face' images for effective training. Rowley, Baluja, and Kanade (Rowley et al., 1996) used 1025 face images and
8000 non-face images (generated from 146,212,178 sub-images) for their training set.
Face detection
All faces in the face database are transformed into face space. Then face recognition is
achieved by transforming any given test image into face space and comparing it with the
training set vectors. The closest matching training set vector should belong to the same
individual as the test image.
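The matching step described above can be sketched with PCA standing in for the face-space transform; the random arrays below are only placeholders for real flattened face images:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train_faces = rng.random((20, 64 * 64))   # placeholder flattened face images

# "Face space": project every training face onto the principal components.
pca = PCA(n_components=10).fit(train_faces)
train_vecs = pca.transform(train_faces)

def recognize(test_image):
    """Return the index of the closest training-set vector in face space."""
    v = pca.transform(test_image.reshape(1, -1))
    return int(np.argmin(np.linalg.norm(train_vecs - v, axis=1)))
```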
Face recognition and detection is a pattern recognition approach for personal identification, used alongside other biometric approaches such as fingerprint recognition, signature and retina scanning. The face is the most common biometric used by humans, and applications range from static mug-shot verification to face identification in a cluttered background.
Analysis
We propose open source software to efficiently detect and extract faces from an image using OpenCV, the most popular library for computer vision. Originally written in C and C++, it now provides bindings for Python.
DESCRIPTION OF TOOLS USED
PLATFORM
A machine learning platform provides the capabilities to complete a machine learning project from beginning to end: data analysis, data preparation, modeling, and algorithm evaluation and selection. The platform used in this project is:
Anaconda is a free and open-source distribution of the Python and R programming languages for data science and machine-learning-related applications (large-scale data processing, predictive analytics, scientific computing) that aims to simplify package management and deployment. Package versions are managed by the package management system conda. The Anaconda distribution is used by over 6 million users, and it includes more than 250 popular data science packages suitable for Windows, Linux, and macOS.
LIBRARY
A machine learning library provides capabilities for completing part of a machine learning project. For example, a library may provide a collection of modeling algorithms.
They provide a specific capability for one or more steps in a machine learning project.
The interface is typically an application programming interface requiring
programming.
They are tailored for a specific use case, problem type or environment.
Methodology
Here we will work with face detection. Initially, the algorithm needs a lot
of positive images (images of faces) and negative images (images without
faces) to train the classifier. Then we need to extract features from it. For
this, Haar features shown in the below image are used. They are just like
our convolutional kernel. Each feature is a single value obtained by
subtracting sum of pixels under the white rectangle from sum of pixels
under the black rectangle.
Now, all possible sizes and locations of each kernel are used to calculate lots of features. (Just imagine how much computation it needs: even a 24x24 window results in over 160,000 features.) For each feature calculation,
we need to find the sum of the pixels under white and black rectangles. To
solve this, they introduced the integral image. However large your image, it
reduces the calculations for a given pixel to an operation involving just
four pixels. Nice, isn't it? It makes things super-fast.
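The integral image trick can be sketched in a few lines of NumPy; rect_sum recovers any rectangle sum from just four lookups, as described above:

```python
import numpy as np

def integral_image(img):
    """Cumulative 2-D sums, padded with a zero row and column so that
    ii[r, c] equals the sum of img[:r, :c]."""
    ii = img.cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using only four corner values."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```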
But among all these features we calculated, most of them are irrelevant.
For example, consider the image below. The top row shows two good
features. The first feature selected seems to focus on the property that
the region of the eyes is often darker than the region of the nose and
cheeks. The second feature selected relies on the property that the eyes
are darker than the bridge of the nose. But the same windows applied to the cheeks or any other region are irrelevant. So how do we select the best features out of 160,000+? This is achieved by Adaboost.
For this, we apply each and every feature on all the training images. For each feature, it finds the best threshold which will classify the faces as positive and negative. Obviously, there will be errors or misclassifications. We select the features with the minimum error rate, which means they are the features that most accurately classify the face and non-face images. (The process is not as simple as this. Each image is given an equal weight in the beginning. After each classification, the weights of misclassified images are increased, and the same process is repeated: new error rates and new weights are calculated. This continues until the required accuracy or error rate is achieved, or the required number of features is found.)
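The reweighting loop described above is the AdaBoost algorithm; a small sketch using scikit-learn's implementation over decision stumps (one-feature weak classifiers, analogous to single Haar features) on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic two-class data stands in for the face / non-face windows.
X, y = make_classification(n_samples=200, random_state=0)

# Each boosting round fits a weak learner on the reweighted samples,
# exactly the loop described in the text (the default base learner is
# a depth-1 decision tree, i.e. a stump).
model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
```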
So now you take an image, take each 24x24 window, apply the 6000 features to it, and check if it is a face or not. Isn't that a little inefficient and time-consuming? Yes, it is, and the authors have a good solution for that.
The authors' detector had 6000+ features with 38 stages with 1, 10, 25, 25
and 50 features in the first five stages. (The two features in the above
image are actually obtained as the best two features from Adaboost).
According to the authors, on average 10 features out of 6000+ are
evaluated per sub-window.
OpenCV comes with a trainer as well as a detector. If you want to train your own classifier for any object, such as cars or planes, you can use OpenCV to create one. Full details are given in the Cascade Classifier Training documentation.
Negative Samples
Negative samples are taken from arbitrary images, not containing objects
you want to detect. These negative images, from which the samples are
generated, should be listed in a special negative image file containing one
image path per line (can be absolute or relative). Note that negative
samples and sample images are also called background samples or
background images, and are used interchangeably in this document.
Directory structure:
/img
img1.jpg
img2.jpg
bg.txt
File bg.txt:
img/img1.jpg
img/img2.jpg
Your set of negative window samples will be used to tell the machine
learning step, boosting in this case, what not to look for, when trying to
find your objects of interest.
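Producing the bg.txt listing by hand is tedious; a small helper like the one below (a convenience sketch, not part of OpenCV) can generate it from the img/ folder:

```python
import os

def write_bg_file(neg_dir, out_path='bg.txt'):
    """Write one relative image path per line, the format expected for
    the negative image file."""
    names = sorted(n for n in os.listdir(neg_dir)
                   if n.lower().endswith(('.jpg', '.jpeg', '.png')))
    with open(out_path, 'w') as f:
        for name in names:
            f.write(f'{neg_dir}/{name}\n')
    return names
```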
Positive Samples
While the first approach (generating positive samples artificially from a single object image) works decently for fixed objects, like very rigid logos, it tends to fail rather soon for less rigid objects. In that case, we suggest using the second approach of annotating real object images. Many tutorials on the web even state that 100 real object images can lead to a better model than 1000 artificially generated positives created with the opencv_createsamples application.
Since OpenCV 3.x the community has been supplying and maintaining an open source annotation tool, used for generating the -info file. The tool can be accessed by the command opencv_annotation if the OpenCV applications were built.
Using the tool is quite straightforward. The tool accepts several required
and some optional parameters:
Note that the optional parameters can only be used together. An example command can be seen below:
opencv_annotation --annotations=/path/to/annotations/file.txt
--images=/path/to/image/folder/
This command will open a window containing the first image, with your mouse cursor used for annotation. Basically, there are several keystrokes that trigger an action. The left mouse button selects the first corner of your object; you then keep drawing the rectangle until you are satisfied, and a second left mouse button click stops the drawing. After each selection you have the following choices:
Finally you will end up with a usable annotation file that can be passed to
the -info argument of opencv_createsamples.
Cascade Training
The next step is the actual training of the boosted cascade of weak
classifiers, based on the positive and negative dataset that was prepared
beforehand.
Common arguments:
-data <cascade_dir_name> : Where the trained classifier should be stored. This folder should be created manually beforehand.
Cascade parameters:
From time to time it can be useful to visualise the trained cascade, to see which features it selected and how complex its stages are. For this, OpenCV supplies an opencv_visualisation application. This application has the following commands:
OpenCV already contains many pre-trained classifiers for face, eyes, smiles,
etc. Those XML files are stored in the opencv/data/haarcascades/ folder. Let's
create a face and eye detector with OpenCV.
First we need to load the required XML classifiers. Then load our input image
(or video) in grayscale mode.
Then we find the faces in the image. If faces are found, the detector returns the positions of the detected faces as Rect(x,y,w,h). Once we get these locations, we can create a ROI for the face and apply eye detection on this ROI (since eyes are always located on a face).
The result will look like the image below:
The computational models implemented in this project were chosen after extensive research, and the successful testing results confirm that the choices made by the researcher were reliable.

The system with manual face detection and automatic face recognition did not achieve a recognition accuracy over 90%, due to the limited number of eigenfaces used for the PCA transform. This system was tested under very robust conditions in this experimental study, and it is envisaged that real-world performance will be far more accurate. The fully automated frontal view face detection system displayed virtually perfect accuracy and, in the researcher's opinion, further work need not be conducted in this area.

The fully automated face detection and recognition system was not robust enough to achieve a high recognition accuracy. The only reason for this was that the face recognition subsystem did not display even a slight degree of invariance to scale, rotation or shift errors of the segmented face image; this was one of the system requirements identified in section 2.3. However, if some further processing, such as an eye detection technique, were implemented to further normalise the segmented face image, performance would increase to levels comparable to the manual face detection and recognition system. Implementing an eye detection technique would be a minor extension to the implemented system and would not require a great deal of additional research. All other implemented systems displayed commendable results and reflect well on the deformable template and Principal Component Analysis strategies.

The most suitable real-world applications for face detection and recognition systems are mugshot matching and surveillance. For user access and user verification applications there are better techniques, such as iris or retina recognition and face recognition using the thermal spectrum, since these applications need a very high degree of accuracy. The real-time automated pose-invariant face detection and recognition system proposed in chapter seven would be ideal for crowd surveillance applications; if such a system were widely implemented, its potential for locating and tracking suspects for law enforcement agencies is immense. The implemented fully automated face detection and recognition system (with an eye detection system) could be used for simple surveillance applications such as ATM user security, while the implemented manual face detection and automated recognition system is ideal for mugshot matching. Since controlled conditions are present when mugshots are gathered, the frontal view face recognition scheme should display a recognition accuracy far better than the results obtained in this study, which was conducted under adverse conditions. Furthermore, many of the test subjects did not present an expressionless, frontal view to the system; they would probably be more compliant when a 6'5'' policeman is taking their mugshot! In mugshot matching applications, perfect recognition accuracy or an exact match is not a requirement. If a face recognition system can reduce the number of images that a human operator has to search through for a match from 10,000 to even 100, it would be of incredible practical use in law enforcement.

The automated vision systems implemented in this thesis did not approach the performance of, nor were they as robust as, a human's innate face recognition system. However, they give an insight into what the future may hold in computer vision.