
Machine Learning using R and Python
The Road Map…
We are living in the primitive age of machines; the future of machines is enormous and beyond our scope of imagination.
Instead of writing the code ourselves, we feed data to a generic algorithm, and the algorithm/machine builds the logic based on the given data.
Machine Learning is a subset of Artificial Intelligence which focuses mainly on machines learning from their experience and making predictions based on that experience.
1. It enables computers or machines to make data-driven decisions rather than being explicitly programmed to carry out a certain task.
2. These programs or algorithms are designed so that they learn and improve over time when they are exposed to new data.
3. Machine Learning is a subset of Artificial Intelligence which provides computers with the ability to learn without being explicitly programmed.
4. In machine learning, we do not have to explicitly define all the steps or conditions as in any other programming application.
5. On the contrary, the machine is trained on a training dataset, large enough to create a model, which helps the machine take decisions based on its learning.


A computer program is said to ‘learn’ from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Learning is any process by which a system improves its performance from experience.
A Machine Learning algorithm is trained using a training data set to create a model. When new input data is introduced to the ML algorithm, it makes a prediction on the basis of that model.
Types of Machine Learning
1. Supervised Learning – Train Me!
2. Unsupervised Learning – I am self-sufficient in learning
3. Reinforcement Learning – My life My rules! (Hit & Trial)
We use the training dataset to get better boundary conditions which could be used to
determine each target class. Once the boundary conditions are determined, the next
task is to predict the target class. The whole process is known as classification.
Reinforcement Learning is a type of machine learning algorithm where the machine/agent in an environment learns ideal behavior in order to maximize its performance. Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.
Linear regression is one of the simplest statistical models in machine learning. It is used to show the linear relationship between a dependent variable and one or more independent variables.
Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent and an independent variable.

• Linear Regression
• Logistic Regression
Least Square Method – Finding the best-fit line
Least squares is a statistical method used to determine the best-fit line, or regression line, by minimizing the sum of squares created by a mathematical function. The “square” here refers to squaring the distance between a data point and the regression line. The line with the minimum value of the sum of squares is the best-fit regression line.

Regression Line: y = mx + c, where
y = Dependent Variable
x = Independent Variable
m = Slope of the line
c = y-Intercept
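As a quick illustration, here is a minimal Python sketch of the least-squares estimates of m and c; the data points are made up purely for demonstration:

```python
import numpy as np

# Made-up (x, y) observations, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Least-squares estimates of the slope m and intercept c:
# m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2),  c = mean(y) - m * mean(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

print(f"Best-fit line: y = {m:.3f}x + {c:.3f}")
```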
R Square Method – Goodness of Fit
The R-squared value is a statistical measure of how close the data are to the fitted regression line. It compares the residual sum of squares with the total sum of squares:

R² = 1 − Σ(y − yp)² / Σ(y − ȳ)²   where
y  = actual value
ȳ  = mean value of y
yp = predicted value of y
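A companion sketch, again with hypothetical values, that computes R² from actual and predicted values:

```python
import numpy as np

# Hypothetical actual values and the values predicted by the regression line
y      = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
y_pred = np.array([2.14, 4.08, 6.02, 7.96, 9.90])

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares, Σ(y - yp)²
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares, Σ(y - ȳ)²
r_squared = 1 - ss_res / ss_tot

print(f"R² = {r_squared:.4f}")        # the closer to 1, the better the fit
```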


Classification Algorithms
1. Naive Bayes Classifier
2. Support Vector Machines
3. Decision Trees
4. Random Forest
5. Nearest Neighbor
Classifier: An algorithm that maps the input data to a specific category.
Classification model: A classification model tries to draw some conclusion from the
input values given for training. It will predict the class labels/categories for the new data.
Feature: A feature is an individual measurable property of a phenomenon being observed.
Binary Classification: Classification task with two possible outcomes. Eg: Gender
classification (Male / Female)
Multi-class classification: Classification with more than two classes. In multi-class classification each sample is assigned to one and only one target label. Eg: An animal can be a cat or a dog but not both at the same time.
Multi-label classification: Classification task where each sample is mapped to a set of target labels (more than one class). Eg: A news article can be about sports, a person, and a location at the same time.

The following are the steps involved in building a classification model:
1. Initialize the classifier to be used.
2. Train the classifier: all classifiers in scikit-learn use a fit(X, y) method to fit the model (training) on the given training data X and training labels y.
3. Predict the target: given an unlabeled observation X, predict(X) returns the predicted label y.
4. Evaluate the classifier model.
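A minimal sketch of these four steps with scikit-learn (assuming scikit-learn is installed; the built-in Iris dataset stands in for real training data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = GaussianNB()                     # 1. initialize the classifier
clf.fit(X_train, y_train)              # 2. train the classifier with fit(X, y)
y_pred = clf.predict(X_test)           # 3. predict the target with predict(X)
print(accuracy_score(y_test, y_pred))  # 4. evaluate the classifier model
```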
Naive Bayes Classifier (Generative Learning Model):
1. It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors.
2. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
3. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability.
4. The Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.


Suppose we have a day with the following values:
Outlook = Rain
Humidity = High
Wind = Weak
Play = ?

With this data, we have to predict whether we can play on that day or not.

Likelihood of ‘Yes’ on that day
= P(Outlook = Rain|Yes) * P(Humidity = High|Yes) * P(Wind = Weak|Yes) * P(Yes)
= 2/9 * 3/9 * 6/9 * 9/14 = 0.0199

Likelihood of ‘No’ on that day
= P(Outlook = Rain|No) * P(Humidity = High|No) * P(Wind = Weak|No) * P(No)
= 2/5 * 4/5 * 2/5 * 5/14 = 0.0166

Now we normalize the values:
P(Yes) = 0.0199 / (0.0199 + 0.0166) = 0.55
P(No) = 0.0166 / (0.0199 + 0.0166) = 0.45

Our model predicts that there is a 55% chance of a game on that day.
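The normalization step can be reproduced in a few lines of plain Python; the likelihood values below are the ones reported in the example, and the dictionary-based helper is just an illustrative way to pick the most probable class:

```python
# Likelihoods for each class on the given day, as reported in the example above
likelihoods = {"Yes": 0.0199, "No": 0.0166}

# Normalize so that the posterior probabilities sum to 1
total = sum(likelihoods.values())
posteriors = {label: value / total for label, value in likelihoods.items()}

for label, p in posteriors.items():
    print(f"P({label}) = {p:.2f}")        # Yes -> 0.55, No -> 0.45

# Predict the class with the highest posterior probability
print("Prediction:", max(posteriors, key=posteriors.get))
```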
Decision Trees:
1. A decision tree builds classification or regression models in the form of a tree structure.
2. It breaks down a data set into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed.
3. The final result is a tree with decision nodes and leaf nodes.
4. A decision node has two or more branches, and a leaf node represents a classification or decision.
5. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node.
6. Decision trees can handle both categorical and numerical data.
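As a rough illustration, the sketch below trains a small decision tree with scikit-learn and prints its decision and leaf nodes (the Iris dataset and the depth limit are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth is an illustrative choice that keeps the printed tree small
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Each split line is a decision node; "class:" lines are leaf nodes
print(export_text(tree, feature_names=list(iris.feature_names)))
```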


Random Forest:
1. Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
2. Random decision forests correct for decision trees’ habit of overfitting to their training set.
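A minimal scikit-learn sketch of the idea (the number of trees, the dataset and the split are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees; the predicted class is the majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
```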
Nearest Neighbor:
1. The k-nearest-neighbors algorithm is a supervised classification algorithm: it takes a set of labelled points and uses them to learn how to label other points.
2. To label a new point, it looks at the labelled points closest to that new point (its nearest neighbors) and has those neighbors vote: whichever label most of the neighbors have is the label for the new point (the “k” is the number of neighbors it checks).
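A brief scikit-learn sketch of the neighbor-voting idea, with k = 3 and a handful of made-up labelled points:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labelled points in 2-D (two classes: 0 and 1)
X_train = [[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [8, 6]]
y_train = [0, 0, 0, 1, 1, 1]

# k = 3: the three closest training points vote on the label of a new point
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print(knn.predict([[2, 2], [7, 6]]))  # expected: [0 1]
```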
Support Vector Machine
A support vector machine is a representation of the training data as points in space, separated into categories by a clear gap that is as wide as possible.
New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
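A compact sketch with scikit-learn's linear SVM on toy, linearly separable points (purely illustrative):

```python
from sklearn.svm import SVC

# Two linearly separable groups of hypothetical points
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel looks for the widest possible gap between the two classes
svm = SVC(kernel="linear")
svm.fit(X, y)

print(svm.predict([[0.5, 0.5], [4.5, 4.5]]))  # expected: [0 1]
print("Support vectors:\n", svm.support_vectors_)
```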


Clustering Algorithms
Types of clustering
Hard Clustering: In hard clustering, each data point either belongs to a cluster completely or not at all.
Soft Clustering: In soft clustering, instead of putting each data point into a single cluster, a probability or likelihood of that data point being in each cluster is assigned.
Types of clustering algorithms
1. Connectivity models: As the name suggests, these models are based on the notion that data points closer in data space exhibit more similarity to each other than data points lying farther away.
2. Centroid models: These are iterative clustering algorithms in which the notion of similarity is derived from the closeness of a data point to the centroid of the clusters.
3. Distribution models: These clustering models are based on the notion of how probable it is that all data points in the cluster belong to the same distribution.
4. Density models: These models search the data space for areas of varied density of data points.
K Means Clustering
K-means is an iterative clustering algorithm that aims to find a local optimum in each iteration. The algorithm works in these 5 steps:
1. Specify the desired number of clusters K: let us choose K = 2 for these 5 data points in 2-D space.
2. Randomly assign each data point to a cluster: let's assign three points to cluster 1, shown in red, and two points to cluster 2, shown in grey.
3. Compute the cluster centroids: the centroid of the data points in the red cluster is shown with a red cross and that of the grey cluster with a grey cross.
4. Re-assign each point to the closest cluster centroid: note that only the data point at the bottom changes cluster, because it is closer to the centroid of the grey cluster; thus, we re-assign that data point to the grey cluster.
5. Re-compute the cluster centroids: now, re-compute the centroids for both clusters.

Repeat steps 4 and 5 until no improvements are possible: we repeat the 4th and 5th steps until there is no further switching of data points between the two clusters for two successive iterations, which marks the termination of the algorithm (unless a maximum number of iterations is explicitly specified).
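The same assign/re-compute loop can be run with scikit-learn's KMeans; the five 2-D points below are illustrative stand-ins for the example above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Five hypothetical points in 2-D space
points = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 7.0], [6.0, 8.0], [1.2, 0.8]])

# K = 2 clusters; KMeans repeats the assign/re-compute steps until convergence
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print("Cluster labels:", labels)
print("Cluster centroids:\n", kmeans.cluster_centers_)
```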
Hierarchical Clustering

This algorithm starts with all the data points assigned to a cluster of their own. The two nearest clusters are then merged into the same cluster. In the end, the algorithm terminates when there is only a single cluster left.
At the bottom, we start with 25 data points, each assigned to a separate cluster. The two closest clusters are then merged repeatedly until we have just one cluster at the top. The height in the dendrogram at which two clusters are merged represents the distance between those two clusters in the data space.

The best choice for the number of clusters is 4, as the red horizontal line in the dendrogram covers the maximum vertical distance AB.
Two important things that you should know about hierarchical clustering are:
This algorithm has been described above using the bottom-up (agglomerative) approach. It is also possible to follow a top-down (divisive) approach, starting with all data points assigned to the same cluster and recursively performing splits until each data point is assigned to a separate cluster.

The decision to merge two clusters is taken on the basis of the closeness of these clusters. There are multiple metrics for deciding the closeness of two clusters:
Euclidean distance: ||a − b||₂ = √(Σᵢ (aᵢ − bᵢ)²)
Squared Euclidean distance: ||a − b||₂² = Σᵢ (aᵢ − bᵢ)²
Manhattan distance: ||a − b||₁ = Σᵢ |aᵢ − bᵢ|
Maximum distance: ||a − b||∞ = maxᵢ |aᵢ − bᵢ|
Mahalanobis distance: √((a − b)ᵀ S⁻¹ (a − b))   (where S is the covariance matrix)
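A brief SciPy sketch of bottom-up (agglomerative) clustering; the data points and the choice of Ward linkage (which uses Euclidean distance) are illustrative assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D data points
points = np.array([[1, 2], [2, 2], [8, 8], [9, 8], [1, 1], [8, 9]])

# Agglomerative clustering: repeatedly merge the two closest clusters
merges = linkage(points, method="ward")

# Cut the dendrogram so that we obtain 2 flat clusters
labels = fcluster(merges, t=2, criterion="maxclust")
print(labels)  # two flat clusters, e.g. [1 1 2 2 1 2]
```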
Applications of Clustering
1. Recommendation engines
2. Market segmentation
3. Social network analysis
4. Search result grouping
5. Medical imaging
6. Image segmentation
7. Anomaly detection

A pizza chain wants to open its delivery centres across a city. What do you think would be the possible challenges?
They need to analyse the areas from where pizza is being ordered frequently.
They need to understand how many pizza stores have to be opened to cover delivery in the area.
They need to figure out the locations of the pizza stores within all these areas in order to keep the distance between the store and the delivery points minimal.
Resolving these challenges involves a lot of analysis and mathematics.
Association Rules

The Problem

When we go grocery shopping, we often have a standard list of things to buy. Each shopper has a distinctive list, depending on one's needs and preferences. A housewife might buy healthy ingredients for a family dinner, while a bachelor might buy beer and chips. Understanding these buying patterns can help to increase sales in several ways. If there is a pair of items, X and Y, that are frequently bought together:
1. Both X and Y can be placed on the same shelf, so that buyers of one item would be prompted to buy the other.
2. Promotional discounts could be applied to just one of the two items.
3. Advertisements for X could be targeted at buyers who purchase Y.
4. X and Y could be combined into a new product, such as having Y in flavors of X.
While we may know that certain items are frequently bought together, the question is: how do we uncover these associations?
Besides increasing sales profits, association rules can also be used in other fields. In medical diagnosis, for instance, understanding which symptoms tend to be co-morbid can help to improve patient care and medicine prescription.


Definition
Association rules analysis is a technique to uncover how items are associated with each other. There are three common ways to measure association.

Measure 1: Support. This says how popular an itemset is, as measured by the proportion of transactions in which the itemset appears. In Table 1 below, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.

If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold. You may then identify itemsets with support values above this threshold as significant itemsets.
Measure 2: Confidence. This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions with item X in which item Y also appears. In Table 1, the confidence of {apple -> beer} is 3 out of 4, or 75%.

One drawback of the confidence measure is that it might misrepresent the importance of an association. This is because it only accounts for how popular apples are, but not beers. If beers are also very popular in general, there will be a higher chance that a transaction containing apples will also contain beers, thus inflating the confidence measure. To account for the base popularity of both constituent items, we use a third measure called lift.
Measure 3: Lift. This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. In Table 1, the lift of {apple -> beer} is 1, which implies no association between the items. A lift value greater than 1 means that item Y is likely to be bought if item X is bought, while a value less than 1 means that item Y is unlikely to be bought if item X is bought.
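The three measures can be computed directly from their definitions; the sketch below uses a small, made-up set of transactions (not Table 1) purely to show the calculations:

```python
# Hypothetical transactions, each a set of purchased items
transactions = [
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "rice"},
    {"apple"},
    {"beer", "rice"},
    {"beer"},
    {"rice"},
    {"milk"},
]
n = len(transactions)

def support(itemset):
    """Proportion of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

def confidence(x, y):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support(x | y) / support(x)

def lift(x, y):
    """Confidence of X -> Y, controlled for how popular Y is."""
    return confidence(x, y) / support(y)

print(support({"apple"}))               # support of {apple}
print(confidence({"apple"}, {"beer"}))  # confidence of {apple -> beer}
print(lift({"apple"}, {"beer"}))        # lift of {apple -> beer}
```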
An Illustration
We use a dataset on grocery transactions from the arules R library. It contains actual transactions at a grocery outlet over 30 days. The network graph below shows associations between selected items (visualized using the arulesViz R library): larger circles imply higher support, while red circles imply higher lift.

Several purchase patterns can be observed. For example:
The most popular transaction was of pip and tropical fruits.
Another popular transaction was of onions and other vegetables.
If someone buys meat spreads, he is likely to have bought yogurt as well.
Relatively many people buy sausage along with sliced cheese.
If someone buys tea, he is likely to have bought fruit as well, possibly inspiring the production of fruit-flavored tea.


Examples of time series
the daily closing value of the Dow Jones index,
the annual flow volume of the River Nile at Aswan,
daily air temperature or monthly precipitation in a specific location,
the annual yield of corn in Iowa,
the size of an organism, measured daily,
annual U.S. population data,
daily closing stock prices,
weekly interest rates,
national income,
sales figures.
NRR for a team in cricket is calculated by the following formula:
NRR = (average runs scored per over by the team throughout the tournament) − (average runs scored per over by the opposing teams against it).
Cricinfo also states that the following rules are applied in case a match is abandoned or concluded under the Duckworth-Lewis method.
Where a match is abandoned, but a result is achieved under Duckworth/Lewis, for net run rate purposes Team 1 will be accredited with Team 2's par score on abandonment off the same number of overs faced by Team 2. Where a match is concluded, but with Duckworth/Lewis having been applied at an earlier point in the match, Team 1 will be accredited with 1 run less than the final target score for Team 2 off the total number of overs allocated to Team 2 to reach the target.
Also, only matches that never take place (abandoned without a ball being bowled) are not considered, I believe.
For example, in the current IPL, RCB has for and against values of 1046 runs in 133.0 overs and 1034 runs in 139.5 overs (139 overs and 5 balls, i.e. about 139.83 overs). So its NRR would be 1046/133.0 − 1034/139.83 = 7.864 − 7.394 ≈ +0.470

Another example:
Across the three games, TEAM1 scored 678 runs in a total of 147 overs and 2
balls (actually 147.333 overs), a rate of 678/147.333 or 4.602 rpo.
The run-rate scored against TEAM1 across the three games is calculated on the
basis of 466 runs in a total of 50 + 50 + 50 = 150 overs, a rate of 466/150 or
3.107 rpo.
The net run-rate is, therefore, 4.602 - 3.107 = + 1.495
NET RUN RATE OF TEAM1 is + 1.495
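A small Python sketch of the NRR calculation; the overs_to_decimal helper (converting cricket overs notation such as 147.2, meaning 147 overs and 2 balls, into decimal overs) is an illustrative assumption rather than an official formula:

```python
def overs_to_decimal(overs: float) -> float:
    """Convert cricket overs notation (e.g. 147.2 = 147 overs and 2 balls)
    into a decimal number of overs."""
    whole = int(overs)
    balls = round((overs - whole) * 10)
    return whole + balls / 6

def net_run_rate(runs_for, overs_for, runs_against, overs_against):
    """NRR = (runs scored per over) - (runs conceded per over)."""
    return (runs_for / overs_to_decimal(overs_for)
            - runs_against / overs_to_decimal(overs_against))

# TEAM1 example from above: 678 runs in 147.2 overs for, 466 runs in 150 overs against
print(round(net_run_rate(678, 147.2, 466, 150), 3))  # ≈ 1.495
```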
Deep Learning
Introduction
Artificial Intelligence is nothing but the capability of a machine to imitate intelligent human behavior. AI is achieved by mimicking the human brain: understanding how it thinks, learns, decides, and works while trying to solve a problem.
Limitations

Machine Learning is not capable of handling high-dimensional data, that is, where the input and output are quite large. Handling and processing this type of data becomes very complex and resource-exhaustive. This is termed the Curse of Dimensionality.

Machine learning was not capable of solving these use-cases, and hence Deep Learning came to the rescue. Deep Learning is capable of handling high-dimensional data and is also efficient at focusing on the right features on its own. This process is called feature extraction.
How Deep Learning Works?
In an attempt to re-engineer the human brain, Deep Learning studies its basic unit, called a brain cell or neuron. Inspired by the neuron, an artificial neuron, or perceptron, was developed.
Any deep neural network consists of three types of layers:
The Input Layer
The Hidden Layer
The Output Layer
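A minimal NumPy sketch of one forward pass through such a network, with an input layer, one hidden layer and an output layer; the layer sizes, weights and activation function are arbitrary illustrative choices:

```python
import numpy as np

def sigmoid(z):
    """A common activation function for an artificial neuron."""
    return 1.0 / (1.0 + np.exp(-z))

# One forward pass through a tiny network: 3 inputs -> 4 hidden units -> 1 output
rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])             # input layer (3 features)

W_hidden = rng.normal(size=(4, 3))         # weights of the hidden layer
b_hidden = np.zeros(4)
hidden = sigmoid(W_hidden @ x + b_hidden)  # hidden layer activations

W_out = rng.normal(size=(1, 4))            # weights of the output layer
b_out = np.zeros(1)
output = sigmoid(W_out @ hidden + b_out)   # output layer (a single prediction)

print(output)
```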
Thank You
