
Business Analytics

Chapter - 1


Logistic Regression
Linear Regression

Recall:
Linear regression is the process of finding a best-fitting straight line through a set of points, with the objective of using the equation of that line as a model for prediction.
The key assumption is that both the predictor and target variables are continuous, as seen in the chart below. Intuitively, one can state that when X increases, Y increases along the slope of the line.
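As a quick illustration, a best-fitting line can be obtained in R with the lm function (the data here are made up for the sketch):

```r
# Hypothetical continuous data for X and Y
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

fit <- lm(y ~ x)                  # least-squares straight line
coef(fit)                         # intercept and slope of the fitted line
predict(fit, data.frame(x = 6))   # use the line as a model for prediction
```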
Logistic Regression - Introduction

Logistic regression is a form of regression where the dependent variable (outcome variable) is dichotomous (binary) and the independent variables (influencing factors) can be either continuous or categorical. It is used for prediction when the outcome variable is categorical.

Logistic regression can be binomial (binary) or multinomial.

In binary logistic regression, the outcome can have only two possible types of values, e.g. "Yes" or "No", "Success" or "Failure".
The outcome is coded as 0 and 1 in binary logistic regression.


Logistic Regression - Introduction

These models are extensively used in the banking, finance and telecommunication sectors to estimate the risks involved.
Multinomial logistic regression refers to cases where the outcome can have three or more possible types of values, e.g. "good" vs. "very good" vs. "best".
Examples
If you would like to predict who will win the next T20 world cup, based on players' strength and other details.
Imagine you want to predict the outcome of an election. Here
your outcome variable is binary i.e. Win/Lose. Factors influencing
this outcome could be the amount of money spent on the
campaign, amount of time spent campaigning, previous
election history, etc.
You might be interested in factors that influence (or explain)
whether or not a person in the U.S. owns a U.S.-made or foreign-
made (non-U.S.) car. It would be natural to code owning a U.S.
car as a 1 and owning a foreign car as a 0. So the Y-variable in the
logistic regression is whether or not a person owns a U.S. car
coded as a 1 if he or she does and a 0 otherwise.
Logistic Curve
The logistic model: The logistic curve, illustrated below, is better for modeling binary dependent variables coded 0 or 1 because it comes closer to hugging the y = 0 and y = 1 points on the y axis.
Moreover, the logistic function is bounded between 0 and 1.
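As a small sketch, the logistic (sigmoid) function can be evaluated directly in R; note that every output lies strictly between 0 and 1:

```r
# Logistic (sigmoid) function: p = 1 / (1 + exp(-x))
sigmoid <- function(x) 1 / (1 + exp(-x))

x <- seq(-6, 6, by = 0.5)
p <- sigmoid(x)
range(p)                                          # all values lie in (0, 1)
plot(x, p, type = "l", main = "Logistic curve")   # the S-shaped curve
```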
Returning to the Sales vs. Ad Spend example: what happens if the target variable is not continuous?
When the target variable (Y) is discrete, the straight line is no longer a good fit, as seen in this chart.
Although intuitively we can still state that when X (say, advertising spend) increases, Y (say, response or no response to a mailing campaign) also increases, there is no gradual transition: the Y value abruptly jumps from one binary outcome to the other.
Thus the straight line is a poor fit for this data.
The Straight Line is a Poor Fit for Binary Outcome
S- Shaped Curve is a Better Fit

On the other hand, take a look at the S-shaped curve below.
This is certainly a better fit for the data shown.
If we then know the equation of this "sigmoid" curve, we can use it as effectively as we used the straight line in the case of linear regression.
Logistic Regression
Logistic regression is thus the process of obtaining an appropriate
sigmoid curve to fit the data when the target variable is discrete.
In statistics, logistic regression or logit regression is a type of regression
analysis used for predicting the outcome of a categorical dependent
variable.
Key facts to keep in mind
Logistic Regression is the equivalent of linear regression to use when
the target (or dependent) variable is discrete i.e. not continuous.
The predictor variables can be either continuous or categorical.
R Data Analysis Examples

Logistic regression, also called a logit model, is used to model dichotomous outcome variables.
In the logit model, the log odds of the outcome are modeled as a linear combination of the predictor variables.
We will require the following packages:
- caret
- aod
- ggplot2
install.packages("packagename")
R Data Analysis Examples
Example: German Credit data set
The data set consists of customers' information about bank account, age, sex, credit history and present credit situation, credit purpose, property and installment, employment, and residence as its 20 independent variables.

The target variable is Class, which is binary with levels Bad and Good:
Bad: customer with high credit risk
Good: customer with low credit risk

The purpose of analyzing this data set is to predict a customer's credit risk based on the other information about the customer.
R Data Analysis Examples

The German Credit dataset is an inbuilt R data set and can be loaded by loading the caret library as follows.
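The original slide's screenshot is not shown; a minimal sketch of loading and inspecting the data, assuming the caret package is installed, might look like:

```r
library(caret)               # the GermanCredit data set ships with caret
data(GermanCredit)

dim(GermanCredit)            # number of observations and variables
table(GermanCredit$Class)    # target variable with levels Bad / Good
summary(GermanCredit)        # descriptive statistics for all variables
```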

We can get the descriptive statistics for all the variables by giving
following command:
We need to convert the target variable Class from Bad/Good to 0/1 to
apply logistic regression. This can be done by giving the following
command.
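One way to recode the target is shown below (a sketch; the slide's exact command is not shown in the source):

```r
# Recode the factor Class (Bad/Good) as 0/1: Good -> 1, Bad -> 0
GermanCredit$Class <- ifelse(GermanCredit$Class == "Good", 1, 0)
table(GermanCredit$Class)   # the target is now coded 0 and 1
```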
Partitioning the Data Set
One issue that arises when fitting the model is to check how
well the newly created model behaves when applied to new
data.
To address this issue, the data set can be divided into two partitions: a training partition used to build the model and a test partition used to validate how well the model performs.

Now we will partition our German Credit data set into training and
testing sets using the createDataPartition function.
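A sketch of the split, assuming a 70/30 partition stratified on Class (the seed value is illustrative; the slides do not state one):

```r
library(caret)
set.seed(123)   # for reproducibility (assumed; not stated in the slides)

# 70% training / 30% testing split, stratified on the Class variable
idx   <- createDataPartition(GermanCredit$Class, p = 0.7, list = FALSE)
train <- GermanCredit[idx, ]
test  <- GermanCredit[-idx, ]

nrow(train)   # about 700 observations
nrow(test)    # about 300 observations
```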
Partitioning the Data Set

The training data set contains 700 observations and the testing data set contains 300 observations.
We will now take the training data set and use logistic regression to model Class as a function of seven predictor variables (Age, Amount, ForeignWorker, Property.RealEstate, Housing.Own, CreditHistory.Critical and Purpose.NewCar) using the glm function.
Model Building
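The model-building screenshot is not shown in the source; the glm call described above would look roughly like this (variable names `train` and `model` are illustrative):

```r
# Logistic regression of Class on the seven predictors, fitted on the
# training partition
model <- glm(Class ~ Age + Amount + ForeignWorker + Property.RealEstate +
               Housing.Own + CreditHistory.Critical + Purpose.NewCar,
             data = train, family = binomial(link = "logit"))

summary(model)   # coefficients, null deviance, residual deviance
```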
Interpreting Output
All the variables in our model are significant.
The null deviance indicates the deviance for a model without
variables whereas the residual deviance is for our model.
The further apart the null deviance and the residual deviance are, the more the predictors improve the model.
Here the null deviance is 839.40 and the residual deviance is 769.93, which indicates that there is not much of a difference between them.
Estimates from logistic regression characterize the relationship between the predictor and response variables on a log-odds scale.
The estimate for the variable Age is 0.01839. This implies that for every 1-unit increase in Age, the log odds of the customer having good credit increase by 0.01839; equivalently, the odds are multiplied by exp(0.01839) ≈ 1.0186.
The other coefficients are interpreted similarly.
Prediction

There is 28.43% misclassification in our model.
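The prediction step might be sketched as follows, assuming `model` and `test` are the fitted glm and the test partition from the previous steps, and using the conventional 0.5 cutoff (the slides do not state their cutoff):

```r
# Predicted probabilities on the test set, then a 0/1 classification
prob <- predict(model, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

table(Predicted = pred, Actual = test$Class)   # confusion matrix
mean(pred != test$Class)                       # misclassification rate
```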


Goodness of Fit: Hosmer Lemeshow Test
Hosmer Lemeshow test is used for testing overall goodness of fit.
The statistic is computed on the data after the observations have been grouped by similar predicted probabilities.
It examines whether the observed proportion of events are similar
to predicted probabilities of occurrence in the subgroups of the
data set using a Pearson Chi-Square test.
The hypotheses are:
H0: the current model fits well
vs.
H1: the current model does not fit well.
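One common implementation is hoslem.test from the ResourceSelection package (an assumption; the slides do not name the function they used):

```r
library(ResourceSelection)

# Hosmer-Lemeshow test: observed vs. fitted probabilities in g = 10 groups
hoslem.test(model$y, fitted(model), g = 10)
```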
Goodness of Fit: Hosmer Lemeshow Test

Since the p-value is greater than 0.05, we do not reject H0. i.e. the
current model is a good fit.
Wald Test
A Wald test is used to evaluate the statistical significance of each
coefficient in the model and is calculated by taking the ratio of the
square of the regression coefficient to the square of the standard
error of the coefficient.
We test whether the coefficient of the independent variable in the
model is significantly different from zero.
If the test fails to reject the null hypothesis, this suggests that
removing the variable from the model will not substantially harm the
fit of that model.
To apply the Wald test, the regTermTest function from the survey library will be used.
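A sketch of the call, assuming `model` is the glm fit from the Model Building step:

```r
library(survey)

# Wald test for the ForeignWorker term in the fitted model
regTermTest(model, "ForeignWorker")
```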
Wald Test

Since the p-value is less than 0.05, we reject H0 i.e. removing the
variable ForeignWorker from the model will harm the fit of the model.
McFadden's R2
Unlike linear regression, logistic regression has no R2 statistic that explains the proportion of variation in the dependent variable accounted for by the predictors.
For this we use McFadden's R2.

The predictors in our model explain just 8.27% of the variation in the data.
This suggests that one or more variables are missing in our model.
We can try a model by considering all the variables.
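McFadden's R2 can be computed directly from the log-likelihoods of the fitted and null models (the pR2 function from the pscl package reports the same quantity); `model` and `train` are the objects from the earlier steps:

```r
# McFadden's R^2 = 1 - logLik(full model) / logLik(intercept-only model)
null_model <- glm(Class ~ 1, data = train, family = binomial)
mcfadden   <- 1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
mcfadden   # about 0.0827 for the model described on the slides
```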
ROC Curve
The Receiver Operating Characteristic (ROC) curve is a plot of the true positive rate against the false positive rate.
It shows the tradeoff between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity).
The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
ROC Curve
The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
Accuracy is measured by the area under the ROC curve. An area of 1 represents a perfect test; an area of 0.5 represents a worthless test.
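A sketch using the ROCR package (an assumption; the slides do not name the package), with `prob` and `test` from the prediction step:

```r
library(ROCR)

# ROC curve: true positive rate vs. false positive rate
pred_obj <- prediction(prob, test$Class)
roc <- performance(pred_obj, measure = "tpr", x.measure = "fpr")
plot(roc)
abline(0, 1, lty = 2)   # 45-degree reference diagonal

# Area under the ROC curve
auc <- performance(pred_obj, measure = "auc")
auc@y.values[[1]]
```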
Area Under ROC Curve

Since the area under the curve is 0.6944, we can say the discrimination
ability of our model is fair.
Kolmogorov Smirnov Chart
The Kolmogorov Smirnov chart measures the performance of classification models.
It is a measure of the degree of separation between the Goods (Events) and the Bads (Non-Events).
The distance between the goods and the bads should be as large as possible.
The "random" line in the chart corresponds to the case of capturing the responders (Ones) by random selection, i.e., when you do not have any model at your disposal.
The "model" line represents the case of capturing the responders if you go by the model-generated probability scores, where you begin by targeting the data points with the highest probability scores.
Kolmogorov Smirnov Chart

As the separation between the goods and the bads is very small, we can say that the performance of the model is not good.
Kolmogorov Smirnov Statistic

The Kolmogorov Smirnov statistic is the maximum difference between the cumulative true positive rate and the cumulative false positive rate.
It is the maximum difference between the goods and the bads.
It is often used as the deciding metric to judge the efficacy of models in credit scoring.
The higher the value of the Kolmogorov Smirnov statistic, the more efficient the model is at capturing the responders (Ones, or Goods).
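Using the same ROCR objects as in the ROC step (an assumption about the package used), the statistic is the maximum gap between the two cumulative rates:

```r
library(ROCR)

# KS statistic: maximum gap between cumulative TPR and cumulative FPR.
# prob and test$Class are the predicted probabilities and actual labels
# from the prediction step.
pred_obj <- prediction(prob, test$Class)
perf <- performance(pred_obj, measure = "tpr", x.measure = "fpr")

ks <- max(perf@y.values[[1]] - perf@x.values[[1]])
ks   # higher values mean better separation of Goods from Bads
```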
Kolmogorov Smirnov Statistic

As the value of the Kolmogorov Smirnov statistic is small, we can say that the model is not very efficient at capturing the responders.
Tree Diagrams
A useful way of investigating probability problems is to use what are
known as tree diagrams.
Tree diagrams are a useful way of mapping out all possible outcomes
for a given scenario.
They are widely used in probability and are often referred to as
probability trees.
They are also used in decision analysis where they are referred to as
decision trees.
In the context of decision theory, a complex series of choices is available with various different outcomes, and we look for the best of these under a given performance criterion such as maximizing profit or minimizing cost.
Tree Diagram Example 1
Suppose we are given three boxes, Box A contains 10 light bulbs, of
which 4 are defective, Box B contains 6 light bulbs, of which 1 is
defective and Box C contains 8 light bulbs, of which 3 are defective.
We select a box at random and then draw a light bulb from that box at
random. What is the probability that the bulb is defective?

Here we are performing two experiments:
Selecting a box at random
Selecting a bulb at random from the chosen box
If A, B and C denote the events of choosing box A, B, or C respectively, and D and N denote the events that a defective or non-defective bulb is chosen, the two experiments can be represented on the diagram below.
Tree Diagram Example 1

We can compute the following probabilities and insert them onto the branches of the tree:
Tree Diagram Example 1
To get the probability for a particular path of the tree (left to right) we
multiply the corresponding probabilities on the branches of the path.

For example, the probability of selecting box A and then getting a defective bulb is 1/3 × 4/10 = 2/15.
Since all the paths are mutually exclusive and there are three paths which lead to a defective bulb, to answer the original question we must add the probabilities for the three paths,
i.e. 1/3 × 4/10 + 1/3 × 1/6 + 1/3 × 3/8 = 2/15 + 1/18 + 1/8 ≈ 0.314.
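The arithmetic above can be checked directly in R:

```r
# Total probability of a defective bulb across the three equally likely boxes:
# Box A: 4 of 10 defective, Box B: 1 of 6, Box C: 3 of 8
p_defective <- (1/3) * (4/10) + (1/3) * (1/6) + (1/3) * (3/8)
round(p_defective, 3)   # 0.314
```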
Tree Diagram Example 2

Machines A and B turn out respectively 10% and 90% of the total production of a certain type of article.
The probability that machine A turns out a defective item is 0.01 and the probability that machine B turns out a defective item is 0.05.
(i) What is the probability that an article taken at random from the production line is defective?
(ii) What is the probability that an article taken at random from the production line was made by machine A, given that it is defective?
Tree Diagram Example 2
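The two questions follow directly from the tree: total probability for (i), and Bayes' rule for (ii). A short check in R:

```r
# Production shares and per-machine defect probabilities
p_A <- 0.10; p_B <- 0.90
p_D_given_A <- 0.01; p_D_given_B <- 0.05

# (i) Total probability that a random article is defective
p_D <- p_A * p_D_given_A + p_B * p_D_given_B
p_D                     # 0.046

# (ii) Bayes' rule: probability it came from machine A, given defective
p_A_given_D <- p_A * p_D_given_A / p_D
round(p_A_given_D, 4)   # 0.0217
```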
Thank You
