
How to build an attrition analysis model

Building an attrition analysis (also known as churn analysis) is about finding the
relations between customer attrition and the variables that affect it. The goal of
attrition analysis is to let the manager or researcher understand which variables
contribute most to attrition and how likely a given customer is to churn.

It may look easy to name the main reasons for attrition: customer satisfaction,
length of service, etc. Using such rules of thumb the user can predict 15% of all
churners, but using a mathematical procedure as in Analysis 6 can yield more than
60% precision.

Analysis 6 makes use of four logistic regression methods to find the best model that
can explain the main reasons for attrition. In this "how-to" paper, we will discuss a
simple yet powerful method for obtaining a good attrition model. We will also discuss
how to interpret the model, so that the manager or researcher has the tools to build
his or her own model.

A model that is based on logistic regression analyzes each variable's weight and
contribution to the model's goal. The variable contribution is measured in percent,
so the manager or researcher can understand the weight of each variable on the
model's target variable (in this example: attrition).
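
For readers who want the underlying formula, a logistic regression model is usually written as shown below. The symbols p, b_0, b_i and x_i are notation introduced here for illustration; they are not taken from the Analysis 6 output.

```latex
p = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \dots + b_k x_k)}},
\qquad
\frac{p}{1-p} = e^{b_0} \prod_{i=1}^{k} \left(e^{b_i}\right)^{x_i}
```

Here p is the probability that the target variable equals "1", the x_i are the explanatory variables and the b_i are their weights. Raising x_i by one unit multiplies the odds p/(1-p) by e^{b_i}, which is why a variable's contribution can be expressed as a percentage change.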

You can also use the Analysis 6 logistic regression procedure in a wide variety of
fields, such as employee attrition in HR, project failure analysis, engineering,
social research, finance, and any other research that aims to explain and predict the
occurrence of a binary event (such as "0"/"1" or "Churned"/"Not churned").
Step No. 1: Preparing the data set

Logistic regression can produce a model using a data set where the target variable has
two values: "1" means that the event has occurred and "0" means that the event has
not occurred.

Customer_ID  Children  Age  Education  Calls  Visits  Attrition
102654       2         61   12         12     0       1
103540       1         32   20         18     2       0
104426       1         35   20         14     2       0
105312       0         26   20         20     2       0
106198       0         25   12         90     0       1
107084       5         59   10         6      2       1
107970       3         46   10         70     2       1
108856       4         65   16         6      2       0
109742       3         57   10         5      2       0
110628       2         64   14         12     0       1
111514       0         72   9          40     0       0
112400       5         67   12         8      2       1
113286       0         33   15         12     2       1
114172       1         23   14         12     0       0
115058       1         33   12         12     2       1
115944       2         59   12         6      0       0
116830       1         60   14         6      2       0
117716       2         77   9          86     0       1
118602       2         52   14         12     2       0
119488       1         55   7          98     2       1

The Attrition variable (our target) has two values.

Let us look at the example:

The data set contains 20 customers: 10 of them churned last year and 10 are still
with the company. The main goal is to score each customer with a personal risk of
churning (e.g. Joe John has a 95% risk of churning). An important outcome is the
ability to understand what causes the attrition and what influence each variable has
on a customer's decision to leave the company.

Variables description:
"Children": Number of children that a customer has.
"Age": Age of the customer.
"Education": Number of years of education.
"Calls": Number of calls that the customer has made to the service center.
"Visits": Number of visits that the customer has made to the local service center.
"Attrition": Whether the customer has churned ("1") or not ("0"). This is the model's
target variable, so it must have only two values: "1" or "0".
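
Before modeling, it can help to confirm that the target column really holds only "0" and "1". The following is a minimal sketch in Python using the rows of the sample table; the function name check_binary_target is a hypothetical helper and not part of Analysis 6.

```python
# Rows copied from the sample table:
# (Customer_ID, Children, Age, Education, Calls, Visits, Attrition)
ROWS = [
    (102654, 2, 61, 12, 12, 0, 1), (103540, 1, 32, 20, 18, 2, 0),
    (104426, 1, 35, 20, 14, 2, 0), (105312, 0, 26, 20, 20, 2, 0),
    (106198, 0, 25, 12, 90, 0, 1), (107084, 5, 59, 10, 6, 2, 1),
    (107970, 3, 46, 10, 70, 2, 1), (108856, 4, 65, 16, 6, 2, 0),
    (109742, 3, 57, 10, 5, 2, 0), (110628, 2, 64, 14, 12, 0, 1),
    (111514, 0, 72, 9, 40, 0, 0), (112400, 5, 67, 12, 8, 2, 1),
    (113286, 0, 33, 15, 12, 2, 1), (114172, 1, 23, 14, 12, 0, 0),
    (115058, 1, 33, 12, 12, 2, 1), (115944, 2, 59, 12, 6, 0, 0),
    (116830, 1, 60, 14, 6, 2, 0), (117716, 2, 77, 9, 86, 0, 1),
    (118602, 2, 52, 14, 12, 2, 0), (119488, 1, 55, 7, 98, 2, 1),
]


def check_binary_target(rows, target_index=6):
    """Count the 0s and 1s in the target column; reject any other value."""
    counts = {0: 0, 1: 0}
    for row in rows:
        value = row[target_index]
        if value not in (0, 1):
            raise ValueError(f"customer {row[0]}: target must be 0 or 1, got {value!r}")
        counts[value] += 1
    return counts


counts = check_binary_target(ROWS)
print(counts)  # {0: 10, 1: 10}
```

The counts agree with the description of the example: 10 churners and 10 customers who stayed.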
Step No. 2: Selecting attrition model variables

The explained variable is the target variable (in this case: Attrition). It is the
variable whose response to changes in the explanatory variables (in this case: Age,
Calls, Children, Education, Visits) we would like to understand.
To define our model we will move the desired variables from the Explanatory
Variables frame on the left side of the wizard window to the Selected Columns frame
on the right side of the wizard window.
Step No. 3: Defining attrition model

Here are the model components:

The explained variable: the target variable "Attrition".
The explanatory variables: the variables that we think influence the outcome of the
target variable. The selected columns in this case are Age, Calls, Children,
Education, Visits.
The modeling method: Enter All, which is the simplest modeling technique.
Now all we have to do is click the Next button and let the software do the
mathematics.
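
For readers curious about what happens behind the Next button, here is a minimal sketch, assuming that "Enter All" means entering all selected variables into the model at once. It fits a logistic regression to the 20 sample customers by plain gradient ascent. This is not necessarily the algorithm Analysis 6 uses (the paper does not describe it); the function names, step size and iteration count are illustrative choices.

```python
import math

# (Children, Age, Education, Calls, Visits, Attrition) for the 20 sample customers
DATA = [
    (2, 61, 12, 12, 0, 1), (1, 32, 20, 18, 2, 0), (1, 35, 20, 14, 2, 0),
    (0, 26, 20, 20, 2, 0), (0, 25, 12, 90, 0, 1), (5, 59, 10, 6, 2, 1),
    (3, 46, 10, 70, 2, 1), (4, 65, 16, 6, 2, 0), (3, 57, 10, 5, 2, 0),
    (2, 64, 14, 12, 0, 1), (0, 72, 9, 40, 0, 0), (5, 67, 12, 8, 2, 1),
    (0, 33, 15, 12, 2, 1), (1, 23, 14, 12, 0, 0), (1, 33, 12, 12, 2, 1),
    (2, 59, 12, 6, 0, 0), (1, 60, 14, 6, 2, 0), (2, 77, 9, 86, 0, 1),
    (2, 52, 14, 12, 2, 0), (1, 55, 7, 98, 2, 1),
]


def standardize(columns):
    """Scale each column to mean 0 and standard deviation 1 so one step size suits all."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled.append([(v - mean) / sd for v in col])
    return scaled


def log_likelihood(weights, X, y):
    """Sum of log P(observed outcome) over all customers."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(w * v for w, v in zip(weights, xi))
        total += yi * z - math.log(1.0 + math.exp(z))
    return total


def fit(X, y, steps=2000, rate=0.1):
    """Gradient ascent on the average log-likelihood, starting from all-zero weights."""
    weights = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(weights, xi))))
            for j, v in enumerate(xi):
                grad[j] += (yi - p) * v
        weights = [w + rate * g / len(X) for w, g in zip(weights, grad)]
    return weights


features = standardize([list(col) for col in zip(*[row[:5] for row in DATA])])
X = [[1.0] + [col[i] for col in features] for i in range(len(DATA))]  # 1.0 = intercept
y = [row[5] for row in DATA]
weights = fit(X, y)
print("log-likelihood, all-zero weights:", log_likelihood([0.0] * 6, X, y))
print("log-likelihood, fitted weights:  ", log_likelihood(weights, X, y))
```

The fitted weights are on the standardized scale, so they are not directly comparable to the per-unit values that the software reports.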

Step No. 3: Viewing results (part 1)


The above screen shot is a part of the model's results.

The software displays the ROC curve and the Area Under the Curve (AUC) value.
We will not discuss the ROC or AUC methods here but, generally speaking, they
measure how well the model distinguishes between the "1" and "0" events of the
target variable.
An AUC value close to 1 means that the model is very successful at distinguishing
between the binary events ("0" or "1") of the target variable.

AUC value   Model quality
0.5         No ability to distinguish
0.5-0.7     Not a very good model
0.7-0.9     Very good model
0.9-1.0     Excellent model
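
One common way to read the AUC is as the share of (churner, non-churner) pairs in which the model gives the churner the higher score. The sketch below computes it that way in Python. The six scores are hypothetical, and the paper does not say how Analysis 6 computes the figure internally.

```python
def auc(actual, scores):
    """Share of (churner, non-churner) pairs where the churner scores higher; ties count half."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    if not positives or not negatives:
        raise ValueError("need at least one '1' event and one '0' event")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for six customers (three churners, three non-churners)
actual = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(round(auc(actual, scores), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

A model that scores every customer the same gets 0.5 under this definition, which matches the first row of the table.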

Most good business models have an AUC of 0.7-0.8.

In our example, we have an AUC of 0.83, so we may proceed to view the rest of the
results.
Interpreting the statistical parameters can be a complicated task that is beyond the
scope of this how-to paper, so we will view other results that will help us
understand the attrition phenomenon.

Step No. 4: Working with model results

Here is the main attrition model window; each variable has its own value describing
its contribution to the attrition phenomenon.

For example:
Age has the value 0.9512, which means that for each additional year the churn risk
decreases by 4.88% (1 - 0.9512 = 0.0488).
Calls has the value 1.0458, which means that for each additional call to the call
center the churn risk increases by 4.58% (1.0458 - 1 = 0.0458).
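
The arithmetic above can be sketched in a few lines of Python. This assumes that the displayed values are odds ratios (the exponent of each coefficient), as is common in logistic regression output; in that case the percentages apply, strictly speaking, to the odds of churning rather than to the probability. The function name percent_change is illustrative.

```python
def percent_change(value):
    """Percent change in the odds of churning per one-unit increase of the variable."""
    return (value - 1.0) * 100.0


for name, value in [("Age", 0.9512), ("Calls", 1.0458)]:
    print(f"{name}: {percent_change(value):+.2f}% per unit")
# Age: -4.88% per unit
# Calls: +4.58% per unit

# Under the same assumption the effect compounds multiplicatively:
# ten extra calls multiply the odds by 1.0458 ** 10 (about 1.56), not by 1 + 10 * 0.0458.
ten_calls = 1.0458 ** 10
print(round(ten_calls, 2))
```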
In the same window there is another important test: Classification.

Figure no. 1 measures the model's performance in identifying the non-churners (80%
success). Figure no. 2 measures the model's performance in identifying the churners
(70% success). Figure no. 3 measures the overall model performance (75% success).
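
The three figures are related: with 10 non-churners and 10 churners in the sample, 80% and 70% correspond to 8 and 7 correct classifications, and (8 + 7) / 20 gives the overall 75%. A minimal sketch in Python follows; the predicted labels are constructed to match the quoted rates and are not the actual software output.

```python
def classification_rates(actual, predicted):
    """Share of correct predictions among non-churners, among churners, and overall."""
    pairs = list(zip(actual, predicted))
    non_churners = [p for a, p in pairs if a == 0]
    churners = [p for a, p in pairs if a == 1]
    return {
        "non_churners": sum(1 for p in non_churners if p == 0) / len(non_churners),
        "churners": sum(1 for p in churners if p == 1) / len(churners),
        "overall": sum(1 for a, p in pairs if a == p) / len(pairs),
    }


# Illustrative labels: 8 of 10 non-churners and 7 of 10 churners classified correctly
actual = [0] * 10 + [1] * 10
predicted = [0] * 8 + [1] * 2 + [1] * 7 + [0] * 3
rates = classification_rates(actual, predicted)
print(rates)  # {'non_churners': 0.8, 'churners': 0.7, 'overall': 0.75}
```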
After running the logistic regression procedure, we can finally answer the question:
what affects the attrition phenomenon, and what is the weight of each variable on
"Attrition"?

Analysis 6 arms you with four powerful analytics tools:

1. "What-If" scenario, where you can analyze a specific case to learn about your
customers' attrition, or analyze a specific customer.
Let us have a look at the following screen shot:

Frame no. 1 is the variables calculator, which calculates the risk of attrition based
on the given variables: Age, Calls, Children, Education, Visits.
Frame no. 2 shows the result of the calculation: the probability of attrition is
40% for a customer who is 57 years old, has called the call center 12 times, has
2 children, has 12 years of education and has visited the customer service centers
two times.
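
A calculator of this kind can be sketched as follows, assuming a plain logistic model. The weights for Age and Calls are derived from the values 0.9512 and 1.0458 quoted in Step 4 (assuming these are exponentiated coefficients). The intercept and the remaining weights are made up for illustration, so the result will not reproduce the 40% shown in the screen shot.

```python
import math

# Weights on the log-odds scale.
INTERCEPT = 0.5  # hypothetical
WEIGHTS = {
    "Age": math.log(0.9512),    # from the value quoted in Step 4
    "Calls": math.log(1.0458),  # from the value quoted in Step 4
    "Children": 0.7,            # hypothetical
    "Education": -0.1,          # hypothetical
    "Visits": -0.3,             # hypothetical
}


def attrition_probability(customer):
    """Logistic formula: 1 / (1 + exp(-(intercept + sum of weight * value)))."""
    z = INTERCEPT + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# The customer described above
customer = {"Age": 57, "Calls": 12, "Children": 2, "Education": 12, "Visits": 2}
p = attrition_probability(customer)
print(f"Probability of attrition: {p:.1%}")
```

Changing one entry of the customer dictionary and calling the function again gives the "what-if" answer for that scenario.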

2. "Sensitivity Table", where you can analyze how sensitive your customer's attrition
is to changes in the value of one of the explanatory variables that are part of the
attrition model.
Let us have a look at the following screen shot:

For a customer whose variable values are held constant as described in figure no. 1,
and where the only change is the number of children, from no children at all to six
children (figure no. 2), the probability of attrition increases from 14.9% with no
children to 90.78% at six children (figure no. 3).
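
If the model is a plain logistic regression (an assumption; the intermediate rows of the screen shot are not reproduced in the text), the two endpoints quoted above are enough to sketch the whole table. Each extra child adds a constant amount to the log-odds, so that amount can be recovered from the 14.9% and 90.78% figures.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-z))


P_NO_CHILDREN = 0.149    # quoted in the text
P_SIX_CHILDREN = 0.9078  # quoted in the text

# Change in log-odds per additional child implied by the two endpoints
per_child = (logit(P_SIX_CHILDREN) - logit(P_NO_CHILDREN)) / 6

table = [inv_logit(logit(P_NO_CHILDREN) + per_child * k) for k in range(7)]
for k, p in enumerate(table):
    print(f"{k} children: {p:.2%}")
```

The implied per-child value is about 0.67 on the log-odds scale, meaning each additional child roughly doubles the odds of attrition in this example.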

3. Deploy Model: Current Results
A user who wishes to test the attrition model on current data may use the
Deploy Model: Current Results option, which computes the probability of
attrition per customer. By doing so the user can find the cases where there is a
difference between the computed probability and the actual value.

Testing the model success by computing DID_HIT


Deploying the results on the current data set:

As you can see from the picture above, the model has computed a churn risk of
97% for the first customer (PROBABILITY = 0.97) and the actual Attrition was
"1", so in this case the DID_HIT value is "1", which means the model succeeded.

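The DID_HIT logic can be sketched as follows. The 0.5 cut-off is an assumption (the paper does not state which threshold the software applies), and every row except the first is hypothetical.

```python
def did_hit(probability, actual, threshold=0.5):
    """Return 1 when the thresholded prediction matches the actual outcome, else 0."""
    predicted = 1 if probability >= threshold else 0
    return 1 if predicted == actual else 0


# (PROBABILITY, Attrition): the first row is the case described in the text,
# the remaining rows are hypothetical.
rows = [(0.97, 1), (0.30, 0), (0.62, 0), (0.41, 1)]

hits = [did_hit(probability, actual) for probability, actual in rows]
print(hits)                   # [1, 1, 0, 0]
print(sum(hits) / len(hits))  # 0.5
```

Rows where DID_HIT is 0 are the cases worth inspecting: the model's prediction and the customer's actual behavior disagree.
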
4. Deploy Model: Future Results


A user who wishes to deploy the attrition model on a new data set can use the
Deploy Model: Future Results option, which computes the probability of
attrition per customer with no prior risk evaluation. Clicking this option opens
a new window that allows the user to select a new data set containing new
customers with the same variable structure and compute their likelihood to
churn based on the formula generated by the logistic regression procedure.
Another way to deploy the model is simply to copy the formula into a new field
in any SQL engine and get future results from the SQL tool, which will
compute the formula for the given variable values.
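
As an illustration of the SQL route, the sketch below builds a query string in Python from a set of coefficients. The coefficients, the table name new_customers and the output column name PROBABILITY are hypothetical, and the exact formula text exported by Analysis 6 may differ. The EXP function is available in most SQL dialects.

```python
# Hypothetical coefficients on the log-odds scale; replace them with the
# values from the formula generated by the software.
INTERCEPT = 0.5
COEFFICIENTS = {
    "Age": -0.05,
    "Calls": 0.045,
    "Children": 0.7,
    "Education": -0.1,
    "Visits": -0.3,
}


def scoring_sql(table, intercept, coefficients):
    """Build a SELECT statement that computes the logistic probability per customer."""
    terms = [f"{intercept:.6f}"]
    terms += [f"({weight:.6f} * {column})" for column, weight in coefficients.items()]
    linear = " + ".join(terms)
    return (
        "SELECT Customer_ID, "
        f"1.0 / (1.0 + EXP(-({linear}))) AS PROBABILITY "
        f"FROM {table}"
    )


sql = scoring_sql("new_customers", INTERCEPT, COEFFICIENTS)
print(sql)
```

Generating the expression from a dictionary keeps the SQL in step with the model whenever the coefficients are re-estimated.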

Now that you have the answer, you can decide where to put your organizational
efforts to fight attrition.
