
An ensemble approach for efficient

Churn prediction
Supervisor: Dr. Sonali Agarwal

Pretam Jayaswal
ISE2013025

Overview

Basic Terms
Motivation
Importance of churn prediction
Selection of domain
Objective
Workflow
My approach

What is Churn
Churn is a word derived from "Change" and "Turn".
In the context of customer relationship management, it refers to the discontinuation of a contract by the customer.

Churn Prediction
Churn prediction is the task of identifying which of a given service provider's customers are likely to churn.
Nowadays, more and more companies focus on CRM to prevent churn.

Types of Churn
There are 3 types of churning customers:
1. Active/Deliberate - the customer decides to quit the contract and switch to another service provider.
2. Rational/Incidental - the customer quits the contract without the aim of switching to a competitor.
3. Passive/Non-voluntary - the service provider discontinues the contract itself.

Types of Churn
There are 3 types of churning customers1. Active/Deliberate- The customer decides to
quit his contract and switch to another
service provider.
2. Rational/Incidental- The customer quits
contract without the aim of switching to a
competitor.
3. Passive/ Non-voluntary The service
provider discontinues the contract itself.

Motivation
KDD Cup 2009: Customer Relationship Prediction
(15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining) [2]
- The challenge was to beat the in-house customer churn prediction system developed by Orange Labs.
- The task was to estimate the churn, appetency and up-selling probability of customers.

Why is Churn Prediction important?

Acquiring a new customer is more expensive than retaining an existing one.
Gaining a new customer is a laborious task.
Even if a new customer is acquired, they are unlikely to be as loyal as existing customers.
Losing a customer means handing a valuable asset to a competitor.

Why is Churn Prediction important?

In fact, acquiring a new customer costs anywhere between 6 and 10 times more than retaining an existing one.
Loss of a customer = loss of future revenue + loss of the investment made to acquire them.

Why is Churn Prediction important?

If a company can predict whether a given customer is going to churn in the future, it has the opportunity to take action, trying to provide the customer with better service or address any unsatisfactory situations.

Why is Customer Retention important?

Long-term customers tend to buy more.
Positive word of mouth from satisfied customers is a good way to acquire new customers.
Long-term customers are less sensitive to competitors' marketing activities.

Churn Analysis Domains

Banking Sector
Insurance Companies
E-commerce Industry
Telecom Industry etc.

Churn in Telecom Sector


The churn rate is highest in the telecom industry.
The average global churn rate for mobile operators is about 2% per month, and in India it is about 4-5% per month. [5]
The yearly churn rate is about 25% in Europe, 37% in the US and 48% in Asia, which is quite high.

Churn Prediction for Telecom industry


Initially, when industry growth was high, churn was not a major problem; the focus at that stage was on customer acquisition.
As the industry matures, the churn rate rises because of fierce competition and saturation, so customer retention comes into focus.

Churn in Telecom Sector


Customers have plenty of alternatives.
The migration process for switching service providers is smooth.
Compared to some other industries, a tremendous amount of data is available, as every customer transaction is recorded in digital format.
Churn prediction makes sense for subscription-based business sectors like telecom.

Objective
The objective of this thesis is to predict churning customers with confidence, i.e. with high accuracy, to rate each customer with a churn likelihood, and to assign them a relative score that identifies the churn potential of customers in the telecom industry.

Dataset
The dataset used is provided by Orange Telecom for the KDD Cup 2009 problem. [2]
Both the training and test sets contain approx. 50,000 examples.

Workflow

Ensemble approach
Ensembles are a divide-and-conquer approach used
to improve performance.
The main principle behind ensemble methods is that
a group of weak learners can come together to
form a strong learner.

Approach
Our major concern is better prediction, so I will use an ensemble approach with a voting scheme that uses multiple learning algorithms and tries to obtain better predictive performance than could be obtained from any of the individual learning algorithms.

My Approach : Step 1
Pre-processing of the dataset.
Traditional data pre-processing approaches are used to handle data issues such as data cleaning and reduction.

My Approach : Step 2
Partition the data into subsets.
We divide the entire dataset into M equally sized, non-overlapping subsamples using an SRSWOR (simple random sampling without replacement) scheme, as sketched below.
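A minimal sketch of this partitioning step, assuming the pre-processed data is a NumPy array whose rows are customer records; the function name and the choice of 5 subsets are illustrative only:

```python
import numpy as np

def srswor_partition(n_rows, num_subsets, seed=0):
    """Shuffle row indices once (without replacement) and split them
    into num_subsets equally sized, non-overlapping blocks."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_rows)        # each row appears exactly once
    return np.array_split(indices, num_subsets)

# e.g. ~50,000 rows (the approximate KDD Cup 2009 set size) into 5 subsets
subsets = srswor_partition(50_000, num_subsets=5)
```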

My Approach : Step 3
For each partitioned sample, build a corresponding classifier (training).
Three supervised classification models will be used (see the sketch after this list):
1) Decision Tree
2) Random Forest
3) Support Vector Machines
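A rough illustration of this training step using scikit-learn defaults, not the exact thesis configuration; X and y are assumed to be the pre-processed features and labels, and subsets the index blocks from Step 2:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def train_on_partitions(X, y, subsets):
    """Fit a decision tree, a random forest and an SVM on each subsample."""
    models = []
    for idx in subsets:
        Xs, ys = X[idx], y[idx]
        models.append({
            "dt": DecisionTreeClassifier(random_state=0).fit(Xs, ys),
            "rf": RandomForestClassifier(n_estimators=100, random_state=0).fit(Xs, ys),
            "svm": SVC(probability=True, random_state=0).fit(Xs, ys),
        })
    return models
```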

Decision Trees
A decision tree (DT) is a flowchart-like tree structure, where each internal node denotes a test on an attribute and each branch represents an outcome of the test.
Decision trees are "white boxes" in the sense that the acquired knowledge can be expressed in a readable form.
Decision trees are quite robust to the presence of noise in the data.

Decision Trees
Decision trees are highly interpretable and
simple to grow.
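To illustrate the "white box" property, a small sketch (on synthetic data, not the Orange dataset) fits a shallow tree and prints its learned rules in readable form:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic two-class data standing in for churn / non-churn labels
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned rules as human-readable if/else conditions
print(export_text(tree, feature_names=[f"f{i}" for i in range(4)]))
```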

Random Forest
Random Forest uses a bagging approach for classification.
In random forests, bagging is used in tandem with random feature selection. Each new training set is drawn from the original training set with replacement.

Random Forest
The working of the random forest algorithm is as follows.
1. A random seed is chosen, which pulls out at random a collection of samples from the training dataset while maintaining the class distribution.
2. From this selected data set, a random set of attributes of the original data set is chosen based on user-defined values. Not all input variables are considered, because of the enormous computation and the high chance of overfitting.

Random Forest
3. If M is the total number of input attributes in the dataset, only R attributes are chosen at random for each tree, where R < M.
4. The attributes from this set create the best possible split, using a suitable index, to develop a decision tree model. The process repeats for each of the branches until the termination condition is met, namely that the leaves are nodes too small to split. (A brief sketch of the R-out-of-M sampling follows.)
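In scikit-learn, this R-out-of-M attribute sampling corresponds to the max_features parameter; a brief sketch, where the value of R is purely illustrative:

```python
from sklearn.ensemble import RandomForestClassifier

# M = total number of attributes; R (< M) attributes are tried at each split
R = 10  # hypothetical user-defined value, for illustration only
rf = RandomForestClassifier(
    n_estimators=200,   # number of trees grown from bootstrap samples
    max_features=R,     # random subset of R attributes considered per split
    random_state=0,
)
```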

RF advantages
Compared with AdaBoost, the forests discussed here have the following desirable characteristics [6]:
their accuracy is as good as AdaBoost's and sometimes better;
they are relatively robust to outliers and noise;
they are faster than bagging or boosting;
they give useful internal estimates of error, strength, correlation and variable importance;
they are simple and easily parallelized.

Support Vector Machines


SVM is a binary classifier that involves finding the hyperplane (a line in 2D, a plane in 3D) that separates two classes of points with the maximum margin.
High accuracy, nice theoretical guarantees regarding overfitting.
SVMs work very well in many circumstances and perform very well with large amounts of data.
They take more time for learning.

Support Vector Machines


The dimensionality of the samples does not affect the algorithm's complexity.
SVMs come in many variants, and it is not easy to choose a fitting one.
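As an illustrative sketch only (not a prescribed part of the thesis pipeline), an SVM is typically trained on standardized features, which also helps keep training time manageable:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Standardizing features to comparable ranges is common practice for SVMs
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
```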

My Approach : Step 3
For each partitioned sample, build a corresponding classifier (training phase).

My Approach : Step 4
Evaluation of classifier performance (validation).
For validation, cross-validation is used to ensure that every example from the original dataset has the same chance of appearing in the training and testing sets.
k-fold cross-validation is used for training and validation, as sketched below.
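A minimal k-fold cross-validation sketch with scikit-learn; the choice of k = 10 and of the decision tree as the evaluated model are illustrative assumptions, and X, y are the pre-processed features and labels:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # k = 10 folds
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print("mean validation accuracy:", scores.mean())
```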

My Approach : Step 5
Apply the weighting scheme to the classifiers based on their performance.
Classifiers with higher prediction accuracy will have higher weights and will be more dominant; poor classifiers will have lower weights. A simple sketch of one such scheme follows.
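One simple way to realize this, sketched under the assumption that each classifier's validation accuracy is used as its raw weight and then normalized to sum to one:

```python
import numpy as np

def accuracy_weights(val_accuracies):
    """Normalize per-classifier validation accuracies into weights summing to 1."""
    acc = np.asarray(val_accuracies, dtype=float)
    return acc / acc.sum()

# e.g. three classifiers with validation accuracies 0.82, 0.74 and 0.90
w = accuracy_weights([0.82, 0.74, 0.90])   # higher accuracy -> larger weight
```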

My Approach : Step 6
Generating the Collective decision
The final prediction on the test data set is weighted according to these normalized weights as follows:

F(x) = Σ_{m=1}^{M} w_m · f_m(x)

where w_m is the weight of the m-th classifier and f_m(x) is the output of the m-th classifier when predicting the label of x.
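A sketch of this combination for a binary churn (1) / non-churn (0) decision, treating f_m(x) as each classifier's 0/1 vote and w as the normalized weights from Step 5 (the exact combination rule in the thesis may differ):

```python
import numpy as np

def weighted_vote(models, weights, X_test):
    """Combine binary (0/1) predictions of the classifiers by a weighted vote."""
    preds = np.array([m.predict(X_test) for m in models])   # shape (M, n_samples)
    score = np.average(preds, axis=0, weights=weights)      # weighted mean of the votes
    return (score >= 0.5).astype(int)                       # churn if the weighted majority says so
```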

Analysis
The outcomes of the proposed methodology will be compared with existing state-of-the-art research work to assess the advantages and disadvantages of the thesis work.

Parameters

Customer demography
Bill and payment analysis
Call detail records analysis
Customer care/service analysis

References
1. Lu, Ning, et al. "A Customer Churn Prediction Model in Telecom Industry Using Boosting." (2011): 1-1.
2. Dror, Gideon, et al. "The 2009 Knowledge Discovery and Data Mining Competition (KDD Cup 2009)." (2011).
3. Bandara, W. M. C., A. S. Perera, and D. Alahakoon. "Churn prediction methodologies in the telecommunications sector: A survey." Advances in ICT for Emerging Regions (ICTer), 2013 International Conference on. IEEE, 2013.
4. Lazarov, Vladislav, and Marius Capota. "Churn prediction." Business Analytics Course (2007).
5. Hung, Shin-Yuan, David C. Yen, and Hsiu-Yu Wang. "Applying data mining to telecom churn management." Expert Systems with Applications 31.3 (2006): 515-524.
6. Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.

Thank you
