
Expert Systems with Applications 38 (2011) 1814–1822


journal homepage: www.elsevier.com/locate/eswa

GSM churn management by using fuzzy c-means clustering and adaptive neuro fuzzy inference system
Adem Karahoca, Dilek Karahoca
Department of Software Engineering, Bahcesehir University, Istanbul, Besiktas, Turkey

Article info

Keywords:
ANFIS
Data mining
Churn management
Telecom churn prediction
Soft computing

Abstract
Churn management is an important and critical issue for Global System for Mobile Communications (GSM) operators, who must develop strategies and tactics to prevent their subscribers from moving to other GSM operators. The first phase of churn management starts with profile creation for the subscribers. The profiling process evaluates call detail data, financial information, calls to customer service, contract details, market details, and geographic and population data of a given state. In this study, input features are clustered by the x-means and fuzzy c-means clustering algorithms to put the subscribers into different discrete classes. An Adaptive Neuro Fuzzy Inference System (ANFIS) is then applied to these classes to develop a sensitive prediction model for churn management. The first prediction step starts with parallel neuro-fuzzy classifiers. A FIS then takes the outputs of the neuro-fuzzy classifiers as input to make a decision about churners' activities.
© 2010 Elsevier Ltd. All rights reserved.

1. Introduction
Turkey's Global System for Mobile Communications (GSM) 1800 licenses were distributed to ARIA and AYCELL in 2000. Since then, the GSM market has been forced to enhance its quality of service (QoS) and its customer support. One of the major problems of GSM operators has been churning customers. Churning means that subscribers move from one operator to another because they are dissatisfied with the service. For instance, cost of services, corporate capability, credibility, customer communication, customer services, roaming and coverage, call quality, billing and cost of roaming may be reasons to churn (Mozer et al., 2000). Hence churn management has become an important issue for GSM operators to deal with. Churn management includes monitoring the intentions of subscribers and offering new alternative campaigns to improve their expectations and satisfaction.
Quality metrics can be used to determine indicators that identify inefficiency problems. Metrics of churn management are related to the quality of network services, operations, and customer services. Mobility of GSM numbers is a critical metric for determining churners. In Turkey, at the end of 2008, the Telecommunication Regulation Committee decided that GSM subscribers can move to other operators with their original GSM numbers. Thus, possible churner activities should be predicted in advance to prevent the loss of subscribers to other GSM carriers.
When subscribers are clustered or predicted for the arrangement of campaigns, telecom operators should focus on demographic data, billing data, contract situations, number of calls, locations, tariffs, and credit ratings of the subscribers (Yu et al., 2005).

Corresponding author.
E-mail address: akarahoca@bahcesehir.edu.tr (A. Karahoca).
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2010.07.110

Predictions of customers' behavior, value, satisfaction, and loyalty are examples of the information that can be extracted from the data stored in a company's data warehouses (Hadden et al., 2005).
It is well known that retaining a subscriber costs much less than gaining a new one from another GSM operator (Mozer et al., 2000). When unhappy subscribers are identified before they churn, operators may retain them with new offerings. To implement efficient campaigns in this situation, subscribers have to be segmented into classes such as loyal, hopeless, and lost. This segmentation helps to define customer intentions. Many segmentation methods have been applied in the literature. Thus, churn management can be supported with data mining modeling tools to predict hopeless subscribers before they churn. We can follow the hopeless groups by clustering the profiles of customer behaviors. We can also benefit from the predictive power of data mining algorithms.
Data mining tools have been used to analyze many profile-related variables, including those related to demographics, seasonality, service periods, competing offers, and usage patterns. Leading indicators of churn potentially include late payments, numerous customer service calls, and declining use of services. In data mining, one can choose a high or low degree of granularity in defining variables. By grouping variables to characterize different types of customers, the analyst can define a customer segment. A particular variable may show up in more than one segment. It is essential that data mining results extend beyond obvious information. Off-the-shelf data mining solutions may provide little new information and thus serve merely to predict the obvious. Tailored data mining solutions can provide far more useful information to the carrier (Gerpott et al., 2001).
Hung et al. (2006) proposed a data mining solution with decision trees and neural networks to predict churners, and assessed model performance by LIFT (a measure of the performance of a model at segmenting the population) and hit ratio (true positives). Data mining approaches that predict customer behavior from call detail records (CDRs) and demographic data are considered in Wei and Chiu (2002), Yan et al. (2005), and Karahoca et al. (2007).
In this study, the main motivation is to find the best data mining model for churn management according to the performance measures of the models. We have used data sets obtained from a Turkish mobile telecom operator's data warehouse system to analyze churner activities and develop a data mining model.
The remainder of the paper is organized as follows. The data preparation procedure is summarized in Section 2. The methods used, x-means, fuzzy c-means and ANFIS, are explained in Section 3 together with the procedure for comparing data mining methods. The findings of the study are given in Section 4, using benchmarking methodologies to compare different prediction techniques. Conclusions are presented in Section 5.

2. GSM subscribers dataset


When a classification model is constructed, the kinds of inputs that are directly or indirectly related to the problem have to be considered. For this reason, an input selection procedure has to be applied to minimize the computational complexity of the new model. As Jang (1996) mentioned, the purposes of input selection include:

- Remove noise/irrelevant inputs.
- Remove inputs that depend on other inputs.
- Make the underlying model more concise and transparent.
- Reduce the time for model construction.

In the churner detection process, 24,900 loyal GSM subscribers were randomly selected from the data warehouses of a GSM operator in Turkey. 7600 hopeless candidate churners were filtered from the databases over a period of one year. In real life, annual churn rates of 36% can usually be observed for GSM operators. Because of computational complexity, when all parameters and subscriber records were used in predicting churners, the data mining methods could not estimate the churners. Generally the correctness of the classification methods is between 90% and 93%, which means that churners cannot be predicted if all churners and loyal subscribers are taken into account. Therefore, 31% hopeless churners were selected from the dataset and most of the loyal subscribers were discarded to eliminate the computational complexity problem.
In pattern recognition applications, the usual way to create input data for the model is through feature extraction. In feature extraction, descriptors or statistics of the domain are calculated from raw data. Usually, this process involves some form of data aggregation.
The unit of aggregation in time is one day for this study. The feature mapping transforms the transaction data sorted in time into static variables residing in feature space. The features reflect the daily usage of a GSM subscriber's account. The number of calls and the summed length of calls were used to describe the daily usage of a mobile phone. National and international calls were regarded as different categories. Calls made during business hours, evening hours and night hours were also aggregated to create different features. The parameters taken into account to detect churn intentions are listed in Table 1.
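The daily aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the sample calls, and the hour limits of the business, evening and night bands are hypothetical and are not taken from the operator's data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call records: (subscriber id, start time, duration in seconds, international?)
CALLS = [
    ("s1", "2008-03-01 09:15", 120, False),
    ("s1", "2008-03-01 19:40", 300, False),
    ("s1", "2008-03-01 23:05", 60, True),
    ("s1", "2008-03-02 10:00", 45, False),
]


def time_band(hour):
    """Assumed band limits: business 08-18, evening 18-22, night otherwise."""
    if 8 <= hour < 18:
        return "business"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def daily_features(calls):
    """Aggregate calls into per-subscriber, per-day call counts and summed lengths."""
    features = defaultdict(lambda: defaultdict(int))
    for subscriber, start, duration, international in calls:
        ts = datetime.strptime(start, "%Y-%m-%d %H:%M")
        key = (subscriber, ts.date().isoformat())
        scope = "intl" if international else "natl"
        # Each call contributes to one scope category and one time-band category.
        for name in (scope, time_band(ts.hour)):
            features[key][name + "_calls"] += 1
            features[key][name + "_seconds"] += duration
    return {key: dict(values) for key, values in features.items()}


FEATURES = daily_features(CALLS)
```

Each key of `FEATURES` is one subscriber-day, so the time-ordered transactions become a static feature vector per day, as the text describes.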

Summarized attributes that are augmented by different data sources are given in Table 1. These attributes are used as input parameters for the churn detection process and are expected to have a high impact on the outcome (whether the subscriber churns or not). In order to reduce the computational complexity of the analysis, some of the fields are ignored. The attributes with the highest correlation coefficients (Spearman's Rho values) are listed in Table 2. These factors are assumed to have the highest contribution to the ultimate decision about the subscriber. When factor analysis is applied to these input features, the same order is observed as given in Table 2.
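A minimal sketch of ranking attributes by Spearman's Rho is given below. It assumes the standard definition (Pearson correlation of average ranks, with ties sharing their mean rank) and ranks by absolute value; the function names are hypothetical, and the paper does not state how its values were computed.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(x, y):
    """Spearman's Rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_attributes(columns, target):
    """Sort attribute names by the absolute value of their Rho with the target."""
    scores = {name: abs(spearman_rho(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_attributes` with a dictionary of attribute columns and the churn output column would yield a list ordered like Table 2, with the strongest monotonic association first.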
According to the ranked attributes, seven input parameters are used, as illustrated in Table 3. These parameters have a significant correlation with the output. Monthly expense represents all outgoing calls of a subscriber; bigger monthly expenses are indicated by bigger numbers. The age attribute is divided into five different intervals. The marriage column represents single (1), widow (2), married (3), and married but living alone (4). Total spent shows the subscriber's total calling time since registration. Monthly income shows the salary of the subscriber. The customer segment column represents the tariff of the subscriber. Length of service duration is named sub length and shows how long the subscriber has been registered. The output column is obtained by using the fuzzy c-means algorithm after preprocessing with x-means. It shows a rank value between 0 and 1 that puts the record into the proper cluster.
In this study, several different data mining methods were considered to predict churners. Ridor (Cohen, 1995), Decision Trees (Freund & Mason, 1999), ANFIS (Jang, 1992, 1993, 1996) and ANFIS supported by fuzzy c-means (Dunn, 1973; Bezdek, 1981) were implemented. Except for ANFIS, all the methods were executed in the WEKA (Waikato Environment for Knowledge Analysis) data mining software (Frank & Witten, 2005).
Table 1
Extracted features.
Abbreviation           Meaning
City                   Subscriber's city
Age                    Subscriber's age
Occupation             Occupation code of subscriber
Home                   Home city of subscriber
Month-income           Monthly income of subscriber
Credit-limit           Credit limit of subscriber
Avglen-call-month      Average length of calls in last month
Avg-len-call46month    Average length of calls in last 4–6 months
Avg-len-sms-month      Number of SMS in last month
Avglensms46month       Number of SMS in last 4–6 months
Tariff                 Subscriber tariff
Marriage               Marital status of the subscriber
Child                  Number of child(ren) of the subscriber
Gender                 Gender of the subscriber
MonthExpense           Monthly expense of subscriber
Sublen                 Length of service duration
CustSegment            Customer segment for subscriber
AvgLenCall3month       Average length of calls in last 3 months
TotalSpent             Total expenditure of the subscriber
AvgLenSms3month        Number of sent SMS in last 3 months
GsmLineStatus          Status of subscriber line
Output                 Churn status of subscriber

Table 2
Ranked attributes.
Attribute name    Spearman's Rho
Monthexpense      0.4804
Age               0.4154
Marriage          0.3533
Totalspent        0.2847
Month-income      0.2732
Custsegment       0.2477
Sublen            0.2304


Table 3
Example data set.
Monthly   Age   Marriage   Total   Monthly   Customer   Sub      Output
Expense                    spent   Income    Segment    Length
2         5     1          1       1         1          3        0.984
2         2     3          3       1         5          1        0.280
1         4     1          1       1         7          4        0.808
3         4     4          3       1         2          2        0.064
3         1     3          4       2         2          4        0.712

3. Methods
According to the previous study of Karahoca et al. (2007), the churner prediction process can be supported with ANFIS, which appears to predict subscribers better than other prediction methods. However, the prediction sensitivity and correctness of ANFIS were only 85%, and a constant three clusters (loyal, hopeless, lost) were considered. To overcome this handicap, the x-means and fuzzy c-means algorithms are used in turn to determine the clusters effectively and to provide better clustered inputs to the prediction model. In this section, the x-means and fuzzy c-means algorithms are introduced, the ANFIS architecture is explained in depth, and the benchmarking methodology for the data mining techniques is given.
3.1. x-Means clustering

Although K-means is the most widely used clustering method, it has some shortcomings: it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. Pelleg and Moore (2000) proposed solutions for the first two problems, and a partial remedy for the third. Building on prior work for algorithmic acceleration that is not based on approximation, they introduce a new algorithm that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure. The innovations include two new ways of exploiting cached sufficient statistics and a new, very efficient test that in one K-means sweep selects the most promising subset of classes for refinement. This gives rise to a fast, statistically founded algorithm that outputs both the number of classes and their parameters. Experiments show that this technique reveals the true number of classes in the underlying distribution, and that it is much faster than repeatedly using accelerated K-means for different values of K (Pelleg and Moore, 2000). The proposed method is called x-means. It extends K-means with efficient estimation of the number of clusters. The algorithm consists of the following two operations repeated until completion.

Algorithm 1. The x-means algorithm
1: Improve-Params
2: Improve-Structure
3: If K > Kmax, stop and report the best scoring model found during the search. Else, go to 1.

The Improve-Params operation consists of running conventional K-means to convergence. The Improve-Structure operation finds out if and where new centroids should appear.

3.2. Fuzzy c-means (FCM) clustering algorithm

Fuzzy c-means (FCM) is a clustering method that allows one piece of data to belong to two or more clusters. This method (developed by Dunn in 1973 and improved by Bezdek in 1981) is frequently used in pattern recognition. FCM clustering processes n vectors in p-space as data input and uses them, in conjunction with first-order necessary conditions for minimizing the FCM objective functional, to obtain estimates for two sets of unknowns. The unknowns in FCM clustering are:
1. A fuzzy c-partition of the data, which is a c × n membership matrix $U = [u_{ik}]$ with c rows and n columns. The values in row i give the membership of all n input data in cluster i for k = 1 to n; the kth column of U gives the membership of vector k (which represents some object k) in all c clusters for i = 1 to c. Each of the entries in U lies in [0, 1]; each row sum is greater than zero; and each column sum equals 1.
2. A set of c cluster centers or prototypes, arrayed as the c columns of a p × c matrix V. These prototypes are vectors (points) in the input space of p-tuples.
Pairs (U, V) of coupled estimates are found by alternating optimization through the first-order necessary conditions for U and V. The objective function minimized in the original version measured distances between data points and prototypes in any inner product norm, and memberships were weighted with an exponent m > 1. It is based on minimization of the following objective function:

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}, \qquad 1 \le m < \infty$$

where m is any real number greater than 1, $u_{ij}$ is the degree of membership of $x_i$ in cluster j, $x_i$ is the ith of the d-dimensional measured data, $c_j$ is the d-dimensional center of the cluster, and $\lVert \cdot \rVert$ is any norm expressing the similarity between any measured data and the center. Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above, with the update of the memberships $u_{ij}$ and the cluster centers $c_j$ by:

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}, \qquad c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$

This iteration stops when $\max_{ij} \left\{ \left| u_{ij}^{(k+1)} - u_{ij}^{(k)} \right| \right\} < \varepsilon$, where $\varepsilon$ is a termination criterion between 0 and 1 and k is the iteration step. This procedure converges to a local minimum or a saddle point of $J_m$. The algorithm is composed of the steps listed in Algorithm 2 (Dunn, 1973; Bezdek, 1981).

Algorithm 2. The fuzzy c-means algorithm
Input: Data set $D = \{x_j\}_{j=1}^{N}$
Initialization: Number of clusters $n_c$ ($1 \le n_c \le N$), weighting exponent m, termination tolerance $\varepsilon$, fuzzy partition matrix $U = (\mu_{i,j})_{n_c \times N}$ ($0 \le \mu_{i,j} \le 1$)
repeat
for t = 1, 2, . . . do
Step 1: Compute the cluster centers:

$$c_i^{(t)} = \frac{\sum_{j=1}^{N} \left( \mu_{i,j}^{(t-1)} \right)^{m} x_j}{\sum_{j=1}^{N} \left( \mu_{i,j}^{(t-1)} \right)^{m}}$$

Step 2: Calculate the distances $D_{ijA}^{2}$ as:

$$D_{ijA}^{2} = (x_j - c_i)^{T} A \, (x_j - c_i), \qquad 1 \le i \le n_c, \; 1 \le j \le N$$

Step 3: Update the fuzzy partition matrix:

$$\mu_{i,j}^{(t)} = \frac{1}{\sum_{k=1}^{n_c} \left( D_{ijA} / D_{kjA} \right)^{2/(m-1)}}$$

end for
until $\lVert U^{(t)} - U^{(t-1)} \rVert < \varepsilon$
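The alternating updates of Algorithm 2 can be sketched in plain Python as follows. This is a minimal illustration, assuming the Euclidean norm (A equal to the identity matrix), a random initial partition matrix, and hypothetical sample points; it is not the implementation used in the study.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Alternate the centre and membership updates until memberships stabilise."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial partition matrix u[i][j]; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])
    centers = []
    for _ in range(max_iter):
        # Centre update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        new_u = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for c in centers]
            if min(dists) == 0.0:
                # A point lying exactly on a centre belongs fully to that centre.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                              for dj in dists])
        # Stop when the largest membership change falls below the tolerance.
        change = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if change < tol:
            break
    return centers, u


# Hypothetical two-dimensional sample with two well-separated groups.
POINTS = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
CENTERS, MEMBERSHIPS = fuzzy_c_means(POINTS, n_clusters=2)
```

Here `MEMBERSHIPS[i][j]` is the degree to which point i belongs to cluster j, so every row sums to 1. This is the transpose of the partition matrix layout used in Algorithm 2.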


3.3. Adaptive Neuro Fuzzy Inference System (ANFIS)

A Fuzzy Logic System (FLS) can be seen as a non-linear mapping from the input space to the output space. The mapping mechanism converts inputs from the numerical domain to the fuzzy domain using fuzzy sets and fuzzifiers, and then applies fuzzy rules and a fuzzy inference engine to perform the necessary operations in the fuzzy domain (Jang, 1992, 1993). The result is transformed back to the numerical domain using defuzzifiers. The ANFIS approach uses Gaussian functions for fuzzy sets and linear functions for the rule outputs. The parameters of the network are the mean and standard deviation of the membership functions (antecedent parameters) and the coefficients of the output linear functions (consequent parameters).
The last (rightmost) node calculates the summation of all outputs. The fuzzy if-then rules of the Sugeno fuzzy model (Sugeno & Kang, 1988; Takagi & Sugeno, 1985) are used in the model.
A typical fuzzy rule in a Sugeno fuzzy model has the format

If x is A and y is B then z = f(x, y),

where A and B are fuzzy sets in the antecedent and z = f(x, y) is a crisp function in the consequent. Usually f(x, y) is a polynomial in the input variables x and y, but it can be any other function that can appropriately describe the output of the system within the fuzzy region specified by the antecedent of the rule. When f(x, y) is a first-order polynomial, we have the first-order Sugeno fuzzy model, which was originally proposed in (Sugeno & Kang, 1988; Takagi & Sugeno, 1985). When f is a constant, we have the zero-order Sugeno fuzzy model, which can be viewed either as a special case of the Mamdani fuzzy inference system (Mamdani & Assilian, 1975), where each rule's consequent is specified by a fuzzy singleton, or as a special case of Tsukamoto's fuzzy model (Tsukamoto, 1979), where each rule's consequent is specified by a membership function of a step function centered at the constant. Moreover, a zero-order Sugeno fuzzy model is functionally equivalent to a radial basis function network under certain minor constraints (Jang, 1993, 1996). Consider a first-order Sugeno fuzzy inference system which contains two rules:

Rule 1: If X is A1 and Y is B1, then f1 = p1x + q1y + r1.
Rule 2: If X is A2 and Y is B2, then f2 = p2x + q2y + r2.

The fuzzy reasoning mechanism is summarized in Fig. 1 (Jang, 1996). Weighted averages are used in order to avoid extreme computational complexity in defuzzification processes.
Fig. 2 illustrates graphically the fuzzy reasoning mechanism that derives an output f from a given input vector [x, y]. The firing strengths w1 and w2 are usually obtained as the product of the membership grades in the premise part, and the output f is the weighted average of each rule's output. To facilitate the learning (or adaptation) of the Sugeno fuzzy model, it is convenient to put the fuzzy model into the framework of adaptive networks that can compute gradient vectors systematically. The resultant network architecture, called Adaptive Neuro Fuzzy Inference System (ANFIS), is shown in Fig. 3, where nodes within the same layer perform functions of the same type, as detailed below.

Fig. 1. First-order Sugeno fuzzy model.

Fig. 2. ANFIS architecture.

The output of each rule is a linear combination of the input variables and a constant term. The final output is the weighted average of each rule's output. The basic learning rule of the proposed network is based on gradient descent and the chain rule (Werbos, 1974).
The ANFIS learning algorithm is used to obtain these parameters. This learning algorithm is a hybrid algorithm consisting of gradient descent and the least-squares estimate. Using this hybrid algorithm, the rule parameters are recursively updated until an acceptable error is reached. Each iteration has two steps, one forward and one backward. In the forward pass, the antecedent parameters are fixed, and the consequent parameters are obtained using the linear least-squares estimate. In the backward pass, the consequent parameters are fixed, the output error is back-propagated through the network, and the antecedent parameters are updated accordingly using the gradient descent method.
In designing an ANFIS model, the number of membership functions, the number of fuzzy rules, and the number of training epochs are important factors to be considered. If they are not selected appropriately, the system will over-fit the data or will not be able to fit the data. The adjusting mechanism works using a hybrid algorithm combining the least squares method and the gradient descent method with a mean square error measure. The aim of the training process is to minimize the training error between the ANFIS output and the actual objective. This allows a fuzzy system to learn its features from the data it observes and to implement these features in the system rules. ANFIS has the following layers, as represented in Fig. 3.

Fig. 3. ANFIS model of fuzzy inference.

Algorithm 3. ANFIS algorithm
Layer 0: It consists of the plain input variable set.
Layer 1: Each node in this layer generates a membership grade of a linguistic label. For instance, the node function of the ith node may be a generalized bell membership function:

$$\mu_{A_i}(x) = \frac{1}{1 + \left[ \left( \dfrac{x - c_i}{a_i} \right)^{2} \right]^{b_i}}$$

where x is the input to node i; $A_i$ is the linguistic label (small, large, etc.) associated with this node; and $\{a_i, b_i, c_i\}$ is the parameter set that changes the shape of the membership function. Parameters in this layer are referred to as the premise parameters.
Layer 2: The function is a T-norm operator that computes the firing strength of the rule, e.g., fuzzy conjunctives AND and OR. The simplest implementation just calculates the product of all incoming signals.

$$w_i = \mu_{A_i}(x) \, \mu_{B_i}(y), \qquad i = 1, 2.$$

Layer 3: Every node in this layer is fixed and determines a normalized firing strength. It calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths.

$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \qquad i = 1, 2.$$

Layer 4: The nodes in this layer are adaptive and are connected with the input nodes (of layer 0) and the preceding node of layer 3. The result is the weighted output of rule i.

$$\bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right)$$

where $\bar{w}_i$ is the output of layer 3, and $\{p_i, q_i, r_i\}$ is the parameter set. Parameters in this layer are referred to as the consequent parameters.
Layer 5: This layer consists of one single node which computes the overall output as the summation of all incoming signals.

$$\text{Overall output} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}$$

The constructed adaptive network in Fig. 2 is functionally equivalent to the fuzzy inference system in Fig. 1. The basic learning rule of ANFIS is back-propagation gradient descent (Werbos, 1974), which calculates error signals (the derivative of the squared error with respect to each node's output) recursively from the output layer backward to the input nodes. This learning rule is exactly the same as the back-propagation learning rule used in common feed-forward neural networks (Jang, 1992, 1993, 1996; Chiu, 1997, chapter 9).

3.4. Comparing data mining techniques

The performance of the data mining methods can be benchmarked with a confusion matrix, which is calculated to assess the accuracy of a classification model. The simplest example of a confusion matrix is one for a binary classification problem. For a binary problem, the confusion matrix is a two-dimensional square matrix. In general, the confusion matrix is an n-dimensional square matrix, where n is the number of distinct target values. The row indexes of a confusion matrix correspond to actual values observed and used for model testing; the column indexes correspond to predicted values produced by applying the model to the test data. For any pair of actual/predicted indexes, the value indicates the number of records classified in that pairing. A confusion matrix can be calculated by using the following terms:

1. True positive (TP) corresponds to the number of positive examples correctly predicted by the classification model.
2. False negative (FN) corresponds to the number of positive examples wrongly predicted as negative by the classification model.
3. False positive (FP) corresponds to the number of negative examples wrongly predicted as positive by the classification model.
4. True negative (TN) corresponds to the number of negative examples correctly predicted by the classification model.

For binary classification, the rare class is often denoted as the positive class and the majority class as the negative class. A confusion matrix that summarizes the number of instances predicted correctly or incorrectly by a classification model is shown in Table 4.

Table 4
A confusion matrix for a binary classification problem.
                     Predicted class
                     +       −
Actual class   +     TP      FN
               −     FP      TN

The true positive rate (TPR) or sensitivity is defined as the fraction of positive examples predicted correctly by the model, as seen in Eq. (8).

$$TPR = \frac{TP}{TP + FN} \tag{8}$$

Similarly, the true negative rate (TNR) or specificity is defined as the fraction of negative examples predicted correctly by the model, as seen in Eq. (9).

$$TNR = \frac{TN}{TN + FP} \tag{9}$$

Sensitivity is the probability that the test results indicate churn behavior given that churn behavior is present. This is also known as the true positive rate. Specificity is the probability that the test results do not indicate churn behavior given that no churn behavior is present. This is also known as the true negative rate. Correctness is the percentage of correctly classified instances. RMS denotes the root mean square error for the given dataset and method of classification. Precision is the reliability of the test (F-score). RMS, prediction and correctness values indicate important variations.
Another useful method for evaluating classification models is Receiver Operating Characteristics (ROC) analysis. ROC curves are similar to lift charts in that they provide a means of comparison between individual models and determine thresholds which yield a high proportion of positive hits. ROC was originally used in signal detection theory to gauge the true hit versus false alarm ratio when sending signals over a noisy channel.
The horizontal axis of an ROC graph measures the false positive rate as a percentage. The vertical axis shows the true positive rate. The top left-hand corner is the optimal location in an ROC curve, indicating a high TP (true positive) rate versus a low FP (false positive) rate. The area under the ROC curve (AUC) measures the discriminating ability of a binary classification model. The larger the AUC, the higher the likelihood that an actual positive case will be assigned a higher probability of being positive than an actual negative case. The AUC measure is especially useful for data sets with unbalanced target distribution (Han & Kamber, 2006).
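The measures of this section can be sketched as follows, assuming boolean labels in which True marks a churner (the positive class). The AUC is computed from its pairwise interpretation given above, that is, the chance that a positive case receives a higher score than a negative one; the sample labels, scores and the 0.5 threshold are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP and TN for boolean labels (True = churner)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    return tp, fn, fp, tn


def rates(tp, fn, fp, tn):
    """Sensitivity (TPR), specificity (TNR) and correctness from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "correctness": (tp + tn) / (tp + fn + fp + tn),
    }


def auc(actual, scores):
    """Fraction of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for a, s in zip(actual, scores) if a]
    neg = [s for a, s in zip(actual, scores) if not a]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test set: three churners and five loyal subscribers.
ACTUAL = [True, True, True, False, False, False, False, False]
SCORES = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
PREDICTED = [s >= 0.5 for s in SCORES]  # assumed decision threshold of 0.5
COUNTS = confusion_counts(ACTUAL, PREDICTED)
RATES = rates(*COUNTS)
AREA = auc(ACTUAL, SCORES)
```

The pairwise form of the AUC is quadratic in the number of records, which is acceptable for a sketch; a rank-based computation would be preferable on a full subscriber data set.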
4. Findings
In this study, the classification model development process can be divided into three activities. The first activity was data set preparation, which covers filtering the data and eliminating useless and null data. The attribute importance determination stage was also vital for minimizing the data set.
The second activity was clustering the data into the proper groups. In Karahoca et al. (2007), the clustering operations used a static number of clusters, but this static constant may create some handicaps for the prediction process. For this reason, the x-means and fuzzy c-means methods were used to overcome this shortcoming. As mentioned above, the x-means algorithm tries to solve these problems. The outputs of x-means were passed to fuzzy c-means to identify the fuzziness of the clusters.
In the third activity, the ANFIS method was used to generate fuzzy rules, benefiting from the neural network structure of this fuzzy inference system. In this section, the clustering and prediction phases are presented in turn.
4.1. Clustering phase
The clustering phase starts with the implementation of the x-means and fuzzy c-means algorithms, in that order. The x-means algorithm was applied to find the optimum number of clusters, the average distance for each cluster point, and the standard deviation for each cluster. For this purpose, Dunn's Index (DI) was used. This index was originally proposed for identifying compact and well-separated clusters, so the result of the clustering has to be recalculated as if it came from a hard partition algorithm. The Alternative Dunn Index (ADI) modifies the original Dunn's index so that its calculation becomes simpler when there is dissimilarity. Fig. 4 shows the optimum number of clusters for the x-means algorithm. The Dunn Index and the Alternative Dunn Index indicate the ideal cluster number; in this case, the algorithm selects the minimum cluster count.
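As an illustration of how a hard partition can be scored, the sketch below computes Dunn's index in its standard form: the smallest distance between points of different clusters divided by the largest cluster diameter. It assumes Euclidean distance and at least one cluster with two or more points; the Alternative Dunn Index is not covered, and the sample partitions are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    # Separation: closest pair of points that lie in two different clusters.
    separation = min(dist(p, q)
                     for a, b in combinations(clusters, 2)
                     for p in a for q in b)
    # Diameter: farthest pair of points that share a cluster.
    diameter = max(dist(p, q)
                   for cluster in clusters
                   for p, q in combinations(cluster, 2))
    return separation / diameter


# Two hypothetical hard partitions of the same four points.
COMPACT = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
STRETCHED = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]
```

Under this definition a larger value indicates more compact and better separated clusters, so `COMPACT` scores higher than `STRETCHED`.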
After this information about the clusters is found, it is passed to the fuzzy c-means algorithm. As mentioned above, x-means is an extended version of K-means. In the Improve-Structure part of the algorithm, each center is tentatively split within its region. The decision between the children of each center and the center itself is made by comparing the BIC values of the two structures (Pelleg and Moore, 2000). The configuration parameters of the x-means algorithm are listed in Table 5.
Parameter cutoff factor means that takes the given percentage
of the splitted centroids if none of the children win. Mean and standard deviation values for all clusters are as summarized in Table 6.
This information is using in fuzzy c-means algorithm.
Fig. 4. Finding optimum clusters by using x-means algorithm.

Table 5
x-Means algorithm: Configuration parameters.
Parameter                                         Value
Requested iterations                              1
Iterations performed                              1
Splits prepared                                   2
Splits performed                                  2
Cutoff factor                                     0.5
Percentage of splits accepted by cutoff factor    0%
Cutoff factor                                     0.5
Cluster centers                                   4 centers

Table 6
Clusters.
Cluster    Mean    Standard deviation
0          0.55    0.31
1          0.29    0.12
2          1.0     0.71
3          0.60    0.11

The fuzzy c-means clustering algorithm minimizes the fuzzy c-means functional. The function takes three input parameters, one of which is the number of clusters or an initializing partition matrix. This parameter defaults to 5 if it is not supplied by the x-means algorithm. The function
uses the standard Euclidean distance norm, so the norm-inducing
matrix is an N × N identity matrix. The result of the
partition is collected in structure arrays, from which one can obtain the partition
matrix, the cluster centers, the squared distances, the number of iterations and the value of the c-means functional at each iteration step.
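The alternating update behind fuzzy c-means can be sketched in a few lines. This is a minimal illustration, not the toolbox function used in the study: the fuzziness exponent `m = 2.0`, the tolerance, the caller-supplied initial centers and the name `fuzzy_c_means` are all assumptions made for the example.

```python
from math import dist


def fuzzy_c_means(points, centers, m=2.0, tol=1e-6, max_iter=100):
    """Alternate membership and center updates until the centers settle."""
    centers = [tuple(c) for c in centers]
    n = len(centers)
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Membership of each point in each cluster (Euclidean norm).
        memberships = []
        for p in points:
            d = [max(dist(p, c), 1e-12) for c in centers]
            memberships.append([
                1.0 / sum((d[j] / d[k]) ** exponent for k in range(n))
                for j in range(n)
            ])
        # Each new center is the mean of the points weighted by u ** m.
        new_centers = []
        for j in range(n):
            w = [row[j] ** m for row in memberships]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[dim] for wi, p in zip(w, points)) / total
                for dim in range(len(points[0]))
            ))
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Memberships are those computed in the final iteration.
    return centers, memberships


data = [(0.0,), (0.2,), (0.1,), (1.0,), (0.9,), (1.1,)]
centers, u = fuzzy_c_means(data, centers=[(0.0,), (1.0,)])
print(centers)
```

Each row of `u` sums to one, which is what distinguishes the fuzzy partition from the hard assignment produced by x-means.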
Table 7 displays the share of the data assigned to each cluster as a percentage. The distribution of the data can be seen in Fig. 5.
In Fig. 5, the dots mark the data points and the 'o' symbols mark the cluster centers, which are the weighted means of the data. The algorithm can
only detect clusters with a circular shape, which is why it cannot really
discover the orientation and shape of the cluster at the lower right; the
circles in the contour map are slightly elongated because the clusters
affect each other. However, the fuzzy c-means algorithm
is a very good initialization tool for more sensitive methods.

Table 7
Clustered instances.
Cluster    Percent of row (%)
0          45
1          12
2          15
3          28

Fig. 5. Results of fuzzy c-means algorithm with data normalization.

Fig. 6 displays the distribution of the churner clusters produced by the
fuzzy c-means algorithm. Finally, the software draws ellipses using
regularity vector variables; the result, shown in Fig. 7, indicates that the desired criteria may be applied as needed.

4.2. Prediction phase


The ANFIS architecture is built as described in Section 3.3 and illustrated in Fig. 1. Seven input parameters are used to create fuzzy
rules that determine the subscribers' clusters.

Fig. 7. The display of churners' cluster data with self-regulating ellipses.

There are three different fuzzy models: the Mamdani
fuzzy model, the Sugeno fuzzy model, and the Tsukamoto fuzzy
model. A Sugeno-type fuzzy model was used in this study to
obtain results efficiently. The sub-clustering method was used
to benefit from high-performance computation in real-time processing.
For this training model, the range of influence is 0.5, the squash factor is 1.25, the accept ratio is 0.5, and the rejection ratio is 0.15.
With these configuration parameters, ANFIS performed
well, as listed in Tables 8 and 9.
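How a Sugeno-type model turns fuzzy rules into a crisp output can be shown with a short sketch. It assumes Gaussian membership functions, product firing strengths and linear rule consequents (a first-order Sugeno model); the rule parameters below are invented for illustration and are not the rules learned in this study.

```python
from math import exp


def gaussian(x, center, sigma):
    """Gaussian membership value of x."""
    return exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_predict(x, rules):
    """Firing-strength weighted average of the linear rule outputs."""
    weighted_sum = 0.0
    weight_total = 0.0
    for rule in rules:
        # Firing strength: product of the memberships of all inputs.
        strength = 1.0
        for xi, (center, sigma) in zip(x, rule["mfs"]):
            strength *= gaussian(xi, center, sigma)
        # Consequent: linear function of the inputs plus a bias.
        output = sum(a * xi for a, xi in zip(rule["coeffs"], x)) + rule["bias"]
        weighted_sum += strength * output
        weight_total += strength
    # Assumes at least one rule fires with a non-zero strength.
    return weighted_sum / weight_total


# Two hypothetical single-input rules: "low input -> 0" and "high input -> 1".
rules = [
    {"mfs": [(0.0, 0.5)], "coeffs": [0.0], "bias": 0.0},
    {"mfs": [(1.0, 0.5)], "coeffs": [0.0], "bias": 1.0},
]
print(sugeno_predict([0.5], rules))
```

During ANFIS training, the membership parameters (centers and widths) and the consequent coefficients are the quantities that get adjusted.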
Overall, the fuzzy c-means + ANFIS and ANFIS methods have the
lowest errors and high precision on the training data, as shown in Table 8. In the testing phase, fuzzy c-means supported ANFIS also has the highest values, as listed in Table 9.
RMSE values of the methods vary between 0.14 and 0.72, while
precision is between 0.72 and 0.92. The RMS of the errors is often a good
indicator of the reliability of a method. The ANFIS and fuzzy c-means + ANFIS
methods tend to have higher sensitivity and specificity. While a
number of methods show perfect specificity, fuzzy c-means + ANFIS has the highest sensitivity.
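The measures reported in Tables 8 and 9 can be computed from predicted and actual churn labels as sketched below. The sketch assumes binary labels (1 = churner, 0 = non-churner) and that "correctness" means overall accuracy; the function names are hypothetical.

```python
from math import sqrt


def classification_metrics(actual, predicted):
    """Sensitivity, specificity, precision and correctness for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    # Assumes both classes occur and at least one positive is predicted.
    return {
        "sensitivity": tp / (tp + fn),          # churners correctly flagged
        "specificity": tn / (tn + fp),          # non-churners correctly cleared
        "precision": tp / (tp + fp),            # flagged users who do churn
        "correctness": (tp + tn) / len(pairs),  # assumed to mean accuracy
    }


def rmse(actual, scores):
    """Root mean squared error between labels and model outputs."""
    return sqrt(sum((a - s) ** 2 for a, s in zip(actual, scores)) / len(actual))


actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

Sensitivity and specificity look at the two classes separately, which matters for churn data where churners are usually the minority class.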
Test results indicate that ANFIS is a fairly good means of identifying
churning users in a GSM network. As can be seen in Fig. 8, the vertical axis denotes the
test output, whereas the horizontal axis shows the index of the testing
data instances.
Table 8
Training results for the methods used.
                         Training data
Method                   Sensitivity    Specificity    Precision    Correctness
Ridor                    0.90           0.91           0.72         0.67
Decision Tree            0.85           0.84           0.73         0.72
ANFIS                    0.86           0.85           0.82         0.81
Fuzzy c-means + ANFIS    0.91           0.93           0.91         0.93

Table 9
Testing results for the methods used.
                         Testing data
Method                   Sensitivity    Specificity    Precision    Correctness
Ridor                    0.78           0.78           0.72         0.66
DT                       0.75           0.73           0.72         0.71
ANFIS                    0.85           0.88           0.81         0.80
Fuzzy c-means + ANFIS    0.91           0.93           0.92         0.93

Fig. 6. The display of data without regulating vector ellipse.


Fig. 10. Membership function for Marital Status.


Fig. 8. ANFIS classification of testing data.

Fig. 9 displays a plot of the input factors for fuzzy inference and the
output results under the given conditions. The horizontal axis shows the
attributes extracted from Table 2. The fuzzy inference diagram is the composite of all the factor diagrams. It simultaneously displays all
parts of the fuzzy inference process. Information flows through
the fuzzy inference diagram sequentially.
ANFIS creates membership functions for each input variable.
The graphs show the membership functions of the Marital Status, Age, Monthly Expense and Customer Segment variables. For these properties, the changes of the final (after training) generalized
membership functions with respect to the initial (before training)
generalized membership functions of the input parameters were
examined.
Whenever an input factor has an above-average effect, it shows
considerable deviation from the original curve. We can infer from
the membership functions that these properties have a considerable
effect on the final decision of the churn analysis, since their shapes change significantly.
In Figs. 10-13, the vertical axis is the value of the membership function and the horizontal axis denotes the value of the input factor.
Marital Status is an important indicator for churn management;
it shows considerable deviation from the original Gaussian curve
during the iterative process, as seen in Fig. 10.
Fig. 11 displays the initial and final membership functions for Age Group. As
expected, Age Group was found to be an important indicator for identifying
churn. In the network, monthly expense is another factor that strongly affects the
final model. The resultant membership function is displayed in
Fig. 12.
The subscriber's customer segment also critically affects the model.
As seen in Fig. 13, the deviation from the original curve is significant. The
attributes represented in Figs. 10-13 have the highest effect on the final
classification; the training process has changed their membership functions
significantly, giving these values more emphasis in the final decision.
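The deviation between an initial and a trained membership function, judged visually in Figs. 10-13, could also be quantified. The sketch below is one possible way to do so, assuming Gaussian membership functions described by a (center, sigma) pair and a normalized input range of 0 to 1; the parameter values and the name `mf_deviation` are hypothetical and are not taken from the trained model.

```python
from math import exp


def gaussian_mf(x, center, sigma):
    """Gaussian membership value of x."""
    return exp(-0.5 * ((x - center) / sigma) ** 2)


def mf_deviation(initial, trained, lo=0.0, hi=1.0, steps=101):
    """Mean absolute gap between two Gaussian curves on an even grid."""
    total = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        total += abs(gaussian_mf(x, *initial) - gaussian_mf(x, *trained))
    return total / steps


before = (0.5, 0.2)        # hypothetical (center, sigma) before training
barely_moved = (0.52, 0.2)
strongly_moved = (0.7, 0.1)
print(mf_deviation(before, barely_moved), mf_deviation(before, strongly_moved))
```

Under this measure, an input whose curve moves strongly during training yields a larger value than one whose curve stays close to its initial shape.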
Fig. 9. Fuzzy inference diagram.

Fig. 11. Membership function for Age Group.

Fig. 12. Membership function for Monthly Expense.

Fig. 13. Membership function for Customer Segment.

Fig. 14. ROC curve for c-means_ANFIS, ANFIS, RIDOR and Decision Trees.

Using this ANFIS structure, the following results were obtained from the
Receiver Operating Characteristics (ROC) analysis.
ROC analysis, a well-established technique in diagnostics, was used for model assessment.
Fig. 14 illustrates the ROC curves for the best four methods, namely
fuzzy c-means + ANFIS, ANFIS, RIDOR and Decision Trees. The fuzzy
c-means + ANFIS method is far more accurate where a small
false positive rate is critical. In this situation, where preventing
churn is costly, we would like to have a low false positive rate to
avoid unnecessary customer relationship management (CRM)
costs.
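An ROC curve such as the one in Fig. 14 can be traced from model scores as sketched below. The labels and scores are invented for illustration, the function names are hypothetical, and the area under the curve is included only as a common summary that the study itself does not report.

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs for falling thresholds."""
    positives = sum(labels)             # labels are 1 (churner) or 0
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every instance sharing this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve, auc(curve))
```

Reading the curve near its left edge shows how many churners a method catches while the false positive rate, and therefore the wasted CRM effort, is still small.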
5. Conclusions
The proposed integrated diagnostic system for the churn management application is based on multiple Adaptive
Neuro Fuzzy Inference System units combined with fuzzy c-means. Using a series
of ANFIS units greatly reduces the scale and complexity of the
system and speeds up the training of the network. The system is
applicable to a range of telecom applications where continuous
monitoring and management is required. Unlike the other techniques
discussed in this study, the addition of extra units (or rules)
neither affects the rest of the network nor increases its
complexity.
As mentioned in Karahoca et al. (2007), rule-based models and
decision tree derivatives have a high level of precision; however,
they demonstrate poor robustness when the dataset is changed.
To make the classification technique adaptable,
neural network based alteration of the fuzzy inference system parameters is necessary. The results show that the ANFIS method combines
the precision of a fuzzy classification system with the adaptability
(back-propagation) of neural networks in the classification of
data.
One disadvantage of the ANFIS method is that the complexity of
the algorithm is high when more than a few inputs are
fed into the system. However, once the system reaches an optimal
configuration of membership functions, it can be used efficiently
on large datasets.
Based on the accuracy of the results of the study, it can be stated
that ANFIS models can be used as an alternative to the current CRM
churn management mechanisms (the detection techniques currently in
use). This approach can be applied to many telecom networks or
other industries, since once it is trained, it can be used during
operation to provide instant detection results.
References
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Chiu, S. L. (1997). Extracting fuzzy rules from data for function approximation and pattern classification. In D. Dubois, H. Prade, & R. Yager (Eds.), Fuzzy information engineering: A guided tour of applications. John Wiley & Sons. Chapter 9.
Cohen, W. (1995). Fast effective rule induction. In Proceedings of the 12th international conference on machine learning, Lake Tahoe, CA (pp. 115-123).
Dunn, J. C. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3, 32-57.
Frank, E., & Witten, I. H. (2005). Data mining: Practical machine learning tools and techniques. San Francisco: Morgan Kaufmann. 0-12-088407-0.
Freund, Y., & Mason, L. (1999). The alternating decision tree learning algorithm. In Proceedings of the sixteenth international conference on machine learning, Bled, Slovenia (pp. 124-133).
Gerpott, T. J., Rams, W., & Schindler, A. (2001). Customer retention loyalty and satisfaction in the German mobile cellular telecommunications market. Telecommunications Policy, 25(10-11), 885-906.
Hadden, J., Tiwari, A., Roy, R., & Ruta, D. (2005). Computer assisted customer churn management: State of the art and future trends. Journal of Computers and Operations Research, 34(10), 2902-2917.
Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann. 978-1-55860-901-3.
Hung, S-Y., Yen, D. C., & Wang, H. Y. (2006). Applying data mining to telecom churn management. Journal of Expert Systems with Applications, 31(3), 515-524.
Jang, J.-S. R. (1992). Self-learning fuzzy controllers based on temporal back propagation. IEEE Transactions on Neural Networks, 3(5), 714-723.
Jang, J.-S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems Man and Cybernetics, 23(3), 665-685.
Jang, J.-S. R. (1996). Input selection for ANFIS learning. Proceedings of the IEEE International Conference on Fuzzy Systems, 1493-1499.
Karahoca, A., Karahoca, D., & Aydin, N. (2007). GSM churn management using an adaptive neuro-fuzzy inference system. In International conference on intelligent pervasive computing (pp. 323-326). Jeju, Korea.
Mamdani, E. H., & Assilian, S. (1975). An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1), 1-13.
Mozer, M. C., Wolniewicz, R., Grimes, D. B., Johnson, E., & Kaushansky, H. (2000). Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry. IEEE Transactions on Neural Networks, 11(3), 690-696.
Pelleg, D., & Moore, A. (2000). X-means: Extending K-means with efficient estimation of the number of clusters. In Proceedings of the 17th international conference on machine learning (pp. 727-734). Morgan Kaufmann.
Sugeno, M., & Kang, G. T. (1988). Structure identification of fuzzy model. Fuzzy Sets and Systems, 28, 15-33.
Takagi, T., & Sugeno, M. (1985). Fuzzy identification of systems and its application to modeling and control. IEEE Transactions on Systems Man and Cybernetics, 15, 116-132.
Tsukamoto, Y. (1979). An approach to fuzzy reasoning method. In M. M. Gupta, R. K. Ragade, & R. R. Yager (Eds.), Advances in fuzzy set theory and applications (pp. 137-149). Amsterdam: North-Holland.
Wei, C. P., & Chiu, I-T. (2002). Turning telecommunications call details to churn prediction: A data mining approach. Journal of Expert Systems with Applications, 23, 103-112.
Werbos, P. (1974). Beyond regression, new tools for prediction and analysis in the behavioural sciences. Unpublished PhD Thesis. Harvard University.
Yan, L., Fassion, M., & Baldasare, P. (2005). Predicting customer behavior via calling links. In Proceedings of international joint conference on neural networks (pp. 2555-2560), Montreal, Canada.
Yu, W., Jutla, D. N., & Sivakumar, S. C. (2005). A churn management alignment model for managers in mobile telecom. In Proceedings of the 3rd annual communication networks and services research conferences (pp. 48-53).
