
International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR)
ISSN(P): 2249-6831; ISSN(E): 2249-7943
Vol. 6, Issue 5, Oct 2016, 75-84
TJPRC Pvt. Ltd.

ANALYSIS AND ENHANCEMENT OF PROCESS MODEL USING SCORING FOR CUSTOMER RELATIONSHIP MANAGEMENT

REKHA ARUN1 & JEBAMALAR TAMILSELVI2
1 Research Scholar, Sathyabama University, Tamil Nadu, India
2 Assistant Professor, Jaya Engineering College, Tamil Nadu, India

ABSTRACT
A potential and valuable customer can be identified only through a complete 360-degree analysis. The identification
process uses various business models in CRM. A number of researchers have used such process models to guide the
mining of large amounts of data. This paper focuses on a comparative analysis of the most popular data mining
process models, viz. the Knowledge Discovery in Databases (KDD) process model, CRISP-DM and SEMMA, as well as
on an enhancement of CRISP-DM in its modeling phase. The comparative study shows that KDD and SEMMA are almost
similar and that CRISP-DM is best suited to business analysis related to the identification of potential
customers in CRM, the major objective of this paper. The investigation also indicates that including a scoring
model in the modeling phase of CRISP-DM gives the best result in identifying the potential customer through the
process models.

KEYWORDS: Process Model, KDD, SEMMA, CRISP-DM, Scoring Model, CRM

Received: Sep 13, 2016; Accepted: Oct 07, 2016; Published: Oct 13, 2016; Paper Id.: IJCSEITROCT20169

Original Article

1. INTRODUCTION
Data mining is an innovative process that requires a range of skills and knowledge. Different data mining projects
are carried out with the available standard process models, and the success of a project is understood to depend on
the process model used. These models translate business challenges into data mining tasks, recommend
appropriate data transformations and data mining techniques, and provide methods for assessing the effectiveness
of the results and for documenting what was learned. Adoption of a common process model across the market brings
further benefits: the model serves as a general reference point for discussion and so improves the understanding
of the key data mining challenges in identifying the potential customer. The familiar models are KDD, SEMMA and
CRISP-DM.
Data mining is one phase of the KDD process (Fayyad et al., 1996; Brachman & Anand, 1996).
The phrase knowledge discovery in databases, or KDD, was coined in 1989 to refer to the broad process of
identifying information in data and to emphasize the high-level application of specific data mining methods
(Fayyad et al., 1996). SEMMA was developed by the SAS Institute. The acronym SEMMA stands for Sample,
Explore, Modify, Model, Assess, and refers to the process of conducting a data mining project. SEMMA is
simple to understand and allows a structured and adequate development and maintenance of data mining projects.
It thus provides a structure for the conception, creation and evolution of a project, and helps to present
solutions to business problems as well as to identify the CRM goals (Santos & Azevedo, 2005).

www.tjprc.org

editor@tjprc.org

The CRISP-DM process was developed by a consortium composed of DaimlerChrysler, SPSS
and NCR. CRISP-DM stands for CRoss-Industry Standard Process for Data Mining (Chapman et al., 2000). Analysing the
documentation of these process models brings out their similarities and differences.
This paper also deals with the enhancement of CRISP-DM by including a scoring model in the modelling phase.
A scoring model is a predictive system used for assessing credit worthiness, for optimizing direct marketing, and
in CRM models that predict the future behaviour of customers. The scoring model is delivered as a score
table containing the scores of customers with respect to various parameters.
The remainder of this paper is organized as follows: Section 2 presents a comparative study of the existing process
models, Section 3 the enhancement of CRISP-DM with a scoring model, Section 4 the results and discussion, and
Section 5 the conclusion and future work.

2. COMPARATIVE STUDY OF EXISTING MODELS


2.1 The KDD Process
Fayyad et al. (1996) present KDD as a process that uses data mining methods to extract knowledge according to
specified measures and thresholds, using a database together with any required pre-processing, sub-sampling and
transformation of that database. It has five stages:
1. Selection: This stage creates the target data set, or focuses on a subset of variables or a data sample on
which the knowledge discovery is to be performed.
2. Pre Processing: This stage cleans and pre-processes the target data in order to obtain consistent
data.
3. Transformation: This stage transforms the data using dimensionality reduction or transformation methods.
4. Data Mining: This stage searches for patterns of interest in a particular representational form, depending on the objective.
5. Interpretation/Evaluation: This stage interprets and evaluates the mined patterns.
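The five stages can be pictured as a chain of small functions, each feeding the next. The following Python sketch is only an illustration under stated assumptions: the toy records, the field names (purpose, amount, phone), the amount band of 5000 and the support threshold are all hypothetical, and simple pattern counting stands in for a real data mining method.

```python
from collections import Counter

def select(records, fields):
    """Selection: keep only the variables on which discovery is to be done."""
    return [{f: r.get(f) for f in fields} for r in records]

def preprocess(records):
    """Pre-processing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: replace the numeric amount by a coarse band (hypothetical cut at 5000)."""
    return [
        {"purpose": r["purpose"], "band": "high" if r["amount"] >= 5000 else "low"}
        for r in records
    ]

def mine(records):
    """Data mining: count how often each (purpose, band) pattern occurs."""
    return Counter((r["purpose"], r["band"]) for r in records)

def evaluate(patterns, min_support):
    """Interpretation/Evaluation: keep patterns that meet the support threshold."""
    return {p: n for p, n in patterns.items() if n >= min_support}

def run_kdd(records, min_support=2):
    """Run the five stages in order and return the retained patterns."""
    target = select(records, ["purpose", "amount"])
    clean = preprocess(target)
    reduced = transform(clean)
    patterns = mine(reduced)
    return evaluate(patterns, min_support)

# Hypothetical toy data; record 4 has a missing amount and is dropped in pre-processing.
raw = [
    {"id": 1, "purpose": "car", "amount": 7000, "phone": "yes"},
    {"id": 2, "purpose": "car", "amount": 6500, "phone": "no"},
    {"id": 3, "purpose": "tv", "amount": 900, "phone": "no"},
    {"id": 4, "purpose": "car", "amount": None, "phone": "yes"},
    {"id": 5, "purpose": "tv", "amount": 5200, "phone": "yes"},
]

print(run_kdd(raw))
```

Keeping each stage as a separate function mirrors the staged description above: a stage can be replaced (for example, a different transformation) without touching the others.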
2.2 The SEMMA Process
The SEMMA process was developed by the SAS Institute. The acronym SEMMA stands for Sample, Explore,
Modify, Model, Assess, and refers to the process of conducting a data mining project. The SAS Institute considers a cycle
with five stages for the process:
1. Sample: This stage samples the data by extracting a portion of a large data set that holds the
important information but can still be manipulated quickly. This stage is considered optional.
2. Explore: This stage explores the data by searching for unanticipated trends and anomalies in order to gain
understanding and ideas.
3. Modify: This stage modifies the data by creating, selecting and transforming variables to focus the model
selection process.
4. Model: This stage models the data by allowing the software to search for a combination of data that predicts a
desired result optimally.
Impact Factor (JCC): 7.1293

NAAS Rating: 3.63

5. Assess: This stage evaluates the results by assessing the usefulness and reliability of the findings from the
data mining process and its performance.
Even though the SEMMA process is independent of the chosen tool, it is linked to the SAS Enterprise
Miner software and is intended to guide users in implementing DM applications. SEMMA offers a simple-to-understand
process that allows a structured and adequate development and maintenance of data mining projects.
2.3 The CRISP-DM Process
The CRISP-DM process was designed by a consortium that included DaimlerChrysler, SPSS and NCR. CRISP-DM
stands for CRoss-Industry Standard Process for Data Mining. It consists of a cycle that comprises six stages:
1. Business Understanding: This first stage focuses on understanding the objectives and requirements of the project
from a business perspective, and then converts this knowledge into a data mining problem and an initial plan for
achieving the objectives.
2. Data Understanding: This phase starts with the initial data set and proceeds with activities that
make the data familiar, identify data quality issues, give first insights into the data, or find relevant subsets on which to form
hypotheses about hidden information.
3. Data Preparation: This covers all activities needed to build the final data set from the initial raw data.
4. Modeling: In this stage, various modelling techniques are selected and applied, and their parameters are calibrated
to optimal values.
5. Evaluation: This stage evaluates the model and reviews the steps taken to construct it, to confirm that it
achieves the business objectives.
6. Deployment: Creating the model is not the end of the project. The knowledge gained needs to be organized and presented in
a user-friendly manner (Chapman et al., 2000).
2.4 Comparison
Comparing the KDD and SEMMA stages confirms that they are equivalent:
Sample is similar to Selection;
Explore is similar to Preprocessing;
Modify is similar to Transformation;
Model is similar to DM;
Assess is similar to Interpretation/Evaluation.
A thorough examination shows that the five stages of the SEMMA process can be seen as a practical
implementation of the five stages of the KDD process. Compared with the KDD stages, however, the CRISP-DM
stages do not map as straightforwardly as those of SEMMA. Nevertheless, the CRISP-DM methodology
includes the steps given above, together with steps that precede and follow the KDD process. The Business
Understanding phase deals with developing an understanding of the application domain, the relevant prior knowledge
and the goals of the end user. The Deployment phase incorporates the discovered knowledge into the working system.
As for the other stages, the Data Understanding phase is a blend of Selection and Pre processing; the Data
Preparation phase corresponds to Transformation; the Modeling phase corresponds to DM; and the Evaluation phase
corresponds to Interpretation/Evaluation.
Table 1 presents a summary of the correspondences:
Table 1: Summary of the Correspondences between KDD, SEMMA and CRISP-DM
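The correspondences stated in Section 2.4 can also be written down as a small lookup structure. The sketch below is assembled only from the statements in the text; the labels "Pre KDD" and "Post KDD" are shorthand introduced here for the steps that precede and follow the KDD process, and the helper function names are hypothetical.

```python
# Each row: (KDD stage, SEMMA stage, CRISP-DM phase).
# None marks a stage for which the text states no direct counterpart.
CORRESPONDENCE = [
    ("Pre KDD", None, "Business Understanding"),
    ("Selection", "Sample", "Data Understanding"),
    ("Pre Processing", "Explore", "Data Understanding"),
    ("Transformation", "Modify", "Data Preparation"),
    ("Data Mining", "Model", "Modeling"),
    ("Interpretation/Evaluation", "Assess", "Evaluation"),
    ("Post KDD", None, "Deployment"),
]

def semma_stage_for(kdd_stage):
    """Return the SEMMA stage matched to a KDD stage, or None if there is none."""
    for kdd, semma, _ in CORRESPONDENCE:
        if kdd == kdd_stage:
            return semma
    raise KeyError(kdd_stage)

def crisp_phase_for(kdd_stage):
    """Return the CRISP-DM phase matched to a KDD stage."""
    for kdd, _, crisp in CORRESPONDENCE:
        if kdd == kdd_stage:
            return crisp
    raise KeyError(kdd_stage)

for kdd, semma, crisp in CORRESPONDENCE:
    print(f"{kdd:28} {str(semma or '-'):10} {crisp}")
```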

Previous research indicates that data mining experts follow the KDD process model because of its
completeness and accuracy. In contrast, CRISP-DM and SEMMA are largely company oriented. In particular, SEMMA is
used by SAS Enterprise Miner and is integrated with that software. However, studies indicate that CRISP-DM is more complete
than SEMMA. These process models help users and experts to understand how data mining is applied
in practice. CRISP-DM was developed as an industry-oriented,
tool-neutral process. Having grown out of the embryonic knowledge discovery processes used in early data mining projects, which
responded directly to user requirements, the model can be applied across industry sectors. It makes large
data mining projects faster, cheaper, more consistent and more manageable, and even small-scale data mining
investigations benefit from using CRISP-DM.

3. ENHANCEMENT OF CRISP-DM WITH SCORING MODEL


The CRISP-DM model is described in terms of a hierarchical process model that contains a collection of
tasks explained at several levels of abstraction. The steps in CRISP-DM are shown in Figure 1.

Figure 1: Phases of CRISP-DM


Among the steps of CRISP-DM, most companies apply various statistical models in the modeling phase
to optimize their activities.
Among the modeling techniques available, the scoring model is used for assessing credit worthiness, for bad debt
collection activities and for optimizing direct marketing in CRM. It is a special kind of predictive model because it
allows the future behavior of clients to be predicted and, in turn, the potential customer to be identified. A predictive
model predicts the chance that an event will occur, or the fact of its occurrence, for example default on a loan payment,
an accident, client agitation or attrition, or being a good customer. Decisions supported by a scoring model are reported
to increase profit by 10-30% compared with decisions based on general rules.
Examples of scoring models are as follows:
Credit Risk
Forecasting the credit risk of a customer before granting a loan (application scoring)
Forecasting the risk of a loan already granted to a customer (behavioral scoring)
Detecting fraud / unusual transactions (fraud detection)
Forecasting the response to a mailing campaign (response scoring)
Selecting optimal bad debt collection actions
Will a client use all the products bought? (activation scoring)
Will a client extend the usage of a product bought? (usage scoring)
Will the customer buy a product along with some other product? (cross-selling)
Will a customer buy more of a product requested earlier (e.g. decide to have a higher credit limit)? (up-selling)
Using a product less (attrition scoring)
Stopping the use of a product while starting to use another product, a problem that often occurs in telecoms
(churn)
The choice of modeling method is vital in situations where only a small data set is available. For instance, this
happens when constructing a model to assess the credit worthiness of customers who apply for a mortgage loan, where
the sample is smaller than for cash or retail loans. With less data, more sophisticated methods must be selected for
building the model. Where the data are very large, the optimal choice of method and expertise in analyzing the data
still play a key role and are a major factor in success. The best-suited method makes it possible to evaluate
uncertainty, which reduces risk. Implementing the best model directly increases profit and competitiveness. This is
especially important during an economic recession.
Pseudo code for the credit scoring calculation done to find the potential customers
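One common way to carry out such a calculation is an additive, points-based scorecard, in which each attribute value contributes points and the sum is compared with a cutoff. The following Python sketch assumes that form. It is not the authors' pseudo code: the attribute names, point values, cutoff and applicants are all hypothetical and chosen only for illustration.

```python
# Hypothetical scorecard: attribute -> {category: points}.
# Names and point values are illustrative assumptions, not taken from the paper.
SCORECARD = {
    "checking_account": {"none": 0, "low": 10, "high": 30},
    "credit_history": {"delayed": 0, "paid_duly": 20, "all_paid": 35},
    "employment_years": {"<1": 5, "1-4": 15, ">=4": 25},
}
CUTOFF = 50  # hypothetical minimum score for a "good" rating

def credit_score(applicant):
    """Sum the points for the applicant's category on every scorecard attribute."""
    return sum(SCORECARD[attr][applicant[attr]] for attr in SCORECARD)

def classify(applicant, cutoff=CUTOFF):
    """Rate the applicant as good credit if the score reaches the cutoff."""
    return "good" if credit_score(applicant) >= cutoff else "bad"

def score_table(applicants):
    """Build a score table (name, score, rating), highest score first."""
    rows = [(name, credit_score(a), classify(a)) for name, a in applicants.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical applicants.
applicants = {
    "A": {"checking_account": "high", "credit_history": "all_paid", "employment_years": ">=4"},
    "B": {"checking_account": "none", "credit_history": "delayed", "employment_years": "1-4"},
    "C": {"checking_account": "low", "credit_history": "paid_duly", "employment_years": "1-4"},
}

for name, score, rating in score_table(applicants):
    print(name, score, rating)
```

The score table produced here corresponds to the idea of a table of customer scores over several parameters; the customers at the top of the table would be the candidates for potential customers.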

Data Set Used: The German Credit data set contains observations on 30 variables for 1000 past applicants for
credit. Each applicant was rated as good credit (700 cases) or bad credit (300 cases). New applicants for credit can also
be evaluated on these 30 "predictor" variables.

4. RESULTS AND DISCUSSIONS


With this data set, a credit scoring rule is generated to determine whether a new applicant is a good credit risk
or a bad credit risk, depending on the values of one or more of the predictor variables.
All the variables are explained in Table 1.1.

Table 1.2 below shows the values of these variables for the first several records.
Table 1.2: The Data (First Several Rows)

The consequences of misclassification have been assessed as follows: the costs of a false positive
(incorrectly saying an applicant is a good credit risk) outweigh the cost of a false negative (incorrectly saying an applicant
is a bad credit risk) by a factor of five. This can be summarized in the following table.
Table 1.3: Opportunity Cost Table (in Deutsche Marks)

The Opportunity Cost table was derived from the average net profit per loan as shown below:
Table 1.4: Average Net Profit
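The five-to-one cost asymmetry described above can be turned into a decision rule. Accepting an applicant has an expected cost of 5(1 - p) and rejecting has an expected cost of p, where p is the estimated probability of good credit, so accepting is cheaper only when p > 5/6. The Python sketch below illustrates this under two assumptions that are not from the paper: costs are expressed in relative units (5 and 1) rather than in the Deutsche Mark amounts of the tables, and the scoring model supplies a probability estimate p_good. All function names are hypothetical.

```python
COST_FP = 5.0  # relative cost of calling a bad applicant good (assumed units)
COST_FN = 1.0  # relative cost of calling a good applicant bad (assumed units)

def min_cost_threshold(cost_fp=COST_FP, cost_fn=COST_FN):
    """Probability of good credit above which accepting has the lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)

def decide(p_good, cost_fp=COST_FP, cost_fn=COST_FN):
    """Predict 'good' only if the expected cost of accepting is below that of rejecting."""
    expected_cost_accept = (1.0 - p_good) * cost_fp
    expected_cost_reject = p_good * cost_fn
    return "good" if expected_cost_accept < expected_cost_reject else "bad"

def total_cost(actual, predicted, cost_fp=COST_FP, cost_fn=COST_FN):
    """Total misclassification cost over paired actual and predicted ratings."""
    cost = 0.0
    for a, p in zip(actual, predicted):
        if a == "bad" and p == "good":
            cost += cost_fp  # false positive
        elif a == "good" and p == "bad":
            cost += cost_fn  # false negative
    return cost

print(min_cost_threshold())
print(decide(0.9), decide(0.8))
print(total_cost(["good", "bad", "good", "bad"], ["good", "good", "bad", "bad"]))
```

With equal costs the threshold would be 0.5; the asymmetry pushes it to 5/6, so an applicant estimated at 80% likely to be good credit would still be rejected under this rule.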

Useful graphs for assessing the performance of the scoring model include the lift chart and the Kolmogorov-Smirnov
chart, among others. For example, the following graph shows the Kolmogorov-Smirnov (KS) graph for a credit scoring model.

Figure 2
In this graph, the X axis shows the credit score values (sums), and the Y axis shows the cumulative proportions
of observations in each outcome class (Good Credit vs. Bad Credit) in the hold-out sample. The further apart the two
lines are, the greater the degree of differentiation between the Good Credit and Bad Credit cases in the hold-out
sample, and thus the better (more accurate) the model.
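The separation read off the KS graph can also be computed directly as the largest vertical gap between the two cumulative curves. The following Python sketch shows that calculation on a small set of hold-out scores and labels; the data are hypothetical and the function name ks_statistic is introduced here for illustration.

```python
def ks_statistic(scores, labels):
    """Return (largest gap, score at which it occurs) between the cumulative
    proportions of bad and good cases. Assumes both classes are present."""
    goods = [s for s, y in zip(scores, labels) if y == "good"]
    bads = [s for s, y in zip(scores, labels) if y == "bad"]
    best_gap, best_score = 0.0, None
    for cut in sorted(set(scores)):
        # Cumulative proportion of each class with a score at or below the cut.
        cum_good = sum(1 for s in goods if s <= cut) / len(goods)
        cum_bad = sum(1 for s in bads if s <= cut) / len(bads)
        gap = abs(cum_bad - cum_good)
        if gap > best_gap:
            best_gap, best_score = gap, cut
    return best_gap, best_score

# Hypothetical hold-out sample: credit scores (sums) with known outcomes.
scores = [20, 35, 40, 55, 60, 70, 80, 90]
labels = ["bad", "bad", "good", "bad", "good", "good", "good", "good"]

ks, at_score = ks_statistic(scores, labels)
print(round(ks, 3), at_score)
```

In this toy sample the gap is largest at a score of 55, where all bad cases but only one of the five good cases lie at or below the cut; a model with no discriminating power would give a gap near zero.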

5. CONCLUSIONS AND FUTURE WORK


With the objective of finding the potential customer, this paper focused on process models for data
mining. Three process models, viz. KDD, SEMMA and CRISP-DM, were compared. It is
concluded that CRISP-DM is best suited for business analysis. CRISP-DM was therefore enhanced in its
modeling phase with a predictive model called the scoring model. The scoring model enables CRM to identify
potential customers through credit risk, by distinguishing good credit from bad credit. The objective of the scoring
model is not only to determine the credit worthiness of the customer but also to support customer relationship
management (CRM) in retaining the customer and maintaining the overall profit portfolio. CRISP-DM with the scoring
model optimizes the result obtained. In future work, the evaluated customers can be classified with an enhanced
meta-heuristic algorithm to help CRM decide on the potential customer.

REFERENCES
1. Fayyad, U. M. et al., 1996. From data mining to knowledge discovery: an overview. In Fayyad, U. M. et al. (Eds.), Advances in knowledge discovery and data mining. AAAI Press / The MIT Press.
2. Benoît, G., 2002. Data Mining. Annual Review of Information Science and Technology, Vol. 36, No. 1, pp 265-310.
3. Brachman, R. J. & Anand, T., 1996. The process of knowledge discovery in databases. In Fayyad, U. M. et al. (Eds.), Advances in knowledge discovery and data mining. AAAI Press / The MIT Press.
4. Chen, M. et al., 1996. Data Mining: An Overview from a Database Perspective. IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp 866-883.
5. Simoudis, E., 1996. Reality check for data mining. IEEE Expert, Vol. 11, No. 5, pp 26-33.
6. Fayyad, U. M., 1996. Data mining and knowledge discovery: making sense out of data. IEEE Expert, Vol. 11, No. 5, pp 20-25.
7. Dzeroski, S., 2006. Towards a General Framework for Data Mining. In Dzeroski, S. and Struyf, J. (Eds.), Knowledge Discovery in Inductive Databases. LNCS 4747. Springer-Verlag.
8. Meo, R. et al., 1998. An Extension to SQL for Mining Association Rules. Data Mining and Knowledge Discovery, Vol. 2, pp 195-224. Kluwer Academic Publishers.
9. Imielinski, T. & Virmani, A., 1999. MSQL: A Query Language for Database Mining. Data Mining and Knowledge Discovery, Vol. 3, pp 373-408. Kluwer Academic Publishers.
10. Sarawagi, S. et al., 2000. Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications. Data Mining and Knowledge Discovery, Vol. 4, pp 89-125.
11. Botta, M. et al., 2004. Query Languages Supporting Descriptive Rule Mining: A Comparative Study. Database Support for Data Mining Applications. LNAI 2682, pp 24-51.
12. SAS Enterprise Miner SEMMA. SAS Institute. Accessed from http://www.sas.com/technologies/analytics/datamining/miner/semma.html, on May 2008.
13. Santos, M. & Azevedo, C., 2005. Data Mining Descoberta de Conhecimento em Bases de Dados. FCA Publisher.
14. Chapman, P. et al., 2000. CRISP-DM 1.0 - Step-by-step data mining guide. Accessed from http://www.crisp-dm.org/CRISPWP-0800.pdf on May 2008.
