
Analysis of Case Study

An unsupervised learning method is used because
there is no target field in this case.
Unsupervised learning is used to uncover
meaningful patterns in the data.
Clustering is often called an unsupervised learning
task because no class values denoting an a priori grouping
of the data instances are given.
Analysis of Case Study (Contd.)
Using SAS Enterprise Miner, a sample of
approximately 13,000 accounts is created.
To generate the clusters for analysis, the value of k and
the cluster variable names are given in the cluster model
parameters option.
The algorithm tries to arrange the data around the
k clusters in such a way as to minimize differences
within clusters while maximizing
differences between clusters.
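
The idea of minimizing within-cluster differences can be sketched with a plain k-means loop. This is only an illustration in Python, not the SAS Enterprise Miner procedure used in the case; the function names, the iteration cap, and the seed are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=50, seed=0):
    """Basic k-means: returns (centroids, labels). Hypothetical helper."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Sum of squared distances of each point to its own centroid."""
    return sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))
```

A lower value of within_cluster_ss for the same k means tighter clusters. Calling kmeans(points, k=40) on a list of numeric tuples would mirror the cluster count mentioned in the case, assuming the account fields have been scaled to comparable ranges first.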
Analysis of Case Study (Contd.)
In this case, the parameters for the cluster analysis
were set to 40 clusters.
In one of those clusters, the account holders
made a high amount of
weekend and holiday purchases, restaurant purchases,
and hotel purchases.
These accounts are problematic because the patterns
they exhibit clearly indicate improper use of
purchase cards for personal and unwarranted expenses.
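
One way to spot such a cluster is to profile each cluster by its average purchase shares and compare them with the overall averages. The sketch below assumes each account is a dictionary of numeric shares; the field names and the ratio threshold of 2.0 are hypothetical and are not taken from the case.

```python
from collections import defaultdict


def overall_means(accounts, fields):
    """Mean of each field across all accounts."""
    return {f: sum(a[f] for a in accounts) / len(accounts) for f in fields}


def profile_clusters(accounts, labels, fields):
    """Mean of each field per cluster label."""
    sums = defaultdict(lambda: {f: 0.0 for f in fields})
    counts = defaultdict(int)
    for account, label in zip(accounts, labels):
        counts[label] += 1
        for f in fields:
            sums[label][f] += account[f]
    return {label: {f: sums[label][f] / counts[label] for f in fields}
            for label in counts}


def flag_clusters(profiles, overall, ratio=2.0):
    """Labels of clusters whose mean is at least `ratio` times the overall
    mean on every field. The threshold is an assumed value for illustration."""
    return sorted(label for label, profile in profiles.items()
                  if all(profile[f] >= ratio * overall[f] for f in overall))
```

Flagged clusters are only candidates for review; the accounts in them still need to be investigated before any conclusion about misuse is drawn.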
Conclusion to the case
Cluster analysis yields substantive results in the
absence of a target field.
Used wisely, cluster analysis can help an organization
interested in fraud detection build a knowledge base of
fraud.
The ultimate objective would be the creation of a
supervised learning model, such as a neural network,
that is focused on uncovering fraudulent transactions.
Recommendation
Data clustering should be used for pattern detection rather than
mere exploratory research
Cluster models should be tested before being applied to new data
to produce cases for investigation
Cluster model results must be investigated to validate
the conclusions derived from such patterns
Cluster models need to be revisited if investigation shows the
model's judgment to be erroneous
Findings from cluster model investigations must be stored in a
knowledge base
Strengths of the Case Study
Use of cluster analysis for pattern detection of fraudulent
behaviour
Exploratory analysis may be satisfied with discovering some interesting cases
in the data
Pattern discovery will leverage the existing clusters and the general
patterns associated with those clusters to assign new cases to clusters
Use of SAS for data processing
A sample of approximately 13,000 accounts was created. Data processing
needed a tool that can handle such a large dataset
Creation of knowledge base on the basis of analysed account data
This will help in detection of future incidents without having to perform
the whole test again
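
The point above about assigning new cases to existing clusters can be sketched as a nearest-centroid lookup. This is a minimal illustration that assumes the cluster centroids have already been computed and stored as numeric tuples; the function names are hypothetical and do not come from the SAS tool used in the case.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_cluster(new_case, centroids):
    """Return the index of the stored centroid closest to the new case."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(new_case, centroids[i]))


def assign_all(new_cases, centroids):
    """Assign each new case to its nearest stored centroid."""
    return [assign_to_cluster(case, centroids) for case in new_cases]
```

If a new account lands in a cluster that earlier investigation marked as problematic, it can be queued for review without rerunning the full cluster analysis.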
Limitations of the Case Study
Possibility of wrong sample being chosen for performing analysis
Samples need to be representative of the total population so that models have
a chance to see possible combinations of fields
Since there is no target field, there is a possibility of wrongly
classifying the available raw data as fraudulent
Cluster analysis is used in the case as a pattern detection technique; therefore,
the resulting cluster model would need to be tested if it were to be applied
The clusters created from the sample data are on the basis of hypothetical
assumptions
Possibility of error in the proposed model, requiring a
revisit of the cluster analysis employed
The model would still need to be tested by using new data to ensure that the
clusters developed are consistent with the current model