
Credit Card Fraud Detection System

V. Filippov
Institute of Control Sciences Moscow, Russia Email: lippovs@umail.ru

L. Mukhanov
Institute of Electronic Controlling Machines Moscow, Russia Email: lmukhanov@fraudprevention.ru

B. Shchukin
Institute of Control Sciences Moscow, Russia Email: tsh@cyber.mephi.ru

Abstract: The use of credit cards is prevalent in modern society. However, the number of credit card fraud cases keeps increasing despite the worldwide adoption of chip cards and existing protection systems, which makes fraud detection an important problem. This paper gives a general description of the developed fraud detection system and compares models based on artificial intelligence. The last section presents the results of evaluative testing and the corresponding conclusions.

I. INTRODUCTION

The use of credit cards is prevalent in modern society. As in other related fields, financial fraud still occurs despite the worldwide adoption of chip cards and existing protection systems. This is why most software developers are trying to improve existing methods of fraud detection in processing systems. The majority of such methods are rule-based models, which allow bank employees to create rules describing suspicious transactions. However, the number of transactions per day is large and new types of fraud appear quickly, so it is very difficult to track new types of fraud and to create corresponding rules in time; doing so would require a significant increase in the number of employees. Such problems can be avoided by using artificial intelligence. The task is specific, however: complex models are not acceptable because of authorization time limits [1],[2]. Bayesian Networks are suitable for this type of detection, but results from previous research showed that a method for representing the input data (attributes of a transaction) is needed for effective classification [3]. For transaction monitoring by bank employees, a clustering model was developed. This model provides fast analysis of transactions by attributes.

This paper considers a general description of the developed credit card fraud detection system, the clustering model, the Naive Bayesian Classifier and the model based on Bayesian Networks with the data representation method. Finally, conclusions are drawn from the evaluative testing of the models.

II. FRAUD DETECTION SYSTEM

The system uses two modules for fraud detection (transaction classification): FDS ONLINEP and FDS OFFLINEP. The FDS ONLINEP module performs on-line fraud detection, i.e. fraud detection during the authorization of transactions in a bank processing system.

Fig. 1. Structure of the Fraud Detection System

In this module different models for fraud detection can be used. If the module recognizes a transaction as fraudulent, a corresponding message is sent to the processing system and the transaction can be declined. Classification takes some time, and the time available for transaction authorization is limited. This is why some models cannot be applied in the FDS ONLINEP module: they would exceed the time limits. These models are used in the FDS OFFLINEP module, which allows the system to detect fraud among transactions that have already been authorized and classified in the FDS ONLINEP module.

The FDS Data Warehouse stores incoming transactions, statistical data for the corresponding models, classification results and generic parameters. The FDS ALERT module alerts credit card holders by SMS or email when the FDS ONLINEP module recognizes fraud. The FDS BUILDSTATP module builds statistical data. Some models (for example, the model based on Bayesian Networks) require statistical data to be rebuilt with a defined period, and this module allows transactions to be classified while the statistical data is being built (Fig. 1).

The following models are used for fraud detection:

1) A model based on regions of data clusterization of parameter values (the clustering model). In this model, the parameters of legal and fraudulent transactions are analyzed in order to find regions of data clusterization for each parameter. During classification, the parameters of an incoming transaction are compared with these regions. If the parameters of the transaction are typical for legal transactions, the transaction is recognized as legal. Alternatively, if the parameters are typical for fraud, the transaction is recognized as fraudulent;
2) A model based on Bayesian Networks. This model gives a probabilistic estimate of how well a classified transaction conforms to legal and to fraudulent transactions;
3) A model based on minimal differences in time between the geographic places where transactions have taken place. A transaction is recognized as fraudulent if the time between the previous transaction and the last transaction is less than the minimum time possible for moving between the two places;
4) A model based on heuristic rules. This model allows the user to specify suspicious parameters of a transaction that are typical for fraud. It is also possible to specify a suspicious sequence of transactions;
5) A model based on limits. For example, these limits allow the user to restrict the amount of transactions per month for one card, or the amount of transactions from one financial institution per month (or per day).

Authorized licensed use limited to: Pune Institute of Computer Technology. Downloaded on January 6, 2010 at 06:14 from IEEE Xplore. Restrictions apply.

TABLE I
INPUT ATTRIBUTES

Input attribute                               Example of attribute value
Message type                                  1031
Type of transaction                           700
Network identification                        25
Day of registration in system                 2
Time of registration in system (in minutes)   720
Day of registration in device                 1
Time of registration in device (in minutes)   730
Amount of transaction                         1000
Currency of transaction                       840
Terminal type                                 1
Language code (for ATM only)                  7
Acquirer institution identification           109428
Acquirer institution country code             643
Merchant identification                       456783
Card data input method                        2
Card present flag                             1
Cardholder present flag                       1
Cardholder authentication method              1

TABLE II
VALUES OF ONE ATTRIBUTE

Number of value   Value
1                 1200
2                 1220
3                 1260
4                 1270
5                 720
6                 730
7                 780
8                 800

TABLE III
FOUND REGIONS OF DATA CLUSTERIZATION

Number of region   Center of region   Maximum deviation
1                  757.5              42.5
2                  1237.5             37.5

III. THE CLUSTERING MODEL

Fig. 2. Regions of data clusterization

This model is based on the regions of data clusterization of the transaction parameters. The system uses 24 real parameters of transactions for classification. All of them are discrete (18 input attributes are described in Table I). To find the regions of data clusterization, the training data must first be analyzed. For example, if the values from Table II were observed in the training data, then the regions of clusterization from Table III will be found.

To determine these regions of clusterization, we first find the maximum difference ($DIFF_{max}$) between the values of an attribute in the training data. This difference is split into $N_{interval}$ segments, where $N_{interval}$ is the binary logarithm of the number of attribute values $N_{points}$. In general, $N_{interval}$ can be chosen in another way; this choice is based on the assumption that doubling $N_{points}$ should increase $N_{interval}$ by one. For each segment, the average of the attribute values that fall into it and the corresponding deviation are calculated. This yields $N_{interval}$ centers and corresponding deviations that describe all values of the attribute in the training data (Fig. 2). This information is collected for each attribute of a transaction during the learning process (the building of statistical data), separately for legal and fraudulent transactions.

During classification, the parameters of a transaction are first compared with the regions of data clusterization for legal transactions. If the transaction is not typical for legal transactions, it is compared with the regions of data clusterization for fraudulent transactions. The transaction is recognized as fraudulent if it is found to be a typical fraudulent transaction. Each parameter of the transaction is compared with the regions of data clusterization found for that parameter.
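The region search described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the segments have equal width, that $N_{interval}$ is rounded down to an integer, that empty segments produce no region, and that the deviation is the maximum absolute deviation from the segment mean. The function name `find_regions` is hypothetical.

```python
import math

def find_regions(values):
    """Return (center, max_deviation) pairs for the values of one attribute."""
    lo, hi = min(values), max(values)
    # N_interval: binary logarithm of the number of values (rounded down, assumption)
    n_interval = max(1, int(math.log2(len(values))))
    # DIFF_max split into equal-width segments (assumption)
    width = (hi - lo) / n_interval
    buckets = [[] for _ in range(n_interval)]
    for v in values:
        if width == 0:
            idx = 0  # all values identical: a single segment
        else:
            # the maximum value is placed in the last segment
            idx = min(int((v - lo) / width), n_interval - 1)
        buckets[idx].append(v)
    regions = []
    for bucket in buckets:
        if not bucket:
            continue  # assumption: an empty segment yields no region
        center = sum(bucket) / len(bucket)
        deviation = max(abs(v - center) for v in bucket)
        regions.append((center, deviation))
    return regions

# Values of one attribute from Table II
table_ii = [1200, 1220, 1260, 1270, 720, 730, 780, 800]
for number, (center, deviation) in enumerate(find_regions(table_ii), start=1):
    print(number, center, deviation)
```

Under these assumptions the eight values of Table II give three segments, of which the middle one is empty, so two regions remain; their centers and maximum deviations agree with Table III.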


If the value of a transaction parameter hits any region (the deviation of the parameter value from the center of the region is less than the corresponding deviation), the parameter is recognized as typical and the classification result for this parameter, $Class_i$, is 1. If the value does not hit any of the corresponding regions, the parameter is recognized as not typical and $Class_i$ is -1. The final classification result for the whole transaction is a linear combination of the classification results for each parameter:

$$Result = w_1 Class_1 + w_2 Class_2 + \dots + w_n Class_n$$

Here $Class_i$ is the result of comparing parameter $i$ with the corresponding regions of data clusterization, $n$ is the number of parameters, and $w_i$ is a weight factor expressing the importance of the classification result of parameter $i$ for the classification of the whole transaction [4]. If $Result$ is greater than 0, the transaction is recognized as typical. If $Result$ is less than 0, the transaction is recognized as not typical. The absolute value of $Result$ is the accuracy of the transaction classification. Weight factors can be changed by bank employees according to the importance of the corresponding parameter for the classification of the whole transaction. To reduce the influence of a parameter on the classification, the corresponding weight factor should be decreased. Initially $w_i = 1/n$. If any weight factors are changed, all weight factors should be normalized:

$$w_i = \frac{w_i}{\sum_{j=1}^{n} w_j}$$

IV. THE NAIVE BAYESIAN CLASSIFIER

It is convenient to start the study of Bayesian Networks for credit card transaction classification with the Naive Bayesian Classifier because of its simplicity. The Naive Bayesian Classifier is a form of Bayesian Network in which the attributes (except the class attribute) are assumed to be conditionally independent. The final decision about the transaction class is made after estimating the class probability [5]:

$$P(C = c \mid X = x) = \frac{P(C = c)\, P(X = x \mid C = c)}{P(X = x)}$$

Here $C$ is the random variable denoting the class of an instance (transaction) and $X$ is a vector of random variables denoting the observed attribute values. $X = x$ represents the event $X_1 = x_1 \wedge X_2 = x_2 \wedge \dots \wedge X_n = x_n$. Under the assumption of conditional independence of the attributes:

$$P(X = x \mid C = c) = \prod_{i} P(X_i = x_i \mid C = c)$$

Different approaches can be used to compute $P(X_i = x_i \mid C = c)$. First, under the assumption that all attributes are discrete, this probability can be calculated as (the discrete distribution):

$$P(X_i = x \mid C = c) = \frac{K_c + 1}{N + 1}$$

Here $K_c$ is the number of occurrences of the value $x$ in the learning data set for attribute $i$ and transaction class $c$, and $N$ is the number of instances in the learning data set. In this approach numeric attributes should be transformed into discrete ones [6]. The disadvantage of this approach is that all values observed in the training data must be stored, which is difficult for a considerable amount of transactions. It is also possible to use the normal distribution [5],[7], where

$$P(X_i = x \mid C = c) = g(x, \mu_c^i, D_c^i)$$

$$g(x, \mu_c^i, D_c^i) = \frac{1}{\sqrt{2\pi}\, D_c^i} \exp\left(-\frac{(x - \mu_c^i)^2}{2 (D_c^i)^2}\right)$$

$$\mu_c^i = \frac{1}{n} \sum_{j=1}^{n} x_{cj}^i$$

$$D_c^i = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (x_{cj}^i - \mu_c^i)^2}$$

Here $\mu_c^i$ is the mean value of attribute $i$ for class $c$ and $D_c^i$ is the standard deviation of the values of attribute $i$ for class $c$. The disadvantage of this approach is the assumption that the attribute values follow the normal distribution. It is better to calculate the probability using kernel density estimation [5],[8]:

$$P(X_i = x \mid C = c) = \frac{1}{n} \sum_{i=1}^{n} g(x, x_i, D), \qquad D = \frac{1}{\sqrt{n}}$$

In this case the Gaussian is averaged over the set of all attribute values $x_i$ observed in the training data. This approach gives a closer estimate of the real attribute distribution, but it requires storing all values.

V. BAYESIAN BELIEF NETWORKS AND THE MDL PRINCIPLE

The assumption of conditional independence of attributes strongly affects the efficacy of Bayesian Network classification [9],[10], because different attributes have a strong impact on each other. For example, the currency of a transaction has an impact on the amount of the transaction. Therefore, the structure of the dependence network between the transaction attributes should be found first [11]. To obtain this information, Rissanen's Minimal Description Length principle is used [12],[13]. This principle is based on the quantitative characteristic $MDL$:

$$MDL = \frac{\log_2 N}{2} |B| - LL$$


Fig. 3. Results of the Naive Bayes classification using different approaches for $P_{legal}$ calculation

Fig. 4. Results of the Naive Bayes classification using different approaches for $P_{fraud}$ calculation

$$LL = N \sum_{i=1} I(X_i, \Pi_{X_i})$$

$$I(X, Y) = \sum_{X,Y} P(x, y) \log_2 \frac{P(x, y)}{P(x)\, P(y)}$$
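As a rough illustration of the dependence measure $I(X, Y)$, the sketch below estimates it from paired observations by replacing the probabilities with relative frequencies. The function name `mutual_information` and the toy currency, amount-band and terminal data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X, Y) in bits from two equally long sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # counts of x values
    py = Counter(ys)              # counts of y values
    mi = 0.0
    for (x, y), k in joint.items():
        p_xy = k / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical discretized observations for four transactions
currency = [840, 840, 643, 643]
amount_band = ["high", "high", "low", "low"]  # fully determined by currency
terminal = [1, 2, 1, 2]                       # unrelated to currency

print(mutual_information(currency, amount_band))
print(mutual_information(currency, terminal))
```

A pair of attributes that always change together receives a high value, while an unrelated pair receives a value near zero, which is what makes the measure usable for choosing dependence edges.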

In the expressions for $MDL$ and $LL$, $N$ is the number of attributes, $|B|$ is the number of dependence edges in the Bayesian Network, and $\Pi_{X_i}$ is the set of attributes on which the attribute $X_i$ depends. For example, if $X_1$ depends on $X_2$ and $X_3$, then $\Pi_{X_1} = \{X_2, X_3\}$. The term $\frac{\log_2 N}{2} |B|$ describes the number of bits required to keep the corresponding network in memory. $I(X, Y)$ characterizes the dependence between $X$ and $Y$, and $LL$ characterizes the dependence between all attributes used in the formed network. $LL$ is calculated for all possible combinations of attributes. Searching for the optimal network of conditional dependence between attributes means searching for the minimum $MDL$. A detailed algorithm for finding the network structure for Bayesian Networks using Rissanen's Minimal Description Length is described in [10].

However, the conducted evaluative testing showed that the Bayesian Networks method is not effective enough. The main problem is that the parameters of a transaction do not follow any common distribution. For example, the type of transaction may take only values grouped around 800 or around 1000. It is therefore very difficult to find a suitable distribution. This is why the following original method of input data representation was developed.

The key idea of the new method is to use attributes that correspond to the real attributes of a transaction but can take only two values (0 or 1). An input attribute has value 1 when the real value of the parameter has been observed in the training data, and 0 otherwise. Thus, to obtain the input value of an attribute, all real values of this attribute observed in the training data must be known. Storing all observed values is undesirable, however. To solve this problem, the developed clusterization algorithm can be used.
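The observed-value encoding described above can be sketched as follows. This is only an illustration: the function name `encode_observed`, the attribute names and the sample values are hypothetical.

```python
def encode_observed(transaction, observed):
    """Map each real attribute value to 1 if it was seen in training, else 0."""
    return {name: int(value in observed.get(name, set()))
            for name, value in transaction.items()}

# Hypothetical training transactions with two attributes
training = [
    {"currency": 840, "terminal_type": 1},
    {"currency": 643, "terminal_type": 1},
]

# Collect every value observed for each attribute
observed = {}
for t in training:
    for name, value in t.items():
        observed.setdefault(name, set()).add(value)

print(encode_observed({"currency": 978, "terminal_type": 1}, observed))
```

The `observed` sets grow with the number of distinct values in the training data, which is the storage cost the text refers to.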

With the clusterization algorithm, an input attribute has value 1 when the real value of the parameter hits some region of data clusterization; otherwise it has value 0. Such a method is suitable for the Naive Bayesian Classifier only. For Bayesian Networks it is better to use the number of the cluster that the attribute value hits during classification. This makes it possible to consider the dependence between clusters of values for different attributes. To calculate the class probability, all observed combinations for each region must be stored (for example, for the dependence between attributes $X_1$, $X_2$, $X_3$). A more robust approach can be used: dependent attributes can be transformed into one independent attribute. This approach was chosen for the evaluative testing.

Training of the Naive Bayesian Classifier and of Bayesian Networks with this data representation method is split into two tasks. The first task is to find the regions of data clusterization for each attribute. The second task is to build the statistical data for the Naive Bayesian Classifier or Bayesian Networks. Thus all training data consists of two parts: one is intended for finding the regions of data clusterization and the other for building the statistical data. The approach based on the discrete distribution is used for calculating the class probability. It should be mentioned that finding the structure of the dependence network is an independent task, which the system performs before training, using representative data for all bank cards. The class probability is calculated as (the discrete distribution):

$$P(X_i = x \mid C = c) = \frac{K_c + 1}{N + 1}$$

Here $K_c$ is the number of training cases of class $c$ in which the input value of attribute $i$ equals $x$, that is, the parameter value hit the corresponding regions of data clusterization when $x$ is 1 or did not hit them when $x$ is 0. $N$ is the number of instances in the learning data set.

Fig. 5. Results of $P_{legal}$ calculation for Bayesian Networks and the Naive Bayesian Classifier

Fig. 6. Results of $P_{fraud}$ calculation for Bayesian Networks and the Naive Bayesian Classifier

VI. RESULTS

For the evaluative testing, two sets of transactions have been generated: the first set for the training process and the second for the testing process. The training data consists of 83 transactions, of which 52 are legal and 31 are fraudulent. All transactions have been generated for one card and correspond to the real use of a credit card in one of the banks during three months. The set contains 21 legal purchases made at 8 different merchants through 8 different financial organizations. Of these purchases, 16 were made with the local country parameter in local currency and the others with a foreign country parameter in foreign currency. The remaining legal transactions are cash withdrawals: 24 were made at 5 different cash machines of the issuer bank and 7 at cash machines of foreign banks. The fraudulent transactions have been generated with a specific financial institution, country code and time, on the basis of a real fraudulent transaction. The testing data consists of 9 transactions with attributes that have been observed in the legal transactions from the training data and 2 transactions that correspond to fraud.

Three different tests were conducted using these sets. The first test compared the Naive Bayesian Classifier based on the normal distribution, the discrete distribution, the kernel density estimation and the developed input data representation method (Fig. 3 and Fig. 4). In these figures $P_{legal}$ is the probability that a transaction is legal and $P_{fraud}$ is the probability that a transaction is fraudulent. Legal transactions have numbers 1 to 9 and fraudulent transactions have numbers 10 and 11. Note that transactions 7 and 8 each have one attribute whose value is not observed in the training set.
The results of Naive Bayes classification using the normal distribution, the kernel density estimation and the discrete distribution are not acceptable for this type of fraud detection. Most of the legal probabilities produced by the Naive Bayesian Classifier with these probability estimation methods are too low and are therefore incorrect. The reasons differ for each method. For the normal distribution, the probabilities are low because the real distribution of values in the training set does not correspond to the normal distribution for each attribute. For the discrete distribution and the kernel density estimation, the reason is that the probability for each attribute depends on the number of different values of this attribute observed in the training set. If the number of different values of some attribute in the training set increases, the dispersion of the attribute values grows. As a result, only values that fall near the center of the real distribution of the attribute receive high probabilities. But values observed in the training data may not fall near the center of the real distribution. Consequently, the probabilities for these values will be low, which is not acceptable for this type of detection. The assumption of conditional independence of attributes also has a bad effect on the final probability, because under this assumption the probabilities for different attributes are multiplied. Thus the Naive Bayesian Classifier using the normal distribution, the discrete distribution or the kernel density estimation is not suitable for this type of detection.

However, the results of classification using the Naive Bayesian Classifier based on the developed input representation method and the discrete distribution for probability estimation are acceptable. In this case only two values of input attributes are possible (0 and 1), which is why the probabilities calculated using the discrete distribution are correct for this testing. This is the main reason why legal transactions have been classified as legal and fraudulent transactions as fraudulent with high probabilities.

The second test compared the Naive Bayesian Classifier and Bayesian Networks.
For both models the input representation method was used. Before training, the network of attribute dependence for the model based on Bayesian Networks was found using Rissanen's Minimal Description Length principle. This search was based on real transactions from one of the banks. Comparative testing of the Naive Bayesian Classifier and Bayesian Networks showed that it is better to use Bayesian Networks for fraud detection (Fig. 5 and Fig. 6). For all legal transactions (numbers 1-9), $P_{legal}$ was greater than or equal to 0.65 using Bayesian Networks. The probabilities $P_{legal}$ calculated for transactions 7 and 8 using the Naive Bayesian Classifier are less than 0.3. This is rather low for transactions that have only one attribute whose value is not observed in the training data.

The third test estimated the classification results of the clustering model. These results can be considered acceptable (Fig. 7). It should be noted that the results for this model are comparable with the results for Bayesian Networks because special weight factors were selected for this testing. This does not mean, however, that the clustering model is as effective as the model based on Bayesian Networks for fraud detection in general.

Finally, it should be mentioned that the evaluative testing was conducted to find out whether the Naive Bayes, Bayesian Networks and clustering models are acceptable for this type of detection. It was not intended to estimate the classification accuracy of these methods in general. Such an estimate can be obtained only from a considerable number of real transaction sets.

Fig. 7. Results of testing for the clustering model

VII. CONCLUSION

A general description of the developed fraud detection system and a comparison of the base models have been presented. To compare the models, a special evaluative testing was conducted, intended to simulate a typical use of credit cards. The results show that the Naive Bayesian Classifier based on the discrete distribution, the normal distribution or the kernel density estimation cannot be used for this type of fraud detection. The main problem is that the real distribution of values for each attribute does not correspond to any common distribution. This is why the input representation method was developed; it increases the effectiveness of the Naive Bayesian Classifier and the Bayesian Networks method for this type of fraud detection.

The developed clustering model was also described in this paper. This model allows bank employees to monitor incoming transactions quickly. However, its classification accuracy is insufficient because the correlation between attributes is not taken into account. The same problem is observed with the Naive Bayesian Classifier, which is why it is less accurate than the model based on Bayesian Networks. The results of the conducted evaluative testing show that Bayesian Networks based on the input representation method and the developed clustering model can be used in a real fraud detection system.

REFERENCES

[1] R. Brause, T. Langsdorf, and M. Hepp, "Neural data mining for credit card fraud detection," in Proc. of the 11th IEEE International Conference on Tools with Artificial Intelligence, Evanston, 1999, pp. 103-106.
[2] J. Abello, P. Pardalos, and M. Resende, Eds., Handbook of massive data sets. Norwell, MA, USA: Kluwer Academic Publishers, 2002.
[3] L. Mukhanov, "Using bayesian belief networks for credit card fraud detection," in Proc. of the IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, Feb. 2008, pp. 221-225.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Additive logistic regression: a statistical view of boosting," The Annals of Statistics, vol. 28, no. 2, pp. 337-407, 2000.
[5] G. John and P. Langley, "Estimating continuous distributions in bayesian classifiers," in Proc. of the Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, 1995, pp. 338-345.
[6] J. Dougherty, R. Kohavi, and M. Sahami, "Supervised and unsupervised discretization of continuous features," in International Conference on Machine Learning, San Francisco, 1995, pp. 194-202.
[7] D. Geiger and D. Heckerman, "Learning gaussian networks," in Proc. 10th Conference on Uncertainty in Artificial Intelligence, San Francisco, 1994, pp. 235-243.
[8] P. K. Chan and S. J. Stolfo, "Toward scalable learning with non-uniform class and cost distribution: A case study in credit card fraud detection," in Proc. of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, 1998, pp. 164-168.
[9] W. Lam and F. Bacchus, "Learning bayesian belief networks: An approach based on the mdl principle," Computational Intelligence, vol. 10, no. 4, pp. 269-293, 1994.
[10] J. Suzuki, "Learning bayesian belief networks based on the mdl principle: An efficient algorithm using the branch and bound technique," in Proc. of the International Conference on Machine Learning, Bari, Italy, 1996, pp. 463-470.
[11] D. Heckerman, D. Geiger, and D. Chickering, "Learning bayesian networks: The combination of knowledge and statistical data," Machine Learning, vol. 20, no. 3, pp. 197-243, 1995.
[12] J. Suzuki, "A construction of bayesian networks from databases on an mdl principle," in Proc. of the Ninth Conference on Uncertainty in Artificial Intelligence, Washington D.C., 1993, pp. 266-273.
[13] R. Bouckaert, "Probabilistic network construction using the minimum description length principle," Lecture Notes in Computer Science, vol. 747, pp. 41-48, 1993.

