
The 6th International Conference on Computer Science & Education (ICCSE 2011) August 3-5, 2011.

SuperStar Virgo, Singapore

ThC 3.38

Health Care Fraud Detection Using Nonnegative Matrix Factorization


Shunzhi Zhu
Department of Computer Science & Technology
Xiamen University of Technology
Xiamen, China
szzhu@xmut.edu.cn

Yan Wang, Yun Wu
Department of Computer Science & Technology
Xiamen University of Technology
Xiamen, China
ywu@xmut.edu.cn

Abstract: In a practical health care dataset, there are many patients with different prescriptions. The health care management department needs a methodology for automatically identifying and clustering patients with similar symptoms in order to judge whether there is fraud in a large-scale clinic dataset. In this paper, we encode the clinic data with a low-rank nonnegative matrix factorization algorithm that retains the natural nonnegativity of the data, thereby eliminating the subtractive basis vectors and encoding calculations required by other techniques for similar feature abstraction, such as principal component analysis. The proposed method is evaluated on a practical dataset supplied by the Health Insurance Management Center of Xiamen. Our experiments show that this method is useful for health care fraud detection.

Index Terms: nonnegative matrix factorization, fraud detection, patient mining

I. INTRODUCTION

In the past several years, fraud in health care has increased rapidly, not only in developed countries such as the USA but also in developing countries such as China. According to a report from IBM, more than 10 percent of the money in US health insurance corporations is swindled or abused. The situation in China is similar. In December 2010, the Beijing city government designed a punishment measure to alleviate fraud in health insurance. Although the situation is severe, few cities in China use information management systems to detect fraud automatically in massive data sets. This problem arises partly from the complexity of fraud detection, since only medical experts can judge whether a prescription is fraudulent. Moreover, these frauds are deeply hidden in an ocean of clinic data, and it is not economical to hire experts to verify each prescription manually. Computers are therefore introduced into health care fraud detection, and the task becomes semi-automatic: the medical experts are only required to check the prescriptions that automatic algorithms flag as fraud-suspicious.

Here we introduce a data mining algorithm that clusters medical treatment items, such as medicines or medical measurements, into several groups according to their usage by different patients. Each group is then considered a kind of medical treatment item for curing similar symptoms. If a medical treatment item shifts from one cluster in one month to another cluster in the next month, the algorithm classifies the patient using this medical treatment item as a fraud-suspicious patient. In the end, all these fraud-suspicious patients are submitted to medical experts for careful, detailed inspection.

To cluster medical treatment items, a patient is first represented as an M-dimensional vector, where M is the number of medical treatment items listed in all prescriptions in one month, as described in Lemma 1.

Lemma 1: $p = (d_1, d_2, \ldots, d_M)$

In Lemma 1, p denotes a patient and $d_i$ denotes the money paid for the i-th medical treatment item listed in all prescriptions of all patients. The minimum value of $d_i$ is 0. Here only the prescriptions in May 2010 are considered.

Secondly, all patients who received medical therapies in one month are collected and all the values are structured into one matrix, called Mat. In Mat, each column represents one patient and each row represents one medical treatment item. Mat is defined as follows:

Lemma 2: $Mat = (p_1', p_2', p_3', \ldots, p_N')$

In Lemma 2, Mat is a matrix of N*M dimensions, where N is the number of patients who received medical therapies in one month, and M has the same meaning as in Lemma 1. A traditional clustering algorithm can then be applied to Mat to cluster medical treatment items into several groups. In the end, the shift track of each patient can be detected from the clustering results in different months. A patient who frequently causes medical treatment items to shift from one cluster to another can be considered a fraud-suspicious patient.

In this paper, we propose a Non-negative Matrix Factorization (NMF) method for fraud detection, which introduces a technique for clustering semantic features in a prescription collection and groups medical treatment items into clusters on the basis of shared semantic features. The factorization can be used to compute a low-rank approximation of a large sparse matrix while preserving the natural non-negativity of the data. As we have seen, this matrix, called Mat, is structured in the traditional vector space model. A collection of patients can thus be represented as a "medical treatment items"-by-patient

* This work is partially supported by National Natural Science Foundation under Grant No. 61070151.

978-1-4244-9718-8/11/$26.00 2011 IEEE


matrix. Since each vector component is given a positive value (or weight) if the corresponding medical treatment item is used by the patient and a zero value otherwise, the resulting matrix is always non-negative. This inherent data non-negativity is preserved by the NMF method as a result of constraints (placed on the factorization) that produce non-negative lower-rank factors, which can be interpreted as semantic features or patterns in the patient collection. The vectors or patients in the original matrix can be reconstructed by combining these semantic features, and medical treatment items that have common features can be viewed as a cluster. As shown in [1], NMF outperforms traditional vector space approaches, such as latent semantic indexing, for document clustering on a few topic detection benchmark collections.

In summary, this paper makes the following contributions. To the best of our knowledge, it is the first attempt to use the NMF method on health care data sets. A bigger contribution, in our view, is that a set of systematic experiments on real-world data sets is performed to demonstrate the usefulness of NMF. In our experiments, we find that the clustering results can help experts detect fraud efficiently and correctly in health care data sets.
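The patient vectors of Lemma 1 and the matrix Mat of Lemma 2 can be illustrated with a minimal sketch. The record layout `(patient_id, item_id, amount_paid)` and the function name `build_patient_matrix` are assumptions made for illustration and are not taken from the data set used in this paper. The sketch stores one row per patient, which matches the N*M shape stated in Lemma 2; transpose the result if items are wanted as rows.

```python
import numpy as np


def build_patient_matrix(records):
    """Build a non-negative patient-by-item cost matrix.

    records: list of (patient_id, item_id, amount_paid) tuples for one month
    (a hypothetical layout, assumed for this sketch).
    Returns (mat, patients, items), where mat[n, m] is the total money paid
    by patient n for medical treatment item m, and 0 where the item is unused.
    """
    patients = sorted({p for p, _, _ in records})
    items = sorted({i for _, i, _ in records})
    p_idx = {p: n for n, p in enumerate(patients)}
    i_idx = {i: m for m, i in enumerate(items)}

    mat = np.zeros((len(patients), len(items)))
    for p, i, amount in records:
        # Several prescriptions of the same item in one month are summed.
        mat[p_idx[p], i_idx[i]] += amount
    return mat, patients, items
```

Because every entry is a sum of payments, the matrix is non-negative by construction, which is the property NMF relies on.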

The rest of this paper is organized as follows. Section 2 lays out the NMF algorithm in detail. Section 3 describes our practical data set. We report the experimental evaluation in Section 4 and discuss related work in Section 5. Finally, Section 6 concludes this study with a brief discussion of future work.

II. NMF ALGORITHM

Non-negative matrix factorization (NMF) is a commonly used approach to understanding the latent structure of an observed matrix in various applications, such as text mining, digital watermarking, face detection, and image restoration. There are many forms of matrix factorization, including Singular Value Decomposition (SVD) and NMF. In this paper, we choose NMF for clustering medical treatment items because it respects the non-negativity that is inherent in all health care data sets. Previous work has shown that, by respecting non-negativity, the factorization results are easier to interpret while being comparable to, or better than, other techniques such as PCA in effectiveness.

In essence, most data analysis methods, such as principal component analysis, independent component analysis, vector quantization, and non-negative matrix factorization, can be seen as different kinds of matrix factorization methods. They differ in the choice of objective function and/or constraints. Assuming that a data set consists of T measurements of N non-negative scalar variables, and denoting the N-dimensional measurement vectors as $v_t$ (t = 1, ..., T), a linear approximation of the data is given by

$$v_t \approx \sum_{i=1}^{M} w_i h_{it} = W h_t \qquad (1)$$

Here, W is an $N \times M$ matrix which contains the basis vectors $w_i$ as its columns. Note that each measurement vector is written in terms of the same basis vectors. The basis vectors $w_i$ can be thought of as the 'building blocks' of the data, and the M-dimensional coefficient vector $h_t$ describes the correlations between the building blocks and the measurement vectors. Arranging the measurement vectors $v_t$ into the columns of an $N \times T$ matrix V, the relationship between V, W and H can be written as

$$V \approx WH \qquad (2)$$

Here each column of H contains the coefficient vector $h_t$, which corresponds to the measurement vector $v_t$. It is apparent that a linear data representation is simply a factorization of a matrix. In contrast with PCA and ICA, which do not restrict the signs of the entries of W and H, NMF requires all entries of both matrices to be non-negative. This means that the data can only be described using additive components. This constraint is motivated by two reasons. First, in many applications the quantities involved cannot be negative, and it is difficult to interpret the results of PCA and ICA, which introduce negative values into the result matrices. Second, non-negativity is based on the intuition that parts are generally combined additively (and not subtracted) to form a whole object; hence, these constraints might be useful for detecting parts-based representations.

Given a data matrix V, the optimal choice of the matrices W and H is defined to be those non-negative matrices that minimize the reconstruction error between V and WH. Various error functions have been proposed; perhaps the most widely used is the Euclidean distance function

$$E(W, H) = \|V - WH\|^2 = \sum_{i,j} \left( V_{ij} - (WH)_{ij} \right)^2 \qquad (3)$$

Although the minimization problem in NMF is complicated, much of the appeal of NMF comes from its empirical success in learning meaningful features from a diverse collection of real-life data sets. In [2], an experiment on a data set consisting of a collection of face images showed that the basis vectors of the representation correspond to intuitive features of face images, such as the mouth, nose, and eyes. In the same way, meaningful topics can be learned through text mining, where documents are represented as sets of terms. Subsequently, NMF has been successfully applied to a variety of data sets. However, there also exist data sets that NMF cannot decompose into parts corresponding to the 'building blocks' of the data. These data sets share one feature: the sparseness of the matrix to be factorized is too high. Explicitly controlling the sparseness of the representation therefore leads to a factorization of the original matrix into parts that match the intuitive features of the data.

Our method, then, is to use NMF for "medical treatment item" clustering. With the standard vector space model, a set of patients P can be expressed as an $N \times M$ matrix called Mat, where N is the number of patients in P and M is the number of


medical treatment items in the health care dictionary. Each row $Mat_j$ of Mat is an encoding of a patient in P, and each entry $mat_{ij}$ of the vector $Mat_j$ is the significance of medical treatment item i in $Mat_j$, where i ranges across the medical treatment items in the dictionary. Following the traditional NMF algorithm, our problem is to find a low-rank approximation of Mat, in terms of some metric, by factoring Mat into the product WH of two reduced-dimensional matrices W and H. Each column of W is a basis vector that contains an encoding of a type of medical treatment item from Mat, and each column of H contains an encoding of the linear combination of the basis vectors that approximates the corresponding column of Mat. The dimensions of W and H are $m \times k$ and $k \times n$ respectively, where k is the reduced rank, or selected number of types. Usually k is chosen to be much smaller than n, but more accurately, $k \ll \min(m, n)$. Finding the appropriate value of k depends on the application and is also influenced by the nature of the collection itself. From another point of view [3], NMF is equivalent to relaxed K-means clustering, so k is the number of final clusters specified by users in advance.

Original NMF approaches obtain an approximation of V by computing a (W, H) pair that minimizes the Frobenius norm of the difference Mat-WH. Our solution uses the same method. The minimization problem can then be stated as

$$\min_{W,H} \|Mat - WH\|_F^2, \quad \text{with } W_{ij} \ge 0 \text{ and } H_{ij} \ge 0 \text{ for each } i \text{ and } j. \qquad (4)$$

The matrices W and H are not unique. Usually H is initialized to zero and W to a randomly generated matrix where each $W_{ij} > 0$, and these initial estimates are improved or updated with iterations of the NMF algorithm.
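The minimization of $\|Mat - WH\|_F^2$ under non-negativity constraints can be sketched with the multiplicative update rules of [5]. This is only an illustrative sketch, not the MATLAB package used in our experiments. The function name `nmf`, the parameter names, and the small constant `eps` are assumptions. Both factors are initialized with random positive values here, because a multiplicative update leaves an entry that is exactly zero unchanged.

```python
import numpy as np


def nmf(mat, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix mat (m x n) as w @ h.

    w has shape (m, k), h has shape (k, n), and both stay non-negative.
    Returns (w, h, history), where history[t] is the squared Frobenius
    error after iteration t + 1.
    """
    rng = np.random.default_rng(seed)
    m, n = mat.shape
    # Random positive starting point (an assumption of this sketch).
    w = rng.random((m, k)) + eps
    h = rng.random((k, n)) + eps

    history = []
    for _ in range(n_iter):
        # Multiplicative updates: every factor is a ratio of non-negative
        # terms, so w and h can never become negative.
        h *= (w.T @ mat) / (w.T @ w @ h + eps)
        w *= (mat @ h.T) / (w @ h @ h.T + eps)
        history.append(np.linalg.norm(mat - w @ h, "fro") ** 2)
    return w, h, history
```

The returned `history` makes it possible to plot the objective value against the iteration count and to choose a stopping point empirically.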
III. EXPERIMENT SETUP


Our experiments are implemented with the MATLAB code package used in [4]. The whole data set supplied by the Health Insurance Management Center of Xiamen is too large to process directly: it corresponds to a sparse matrix with millions of rows and millions of columns. To improve the accuracy and efficiency of the standard NMF algorithm, only the most characteristic rows and columns are chosen. In the following experiments, only the 256 patients who spend the most money in the health care data set and the 256 medicines that cost the most money are chosen, forming a 256*256 matrix. In conventional methods, all the values of the original matrix Mat are rescaled to lie between 0 and 1. However, in our experiments this conversion did not influence the accuracy of the final results, so it is omitted.
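The selection of the most characteristic rows and columns can be sketched as follows. The function name `top_submatrix` is hypothetical, and the sketch assumes a dense NumPy array with totals computed over the full matrix; the source does not state how ties or the sparse full-scale matrix were handled.

```python
import numpy as np


def top_submatrix(mat, n_rows=256, n_cols=256):
    """Keep the rows and columns with the largest total cost.

    mat: dense non-negative array (patients x items in this sketch).
    Returns (sub, row_order, col_order), where row_order and col_order are
    the kept indices of the original matrix, sorted by decreasing total.
    """
    row_order = np.argsort(mat.sum(axis=1))[::-1][:n_rows]
    col_order = np.argsort(mat.sum(axis=0))[::-1][:n_cols]
    # np.ix_ builds the index grid that selects the chosen rows and columns.
    return mat[np.ix_(row_order, col_order)], row_order, col_order
```

Returning the index arrays keeps the mapping back to the original patient and medicine identifiers, which is needed when suspicious clusters are later reported to experts.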

Figure 1. Features learned from the health care data set using NMF.

In Figure 1, each picture represents the features learned from the given health care data set and is composed of cells of 16*16 points. These results are obtained with 1000 iterations of the NMF algorithm. Because the extracted data set is very small, 1000 iterations are enough for the objective function to converge, as can be observed from Figure 2. From Figure 1, we find that the features learned with k=30 are more distinct than those learned with other values of k, i.e., each cell has one point with the deepest color. Because each cell corresponds to a cluster and each point corresponds to a medical treatment item, the item at the point with the deepest color can be considered the representative medical treatment item of that cluster. Another metric, the ratio of the average distance within a cluster to the average distance between clusters, is also used to compare the effect of different K. With this metric we also find that K=30 gives the best clustering effect.
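The ratio metric is not defined in detail above, so the following sketch fixes one plausible reading and should be taken as an assumption: each item is assigned to the cluster with the largest coefficient in its column of H, the within-cluster distance is the mean Euclidean distance over item pairs sharing a label, and the between-cluster distance is the mean over pairs with different labels. The function names are hypothetical. A smaller ratio indicates tighter, better separated clusters.

```python
import numpy as np


def assign_clusters(h):
    """Assign each column of h (k x n_items) to its dominant cluster."""
    return np.argmax(h, axis=0)


def intra_inter_ratio(points, labels):
    """Mean within-cluster distance divided by mean between-cluster distance.

    points: array (n_items, n_features), one row per medical treatment item.
    labels: array (n_items,) of cluster indices.
    Returns NaN when either group of pairs is empty.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    # Use only the strict upper triangle so each unordered pair counts once.
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)
    intra = dist[same & upper]
    inter = dist[~same & upper]
    if intra.size == 0 or inter.size == 0:
        return float("nan")
    return intra.mean() / inter.mean()
```

Under these assumptions, the metric can be evaluated for each candidate K and the K with the smallest ratio kept.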

Figure 2. Convergence of the objective function over iterations when K=30. The X axis shows the number of iterations and the Y axis shows the value of the objective function, i.e. $\|Mat - WH\|_F^2$.

Figure 2 shows the convergence trend over iterations when K=30. When the number of iterations exceeds 200, the value of the objective function changes little. We also ran convergence experiments for different K, and the trend lines are similar. This experiment shows that the value of the objective function declines sharply during the first iterations; that is, convergence is fast. To save time, 200 iterations are therefore enough for future experiments.

Once K is determined, all clusters are fixed. According to the knowledge of medical experts, a patient will not use two medicines with the same effect in one month, so the medicines appearing in one cluster should be complementary in medical effectiveness. We check all the medicines in each cluster and find that two medicines with similar medical effectiveness, "Yunnan baiyao powder" and "Xinhuang pian", appear in one cluster. To avoid error, we vary K and repeat the experiment many times to check this phenomenon. Unfortunately, these two medicines always appear in one cluster when K=9, 12, 16, 20, 30, 40. So we consider the patients who buy these two medicines in one month to be fraud-suspicious. The result is shown in Figure 3. From this figure, we observe that only 21 percent of patients buy both medicines in one month, 34 percent buy only one of them, and 45 percent buy neither. This observation supports our judgement, so we report it to the medical experts for detailed inspection. According to the feedback of the medical experts, there indeed exist some fraudulent patients among these fraud-suspicious patients.

Figure 3. Usage of "Yunnan baiyao powder" and "Xinhuang pian" by different patients; these two medicines always appear in one cluster.

V. RELATED WORK

A. Development of NMF algorithm

By using constraints that produce only non-negative basis vectors, which make possible the concept of a parts-based representation [5], NMF differs from other rank reduction methods for vector space models, such as principal component analysis (PCA) or vector quantization (VQ). In [2], the notion of parts-based representations was first introduced for problems in image analysis that occupy non-negative subspaces in a vector space model. PCA and VQ also generate basis vectors, but they use various additive and subtractive combinations to reconstruct the original space. The basis vectors for PCA and VQ contain negative entries, which cannot be directly related to the original vector space to derive meaningful interpretations. With NMF, the entries in the basis vectors are non-negative, which means that only additive combinations of the vectors are used to reproduce the original matrix. An image, a document, or a set of patients can therefore be perceived as a whole that combines the parts represented by these basis vectors. In the field of health care fraud detection, the vectors represent medical features, i.e., a set of medical treatment items denoting a particular illness. Thus, NMF can be used to organize patient collections into several structures or clusters derived from the non-negative factors.

In [1], it has been verified that NMF is better than traditional methods, such as singular value decomposition, that are widely used in clustering text documents. Two data sets are used for the experiments: the Reuters corpus and the TDT2 corpus, both of which are considered benchmark collections for topic detection. In actual applications, there are many kinds of data distributions, which requires adapting NMF to these situations. The drawbacks of standard NMF have been noted, and some researchers have proposed extensions and modifications of the original model.

In [6], it is noted that only global features are found with NMF, so an extension named Local Nonnegative Matrix Factorization (LNMF) is proposed to capture local features. However, it does not solve the problem of which kind of local features should be filtered. Moreover, how to explicitly control the sparseness of the representation is still a problem. In [7], an adjustable sparseness parameter is included in the NMF algorithm. In [4], this idea is extended to allow explicit control of the statistical properties of the representation, i.e., trial and error is not needed to find the parameter setting that produces the desired level of sparseness. In [8], an extension named Sparse Non-negative Matrix Factorization (SNMF) is also suggested to deal with sparseness of representations,


which controlled sparseness implicitly. The drawback of SNMF is that it does not yield oriented features from natural image data.

Like similar matrix factorization methods, the NMF algorithm is not efficient. To apply the NMF algorithm to large-scale data sets, its efficiency must be improved. Distributed NMF is by no means the only data analysis algorithm for huge amounts of data in the Web era. In [9], an attempt is presented to implement NMF on the MapReduce framework. The generality of the MapReduce computation model lends itself to easy implementation and effective production of results. Many researchers also work on scaling up factorization from the algorithmic side (e.g., [10, 11]). These algorithms can effectively factorize tens of thousands by tens of matrices with millions of nonzero values. While these algorithms are not comparable to NMF on the MapReduce framework in terms of data scale, their algorithmic designs could be exploited to further boost scalability on distributed clusters.

B. Research on Health Care Fraud Detection

The NMF algorithm is widely used in fields such as document clustering, image restoration, and face detection. However, to the best of our knowledge, there has been no attempt to apply NMF to health care fraud detection. In past years, systems for processing electronic claims have been increasingly implemented to automatically perform audits and reviews of claims data. The major advantages of these systems include adaptive learning of fraud patterns from data, specification of a "fraud likelihood" for each case, and identification of unknown types of fraud. Many statistical methods have been applied to health care fraud detection, including neural networks, decision trees, association rules, and Bayesian networks. Unsupervised methods have also been presented for health care fraud detection.
In [12], an expert system is designed for detecting service providers' fraud. It computes the information gain for a provider from the probability density functions of the features for the provider and for the aggregate peer group. Using the computed information gain, the system can measure how different the distribution of the provider is from that of all its peers. In [13], an unsupervised method called SmartSifter is proposed, which uses a probabilistic model to represent the underlying data-generating mechanism. In the probabilistic model, a histogram is used to represent the probability distribution of the categorical variables, and the probability distributions of the categorical and continuous variables are learned separately.

While the existing research has provided various effective tools, many research challenges remain worthy of further study in developing scalable, accurate, and fast fraud detection methods. Our attempt is just one of them.

VI. CONCLUSION AND FUTURE WORK

In this paper, firstly, we propose a method to represent a patient as a set of medical treatment items, so that all the patients are represented as a matrix. Secondly, the NMF algorithm is used to cluster medical treatment items according to patients' monthly cost on them. Thirdly, an experiment is conducted to show that the NMF algorithm is useful for mining fraud-suspicious patients.

Because detailed experiments are lacking, our findings are still superficial, and much work remains to be done. Firstly, a comparative experiment is needed to show that NMF is more effective than other traditional clustering methods, such as SVD. Then, to use NMF on the whole data set, a matrix with millions of rows and columns, a distributed NMF method is needed. Finally, we also need to design an evolutionary NMF algorithm that better detects a patient's fraud according to his or her historical medication information as it evolves.

ACKNOWLEDGMENT

We are grateful to the Health Insurance Management Center of Xiamen for supplying the historic medication information.

REFERENCES

[1] W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In Proc. of SIGIR'03, Toronto, Canada, July 28-August 1, 2003, pp. 267-273.
[2] D. Lee and H. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, vol. 401, pp. 788-791, October 1999.
[3] T. Li and C. Ding. The relationships among various nonnegative matrix factorization methods for clustering. In Proc. of ICDM'06, Leipzig, Germany, 2006, pp. 362-371.
[4] P. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, vol. 5, pp. 1457-1469, 2004.
[5] D. Lee and H. Seung. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, vol. 13, pp. 556-562, 2001.
[6] S. Li, X. Hou, H. Zhang, and Q. Cheng. Learning spatially localized parts-based representations. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), vol. I, Hawaii, USA, 2001, pp. 207-212.
[7] P. Hoyer. Non-negative sparse coding. In Neural Networks for Signal Processing XII (Proc. IEEE Workshop on Neural Networks for Signal Processing), Martigny, Switzerland, 2002, pp. 557-565.
[8] W. Liu, N. Zheng, and X. Lu. Non-negative matrix factorization for visual coding. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP'2003), 2003, pp. 293-296.
[9] C. Liu, H. Yang, J. Fan, L. He, and Y. Wang. Distributed nonnegative matrix factorization for web-scale dyadic data analysis on MapReduce. In Proc. of WWW 2010, Raleigh, North Carolina, USA, 2010, pp. 681-690.
[10] D. Kim, S. Sra, and I. S. Dhillon. Fast Newton-type methods for the least squares nonnegative matrix approximation problem. In SDM, 2007, pp. 343-354.
[11] K. Yu, S. Zhu, J. Lafferty, and Y. Gong. Fast nonparametric matrix factorization for large-scale collaborative filtering. In SIGIR'09, 2009, pp. 211-218.
[12] J. Major and D. Riedinger. EFD: A hybrid knowledge/statistical-based system for the detection of fraud. The Journal of Risk and Insurance, vol. 69, no. 3, pp. 309-324, 2002.
[13] K. Yamanishi, J. Takeuchi, G. Williams, and P. Milne. On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. Data Mining and Knowledge Discovery, vol. 8, pp. 275-300, 2004.
[14] J. Li, K. Huang, J. Jin, and J. Shi. A survey on statistical methods for health care fraud detection. Health Care Manag Sci., vol. 11, no. 3, pp. 275-287, Sep 2008.
