Modified Bit-Apriori Algorithm for Frequent Item-Sets in Data Mining

Poster Paper, Proc. of Int. Conf. on Advances in Information Technology and Mobile Communication 2013

J. Karthikeyan¹ and Dr. Udaykumar²

¹ Research Scholar, Hindustan University, Chennai, India. Email: karthikeyan_world@hotmail.com
² ACOE, Hindustan University, Chennai, India. Email: aukumar71@gmail.com

2013 ACEEE DOI: 03.LSCS.2013.2.66

Abstract - Mining frequent item-sets is one of the most important concepts in data mining, and it is a fundamental, initial task of the field. Apriori [3] is the most popular and widely used algorithm for finding frequent item-sets; other algorithms, such as Eclat [4] and FP-growth [5], are also used for this task. To improve the time efficiency of Apriori, Jiemin Zheng et al. introduced the Bit-Apriori [1] algorithm, which makes the following modifications to the Apriori [3] algorithm: 1) support counting is implemented by performing a bitwise AND operation on binary strings; 2) a special equal-support pruning step is applied. In this paper, to further improve the time efficiency of Bit-Apriori [1], a novel algorithm that deletes infrequent items during the construction of trie2 and subsequent tries is proposed and demonstrated with an example.

Index Terms - Data mining; frequent item-sets; Apriori; Bit-Apriori; trie2.

I. INTRODUCTION
In recent years the size of databases has increased rapidly. This has led to a growing interest in the development of tools capable of automatically extracting knowledge from data. The term data mining, or knowledge discovery in databases, has been adopted for the field of research dealing with the automatic discovery of implicit information or knowledge within databases. This implicit information, mainly the interesting association relationships [5] among sets of objects that lead to association rules, may disclose useful patterns for decision support, financial forecasting, marketing policies, even medical diagnosis, and many other applications [7].
The main challenge in mining frequent patterns is the large number of resulting patterns: as the minimum support threshold is lowered, an exponentially large number of item-sets is generated. Effectively pruning [1] unimportant patterns during the mining process has therefore become one of the main topics in frequent pattern mining. The aim is to make the process of finding frequent patterns efficient and scalable, while still detecting the important patterns that can be used in the various ways of extracting knowledge from data. Frequent item-set mining is thus a well-established part of frequent pattern mining because of its broad applications to association rules and other data mining tasks. The present work attempts to prune unimportant patterns in item-set mining.

II. RELATED WORK
A. Apriori algorithm
In computer science and data mining, Apriori is a classic algorithm for learning association rules [8]. Apriori is designed to operate on databases containing transactions and is commonly used in association rule mining [3]. It uses a bottom-up approach, in which frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data [9][10]. The algorithm terminates when no further successful extensions are found.
Apriori [2] uses breadth-first search [3] and a tree structure to count [6][12][13] candidate item-sets efficiently. It generates candidate item-sets of length k from item-sets of length k-1, and then prunes the candidates that have an infrequent sub-pattern [11]. According to the downward closure lemma, the candidate set contains all frequent k-length item-sets. After that, it scans the transaction database to determine the frequent item-sets among the candidates.
Apriori [2], though historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation produces large numbers of subsets (the algorithm attempts to load the candidate set with as many candidates as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all 2^|S| - 1 of its proper subsets. The pseudocode for Apriori is shown in Table I.

B. Bit-Apriori Algorithm
Bit-Apriori uses the data structure and techniques of the Apriori [1] algorithm. The main differences between Apriori and Bit-Apriori lie in candidate item-set generation and in the support-counting approach, the two steps that consume the most time and memory in the Apriori [2] algorithm. Given a set of item-sets, the algorithm attempts to find subsets that are common to at least a minimum number C of the item-sets. In Apriori, the time required for mining [14][15] frequent k-item-sets grows significantly as k increases, whereas Bit-Apriori [1] performs much better because it has no candidate generation and needs to traverse the trie only once. The pseudocode for Bit-Apriori is shown in Table II.
TABLE I. THE PSEUDOCODE FOR FINDING FREQUENT ITEM-SETS USING THE APRIORI ALGORITHM
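As a rough illustration of the level-wise loop summarized in Section II-A (candidate generation, downward-closure pruning, one counting scan per level), the following Python sketch may help. It is a minimal example written for this text, not a transcription of Table I; the function name apriori, the parameter min_count (an absolute support count), and the representation of each transaction as a string of one-letter items are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise Apriori sketch: return {frozenset(item-set): support count}."""
    # Level 1: count every single item in one scan of the database.
    counts = {}
    for t in transactions:
        for item in set(t):
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Candidate generation: join two frequent (k-1)-item-sets into a k-item-set.
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Downward-closure pruning: every (k-1)-subset must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One database scan counts the surviving candidates.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            items = set(t)
            for cand in candidates:
                if cand <= items:
                    counts[cand] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # The ten transactions of Table IV; 40% of 10 transactions gives a count of 4.
    db = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK", "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
    for itemset, sup in sorted(apriori(db, 4).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("".join(sorted(itemset)), sup)
```

Note that every level needs a fresh scan over all transactions, which is the cost Bit-Apriori's bit-string support counting is meant to reduce.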
TABLE II. THE PSEUDOCODE FOR BIT-APRIORI

III. PROBLEM STATEMENT
To find frequent item-sets, both the Apriori [3] and Bit-Apriori [1] algorithms search for elements across the item-sets of every length from 1 to N. When the total support count of an item is zero or less than the minimum support count, the item is not required in subsequent iterations; yet Apriori and Bit-Apriori still consider such elements while forming the tries. Hence, there is scope for improvement by eliminating these items during trie formation. A new algorithm is proposed to improve performance, resource utilization, and time efficiency.

IV. PROPOSED ALGORITHM
A new algorithm has been developed that deletes infrequent items during the construction of trie2 and in subsequent iterations. Removing these infrequent items improves computation time. The Apriori and Bit-Apriori algorithms do not remove infrequent items during trie2 and subsequent iterations. In the trie, the proposed algorithm checks whether a node has a child: if it does, the traversal continues; otherwise, the node is ignored and treated as infrequent. Such nodes are not considered in further iterations of the proposed algorithm. This reduces the time complexity as the occurrence of infrequent items in the given dataset increases. The pseudocode for the proposed algorithm is shown in Table III.

TABLE III. THE PSEUDOCODE FOR THE PROPOSED ALGORITHM

To demonstrate the process of the proposed algorithm, an example is given below. As shown in Table IV, the example database is in the second column; it contains ten transactions.

TABLE IV. THE EXAMPLE DATABASE
TID   Items    Ordered frequent items
1     ABDEFL   AE
2     AGO      GA
3     CEI      CE
4     ACDEG    GCAE
5     ABCEGK   GCAE
6     EH       E
7     ABCEFJ   CAE
8     ACD      CA
9     ACEGM    GCAE
10    ACEGN    GCAE
Suppose the support threshold min_sup is 40%. During the first scan of the database, the support of each item is counted and the infrequent items are deleted. The supports of the items are as follows: A:8, B:3, C:7, D:3, E:8, F:2, G:5, H:1, I:1, J:1, K:1, L:1, M:1, N:1, O:1. Since the minimum support count is 4 (40% of the ten transactions), the frequent items are A, C, E and G. They are sorted into a list in non-decreasing order of support, and items with the same support are sorted lexicographically, which gives the order G, C, A, E used in the third column of Table IV. In Step 2 of Bit-Apriori, all frequent 2-item-sets are found, as shown in Table V. The trie, with the binary string of each leaf, is then established as shown in Fig. 1.
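The first database scan described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name first_scan and the parameter min_sup_pct (min_sup expressed as an integer percentage, so the threshold test uses integer arithmetic) are hypothetical, and items are single letters as in Table IV.

```python
from collections import Counter


def first_scan(transactions, min_sup_pct):
    """First scan sketch: count supports, delete infrequent items, order the rest."""
    n = len(transactions)
    support = Counter(item for t in transactions for item in set(t))
    # An item is frequent if support / n >= min_sup_pct / 100 (kept in integers).
    frequent = [i for i in support if support[i] * 100 >= min_sup_pct * n]
    # Non-decreasing support; items with equal support in lexicographic order.
    order = sorted(frequent, key=lambda i: (support[i], i))
    rank = {item: r for r, item in enumerate(order)}
    # Rewrite each transaction as its ordered frequent items (third column of Table IV).
    ordered = ["".join(sorted((i for i in set(t) if i in rank), key=rank.get))
               for t in transactions]
    return support, order, ordered


if __name__ == "__main__":
    db = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK", "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
    support, order, ordered = first_scan(db, 40)
    print(order)  # ['G', 'C', 'A', 'E']
    print(ordered)
```

Breaking ties by item name keeps the ordering deterministic, so A (support 8) is placed before E (support 8).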
TABLE V. FREQUENT 2-ITEM-SETS
TID            1     2     3     4     5     6     7     8     9     10
Ordered items  AE    GA    CE    GCAE  GCAE  E     CAE   CA    GCAE  GCAE
{G, C}         0     0     0     1     1     0     0     0     1     1
{G, A}         0     1     0     1     1     0     0     0     1     1
{G, E}         0     0     0     1     1     0     0     0     1     1
{C, A}         0     0     0     1     1     0     1     1     1     1
{C, E}         0     0     1     1     1     0     1     0     1     1
{A, E}         1     0     0     1     1     0     1     0     1     1
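The bit strings of Table V and their extension to longer item-sets can be sketched in Python with integers used as bit vectors. This is a sketch in the spirit of Bit-Apriori, not the pseudocode of Tables II and III: the names item_masks, popcount and mine_trie are hypothetical, sibling nodes of the trie are joined and their supports obtained by a bitwise AND, the equal-support pruning of [1] is omitted, and skipping nodes that cannot have a child is an assumed reading of the rule in Section IV.

```python
def item_masks(transactions, items):
    """Vertical bit string per item; TID 1 is the most significant bit, as in Table V."""
    return {i: int("".join("1" if i in t else "0" for t in transactions), 2) for i in items}


def popcount(m):
    """Support of an item-set = number of 1-bits in its bit string."""
    return bin(m).count("1")


def mine_trie(transactions, order, min_count):
    """Level-wise trie extension using bitwise AND for support counting."""
    masks = item_masks(transactions, order)
    result = {}
    # Each entry of `level` is one group of sibling nodes sharing the same parent.
    level = [[((i,), masks[i]) for i in order if popcount(masks[i]) >= min_count]]
    while level:
        next_level = []
        for siblings in level:
            for idx, (itemset, m) in enumerate(siblings):
                result[itemset] = popcount(m)
                right = siblings[idx + 1:]
                if not right:
                    # Assumed reading of Section IV: a node that cannot have a child
                    # (such as one ending in E, the last item) is not traversed again.
                    continue
                children = []
                for other, other_mask in right:
                    child_mask = m & other_mask  # AND of bit strings = joint occurrence
                    if popcount(child_mask) >= min_count:
                        children.append((itemset + (other[-1],), child_mask))
                if children:
                    next_level.append(children)
        level = next_level
    return result


if __name__ == "__main__":
    db = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK", "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
    m = item_masks(db, "GCAE")
    print(format(m["G"] & m["C"], "010b"))  # 0001100011, the {G, C} row of Table V
    for itemset, sup in mine_trie(db, ["G", "C", "A", "E"], 4).items():
        print("".join(itemset), sup)
```

Because the items are already in the order G, C, A, E, a node can only be extended by its right siblings, so support counting never rescans the transactions after the bit strings are built.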
Fig. 1. Trie After Generation (2)

During subsequent iterations, element E can be ignored and treated as a non-frequent item-set: as the last item in the order G, C, A, E, its nodes have no children, so by the rule of Section IV they are not traversed further. The computation time can be reduced considerably when elements like E occur more often among the frequent items. After all iterations are completed, the final trie with its binary strings is shown in Fig. 2.

Fig. 2. Trie After Completion

V. EXPERIMENTAL RESULTS
The proposed algorithm was tested on different data sets, and the experimental results are shown in Fig. 3. The proposed algorithm takes considerably less time than the Bit-Apriori and Apriori algorithms. An interesting finding is that when non-frequent item-sets occur more often, the execution time is reduced drastically. The experimental results show that the proposed algorithm decreases not only the computation time but also the resources used; the execution times are given in Table VI.

Fig. 3. Execution Time of Algorithms

TABLE VI. COMPARISON OF EXECUTION TIME BETWEEN APRIORI, BIT-APRIORI AND MODIFIED BIT-APRIORI
(Execution time in seconds)
Dataset   Apriori   Bit-Apriori   Modified Bit-Apriori
pusmsb    4.5       1.32          0.98

VII. CONCLUSIONS
In this paper, the modified Bit-Apriori technique improves the performance of Bit-Apriori by eliminating the search of infrequent item-sets, which also improves computational efficiency significantly. Experimental results have shown that the modified Bit-Apriori algorithm outperforms the fast Bit-Apriori, especially when non-frequent item-sets occur more often. When the database is large, Bit-Apriori may suffer from memory scarcity due to the large number of bitwise operations. Future work can be directed toward replacing the bitwise operations.

REFERENCES
[1] Zheng J., D. Zhang, S.C.H. Leung, X. Zhou, An efficient algorithm for frequent itemsets in data mining, in Proceedings of the 7th International Conference on Service Systems and Service Management (ICSSSM), 28-30 June 2010.
[2] Agrawal R., R. Srikant, Fast algorithms for mining association rules, in Proceedings of the International Conference on Very Large Data Bases, pp. 487-499, 1994.
[3] Zaki M.J., S. Parthasarathy, M. Ogihara, W. Li, New algorithms for fast discovery of association rules, in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283-296, 1997.
[4] Han J., J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 1-12, 2000.
[5] Park J.S., M.S. Chen, P.S. Yu, An effective hash based algorithm for mining association rules, ACM SIGMOD, pp. 175-186, 1995.
[6] Brin S., R. Motwani, J.D. Ullman, S. Tsur, Dynamic itemset counting and implication rules for market basket data, in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 255-264, 1997.
[7] Brin S., R. Motwani, C. Silverstein, Beyond market baskets: generalizing association rules to correlations, in Proceedings of the ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, pp. 265-276, 1997.
[8] Toivonen H., Sampling large databases for association rules, in Proceedings of the 22nd VLDB Conference, Mumbai, India, pp. 134-145, 1996.
[9] Savasere A., E. Omiecinski, S.B. Navathe, An efficient algorithm for mining association rules in large databases, in Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95), Zurich, pp. 432-444, 1995.
[10] Tsay Y.J., J.Y. Chiang, CBAR: an efficient method for mining association rules, Knowledge-Based Systems, 18 (2-3), pp. 99-105, 2005.
[11] Liu G., H. Lu, W. Lou, Y. Xu, J.X. Yu, Efficient mining of frequent patterns using ascending frequency ordered prefix-tree, Data Mining and Knowledge Discovery, 9 (3), pp. 249-274, 2004.
[12] Grahne G., J. Zhu, Fast algorithms for frequent itemset mining using FP-trees, IEEE Transactions on Knowledge and Data Engineering, 17 (10), pp. 1347-1362, 2005.
[13] Zaki M.J., Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering, 12 (3), pp. 372-390, 2000.
[14] Zaki M.J., K. Gouda, Fast vertical mining using diffsets, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 326-335, 2003.
[15] Dong J., M. Han, BitTableFI: an efficient mining frequent itemsets algorithm, Knowledge-Based Systems, 20 (4), pp. 329-335, 2007.