26 pages, Aug 11, 2011

© Attribution Non-Commercial (BY-NC)

20093173016

Table of Contents

Introduction.

Literature Review.

Problem definition.

MRFI-SW algorithm (window initialization phase, window sliding phase, mining frequent itemsets phase).

Introduction

A data stream is a massive sequence of data elements generated continuously at a rapid rate. Unlike traditional static datasets, data streams are continuous and unbounded, and their data distribution changes over time.

Many applications generate large volumes of streaming data in real time, such as sensor data from sensor networks, online transaction flows in retail chains, and Web logs and clickstreams in Web applications.

Data streams can be classified into offline data streams [1] and online data streams [2].

[1] The target application domains of offline data streams involve bulk additions of new transactions, as in a data warehouse system.

[2] Online data streams are characterized by data that is updated in real time. The data of an online stream arrive one by one over time, such as the continuously generated transactions of a network monitoring system.

Literature Review

Researchers have proposed many algorithms for mining frequent itemsets.

Research on mining frequent itemsets in data streams can be divided according to three models:

the landmark window model.
the time-fading model.
the sliding window model.

Sampling and Lossy Counting: this algorithm can mine frequent items over offline data streams under the landmark window model.

SWFI-stream is an algorithm for mining frequent itemsets in online data streams under the transaction-sensitive sliding window model.

An incremental mining algorithm has also been proposed to mine frequent itemsets in offline data streams with a time-sensitive sliding window.

MRFI-SW mines recent frequent itemsets over online data streams.

Problem definition

Let {i1, i2, ..., im} be a set of literals, called items.

A transaction is T = {id, x1 x2 ... xn}, where id identifies the transaction and x1, ..., xn are its items.

A transaction data stream DS = {T1, T2, ..., TN} is a continuous sequence of transactions.

A data stream can also be denoted as DS = {W1, W2, ..., Wm}, where each basic window Wi is a transaction-sensitive sliding window.

w is the size of the transaction-sensitive sliding window.

s is a user-defined minimum support threshold in the range [0, 1].

The support of an itemset X over the sliding window SW, written sup(X), is the number of transactions in SW that contain X as a subset.

If the support of X is at least s*w, X is called a frequent itemset (FI).
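The support definition above can be illustrated with a minimal sketch. The function names `support` and `is_frequent`, the sample window, and the reading of "frequent" as sup(X) >= s*w are assumptions made for illustration, not part of the original slides.

```python
def support(itemset, window):
    """Count the transactions in the window that contain every item of itemset."""
    target = set(itemset)
    return sum(1 for transaction in window if target <= set(transaction))


def is_frequent(itemset, window, s):
    """Assumed test: sup(itemset) >= s * w, where w is the window size."""
    return support(itemset, window) >= s * len(window)


# Hypothetical sliding window of w = 3 transactions.
window = [["b", "c", "e"], ["a", "b", "c", "e"], ["b", "e"]]

print(support(["b", "e"], window))        # number of transactions containing b and e
print(is_frequent(["a"], window, s=0.6))  # a occurs once, threshold is 0.6 * 3
```

This brute-force count rescans every transaction for each query, which is the cost the bit-order representation is meant to avoid.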

MRFI-SW algorithm

The proposed MRFI-SW algorithm consists of three phases: the window initialization phase, the window sliding phase, and the mining frequent itemsets phase.

The window initialization phase is activated when the first transaction arrives and lasts until the transaction-sensitive sliding window is full. When the sliding window is full, the items of its w transactions are transformed into bit-order representations. Each item X has a sequence R(X) of w entries, each of the form (bit, order). If item X is in the i-th transaction of the current sliding window, R(X)_bit of the i-th entry is set to 1 and R(X)_order gives the position of X within that transaction; otherwise the i-th entry is set to 0 (R(X)_bit = R(X)_order = 0).

The first sliding window SW1 contains the transactions T1, T2, and T3. The bit-order representations of items in SW1 are shown in Table 1.
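A minimal sketch of the transformation into bit-order representations follows. The function name `build_bit_order` and the use of the pair (0, 0) for a plain 0 entry are assumptions. Because Table 1 is not reproduced here, the sample window uses the three transactions that can be read back from the R(X) values given for the second window in the sliding example.

```python
def build_bit_order(window):
    """Map each item to a list of w (bit, order) entries, one per transaction."""
    w = len(window)
    rep = {}
    for i, transaction in enumerate(window):
        for position, item in enumerate(transaction, start=1):
            # (0, 0) stands for the plain 0 entry used in the text.
            entries = rep.setdefault(item, [(0, 0)] * w)
            entries[i] = (1, position)
    return rep


# Window contents reconstructed from the R(X) values listed for SW2.
window = [["b", "c", "e"], ["a", "b", "c", "e"], ["b", "e"]]
rep = build_bit_order(window)

print(rep["c"])  # c is item 2 of the first and item 3 of the second transaction
print(rep["a"])  # a appears only in the second transaction
```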

The window sliding phase is activated when the sliding window becomes full. In this phase, a newly arriving transaction is inserted into the sliding window, and the oldest transaction in the current sliding window is removed. Because the bit-order representation is a sequence, the removal is done with a left-shift operation on each sequence. To reduce memory usage, an entry-pruning operation is executed after the window slides: the entry of an item is pruned when its bit-order sequence is all 0. That is, if item X does not appear in any transaction of the current sliding window (sup(X) over SW is 0), the entry R(X) is pruned.

For instance, in Table 1, when the fourth transaction T4 arrives, the first transaction T1 must be removed from the current SW. The bit-order sequences of the items in SW1 are left-shifted. R(a) is modified from <(1, 1), 0, (1, 1)> to <0, (1, 1), 0>.

Similarly:

R(c) = <(1, 2), (1, 3), 0>
R(d) = <0, 0, 0>
R(b) = <(1, 1), (1, 2), (1, 1)>
R(e) = <(1, 3), (1, 4), (1, 2)>

Since R(d) is now all 0, the entry for d is pruned.

Initialize sliding window and bit-order sequence;
While each new coming transaction Ti in SW do
    If (SW is full)
        Transform all of items in SW to bit-order sequence;
    Else
        Do left_shift operation on bit-order sequence of all items
        For each item X arrives in SW
            Transform X to bit sequence representation
        End for
    End if
For each R(X) in SW
    If SUM(R(X).bit) = 0
        Drop X from SW
    End if
End for
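The sliding and pruning steps can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name `slide`, the (0, 0) encoding of a 0 entry, and the order value chosen for d are assumptions. The entry for a and the new transaction (b, e) follow the worked example.

```python
def slide(rep, new_transaction, w):
    """Left-shift every sequence, record the new transaction, prune empty items."""
    # Drop the oldest entry of each item and open a 0 entry for the new slot.
    for item in list(rep):
        rep[item] = rep[item][1:] + [(0, 0)]
    # Record the items of the arriving transaction in the last slot.
    for position, item in enumerate(new_transaction, start=1):
        rep.setdefault(item, [(0, 0)] * w)[-1] = (1, position)
    # Prune every item whose bits sum to 0.
    for item in [x for x, entries in rep.items() if sum(b for b, _ in entries) == 0]:
        del rep[item]
    return rep


# R(a) is taken from the example; the order of d in T1 is a hypothetical value.
rep = {
    "a": [(1, 1), (0, 0), (1, 1)],
    "d": [(1, 3), (0, 0), (0, 0)],
}
slide(rep, ["b", "e"], w=3)

print(rep["a"])    # shifted left by one position
print("d" in rep)  # d no longer occurs in the window
```

Keeping the pruning as a separate final pass mirrors the pseudocode, where the SUM check runs over all R(X) after the shift.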

The mining frequent itemsets phase is activated when the bit-order sequences have been updated and the frequent itemsets are requested. We propose a method that generates frequent k-itemsets (itemsets with k items) from the known frequent (k-1)-itemsets. The method is based on the Apriori property (if a pattern is frequent, all of its sub-patterns are also frequent). We use the SUM operation on the bit of each entry to compute the support of each item and find the frequent 1-itemsets in the current SW. The algorithm then uses the AND operation on the bits of the entries to find 2-itemsets. The support of each 2-itemset is computed, and the itemsets whose support is below the user-defined threshold are pruned. The process repeats until no new (k+1)-itemsets are generated.

For instance, consider the DS in Table 1. Let the minimum support threshold s be 0.6. Hence, an itemset X is frequent if sup(X) ≥ 0.6*3 = 1.8. We discuss the steps of mining frequent itemsets in SW2. First, the MRFI-SW algorithm finds the frequent 1-itemsets by computing the support of each item:

R(a) = <0, (1, 1), 0>, i.e., sup(a) = 1
R(c) = <(1, 2), (1, 3), 0>, i.e., sup(c) = 2
R(b) = <(1, 1), (1, 2), (1, 1)>, i.e., sup(b) = 3
R(e) = <(1, 3), (1, 4), (1, 2)>, i.e., sup(e) = 3

Hence b, c, and e are frequent 1-itemsets, while a is not.
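The support computation in this example can be checked with a short sketch. The R(X) values are those listed for SW2; the helper name `sup` and the (0, 0) encoding of a 0 entry are assumptions.

```python
# Bit-order sequences of SW2, with (0, 0) standing for a plain 0 entry.
rep = {
    "a": [(0, 0), (1, 1), (0, 0)],
    "c": [(1, 2), (1, 3), (0, 0)],
    "b": [(1, 1), (1, 2), (1, 1)],
    "e": [(1, 3), (1, 4), (1, 2)],
}
s, w = 0.6, 3


def sup(entries):
    """SUM operation: add up the bit of every entry."""
    return sum(bit for bit, _ in entries)


supports = {item: sup(entries) for item, entries in rep.items()}
frequent_1 = sorted(item for item, count in supports.items() if count >= s * w)

print(supports)
print(frequent_1)
```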

Find frequent 1-itemsets FI1
For (k=2; FIk-1 ≠ null; k++)
    Do AND operation on R(FIk-1).bit to find Candidate FIk
For each FI do
    Do bitwise SUM operation on R(Candidate FIk)
    If SUM(R(Candidate FIk).bit) ≥ s*w
        If k=2
            Scan R(Candidate FIk).order
        Output FIk
        End if
    End if
End for
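One way to read the mining pseudocode is the level-wise sketch below. It is an interpretation under stated assumptions, not the authors' code: the function name `mine` is hypothetical, only the bit part of each entry is used, and the step that scans R(...).order for k = 2 is left out because the slides do not say what it computes. The bit vectors are those of SW2 from the example.

```python
from itertools import combinations


def mine(bits, s, w):
    """Level-wise mining: AND bit vectors to form candidates, SUM bits for support."""
    min_count = s * w
    # Frequent 1-itemsets: items whose bits sum to at least s*w.
    level = {(item,): vec for item, vec in bits.items() if sum(vec) >= min_count}
    result = {}
    while level:
        result.update({itemset: sum(vec) for itemset, vec in level.items()})
        candidates = {}
        for (x, vx), (y, vy) in combinations(sorted(level.items()), 2):
            # Join two frequent k-itemsets that share their first k-1 items.
            if x[:-1] == y[:-1]:
                vec = [p & q for p, q in zip(vx, vy)]  # AND operation
                if sum(vec) >= min_count:              # SUM operation
                    candidates[x + (y[-1],)] = vec
        level = candidates
    return result


# Bit parts of the SW2 sequences from the worked example.
bits = {"a": [0, 1, 0], "c": [1, 1, 0], "b": [1, 1, 1], "e": [1, 1, 1]}
result = mine(bits, s=0.6, w=3)

for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

The AND of two bit vectors marks exactly the transactions that contain both itemsets, so the support of a candidate is obtained without rescanning the window.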

Experiment

Our algorithm was written in C and compiled using Microsoft Visual C++ 6.0. We generated online data streams using the IBM synthetic data generator.

Conclusion

Mining online data streams is an interesting and challenging research field. The characteristics of data streams make many traditional mining algorithms inapplicable. This paper proposes an efficient three-phase algorithm for mining recent frequent itemsets over online data streams with a transaction-sensitive sliding window. Experiments show that the proposed algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than the SWFI-stream algorithm for mining recent frequent itemsets over online data streams.

Questions??
