
Mobile Sequential Pattern Mining in Location-Based Service Environment

Author 1: Ms. V. C. Belokar
Lecturer, Information Technology Dept., K. K. Wagh Polytechnic, Nashik, India
varsha30belokar@gmail.com

Author 2: Mr. P. S. Kulkarni
Lecturer, Information Technology Dept., K. K. Wagh Polytechnic, Nashik, India
pankaj_kulkarni1985@rediffmail.com

INTRODUCTION
The advancement of wireless communication techniques and the popularity of mobile devices such as mobile phones, PDAs, and GPS-enabled cellular phones have contributed to a new business model. This chapter briefly explains the basic terms involved in mining mobile transactions.
Keywords: Data mining, transportation, mining methods and algorithms, mobile environments.

1.1 Mobile Communication:
Mobile users can request services through their mobile devices via an Information Service and Application Provider (ISAP) from anywhere at any time. This business model is known as Mobile Commerce (MC), which provides Location-Based Services (LBS) through mobile phones. The coverage area of each base station in the mobile network is called a cell and serves as a location area. The average distance between two base stations is hundreds of meters, and a city usually has more than 10,000 base stations. When users move within the mobile network, their locations and service requests are stored in a centralized mobile transaction database.


Fig. 1 shows an MC scenario, in which a user moves in the mobile network and requests services in the corresponding cells through a mobile device.
Fig. 1. An example of a mobile transaction sequence.
(a) Moving sequences.
(b) Service sequences.
Fig. 1a shows the moving sequence of a user, where cells are underlined if services are requested there. Fig. 1b shows the record of service transactions; for example, service S1 was requested when this user moved to location A at time 5. These data contain insightful information, such as the movement and transaction behaviors of mobile users. Mining mobile transaction data can therefore provide insights for applications such as data prefetching and service recommendation.

1.2 Location Based Services:
A Location-Based Service (LBS) is an information or entertainment service that is accessible with mobile devices through the mobile network and makes use of the geographical position of the mobile device. LBS include services that identify the location of a person or object, such as finding the nearest cash machine or the whereabouts of a friend or employee. LBS also include parcel tracking and vehicle tracking services, and they can include mobile commerce in the form of coupons or advertising directed at customers based on their current location.
Some examples of location-based services are:
- Requesting the nearest business or service, such as an ATM or a restaurant
- Turn-by-turn navigation to any address
- Locating people on a map displayed on the mobile phone
- Receiving alerts, such as notification of a sale on gas or a warning of a traffic jam
- Location-based mobile advertising
- Real-time Q&A about restaurants, services, and other venues.
1.3 Clustering Mobile Transactions:
Clustering mobile transaction data helps discover social groups, which are used in applications such as targeted advertising, shared data allocation, and personalization of content services. In previous studies, users were typically clustered according to their personal profiles (e.g., age, sex, and occupation). In real mobile environments, however, users' profiles are often difficult to obtain, and only their mobile transaction data may be available. To cluster users without profiles, the similarities of mobile transaction sequences (MTSs) must be evaluated. Although many clustering algorithms have been studied in the literature, they are not applicable to the LBS scenario for the following reasons: 1) most clustering methods can only process data with spatial similarity measures, whereas LBS environments require clustering methods with non-spatial similarity measures; 2) most clustering methods require users to set some parameters, but in real applications it is difficult to determine the right parameters manually, so an automated clustering method is required. Moreover, although many non-spatial similarity measures exist, most of them measure string similarity, whereas the mobile transaction sequences discussed in this report contain multiple kinds of heterogeneous information, such as time, location, and services.
1.4 Time Segmentation:
The time interval segmentation method helps find different user behaviors in different time intervals. For example, users may request different services at different times (e.g., day or night), even in the same location. If the time interval factor is not taken into account, behaviors specific to certain time intervals may be missed. To find complete mobile behavior patterns, a time interval table is therefore required. Although some studies used a predefined time interval table to mine mobile patterns, data characteristics and distributions vary across real mobile applications, so it is difficult for users to predefine a suitable interval table. Automatic time segmentation methods are thus required to segment the time dimension of a mobile transaction database. A Genetic Algorithm (GA) is used here as an automatic time segmentation method; it produces a more suitable time interval table.
1.5 Mining Mobile Transactions:
A mobile transaction database is complex, since a huge number of mobile transaction logs are produced from users' mobile behaviors. Data mining is a widely used technique for discovering valuable information in complex data sets, and a number of studies have addressed mobile behavior mining. However, mobile behaviors vary among user clusters and across time intervals, so predictions of mobile behavior become more precise if the corresponding mobile patterns are found for each user cluster and time interval. Effective mobile behavior mining systems are therefore urgently needed to provide precise location-based services to users.

1.5.1 CTMSP-Mine:
A novel data mining algorithm named Cluster-based Temporal Mobile Sequential Pattern Mine (CTMSP-Mine) is proposed to efficiently mine users' Cluster-based Temporal Mobile Sequential Patterns (CTMSPs). To mine CTMSPs, a transaction clustering algorithm named Cluster-Object-based Smart Cluster Affinity Search Technique (CO-Smart-CAST) is proposed, which builds a cluster model for mobile transactions based on the proposed Location-Based Service Alignment (LBS-Alignment) similarity measure. Mining and predicting mobile sequential patterns while considering user clusters and temporal relations in LBS environments simultaneously gives more precise predictions. The main contributions of this work are a novel algorithm for mining CTMSPs and two nonparametric techniques for increasing the precision of predicting mobile users' behaviors. In addition, the proposed CTMSPs provide information on both user clusters and temporal relations. Finally, neither the proposed clustering method nor the proposed time segmentation method requires user profiles such as personal information.
LITERATURE SURVEY
This chapter briefly describes papers on clustering mobile transactions, temporal pattern mining, mobile pattern mining, and mobile behavior prediction.
In recent years, a number of studies have used data mining techniques to discover useful rules/patterns from:
- WWW
- Transaction databases
- Mobility data.
Sequential pattern mining was first introduced to search for time-ordered patterns, known as sequential patterns, within transaction databases. SMAP-Mine was proposed by Tseng and Lin for efficiently mining users' sequential mobile access patterns; it is based on the FP-Tree and discovers both user movements and service requests. Lee et al. proposed T-MAP, based on SMAP, to efficiently find mobile users' access patterns in distinct time intervals that are predefined by users. Yun and Chen proposed the Mobile Sequential Pattern (MSP), which takes moving paths into consideration by including the moving path between the left-hand and right-hand sides of a rule. However, no existing work considers user clusters and temporal relations simultaneously in mobile pattern mining.
Clustering analysis can be roughly divided into two categories:
1. Work on similarity measures, which may directly affect the final clustering results.
2. Work on clustering methods.
For density-based clustering methods, Ben-Dor and Yakhini proposed the Cluster Affinity Search Technique (CAST), which requires an affinity threshold t, where 0 < t < 1. The algorithm guarantees that the average similarity in each generated cluster is higher than the threshold t. Tseng and Kao proposed the Smart Cluster Affinity Search Technique (Smart-CAST), whose main ideas are as follows. First, the method uses CAST as the basic clustering method. Second, it uses a quality validation method, Hubert's Γ (gamma) statistic, to find the best clustering result.
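The CAST procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Ben-Dor and Yakhini's implementation: the similarity matrix is assumed symmetric with values in [0, 1], each new cluster is seeded with the lowest-index unassigned object, an object's affinity is taken as its average similarity to the current cluster members, and the hypothetical `max_steps` parameter caps the add/remove loop, which can otherwise oscillate.

```python
from typing import List, Sequence, Set


def avg_similarity(S: Sequence[Sequence[float]], x: int, members: Set[int]) -> float:
    """Average similarity between object x and a non-empty set of objects."""
    return sum(S[x][y] for y in members) / len(members)


def cast(S: Sequence[Sequence[float]], t: float, max_steps: int = 1000) -> List[List[int]]:
    """Group objects 0..n-1 so that members keep an average affinity of at least t."""
    unassigned = set(range(len(S)))
    clusters: List[List[int]] = []
    while unassigned:
        seed = min(unassigned)  # assumption: deterministic seed choice
        cluster = {seed}
        unassigned.discard(seed)
        for _ in range(max_steps):  # guard against add/remove oscillation
            changed = False
            # ADD: the unassigned object with the highest affinity, if it reaches t
            if unassigned:
                u = max(unassigned, key=lambda x: avg_similarity(S, x, cluster))
                if avg_similarity(S, u, cluster) >= t:
                    cluster.add(u)
                    unassigned.discard(u)
                    changed = True
            # REMOVE: the member with the lowest affinity to the others, if below t
            if not changed and len(cluster) > 1:
                v = min(cluster, key=lambda x: avg_similarity(S, x, cluster - {x}))
                if avg_similarity(S, v, cluster - {v}) < t:
                    cluster.discard(v)
                    unassigned.add(v)
                    changed = True
            if not changed:
                break
        clusters.append(sorted(cluster))
    return clusters


# Two obvious groups: {0, 1, 2} and {3, 4}
S = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.85, 0.1, 0.1],
    [0.8, 0.85, 1.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.9, 1.0],
]
print(cast(S, t=0.5))  # [[0, 1, 2], [3, 4]]
```

Raising t tends to produce more and smaller clusters (at t = 0.95 every object above ends up alone), which is why Smart-CAST searches for the threshold automatically instead of asking the user for it.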
The genetic algorithm was proposed by Holland. It requires a fitness function that evaluates the quality of a chromosome, and it starts from a randomly generated population. Through the evolution processes of 1) selection, 2) crossover, and 3) mutation, the population repeatedly creates new generations, and the weakest chromosomes become obsolete.
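The GA loop described above can be sketched for the time-segmentation task of Section 3.2 as follows. This is a hypothetical, minimal sketch: a chromosome is a sorted list of k distinct segmenting points in 1..T−1 (assuming 1 ≤ k < T), `counts[t]` maps each (cell, service) pair to its request count at time point t, and `keys` lists every (cell, service) pair, so `len(keys)` equals N_c × N_s. The fitness sums, over the k + 1 intervals, the mean squared deviation of the per-pair request counts from the interval average; higher fitness is treated as better, since the weakest chromosomes are discarded. Truncation selection, one-point crossover, and single-point mutation are illustrative choices, not necessarily those of the original method.

```python
import random
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (cell, service)


def fitness(points: List[int], counts: List[Dict[Key, int]], keys: List[Key]) -> float:
    """Sum over intervals of the mean squared deviation of T_i[c, s] from its interval mean."""
    bounds = [0] + points + [len(counts)]  # interval i spans bounds[i] .. bounds[i+1]-1
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        req = [sum(counts[t].get(k, 0) for t in range(lo, hi)) for k in keys]  # T_i[c, s]
        mean = sum(req) / len(req)
        total += sum((r - mean) ** 2 for r in req) / len(req)
    return total


def repair(genes: List[int], T: int, k: int, rng: random.Random) -> List[int]:
    """Turn a child into k distinct, sorted segmenting points in 1..T-1."""
    pts = set(genes)
    while len(pts) < k:
        pts.add(rng.randint(1, T - 1))
    return sorted(pts)


def evolve(counts: List[Dict[Key, int]], keys: List[Key], k: int,
           pop_size: int = 20, generations: int = 50,
           p_mut: float = 0.2, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    T = len(counts)
    pop = [sorted(rng.sample(range(1, T), k)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, counts, keys), reverse=True)
        survivors = ranked[: pop_size // 2]  # selection: the weakest half is discarded
        children: List[List[int]] = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randint(0, k)
            child = a[:cut] + b[cut:]  # one-point crossover
            if rng.random() < p_mut:  # mutation: move one point to a random time
                child[rng.randrange(k)] = rng.randint(1, T - 1)
            children.append(repair(child, T, k, rng))
        pop = survivors + children
    return max(pop, key=lambda c: fitness(c, counts, keys))


# 12 time points: cell A requests S1 in the first half and S2 in the second half
counts = [{("A", "S1"): 3}] * 6 + [{("A", "S2"): 3}] * 6
keys = [("A", "S1"), ("A", "S2")]
print(fitness([6], counts, keys))  # 162.0, the maximum over single split points
print(evolve(counts, keys, k=1))
```

With k = 1 and this toy data the fitness grows as the split point approaches time 6, where the requested service changes, so the search is steered toward that boundary.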
Mobile behavior prediction methods can be roughly divided into two categories.
1. Time series-based prediction, which can be divided into two types:
1) Linear models
2) Nonlinear models
The nonlinear models capture objects' movements with more sophisticated regression functions, so their prediction accuracies are higher than those of the linear models. The Recursive Motion Function (RMF) is the most accurate prediction method in the literature based on regression functions.
2. Pattern-based prediction. Ishikawa et al. derived a Markov Model (MM) that generates Markov transition probabilities from one cell to another for predicting the next cell of an object. The HPM method can only predict the next spatial locations of objects. SMAP-Mine was first proposed to discover sequential mobile access rules and to predict users' next locations and services. Monreale et al. proposed a prediction model, namely WhereNext, that uses trajectory patterns to predict the next locations of moving objects.
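The Markov-model idea of Ishikawa et al. can be illustrated with a first-order sketch that estimates cell-to-cell transition probabilities from moving sequences and predicts the most probable next cell. This simplified illustration assumes raw transition frequencies without smoothing; it is not the original authors' model, and the cell names are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def transition_probabilities(sequences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next cell | current cell) from consecutive cell pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }


def predict_next(probs: Dict[str, Dict[str, float]], cell: str) -> Optional[str]:
    """Return the most probable next cell, or None if no move out of `cell` was observed."""
    if cell not in probs:
        return None
    return max(probs[cell], key=probs[cell].get)


moves = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C", "E"]]
P = transition_probabilities(moves)
print(P["B"])                # {'C': 0.75, 'D': 0.25}
print(predict_next(P, "B"))  # C
```

Such a model only predicts the next location; the pattern-based methods above also predict the services requested there.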

METHODOLOGY
Four main issues are addressed in mining mobile patterns:
1. Clustering of mobile transaction sequences
2. Time segmentation of mobile transaction sequences
3. Discovery of CTMSPs
4. Mobile behavior prediction for mobile users using a combined approach
The system framework for mining mobile patterns is shown in Fig. 2.

Fig. 2. System framework.
Fig. 2 shows the proposed system framework. The system has an offline mechanism for mining CTMSPs and an online engine for predicting mobile behavior. When mobile users move within the mobile network, information including times, locations, and service requests is stored in the mobile transaction database. Table 1 shows an example of a mobile transaction database that contains seven records. The offline data mining mechanism uses two techniques and the CTMSP-Mine algorithm to discover knowledge.
First, the CO-Smart-CAST algorithm is proposed to cluster the mobile transaction sequences. In this algorithm, LBS-Alignment is proposed to evaluate the similarity of mobile transaction sequences.
Second, a GA-based time segmentation algorithm is proposed to find the most suitable time intervals. Clustering and segmentation generate a user cluster table and a time interval table, respectively.
Third, the CTMSP-Mine algorithm is proposed to mine the CTMSPs from the mobile transaction database according to the user cluster table and the time interval table.
In the online prediction engine, a behavior prediction strategy is proposed to predict subsequent behaviors from mobile users' previous mobile transaction sequences and the current time.
The main purpose of this framework is to provide mobile users with a precise and efficient mobile behavior prediction system.
TABLE 1
An Example of Mobile Transaction Database


3.1 Clustering mobile transaction database:
In a mobile transaction database, users in different user groups may have different mobile transaction behaviors, so the first task is to cluster mobile transaction sequences. A parameter-less clustering algorithm, CO-Smart-CAST, is proposed for this purpose.
Before CO-Smart-CAST is performed, a similarity matrix S is generated from the mobile transaction database. The entry S_{i,j} of matrix S represents the similarity of mobile transaction sequences i and j in the database, with values in the range [0, 1]. A mobile transaction sequence can be viewed as a sequence string in which each element is a mobile transaction. The major challenge is to measure the content similarity between mobile transactions. LBS-Alignment is therefore proposed, which obtains the similarity based on the concept of DNA sequence alignment.

3.1.1 Location Based Service Alignment:
LBS-Alignment is based on the idea that two mobile transaction sequences are more similar when the orders and timestamps of their mobile transactions are more similar. Based on this idea, LBS-Alignment defines a time penalty (TP) and a service reward (SR). The base similarity score is set to 0.5. Two mobile transactions can be aligned if their locations are the same; otherwise, a location penalty is applied to decrease their similarity score.
The location penalty is defined as 0.5/(|s1| + |s2|), where |s1| and |s2| are the lengths of sequences s1 and s2, respectively. When two sequences are totally different, all |s1| + |s2| transactions are penalized, the penalties sum to the base score of 0.5, and their similarity score is 0.


Fig. 3. The LBS-Alignment algorithm.
When two mobile transactions are aligned, their time penalty and service reward are measured. TP reflects the time distance between them: the farther apart they are in time, the larger the time penalty. TP, which decreases the similarity score, is defined as |s1.time − s2.time|/len, where len is the time length. SR reflects the similarity of the service requests: the more similar their service requests, the larger the service reward. SR, which increases the similarity score, is defined as |s1.services ∩ s2.services| / |s1.services ∪ s2.services|.
Fig. 3 shows the procedure of the LBS-Alignment measure. The input data are two mobile transaction sequences (line 1). The output is the similarity between the two sequences, with values ranging from 0 to 1 (line 2). Some parameters are initialized (lines 4 to 7); the base similarity score is set to 0.5 (line 5). Dynamic programming is used to calculate M_{i,j} (lines 8 to 18), where M_{i,j} is the value in column i and row j of M, the score matrix of LBS-Alignment. In this procedure, if the locations of two transactions are the same (line 10), both the time penalty (line 11) and the service reward (line 12) are calculated to measure the similarity score (line 13). Otherwise (line 14), the location penalty is applied to decrease the similarity score (line 15). Finally, M_{s1.length, s2.length} is returned as the similarity score of the two mobile transaction sequences (line 19).
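The dynamic program of Fig. 3 can be sketched as follows. Because the figure itself is not reproduced in this text, the update for aligned transactions is an assumption: an aligned pair adds p·(1 − TP) + p·SR to the score, where p is the location penalty, so that identical sequences score 1 and sequences with no common location score 0. Treating two empty service sets as identical (SR = 1) is also an assumption. For the worked example later in this section, this rule yields 0.63 rather than the reported 0.405, so the original weighting of TP and SR evidently differs; the sketch shows the structure of the computation, not the exact scores.

```python
from typing import FrozenSet, List, Tuple

Transaction = Tuple[int, str, FrozenSet[str]]  # (time, location, services)


def service_reward(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """SR = |a ∩ b| / |a ∪ b|; two empty service sets count as identical (assumption)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def lbs_alignment(s1: List[Transaction], s2: List[Transaction], time_len: float) -> float:
    n, m = len(s1), len(s2)
    p = 0.5 / (n + m)  # location penalty
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.5  # base similarity score
    for i in range(1, n + 1):
        M[i][0] = M[i - 1][0] - p
    for j in range(1, m + 1):
        M[0][j] = M[0][j - 1] - p
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t1, loc1, sv1 = s1[i - 1]
            t2, loc2, sv2 = s2[j - 1]
            # leave one transaction unaligned: apply the location penalty
            best = max(M[i - 1][j], M[i][j - 1]) - p
            if loc1 == loc2:  # same location, so the transactions can be aligned
                tp = abs(t1 - t2) / time_len   # time penalty
                sr = service_reward(sv1, sv2)  # service reward
                # assumed rule: up to p for closeness in time, up to p for shared services
                best = max(best, M[i - 1][j - 1] + p * (1 - tp) + p * sr)
            M[i][j] = best
    return min(1.0, max(0.0, M[n][m]))  # clamp to [0, 1]


EMPTY: FrozenSet[str] = frozenset()
s1 = [(1, "A", frozenset({"S1"})), (4, "B", EMPTY), (6, "C", frozenset({"S2"})),
      (8, "E", EMPTY), (17, "G", frozenset({"S4"}))]
s2 = [(3, "A", EMPTY), (5, "D", frozenset({"S1"})), (8, "C", EMPTY),
      (19, "E", EMPTY), (20, "G", frozenset({"S4", "S5"}))]
print(round(lbs_alignment(s1, s2, time_len=20), 3))  # 0.63 under the assumed rule
print(round(lbs_alignment(s1, s1, time_len=20), 3))  # 1.0
```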
After the similarity matrix is obtained, the mobile transaction sequences are clustered by the proposed CO-Smart-CAST. Fig. 5 shows the procedure of CO-Smart-CAST. The input data are an N-by-N similarity matrix S (line 1), and the output is the clustering result (line 2). CO-Smart-CAST automatically clusters the data according to the similarity matrix without any user-input parameter. The main ideas of CO-Smart-CAST are as follows. First, CAST, which takes a parameter named the affinity threshold t, is used as the basic clustering method. Second, a quality validation method called Hubert's Γ statistic is used to find the best clustering result.
Third, a hierarchical concept is used to reduce sparse clusters. Hubert's Γ statistic measures the quality of a clustering result, taking the similarity matrix and the clustering result as input. For each clustering result, γ_obj and γ_clu represent the clustering quality measured with the original object similarity matrix S and with the last cluster similarity matrix S′, respectively.
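The two quality measures can be sketched as follows. This minimal sketch assumes the normalized form of Hubert's Γ statistic, i.e., the Pearson correlation, taken over all object pairs i < j, between a similarity matrix and a 0/1 matrix that marks whether i and j fall in the same cluster; applying it to S (and, at the cluster level, to S′) gives the kind of values denoted γ_obj and γ_clu. The harmonic-mean combination into γ_CO follows the F1-score definition. The function names are hypothetical.

```python
import math
from typing import Sequence


def hubert_gamma(S: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Normalized Hubert's Gamma: correlation of S[i][j] with same-cluster indicators."""
    n = len(labels)
    xs, ys = [], []
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(S[i][j])
            ys.append(1.0 if labels[i] == labels[j] else 0.0)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0  # 0 when either side is constant


def gamma_co(gamma_obj: float, gamma_clu: float) -> float:
    """Harmonic mean (F1 score) of the two quality values."""
    total = gamma_obj + gamma_clu
    return 2 * gamma_obj * gamma_clu / total if total else 0.0


S = [[1.0, 0.9, 0.1, 0.2],
     [0.9, 1.0, 0.2, 0.1],
     [0.1, 0.2, 1.0, 0.8],
     [0.2, 0.1, 0.8, 1.0]]
good = hubert_gamma(S, [0, 0, 1, 1])  # follows the block structure of S
bad = hubert_gamma(S, [0, 1, 0, 1])   # cuts across it
print(good > bad)                      # True
print(round(gamma_co(0.8, 0.6), 4))    # 0.6857
```

A clustering that agrees with the similarity structure yields Γ close to 1, while one that contradicts it yields a negative value, which is what lets the threshold search compare clustering results without user input.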
Example:
Let s1 and s2 be two mobile transaction sequences:
s1 = {(1, A, S1), (4, B, ∅), (6, C, S2), (8, E, ∅), (17, G, S4)}
s2 = {(3, A, ∅), (5, D, S1), (8, C, ∅), (19, E, ∅), (20, G, {S4, S5})}
The time length is 20 and the location penalty is 0.5/(5 + 5) = 0.05.

Fig. 4. (a) Similarity matrix. (b) LBS-Alignment result.
Using the above algorithm, the similarity between s1 and s2 is 0.405.

3.1.2 CO-Smart-CAST Algorithm:
- Cluster-Object-based Smart Cluster Affinity Search Technique.
- CO-Smart-CAST clusters the data without any user-input parameter.
- Steps of the clustering method:
  - Use CAST, which takes an affinity threshold t, as the basic clustering method.
  - Use a quality validation method, Hubert's Γ statistic, to find the best clustering result.
  - Use a hierarchical concept to reduce sparse clusters.
Input to the CO-Smart-CAST algorithm:
TABLE 2
The Similarity Matrix

Algorithm:

Fig. 5. The CO-Smart-CAST algorithm.

The initial values of S and S′ are the same, since every object starts as an independent cluster (line 4). The F1 score, i.e., the harmonic mean γ_CO = 2·γ_obj·γ_clu/(γ_obj + γ_clu), is used to combine γ_obj and γ_clu. A higher γ_CO value represents better clustering quality. The simplest way to determine the most suitable t is to vary t with a fixed increment and run CAST repeatedly to find the clustering result with the highest γ_CO.

The main drawback of this approach is that many iterations of computation are required. The number of computations is therefore reduced by eliminating unnecessary executions, which yields a near-optimal clustering result. The main idea is to narrow down the range of t effectively. The testing range R for t is initially [0, 1] (line 5). R is divided at five equally spaced points P0, P1, P2, P3, and P4, where P0 < P1 < P2 < P3 < P4. The value of each Pi (line 8) is then taken in turn as the affinity threshold for the CAST algorithm (line 9), and the γ_CO of each resulting clustering is obtained (lines 10 to 12). When a run of clustering is completed (lines 7 to 13), the clustering at the point P_b that produces the highest γ_CO is considered the best clustering (line 14). The testing range R is then limited to the new range [P_{b−1}, P_{b+1}] containing P_b (line 15). This process is repeated until R is smaller than a threshold ε (line 16), where ε is a very small value, e.g., less than 10^−5. If the γ_CO statistic produced by the point P_Best is higher than the best γ_CO statistic found so far (line 17), the best clustering result is recorded (lines 18 and 19), and all entries of the similarity matrix S′ are replaced by the average similarities between all pairs of corresponding clusters. The whole process is repeated until no better γ_CO statistic is generated (lines 7 to 26). Finally, the clustering result with the highest quality found during the tests is returned.
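The range-narrowing search for t can be sketched independently of CAST by passing in any function that maps a threshold to a clustering quality such as γ_CO. This is a minimal sketch under that assumption: `quality` is a hypothetical callable, P0 and P4 are assumed to be the ends of the current range, and only the inner search (lines 5 to 16 of Fig. 5) is shown, not the outer loop that merges clusters and updates S′.

```python
from typing import Callable, Tuple


def narrow_threshold(quality: Callable[[float], float],
                     lo: float = 0.0, hi: float = 1.0,
                     eps: float = 1e-5) -> Tuple[float, float]:
    """Zoom in on the best of five equally spaced thresholds until the range is below eps."""
    best_t, best_q = lo, quality(lo)
    while hi - lo >= eps:
        step = (hi - lo) / 4
        points = [lo + k * step for k in range(5)]  # P0 < P1 < P2 < P3 < P4
        scores = [quality(p) for p in points]
        b = max(range(5), key=lambda k: scores[k])  # index of the best point P_b
        if scores[b] > best_q:
            best_t, best_q = points[b], scores[b]
        # new testing range [P_{b-1}, P_{b+1}], clamped at the ends
        lo, hi = points[max(b - 1, 0)], points[min(b + 1, 4)]
    return best_t, best_q


# Toy quality curve peaking at t = 0.37 (stands in for gamma_CO of a CAST run)
t, q = narrow_threshold(lambda t: -(t - 0.37) ** 2)
print(round(t, 3))  # 0.37
```

Each round at least halves the range, so about 17 rounds of five evaluations suffice for ε = 10^−5, compared with 10^5 runs for a fixed-increment scan at that resolution. The zooming is only guaranteed to keep the optimum when quality is unimodal in t, which is why the result is described as near-optimal.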
Example:
The best affinity threshold t is the one that yields the highest γ_CO.

Output after the first clustering:

Fig. 6. Clustering after the first iteration.

Similarity matrix for the newly formed clusters:

Fig. 7. Table formed after clustering.

The final clustering result:

Fig. 8. Final clusters.

3.2 Segmentation of Mobile Transactions:
In a mobile transaction database, similar mobile behaviors occur within certain time segments. It is therefore important to choose suitable time segmentation settings so that the characteristics of mobile behaviors in different time segments can be discriminated. A GA-based method is proposed to automatically obtain the most suitable time segmentation table with common mobile behaviors. Fig. 9 shows the procedure of the proposed time segmentation method, named the Get Number of Time Segmenting Points (GetNTSP) algorithm. The input data are a mobile transaction database D and its time length T (line 01), and the output is the number of time segmenting points (line 02). For each item, the total number of occurrences at each time point is accumulated (lines 07 to 11), and the time point with the largest change rate is obtained (line 13). The change rate is defined as (C[i+1] − C[i])/(1 + C[i]), where C[i] is the total number of occurrences of the item at time point i. Next, the occurrences of all these time points are counted (line 15); the time points whose counts are greater than or equal to the average count form the time point sequence (TPS) (line 17). In the TPS, the average time distance α between two neighboring time points is calculated (line 18).



Algorithm:

Fig. 9. The GetNTSP algorithm.
The number of neighboring time point pairs whose time distance is greater than α is then calculated (lines 19 to 23); the result is the number of time segmenting points (line 24). After the number of time segmenting points is obtained, a genetic algorithm is used to discover the most suitable time intervals.
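The counting part of GetNTSP can be sketched as follows. This minimal sketch assumes that each item's occurrence counts are given as a list indexed by time point and reports i + 1 (the time point at which the largest jump appears) as the item's largest-change point; that indexing choice is an assumption. The helper names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def largest_change_point(C: Sequence[int]) -> int:
    """Time point reached by the largest change rate (C[i+1] - C[i]) / (1 + C[i])."""
    rates = [(C[i + 1] - C[i]) / (1 + C[i]) for i in range(len(C) - 1)]
    i = max(range(len(rates)), key=lambda k: rates[k])
    return i + 1


def count_segmenting_points(change_points: List[int]) -> int:
    """Build the TPS and count neighbouring gaps wider than their average alpha."""
    occurrences = Counter(change_points)
    avg = sum(occurrences.values()) / len(occurrences)  # average count per distinct point
    tps = sorted(t for t, n in occurrences.items() if n >= avg)
    if len(tps) < 2:
        return 0
    alpha = (tps[-1] - tps[0]) / (len(tps) - 1)  # average distance between neighbours
    return sum(1 for a, b in zip(tps, tps[1:]) if b - a > alpha)


points = [5, 10, 13, 30, 5, 28, 10, 7, 20, 30, 25, 28]
print(count_segmenting_points(points))        # 1
print(largest_change_point([0, 1, 1, 6, 7]))  # 3
```

The first call reproduces the example that follows: the TPS is {5, 10, 28, 30}, α = 25/3, and only the gap from 10 to 28 exceeds it.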
Example:
The time points with the largest change rates are 5, 10, 13, 30, 5, 28, 10, 7, 20, 30, 25, and 28.
Sorted with their counts, they are 5(2), 7(1), 10(2), 13(1), 20(1), 25(1), 28(2), and 30(2), where t(n) means that time point t occurs n times.
The average count is 12/8 = 1.5, so the time point sequence is TPS = {5, 10, 28, 30}.
α = (30 − 5)/3 = 25/3 ≈ 8.33.
Only the gap between 10 and 28 is larger than 8.33, so the number of time segmenting points is 1.
A genetic algorithm is then used with the following fitness function, where a chromosome X with Len(X) segmenting points defines Len(X) + 1 time intervals:

$$\mathrm{Fitness}(X) = \sum_{i=1}^{Len(X)+1} \left( \frac{\sum_{c=1}^{N_c} \sum_{s=1}^{N_s} \left( T_i[c,s] - \overline{T_i} \right)^2}{N_c \times N_s} \right)$$

Here N_c is the total number of cells, N_s is the total number of services, T_i[c, s] is the request count of cell c and service s in time interval i, and the overlined T_i is the average service request count in time interval i.

3.3 Discovery of CTMSPs:
The entire CTMSP-Mine procedure can be divided into three main steps:
1) Frequent-Transaction Mining,
2) Mobile Transaction Database Transformation, and
3) CTMSP Mining.

3.3.1 Frequent Transaction Mining:
- Frequent transactions (F-Transactions) are mined using a modified Apriori algorithm.
- First, the support of each cell and service is counted in each user cluster and time interval.
- Frequent 1-transactions are kept according to the minimal support threshold TSUP.
- A candidate 2-transaction is generated by joining two frequent 1-transactions if their user clusters, time intervals, and cells are the same.
- The same procedure is repeated until no candidate transaction is generated.
- A service mapping table is constructed to transform services into F-Transactions, which reduces processing time.

TABLE 3
Frequent Transactions

The frequent transactions are shown in Table 3. A service mapping table is constructed to transform the services in Table 3 into F-Transactions: each service set is represented by a contiguous and unique symbol LSi (Large Service i). This mapping reduces the time required to check whether a mobile sequential pattern is contained in a mobile transaction sequence.
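The frequent-transaction step can be sketched as a level-wise, Apriori-style search for frequent service sets inside each (user cluster, time interval, cell) group, which is what the join condition above amounts to. This is a minimal sketch under stated assumptions: each record is a hypothetical (cluster, interval, cell, service set) tuple, support is the number of records in a group whose service set contains the candidate, TSUP is an absolute count, Apriori's subset-pruning step is omitted, and the numbering order of the LS symbols is arbitrary.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Group = Tuple[str, str, str]                   # (user cluster, time interval, cell)
Record = Tuple[str, str, str, FrozenSet[str]]  # group fields + requested services


def frequent_transactions(records: List[Record], tsup: int) -> Dict[Group, Dict[FrozenSet[str], int]]:
    """Level-wise mining of frequent service sets within each group."""
    groups: Dict[Group, List[FrozenSet[str]]] = defaultdict(list)
    for cluster, interval, cell, services in records:
        groups[(cluster, interval, cell)].append(services)

    result: Dict[Group, Dict[FrozenSet[str], int]] = {}
    for group, service_sets in groups.items():
        frequent: Dict[FrozenSet[str], int] = {}
        candidates = {frozenset([s]) for ss in service_sets for s in ss}  # 1-transactions
        size = 1
        while candidates:
            support = {c: sum(1 for ss in service_sets if c <= ss) for c in candidates}
            level = {c: n for c, n in support.items() if n >= tsup}
            frequent.update(level)
            size += 1
            # join frequent sets of the same group into candidates one item larger
            candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        if frequent:
            result[group] = frequent
    return result


def service_mapping_table(freq: Dict[Group, Dict[FrozenSet[str], int]]) -> Dict[FrozenSet[str], str]:
    """Give each distinct frequent service set a contiguous symbol LS1, LS2, ..."""
    distinct = sorted({ss for sets in freq.values() for ss in sets},
                      key=lambda ss: (len(ss), sorted(ss)))
    return {ss: f"LS{i}" for i, ss in enumerate(distinct, start=1)}


records = [
    ("C1", "I1", "A", frozenset({"S1", "S2"})),
    ("C1", "I1", "A", frozenset({"S1", "S2"})),
    ("C1", "I1", "A", frozenset({"S1"})),
    ("C1", "I1", "B", frozenset({"S3"})),
]
freq = frequent_transactions(records, tsup=2)
print(freq[("C1", "I1", "A")][frozenset({"S1", "S2"})])      # 2
print(service_mapping_table(freq)[frozenset({"S1", "S2"})])  # LS3
```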

3.3.2 Mobile Transaction Database Transformation:
The main objectives and advantages are:
1) Service sets can be represented by symbols for efficient processing.
2) Transactions whose support is less than the minimal support threshold can be eliminated to reduce the size of the database.
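The transformation can be sketched as a single pass that keeps only frequent transactions and replaces their service sets by LS symbols. This minimal sketch assumes that a mobile transaction sequence is a list of (time, cell, service set) tuples for a user in a known cluster, that a hypothetical `interval_of` function maps a time point to its interval, that `frequent` maps (cluster, interval, cell) to the set of frequent service sets, and that a transaction is kept only when its service set exactly matches a frequent one; the exact-match rule is an assumption.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Txn = Tuple[int, str, FrozenSet[str]]  # (time, cell, services)


def transform_sequence(
    seq: List[Txn],
    cluster: str,
    interval_of: Callable[[int], str],
    frequent: Dict[Tuple[str, str, str], Set[FrozenSet[str]]],
    mapping: Dict[FrozenSet[str], str],
) -> List[Tuple[int, str, str]]:
    """Drop infrequent transactions and replace each kept service set by its LS symbol."""
    out = []
    for time, cell, services in seq:
        key = (cluster, interval_of(time), cell)
        if services in frequent.get(key, set()):
            out.append((time, cell, mapping[services]))
    return out


frequent = {("C1", "I1", "A"): {frozenset({"S1", "S2"})}}
mapping = {frozenset({"S1", "S2"}): "LS3"}
seq = [(2, "A", frozenset({"S1", "S2"})), (5, "B", frozenset({"S4"}))]
print(transform_sequence(seq, "C1", lambda t: "I1" if t < 12 else "I2", frequent, mapping))
# [(2, 'A', 'LS3')]
```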

3.3.3 CTMSP Mining:
- The frequent 1-CTMSPs are obtained during frequent-transaction mining.
- A two-level tree, named the CTMSP-Tree, is used:
  - The internal nodes store the frequent mobile transactions.
  - The leaf nodes store the corresponding paths.
  - Every parent node of a leaf node is designed as a hash table that stores the combinations of entries from the user cluster table and the time interval table.
The resulting CTMSP-Tree is shown in Fig. 10.

Fig. 10. CTMSP-Tree. (a) Part of the frequent 2-CTMSPs. (b) 3-CTMSPs. (c) 4-CTMSPs.

3.4 Prediction Strategies:
Three prediction strategies are combined:
- Patterns are selected only from the cluster to which the user belongs.
- Patterns are selected only from the time interval corresponding to the current time.
- Patterns are selected only from those that match the user's recent mobile behavior.
If more than one pattern satisfies these conditions, the one with the maximal support is selected.
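The selection rule can be sketched as a filter followed by a maximum. This is a minimal sketch under stated assumptions: a pattern is a hypothetical record holding its user cluster, time interval, antecedent (the recent behaviors it must match), consequent (the predicted next behavior), and support, and "matching the user's recent mobile behavior" is interpreted as the antecedent being a suffix of the user's recent transactions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Pattern:
    cluster: str
    interval: str
    antecedent: Tuple[str, ...]  # recent behaviors the pattern requires
    consequent: str              # predicted next behavior
    support: int


def predict(patterns: List[Pattern], cluster: str, interval: str,
            recent: Tuple[str, ...]) -> Optional[str]:
    candidates = [
        p for p in patterns
        if p.cluster == cluster                  # strategy 1: the user's own cluster
        and p.interval == interval               # strategy 2: the current time interval
        and len(p.antecedent) <= len(recent)
        and recent[len(recent) - len(p.antecedent):] == p.antecedent  # strategy 3
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.support).consequent  # maximal support wins


patterns = [
    Pattern("C1", "I1", ("A", "B"), "C", 5),
    Pattern("C1", "I1", ("B",), "D", 8),
    Pattern("C2", "I1", ("B",), "E", 20),
    Pattern("C1", "I2", ("B",), "F", 30),
]
print(predict(patterns, "C1", "I1", ("X", "A", "B")))  # D
print(predict(patterns, "C3", "I1", ("B",)))           # None
```

The higher-support patterns of other clusters or intervals are ignored, which is the point of combining the three strategies.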

SUMMARY AND FUTURE SCOPE

- A novel method, CTMSP-Mine, is proposed for discovering CTMSPs in LBS environments.
- Prediction strategies are proposed that use CTMSPs, combining user clusters and time intervals, to predict users' subsequent mobile behaviors.
- The method has not yet been applied to real data.

Future Scope:
Since the above techniques have not yet been implemented on real data, future work is to implement all of the above algorithms on real data and obtain results. The efficiency of the mining can then be tested by applying different strategies.



BIBLIOGRAPHY
[1] E. H.-C. Lu, V. S. Tseng, and P. S. Yu, "Mining Cluster-Based Temporal Mobile Sequential Patterns in Location-Based Service Environment," IEEE Trans. Knowledge and Data Engineering, vol. 23, no. 6, June 2011.
[2] J. Han and M. Kamber, Data Mining: Concepts and Techniques, second ed., Morgan Kaufmann, Sept. 2000.
[3] A. Ben-Dor and Z. Yakhini, "Clustering Gene Expression Patterns," J. Computational Biology, vol. 6, no. 3, pp. 281-297, July 1999.
[4] V. S. Tseng and C. Kao, "Efficiently Mining Gene Expression Data via a Novel Parameterless Clustering Method," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 2, no. 4, pp. 355-365, Oct.-Dec. 2005.
[5] C. H. Yun and M. S. Chen, "Mining Mobile Sequential Patterns in a Mobile Commerce Environment," IEEE Trans. Systems, Man, and Cybernetics, Part C, vol. 37, no. 2, pp. 278-295, Mar. 2007.
[6] V. S. Tseng, H. C. Lu, and C. H. Huang, "Mining Temporal Mobile Sequential Patterns in Location-Based Service Environments," Proc. 13th IEEE Int'l Conf. Parallel and Distributed Systems, pp. 1-8, Dec. 2007.
[7] V. S. Tseng and W. C. Lin, "Mining Sequential Mobile Access Patterns Efficiently in Mobile Web Systems," Proc. 19th Int'l Conf. Advanced Information Networking and Applications, pp. 867-871, Mar. 2005.
[8] A. Dubey and S. K. Shandilya, "Exploiting Need of Data Mining Services in Mobile Computing Environments," Proc. Int'l Conf. Computational Intelligence and Networks, 2010.
