
Behavioral Pattern-Based Customer Segmentation

from each cluster are very different from those generated from another cluster may be an effective method for learning the natural categorizations.

Behavioral Pattern Representation: Itemset

Behavioral patterns first need to be represented properly before they can be used for clustering. In many application domains, an itemset is a reasonable representation for behavioral patterns. We illustrate how to use itemsets to represent behavioral patterns by using Web browsing data as an example. Assume we are analyzing Web data at the session level (continuous clicks are grouped together to form a session for the purpose of data analysis). Features are first created to describe the session. The features can include those about time (e.g., average time spent per page), quantity (e.g., number of sites visited), and order of pages visited (e.g., first site), and therefore include both categorical and numeric types. A conjunction of atomic conditions on these attributes (an “itemset”) is a good representation for common behavioral patterns in the Web data. For example, {starting_time = morning, average_time_page < 2 minutes, num_categories = 3, total_time < 10 minutes} is a behavioral pattern that may capture a user’s specific “morning” pattern of Web usage that involves looking at multiple sites (e.g., work e-mail, news, finance) in a focused manner such that the total time spent is low. Another common pattern for this (same) user may be {starting_time = night, most_visited_category = games}, reflecting the user’s typical behavior at the end of the day.

Behavioral patterns from other domains (e.g., shopping patterns in grocery stores) can be represented in a similar fashion. The attribute and value pair (starting_time = morning) can be treated as an item, and a combination of such items forms an itemset (or a pattern). When we consider a cluster that contains objects with similar behavior patterns, we expect the objects in the cluster to share many patterns (a list of itemsets).

Clustering Based on Frequent Itemsets

Clustering based on frequent itemsets is recognized as a distinct technique and is often categorized under frequent-pattern-based clustering methods (Han & Kamber 2006). Even though not a lot of existing research in this area addresses the problem of clustering based on behavioral patterns, the methods can potentially be modified for this purpose. Wang et al (1999) introduce a clustering criterion suggesting that there should be many large items within a cluster and little overlapping of such items across clusters. They then use this criterion to search for a good clustering solution. Wang et al (1999) also point out that, for transaction data, methods using pairwise similarity, such as k-means, have problems in forming a meaningful cluster. For transactions that come naturally as collections of items, it is more meaningful to use item/rule-based methods. Since we can represent behavioral patterns as a collection of items, we can potentially modify the approach of Wang et al (1999) so that there are many large items within a cluster and little overlapping of such items across clusters. Yang et al (2002) address a problem similar to that in Wang et al (1999) and do not use any pairwise distance function. They study the problem of categorical data clustering and propose a global criterion function that tries to increase the intra-cluster overlapping of transaction items by increasing the height-to-width ratio of the cluster histogram. The drawback of Wang et al (1999) and Yang et al (2002) for behavioral pattern-based clustering is that they are not able to generate a set of large itemsets (a collection of behavioral patterns) within a cluster.

Yang & Padmanabhan (2003, 2005) define a global goal and use this goal to guide the clustering process. Compared to Wang et al (1999) and Yang et al (2002), Yang & Padmanabhan (2003, 2005) take a new perspective of associating itemsets with behavior patterns and using that concept to guide the clustering process. Using this approach, distinguishing itemsets are identified to represent a cluster of transactions. As noted previously in this chapter, behavioral patterns describing a cluster are represented by a set of itemsets (for example, a set of two itemsets {weekend, second site = eonline.com} and {weekday, second site = cnbc.com}). Yang & Padmanabhan (2003, 2005) make it possible to find a set of itemsets to describe a cluster instead of just a set of items, which is the focus of other item/itemset-related work. In addition, the algorithms presented in Wang et al (1999) and Yang et al (2002) are very sensitive to the initial seeds that they pick, while the clustering results in Yang & Padmanabhan (2003, 2005) are stable. Wang et al (1999) and Yang et al (2002) did not use the concept of pattern difference and similarity.
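
To make the itemset representation and the "many large items within a cluster, little overlapping across clusters" idea concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the algorithm of any of the cited papers: the session values, the discretization thresholds, the minimum support of 0.5, and the helper names (session_to_items, support, large_items, large_item_overlap) are all hypothetical. Here a "large" item is taken to mean an item whose support within a cluster meets the assumed minimum support, and the overlap measure is a simplified stand-in for a cross-cluster criterion.

```python
from collections import Counter


def session_to_items(session):
    """Turn one session's features into a set of (attribute, value) items.

    Numeric features are discretized with assumed thresholds taken from
    the example pattern in the text (2 minutes per page, 10 minutes total).
    """
    return frozenset({
        ("starting_time", session["starting_time"]),
        ("average_time_page",
         "<2min" if session["average_time_page"] < 2 else ">=2min"),
        ("total_time", "<10min" if session["total_time"] < 10 else ">=10min"),
        ("num_categories", session["num_categories"]),
    })


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def large_items(transactions, min_support):
    """Items whose within-cluster support is at least min_support."""
    counts = Counter(item for t in transactions for item in t)
    return {item for item, c in counts.items()
            if c / len(transactions) >= min_support}


def large_item_overlap(clusters, min_support):
    """Simplified overlap: total large items minus distinct large items."""
    per_cluster = [large_items(c, min_support) for c in clusters]
    distinct = set().union(*per_cluster)
    return sum(len(s) for s in per_cluster) - len(distinct)


# Invented sessions: two "morning" and two "night" sessions.
s1 = session_to_items({"starting_time": "morning", "average_time_page": 1.5,
                       "total_time": 8, "num_categories": 3})
s2 = session_to_items({"starting_time": "morning", "average_time_page": 1.0,
                       "total_time": 6, "num_categories": 3})
n1 = session_to_items({"starting_time": "night", "average_time_page": 5.0,
                       "total_time": 40, "num_categories": 1})
n2 = session_to_items({"starting_time": "night", "average_time_page": 4.0,
                       "total_time": 35, "num_categories": 1})

# The "morning" behavioral pattern from the text, written as an itemset.
morning_pattern = frozenset({
    ("starting_time", "morning"),
    ("average_time_page", "<2min"),
    ("num_categories", 3),
    ("total_time", "<10min"),
})

good_clusters = [[s1, s2], [n1, n2]]   # sessions grouped by behavior
mixed_clusters = [[s1, n1], [s2, n2]]  # behaviors mixed in each cluster

print(support(morning_pattern, good_clusters[0]))
print(large_item_overlap(good_clusters, 0.5))
print(large_item_overlap(mixed_clusters, 0.5))
```

In this toy data, the grouping that separates morning from night sessions has no large items in common across clusters, while the mixed grouping shares all of them, which is the intuition behind preferring the first clustering.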


