WHY CLUSTERING?
• Organizing data into clusters reveals the internal structure of the data
– Ex. Clusty and clustering genes above
• Sometimes the partitioning is the goal
– Ex. Market segmentation
• Prepare for other AI techniques
– Ex. Summarize news (cluster and then find centroid)
• Clustering techniques are useful for knowledge discovery in data
– Ex. Underlying rules, recurring patterns, topics, etc.
FUZZY C MEANS:
Fuzzy c-means works by assigning each data point a membership in every
cluster, based on the distance between the data point and each cluster
midpoint (center). A point close to a midpoint receives a high membership in
that cluster and a lower membership in the others. Because a point can belong
partly to several clusters, this fuzzy approach can reduce the information lost
when clustering huge amounts of data.
Non-fuzzy (hard) clustering methods such as k-means can give good results, but
each data point is stored in only a single cluster. Any information about how
closely the point also resembles other clusters is lost when the data is
retrieved.
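The distance-based membership described above can be sketched as follows. This is a minimal illustration, not a full fuzzy c-means implementation: it shows only the commonly used membership rule for one point against fixed midpoints. The function name `fcm_memberships`, the Euclidean distance, and the fuzzifier value m = 2 are assumptions made for the example.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Return the membership of `point` in each cluster center.

    Uses the usual fuzzy c-means rule
        u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1))
    where d_j is the distance to center j and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]

# Hypothetical data: one point, two cluster midpoints.
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

The memberships of a point always sum to 1, so the nearer midpoint gets most of the weight while the farther one still keeps a share instead of being discarded.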
Partitional Clustering:
A partitional clustering is simply a division of the set of data objects into
non-overlapping subsets (clusters) such that each data object is in exactly one
subset. Partitional clustering assigns a set of n data points to k clusters
by using an iterative process. A predefined criterion function J decides
which of the k clusters each datum is assigned to: the assignment is chosen
to minimize (or maximize) the value of J over the k sets.
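As one concrete instance of this iterative process, here is a minimal k-means sketch. The source does not say which partitional algorithm or criterion it has in mind, so the choice of k-means, the sum of squared distances as J, and the names `kmeans`, `points`, and `centers` are all assumptions for illustration.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means: alternate an assignment step and an update step.

    `iterations` must be at least 1.
    """
    for _ in range(iterations):
        # Assignment step: each point goes to exactly one cluster,
        # the one whose center is nearest.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    # Criterion J: sum of squared distances to the assigned center.
    j = sum(math.dist(p, c) ** 2
            for cluster, c in zip(clusters, centers)
            for p in cluster)
    return centers, clusters, j

# Hypothetical data: four points, two initial centers (k = 2).
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, clusters, j = kmeans(points, [(0, 0), (10, 0)])
```

Each pass can only keep J the same or lower it, which is why repeating the two steps settles on a stable partition.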
FP GROWTH:
1. This algorithm needs to scan the database only twice, whereas Apriori
scans the transactions on every iteration.
2. Items are not paired into candidate itemsets, which makes the algorithm faster.
3. The database is stored in memory in a compact form.
4. It is efficient and scalable for mining both long and short frequent patterns.
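The two database scans and the compact in-memory structure can be sketched as below. This builds only the prefix tree (the FP Tree); the mining step is left out. The names `FPNode` and `build_fp_tree`, and the tie-breaking by item name, are assumptions for the example.

```python
from collections import Counter

class FPNode:
    """One node of the prefix tree: an item, a count, and child links."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None)
    # Scan 2: insert each transaction's frequent items, most frequent
    # first, so that transactions share prefixes in the tree.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

# Hypothetical transactions with a minimum support of 2.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

Here all four transactions share the single top-level branch for item b, which is the kind of compression that keeps the stored database small.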
APRIORI:
1. The algorithm is easy to understand.
2. The Join and Prune steps are easy to implement on large itemsets in large databases.
3. It requires heavy computation if the itemsets are very large and the minimum
support is kept very low.
4. The entire database needs to be scanned on every iteration.
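The Join and Prune steps and the repeated database scans can be illustrated with a short sketch. This is a simplified version under stated assumptions: support is an absolute transaction count, and the join merges any two frequent itemsets whose union has the right size rather than using the prefix-based join of textbook presentations. The function name `apriori` is chosen for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support count."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # One full scan of the database per level to count support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop any candidate that has an infrequent
        # (k-1)-subset (the Apriori property).
        current = {c for c in joined
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))}
    return frequent

# Hypothetical transactions with a minimum support of 2.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
frequent = apriori(transactions, min_support=2)
```

In this small example the candidate {a, b, c} is pruned without being counted, because its subset {a, c} is not frequent.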
FP GROWTH VS APRIORI:
FP GROWTH:
Pattern generation: FP Growth generates patterns by constructing an FP Tree.
Candidate generation: There is no candidate generation.
Process: The process is faster than Apriori. The runtime increases
linearly with the number of itemsets.
Memory usage: A compact version of the database is saved.
APRIORI:
Pattern generation: Apriori generates patterns by combining the items into singletons,
pairs, and triplets.
Candidate generation: Apriori uses candidate generation.
Process: The process is comparatively slower than FP Growth. The runtime increases
exponentially with the number of itemsets.
Memory usage: The candidate combinations are saved in memory.
DRAWBACKS OF APRIORI COMPARED TO FP GROWTH:
1. The Apriori algorithm uses the Apriori property together with the join and
prune steps to mine frequent patterns. The FP Growth algorithm constructs a
conditional pattern tree and a conditional pattern base from the database for
the items that satisfy the minimum support.
2. Apriori uses a breadth-first search method, while FP Growth uses a
divide-and-conquer method.
3. The Apriori algorithm requires a large amount of memory because it generates
a large number of candidate itemsets. The FP Growth algorithm requires less
memory because of its compact structure, and it discovers the frequent itemsets
without candidate itemset generation.
4. The Apriori algorithm performs multiple scans of the database to generate
candidate sets. The FP Growth algorithm scans the database only twice.
5. In the Apriori algorithm, much of the execution time is spent producing
candidates on every pass. FP Growth’s execution time is lower than Apriori’s.