
Movies Recommendation
Data Analytics Project Report

FAS Members:
Avirup Banerjee (18S714)
Swaroop Singamsetty()
Vipras Ladu Morye()
Yagneshwar Chowdary Bandlamudi()
Anshul Sharma (18S712)
CONTENTS

Introduction

Project Objective

Methodology

Results

Summary

References
Introduction
Project Objective
Methodology

In this project we used association rule mining, which is an unsupervised learning technique.
Association rule mining is used to discover associations between different objects in a set, to find
frequent patterns in a transaction database, or to search for patterns in relational databases or any
other information repository. Its applications include marketing, basket data analysis (market basket
analysis) in retailing, clustering, and classification.
Consider an example:

In the transactions above, numbered 1 to 5, diapers are bought together with beer on 3 occasions.
Similarly, bread is bought together with milk in 3 transactions, making both pairs ‘frequent itemsets’.

To understand the mechanics of the methodology, a few terms are needed.

Itemset: A collection of one or more items. In the example above, transaction 2 is a 4-itemset simply
because it is a set of 4 items.

Support: The fraction of transactions that contain an itemset ‘I’, i.e. support(I) = frequency(I) / N,
where N = total number of transactions.
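The support formula can be sketched in a few lines of Python. This is only an illustration, not the R code used in the project. The five transactions below are an assumption: the example table is not reproduced in the text, so the data was chosen only to be consistent with the counts quoted (diapers with beer 3 times, bread with milk 3 times, transaction 2 holding 4 items). The function name `support` is likewise hypothetical.

```python
# Hypothetical transactions (assumption: chosen only to match the counts
# quoted in the report; the original table is not available in the text).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# Both pairs occur in 3 of the 5 assumed transactions.
print(f"{support({'bread', 'milk'}, TRANSACTIONS):.2f}")    # 0.60
print(f"{support({'diapers', 'beer'}, TRANSACTIONS):.2f}")  # 0.60
```

Under this assumed data both pairs have a support of 3/5, so they would pass a min-support threshold of 0.6.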

Confidence: To define confidence we first need the concepts of antecedent and consequent.

An association rule has the form "if antecedent, then consequent". The antecedent is the itemset on the
"if" side of the rule, and the consequent is the itemset expected to appear in the same transaction
because the antecedent is present.
Confidence compares the co-occurrence of the antecedent and consequent itemsets in the database with
the occurrence of the antecedent itemset alone.
confidence = (no. of transactions containing both antecedent and consequent) / (no. of transactions
containing the antecedent)
For bread and milk, for example, the confidence is 3/4 = 0.75, or 75% in probability terms. Every
association rule that is kept must meet a minimum confidence level.
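As a rough illustration of the confidence ratio, the sketch below counts the transactions directly. The transaction data is an assumption (the example table is not reproduced in the text; the values only match the counts the report quotes), and the function name `confidence` is hypothetical.

```python
# Hypothetical transactions (assumption: chosen only to match the counts
# quoted in the report; the original table is not available in the text).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def confidence(antecedent, consequent, transactions):
    """Share of antecedent-containing transactions that also hold the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both = sum(1 for t in with_antecedent if consequent <= t)
    return both / len(with_antecedent)


print(confidence({"bread"}, {"milk"}, TRANSACTIONS))    # 0.75
print(confidence({"beer"}, {"diapers"}, TRANSACTIONS))  # 1.0
```

Note that confidence is not symmetric: in the assumed data, beer implies diapers with confidence 1.0, while diapers imply beer with confidence 0.75.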

Lift ratio: This compares the confidence of a rule with a benchmark confidence value, where
benchmark confidence = (no. of transactions containing the consequent itemset) / (no. of transactions
in the database)

so that lift ratio = (confidence) / (benchmark confidence). A lift ratio greater than 1 suggests that
the rule is useful: the level of association between the antecedent and consequent itemsets is higher
than would be expected if they were independent. The larger the ratio, the greater the strength of the
association.
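The lift ratio can be sketched by dividing the two quantities defined above. As before, the transactions are an assumption chosen to agree with the counts quoted in the report, and the helper names (`count`, `lift`) are hypothetical.

```python
# Hypothetical transactions (assumption: chosen only to match the counts
# quoted in the report; the original table is not available in the text).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the benchmark confidence."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = count(antecedent, transactions)
    n_consequent = count(consequent, transactions)
    if n_antecedent == 0 or n_consequent == 0:
        raise ValueError("lift is undefined when either side never occurs")
    conf = count(antecedent | consequent, transactions) / n_antecedent
    benchmark = n_consequent / len(transactions)
    return conf / benchmark


print(f"{lift({'diapers'}, {'beer'}, TRANSACTIONS):.4f}")  # 1.2500
print(f"{lift({'bread'}, {'milk'}, TRANSACTIONS):.4f}")    # 0.9375
```

On this assumed data the bread and milk rule has a lift just below 1 despite its 75% confidence, because milk already appears in 4 of the 5 transactions. This is why lift is checked in addition to confidence.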

The algorithm we used is Apriori, which carries out association rule mining as a two-step approach:
i. Frequent itemset generation: find the itemsets whose support >= a predetermined min-support.
ii. Rule generation: calculate support and confidence for the candidate rules and discard those that
fail the min-support and min-confidence thresholds.
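The two steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the `arules` implementation used in the project: the transactions are hypothetical (chosen to match the counts quoted in the report), the function names are invented for this sketch, and the thresholds 0.6 and 0.8 are arbitrary example values.

```python
from itertools import combinations

# Hypothetical transactions (assumption: chosen only to match the counts
# quoted in the report; the original table is not available in the text).
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]


def frequent_itemsets(transactions, min_support):
    """Step i: return {itemset: count} for itemsets with support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # One full pass over the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then keep only those
        # whose (k-1)-subsets are all frequent.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def generate_rules(frequent, min_confidence):
    """Step ii: return (antecedent, consequent, confidence) for rules that pass."""
    rules = []
    for itemset, itemset_count in frequent.items():
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                # Every subset of a frequent itemset is frequent, so the
                # antecedent count is always available.
                conf = itemset_count / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


frequent = frequent_itemsets(TRANSACTIONS, min_support=0.6)
for antecedent, consequent, conf in generate_rules(frequent, min_confidence=0.8):
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={conf:.2f}")
```

Rules are built only from itemsets that already passed min-support in step i, so the support check for rules is implicit and only confidence has to be tested in step ii.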

Frequent itemset generation requires full scans of the database, so it is the most costly step in
terms of computation. Behind the algorithm is the concept of an itemset lattice: for n items, the
lattice contains 2^n itemsets.
In the example lattice, moving upwards produces ever smaller subsets until the null set is reached.
Apriori does not have to evaluate the full lattice: once an itemset is found to be infrequent, all of
its supersets are infrequent as well and can be pruned.
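To make the size of the lattice concrete, the sketch below enumerates every subset of a small item list. The item labels and function names are hypothetical. The second count shows how many lattice nodes contain a given item, which is the number of nodes that can be discarded at once when that item is infrequent, since every superset of an infrequent itemset is itself infrequent.

```python
from itertools import combinations


def lattice(items):
    """All subsets of `items`, ordered by size, starting with the null set."""
    items = sorted(items)
    return [
        frozenset(combo)
        for k in range(len(items) + 1)
        for combo in combinations(items, k)
    ]


def supersets_of(itemset, nodes):
    """Lattice nodes that contain `itemset` (including the itemset itself)."""
    itemset = frozenset(itemset)
    return [node for node in nodes if itemset <= node]


ITEMS = ["a", "b", "c", "d"]  # hypothetical item labels
nodes = lattice(ITEMS)
print(len(nodes))                       # 16, i.e. 2**4 with the null set included
print(len(supersets_of({"a"}, nodes)))  # 8 nodes contain item "a"
```

With only 4 items the lattice is small, but the count doubles with every added item, which is why pruning matters for realistic catalogues.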

R provides packages that implement the Apriori algorithm, the most important being ‘arules’, and we
used them to implement the method in our project.
