
Business Analytics

Today's Objective

Association Mining (Unsupervised Learning)

Lift value calculation, market basket analysis (support and confidence), supermarket design, association mining (Apriori)
Indian Institute of Management (IIM),Rohtak
Unsupervised learning is a machine learning technique in which you do not need to supervise the model.
Instead, you allow the model to work on its own to discover information. It mainly deals with unlabelled data.
Unsupervised learning algorithms allow you to perform more complex processing tasks than supervised learning. However, unsupervised learning can be more unpredictable than other learning methods.



Let's take the case of a baby and her family dog.

She knows and identifies this dog.

A few weeks later, a family friend brings along a dog and tries to play with the baby.
The baby has not seen this dog before, but she recognizes that many of its features (two ears, eyes, walking on four legs) are like those of her pet dog. She identifies the new animal as a dog. This is unsupervised learning, where you are not taught but learn from the data (in this case, data about a dog).



Why Unsupervised Learning?
Here are the prime reasons for using unsupervised learning:
• Unsupervised machine learning finds all kinds of unknown patterns in data.
• Unsupervised methods help you find features that can be useful for categorization.
• It takes place in real time, so all the input data is analyzed and labeled in the presence of learners.
• It is easier to get unlabeled data from a computer than labeled data, which needs manual intervention.
Association
Association rules allow you to establish associations amongst data objects inside large databases. This unsupervised technique is about discovering interesting relationships between variables in large databases. For example, people who buy a new home are likely to buy new furniture as well.



Market Basket Analytics (Basic Introduction)



Association Rules: a Data Mining Concept

Market Basket Analysis

In this study we learn how to use market basket analysis to identify pairs or sets of products that customers tend to purchase together, and how this knowledge can help the retailer increase profits. You'll then learn how to use the Solver, specifically the Evolutionary Solver, both to ease the computational burden of finding products that tend to be purchased together and to lay out a store so that products with high lifts are located near each other to optimize sales.
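Basic Rule
A 'rule' is something like this:
If a basket contains Bread and Butter, then it also contains Milk
Any such rule has two associated measures:
1. confidence – when the 'if' part is true, how often is the 'then' part true? This is the same as the accuracy of the rule.
\[ \mathrm{Confidence}(A \rightarrow B) = \frac{\#\,\text{baskets containing both } A \text{ and } B}{\#\,\text{baskets containing } A} \]
2. coverage or support – how much of the database contains both the 'if' part and the 'then' part?
\[ \mathrm{Support}(A \rightarrow B) = \frac{\#\,\text{baskets containing both } A \text{ and } B}{\#\,\text{baskets in the database}} \]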



Rule Measures: Support & Confidence

Transaction ID    Items Bought
1                 Trouser, Shirt, Jacket
2                 Trouser, Jacket
3                 Trouser, Jeans
4                 Shirt, Sweatshirt

If the minimum support is 50%, then {Trouser, Jacket} is the only 2-itemset that satisfies the minimum support.

Frequent Itemset     Support
{Trouser}            75%
{Shirt}              50%
{Jacket}             50%
{Trouser, Jacket}    50%

If the minimum confidence is 50%, then the two rules generated from this 2-itemset both have confidence of at least 50%:
Trouser → Jacket    Support = 50%, Confidence = 66.7%
Jacket → Trouser    Support = 50%, Confidence = 100%
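The support and confidence figures for the four transactions above can be checked with a few lines of Python. This is only a minimal sketch: the helper names `support` and `confidence` are chosen here for illustration and are not part of the course material.

```python
# The four transactions from the table above.
transactions = [
    {"Trouser", "Shirt", "Jacket"},
    {"Trouser", "Jacket"},
    {"Trouser", "Jeans"},
    {"Shirt", "Sweatshirt"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)

print(support({"Trouser", "Jacket"}))                 # 0.5
print(round(confidence({"Trouser"}, {"Jacket"}), 3))  # 0.667
print(confidence({"Jacket"}, {"Trouser"}))            # 1.0
```

The two rules share the same support but have different confidence, because the 'if' parts (Trouser and Jacket) occur with different frequencies.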


Market Basket Analysis
Lift
Lift (x => y) measures the 'interestingness' of a rule: how much more likely item y is to be purchased when item x is purchased than it is in general. Unlike confidence (x => y), this measure takes into account the popularity of item y.
Lift (x => y) = Support(x & y) / (Support(x) × Support(y))

• Lift (x => y) = 1 means that there is no correlation within the itemset.
• Lift (x => y) > 1 means that there is a positive correlation within the itemset, i.e., the products in the itemset, x and y, are more likely to be bought together.
• Lift (x => y) < 1 means that there is a negative correlation within the itemset, i.e., the products in the itemset, x and y, are unlikely to be bought together.



Market Basket Analysis

Equivalently, the lift of a rule A→B (in the superstore data, for example, meat and vegetables) equals:

Lift(A→B) = Confidence(A→B) / Support(B)
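The support-based and confidence-based expressions for lift agree. A short derivation follows, where the abbreviations supp and conf are shorthand introduced here for support and confidence, and supp(A ∪ B) denotes the fraction of transactions containing both A and B:

```latex
\[
\mathrm{Lift}(A \rightarrow B)
  = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)\,\mathrm{supp}(B)}
  = \frac{\mathrm{supp}(A \cup B)\,/\,\mathrm{supp}(A)}{\mathrm{supp}(B)}
  = \frac{\mathrm{conf}(A \rightarrow B)}{\mathrm{supp}(B)}
\]
```

The first expression is unchanged when A and B are swapped, so the lift of A→B equals the lift of B→A, even though the two confidences generally differ.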



Market Basket Analysis

To be more specific, suppose that in 1,000 transactions, 300 involved a meat purchase, 400 involved a vegetable purchase, and 200 involved a purchase of both meat and vegetables. Independence of meat and vegetable purchases implies that the likelihood of a transaction involving meat is 0.30 regardless of whether the transaction involves a vegetable purchase. Thus independence implies that 1,000 × 0.40 × 0.30 = 120 transactions should involve purchases of both meat and vegetables. Because 200 transactions actually involved both, knowing that a transaction involves meat makes it 1.67 times (200/120) as likely that the transaction involves vegetables. This is consistent with Equation 1 (actual occurrences divided by the occurrences predicted under independence), which tells you that the lift for vegetables and meat is 200/120 ≈ 1.67.
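The arithmetic of the meat and vegetables example can be reproduced as follows. This is a small sketch using only the counts quoted above; the variable names are illustrative.

```python
total = 1000        # all transactions
meat = 300          # transactions with a meat purchase
vegetables = 400    # transactions with a vegetable purchase
both = 200          # transactions with meat and vegetables

# Number of joint purchases predicted if the two purchases were independent.
predicted = total * (meat / total) * (vegetables / total)

# Lift as actual occurrences divided by predicted occurrences.
lift = both / predicted

# The same lift via confidence(meat -> vegetables) / support(vegetables).
confidence = both / meat
lift_alt = confidence / (vegetables / total)

print(round(predicted))    # 120
print(round(lift, 2))      # 1.67
print(round(lift_alt, 2))  # 1.67
```

Both routes give the same value, which is why either formula can be used in practice.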



Market Basket Analysis

Product combinations with lifts much greater than 1 indicate items that tend to be purchased together. This is valuable information for the retailer because placing products with large lifts near each other in a store display can increase sales, on the assumption that the sales of one product will stimulate sales of the other. For example, handbags and makeup have a large lift, which explains why most shopping malls place handbags and makeup together. Promoting cross-selling of products with high lifts can also stimulate profits.



Grehasthi Grocery store



Market Basket Analysis
Computing Lift for Two Products
A two-way product lift is simply a lift involving two products, and it can easily be computed in Excel. It can be generalized to lifts involving more than two items or other transaction attributes (such as day of week). To practice computing lift, you'll use the superstore transaction data in the file marketbasket.xls. The worksheet shows a subset of the data. The day of the week is denoted by 1 = Monday, 2 = Tuesday, …, 7 = Sunday. For example, the first transaction represents a person who bought vegetables, meat, and milk on a Friday (see your Excel file).



Market Basket Analysis
Creating Named Ranges
Returning to the superstore data, you can now try to find the lift for all two-product combinations. Before you begin computing all these lifts, though, it is convenient to create a few named ranges. You can use the Name box to assign the name data to the range B9:H2936, which contains all the transactions. Then select the range B8:H2936 (the headings in row 8 plus the data) and choose INSERT NAME CREATE to name each column of data by its heading. For example, you can call Column B day_week, Column C vegetables, and so on (refer to the marketbasket.xls file for the rest of the heading names; I have already done this for you). Now perform the following steps to determine the fraction of all transactions involving each type of product and the fraction of transactions taking place on each day of the week (remember Mr. Sashwat's objective).



Market Basket Analysis

Go to the Formulas tab, then Name Manager, or press Ctrl+F3.



Market Basket Analysis
In cell L7 compute the total number of transactions with the formula =COUNT(B9:B2936). This formula counts how many numbers occur in Column B, which gives you the number of transactions: 2928.

The fraction of transactions involving each product can then be computed with the INDIRECT function or directly with the COUNTIF function.



Market Basket Analysis
Use of the INDIRECT Function
To illustrate the use of INDIRECT, look at the file

In cell C4, I entered the formula =INDIRECT(A4). Excel returns the value 6 because the reference to A4 is immediately replaced by the text string “B4” stored in that cell, and the formula is evaluated as =B4, which yields the value 6. Similarly, entering the formula =INDIRECT(A5) in cell C5 returns the value in cell B5, which is 9.
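The two-step lookup that INDIRECT performs can be mimicked outside Excel. The sketch below models a tiny worksheet as a Python dictionary; the dictionary and the `indirect` helper are illustrative assumptions, not an Excel API.

```python
# Hypothetical miniature worksheet: A4 and A5 hold cell addresses as text,
# B4 and B5 hold the numbers mentioned in the example above.
sheet = {"A4": "B4", "B4": 6, "A5": "B5", "B5": 9}

def indirect(cell):
    """Mimic =INDIRECT(cell): read the text stored in `cell`,
    then return the value found at the address that text names."""
    target = sheet[cell]
    return sheet[target]

print(indirect("A4"))  # 6
print(indirect("A5"))  # 9
```

The same idea lets a formula refer to a named range whose name is typed into a cell, which is how INDIRECT is used with the product names.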
Market Basket Analysis
Copy the formula =COUNTIF(INDIRECT(K9),1)/$L$7 from L9 to cells L10:L14 to compute the fraction of transactions involving each product. Recall that COUNTIF counts the number of entries in a range matching a given number or text string (in this case 1). Any cell reference within an INDIRECT function is evaluated as the contents of the cell. Thus INDIRECT(K9) becomes the named range vegetables. This enables you to copy your COUNTIF formula down the column and pick up each range name in turn. The result shows that 60.7 percent of all transactions involve vegetables, and so on.

Instead of the INDIRECT function, we can also write the range name directly:

=COUNTIF(vegetables,1)/$L$7
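For readers who want to check the logic outside Excel, the COUNTIF step can be sketched in Python. The five transactions below are made-up 0/1 flags, not the marketbasket.xls data, and `fraction` is a hypothetical helper name.

```python
# Hypothetical purchase flags (1 = bought, 0 = not bought), one entry per
# transaction. These values are illustrative only.
columns = {
    "vegetables": [1, 0, 1, 1, 0],
    "fruit":      [0, 0, 1, 1, 0],
    "milk":       [1, 1, 0, 1, 1],
}
n = len(columns["vegetables"])  # plays the role of cell L7

def fraction(name):
    """Same idea as =COUNTIF(INDIRECT(name),1)/$L$7: count the 1s in the
    column called `name` and divide by the number of transactions."""
    return sum(1 for v in columns[name] if v == 1) / n

fractions = {name: fraction(name) for name in columns}
print(fractions)  # {'vegetables': 0.6, 'fruit': 0.4, 'milk': 0.8}
```

Looking the column up by its name in a dictionary plays the same role as INDIRECT resolving a range name stored in a cell.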



Market Basket Analysis
Copy the formula =COUNTIF(day_week,K17)/COUNT(day_week) from L17 to L18:L23 to determine the fraction of all transactions occurring on each day of the week. For example, 13.9 percent of all transactions occur on Monday, and so on. These calculations will be used in the next section when you compute three-way lifts.

Day of week    Fraction of transactions
1              13.9%
2              14.0%
3              13.4%
4              14.6%
5              14.3%
6              15.3%
7              14.4%
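The day-of-week shares follow the same counting pattern. A minimal sketch with a made-up `day_week` list (the eight values are illustrative, not the superstore data):

```python
from collections import Counter

# Hypothetical day codes, 1 = Monday ... 7 = Sunday.
day_week = [5, 1, 1, 3, 7, 5, 5, 2]

counts = Counter(day_week)

# Same idea as =COUNTIF(day_week,K17)/COUNT(day_week), once per day code.
share = {day: counts.get(day, 0) / len(day_week) for day in range(1, 8)}

print(share[5])  # 0.375
print(share[4])  # 0.0
```

Days that never occur get a share of 0 instead of being left out, which keeps the table at seven rows.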



Market Basket Analysis
Calculating Lifts for Multiple Two-way Product Combinations Simultaneously
Now you can use a two-way data table to compute the lift for any combination of two products. Enter the range names for any two products in the range N9:O9. To ease the selection of products, you can use the drop-down lists in cells N9 and O9. Start by putting the cursor on N9.



Market Basket Analysis
In cell P10 use the array formula =SUM((INDIRECT(N9)=1)*(INDIRECT(O9)=1)) to compute the number of times vegetables and fruit occur together. After typing in the formula, press Control+Shift+Enter instead of just Enter. This formula creates two arrays:
■ An array containing 1 whenever the entry in the vegetables column is 1 and 0 otherwise.
■ An array containing 1 whenever the entry in the fruit column is 1 and 0 otherwise.
The formula multiplies the two arrays pairwise and then adds the entries of the resulting array. The sum of the pairwise products is the number of transactions involving both fruit and vegetables (520).

When the columns contain only 0s and 1s, the same count can be obtained without an array formula:

=SUMPRODUCT(vegetables,fruit)
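The pairwise multiplication described above can be sketched in Python. The two columns are made-up 0/1 flags for six transactions, not the course data.

```python
# Hypothetical purchase flags for six transactions.
vegetables = [1, 0, 1, 1, 0, 1]
fruit      = [1, 0, 0, 1, 1, 1]

# Array-formula style: (column = 1) * (column = 1), summed over all rows.
# In Python, True * True evaluates to 1 and any other combination to 0.
both = sum((v == 1) * (f == 1) for v, f in zip(vegetables, fruit))

# SUMPRODUCT style: multiply the raw 0/1 entries and add them up.
both_sumproduct = sum(v * f for v, f in zip(vegetables, fruit))

print(both, both_sumproduct)  # 3 3
```

The two counts agree only because the entries are restricted to 0 and 1; with other codes in the columns, the comparison form is the safer one.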



Market Basket Analysis
In cell Q10 (predicted) use the formula
=IF(N9<>O9,VLOOKUP(N9,K9:L14,2,FALSE)*L7*VLOOKUP(O9,K9:L14,2,FALSE),0)
to compute the predicted number of transactions involving the two products, assuming independence. This formula computes the denominator of Equation 1. If you choose the same product twice, the formula returns 0.

For vegetables and fruit, the predicted number is 527.09.
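The logic of the Q10 formula can be written out as a short function. The fractions and the transaction count below are hypothetical round numbers chosen for illustration, and `predicted_count` is an illustrative name.

```python
# Hypothetical product fractions (the role of K9:L14) and transaction
# count (the role of L7). These are not the superstore values.
fractions = {"vegetables": 0.5, "fruit": 0.4}
n_transactions = 1000

def predicted_count(a, b):
    """Expected number of transactions containing both a and b if the two
    purchases were independent; 0 when the same product is chosen twice,
    as in the IF(N9<>O9, ..., 0) formula."""
    if a == b:
        return 0
    return fractions[a] * n_transactions * fractions[b]

print(round(predicted_count("vegetables", "fruit")))  # 200
print(predicted_count("fruit", "fruit"))              # 0
```

The dictionary lookup stands in for VLOOKUP with an exact match (the FALSE argument).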



Market Basket Analysis
In cell R10 compute the lift for these categories with the formula =IF(Q10=0,1,P10/Q10). If you chose the same item twice, the lift is simply set equal to 1. Otherwise, the formula divides the actual number of occurrences of fruit and vegetables together by the predicted number of occurrences (assuming fruit and vegetables are purchased independently).
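As a quick check, the R10 formula can be mirrored in Python using the actual count (520) and the predicted count (527.09) reported on the slides. The function name `lift` is an illustrative choice.

```python
def lift(actual, predicted):
    """Mirror of =IF(Q10=0,1,P10/Q10): return 1 when the predicted count
    is 0 (same product chosen twice), otherwise actual / predicted."""
    if predicted == 0:
        return 1
    return actual / predicted

print(round(lift(520, 527.09), 2))  # 0.99
print(lift(520, 0))                 # 1
```

A value this close to 1 means the observed number of joint purchases is almost exactly what independence would predict.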



Market Basket Analysis
The lift for fruit and vegetables (0.99) is close to 1, so it does not indicate a lack of independence. Taking this one step further, you can use a two-way data table to compute the lift for all two-product combinations simultaneously.
In cell O17 place the formula for the lift that you want to recalculate (=R10). R10 contains the lift for a generic two-product combination.



Market Basket Analysis

Select the table range O17:U23.

Select What-If Analysis from the Data Tools group on the Data tab and choose Data Table.... In the Data Table dialog box, enter N9 as the row input cell and O9 as the column input cell. After clicking OK, you have the lift for each two-product combination. For example, DVDs and baby goods have a relatively large lift of 1.4.
NOTE
The lift matrix is symmetric; that is, the entry in row i and column j of the lift matrix equals the entry in row j and column i.
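What the two-way data table produces can be sketched as a small lift matrix in Python. The eight transactions and the three column names below are made up for illustration, so the resulting lifts are not the superstore values.

```python
# Hypothetical 0/1 purchase flags for eight transactions.
columns = {
    "dvds": [1, 1, 0, 0, 1, 0, 1, 0],
    "baby": [1, 1, 0, 0, 0, 0, 1, 0],
    "milk": [1, 0, 1, 1, 1, 1, 0, 1],
}
names = list(columns)
n = len(columns["dvds"])

def pair_lift(a, b):
    """Actual joint count divided by the count predicted under independence;
    the lift is set to 1 when the same product is chosen twice."""
    if a == b:
        return 1.0
    actual = sum(x * y for x, y in zip(columns[a], columns[b]))
    predicted = sum(columns[a]) * sum(columns[b]) / n
    return actual / predicted if predicted else 1.0

# One entry per (row product, column product), like the data table output.
matrix = {(a, b): pair_lift(a, b) for a in names for b in names}

for a in names:
    print(a, [round(matrix[(a, b)], 2) for b in names])
```

Swapping the two products changes neither the actual nor the predicted count, which is why the matrix is symmetric and only the entries above the diagonal need to be inspected.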


Thank you !!!