
Market Basket Analysis

Software requirement specification

Project Guide:
Prof. Akhilesh Tiwari
Department of CSE & IT, MITS, Gwalior

Madhav Institute of Technology and Science
Gwalior-474005, M.P.

IBM

Assiduous Group

Team Members:
Anushri Jain
Shruti Goyal
Ashish Bandil
Ajit Singh Kushwah

Market Basket Analysis Software Requirement Specification Assiduous Group

Version I 30-01-2012

Table of Contents

1. Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Technologies to be used
1.6 Overview
2. Overall Description
2.1 Product Perspective
2.2 Software Interface
2.3 Hardware Interface
2.4 Product Functions
2.5 User Characteristics
2.6 Constraints
2.7 Architecture Design
2.8 Assumptions and Dependencies

1. Introduction:
1.1 Purpose:
The amount of data being collected in databases today far exceeds our ability to reduce and analyze it without automated analysis techniques. Many scientific and transactional business databases grow at a phenomenal rate. Knowledge discovery in databases (KDD) is the field that is evolving to provide automated analysis solutions.
In view of the above, the purpose of this project is to analyze market basket data in order to extract hidden trends and the buying behavior of customers.

1.2 Scope:
Suppose that, as the manager of an All Electronics branch, you would like to learn the buying habits of your customers. Specifically, you wonder: "Which groups or sets of items are customers likely to purchase on a given trip to the store?" To answer this question, market basket analysis may be performed on the retail data of customer transactions at your store. The results may be used to plan marketing or advertising strategies, as well as catalog design. For instance, market basket analysis may help managers design different store layouts. In one strategy, items that are frequently purchased together can be placed in close proximity in order to further encourage the sale of such items together.

Although market basket analysis conjures up pictures of shopping carts and supermarket shoppers, it is important to realize that there are many other areas in which it can be applied. These include:
- Analysis of credit card purchases.
- Analysis of telephone calling patterns.
- Identification of fraudulent medical insurance claims (consider cases where common rules are broken).
- Analysis of telecom service purchases.

1.3 Definitions, Acronyms, and Abbreviations:


1.3.1 Data Mining
Data mining refers to extracting or "mining" knowledge from huge amounts of data. Many people treat data mining as a synonym for another popularly used term, knowledge discovery in databases (KDD). Data mining is simply an essential step in the process of knowledge discovery in databases.

1.3.2 Data Mining Techniques
Fast technological change and related research have led to the development of many data mining techniques and systems. Because of the inherent differences in the data model, specific techniques are developed to mine different types of databases. Different classification schemes have been used in the literature to categorize data mining methods, based on:
- the kind of database to be studied (such as transactional, relational, spatial, temporal, multimedia, and Internet information databases);
- the kind of technique to be utilized (such as autonomous knowledge miner, data-driven miner, and query-driven miner);
- the kind of knowledge to be discovered.

According to the last classification scheme, the following are the common data mining techniques:
- Association Rule
- Classification
- Clustering
- Sequence Rule
- Generalization and Summarization

Since the proposed project is related to association rule mining, a brief description of association rule mining is given below.

1.3.3 Association Rule
Mining association rules in transactional or relational databases has recently attracted a lot of attention in the database community. The task is to find interesting associations or correlations among a large set of data, i.e. to identify sets of items or predicates that frequently occur together and then formulate rules that characterize their relationship. For example, one may find from a large set of transaction data an association rule such as: if a customer buys "X", he/she usually buys "Y" in the same transaction. Here "X" and "Y" are individual items or sets of items. Retail stores frequently use association rules to assist marketing, advertising, floor management and inventory control. Although association rules have direct applicability to retail business, they can also be used for other purposes.

A formal statement of the association rule problem is given in [1]. Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of m distinct literals called items. Let D be a set of variable-length transactions over I. Each transaction contains a set of items $\{i_1, i_2, \ldots, i_k\} \subseteq I$. An association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$. Here X is called the antecedent or body of the rule, and Y is called the consequent or head of the rule.
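The formal statement maps naturally onto set operations. The following is a minimal sketch in Python, used here only for illustration (it is not the project's J2EE implementation). The class name AssociationRule, its method names and the sample items are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationRule:
    antecedent: frozenset  # X, the body of the rule
    consequent: frozenset  # Y, the head of the rule

    def __post_init__(self):
        # The definition requires X and Y to have no item in common.
        if self.antecedent & self.consequent:
            raise ValueError("antecedent and consequent must be disjoint")

    def applies_to(self, transaction):
        """True if the transaction contains every item of X."""
        return self.antecedent <= transaction

    def holds_in(self, transaction):
        """True if the transaction contains every item of X and of Y."""
        return (self.antecedent | self.consequent) <= transaction


# Hypothetical transaction database D over a small item set I.
D = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "jam"}),
]

rule = AssociationRule(frozenset({"bread", "butter"}), frozenset({"milk"}))
matches = [rule.holds_in(t) for t in D]
```

Here the rule applies to the first two transactions (both contain X) but holds only in the first, which is the distinction that the support and confidence measures of Section 2.6 quantify.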

1.4 References:
[1] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 1993, pp. 207-216.
[2] Abraham Silberschatz, Henry F. Korth, and S. Sudarshan. Database System Concepts. The McGraw-Hill Companies, Fifth Edition, 2006.
[3] Margaret H. Dunham. Data Mining. Pearson Education, Seventh Edition, 2005.

1.5 Technologies to be used:

J2EE (Servlet, JSP, JAXP, Java Beans): Java Platform, Enterprise Edition (Java EE) is a widely used platform for server programming in the Java programming language. It differs from the Java Platform, Standard Edition (Java SE) in that it adds libraries which provide functionality to deploy fault-tolerant, distributed, multi-tier Java software, based largely on modular components running on an application server.

Java: Application architecture. Java is an object-oriented programming language developed by Sun Microsystems, a company best known for its high-end UNIX workstations. The Java language was designed to be small, simple, and portable across platforms and operating systems, both at the source and at the binary level, which means that Java programs (applets and applications) can run on any machine that has the Java Virtual Machine (JVM) installed.

DB2 9.7: IBM database. DB2 is the database management system that delivers a flexible and cost-effective database platform to build robust on-demand business applications, and it supports the J2EE and web services standards.

RAD 7.0: Development tool. IBM Rational Application Developer for WebSphere Software is an integrated development environment (IDE), made by IBM's Rational Software division, for visually designing, constructing, testing, and deploying web services, portals, and Java (J2EE) applications.

1.6 Overview:

Overall Description: Processes during the tenure of the project
(i) Study of the Apriori algorithm
(ii) Data collection
(iii) Implementation of the Apriori algorithm
(iv) Development of the user interface
(v) Application of Apriori to the collected market basket data
(vi) Analysis of results

Specific Requirements: Real-life dataset (market basket data)
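Steps (i) and (iii) refer to the Apriori algorithm, which this document does not spell out. The following is a minimal sketch of its frequent-itemset phase in Python, for illustration only: the project itself targets Java, and the function name apriori, the sample baskets and the threshold are assumptions. It uses the usual level-wise scheme, joining frequent (k-1)-itemsets into candidate k-itemsets, pruning candidates that have an infrequent subset, and then counting.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data, keeping only candidates that are frequent.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical market basket data; min_support is an absolute count.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

With this data every single item and every pair is frequent, while the triple {bread, butter, milk} occurs in only two baskets and is dropped. Rule generation from the frequent itemsets is a separate step and is not shown.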

2. Overall Description:

2.1 Product Perspective:


Client Tier
It implements the "look and feel" of an application. It is responsible for the presentation of data, receiving user events and controlling the user interface. Most e-commerce applications are web-based. The programming languages used are a combination of HTML, CSS and JavaScript.

Application Tier
This layer implements the business logic of the application. It is usually powered by a Java application server (WebSphere). There are several sub-layers within the application layer.

Data Tier
This is the layer that manages the persistence of application information. It is usually powered by a relational database server (for example MS SQL Server; this project uses DB2 9.7, see Section 2.2). Stored procedures and functions are used to execute database server-side processes pertinent to data integrity. In general, business logic processes should be part of the application layer, not of the data layer.

Fig 1: Object-oriented scenario (of three-tier architecture)

2.2 Software Interface:

Front End Client: HTML, Dreamweaver
Web Server: Apache, Tomcat, WebSphere
Back End: DB2 9.7

2.3 Hardware Interface:


Minimum requirements:

Tier         Software             Processor                         RAM     Disk Space
Client Side  Internet Explorer 6  Intel Pentium III or AMD 800 MHz  128 MB  100 MB
Server Side  WebSphere            Intel Pentium III or AMD 800 MHz  1 GB    3.5 GB
Data Tier    DB2                  Intel Pentium III or AMD 800 MHz  256 MB  500 MB

2.4 Product Functions:


The proposed product will include the following functions:

1. Specify input data: Define the data to be mined. The data may be in the form of a dataset file or any other file.
2. Preprocess the input data.
3. Select technique/algorithm: Select the appropriate data mining algorithm.
4. Work on results: Select visualization tools to analyze the result.
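Functions 1 and 2 can be sketched as follows. This is a minimal Python illustration, not the product's actual loader: the input format (one transaction per line, items separated by commas), the function name load_transactions and the cleaning rules are all assumptions, since the specification does not fix a file format.

```python
import csv
import io


def load_transactions(lines, delimiter=","):
    """Parse one transaction per line into a list of item sets."""
    transactions = []
    for row in csv.reader(lines, delimiter=delimiter):
        # Preprocessing: trim whitespace, lowercase, drop empty fields.
        # Building a set also removes items repeated within a line.
        items = {field.strip().lower() for field in row if field.strip()}
        if items:  # skip blank lines
            transactions.append(frozenset(items))
    return transactions


# Hypothetical input; an open file object could be passed in the same way.
sample = io.StringIO("Bread, Butter ,milk\n\nbread,bread,jam\n")
transactions = load_transactions(sample)
```

The result is a list of item sets, which is the form a mining algorithm selected in function 3 would typically consume.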
2.5 User Characteristics:
Users can be characterized as:
1. General (non-technical) user: This category includes general users who have no technical knowledge.
2. Technical user: This category includes users who have technical knowledge.
3. Analyst: This category includes users who are able to analyze the data as well as the results.

2.6 Constraints:
The proposed application requires a user-specified support and confidence framework as constraints, described below.

Support and Confidence Framework:

Support: The support of a rule is the number of transactions that include all items in the antecedent and consequent parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.)
- It is a measure of how often the collection of items in an association occurs together, as a percentage of all transactions.
- Example: in 2% of the purchases at a hardware store, both a pick and a shovel were bought.
- Rules originating from the same itemset have identical support but can have different confidence.
- support = #tuples(LHS, RHS) / N, where N is the total number of transactions.

Confidence: The confidence of a rule is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent.
- The confidence of the rule "B given A" is a measure of how likely it is that B occurs when A has occurred.
- A confidence of 100% means that B always occurs if A has occurred.
- confidence = #tuples(LHS, RHS) / #tuples(LHS)
- Example: bread and butter => milk [confidence 90%, support 1%]

For example, if a supermarket database has 100,000 point-of-sale transactions, out of which 2,000 include both items A and B and 800 of these include item C, the association rule "If A and B are purchased then C is purchased on the same trip" has a support of 800 transactions (alternatively 0.8% = 800/100,000) and a confidence of 40% (= 800/2,000). One way to think of support is that it is the probability that a randomly selected transaction from the database will contain all items in the antecedent and the consequent, whereas the confidence is the conditional probability that a randomly selected transaction will include all the items in the consequent given that the transaction includes all the items in the antecedent.
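The two formulas can be checked with a short Python sketch (illustration only; the function names are hypothetical). To keep the data small, the supermarket example is scaled down by a factor of 100, which is an assumption of this sketch: 1,000 transactions, 20 containing both A and B, and 8 of those also containing C. The proportions, and therefore the support and confidence, are unchanged.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, lhs, rhs):
    """support = #tuples(LHS, RHS) / N"""
    both = frozenset(lhs) | frozenset(rhs)
    return support_count(transactions, both) / len(transactions)


def confidence(transactions, lhs, rhs):
    """confidence = #tuples(LHS, RHS) / #tuples(LHS)"""
    both = frozenset(lhs) | frozenset(rhs)
    return support_count(transactions, both) / support_count(transactions, lhs)


# Scaled-down version of the supermarket example (hypothetical data).
db = [{"A", "B", "C"}] * 8 + [{"A", "B"}] * 12 + [{"D"}] * 980

s = support(db, {"A", "B"}, {"C"})            # 8 / 1000 = 0.8%
c = confidence(db, {"A", "B"}, {"C"})         # 8 / 20   = 40%

# The reversed rule comes from the same itemset {A, B, C}: the support is
# identical, but the confidence differs.
s_rev = support(db, {"C"}, {"A", "B"})        # 8 / 1000 = 0.8%
c_rev = confidence(db, {"C"}, {"A", "B"})     # 8 / 8    = 100%
```

The reversed rule illustrates the remark above that rules originating from the same itemset share their support but can have different confidence.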

2.7 Architecture Design:


The architecture of the developed product is based on the three-tier architecture. The architecture of a database system is greatly influenced by the underlying computer system on which the database system runs. Database systems can be centralized, or client-server, where one server machine executes work on behalf of multiple client machines.

In a three-tier architecture, the client machine acts merely as a front end and does not contain any direct database calls. Instead, the client end communicates with an application server, usually through a forms interface. The application server in turn communicates with a database system to access data. The business logic of the application, which says what actions to carry out under what conditions, is embedded in the application server, instead of being distributed across multiple clients. Three-tier applications are more appropriate for large applications and for applications that run on the World Wide Web. The architecture is given in [2].

Fig 2: Three tier architecture

Client Tier
It implements the "look and feel" of an application. It is responsible for the presentation of data, receiving user events and controlling the user interface. Most e-commerce applications are web-based. The programming languages used are a combination of HTML, CSS and JavaScript. JSP or ASP is used for dynamic content.
- HTML is a web authoring markup language for defining content structure and rendering a web page.
- JavaScript is commonly used for client-side validation. JavaScript also has some control over the look and feel of a page in dynamic HTML.

Application Tier
This layer implements the business logic of the application. It is usually powered by a Java application server (WebLogic or WebSphere). There are several sub-layers within the application layer:
- Control Layer is the interface layer between the presentation tier and the application tier. The implementation of this layer depends on the languages used for implementing the presentation tier.
- Transaction Layer usually implements business processes that may involve many business objects. In the J2EE architecture, session beans are commonly used for implementing the transaction layer. The Transaction Layer and the Business Object Layer are not constrained by the programming languages used for the presentation or by the database used for persistence.
- Business Object Layer consists of objects that represent business entities, which should always be 100% independent of the database used for data persistence.
- Data Access Object (DAO) Layer is the interface between the application tier and the persistence tier. Besides the methods for "creating", "retrieving", "updating" and "removing" a business object from the database, DAO objects implement other business-specific methods as well. Even with JDBC, DAO objects may not be 100% database independent.

Data Tier
This is the layer that manages the persistence of application information. It is usually powered by a relational database server (for example MS SQL Server; this project uses DB2 9.7, see Section 2.2). Stored procedures and functions are used to execute database server-side processes pertinent to data integrity. In general, business logic processes should be part of the application layer, not of the data layer.
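The separation between the control, transaction and DAO layers can be sketched in a few lines. This is a Python illustration under stated assumptions, not the project's design: the real system uses J2EE components and DB2, whereas here Python's built-in sqlite3 module stands in for the data tier only so that the sketch is runnable, and the names BasketDAO, BasketService, handle_request and the table basket_item are hypothetical.

```python
import sqlite3


class BasketDAO:
    """DAO layer: the only place that issues SQL against the data tier."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS basket_item (basket_id INTEGER, item TEXT)"
        )

    def add_basket(self, basket_id, items):
        self.conn.executemany(
            "INSERT INTO basket_item (basket_id, item) VALUES (?, ?)",
            [(basket_id, item) for item in items],
        )

    def all_baskets(self):
        baskets = {}
        rows = self.conn.execute("SELECT basket_id, item FROM basket_item")
        for basket_id, item in rows:
            baskets.setdefault(basket_id, set()).add(item)
        return list(baskets.values())


class BasketService:
    """Transaction layer: business logic, with no SQL and no presentation."""

    def __init__(self, dao):
        self.dao = dao

    def pair_support(self, item_a, item_b):
        baskets = self.dao.all_baskets()
        if not baskets:
            return 0.0
        hits = sum(1 for b in baskets if item_a in b and item_b in b)
        return hits / len(baskets)


def handle_request(service, params):
    """Control layer: turns a client request into a service call and a reply."""
    value = service.pair_support(params["a"], params["b"])
    return f"support({params['a']}, {params['b']}) = {value:.0%}"


dao = BasketDAO(sqlite3.connect(":memory:"))
dao.add_basket(1, ["pick", "shovel"])
dao.add_basket(2, ["pick"])
service = BasketService(dao)
reply = handle_request(service, {"a": "pick", "b": "shovel"})
```

Because only BasketDAO knows about SQL, swapping the database would, in this sketch, leave BasketService and handle_request untouched, which is the independence the layer descriptions above aim for.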

2.8 Assumptions and Dependencies:


The application assumes and depends on the user-specified support and confidence framework described in Section 2.6 (Constraints).

Special Thanks
We convey special thanks to our department and to our college. We also convey special thanks to all the software and websites that have helped us a great deal in carrying out the project.
