

SuperMarket :
A supermarket is a self-service shop offering a wide variety of food and household products. It is larger and
has a wider selection than a traditional grocery store. A supermarket typically comprises Meat, Vegetables, Dairy and
Baked Goods sections, along with shelf space reserved for canned and packaged goods as well as for various non-food
items such as kitchenware, household cleaners, pharmacy products and pet supplies.
Supermarkets usually offer products at relatively low prices by using their buying power to buy goods from
manufacturers at lower prices than smaller stores can. Supermarkets also provide discounts on products to increase their

Details About Columns :

1) Sl.No :
It is the serial number of the record, which keeps the data in order.

2) Product_id :
It contains the product identification number, which makes the product easy to recognize. It contains unique product
3) Product_Name :
It contains the product name.
4) Quantity :
It contains the quantity of each product, measured in kilograms or litres.
5) Manufactured Date :
The manufactured date printed on the label is the date on which the product was produced in compliance with
good manufacturing regulations.
6) Expiry Date :
The date after which the product can no longer be used.
7) Stock Level :
The maximum stock level is a not-to-be-exceeded amount used for inventory planning. The stock level is based
on a calculation involving the cost of storage and standard order quantities.

8) Reorder Level :
The level of inventory that triggers an action to replenish that particular stock. It is the
minimum amount of an item that a firm holds in stock, such that when stock falls to this amount the
item must be reordered.

9) Order Quantity :
The total number of stock-keeping units that have been ordered from a supplier. These units are not
counted as part of the quantity on hand until they actually arrive.
10) Buying Price :
The purchase price is the price an investor pays for an investment, and this price
becomes the investor's cost basis for calculating a gain or loss when the investment is sold.
11) Selling Price :
Selling price is the final amount of money that a buyer and seller have agreed upon when a product sells.
12) Profit :
A financial gain, especially the difference between the amount earned and the amount spent in buying,
operating or producing something.
13) Category :
A class or division of things regarded as having particular shared characteristics.

14) Sales per Day :
The average daily sales, calculated by dividing the sales generated during the accounting period by the
number of days in the period.
15) Sales per Month :
Total unit sales per month. These figures are calculated by multiplying the annual forecasted sales by
the forecasted sales percentage for each month.
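
A few of the columns above are derived from others, and a short sketch may make the relationships concrete. The Python below is only an illustration: the `Product` class, its field names (including `on_hand` for the current stock) and the sample values are assumptions made here, not part of the dataset.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical field names that mirror the columns described above.
    name: str
    on_hand: float        # current quantity in stock (kg or litres)
    reorder_level: float
    buying_price: float
    selling_price: float


def profit_per_unit(p: Product) -> float:
    """Profit as the difference between selling price and buying price."""
    return p.selling_price - p.buying_price


def needs_reorder(p: Product) -> bool:
    """True once stock has fallen to (or below) the reorder level."""
    return p.on_hand <= p.reorder_level


def average_daily_sales(total_sales: float, days_in_period: int) -> float:
    """Sales generated in the period divided by the number of days in it."""
    if days_in_period <= 0:
        raise ValueError("days_in_period must be positive")
    return total_sales / days_in_period


# Made-up sample values, for illustration only.
rice = Product("Rice", on_hand=20, reorder_level=25,
               buying_price=40.0, selling_price=46.0)
print(profit_per_unit(rice), needs_reorder(rice), average_daily_sales(900, 30))
```

With these sample values the item is flagged for reorder because its stock (20) is below its reorder level (25).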

Process of Preparing Data :
Collect the data from the sources, prepare an Excel spreadsheet and save the file. Most of the time data is
stored in a Microsoft Excel spreadsheet, and we can save this data in comma-separated (CSV) format. Open
the Excel file and select the Save As... item from the File pull-down menu. Then, in the ensuing
dialog box, select CSV as the file type and save the file.
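
Once the data is in CSV form it can be read by almost any tool. As a minimal sketch, the Python below parses CSV text with the standard `csv` module. The sample rows and the product ids are invented for illustration, and only a few of the columns are shown.

```python
import csv
import io

# Hypothetical sample; the real SuperMarket.csv has more columns and rows.
SAMPLE_CSV = """Sl.No,Product_id,Product_Name,Quantity,Category
1,P001,Rice,50,Grocery
2,P002,Milk,30,Dairy
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
print(rows[0]["Product_Name"], len(rows))
```

To read an actual file instead, the same `csv.DictReader` can be given a file object opened with `open(path, newline="")`.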

Dataset in Excel Format (1-24)

Dataset in Excel Format (25-50)

Dataset in Excel Format (51-74)

Dataset in Excel Format (74-100)

Converting xls format into csv format

The CSV file can be opened in a text editor, as shown in the figure below.

CSV file opened in Wordpad

Loading the Data in Weka Tool:-

Weka front panel

Weka can read “.csv” format files. Click “Open file” and navigate to the directory containing the
data file (.csv or .arff). We opened the SuperMarket.csv data file.

Open file Dialogue box

Clicking on any attribute in the left panel will show the basic statistics on that attribute. For categorical attributes, the
frequency for each attribute value is shown, while for continuous attributes we can obtain min, max, mean, standard
deviation, etc.
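
The same kind of summary can be sketched outside Weka. The Python below is an illustration only: the sample values are made up, and the use of the sample standard deviation is a choice made here rather than a statement about how Weka computes it.

```python
import statistics
from collections import Counter


def numeric_summary(values):
    """Min, max, mean and sample standard deviation of a numeric attribute."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def nominal_summary(values):
    """Frequency of each distinct value of a categorical attribute."""
    return dict(Counter(values))


quantities = [10, 20, 30, 40]            # hypothetical Quantity values
categories = ["Dairy", "Meat", "Dairy"]  # hypothetical Category values
print(numeric_summary(quantities))
print(nominal_summary(categories))
```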

Panel showing attributes and their statistics

Visualizing All Attributes

Selecting and filtering Attributes

We need to remove the Sl.No and Product_id attributes before the data-mining step. First tick the check boxes
corresponding to the attributes Sl.No and Product_id. Then click the Remove button at the bottom left.
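
Removing attributes amounts to dropping columns from every record. A minimal Python sketch of that operation follows. The helper name `remove_attributes` and the two sample rows are assumptions for illustration, and this is not how Weka implements its Remove filter.

```python
def remove_attributes(rows, names):
    """Return copies of the rows without the listed attribute names."""
    drop = set(names)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


# Hypothetical two-row sample using column names from this report.
rows = [
    {"Sl.No": "1", "Product_id": "P001", "Product_Name": "Rice", "Category": "Grocery"},
    {"Sl.No": "2", "Product_id": "P002", "Product_Name": "Milk", "Category": "Dairy"},
]
cleaned = remove_attributes(rows, ["Sl.No", "Product_id"])
print(cleaned[0])
```

The original rows are left untouched, which mirrors keeping SuperMarket.csv as it was and saving the reduced data separately.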

We have made some changes to the original data in SuperMarket.csv. To save the new working data as an
ARFF file, click the Save button in the top panel.

File save Dialogue box

Open the saved SuperMarket.arff file in WordPad. The figure below shows the top portion of the newly generated
ARFF file (in WordPad). Note that in the new dataset, the "Sl.No" and "Product_id" attributes and all the
corresponding values in the records have been removed. Open the file and observe the first line:

@relation SuperMarket-final-weka.filters.unsupervised.attribute.Remove-R1-2

This statement simply describes the operation that has been performed on the dataset so far. Attributes can be
of either numeric or nominal type.
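
To make the ARFF layout concrete, the Python sketch below builds a small ARFF text with one numeric and one nominal attribute. The `to_arff` helper and the sample records are assumptions for illustration. It does not quote values that contain spaces or commas, which a real ARFF file would need.

```python
def to_arff(relation, attributes, records):
    """Build ARFF text.

    attributes: list of (name, kind) pairs, where kind is either the string
    "numeric" or a list of the allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attributes list their allowed values inside braces.
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines += ["", "@data"]
    for record in records:
        lines.append(",".join(str(value) for value in record))
    return "\n".join(lines) + "\n"


# Hypothetical attributes and records, for illustration only.
text = to_arff(
    "SuperMarket",
    [("Quantity", "numeric"), ("Category", ["Dairy", "Meat"])],
    [(30, "Dairy"), (12, "Meat")],
)
print(text)
```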

ARFF file in WordPad

Some techniques, such as association rule mining, can only be performed on categorical data. This
requires performing discretization on numeric or continuous attributes.

Select weka.filters.unsupervised.attribute.Discretize. The text box in the filter dialog box will show
something like Discretize -B 10 -M -1.0 -R first-last -precision 6.
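
The -B 10 option asks for 10 bins. As a rough sketch of equal-width binning, the Python below maps each numeric value to a bin index. The sample prices are made up, and how a value lying exactly on a bin boundary is assigned is a choice made here. Weka's own bin edges and interval labels may differ.

```python
def equal_width_bins(values, bins=10):
    """Assign each value a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


prices = [0, 5, 10, 25, 49, 50, 100]  # hypothetical Selling Price values
print(equal_width_bins(prices, bins=10))
```

After this step each price is represented by a category (its bin) rather than a number, which is what association rule mining requires.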

Binning a Continuous Variable

Classification is a data mining function that assigns items in a collection to target categories or classes. The
goal of classification is to accurately predict the target class for each case in the data. For example, a
classification model could be used to identify loan applicants as low, medium, or high credit risks.

Select the classifier class weka.classifiers.trees.J48, then click Start.
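
J48 is Weka's implementation of the C4.5 decision tree learner, which chooses splits using entropy-based measures. The Python sketch below computes plain information gain for one nominal attribute on toy data. It is only an illustration of the idea: C4.5 normalizes this quantity into a gain ratio and also handles numeric attributes and pruning, none of which is shown here. The rows and attribute values are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after the split."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy data; attribute names and values are illustrative only.
rows = [
    {"Category": "Dairy", "Profit": "high"},
    {"Category": "Dairy", "Profit": "high"},
    {"Category": "Meat", "Profit": "low"},
    {"Category": "Meat", "Profit": "low"},
]
print(information_gain(rows, "Category", "Profit"))
```

Here the split on Category separates the two classes perfectly, so the gain equals the full entropy of the target.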

Result of J48 decision tree construction

Decision tree using J48

Weka also lets us view a graphical rendition of the classification tree. This can be done by right-clicking the
last result set and selecting “Visualize tree” from the pop-up menu.

Decision Tree

Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called
clusters. It helps users understand the natural grouping or structure in a data set. Clustering is unsupervised
classification: there are no predefined classes.

To perform clustering, select the Cluster tab in the Explorer and click on the box below the “Clusterer” label.
This opens a drop-down list of available clustering algorithms. In this case we select “SimpleKMeans”,
which opens a pop-up window.
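
The idea behind k-means can be sketched in a few lines. The Python below runs the two alternating steps (assign each value to its nearest centroid, then move each centroid to the mean of its cluster) on one-dimensional data. It is a simplified illustration, not Weka's SimpleKMeans: the starting centroids are fixed by hand here so the result is reproducible, and the sample quantities are made up.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means on 1-D data starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, clusters


quantities = [1, 2, 3, 10, 11, 12]  # hypothetical Quantity values
centers, groups = kmeans_1d(quantities, centroids=[1, 12])
print(centers, groups)
```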

Choosing cluster algorithm

Simple k-means dialogue box

Result of k-means algorithm

Exploring the cluster result in various options

Association Rule Mining :
Clicking on the Associate tab brings up the interface for the Apriori algorithm. Clicking the Choose
button shows that Apriori is one of the three association algorithms provided. The Apriori dialog
box appears after clicking in the Associator text box, which shows the default command line. The dialog box is
depicted in the figure. We can specify various parameters associated with Apriori. Click on the More button to see
the synopsis for the different parameters.

Once the parameters have been set, the Associator text box shows the new command line. We now
click Start to run the program.
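
Association rules are usually judged by measures such as support and confidence. The Python sketch below computes both for a rule on a handful of invented baskets. It illustrates the measures only. It does not perform Apriori's candidate generation and pruning, and the baskets are not taken from the SuperMarket dataset.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, over support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

In this toy data, bread and milk appear together in half of the baskets, and two of the three baskets containing bread also contain milk. Note that `confidence` assumes the antecedent occurs at least once; otherwise the division fails.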

Selecting Apriori in Associations

Result of Association Rule Mining

Result of Association Rule Mining

Select Attributes :
Click on the Select attributes tab, then click Start.

Result of select Attributes

Visualizing All Attributes :