
PART 1 Concepts:

Data mining: The process of applying querying and sorting techniques to
large volumes of data in search of patterns. Businesses often use data
mining to understand supply and demand patterns.
Data warehousing: The design of how data is stored across databases, and
of how those databases are linked, so as to support data mining.
Business Intelligence: The process of converting large volumes of data
into actionable knowledge.
Relationship: Efficient data warehousing aids data mining, and both data
mining and data warehousing are business intelligence tools.

PART 2 Visualization using Tableau:


1. Retail price vs. Sales volume

Laptop retail prices range from $300 to $650. The most common prices
fall between $450 and $500, as indicated by the two spikes in the
center of the distribution.
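Reading the "most common price range" off a histogram amounts to counting sales per price bin and picking the fullest bin. A minimal Python sketch, assuming $50-wide bins and a small hypothetical price list (the report's data and the bin size used in Tableau are not given):

```python
from collections import Counter

# Hypothetical laptop retail prices in dollars (illustrative, not the report's data).
prices = [310, 395, 455, 460, 470, 480, 485, 490, 495, 520, 560, 640]

def price_band_counts(prices, width=50):
    """Count sales per price band; a band with lower bound b covers [b, b + width)."""
    return Counter((p // width) * width for p in prices)

def most_common_band(prices, width=50):
    """Return (lower, upper) bounds of the band containing the most sales."""
    lower, _count = price_band_counts(prices, width).most_common(1)[0]
    return lower, lower + width

print("range:", min(prices), "to", max(prices))
print("most common band:", most_common_band(prices))
```

With this sample the fullest band is (450, 500); changing `width` changes how sharply the peak shows up, much as changing the bin size does in a histogram.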

2. Retail price vs. Hard disk size

There is a positive correlation between hard disk size and retail
price, i.e., retail price increases as hard disk size increases.
However, the relationship is weak for the 80 GB and 120 GB hard
disks: the median price for both sizes is $480. Hence, if the hard
disk categories had to be reduced from four to three, the 80 GB and
120 GB disks could be combined into one category. Since the two
categories share the same price point (median $480), combining them
should have little effect on sales figures, and it would also help
maintain the profit margin.
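The comparison above rests on the median price per hard disk size, and on checking that merging two sizes leaves that median unchanged. A minimal sketch with hypothetical records; the sizes 40 GB and 300 GB and all prices are placeholders, and `merge_sizes` is an illustrative helper, not a Tableau feature:

```python
from collections import defaultdict
from statistics import median

# Hypothetical (hard_disk_gb, retail_price) records; not the report's data.
sales = [
    (40, 420), (40, 430), (40, 440),
    (80, 470), (80, 480), (80, 490),
    (120, 475), (120, 480), (120, 485),
    (300, 520), (300, 530), (300, 540),
]

def median_price_by_size(records):
    """Group prices by hard disk category and return each category's median price."""
    groups = defaultdict(list)
    for size, price in records:
        groups[size].append(price)
    return {size: median(prices) for size, prices in groups.items()}

def merge_sizes(records, sizes, label):
    """Relabel every record whose size is in `sizes` with one combined category `label`."""
    return [(label if size in sizes else size, price) for size, price in records]

print(median_price_by_size(sales))
merged = merge_sizes(sales, {80, 120}, "80/120")
print(median_price_by_size(merged))
```

In this sample both 80 GB and 120 GB have a median of $480, and the merged "80/120" category keeps that median while the number of categories drops from four to three.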

3. Retail price vs. Customer-store distance

The scatter plot above shows retail price against customer-store
distance. The plot shows no visible relationship between the two
variables. Hence customer-store distance cannot be used as a predictor
of retail price.
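A visual "no relationship" can be backed by a number: a Pearson correlation coefficient close to 0. A minimal sketch computing it from scratch on hypothetical distance and price values (units are arbitrary here; the report's data is not reproduced):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical customer-store distances and retail prices ($).
distance = [1, 2, 3, 4, 5, 6]
price = [480, 500, 460, 460, 500, 480]

print(f"r = {pearson_r(distance, price):.3f}")
```

Values of r near 0 indicate no linear relationship, while values near +1 or -1 indicate a strong one. Note that r only captures linear association, so a scatter plot is still worth checking.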

Scatter plot showing outliers


The scatter plot above shows the outliers in customer-store distance.

4. Sales volume vs. Store postcode

According to the bar chart above, the store with the highest sales
volume is SW1P 3AU and the store with the lowest is S1P 3AU.
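The bar chart is an aggregation of sales by store followed by a ranking. A minimal sketch, assuming a hypothetical transaction log with one entry per laptop sold and placeholder store IDs instead of real postcodes:

```python
from collections import Counter

# Hypothetical transaction log: one entry per laptop sold, keyed by a placeholder store ID.
transactions = ["STORE_A", "STORE_B", "STORE_A", "STORE_C", "STORE_A", "STORE_B"]

def rank_stores(transactions):
    """Return (store, units_sold) pairs ordered from highest to lowest sales volume."""
    return Counter(transactions).most_common()

ranking = rank_stores(transactions)
top_store, top_units = ranking[0]
bottom_store, bottom_units = ranking[-1]
print(f"highest: {top_store} ({top_units}), lowest: {bottom_store} ({bottom_units})")
```

A store with zero sales would not appear in the log at all, so a full store list would be needed to detect it.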

Box plots of the sales records for each store


The box plot shows no outliers in the sales volume of any store, since
no value falls below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR (where Q1
is the lower quartile, Q3 the upper quartile and IQR = Q3 - Q1 the
interquartile range).
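The outlier rule behind a box plot's whiskers can be written out directly. A minimal sketch of Tukey fences, assuming the conventional factor k = 1.5 (left as a parameter) and hypothetical sales values. Quartile conventions vary between tools; `method="inclusive"` is one choice offered by Python's `statistics.quantiles`:

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" is one of several quartile conventions; other tools may differ slightly.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values lying strictly outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical sales-volume values for one store (illustrative only).
store_sales = [120, 135, 140, 150, 155, 160, 170, 400]
print(find_outliers(store_sales))
print(find_outliers(store_sales[:-1]))
```

A larger k widens the fences, so any set with no outliers at a smaller k also has none at k = 1.5.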

5. Sales volume vs. Avg. customer-store distance (with outliers)

The scatter plot above suggests a negative (inverse) correlation
between customer-store distance and sales volume.
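An inverse correlation corresponds to a negative slope in a least-squares trend line. A minimal ordinary-least-squares sketch on hypothetical per-store values (not the report's data):

```python
def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares; return (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical (avg customer-store distance, sales volume) per store.
avg_distance = [1.0, 2.0, 3.0, 4.0, 5.0]
volume = [500, 430, 410, 330, 300]

intercept, slope = least_squares_line(avg_distance, volume)
print(f"volume = {intercept:.1f} + ({slope:.1f}) * distance")
```

Here the fitted slope is -50, meaning each extra unit of average distance is associated with 50 fewer units sold in this made-up sample.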

Sales volume vs. Avg. customer-store distance (w/o outliers)


The outliers are the stores SW18 1NN, E7 8NW, CR7 8LE and KT2 5AU.
Removing these outliers reduces the p-value of the trend line to less
than 0.0001.
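The p-value of a simple linear trend line should coincide with the t-test of "slope = 0", whose statistic is t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. A larger |t| means a smaller p-value. The sketch below compares t with and without outliers. The store IDs and values are hypothetical, and `OUT_1`/`OUT_2` are placeholder outliers. Converting t to a p-value needs a t-distribution CDF, for example `2 * scipy.stats.t.sf(abs(t), n - 2)` if SciPy is available. That step is omitted so the block runs with the standard library only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope_t_statistic(points):
    """t statistic for H0: slope = 0 in a simple linear fit (n - 2 degrees of freedom)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    r = pearson_r(xs, ys)
    return r * sqrt((len(points) - 2) / (1 - r * r))

# Hypothetical per-store (avg customer-store distance, sales volume); OUT_1 and OUT_2
# are placeholder outliers, not the report's stores.
stores = {
    "STORE_A": (1.0, 500), "STORE_B": (2.0, 430), "STORE_C": (3.0, 410),
    "STORE_D": (4.0, 330), "STORE_E": (5.0, 300),
    "OUT_1": (1.0, 100), "OUT_2": (5.0, 700),
}
outlier_ids = {"OUT_1", "OUT_2"}

t_all = slope_t_statistic(list(stores.values()))
t_kept = slope_t_statistic([v for k, v in stores.items() if k not in outlier_ids])
print(f"t with outliers: {t_all:.2f}, t without outliers: {t_kept:.2f}")
```

In this made-up sample the two outliers mask the downward trend entirely. Removing them yields a large negative t and hence a much smaller p-value.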
