Submitted by
16MIS0166 P.Medhavini
16MIS0401 V.Krithika
ABSTRACT
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
1.2 OBJECTIVE OF THE WORK
1.3 SCOPE OF THE WORK
CHAPTER 2
LITERATURE REVIEW
2.1 INTRODUCTION
2.2 BACKGROUND
2.3 CHALLENGES
2.4 PROBLEM DEFINITION AND APPROACH
CHAPTER 3
EXPERIMENTAL DETAILS
CHAPTER 4
CHAPTER 5
INTRODUCTION
1.1 INTRODUCTION
A Big Mart is a shopping mall that sells a wide variety of household goods,
eatables, electronic devices, garments and groceries at a large scale. The sales
of a product may vary from season to season. For instance, customers buy air
conditioners in large numbers during summer and far fewer in winter. When the
sales of products vary, the employees of Big Mart may not know what sales to
expect or how much stock is needed. In this case, sales forecasting plays an
important role: it predicts the sales of each product with the help of the
cumulative sales report. To predict future sales accurately, a suitable
algorithm is required. Decision trees are predictive machine learning models. A
decision tree model predicts a class or value for a case once training, pruning
and testing are complete. Decision trees are mainly of two types:
1) CLASSIFICATION TREES.
2) REGRESSION TREES.
Regression trees are used when the data is continuous and the target is also
numerical. For example, if the target is to predict the sales forecast of Big
Mart, the price of a house or the setting of an apparatus, regression-type
decision trees are mostly preferred.
1.2 OBJECTIVE
The aim is to build a predictive model and find the sales of each product at a
particular store. Using this model, Big Mart will try to understand the
properties of products and stores which play a key role in increasing sales, and
where to improve the marketing or stop selling a product.
future sales.
LITERATURE REVIEW
2.1 INTRODUCTION
Big Mart is a wholesale shopping mall where people can purchase all the items
they need. Predicting sales is important for increasing the profits of the mart
and for controlling the stock. Many machine learning algorithms are used for
this purpose. These algorithms are trained using the cumulative sales report and
tested on future sales. However, not all algorithms achieve the same prediction
accuracy. Neural networks were the most widely used method for prediction in the
papers reviewed. The derivative analysis shows that the neural network model is
able to capture the dynamic nonlinear trend and seasonal patterns, as well as the
interactions between them. However, we use the decision tree regression model,
which forecasts product sales with a low error rate and high accuracy.
2.2 BACKGROUND
We use pandas for handling data and NumPy for handling numerical operations on
arrays.
Pandas
Python has long been great for data munging and preparation, but less so for data
analysis and modeling. pandas helps fill this gap, enabling you to carry out your
entire data analysis workflow in Python without having to switch to a more domain
specific language like R.
Combined with the excellent IPython toolkit and other libraries, the environment
for doing data analysis in Python excels in performance, productivity, and the
ability to collaborate. pandas does not implement significant modeling
functionality outside of linear and panel regression; for this, look to statsmodels
and scikit-learn. More work is still needed to make Python a first class statistical
modeling environment, but we are well on our way toward that goal.
NumPy
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-
dimensional container of generic data. Arbitrary data types can be defined. This
allows NumPy to seamlessly and speedily integrate with a wide variety of
databases.
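As a minimal sketch of how the two libraries divide the work, the following
builds a small pandas DataFrame and hands one numeric column to NumPy. The three
rows reuse weight, year and sales values from the sample data; the variable
names are illustrative assumptions, not the code used in this work.

```python
import numpy as np
import pandas as pd

# Three illustrative rows; column names follow the dataset description.
frame = pd.DataFrame({
    "Item_Weight": [9.3, 5.92, 17.5],
    "Outlet_Establishment_Year": [1999, 2009, 1999],
    "Item_Outlet_Sales": [3735.138, 443.4228, 2097.27],
})

# pandas handles the labelled, tabular view of the data.
print(frame.dtypes)

# NumPy handles the numerical array operations on a single column.
sales = frame["Item_Outlet_Sales"].to_numpy()
mean_sales = float(np.mean(sales))
print(round(mean_sales, 4))
```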
2.3 CHALLENGES
• The acquired dataset needed to be divided; the tuples were split into training
data and test data.
• Training the model with the current dataset, the cumulative sales report, was
the greatest challenge of all.
• Applying pre-processing techniques to remove missing values, outliers and
noisy data, so that the model is trained on clean data.
• Choosing accuracy measures to check how well the algorithm predicts the
sales.
• Determining which packages suit the model best.
To predict future sales for each product of Big Mart using the cumulative
sales reports. Also, certain attributes of each product and store have been
defined.
The aim is to build a predictive model and find out the sales of each product
at a particular store. Using this model, Big Mart will try to understand the
properties of products and stores which play a key role in increasing sales.
Approach
3. Data Cleaning – imputing missing values in the data and checking for outliers.
4. Feature Engineering – modifying existing variables and creating new ones for
analysis.
Various research journals and papers on sales forecast prediction using machine
learning algorithms were studied. Below is a list of some of the papers studied,
each with a short review.
This article [1] proposes a new hybrid sales forecasting system based on genetic
fuzzy clustering and Back Propagation (BP) Neural Networks with adaptive
learning rate (GFCBPN). The proposed architecture consists of three stages: (1)
utilizing Winter’s Exponential Smoothing method and Fuzzy C-Means clustering,
all normalized data records will be categorized into k clusters; (2) using an adapted
Genetic Fuzzy System (MCGFS), the fuzzy rules of membership levels to each
cluster will be extracted; (3) each cluster will be fed into parallel BP networks with
a learning rate adapted as the level of cluster membership of training data records.
Compared to previous research that uses hard clustering, this research uses
fuzzy clustering, which is able to increase the number of elements in each cluster
and consequently improve the accuracy of the proposed forecasting system.
Experimental results show that the proposed model outperforms the previous and
traditional approaches. Therefore, it is a very promising method for financial
forecasting.
Operations management [2] dysfunctions and lost production time are problems of
enormous magnitude that impact the performance and quality of industrial systems
as well as their cost of production. Association rule mining is a data mining
technique used to find out useful and invaluable information from huge databases.
This work develops a better conceptual base for improving the application of
association rule mining methods to extract knowledge on operations and
information management. The emphasis of the paper is on the improvement of the
operations processes. The application example details an industrial experiment in
which association rule mining is used to analyze the manufacturing process of a
fully integrated provider of drilling products. The study reports some new
interesting results with data mining and knowledge discovery techniques applied to
a drill production process. Experimental results on real-life data sets show that
the proposed approach is useful in finding effective knowledge associated with
the causes of dysfunctions.
To analyze this spatial phenomenon, the authors of [3] propose using a spatial
divergence approach based on the Ali-Silvey class of divergence measures to
determine the “distance” between two distribution functions. They apply this
approach to both simulated and real-life data. Using two divergence measures,
they find that the
spatial divergence approach is capable of predicting success in the beginning of the
process, which makes it appealing for use in marketing activity in general, and
particularly for launches of new products. When applied to 17 actual product
introductions, the method succeeded in correctly predicting the success or failure
of the products in 15 cases.
Due to the strong competition that exists today, most manufacturing organizations
are in a continuous effort for increasing their profits and reducing their costs.
Accurate sales forecasting is certainly an inexpensive way to meet the
aforementioned goals, since this leads to improved customer service, reduced lost
sales and product returns and more efficient production planning. Especially for the
food industry, successful sales forecasting systems can be very beneficial, due to
the short shelf-life of many food products and the importance of the product
quality which is closely related to human health. In this paper [4] we present a
complete framework that can be used for developing nonlinear time series sales
forecasting models. The method is a combination of two artificial intelligence
technologies, namely the radial basis function (RBF) neural network architecture
and a specially designed genetic algorithm (GA).
Different prediction methods give different performance predictions when used for
daily fresh food sales forecasting. Logistic Regression (LR) is a good choice for
binary data, the Moving Average (MA) method is good for simple prediction,
while the Back-Propagation Neural Network (BPNN) method is good for long term
data. In this study [5] we develop and compare the performance of three sales
forecasting models, based on the above three prediction methods, for the
forecasting of fresh food sales in a point of sales (POS) database for convenience
stores. Fresh food is characterized by two factors: its short shelf-life and its
importance as a revenue producer for convenience stores. An efficient forecasting
model would be helpful to increase sales volume and reduce waste at such stores.
The correctness of the prediction rate is a good way to compare the efficacy of
different models which is the method used here.
Neural networks trained with the backpropagation algorithm are applied to predict
the future values of time series that consist of the weekly demand for items in a
supermarket. The influencing indicators of prices, advertising campaigns and
holidays are taken into consideration. The performance of the networks [6] is
evaluated by comparing them to two prediction techniques currently used in the
supermarket. The comparison shows that neural nets outperform the conventional
techniques with regard to the prediction quality.
The aim of this paper [7] is to forecast sales volumes as accurately as possible
and as far into the future as possible. The choice of network topology was Silva's
adaptive back-propagation algorithm and the network architectures were selected by
Genetic Algorithms (GAs). The networks were trained to forecast from 1 month to
6 months in advance and the performance of the network was tested after training.
The test results of artificial neural networks (ANNs) are compared with the time
series smoothing methods of forecasting using several measures of accuracy. The
outcome of the comparison proved that the ANNs generally perform better than the
time series smoothing methods of forecasting.
Retail organizations must create effective promotions and offers to meet their
sales and marketing goals; otherwise they will forgo the major opportunities that
the current market offers. Big Data applications enable these retail organizations
to use prior years' data to better forecast and predict the coming year's sales.
They also provide retailers with valuable analytical insights, especially for
determining customers with desired products at the desired time in a particular
store at different geographical locations. In this paper [9], the authors analysed
the data sets of one of the world's largest retailers, Walmart, to determine the
business drivers and predict which departments are affected by different scenarios
(such as temperature, fuel price and holidays) and their impact on sales at stores
in different locations.
This paper [11] gains insights from the encoder-decoder recurrent neural
network (RNN) structure and proposes a novel framework named TADA to carry
out trend alignment with dual-attention, multi-task RNNs for sales prediction. In
TADA, we innovatively divide the influential factors into internal feature and
external feature, which are jointly modelled by a multi-task RNN encoder. In the
decoding stage, TADA utilizes two attention mechanisms to compensate for the
unknown states of influential factors in the future and adaptively align the
upcoming trend with relevant historical trends to ensure precise sales prediction.
Experimental results on two real-world datasets comprehensively show the
superiority of TADA in sales prediction tasks against other state-of-the-art
competitors.
This paper [13] addresses confidence issues of rules in association rule mining.
Accordingly, the authors present an approach for hiding a set of association rules
(ARs) that database administrators have identified as informative. A rule is
called informative if its leakage risk is above a certain analyzer threshold. In
some cases, informative rules must not be disclosed to unauthorized corporations,
since they refer to informative data whose disclosure may be exploited by the
analysts of competing companies. The hiding process is also evaluated against a
similar one in order to compare their performance.
Direct marketing is a modern business activity with an aim to maximize the profit
generated from marketing to a selected group of customers. A key to direct
marketing is to select a subset of customers so as to maximize the profit return
while minimizing the cost. Achieving this goal is difficult due to the extremely
imbalanced data and the inverse correlation between the probability that a
customer responds and the dollar amount generated by a response. They [16]
presented a solution to this problem based on a creative use of association rules.
Association rule mining searches for all rules above an interestingness threshold,
as opposed to some rules in a heuristic-based search. Promising association rules
are then selected based on the observed value of the customers they summarize.
Selected association rules are used to build a model for predicting the value of a
future customer. On the challenging KDD-CUP-98 dataset, this approach generates
41% more profit than the KDD-CUP winner and 35% more profit than the best
result published thereafter, with 57.7% recall on responders and 78.0% recall on
non-responders. The average profit per mail is 3.3 times that of the KDD-CUP
winner.
In this paper [17], the authors are given a large database of customer
transactions. Each transaction consists of items purchased by a customer in a
visit. They present an efficient algorithm that generates all significant
association rules between items in the database. The algorithm incorporates buffer
management and novel estimation and pruning techniques. They also present results
of applying this algorithm to sales data obtained from a large retailing company,
which show the effectiveness of the algorithm.
EXPERIMENTAL DETAILS
Decision trees are predictive machine learning models. A decision tree model
predicts a class or value for a case once training, pruning and testing are
complete. Decision trees are mainly of two types, classification trees and
regression trees.
Regression trees are used when the data is continuous and the target is also
numerical. For example, if the target is to predict the sales forecast of Big
Mart, the price of a house or the setting of an apparatus, regression-type
decision trees are mostly preferred.
• The main difference between a regression tree and a classification tree is
how the "badness" of a node is measured. There are various ways to do it
for both regression and classification trees. For regression trees, the sum of
squared errors, the median absolute deviation or some other function is used.
• In a regression tree the idea is this: since the target variable does not have
classes, we fit a regression model to the target variable using each of the
independent variables. Then, for each independent variable, the data is split
at several split points. At each split point, the "error" between the predicted
value and the actual values is squared and summed to get a "Sum of Squared
Errors (SSE)". The split point errors across the variables are compared, and
the variable/point yielding the lowest SSE is chosen as the root
node/split point. This process is continued recursively.
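The split search described above can be sketched for a single independent
variable as follows. This is a minimal illustration, assuming that each side of a
split predicts the mean of its target values and that candidate split points are
midpoints between consecutive distinct feature values; the function names and the
toy data are hypothetical.

```python
def sse(values):
    """Sum of squared errors of `values` around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) of the lowest-SSE binary split on x."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for feature, target in pairs if feature <= threshold]
        right = [target for feature, target in pairs if feature > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: one feature (price) against the target (sales).
price = [10, 20, 30, 40, 50, 60]
sales = [5.0, 6.0, 5.5, 20.0, 21.0, 19.0]
threshold, error = best_split(price, sales)
print(threshold, error)
```

A full tree would run this search over every independent variable, keep the
variable and threshold with the lowest SSE, and then repeat on each of the two
resulting subsets.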
3.2 DESIGN FRAMEWORK
3.3 DATASET,DATASOURCE,CHARACTERIZATION,
PREPROCESSING
DATA SET:
No of Columns (12)
No of Rows (8524)
Item_identifier
Item_weight
Item_fat_content
Item_visibility
Item_type
Item_MRP
Outlet_identifier
Outlet_establishment_year
Outlet_size
Outlet_location_type
Outlet_type
Item_outlet_sales
Sample Datasets:
Item_Weight  Item_Fat_Content  Item_Visibility  Item_Type  Item_MRP  Outlet_Identifier  Outlet_Establishment_Year  Outlet_Size  Outlet_Type  Outlet_Location_Type  Item_Outlet_Sales
9.3          1                 0.01605          1          249.81    49                 1999                       2            1            1                     3735.138
5.92         2                 0.01928          2          48.269    18                 2009                       2            2            3                     443.4228
17.5         1                 0.01676          3          141.62    49                 1999                       2            1            1                     2097.27
19.2         2                 0                4          182.1     10                 1998                       -            4            3                     732.38
8.93         1                 0                5          53.861    13                 1987                       1            1            3                     994.7052
10.395       2                 0                6          51.401    18                 2009                       2            2            3                     556.6088
13.65        2                 0.01274          7          57.659    13                 1987                       1            1            3                     343.5528
-            1                 0.12747          7          107.76    27                 1985                       2            3            3                     4022.764
16.2         2                 0.01669          8          96.973    45                 2002                       -            1            2                     1076.599
19.2         2                 0.09445          8          187.82    17                 2007                       -            1            2                     4710.535
11.8         1                 0                4          45.54     49                 1999                       2            1            1                     1516.027
18.5         2                 0.04546          1          144.11    46                 1997                       3            1            1                     2187.153
15.1         2                 0.10001          4          145.48    49                 1999                       2            1            1                     1589.265
17.6         2                 0.04726          7          119.68    46                 1997                       3            1            1                     2145.208
16.35        1                 0.06802          4          196.44    13                 1987                       1            1            3                     1977.426
9            2                 0.06909          9          56.361    46                 1997                       3            1            1                     1547.319
11.8         1                 0.0086           10         115.35    18                 2009                       2            2            3                     1621.889
9            2                 0.0692           9          54.361    49                 1999                       2            1            1                     718.3982
-            1                 0.03424          11         113.28    27                 1985                       2            3            3                     2303.668
DATA SOURCE:
Reference link:
https://www.kaggle.com/devashih0507/big-mart-sales-prediction
CHARACTERIZATION:
Variable                   Description
Item_identifier            Unique product ID
Item_weight                Weight of the product
Item_fat_content           Whether the product is low fat or not
Item_visibility            The % of the total display area of all products in a store allocated to the particular product
Item_type                  The category to which the product belongs
Item_MRP                   Maximum retail price of the product
Outlet_identifier          Unique store ID
Outlet_establishment_year  The year in which the store was established
Outlet_size                The size of the store in terms of ground area covered
Outlet_location_type       The type of city in which the store is located
PREPROCESSING
Pre-processing is performed on the cumulative sales dataset in order to remove
missing values, outliers and noisy data.
In the current dataset, many tuples were found to have missing values under
several attributes.
We applied the mean formula to fill in the missing values. The mean of the
whole column was calculated and assigned to each missing cell in that column.
This process was repeated until no missing cells were left under any attribute.
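The mean-based filling described above can be sketched with pandas as follows.
This is a minimal illustration on a hypothetical four-row excerpt, not the exact
code used in this work; the column names follow the dataset description and the
values are chosen only to show the mechanism.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt with missing cells (NaN) in two numeric columns.
data = pd.DataFrame({
    "Item_Weight": [9.3, 5.92, np.nan, 19.2],
    "Outlet_Size": [2.0, 2.0, np.nan, 1.0],
})

# Mean of each column, computed over the cells that are present.
column_means = data.mean()

# Replace every missing cell with the mean of its own column.
filled = data.fillna(column_means)

print(filled)
print(int(filled.isna().sum().sum()))  # number of missing cells remaining
```

In pandas a single call to fillna covers all columns at once, so no explicit
loop over the attributes is needed.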
Scaling is then performed on the values to fit and transform the dataset. For
each attribute, the mean and standard deviation of its values are calculated. The
values are scaled and transformed so that the attribute has a mean of 0 and a
standard deviation of 1.
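The scaling step amounts to subtracting the mean and dividing by the standard
deviation. A minimal NumPy sketch on hypothetical values of one attribute is
shown below; scikit-learn's StandardScaler performs the same computation for
each column.

```python
import numpy as np

# Hypothetical values of a single attribute.
values = np.array([249.8, 48.26, 141.6, 182.1, 53.86])

# Standardize: subtract the mean, then divide by the standard deviation.
mean = values.mean()
std = values.std()  # population standard deviation (ddof=0)
scaled = (values - mean) / std

# After scaling, the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
print(scaled.mean(), scaled.std())
```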
The technique used for training and testing the model is decision tree
regression. By importing libraries such as scikit-learn, pandas and NumPy, we
perform regression on the preprocessed training data.
1. The required libraries, such as pandas and NumPy, were imported.
2. The path to read the training dataset was set.
3. The training dataset has missing values, outliers and some noisy data.
4. Preprocessing needs to be performed to replace or remove such data.
5. The main strategy used to fill the missing values was the mean.
6. The mean of the whole column was taken to fill each missing cell in the
dataset.
7. The clean data was scaled and transformed using scaler.transform.
8. The same strategy was applied to the test dataset.
9. Since the dataset consists of continuous values, the decision tree regressor
model was the most suitable one.
10. Using the function tree.DecisionTreeRegressor(), the model was fit on the
training data and then applied to the test data.
11. When the test data was applied to the model, the quality of the forecast was
measured using performance metrics.
12. Regression algorithms have their own performance metrics, distinct from
classification accuracy.
13. One is the mean absolute error; if this metric is 0, the predictions are
100 percent accurate.
14. Further metrics used were the mean squared error, the R2 score and the
median absolute error.
CHAPTER 4
PERFORMANCE METRICS:
Mean Absolute Error :
The mean absolute error (MAE) is the simplest regression error metric. Effectively,
MAE describes the typical magnitude of the residuals.
Because we use the absolute value of the residual, the MAE does not
indicate underperformance or overperformance of the model (whether or not the
model under or overshoots actual data). Each residual contributes proportionally to
the total amount of error, meaning that larger errors will contribute linearly to the
overall error.
A small MAE suggests the model is great at prediction, while a large MAE
suggests that your model may have trouble in certain areas. An MAE of 0 means
that your model is a perfect predictor of the outputs.
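To make the definition concrete, the MAE can be computed by hand as the average
of the absolute residuals. The sketch below uses hypothetical actual and
predicted sales values chosen only for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average magnitude of the residuals, ignoring their sign."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return sum(abs(r) for r in residuals) / len(residuals)


# Hypothetical actual and predicted sales.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 200.0, 230.0]

mae = mean_absolute_error(actual, predicted)
print(mae)  # (10 + 10 + 0 + 20) / 4 = 10.0

# A perfect predictor has an MAE of 0.
print(mean_absolute_error(actual, actual))  # 0.0
```

The over-prediction of 10 and the under-prediction of 10 contribute equally,
which is why the MAE alone does not show whether the model overshoots or
undershoots.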
RESULTS
METRICS                 VALUES
MEAN ABSOLUTE ERROR     0.1057
MEAN SQUARE ERROR       0.1125
MEDIAN ABSOLUTE ERROR   0
R2 SCORE                0.8205
PLOT
CHAPTER 6
SUMMARY AND CONCLUSIONS
SUMMARY
A Big Mart is a shopping mall that sells a wide variety of household goods,
eatables, electronic devices, garments and groceries at a large scale. The sales
of a product may vary from season to season. In this case, sales forecasting
plays an important role: it predicts the sales of each product with the help of
the cumulative sales report. The algorithm used in this thesis is decision tree
regression. Regression is used to predict a range of numerical values, given a
particular dataset. The aim was to build a predictive model and find the sales of
each product at a particular store. Using this model, Big Marts will try to
understand the properties of the products and stores which play a key role in
increasing sales, and where to improve the marketing or stop selling a product.
CONCLUSION
We have analyzed the Big Mart sales prediction dataset and performed a literature
survey related to sales prediction using various techniques such as fuzzy logic,
deep learning, neural networks, etc. We used Jupyter through Anaconda Navigator
for applying the techniques. Decision tree based regression proved the best model
to predict future sales, with an accuracy rate of 90%. Training the model was
easier than for any other model. It proved to be the best model for forecasting
the sales of Big Mart. This indirectly helps to gain more profit and to keep
products in stock on schedule.
[2] Kamsu-Foguem, B., Rigal, F., & Mauget, F. (2013). Mining association rules for the quality
improvement of the production process. Expert Systems with Applications, 40(4), 1034-1045.
[3] Garber, T., Goldenberg, J., Libai, B., & Muller, E. (2004). From density to destiny: Using
spatial dimension of sales data for early prediction of new product success. Marketing
Science, 23(3), 419-428.
[4] Doganis, P., Alexandridis, A., Patrinos, P., & Sarimveis, H. (2006). Time series sales
forecasting for short shelf-life food products based on artificial neural networks and evolutionary
computing. Journal of Food Engineering, 75(2), 196-204.
[5] Lee, W. I., Chen, C. W., Chen, K. H., Chen, T. H., & Liu, C. C. (2013). Comparative study
on the forecast of fresh food sales using logistic regression, moving average and BPNN
methods. Journal of Marine Science and Technology, 20(2), 142-152.
[6] Thiesing, F. M., & Vornberger, O. (2013, June). Sales forecasting using neural networks.
In International Conference on Neural Networks (Vol. 4, pp. 2125-2128).
[7] Yip, D. H., Hines, E. L., & Yu, W. W. (2013). Application of artificial neural networks in
sales forecasting.
[11] Chen, T., Yin, H., Chen, H., Wu, L., Wang, H., Zhou, X., & Li, X. (2018, November).
TADA: trend alignment with dual-attention multi-task recurrent neural networks for sales
prediction. In 2018 IEEE International Conference on Data Mining (ICDM) (pp. 49-58). IEEE.
[12] Bakshi, N. A., Kolan, P. R., Behera, B., Kaushik, N., & Ismail, A. M. (2018). Predicting
Pregnant Shoppers Based on Purchase History Using Deep Convolutional Neural
Networks. Journal of Advances in Information Technology, 9(4).
[14] Baecke, P., & Van den Poel, D. (2013). Data augmentation by predicting spending pleasure
using commercially available external data. Journal of Intelligent Information Systems, 36(3),
367-383.
[15] Mahbub, N., Paul, S. K., & Azeem, A. (2013). A neural approach to product demand
forecasting. International Journal of Industrial and Systems Engineering, 15(1), 1-18.
[16] Wong, K. W., Zhou, S., Yang, Q., & Yeung, J. M. S. (2014). Mining customer value: From
association rules to direct marketing. Data Mining and Knowledge Discovery, 11(1), 57-79.
[17] Agrawal, R., Imieliński, T., & Swami, A. (1993, June). Mining association rules between
sets of items in large databases. In ACM SIGMOD Record (Vol. 22, No. 2, pp. 207-216). ACM.