
K. J. Somaiya College of Engineering, Mumbai-77

(Autonomous College Affiliated to University of Mumbai)

Department of Electronics and Telecommunication Engineering

Class: F.Y. M.Tech.    Semester: II (2018-19)

Course: Introduction to Machine Learning

Course code: 1PETE208

Internal Assessment 2 Report: Research Paper Reading and Implementation

1. Name and Roll No. of student: Shreyash S Nandgaonkar- 1803009.

2. Title of selected paper: Prediction of Users' Purchase Intention.

3. Conference/Journal details: International Conference on Soft Computing & Machine Intelligence (ISCMI, IEEE).

4. Output of Pass 1, 2, 3:

PASS 1

Q: What research area / sub-topic does the paper fall under?
A: The paper describes prediction of users' purchase intention; the researchers calculate and compare the modelling time of two models, i.e., Naïve Bayes and the C4.5 decision tree.

Q: What problem does the paper attempt to solve?
A: The naive Bayes algorithm has the advantages of simple implementation and high classification efficiency. However, the method depends heavily on the distribution of samples in the sample space and can be unstable. To address this, the decision tree method is introduced to deal with the classification problem of interest.

Q: What is the related work, why is it not sufficient, and what are the gaps?
A: The researchers calculate the modelling time of Naïve Bayes and C4.5, but the training data is insufficient and complex. There are various other algorithms whose accuracy could have been tested; comparing only these two models limits the scope of the solution, and the researchers could have tried other methods or models as well.

Q: What key contribution does the paper claim?
A: The implementation of a decision tree for prediction is the key contribution, which widens the scope of the paper for further research.

Q: Broadly, how does the paper solve the problem?
A: Modelling time is an essential parameter when designing any model; the modelling time of the two models was measured and the results were compared to decide which predicts better.

Q: How do the authors defend the solution?
A: The authors defend the solution on the basis of modelling time.

Q: What category of paper is this?
A: The paper falls in the classification category of machine learning.

PASS 2

Q: What is the precise research question addressed?
A: The paper describes prediction of users' purchase intention; the researchers calculate and compare the modelling time of two models, i.e., Naïve Bayes and C4.5.

Q: Why is it believed that the solution works better than previous ones?
A: The naive Bayes algorithm is accepted by most researchers for its simple implementation and efficient classification. In this paper, the principle of the naive Bayes algorithm is analysed, it is shown that the Bayes-theorem-based algorithm has the potential to deal with the classification problem, and the C4.5 decision tree method is introduced to deal with the classification problem.

Q: What are the assumptions and scope?
A: The principle of the naive Bayes algorithm is analysed, it is shown that the Bayes-theorem-based algorithm has the potential to deal with the classification problem, and the C4.5 decision tree method is introduced to deal with the classification problem.

Q: What are the details of the proposed solution (argument, proof, implementation, experiment)?
A: The experimental data were divided into three groups of 5000 records each. Using k-fold cross validation, each group was selected in turn as the training set and the remaining two groups were used as the test set. The average modelling time of the Naïve Bayes algorithm was 0.41 seconds and that of the C4.5 algorithm was 0.36 seconds, so C4.5 is faster than Naïve Bayes. (A hedged R sketch of this timing comparison is given after Pass 2.)

Q: What is the take-away message from the paper?
A: (1) When processing the samples, the C4.5 decision tree model only needs to compare attribute values, so the processing is relatively simple; it has a clear performance advantage on large-scale classification problems. (2) The C4.5 decision tree method does not depend on the prior probability of the samples and can effectively avoid the negative impact of the sample distribution. C4.5 is a classical classification algorithm, but any algorithm inevitably has some defects. Due to the limitation of network information and the complexity of the training data, the next step of the research will be to expand the test set and further improve the recognition accuracy.
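
The timing experiment described above can be reproduced in outline with a few lines of R. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a data frame df with a factor label column named purchased, and it uses rpart (CART) as a stand-in for C4.5, which in R would otherwise require RWeka::J48 and a Java installation.

# Minimal sketch of the paper's timing comparison (not the authors' code).
# Assumes a data frame `df` with a factor label column `purchased`;
# rpart (CART) stands in for C4.5 (RWeka::J48 would need Java).
library(e1071)   # naiveBayes
library(rpart)   # decision tree
library(caret)   # createFolds

set.seed(42)
folds <- createFolds(df$purchased, k = 3)     # three groups, as in the paper

avg_fit_time <- function(fit_fun) {
  mean(sapply(folds, function(idx) {
    train_part <- df[idx, ]                   # one group used as the training set
    system.time(fit_fun(train_part))["elapsed"]
  }))
}

nb_time   <- avg_fit_time(function(d) naiveBayes(purchased ~ ., data = d))
tree_time <- avg_fit_time(function(d) rpart(purchased ~ ., data = d))
cat("Average modelling time (s): NB =", nb_time, ", tree =", tree_time, "\n")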

PASS 3
1. Is the research problem significant?
Ans :- Yes; the research problem is significant, but more training data is needed to build an accurate model.

2. Is the problem novel?
Ans :- Yes.
3. Is the solution approach novel?
Ans :- Yes.
4. Are the contributions significant?
Ans :- Yes.
5. Is relevant related work surveyed "sufficiently" enough?
Ans :- Yes, the related work is surveyed sufficiently and as much as possible has been drawn from the data.
6. Have alternate approaches of solution been explored?
Ans :- No.
7. Are assumptions valid? Has the paper violated assumptions?
Ans :- The assumptions are valid and the paper does not violate them.
8. Are the claims valid?
Ans :-Yes.
9. Are the different parts of the paper consistent?
Ans :- Yes.
10. Are the figures, graphs, diagrams precise?
Ans :- Yes.
11. Does the paper flow logically?
Ans :- Yes.
12. What is the paper trying to convince you of? Does it succeed?
Ans :- The paper gives us the modelling time for the algorithms and compares them. It succeeds to an extent.

5. Dataset description (if applicable): The data was collected from Kaggle and contains records of 6000 customers and their purchase history for two products.
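
As a rough illustration only (the file name, the 70/30 split ratio and the label column name "Purchased" are assumptions, not taken from the report), the data can be loaded and split in R as follows:

# Minimal sketch: load the Kaggle export and create a train/test split.
# File name, split ratio and column names are assumed; adjust to the real data.
library(caret)

df <- read.csv("purchase_data.csv", stringsAsFactors = TRUE)
df$Purchased <- as.factor(df$Purchased)       # ensure the label is a factor

set.seed(123)
idx      <- createDataPartition(df$Purchased, p = 0.7, list = FALSE)  # assumed 70/30 split
trainSet <- df[idx, ]
testSet  <- df[-idx, ]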
6. ML Algorithm used in the work:
We have used the following algorithms (a hedged R/caret training sketch follows this list):
- KNN
- Logistic Regression
- Decision Tree (CART)
- Naïve Bayes
- SVM (linear kernel)
- SVM (RBF kernel)
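
The sketch below shows one way to train these classifiers with the caret package and 10-fold cross validation; the caret method names and the trainSet/testSet objects from the split above are assumptions, not necessarily how the original implementation was written.

# Hedged sketch of training the listed models with caret.
# Needs klaR (nb), kernlab (svmLinear/svmRadial) and rpart installed.
library(caret)

ctrl  <- trainControl(method = "cv", number = 10)      # 10-fold cross validation
algos <- c(knn = "knn", logistic = "glm", cart = "rpart",
           nb = "nb", svm_linear = "svmLinear", svm_rbf = "svmRadial")

fits <- lapply(algos, function(m)
  train(Purchased ~ ., data = trainSet, method = m, trControl = ctrl))

# Test-set accuracy for each classifier
test_acc <- sapply(fits, function(fit)
  mean(predict(fit, newdata = testSet) == testSet$Purchased))
print(round(test_acc, 4))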

7. Comparison of results mentioned in the paper and obtained by us: In the research paper, the researchers computed the modelling time for two algorithms, i.e., Naïve Bayes and the C4.5 decision tree; here we have computed the accuracy of all the algorithms listed above and compared them to find which model is the most accurate.

In paper :- (the paper's results table is not reproduced here; as noted in Pass 2, the reported average modelling times are 0.41 s for Naïve Bayes and 0.36 s for C4.5)

Our Work :-
Classifier                 Model fitness on training set (accuracy)      Model fitness on testing set (accuracy)
Logistic Regression        83.21%                                        84.68%
Decision Tree (CART)       81.38% (without Gini index)                   85.28% (without Gini index)
                           82.04% (with Gini index)                      85.72% (with Gini index)
Naïve Bayes                84.32%                                        85.02%
KNN                        82.18%                                        85.08%
SVM (linear kernel)        83.83% (C=0.01, sigma=0.123)                  84.04% (C=0.01, sigma=0.123)
SVM (RBF kernel)           84.04% (C=1, sigma=0.9)                       84.81% (C=1, sigma=0.9)
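
The "with / without Gini index" rows above can be reproduced in outline with rpart, which supports both split criteria; this is a hedged sketch assuming the trainSet/testSet objects from the earlier split, and assuming that "without Gini index" corresponds to the information-gain criterion.

# Hedged sketch of the CART comparison: Gini vs. information-gain splits.
library(rpart)

fit_gini <- rpart(Purchased ~ ., data = trainSet, method = "class",
                  parms = list(split = "gini"))
fit_info <- rpart(Purchased ~ ., data = trainSet, method = "class",
                  parms = list(split = "information"))

acc <- function(fit) mean(predict(fit, testSet, type = "class") == testSet$Purchased)
c(gini = acc(fit_gini), information = acc(fit_info))   # test-set accuracy of each tree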

8. Software/language used in implementation: R

9. Conclusion: The selected feature vectors are run through each of the algorithms and the parameters are tuned to maximize the score; 10-fold cross validation is used for this purpose. The final model obtained from each algorithm is applied to the test set. For SVM there are two parameters that can be tuned: the C value, which controls the trade-off between the width of the margin and misclassification of data points, and the kernel shape. From the results we can say that CART with the Gini index gives the best accuracy of all the models, while Naïve Bayes and KNN give nearly the same accuracy.
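
As an illustration of the SVM tuning described above, the sketch below searches a small grid of C and sigma values with caret and 10-fold cross validation; the grid values are illustrative and are not the ones used to obtain the results in the table.

# Hedged sketch of tuning the RBF-kernel SVM (needs kernlab).
library(caret)

ctrl <- trainControl(method = "cv", number = 10)

rbf_grid <- expand.grid(sigma = c(0.1, 0.5, 0.9),      # RBF kernel width
                        C     = c(0.01, 0.1, 1, 10))   # margin/misclassification trade-off

svm_rbf <- train(Purchased ~ ., data = trainSet,
                 method = "svmRadial", trControl = ctrl, tuneGrid = rbf_grid)

svm_rbf$bestTune                                        # chosen (sigma, C) pair
confusionMatrix(predict(svm_rbf, testSet), testSet$Purchased)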
10. Future scope: If the number of products is increased, i.e. if we have more than two products, we can perform multiclass classification with various parameters. Due to the limited information and the complexity of the training set, the next step would be to expand the test set by adding more features and products and to further improve the accuracy.

