Documente Academic
Documente Profesional
Documente Cultură
Volume 4 Issue 5, August 2020 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
@ IJTSRD | Unique Paper ID – IJTSRD31774 | Volume – 4 | Issue – 5 | July-August 2020 Page 230
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
find the best solution for any problem. They took their crossover, mutation and tournament selection, fitness
example over boundary value testing in the terms of function, population initialization, and a random forest
application domain. Based on the transactions with the classifier and getting a good result.
optimization technique the genetic programming got the
solution that satisfies the minimum criteria. The fitness Population initialization: It is divided into two types
function helps the fitness function for successive iterations mainly 1) Heuristic 2) Random. Heuristic will initialize the
and they adapted anti-fraudulent strategies are predicted. population with complete random solutions. Initialize the
This algorithm does not fir the situation that they decided population using heuristic for the problem.
multi-population to an optimized parameter [1].Ruchi
Oberoi, they proposed the “Credit card fraud detection using Fitness function: The fitness function will calculate the
genetic algorithm” in this paper the genetic algorithm a result repeatedly there for sufficient works.
technique for the optimal solution and also generates the
fraudulent transactions in this first they removed the Crossover: It is applied with high probability and here more
fraudulent transactions are done with time and online than one parents is selected and one more offspring are
payment after the iterations are done finally they got the produced.
best solution is obtained. Here the set of interval-valued
parameters is optimized in the short of the period of time we Mutation: It Produces small random changed by choosing a
can reduce the risks [2]. Ibtissum, Samira Douzi, Bouabid El single bit randomly and it maintains a small random tweak.
Quahidi proposed “Using genetic algorithm to improve the
classification of imbalanced datasets for credit card” in this Tournament selection: It is extremely popular it can work
they combined k-Means Clustering and genetic algorithm. In on negative fitness values also.
this k-means clustering done the sampling and under-
sampling and genetic programming is used for fraud Random forest classifier: It works on no of decision tress
transaction there are dealing if highly imbalanced data by for various subsets to improve the accuracy result.
generating new minority classes. It is done the sample for
each cluster into chromosomes it inherits the individual’s 5. PROJECT OUTLINE
characteristics of generation. It aims to get more accuracy in This project is structured as the follows:
fraudulent transactions. So, the genetic algorithm is used for we introduced the project idea along with background
removing transactions and by the parameters, it will about the topic, the problem statement, literature
decrease transactions [3]. review, objectives and the followed methodology.
(System design), we explain the plan of the project using
Sahil Dhankhand, Behrouz Far proposed “Supervised the system model. In our system model, we have the
machine learning algorithm for credit card fraud detection: A following three main parts:
comparative study”. In this paper, they did different The part which we can initialize the population and
classification algorithms with card numbers, account getting fitness values.
numbers, CVV in these they are dealing with highly The part which the feature selection can be done with
imbalanced data. Their data can balance by under-sampling operators’ crossover, mutation.
technique 70% used for training data 30% for testing data. The part which divide the data into training and testing
The model evaluation is done by using accuracy, precision, for accurate result.
specificity, G-mean by the confusion matrix. Random forest (Analysis and Implementation), we give details about
and Logistic regression have good performance in the the algorithms which based on them we conducted our
comparative study [4]. work. Then using Python language, we determine:
We Initialize the population and performs fitness
M. Suresh Kumar, V. Soundarya, Kavitha, E.S. Keerthika, E. function based on operators.
Ashwini proposed “Credit card fraud detection using random The split them by using random forest classifier with the
forest” in this mainly they detected the fraud transactions. operators.
For that they collected from a bank they took amount, The implementation details.
transaction, and time for fraud detection and they divided (Results and Discussion), we present and analyze the
the data into the training set and testing set. The accuracy obtained results.
results around 90% approximately they got the result [5]. (Conclusions and Recommendations), we draw
Sonali Bakshi “Credit card fraud detection a classification conclusions and recommendations regarding the future
analysis” in this paper they discussed they discussed the scope of this project.
arrangement of charge card and challenges looked for visa
cardholders and they detected visa card fraudster [6]. 6. BACKGROUND
Fraud is the main problem for the credit card companies. It is
Vyom shah, Parin shah, Harish Shetty proposed “A review of growing bigger in our daily life and detecting the scam in
credit card fraud detection techniques” in this paper the time is a big challenge. Primarily in this paper, we are
genetic algorithm gave a good accuracy result when compare detecting the fraudulent transactions by using the combined
with HMM and ANN. In this, no absolute values are used for approach of genetic algorithm and random forest with a
standardized techniques [7]. Combined approach random highly imbalanced dataset.
forest and genetic algorithm gives better result.
Imbalanced data: It is mainly referring to a classification
4. METHODOLOGY problem where the observations per class are not equally
In this paper, we used python language for comparing the distributed. We have to declare a large number of
algorithms with the help of a genetic algorithm operator’s observations in one class. The main advantage with this is no
@ IJTSRD | Unique Paper ID – IJTSRD31774 | Volume – 4 | Issue – 5 | July-August 2020 Page 231
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
loss of information and the main disadvantage is over fitting operators and later random forest classifier will split the
(when a function is closely fit to set of data points). training and testing and perform the operators crossover
and mutation with the evaluation of fitness function later the
The main reason for this project to detect the fraud which is graphs are drawn based on the fitness values and which
related to credit card and by using the genetic algorithm operator got the highest fitness values will help to give the
with random forest the genetic algorithm will perform good accuracy result.
feature selecting by initializing the population values and
with the help of operators cross over and mutation it inhibits 8. BLOCK DIAGRAM
the fitness values and with the values, the random forest
classifier will split into training and testing and helps for
good accuracy result.
7. SYSTEM DESIGN
The system model of our project which forms the plan for
the rest of project and then break down this model into its
main parts.
@ IJTSRD | Unique Paper ID – IJTSRD31774 | Volume – 4 | Issue – 5 | July-August 2020 Page 232
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
Results of analyzing the graphs: The mutation got the 12. CONCLUSION AND FUTURE SCOPE
highest fitness value and here the graph we took based on For datasets in Table.1, we compared the performances of
genetic algorithm operator with the help of random forest random forests and genetic algorithm individually and also
classifier. With the initialization the population the fitness combined approach of genetic algorithm and random forest
values can be evaluate by fitness function with these we when compare with single accuracy results the combined
applied for both crossover and mutation the mutation got approach got 92% accuracy. In this we took dataset from
the highest fitness value and around 92% accuracy we got Kaggle and I divided the data set into training and testing
when we combined the both genetic algorithm and random dataset with the help of genetic algorithm and later with the
forest. random forest we performed the operators. Finally, mutation
got the highest value with the fitness function, got the good
Comparison of accuracy in percentage: performance. In future I want to implement a software with
high security and costumers can pay their money whenever
Algorithms Dataset Accuracy
they want with the help of eye recognition and finger print.
Random Forest Credit Card 90%
13. REFERENCES
Genetic Algorithm Credit Card 89% [1] Satvik vats, Suryakant Dubey, Naveen Kumar Pandey
they proposed “Genetic algorithms for credit card fraud
Genetic Algorithm with RF Credit Card 92% detection”.
[2] Ruchi Oberoi, they proposed the “Credit card fraud
When random forest algorithm applied alone, we got the detection using genetic algorithm”.
accuracy result around 90% and genetic algorithm applied
[3] Ibtissum, Samira Douzi, Bouabid El Quahidi proposed
alone we got the accuracy result around 89% when we
“Using genetic algorithm to improve the classification
combine the both algorithms, we got the better accuracy
of imbalanced datasets for credit card”.
around 92%.
[4] Sahil Dhankhand, Behrouz Far proposed “Supervised
11. SCOPE AND LIMITATION machine learning algorithm for credit card fraud
In the scope of this project, we will analyze the operators detection: A comparative study”.
using fitness values with both genetic algorithm and a
[5] M. Suresh Kumar, V. Soundarya, Kavitha, E. S.
random forest algorithm. Beyond that which operator got
Keerthika, E. Ashwini proposed “Credit card fraud
the highest fitness value, we will plot the graph with the
detection using random forest”.
parameters and we will implement the accurate result. In the
limitation of this project, the data set is highly imbalanced so [6] Sonali Bakshi “Credit card fraud detection a
applying random under-sampling on the top if applying classification analysis”.
cross-validation, the accuracy won’t increase because data
[7] Vyom shah, Parin shah, Harish Shetty proposed “A
set already reduced.
review of credit card fraud detection techniques”.
@ IJTSRD | Unique Paper ID – IJTSRD31774 | Volume – 4 | Issue – 5 | July-August 2020 Page 233