ANALYSIS
BY
TEAM : SPARKS
N. VINILA
S. BHAVANI
S. MOUNIKA
CH. MADHUMITHA
I. SRI POORVAJA
INTRODUCTION
• NumPy
• Pandas
• Matplotlib
Syntax :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
DATA COLLECTION
Correlation
Data Cleaning
• Data cleansing or data cleaning is the process of detecting and
correcting (or removing) corrupt or inaccurate data from a
data set.
• There are generally two ways of imputing missing values: the
pandas DataFrame fillna method or a scikit-learn imputer.
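The two imputation routes mentioned above can be sketched side by side. This is a minimal illustration, not the team's code: the column names and values are hypothetical, and SimpleImputer is assumed as the scikit-learn imputer (it is the class current scikit-learn releases provide for this purpose).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one numeric and one categorical column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Delhi", "Pune", np.nan, "Pune"],
})

# Option 1: pandas fillna, one column at a time.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

# Option 2: scikit-learn SimpleImputer. fit learns the fill value (here the
# column mean) and transform applies it, so it can be reused on new data.
imputer = SimpleImputer(strategy="mean")
age_imputed = imputer.fit_transform(df[["age"]])

print(filled)
print(age_imputed.ravel())
```

fillna is convenient for one-off cleaning of a single table; an imputer object is usually preferred when the same fill values must later be applied to a separate test set.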
Finding null values
• Lambda function
unique():
replace():
fillna():
• First, call value_counts() to see how often each value occurs.
• Apply fillna to the column.
• Apply fillna to all columns that contain NaN values.
• Check whether any NaN values remain.
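The steps listed above might look roughly like this in pandas. The columns are hypothetical (the original dataset is not shown), and filling with the most frequent value is an assumption about what the fill value was.

```python
import numpy as np
import pandas as pd

# Hypothetical columns; the original dataset is not shown in the slides.
df = pd.DataFrame({
    "gender": ["M", "F", np.nan, "F", "F"],
    "grade": ["A", np.nan, "B", "B", np.nan],
})

# Step 1: inspect the frequency of each value to choose a fill value.
print(df["gender"].value_counts())

# Step 2: fill one column with its most frequent value.
df["gender"] = df["gender"].fillna(df["gender"].value_counts().idxmax())

# Step 3: fill every column that still contains NaN with that column's mode.
for col in df.columns[df.isnull().any()].tolist():
    df[col] = df[col].fillna(df[col].mode()[0])

# Step 4: confirm that no NaN values remain (all counts should be zero).
print(df.isnull().sum())
```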
Apply Lambda
Split the dataset into attributes using iloc
Encoding labels
iloc[]
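As a sketch of the iloc split and label encoding steps, assuming (hypothetically) that the target label is the last column of the frame and that scikit-learn's LabelEncoder is the encoder used:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame: the last column is assumed to be the target label.
df = pd.DataFrame({
    "hours": [2, 5, 1, 7],
    "attendance": [60, 85, 40, 95],
    "result": ["fail", "pass", "fail", "pass"],
})

# iloc selects by integer position: all rows, every column except the last.
X = df.iloc[:, :-1].values
# All rows, last column only.
y = df.iloc[:, -1].values

# LabelEncoder maps each distinct label to an integer, in sorted label order.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

print(X.shape, y_encoded, list(encoder.classes_))
```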
train_test_split
TRAINING (%)   TESTING (%)
80             20
70             30
60             40
50             50
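The four training/testing ratios can be produced with train_test_split by varying test_size. The data below is synthetic, and random_state and stratify are illustrative choices rather than settings taken from the slides.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 3 features, balanced binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.tile([0, 1], 50)

# Test fractions matching the 80/20, 70/30, 60/40 and 50/50 splits.
split_sizes = {}
for test_size in (0.2, 0.3, 0.4, 0.5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y
    )
    split_sizes[test_size] = (len(X_train), len(X_test))
    print(f"test_size={test_size}: train={len(X_train)}, test={len(X_test)}")
```

Fixing random_state makes each split reproducible, which matters when accuracies at different ratios are compared.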
Algorithms
• Logistic Regression
• Support Vector Classification
• Decision Tree Classifier
• Random Forest
• K Nearest Neighbors
• Naive Bayes
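A minimal sketch of fitting the six algorithms listed above with scikit-learn follows. The data is a synthetic stand-in, the hyperparameters are library defaults rather than the team's settings, and GaussianNB is an assumed choice since the slides do not say which Naive Bayes variant was used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the slides do not include the original dataset.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Default hyperparameters are an assumption, not the team's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classification": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

# Fit each model on the training split and score it on the test split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracy[name]:.2f}")
```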
Confusion Matrix
• A confusion matrix is a table that is often used to describe the
performance of a classification model (or "classifier") on a set
of test data for which the true values are known.
Recall
Precision
F-Measure
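To make the confusion matrix and the three measures concrete, here is a small sketch on hypothetical binary labels. It computes precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-measure as their harmonic mean, then cross-checks the results against scikit-learn's own metric functions.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true and predicted labels for a binary problem (1 = positive).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found
f_measure = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)
print(precision, recall, f_measure)

# The library functions should agree with the manual formulas.
print(precision_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))
```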
Logistic Regression
A logistic regression model predicts a categorical dependent
variable by modelling its relationship with one or more
independent variables.
Support Vector Classification
Logistic Regression             92  89  86  84
Support Vector Classification   82  79  79  79
Decision Tree                   90  90  91  89
Random Forest                   90  90  91  89
KNN                             77  77  75  73
Naïve Bayes                     92  89  86  84
We considered these input and output variables.
For both input variables, the accuracy varies with the training/testing ratio and with the algorithm.