```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from google.colab import files
```
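```python
# Colab only: choose haberman.csv on the local machine and upload it
uploaded = files.upload()
```
```python
# If the file has no header row (as in the original UCI data), the default
# header=0 reads the first record as column names; header=None would keep it.
haberman = pd.read_csv("haberman.csv")
```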
```python
print(haberman.shape)
```
(305, 4)
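The shape above reports 305 rows, one fewer than the 306 records in the original UCI Haberman's Survival data. A likely cause, assuming this haberman.csv has no header row like the UCI original, is that `read_csv` with its default `header=0` turns the first record into column names, which would also explain why the next cell renames the columns. Below is a minimal sketch of a load that keeps every record; `load_haberman` and `COLUMNS` are hypothetical names, the column names are the ones this notebook assigns, and the three sample records are copied from the `head()` output below.

```python
import io

import pandas as pd

# Column names as assigned in this notebook (assumed: the raw file has no header row).
COLUMNS = ["age", "operation_year", "axil_nodes", "survival status"]


def load_haberman(source):
    """Read a headerless Haberman CSV without losing the first record.

    `source` may be a path or any file-like object accepted by pd.read_csv.
    header=None stops pandas from consuming row 1 as column names.
    """
    return pd.read_csv(source, header=None, names=COLUMNS)


# Three records taken from the head() output, held in memory for the demo.
sample = io.StringIO("30,62,3,1\n30,65,0,1\n31,59,2,1\n")
df = load_haberman(sample)
print(df.shape)             # (3, 4): all three records kept
print(df.iloc[0].tolist())  # [30, 62, 3, 1]
```

With the real file this would be `haberman = load_haberman("haberman.csv")`, which should return 306 rows if the file is the complete, headerless UCI data.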
```python
print(haberman.columns)
haberman.columns = ["age", "operation_year", "axil_nodes", "survival status"]
haberman.head()
```
   age  operation_year  axil_nodes  survival status
0   30              62           3                1
1   30              65           0                1
2   31              59           2                1
3   31              65           4                1
4   33              58          10                1
```python
haberman.info()
```
https://colab.research.google.com/drive/1o1_WfES3ATAM2cXHemuFB6JVMSf76syP#scrollTo=nCsQxUiSTW7_&printMode=true 1/13
5/22/2019 Copy of haberman datasets analysis.ipynb - Colaboratory
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 305 entries, 0 to 304
Data columns (total 4 columns):
age 305 non-null int64
operation_year 305 non-null int64
axil_nodes 305 non-null int64
survival status 305 non-null int64
dtypes: int64(4)
memory usage: 9.6 KB
```python
haberman["survival status"].value_counts()
```
1 224
2 81
Name: survival status, dtype: int64
Observation: out of 305 observations, 224 patients lived 5 years or longer after the operation and 81 died within 5 years.
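The 224/81 split means the classes are imbalanced, roughly 73% survivors versus 27% non-survivors, which is worth keeping in mind when comparing plots of the two groups. `value_counts(normalize=True)` reports those proportions directly; the toy frame below is made up purely to show the call.

```python
import pandas as pd

# Made-up toy data: 3 survivors (status 1) and 1 non-survivor (status 2).
toy = pd.DataFrame({"survival status": [1, 1, 1, 2]})

# normalize=True returns proportions instead of raw counts.
shares = toy["survival status"].value_counts(normalize=True)
print(shares)

# The same arithmetic applied to the counts reported above.
survived, died = 224, 81
print(round(survived / (survived + died), 3))  # 0.734
```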
```python
haberman.describe()
```
2-D scatter plot
```python
sns.set_style("whitegrid")
# `height` replaces FacetGrid's old `size` argument (renamed in seaborn 0.9)
sns.FacetGrid(haberman, hue="survival status", height=8) \
    .map(plt.scatter, "age", "axil_nodes") \
    .add_legend()
plt.show()
```
Observation: the orange and blue points overlap heavily, so age and axil_nodes together do not separate survivors from non-survivors. Many patients have 0 axil_nodes.
```python
plt.close()
sns.set_style("whitegrid")
sns.pairplot(haberman, hue="survival status", vars=("age", "operation_year", "axil_nodes"))
plt.show()
```
Observation: in the pair plot most points of the two classes overlap, so no pair of features clearly separates them.
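Observation:
1. Only axil_nodes is useful for telling the two classes apart in these plots.
2. age and operation_year are not useful, because their distributions for the two classes overlap.
3. The operation year 1965 shows a comparatively high number of patients who did not survive.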
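The distribution plots behind these observations are not reproduced in this export. One common way to compare the classes numerically is to split the frame by survival status and turn a histogram into a PDF and a CDF with NumPy. This is a minimal sketch: the helper names, the bin count and the toy values are assumptions, not taken from the original notebook; status 1 means survived 5 years or longer and 2 means died within 5 years, as in the counts above.

```python
import numpy as np
import pandas as pd


def split_by_status(df, col="survival status"):
    """Return (alive, dead) frames: status 1 = survived 5+ years, 2 = died within 5 years."""
    return df[df[col] == 1], df[df[col] == 2]


def pdf_cdf(values, bins=10):
    """Histogram-based PDF (fraction of points per bin) and its running-sum CDF."""
    counts, bin_edges = np.histogram(values, bins=bins)
    pdf = counts / counts.sum()
    cdf = np.cumsum(pdf)
    return pdf, cdf, bin_edges


# Made-up toy frame in the notebook's column layout.
toy = pd.DataFrame({
    "axil_nodes": [0, 0, 1, 3, 0, 7, 12, 20],
    "survival status": [1, 1, 1, 1, 2, 2, 2, 2],
})
alive, dead = split_by_status(toy)
pdf, cdf, edges = pdf_cdf(alive["axil_nodes"], bins=3)
print(pdf)  # [0.5  0.25 0.25]
print(cdf)  # [0.5  0.75 1.  ]
```

On the real data the same calls would be `alive, dead = split_by_status(haberman)` followed by `pdf_cdf(alive["axil_nodes"])`, plotting `edges[1:]` against `pdf` and `cdf`.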
MEAN, MEDIAN, PERCENTILE
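```python
# alive/dead come from a cell missing in this export; this split is assumed
alive = haberman[haberman["survival status"] == 1]  # lived 5+ years
dead = haberman[haberman["survival status"] == 2]   # died within 5 years
print("mean:")
print(np.mean(alive["age"]))
print(np.mean(dead["age"]))
```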
mean:
52.11607142857143
53.67901234567901
```python
print(np.mean(alive["operation_year"]))
print(np.mean(dead["operation_year"]))
```
62.857142857142854
62.82716049382716
```python
print(np.mean(alive["axil_nodes"]))
print(np.mean(dead["axil_nodes"]))
```
2.799107142857143
7.45679012345679
```python
print("std")
print(np.std(alive["age"]))
print(np.std(dead["age"]))
```
std
10.913004640364269
10.10418219303131
```python
print(np.std(alive["operation_year"]))
print(np.std(dead["operation_year"]))
```
3.2220145175061514
3.3214236255207883
```python
print(np.std(alive["axil_nodes"]))
print(np.std(dead["axil_nodes"]))
```
5.869092706952767
9.128776076761632
```python
print("median")
print(np.median(alive["age"]))
print(np.median(dead["age"]))
```
median
52.0
53.0
```python
print(np.median(alive["operation_year"]))
print(np.median(dead["operation_year"]))
```
63.0
63.0
```python
print(np.median(alive["axil_nodes"]))
print(np.median(dead["axil_nodes"]))
```
0.0
4.0
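The per-class means, standard deviations and medians above can also be produced in a single `groupby` call. One detail matters when matching the printed numbers: `np.std` defaults to the population standard deviation (`ddof=0`), while pandas' `std` defaults to the sample version (`ddof=1`), so the sketch passes `ddof=0` explicitly. The toy values are made up; with the real data `toy` would be `haberman`.

```python
import numpy as np
import pandas as pd

features = ["age", "operation_year", "axil_nodes"]

# Made-up toy frame in the notebook's column layout.
toy = pd.DataFrame({
    "age": [30, 40, 50, 60],
    "operation_year": [60, 62, 64, 66],
    "axil_nodes": [0, 2, 4, 10],
    "survival status": [1, 1, 2, 2],
})

grouped = toy.groupby("survival status")[features]
summary = pd.concat(
    {
        "mean": grouped.mean(),
        "std": grouped.std(ddof=0),  # population std, same as np.std's default
        "median": grouped.median(),
    },
    axis=1,
)
print(summary)

# Cross-check one cell against the np.std call style used in the notebook.
dead_age = toy.loc[toy["survival status"] == 2, "age"]
assert np.isclose(summary.loc[2, ("std", "age")], np.std(dead_age))
```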
```python
print("quantiles")
```
quantiles
```python
print(np.percentile(alive["age"], np.arange(0, 100, 25)))
print(np.percentile(dead["age"], np.arange(0, 100, 25)))
print(np.percentile(alive["operation_year"], np.arange(0, 100, 25)))
print(np.percentile(dead["operation_year"], np.arange(0, 100, 25)))
print(np.percentile(alive["axil_nodes"], np.arange(0, 100, 25)))
print(np.percentile(dead["axil_nodes"], np.arange(0, 100, 25)))
```
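`np.arange(0, 100, 25)` yields 0, 25, 50 and 75, so the cell above reports the minimum and the three quartiles but not the maximum. Asking for 100 as well gives the full five-number summary, and the interquartile range (IQR) and the median absolute deviation (MAD) are robust spread measures that suit a skewed column such as axil_nodes. A minimal NumPy sketch; the function name and the sample values are illustrative.

```python
import numpy as np


def robust_summary(values):
    """Five-number summary plus IQR and unscaled MAD (median absolute deviation)."""
    values = np.asarray(values, dtype=float)
    q0, q25, q50, q75, q100 = np.percentile(values, [0, 25, 50, 75, 100])
    mad = np.median(np.abs(values - q50))  # unscaled: no 1/0.6745 normal-consistency factor
    return {"min": q0, "q1": q25, "median": q50, "q3": q75,
            "max": q100, "iqr": q75 - q25, "mad": mad}


# Illustrative skewed sample: one large value, like a long axil_nodes tail.
print(robust_summary([1, 2, 3, 4, 100]))
```

The single large value moves the mean a lot but leaves the median, IQR and MAD unchanged, which is why these are often preferred for axil_nodes. Note that `statsmodels.robust.mad` divides by about 0.6745 by default, so its numbers are larger than this unscaled MAD.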
```python
print("90th percentile")
```
90th percentile
```python
print(np.percentile(alive["age"], 90))
print(np.percentile(alive["operation_year"], 90))
print(np.percentile(alive["axil_nodes"], 90))

print(np.percentile(dead["age"], 90))
print(np.percentile(dead["operation_year"], 90))
print(np.percentile(dead["axil_nodes"], 90))
```
67.0
67.0
8.0
67.0
67.0
20.0
```python
print("\n.......year......")
sns.boxplot(x="survival status", y="operation_year", data=haberman)
```
.......year......
```python
print("\n......axil_nodes.....")
sns.boxplot(x="survival status", y="axil_nodes", data=haberman)
```
......axil_nodes.....
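A box plot draws the quartiles as the box and, with seaborn's default `whis=1.5`, extends each whisker to the most extreme data point within 1.5 × IQR of the box; points beyond that are drawn as outliers. The sketch below computes those quantities with NumPy so the axil_nodes box plot can be read numerically. The helper name and the sample values are illustrative, and the plotting library's percentile handling may differ slightly from NumPy's default linear interpolation.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers as a Tukey-style box plot draws them."""
    values = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= lo_fence) & (values <= hi_fence)]
    return {
        "q1": q1, "median": med, "q3": q3,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": np.sort(values[(values < lo_fence) | (values > hi_fence)]),
    }


# Illustrative skewed sample shaped like an axil_nodes column.
print(box_stats([0, 0, 1, 2, 3, 4, 25]))
```

Here the upper fence is 3.5 + 1.5 × 3 = 8, so the value 25 is reported as an outlier, just as the long tail of axil_nodes shows up as separate points in the plot.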
VIOLIN PLOT
```python
# violinplot has no `size` parameter, so the original size=8 argument is dropped
sns.violinplot(x="survival status", y="age", data=haberman)
plt.show()
```
```python
sns.jointplot(x="age", y="operation_year", data=dead, kind="kde")
plt.show()
```
```python
sns.jointplot(x="axil_nodes", y="operation_year", data=alive, kind="kde")
plt.show()
```
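`jointplot(kind="kde")` draws a two-dimensional kernel density estimate: a smooth surface obtained by placing a Gaussian bump on every (x, y) point and averaging them. The NumPy sketch below implements that idea with Scott's bandwidth rule, the default of `scipy.stats.gaussian_kde`. seaborn's own bandwidth handling varies by version, so treat this as an illustration of the technique rather than a reproduction of the plots above. The function name and the sample points are hypothetical.

```python
import numpy as np


def gaussian_kde_2d(x, y):
    """Fit a 2-D Gaussian KDE to the points (x, y) and return an evaluator."""
    data = np.vstack([x, y]).astype(float)          # shape (2, n)
    d, n = data.shape
    factor = n ** (-1.0 / (d + 4))                  # Scott's rule
    cov = np.cov(data) * factor ** 2                # kernel covariance (2 x 2)
    inv_cov = np.linalg.inv(cov)
    # Each 2-D Gaussian kernel integrates to 1; dividing by n averages them.
    norm = n * 2 * np.pi * np.sqrt(np.linalg.det(cov))

    def density(px, py):
        points = np.vstack([np.atleast_1d(px), np.atleast_1d(py)]).astype(float)  # (2, m)
        diff = points[:, None, :] - data[:, :, None]                               # (2, n, m)
        # Squared Mahalanobis distance from every data point to every query point.
        q = np.einsum("ikl,ij,jkl->kl", diff, inv_cov, diff)                      # (n, m)
        return np.exp(-0.5 * q).sum(axis=0) / norm                                 # (m,)

    return density


# Hypothetical sample: five points around (0.5, 0.5).
kde = gaussian_kde_2d([0, 1, 0, 1, 0.5], [0, 0, 1, 1, 0.5])
print(kde(0.5, 0.5), kde(3, 3))
```

With the notebook's frames the equivalent call would be, for example, `gaussian_kde_2d(dead["age"], dead["operation_year"])`, evaluated on a grid and drawn with `plt.contour`.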