NetId: vdaga2
Section: C1
Homework 1
Exercise 1(a)
Variable: PetalLength (Petal Length (mm))
Moments
N: 150
Sum Weights: 150
Mean: 37.58
Sum Observations: 5637
Std Deviation: 17.6529823
Variance: 311.627785
Skewness: -0.2748842
Kurtosis: -1.4021034
Uncorrected SS: 258271
Corrected SS: 46432.54
Std Error Mean: 1.44135997
Mode: 14.00000
Variability
Std Deviation: 17.65298
Variance: 311.62779
Range: 59.00000
Interquartile Range: 35.00000
Mean: 37.58
Median: 43.5
Spread (Standard Deviation) : 17.65298
Skewness: -0.2748842
Range: 59.0
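The moments above can be cross-checked from the three sums in the output. The following is a minimal Python sketch, not the SAS procedure that produced the output; the helper name `moments_from_sums` is hypothetical. It recomputes the mean, corrected sum of squares, variance, standard deviation, and standard error of the mean from N, Sum Observations, and Uncorrected SS.

```python
import math

def moments_from_sums(n, total, uncorrected_ss):
    """Recompute basic moments from N, the sum of the observations, and the
    uncorrected (raw) sum of squares. Hypothetical helper for illustration."""
    mean = total / n
    # Corrected SS = sum of squared deviations from the mean
    corrected_ss = uncorrected_ss - total ** 2 / n
    variance = corrected_ss / (n - 1)        # sample variance, divisor n - 1
    std_dev = math.sqrt(variance)
    std_error = std_dev / math.sqrt(n)       # standard error of the mean
    return {
        "mean": mean,
        "corrected_ss": corrected_ss,
        "variance": variance,
        "std_dev": std_dev,
        "std_error": std_error,
    }

# Values copied from the Moments output for PetalLength (all 150 flowers)
overall = moments_from_sums(n=150, total=5637, uncorrected_ss=258271)
for name, value in overall.items():
    print(f"{name}: {value:.6f}")
```

The recomputed values agree with the reported mean (37.58), Corrected SS (46432.54), variance (311.627785), and standard deviation (17.6529823).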
Exercise 1(b)
Variable: PetalLength (Petal Length (mm))
Iris Species=Setosa
Moments
N: 50
Sum Weights: 50
Mean: 14.62
Sum Observations: 731
Std Deviation: 1.73663996
Variance: 3.01591837
Skewness: 0.1063939
Kurtosis: 1.02157611
Uncorrected SS: 10835
Corrected SS: 147.78
Std Error Mean: 0.24559798
Mode: 14.00000
Variability
Std Deviation: 1.73664
Variance: 3.01592
Range: 9.00000
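Iris Species=Versicolor
Moments
N: 50
Mean: 42.6
Sum Observations: 2130
Std Deviation: 4.69910977
Variance: 22.0816327
Skewness: -0.6065077
Kurtosis: 0.0479033
Uncorrected SS: 91820
Corrected SS: 1082
Std Error Mean: 0.66455448
Mode: 45.00000
Variability
Std Deviation: 4.69911
Variance: 22.08163
Range: 21.00000
Interquartile Range: 6.00000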
Iris Species=Virginica
Moments
N: 50
Sum Weights: 50
Mean: 55.52
Sum Observations: 2776
Std Deviation: 5.51894696
Variance: 30.4587755
Skewness: 0.54944459
Kurtosis: -0.1537786
Uncorrected SS: 155616
Corrected SS: 1492.48
Std Error Mean: 0.78049696
Mode: 51.00000
Variability
Std Deviation: 5.51895
Variance: 30.45878
Range: 24.00000
Interquartile Range: 8.00000
The statistics above show little similarity between the petal lengths of the three species.
On average, Setosa has the shortest petals (mean 14.62 mm), Virginica has the longest
(mean 55.52 mm), and Versicolor lies in between, but closer to Virginica (mean 42.6 mm).
Virginica also has the largest standard deviation (5.52), followed by Versicolor (4.70),
while Setosa has the smallest (1.74). Setosa and Virginica have positive skewness (0.11 and
0.55, respectively), which means that the right tail is longer, whereas Versicolor has
negative skewness (-0.61), which means that the left tail is longer.
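One way to quantify how different the species are is to split the overall Corrected SS into a within-species part and a between-species part. The sketch below is an illustrative Python calculation (not part of the original SAS output) that uses only the per-species N, mean, and Corrected SS reported above; the function name `decompose` is hypothetical.

```python
# (N, mean, Corrected SS) per species, copied from the Moments output
species = {
    "Setosa": (50, 14.62, 147.78),
    "Versicolor": (50, 42.60, 1082.00),
    "Virginica": (50, 55.52, 1492.48),
}

def decompose(groups):
    """Split the total sum of squares into within-group and between-group parts."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total
    within = sum(ss for _, _, ss in groups.values())
    between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    return grand_mean, within, between

grand_mean, within_ss, between_ss = decompose(species)
total_ss = within_ss + between_ss

print(f"grand mean : {grand_mean:.2f}")
print(f"within SS  : {within_ss:.2f}")
print(f"between SS : {between_ss:.2f}")
print(f"total SS   : {total_ss:.2f}")
print(f"share between species: {between_ss / total_ss:.1%}")
```

The total reproduces the overall Corrected SS of 46432.54 from Exercise 1(a), and roughly 94% of it comes from differences between the species means, which is consistent with the conclusion that the three species have little in common.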
Exercise 2(a)
Variable: PetalLength (Petal Length (mm))
Tests for Normality
Test                  Statistic        p Value
Shapiro-Wilk          W   0.876268     Pr < W   <0.0001
Kolmogorov-Smirnov    D   0.198154     Pr > D   <0.0100
Cramer-von Mises / Anderson-Darling (A-Sq)      <0.0050
[Figure: Distribution of PetalLength. Histogram (y-axis: Percent) with fitted curve Normal(Mu=37.58 Sigma=17.653)]
[Figure: Normal probability plot of PetalLength (x-axis: Normal Percentiles)]
At first glance, the histogram with the fitted bell-shaped normal curve might suggest that the
petal length data is normal. However, the probability plot bends sharply in the middle, which
indicates that the data is not normal. A formal test is needed to assess normality.
H0: Data is normal
Ha: Data is not normal
All of the normality tests reject the null hypothesis at the 5% level of significance, which
indicates that the data is not normal.
Hence, our overall conclusion is that the petal length data is not normal. A bell-shaped curve
overlaid on a histogram does not by itself show that the data is normal, so quantitative tests
are essential.
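To make the probability plot easier to read, it helps to know where the points would fall if the data really followed the fitted normal distribution. The sketch below is a small Python illustration (not the SAS code behind the plot) that uses the standard-library `statistics.NormalDist` with the fitted Mu and Sigma reported above to compute the petal length expected at a few percentiles; the choice of percentiles is an assumption made for illustration.

```python
from statistics import NormalDist

# Fitted normal curve reported with the histogram: Normal(Mu=37.58 Sigma=17.653)
fitted = NormalDist(mu=37.58, sigma=17.653)

# Percentiles chosen for illustration
percentiles = [1, 10, 25, 50, 75, 90, 95, 99]

# Petal length (mm) the fitted normal places at each percentile
reference = {p: fitted.inv_cdf(p / 100) for p in percentiles}

for p, x in reference.items():
    print(f"{p:>3}%  {x:8.2f} mm")
```

Data that are normal would track these reference values in a straight line. Note that the fitted normal places its 1st percentile below zero, which is impossible for a length and is another hint that a single normal curve describes the pooled data poorly.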
Exercise 2(b)
Variable: PetalLength (Petal Length (mm))
Iris Species=Setosa
Tests for Normality
Test                  Statistic        p Value
Shapiro-Wilk          W   0.954977     Pr < W   0.0548
Kolmogorov-Smirnov    D   0.153398     Pr > D   <0.0100
Cramer-von Mises / Anderson-Darling (A-Sq)      0.0070, 0.0111
[Figure: Iris Species=Setosa. Distribution of PetalLength, histogram (y-axis: Percent) with fitted curve Normal(Mu=14.62 Sigma=1.7366)]
[Figure: Iris Species=Setosa. Normal probability plot of PetalLength (x-axis: Normal Percentiles)]
Iris Species=Versicolor
Tests for Normality
Test                  Statistic        p Value
Shapiro-Wilk          W   0.966004     Pr < W   0.1585
Kolmogorov-Smirnov    D   0.117121     Pr > D   0.0855
Cramer-von Mises / Anderson-Darling (A-Sq)      0.1479
[Figure: Iris Species=Versicolor. Distribution of PetalLength, histogram (y-axis: Percent) with fitted curve Normal(Mu=42.6 Sigma=4.6991)]
[Figure: Iris Species=Versicolor. Normal probability plot of PetalLength (x-axis: Normal Percentiles)]
Species: Versicolor
The histogram and the probability plot both suggest that the petal length data for Versicolor
is normally distributed. In addition, none of the tests rejects the null hypothesis at the 5%
level of significance. Hence, overall, the petal length data for Versicolor does not show a
significant departure from normality.
Iris Species=Virginica
Tests for Normality
Test                  Statistic        p Value
Shapiro-Wilk          W   0.962186     Pr < W   0.1098
Kolmogorov-Smirnov    D   0.113606     Pr > D   0.1036
Cramer-von Mises / Anderson-Darling (A-Sq)      0.1088
[Figure: Iris Species=Virginica. Distribution of PetalLength, histogram (y-axis: Percent) with fitted curve Normal(Mu=55.52 Sigma=5.5189)]
[Figure: Iris Species=Virginica. Normal probability plot of PetalLength (x-axis: Normal Percentiles)]
Species: Virginica
The histogram and the probability plot both suggest that the petal length data for Virginica
is normally distributed. In addition, none of the tests rejects the null hypothesis at the 5%
level of significance, so the petal length data for Virginica does not show a significant
departure from normality.
In conclusion, only the petal length data for Setosa shows a significant departure from
normality; the data for the other two species is consistent with a normal distribution.
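The decision rule used throughout Exercise 2 (reject normality when the p-value falls below 5%) can be written down explicitly. The following Python sketch is an illustration only; the helper `rejects_normality` is hypothetical, and it handles the bounded p-values that appear in the output (for example `<0.0100`) by treating them as upper bounds. Only the Shapiro-Wilk and Kolmogorov-Smirnov p-values are included, because their labels are unambiguous in the output above.

```python
ALPHA = 0.05  # 5% level of significance

# p-values exactly as printed in the Tests for Normality output
reported = {
    "All species": {"Shapiro-Wilk": "<0.0001", "Kolmogorov-Smirnov": "<0.0100"},
    "Setosa": {"Shapiro-Wilk": "0.0548", "Kolmogorov-Smirnov": "<0.0100"},
    "Versicolor": {"Shapiro-Wilk": "0.1585", "Kolmogorov-Smirnov": "0.0855"},
    "Virginica": {"Shapiro-Wilk": "0.1098", "Kolmogorov-Smirnov": "0.1036"},
}

def rejects_normality(p_text, alpha=ALPHA):
    """Return True (reject H0), False (do not reject), or None when a
    bounded p-value such as '<0.10' does not settle the question."""
    p_text = p_text.strip()
    if p_text.startswith("<"):
        bound = float(p_text[1:])
        return True if bound <= alpha else None
    if p_text.startswith(">"):
        bound = float(p_text[1:])
        return False if bound >= alpha else None
    return float(p_text) < alpha

decisions = {
    group: {test: rejects_normality(p) for test, p in tests.items()}
    for group, tests in reported.items()
}

for group, tests in decisions.items():
    for test, reject in tests.items():
        print(f"{group:12s} {test:20s} reject H0: {reject}")
```

For Setosa the two tests disagree at the 5% level: the Shapiro-Wilk p-value (0.0548) is just above 0.05, while the Kolmogorov-Smirnov p-value is below 0.01. The conclusion for that species therefore rests on the Kolmogorov-Smirnov result and the two remaining p-values listed in the Setosa table (0.0070 and 0.0111).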
Exercise 3(a)
The petal length data for Virginica flowers is consistent with a normal distribution. However,
since the assumption of normality was not reasonable for the Setosa petal length data, we
will use the Wilcoxon rank sum test.
Species      N    Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
Setosa       50   1275.0          2525.0              144.632938         25.50
Virginica    50   3775.0          2525.0              144.632938         75.50
Statistic: 1275.0000
Normal Approximation
Z: -8.6391
One-Sided Pr < Z: <.0001
Two-Sided Pr > |Z|: <.0001
t Approximation
One-Sided Pr < Z: <.0001
Two-Sided Pr > |Z|: <.0001
[Figure: Box plot of Wilcoxon scores (y-axis: Score) by Iris Species (Setosa, Virginica); Pr < Z <.0001, Pr > |Z| <.0001]
H0: Petal length values for Setosa and Virginica have stochastically equal distributions.
Ha: Petal length values for Setosa and Virginica do not have stochastically equal distributions.
The Wilcoxon rank sum test rejects the null hypothesis at the 5% level of significance
(one-sided Pr < Z is < 0.0001). The box plot also shows that Virginica flowers have much
longer petals than Setosa flowers.
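The reported Wilcoxon quantities can be reproduced from the sample sizes and the rank sum. The sketch below is a minimal Python check, not the SAS computation itself. It assumes that the Z statistic uses a continuity correction of 0.5, and it takes the tie-corrected standard deviation directly from the output, because the raw data needed to recompute the tie correction are not shown here.

```python
import math

n1 = n2 = 50                 # Setosa and Virginica sample sizes
rank_sum_setosa = 1275.0     # reported sum of scores for Setosa
std_dev_h0 = 144.632938      # reported Std Dev under H0 (includes tie correction)

n = n1 + n2

# Expected rank sum for a group of size n1 when both distributions are equal
expected = n1 * (n + 1) / 2

# Standard deviation under H0 if there were no ties, for comparison
untied_sd = math.sqrt(n1 * n2 * (n + 1) / 12)

# Smallest rank sum a group of 50 can have: ranks 1 through 50
smallest_possible = n1 * (n1 + 1) / 2

# Z with a continuity correction of 0.5 toward the expected value (assumed)
diff = rank_sum_setosa - expected
z = (diff - math.copysign(0.5, diff)) / std_dev_h0

print(f"expected rank sum  : {expected}")
print(f"untied std dev     : {untied_sd:.4f}")
print(f"smallest possible  : {smallest_possible}")
print(f"Z                  : {z:.4f}")
```

The Setosa rank sum of 1275 equals the smallest possible value for a group of 50, so every Setosa petal is shorter than every Virginica petal. This complete separation is why the Z statistic is so extreme.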
Exercise 4(a)
[Figure: Scatter plot matrix of the four measurements, Iris Species=Virginica]
For Petal Width and Sepal Length, the data points are widely scattered, so a correlation
seems unlikely; however, further statistical analysis is required to measure the correlation.
Sepal Length and Sepal Width might be correlated, but the data points are scattered in some
regions of the plot, so further analysis is required to measure the correlation.
For Petal Length and Petal Width, the data points are also widely scattered, so a correlation
seems unlikely; again, further statistical analysis is required to measure the correlation.
Lastly, Sepal Width and Petal Width might be correlated, because the data points line up in
one direction. However, they are also scattered, so further statistical analysis is required
to measure the correlation.
Exercise 5(a)
Iris Species=Virginica
Pearson Correlation Coefficients, N = 50
Prob > |r| under H0: Rho=0
              SepalLength   PetalLength   SepalWidth   PetalWidth
SepalLength   1.00000       0.86422       0.45723      0.28111
                            <.0001        0.0008       0.0480
PetalLength   0.86422       1.00000       0.40104      0.32211
              <.0001                      0.0039       0.0225
SepalWidth    0.45723       0.40104       1.00000      0.53773
              0.0008        0.0039                     <.0001
PetalWidth    0.28111       0.32211       0.53773      1.00000
              0.0480        0.0225        <.0001
              SepalLength   PetalLength   SepalWidth   PetalWidth
SepalLength   1.00000       0.82432       0.42652      0.31577
                            <.0001        0.0020       0.0255
PetalLength   0.82432       1.00000       0.38736      0.36291
              <.0001                      0.0054       0.0096
SepalWidth    0.42652       0.38736       1.00000      0.54431
              0.0020        0.0054                     <.0001
PetalWidth    0.31577       0.36291       0.54431      1.00000
              0.0255        0.0096        <.0001
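The p-values listed under each Pearson coefficient test H0: Rho=0. The sketch below is an illustrative Python check, not the SAS computation. It converts each Pearson coefficient from the Virginica table (N = 50) into the usual t statistic with N - 2 degrees of freedom and compares it with a two-sided 5% critical value. The critical value 2.0106 for 48 degrees of freedom is taken from standard t tables and is an assumption of this sketch, as are the helper names.

```python
import math

N = 50
T_CRIT = 2.0106  # assumed: two-sided 5% critical value of Student's t, 48 df

def t_statistic(r, n=N):
    """t statistic for testing H0: Rho=0 given a sample correlation r."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def critical_r(t_crit=T_CRIT, n=N):
    """Smallest |r| that is significant at the level implied by t_crit."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Pearson coefficients from the table labelled Iris Species=Virginica
pearson = {
    ("SepalLength", "PetalLength"): 0.86422,
    ("SepalLength", "SepalWidth"): 0.45723,
    ("SepalLength", "PetalWidth"): 0.28111,
    ("PetalLength", "SepalWidth"): 0.40104,
    ("PetalLength", "PetalWidth"): 0.32211,
    ("SepalWidth", "PetalWidth"): 0.53773,
}

print(f"critical |r| at 5%: {critical_r():.4f}")
for (a, b), r in pearson.items():
    t = t_statistic(r)
    print(f"{a:11s} vs {b:11s} r={r:.5f} t={t:6.3f} significant: {abs(t) > T_CRIT}")
```

Under these assumptions every pair exceeds the critical value, which agrees with all six reported p-values being below 0.05. The weakest pair, SepalLength and PetalWidth (r = 0.28111, p = 0.0480), clears the threshold only narrowly.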