Name: Varun Daga

NetId: vdaga2
Section: C1
Homework 1
Exercise 1(a)
Variable: PetalLength (Petal Length (mm))
Moments
N                     150           Sum Weights           150
Mean                  37.58         Sum Observations      5637
Std Deviation         17.6529823    Variance              311.627785
Skewness              -0.2748842    Kurtosis              -1.4021034
Uncorrected SS        258271        Corrected SS          46432.54
Coeff Variation       46.9744075    Std Error Mean        1.44135997

Basic Statistical Measures
Location                            Variability
Mean                  37.58000      Std Deviation         17.65298
Median                43.50000      Variance              311.62779
Mode                  14.00000      Range                 59.00000
                                    Interquartile Range   35.00000

Mean: 37.58
Median: 43.5
Spread (Standard Deviation): 17.65298
Skewness: -0.2748842
Range: 59.0
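
The derived entries in the Moments output follow from N, the sum of observations, and the corrected sum of squares. A minimal Python sketch, using only values printed in the output above (the variable names are illustrative and not part of the original analysis), recomputes them as a consistency check:

```python
import math

# Values copied from the Moments output for PetalLength (all 150 flowers).
n = 150
sum_obs = 5637.0
corrected_ss = 46432.54

mean = sum_obs / n                    # sample mean
variance = corrected_ss / (n - 1)     # sample variance (divisor n - 1)
std_dev = math.sqrt(variance)         # sample standard deviation
coeff_var = 100.0 * std_dev / mean    # coefficient of variation, in percent
std_err = std_dev / math.sqrt(n)      # standard error of the mean

print(f"mean={mean:.2f} variance={variance:.4f} std_dev={std_dev:.4f}")
print(f"cv={coeff_var:.4f}% std_err={std_err:.4f}")
```

The recomputed values should agree with the reported mean, variance, standard deviation, coefficient of variation, and standard error up to rounding.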
Exercise 1(b)
Variable: PetalLength (Petal Length (mm))
Iris Species=Setosa
Moments
N                     50            Sum Weights           50
Mean                  14.62         Sum Observations      731
Std Deviation         1.73663996    Variance              3.01591837
Skewness              0.1063939     Kurtosis              1.02157611
Uncorrected SS        10835         Corrected SS          147.78
Coeff Variation       11.8785223    Std Error Mean        0.24559798

Basic Statistical Measures
Location                            Variability
Mean                  14.62000      Std Deviation         1.73664
Median                15.00000      Variance              3.01592
Mode                  14.00000      Range                 9.00000
                                    Interquartile Range   2.00000


Variable: PetalLength (Petal Length (mm))


Iris Species=Versicolor
Moments
N                     50            Sum Weights           50
Mean                  42.6          Sum Observations      2130
Std Deviation         4.69910977    Variance              22.0816327
Skewness              -0.6065077    Kurtosis              0.0479033
Uncorrected SS        91820         Corrected SS          1082
Coeff Variation       11.0307741    Std Error Mean        0.66455448

Basic Statistical Measures
Location                            Variability
Mean                  42.60000      Std Deviation         4.69911
Median                43.50000      Variance              22.08163
Mode                  45.00000      Range                 21.00000
                                    Interquartile Range   6.00000

Variable: PetalLength (Petal Length (mm))


Iris Species=Virginica
Moments
N                     50            Sum Weights           50
Mean                  55.52         Sum Observations      2776
Std Deviation         5.51894696    Variance              30.4587755
Skewness              0.54944459    Kurtosis              -0.1537786
Uncorrected SS        155616        Corrected SS          1492.48
Coeff Variation       9.94046642    Std Error Mean        0.78049696

Basic Statistical Measures
Location                            Variability
Mean                  55.52000      Std Deviation         5.51895
Median                55.50000      Variance              30.45878
Mode                  51.00000      Range                 24.00000
                                    Interquartile Range   8.00000

From the above statistics, the petal lengths of the three species differ markedly. On average,
Setosa has the shortest petals (mean 14.62 mm), Virginica has the longest (mean 55.52 mm), and
Versicolor lies in between but closer to Virginica (mean 42.6 mm). Virginica also has the
highest standard deviation (5.52), followed by Versicolor (4.70), while Setosa has the lowest
(1.74). Setosa and Virginica have positive skewness (0.11 and 0.55, respectively), which means
that the right tail is longer, whereas Versicolor has negative skewness (-0.61), which means
that the left tail is longer.
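
How much of the overall spread in petal length comes from differences between species can be checked with the standard sum-of-squares decomposition (total = within + between). The sketch below is illustrative only: it uses the per-species N, mean, and Corrected SS reported above, and the dictionary layout and names are assumptions made for this example.

```python
# Per-species values copied from the Moments output: (N, mean, corrected SS).
species = {
    "Setosa": (50, 14.62, 147.78),
    "Versicolor": (50, 42.60, 1082.00),
    "Virginica": (50, 55.52, 1492.48),
}

n_total = sum(n for n, _, _ in species.values())
grand_mean = sum(n * m for n, m, _ in species.values()) / n_total

# Within-species variation: sum of the per-species corrected sums of squares.
within_ss = sum(css for _, _, css in species.values())
# Between-species variation: spread of the species means around the grand mean.
between_ss = sum(n * (m - grand_mean) ** 2 for n, m, _ in species.values())
total_ss = within_ss + between_ss

print(f"grand mean = {grand_mean:.2f}")
print(f"within SS  = {within_ss:.2f}")
print(f"between SS = {between_ss:.2f}")
print(f"total SS   = {total_ss:.2f}")
print(f"share between species = {between_ss / total_ss:.1%}")
```

The total reproduces the Corrected SS of 46432.54 reported for all 150 flowers in Exercise 1(a), and most of it is between-species variation, which is consistent with the large gaps between the species means.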

Exercise 2(a)
Variable: PetalLength (Petal Length (mm))
Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W      0.876268     Pr < W      <0.0001
Kolmogorov-Smirnov    D      0.198154     Pr > D      <0.0100
Cramer-von Mises      W-Sq   1.222285     Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq   7.678546     Pr > A-Sq   <0.0050

[Figure: Distribution of PetalLength. Histogram of Petal Length (mm), in percent, with fitted curve Normal(Mu=37.58 Sigma=17.653).]
[Figure: Probability Plot for PetalLength. Petal Length (mm) against Normal Percentiles.]

In the histogram, the fitted normal curve is bell-shaped, which at first suggests that the
petal length data are normal. However, the probability plot has a pronounced bend in the
middle, which indicates that the data are not normal. A formal test is needed to assess
normality.
H0: Data is normal
Ha: Data is not normal
All four tests reject the null hypothesis at the 5% level of significance; every reported
p-value is below 0.01.
Hence, our overall decision is that the petal length data are not normal. We further conclude
that a bell-shaped curve in the histogram plot does not by itself mean that the data are
normal, and that quantitative analysis is essential.

Exercise 2(b)
Variable: PetalLength (Petal Length (mm))
Iris Species=Setosa
Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W      0.954977     Pr < W      0.0548
Kolmogorov-Smirnov    D      0.153398     Pr > D      <0.0100
Cramer-von Mises      W-Sq   0.189745     Pr > W-Sq   0.0070
Anderson-Darling      A-Sq   1.007324     Pr > A-Sq   0.0111

[Figure: Distribution of PetalLength, Iris Species=Setosa. Histogram of Petal Length (mm), in percent, with fitted curve Normal(Mu=14.62 Sigma=1.7366).]
[Figure: Probability Plot for PetalLength, Iris Species=Setosa. Petal Length (mm) against Normal Percentiles.]

H0: Data is normal
Ha: Data is not normal
Species: Setosa
The histogram with the fitted normal curve looks bell-shaped, which suggests that the data are
normal. However, the probability plot departs from a straight line, which suggests that the
data are not normal. The quantitative results are also mixed. The Shapiro-Wilk test does not
reject the null hypothesis at the 5% significance level (p = 0.0548), whereas the other three
tests reject it at the 5% level. Hence, based on the majority of tests, we conclude that the
petal length data for Setosa show significant differences from normality.

Variable: PetalLength (Petal Length (mm))


Iris Species=Versicolor
Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W      0.966004     Pr < W      0.1585
Kolmogorov-Smirnov    D      0.117121     Pr > D      0.0855
Cramer-von Mises      W-Sq   0.090004     Pr > W-Sq   0.1506
Anderson-Darling      A-Sq   0.555056     Pr > A-Sq   0.1479

[Figure: Distribution of PetalLength, Iris Species=Versicolor. Histogram of Petal Length (mm), in percent, with fitted curve Normal(Mu=42.6 Sigma=4.6991).]
[Figure: Probability Plot for PetalLength, Iris Species=Versicolor. Petal Length (mm) against Normal Percentiles.]

Species: Versicolor
The histogram and the probability plot both suggest that the petal length data for Versicolor
are normally distributed. In addition, none of the four tests rejects the null hypothesis at
the 5% level of significance. Hence, overall, the petal length data for Versicolor do not show
significant differences from normality.

Variable: PetalLength (Petal Length (mm))


Iris Species=Virginica
Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W      0.962186     Pr < W      0.1098
Kolmogorov-Smirnov    D      0.113606     Pr > D      0.1036
Cramer-von Mises      W-Sq   0.086306     Pr > W-Sq   0.1725
Anderson-Darling      A-Sq   0.608956     Pr > A-Sq   0.1088

[Figure: Distribution of PetalLength, Iris Species=Virginica. Histogram of Petal Length (mm), in percent, with fitted curve Normal(Mu=55.52 Sigma=5.5189).]
[Figure: Probability Plot for PetalLength, Iris Species=Virginica. Petal Length (mm) against Normal Percentiles.]

Species: Virginica
The histogram and the probability plot both suggest that the petal length data for Virginica
are normally distributed. In addition, none of the four tests rejects the null hypothesis at
the 5% level of significance, so the petal length data for Virginica do not show significant
differences from normality.
In conclusion, only the petal length data for Setosa show significant differences from
normality; the data for the other two species are consistent with a normal distribution.
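
The decision rule used in this exercise (reject H0 when the p-value is below 0.05, then go with the majority of the four tests) can be written out as a short Python sketch. The p-values are the ones reported in the Tests for Normality output above; the data layout and names are assumptions made for this illustration, and the majority rule is the report's own heuristic rather than a formal procedure.

```python
ALPHA = 0.05

# p-values copied from the Tests for Normality output in Exercise 2(b).
# Order: Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling.
# "<0.0100" is stored as its upper bound 0.0100, which is enough for the
# comparison with ALPHA. For Setosa, the split of 0.0070 / 0.0111 between the
# last two tests is read off the flattened output; both are below 0.05.
p_values = {
    "Setosa": [0.0548, 0.0100, 0.0070, 0.0111],
    "Versicolor": [0.1585, 0.0855, 0.1506, 0.1479],
    "Virginica": [0.1098, 0.1036, 0.1725, 0.1088],
}

rejections = {}
for name, ps in p_values.items():
    # Count how many of the four tests reject normality at level ALPHA.
    rejections[name] = sum(1 for p in ps if p < ALPHA)
    majority_rejects = rejections[name] > len(ps) / 2
    verdict = "differs from normal" if majority_rejects else "consistent with normal"
    print(f"{name}: {rejections[name]} of {len(ps)} tests reject -> {verdict}")
```

Note that the Setosa verdict rests on a Shapiro-Wilk p-value only slightly above 0.05, so it is a borderline case rather than a clear-cut one.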

Exercise 3(a)
The petal length data for Virginica flowers do not show significant differences from
normality. However, since the assumption of normality was not reasonable for the Setosa petal
length data, we use the Wilcoxon rank sum test.

Wilcoxon Scores (Rank Sums) for Variable PetalLength
Classified by Variable Species
Species       N     Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
Setosa        50    1275.0          2525.0              144.632938         25.50
Virginica     50    3775.0          2525.0              144.632938         75.50
Average scores were used for ties.

Wilcoxon Two-Sample Test
Statistic                 1275.0000

Normal Approximation
Z                         -8.6391
One-Sided Pr < Z          <.0001
Two-Sided Pr > |Z|        <.0001

t Approximation
One-Sided Pr < Z          <.0001
Two-Sided Pr > |Z|        <.0001

Z includes a continuity correction of 0.5.

[Figure: Distribution of Wilcoxon Scores for PetalLength. Box plots of Score by Iris Species (Setosa, Virginica), annotated with Pr < Z <.0001 and Pr > |Z| <.0001.]

H0: Petal length values for Setosa and Virginica have stochastically equal distributions.
Ha: Petal length values for Setosa and Virginica do not have stochastically equal distributions.
The Wilcoxon rank sum test rejects the null hypothesis at the 5% level of significance: the
one-sided Pr < Z and the two-sided Pr > |Z| are both < 0.0001. The box plot also shows that
Virginica flowers have much longer petals than Setosa flowers.
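
The reported Z can be reproduced from the rank sums with the usual normal approximation. The following Python sketch is a hand check under stated assumptions: it takes the tie-adjusted standard deviation under H0 directly from the output rather than recomputing it from the raw data, and all names are illustrative.

```python
# Values copied from the Wilcoxon Scores output.
n_setosa, n_virginica = 50, 50
rank_sum_setosa = 1275.0          # observed sum of ranks for Setosa
std_dev_h0 = 144.632938           # reported std dev under H0 (tie-adjusted)

n_total = n_setosa + n_virginica
# Expected rank sum under H0: n1 * (N + 1) / 2.
expected_h0 = n_setosa * (n_total + 1) / 2

# The continuity correction of 0.5 moves the statistic toward its expectation.
correction = 0.5 if rank_sum_setosa < expected_h0 else -0.5
z = (rank_sum_setosa - expected_h0 + correction) / std_dev_h0

# Smallest possible rank sum for a group of 50: 1 + 2 + ... + 50.
min_rank_sum = n_setosa * (n_setosa + 1) / 2

print(f"expected under H0 = {expected_h0:.1f}")
print(f"Z = {z:.4f}")
print(f"minimum possible rank sum = {min_rank_sum:.1f}")
```

The observed Setosa rank sum equals the smallest value a group of 50 can attain, which means every Setosa petal length in the sample is shorter than every Virginica petal length.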

Exercise 4(a)
[Figure: Pairwise Scatter Plot-Virginica Flower. Scatter plot matrix of Sepal Length (mm), Petal Length (mm), Sepal Width (mm), and Petal Width (mm) for Iris Species = Virginica.]


For Petal length and Sepal length, there appears to be a strong positive correlation: with
Petal length on the x axis and Sepal length on the y axis, the points fall close to a
straight line.
For Petal length and Sepal width, the variables might be correlated, but the data points are
scattered in places, so further statistical analysis is required to measure the exact
correlation.

For Petal width and Sepal length, the data points are highly scattered, so a correlation is
less likely; further statistical analysis is required to measure the exact correlation.
Sepal length and Sepal width might be correlated, but the data points are scattered in some
portions of the plot, so further analysis is required to measure the exact correlation.
For Petal length and Petal width, the data points are highly scattered, so a correlation is
less likely; further statistical analysis is required to measure the exact correlation.
Lastly, Sepal width and Petal width might be correlated because the data points are aligned in
one direction, but they are also scattered, so further statistical analysis is required to
measure the exact correlation.
Exercise 5(a)
Iris Species=Virginica
Pearson Correlation Coefficients, N = 50
Prob > |r| under H0: Rho=0
                                    SepalLength   PetalLength   SepalWidth   PetalWidth
SepalLength  Sepal Length (mm)      1.00000       0.86422       0.45723      0.28111
                                                  <.0001        0.0008       0.0480
PetalLength  Petal Length (mm)      0.86422       1.00000       0.40104      0.32211
                                    <.0001                      0.0039       0.0225
SepalWidth   Sepal Width (mm)       0.45723       0.40104       1.00000      0.53773
                                    0.0008        0.0039                     <.0001
PetalWidth   Petal Width (mm)       0.28111       0.32211       0.53773      1.00000
                                    0.0480        0.0225        <.0001

Spearman Correlation Coefficients, N = 50
Prob > |r| under H0: Rho=0
                                    SepalLength   PetalLength   SepalWidth   PetalWidth
SepalLength  Sepal Length (mm)      1.00000       0.82432       0.42652      0.31577
                                                  <.0001        0.0020       0.0255
PetalLength  Petal Length (mm)      0.82432       1.00000       0.38736      0.36291
                                    <.0001                      0.0054       0.0096
SepalWidth   Sepal Width (mm)       0.42652       0.38736       1.00000      0.54431
                                    0.0020        0.0054                     <.0001
PetalWidth   Petal Width (mm)       0.31577       0.36291       0.54431      1.00000
                                    0.0255        0.0096        <.0001

Petal length and Sepal length
Have a statistically significant positive correlation at the 5% level of significance
Pearson value: 0.8642 (P-value: <0.0001)
Spearman value: 0.8243 (P-value: <0.0001)
Petal length and Sepal width
Have a statistically significant positive correlation at the 5% level of significance
Pearson value: 0.4010 (P-value: 0.0039)
Spearman value: 0.3874 (P-value: 0.0054)
Petal width and Sepal length
Have a statistically significant positive correlation at the 5% level of significance
Pearson value: 0.2811 (P-value: 0.0480)
Spearman value: 0.3158 (P-value: 0.0255)
Sepal length and Sepal width
Have a statistically significant positive correlation at the 5% level of significance
Pearson value: 0.4572 (P-value: 0.0008)
Spearman value: 0.4265 (P-value: 0.0020)
Petal length and Petal width
Have a statistically significant positive correlation at the 5% level of significance
Pearson value: 0.3221 (P-value: 0.0225)
Spearman value: 0.3629 (P-value: 0.0096)
Sepal width and Petal width
Have a statistically significant positive correlation at the 5% level of significance
Pearson value: 0.5377 (P-value: <0.0001)
Spearman value: 0.5443 (P-value: <0.0001)
Hence, from the above statistics, the Pearson and Spearman correlations give similar results,
and all six pairs are significantly correlated at the 5% level of significance.
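
The significance of each Pearson coefficient can be checked by hand with the usual t statistic for a correlation, t = r * sqrt(N - 2) / sqrt(1 - r^2), compared against the two-sided 5% critical value for N - 2 = 48 degrees of freedom. The Python sketch below is illustrative: the coefficients are copied from the output above, the critical value 2.0106 is taken from a standard t table rather than computed, and the function and variable names are assumptions for this example.

```python
import math

N = 50
# Two-sided 5% critical value of Student's t with N - 2 = 48 degrees of
# freedom, taken from a standard t table (not computed here).
T_CRIT = 2.0106

# Pearson coefficients copied from the correlation output for Virginica.
pearson = {
    ("PetalLength", "SepalLength"): 0.86422,
    ("PetalLength", "SepalWidth"): 0.40104,
    ("PetalWidth", "SepalLength"): 0.28111,
    ("SepalLength", "SepalWidth"): 0.45723,
    ("PetalLength", "PetalWidth"): 0.32211,
    ("SepalWidth", "PetalWidth"): 0.53773,
}


def t_statistic(r, n):
    """t statistic for testing H0: Rho = 0 given sample correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


results = {}
for pair, r in pearson.items():
    t = t_statistic(r, N)
    results[pair] = (t, abs(t) > T_CRIT)
    flag = "significant" if results[pair][1] else "not significant"
    print(f"{pair[0]} vs {pair[1]}: r={r:.4f} t={t:.3f} -> {flag}")
```

The weakest pair (Petal width and Sepal length) clears the critical value only narrowly, which matches its reported p-value of 0.0480. Significance at the 5% level therefore does not imply that every pair is strongly correlated.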