
Correlation and Regression

No.   Study Hours (X)   Test Score % (Y)
1.    20                40
2.    24                55
3.    46                69
4.    62                83
5.    22                27
6.    37                44
7.    45                61
8.    27                33
9.    65                71
10.   23                37
Correlation and Regression
[Figure: scatter plot of Test Score % (Y, axis 0 to 90) against Study Hours (X, axis 0 to 70) for the same ten students]

Pearson Correlation Coefficient

Study Hours (X)   Test Score % (Y)   XY      X²      Y²
20                40                 800     400     1600
24                55                 1320    576     3025
46                69                 3174    2116    4761
62                83                 5146    3844    6889
22                27                 594     484     729
37                44                 1628    1369    1936
45                61                 2745    2025    3721
27                33                 891     729     1089
65                71                 4615    4225    5041
23                37                 851     529     1369

Sum = 371         520                21764   16297   30160
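
The column totals are exactly what the computational form of Pearson's r needs. The slide's own formula did not survive extraction, so the sketch below assumes the standard textbook form, r = (nΣXY - ΣXΣY) / sqrt((nΣX² - (ΣX)²)(nΣY² - (ΣY)²)). The variable names are illustrative.

```python
import math

# Column totals copied from the table (n = 10 students).
n = 10
sum_x, sum_y = 371, 520
sum_xy, sum_x2, sum_y2 = 21764, 16297, 30160

# Computational (raw-sums) form of Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = numerator / denominator

print(f"r = {r:.3f}")
```

For this data r comes out at roughly 0.88, a strong positive linear relationship between study hours and test score.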



Pearson Correlation Coefficient

Study Hours (X)   Test Score % (Y)   X - X̄    Y - Ȳ   (X - X̄)²   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
20                40                 -17.1    -12     292.41     144        205.2
24                55                 -13.1    3       171.61     9          -39.3
46                69                 8.9      17      79.21      289        151.3
62                83                 24.9     31      620.01     961        771.9
22                27                 -15.1    -25     228.01     625        377.5
37                44                 -0.1     -8      0.01       64         0.8
45                61                 7.9      9       62.41      81         71.1
27                33                 -10.1    -19     102.01     361        191.9
65                71                 27.9     19      778.41     361        530.1
23                37                 -14.1    -15     198.81     225        211.5

Mean              37.1 (X̄)           52 (Ȳ)
Sum                                                   2532.9     3120       2472
Square root of sum                                    50.3279247 55.85696018
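
The deviation table leads to the same coefficient through the definitional form, r = Σ(X - X̄)(Y - Ȳ) / (sqrt(Σ(X - X̄)²) · sqrt(Σ(Y - Ȳ)²)). The following is a minimal sketch that rebuilds the three sums from the raw data; the list and variable names are assumptions made for illustration.

```python
import math

hours = [20, 24, 46, 62, 22, 37, 45, 27, 65, 23]    # Study Hours (X)
scores = [40, 55, 69, 83, 27, 44, 61, 33, 71, 37]   # Test Score % (Y)

mean_x = sum(hours) / len(hours)     # X-bar
mean_y = sum(scores) / len(scores)   # Y-bar

# Sums of squared deviations and of cross-products (the "Sum" row of the table).
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in scores)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))

r = sxy / (math.sqrt(sxx) * math.sqrt(syy))
print(f"Sxx = {sxx:.1f}, Syy = {syy:.1f}, Sxy = {sxy:.1f}, r = {r:.3f}")
```

Dividing 2472 by the product of the two square roots in the last row (50.33 × 55.86) gives the same value of about 0.88.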
Correlation
• Measures the strength of the linear relationship between Y and X
• Pearson Correlation Coefficient, r (r varies between -1 and +1)
- Perfect positive relationship: r = 1
- No linear relationship: r = 0
- Perfect negative relationship: r = -1

Correlation Coefficient
Regression

• Metric: Coefficient of Determination, R-Sq (varies from 0.0 to 1.0, or 0% to 100%)
- None of the variation in Y is explained by X: R-Sq = 0.0
- All of the variation in Y is explained by X: R-Sq = 1.0
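
To make "variation explained" concrete, R-Sq can be computed for the study-hours data as 1 minus the ratio of residual variation to total variation. This is a sketch under the assumption of an ordinary least-squares line; the variable names are illustrative.

```python
hours = [20, 24, 46, 62, 22, 37, 45, 27, 65, 23]    # Study Hours (X)
scores = [40, 55, 69, 83, 27, 44, 61, 33, 71, 37]   # Test Score % (Y)

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares line Y = intercept + slope * X.
sxx = sum((x - mean_x) ** 2 for x in hours)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Total variation in Y, and the part left unexplained by the line.
ss_total = sum((y - mean_y) ** 2 for y in scores)
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hours, scores))

r_squared = 1 - ss_residual / ss_total
print(f"R-Sq = {r_squared:.3f}")
```

For a single predictor, R-Sq equals the square of the Pearson coefficient r, so roughly 77% of the variation in test scores is explained by study hours here.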

Linear Regression
• Quantifies the relationship between Y and X (Y = a + bX)
Linear Regression
Study Hours (X)   Test Score % (Y)   XY      X²      Y²
20                40                 800     400     1600
24                55                 1320    576     3025
46                69                 3174    2116    4761
62                83                 5146    3844    6889
22                27                 594     484     729
37                44                 1628    1369    1936
45                61                 2745    2025    3721
27                33                 891     729     1089
65                71                 4615    4225    5041
23                37                 851     529     1369

Sum = 371         520                21764   16297   30160
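
The same totals give the coefficients of the line Y = a + bX. The slide's formulas were lost in extraction, so this sketch assumes the usual least-squares expressions, b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²) and a = (ΣY - bΣX) / n.

```python
# Column totals copied from the table (n = 10 students).
n = 10
sum_x, sum_y = 371, 520
sum_xy, sum_x2 = 21764, 16297

# Least-squares slope (b) and intercept (a) of Y = a + bX.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
a = (sum_y - b * sum_x) / n

print(f"Y = {a:.2f} + {b:.3f} X")
```

The fitted line is approximately Y = 15.79 + 0.976 X: each additional study hour is associated with about one extra percentage point on the test.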



Linear Regression
• The fitted lines:
- Quantify the relationship between the
predictor (input) variable (X) and response
(output) variable (Y)
- Enable predictions of the response Y to be
made from a knowledge of the predictor X
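
As an illustration of prediction from the fitted line, the sketch below refits the study-hours data and evaluates the line at a new X. The function name `predict_score` and the 50-hour input are hypothetical choices, not taken from the slides.

```python
hours = [20, 24, 46, 62, 22, 37, 45, 27, 65, 23]    # predictor X
scores = [40, 55, 69, 83, 27, 44, 61, 33, 71, 37]   # response Y

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares coefficients of Y = a + bX.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / sum(
    (x - mean_x) ** 2 for x in hours
)
a = mean_y - b * mean_x


def predict_score(study_hours: float) -> float:
    """Predicted test score (%) for a given number of study hours."""
    return a + b * study_hours


predicted = predict_score(50)  # hypothetical new student with 50 study hours
print(f"Predicted score for 50 hours: {predicted:.1f}%")
```

Predictions are most trustworthy inside the observed range of X (20 to 65 hours here); extrapolating far beyond it assumes the linear pattern continues.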

Hypothesis Testing
Hypothesis:
A claim that we want to test

Null Hypothesis - H0
- Default position / Currently accepted
position / Assumed / Status Quo
Alternate Hypothesis – Ha
- Claim to be tested. Also known as
Research Hypothesis or the other option.
Hypothesis Testing
Null Hypothesis - H0

H0 : μ = 150cc
Alternate Hypothesis – Ha

Ha : μ ≠ 150cc

Two possibilities:
- Reject Null Hypothesis
- Fail to Reject Null Hypothesis

Hypothesis Testing
Two possibilities:
- Reject Null Hypothesis
- Fail to Reject Null Hypothesis

- You either reject the Null Hypothesis or fail to reject it. You never accept the Null Hypothesis.
- If the Null Hypothesis is rejected, you proceed with the Alternate Hypothesis.
Hypothesis Testing
- Null Hypothesis and Alternate Hypothesis are a pair and cover all possibilities.
- Only one of the two can stand, not both.

Two-tailed:
H0 : μ = 150cc
Ha : μ ≠ 150cc

Right-tailed:
H0 : μ ≤ 150cc
Ha : μ > 150cc

Left-tailed:
H0 : μ ≥ 150cc
Ha : μ < 150cc
Hypothesis Testing
Using 10 samples:
x̄ = 150cc

x̄ = 153cc

x̄ = 167cc
Statistically where do you draw the line?
H0 : μ = 150cc
Ha : μ ≠ 150cc
Hypothesis Testing
Level of Confidence / Confidence Interval:

C = 0.90, 0.95, 0.99 (90%, 95%, 99%)

Level of Significance:

α = 1 – C (0.10, 0.05, 0.01)

Rejection Region
Nonrejection Region
Hypothesis Testing - Visual Critical Value

Using 1 sample:
α = 0.05
150cc

153cc

167cc
Alpha vs Critical Value
α = 0.01

α = 0.05

α = 0.10

α = 0.05
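
The link between alpha and the critical value can be tabulated directly. The slides do not say which distribution the figure used; the sketch below assumes a two-tailed test on the standard normal distribution, with alpha/2 in each tail, and relies on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: alpha is split equally between the two rejection regions.
critical_values = {}
for alpha in (0.10, 0.05, 0.01):
    critical_values[alpha] = standard_normal.inv_cdf(1 - alpha / 2)

for alpha, z_crit in critical_values.items():
    print(f"alpha = {alpha:.2f}  ->  reject H0 if |z| > {z_crit:.3f}")
```

A smaller alpha pushes the critical value further from the centre, so the rejection region shrinks and stronger evidence is needed to reject H0.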
Hypothesis Testing - Visual
Using 4 samples:
x̅ = 150cc

x̅ = 153cc

x̅ = 167cc
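
Where the line is drawn depends on the sample size, because the standard error of the mean shrinks as n grows. The slides give no population standard deviation, so the sketch below uses a purely hypothetical value of 4cc to show the effect for n = 1, 4 and 10 at alpha = 0.05 (two-tailed, normal distribution assumed).

```python
import math
from statistics import NormalDist

MU_0 = 150.0          # hypothesised mean under H0 (cc)
ASSUMED_SIGMA = 4.0   # HYPOTHETICAL population standard deviation (cc), not from the slides
ALPHA = 0.05

z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)


def decision(sample_mean: float, n: int) -> str:
    """Two-tailed z decision for a sample mean based on n observations."""
    half_width = z_crit * ASSUMED_SIGMA / math.sqrt(n)
    if abs(sample_mean - MU_0) > half_width:
        return "reject H0"
    return "fail to reject H0"


results = {(n, m): decision(m, n) for n in (1, 4, 10) for m in (150, 153, 167)}
for (n, m), outcome in results.items():
    print(f"n = {n:2d}, sample mean = {m}cc: {outcome}")
```

Under this assumed sigma, a mean of 153cc is not unusual for 1 or 4 samples but falls in the rejection region for 10 samples, while 167cc is rejected in every case.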
Types of Errors
                            True State of Nature
Conclusion                  H0 is true           Ha is true
Support H0 / Reject Ha      Correct conclusion   Type II Error
Support Ha / Reject H0      Type I Error         Correct conclusion

Types of Errors
                          Type I error (alpha)                                      Type II error (beta)
Name                      Producer's risk / Significance level                      Consumer's risk
1 minus error is called   Confidence level                                          Power of the test
Example of Fire Alarm     False fire alarm leading to inconvenience                 Missed fire leading to disaster
Effects on process        Unnecessary cost increase due to frequent changes         Defects may be produced
Control method            Usually fixed at a pre-determined level: 1%, 5% or 10%    Usually controlled to < 10% by appropriate sample size
Simple definition         Innocent declared as guilty                               Guilty declared as innocent
t Test
- When the population standard deviation (sigma) is known, we use the Z-distribution
- When the population standard deviation is unknown (it is estimated by the sample standard deviation s), we use the t-distribution

When the sample size is more than 30, the Z-distribution can be used as an approximation even if we do not know the population standard deviation

Z-Test vs t-Test

t-Test
H0 : μ = 150cc
Ha : μ ≠ 150cc

x      x - x̄    (x - x̄)²
150    -1.8     3.24
151    -0.8     0.64
153    1.2      1.44
148    -3.8     14.44
149    -2.8     7.84
150    -1.8     3.24
152    0.2      0.04
155    3.2      10.24
154    2.2      4.84
156    4.2      17.64

Mean x̄ = 151.8
Sum of (x - x̄)² = 63.6
Sample variance s² = 63.6 / 9 = 7.07
Sample standard deviation s = 2.66
t-Test
H0 : μ = 150cc
Ha : μ ≠ 150cc Level of Significance α = 0.05

150
151
153
148
149
150
152
155
154
156
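
The test statistic for these ten readings is t = (x̄ - μ0) / (s / √n). The sketch below computes it with the standard library; the critical value 2.262 (two-tailed, alpha = 0.05, 9 degrees of freedom) is taken from a standard t-table and is an assumption, since the slide's own value did not survive extraction.

```python
import math
import statistics

volumes = [150, 151, 153, 148, 149, 150, 152, 155, 154, 156]  # sample readings (cc)
MU_0 = 150  # hypothesised mean under H0 (cc)

n = len(volumes)
sample_mean = statistics.mean(volumes)
sample_sd = statistics.stdev(volumes)  # sample standard deviation (divides by n - 1)

t_stat = (sample_mean - MU_0) / (sample_sd / math.sqrt(n))

# Two-tailed critical value for alpha = 0.05 and n - 1 = 9 degrees of freedom,
# read from a standard t-table (assumed, not given in the slides).
T_CRITICAL = 2.262
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"mean = {sample_mean}, s = {sample_sd:.2f}, t = {t_stat:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With t of about 2.14 against a critical value of 2.262, this sample does not give enough evidence at alpha = 0.05 to reject H0 : μ = 150cc.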

Analysis of Variance (ANOVA)

x1 x2
6 11
8 12
10 9
8 11
8 12

Analysis of Variance (ANOVA)


A One-Way ANOVA (Analysis of Variance) is
a statistical technique by which we can test
if three or more means are equal. It tests if
the value of a single variable differs
significantly among three or more levels of
a factor.
We can say we have a framework for one-
way ANOVA when we have a single factor
with three or more levels and multiple
observations at each level.
Analysis of Variance (ANOVA)
• The one-way analysis of variance (ANOVA) is used when the input or independent variable X is categorical data and the output or dependent variable Y is continuous data.
• The independent variable X can consist of any number of groups (levels), typically greater than two.
- If the number of levels is just two, then it is a hypothesis test of the two means

Analysis of Variance (ANOVA)


• Used to test the null hypothesis that multiple population means are equal:

H0 : μ1 = μ2 = μ3 = μ4 …
Ha : At least one μk is different

• ANOVA tests to determine if the means are different, not which of the means are different.
Analysis of Variance (ANOVA)
• Degrees of Freedom (DF): the number of independent conclusions that can be drawn from the data.
• SSFactor: measures the variation of each group mean from the grand mean of all groups.
• SSError: measures the variation of each observation within each factor level from the mean of that level.
• Mean Square Error (MSE): SSError divided by the error DF.
• F: the ratio of the variance between treatments to the variance within treatments = MSFactor/MSE, where MSFactor is SSFactor divided by the factor DF. If F is small (around 1 or below), the treatment means are not significantly different (P value is large).
• P: the probability of observing a difference at least this large by chance (sampling error) alone when H0 is true. A small P value (typically < 0.05) indicates a difference, and H0 should be rejected.

Analysis of Variance (ANOVA)

Method 1   Method 2   Method 3
6          11         4
8          12         7
10         9          3
8          11         6
8          12         5

Null Hypothesis - H0
There is no difference between the three methods.
H0 : μ1 = μ2 = μ3

Alternate Hypothesis - Ha
At least one of the three methods is different.


Analysis of Variance (ANOVA)


Method 1                         Method 2                         Method 3
x1    x1 - x̄1   (x1 - x̄1)²      x2    x2 - x̄2   (x2 - x̄2)²      x3    x3 - x̄3   (x3 - x̄3)²
6     -2        4                11    0         0                4     -1        1
8     0         0                12    1         1                7     2         4
10    2         4                9     -2        4                3     -2        4
8     0         0                11    0         0                6     1         1
8     0         0                12    1         1                5     0         0

x̄1 = 8, Sum = 8                  x̄2 = 11, Sum = 6                 x̄3 = 5, Sum = 10

Sum of Squares Within = 8 + 6 + 10 = 24

Total Sum of Squares = Sum of Squares Within + Sum of Squares Between
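
The within-group sum of squares can be checked with a few lines of code. This is a minimal sketch using the x1, x2 and x3 values from the worked table; the dictionary layout and helper name are illustrative assumptions.

```python
# Observations per method, as used in the worked table (x1, x2, x3).
methods = {
    "Method 1": [6, 8, 10, 8, 8],
    "Method 2": [11, 12, 9, 11, 12],
    "Method 3": [4, 7, 3, 6, 5],
}


def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


ss_within_by_method = {name: sum_sq_dev(vals) for name, vals in methods.items()}
ss_within = sum(ss_within_by_method.values())

print(ss_within_by_method)
print(f"Sum of Squares Within = {ss_within:.0f}")
```

Each observation is compared only with the mean of its own method, so this quantity captures the scatter that the factor cannot explain.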
Analysis of Variance (ANOVA)
Grand mean x̄ = 8 (mean of all 15 observations)

Method     x     x - x̄    (x - x̄)²
Method 1   6     -2       4
Method 1   8     0        0
Method 1   10    2        4
Method 1   8     0        0
Method 1   8     0        0
Method 2   11    3        9
Method 2   12    4        16
Method 2   9     1        1
Method 2   11    3        9
Method 2   12    4        16
Method 3   4     -4       16
Method 3   7     -1       1
Method 3   3     -5       25
Method 3   6     -2       4
Method 3   5     -3       9

Mean = 8, Sum of (x - x̄)² = 114

Sum of Squares Within = 8 + 6 + 10 = 24
Total Sum of Squares = 114
Sum of Squares Between = 114 - 24 = 90

Total Sum of Squares = Sum of Squares Within + Sum of Squares Between
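
The decomposition can be verified by computing the total and between-group sums of squares independently and confirming that they add up. The sketch uses the values from the worked table; computing the between-group term directly from the group means is an addition to the slides, which obtain it by subtraction.

```python
# Observations per method, as used in the worked table.
methods = {
    "Method 1": [6, 8, 10, 8, 8],
    "Method 2": [11, 12, 9, 11, 12],
    "Method 3": [4, 7, 3, 6, 5],
}

all_values = [v for vals in methods.values() for v in vals]
grand_mean = sum(all_values) / len(all_values)

# Total: every observation against the grand mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Between: each group mean against the grand mean, weighted by group size.
ss_between = sum(
    len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2 for vals in methods.values()
)

# Within: every observation against its own group mean.
ss_within = sum(
    (v - sum(vals) / len(vals)) ** 2 for vals in methods.values() for v in vals
)

print(f"Total = {ss_total:.0f}, Between = {ss_between:.0f}, Within = {ss_within:.0f}")
```

The three quantities come out as 114, 90 and 24, matching Total = Within + Between.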

Analysis of Variance (ANOVA)

Total DF = Observations - 1 = 15-1 = 14


DF Between = Groups - 1 = 3-1 = 2
DF Within = Total DF - DF Between = 14-2 = 12
Analysis of Variance (ANOVA)

Source     SS     DF    MS    F Value
Between    90     2     45    22.5
Within     24     12    2
Total      114    14

Sum of Squares Within = 8 + 6 + 10 = 24
Total Sum of Squares = 114
Sum of Squares Between = 114 - 24 = 90

Total DF = Observations - 1 = 15 - 1 = 14
DF Between = Groups - 1 = 3 - 1 = 2
DF Within = Total DF - DF Between = 14 - 2 = 12

Analysis of Variance (ANOVA)

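Source     SS     DF    MS    F Value
Between    90     2     45    22.5
Within     24     12    2

Critical Value of F
F(0.05, 2, 12) = 3.8853

Since the calculated F (22.5) is greater than the critical value (3.8853), the Null Hypothesis is rejected: at least one of the three methods is different.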
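
The F value and the comparison with the critical value follow mechanically from the sums of squares and degrees of freedom. A short sketch, using the figures from the ANOVA table and the critical value quoted on the slide; the variable names are illustrative.

```python
# Figures from the ANOVA table.
ss_between, ss_within = 90, 24
n_observations, n_groups = 15, 3

df_between = n_groups - 1               # 2
df_within = n_observations - n_groups   # 12, same as total DF minus DF between

ms_between = ss_between / df_between    # mean square for the factor
ms_within = ss_within / df_within       # mean square error (MSE)
f_value = ms_between / ms_within

F_CRITICAL = 3.8853  # F(0.05, 2, 12) as quoted on the slide
reject_h0 = f_value > F_CRITICAL

print(f"MS Between = {ms_between}, MS Within = {ms_within}, F = {f_value}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

ANOVA only signals that at least one mean differs; identifying which method differs would need a follow-up comparison of the individual means.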