•Null Hypothesis (H0): it is always hoped to be rejected. It always contains the "=" sign.
•Alternative Hypothesis (Ha):
o Challenges H0
o Never contains the "=" sign
o Uses "<", ">", or "≠"
o Generally represents the idea which the researcher wants to prove.

The null hypothesis (H0)
•The null hypothesis (H0) represents the current line of thought concerning population parameters, prior to any application of inferential statistics.
•The alternative hypothesis (Ha) is accepted only after the null hypothesis has been statistically inferred to be incorrect.
Example: "The defendant is assumed to be innocent until proven guilty beyond all reasonable doubt."

What are the steps in hypothesis testing?
1. State the null hypothesis and alternative hypothesis
• Begin with a very clear, precisely stated research question that will guide the way we conduct the study and ensure that we do not just end up with a jumble of information that does not create any real knowledge.
Example: Research Questions
1. Are men from America on the average taller than men from the Philippines?
2. Is the proportion of cigarette smokers who suffer from lung cancer higher than the proportion of non-smokers who suffer from lung cancer?

What are the two kinds of research questions?
1. The first is where a particular test value is chosen for practical or policy reasons.
2. The second is where the test value is specified because we want to compare the population under investigation with another population whose parameter value is known.

The null hypothesis of no difference (H0)
•The null hypothesis must be clearly capable of being rejected, that is, it must be possible to show that it is false.
•The null hypothesis is also called the statistical hypothesis because it is stated for the purpose of either rejecting or not rejecting it after submitting the data to statistical analysis.
Examples:
Title 1: An evaluation of the effectiveness of online learning
Problem: The researcher wants to know if online learning has increased the average GPA of NEU students from 80%.
H0: μ = 80; Online learning has not increased the average GPA of NEU students.
Ha: μ > 80; Online learning has increased the average GPA of NEU students.
Explanation: The researcher is interested in knowing if online learning has increased the average GPA of NEU students (">" because of the word "increased").
Lesson 4.2: Types of Hypothesis Test

1. One-tailed left directional test
• This is used if Ha uses the < symbol.
Left directional test: It is used when the alternative hypothesis uses comparatives such as less than, smaller than, inferior to, below, etc. The figure above illustrates the acceptance and rejection regions. Left-tailed tests are normally used when we want to see if some minimum requirement has been met.

Example:
1. It is known in our school canteen that the average waiting time for a customer to receive and pay for his order is 20 minutes. Additional personnel have been assigned, and now the management wants to know if the average waiting time has been reduced.
H0: The average waiting time has not been reduced, or
H0: The average waiting time is equal to 20 minutes.
Ha: The average waiting time has been reduced, or
Ha: The average waiting time is less than 20 minutes.

2. One-tailed right directional test
• This is used if Ha uses the > symbol.
Right directional test: It is used when the alternative hypothesis uses comparatives such as greater than, higher than, better than, superior to, exceeds, etc. The figure above illustrates the acceptance and rejection regions. Right-tailed tests are normally used when we want to test whether some maximum limit or standard has not been exceeded.

3. Two-tailed test: Non-directional
Two-tailed test: It is used when the alternative hypothesis uses words such as not equal to, significantly different, etc.

Alternative hypothesis    Tail of sampling distribution
H1: μ ≠ X                 Both
H1: μ < X                 Left
H1: μ > X                 Right

Therefore:
1. If Ha uses ≠, the test is two-tailed, non-directional.
2. If Ha uses <, the test is one-tailed, left directional.
3. If Ha uses >, the test is one-tailed, right directional.

Level of Significance
The level of significance is the area of the rejection region, designated by the Greek letter α (alpha), while the area of the acceptance region is 1 − α. If α = 0.05, the area of the acceptance region is 0.95. The typical values of α are 0.01 and 0.05, but you are not prevented from using 0.02, 0.03, etc. In your research, however, you just have to use α = 0.05.

Decisions Made Regarding H0
(Reject H0 / Do not reject H0)
If you reject H0, it means the sample evidence indicates that it is wrong!
If you do not reject H0, it doesn't mean it is correct! You simply don't have enough evidence to reject it.

What is Type I error?
• Type I error (α) is the error of rejecting a true null hypothesis (H0).
• Its probability is called the level of significance of a test.
What is Type II error?
• Type II error (β) is the error of accepting a false null hypothesis when the alternative hypothesis (H1) is true.

•An α of 0.01 (compared with 0.05) means the researcher is being relatively careful. He/she is only willing to risk being wrong once in 100 times in rejecting a null hypothesis which is true.
•If the null hypothesis is rejected at α = 0.05, the perceived difference is significant, but if it is rejected at α = 0.01, the difference is highly significant.

Testing the Significance of Difference Between Means
z-test: n ≥ 30 → σ is known
t-test: n < 30 → σ is unknown
F-test (ANOVA) → 3 or more μs

•To reiterate, the z-test is used when "n is large", that is when n ≥ 30, and σ (population standard deviation) is known. Three types of hypotheses can be tested by the z-test, each testing the significance of the difference between:
•Population or hypothesized mean and sample mean, that is Population mean vs Sample mean
•Two sample means when the two sample standard deviations are known, that is Sample mean 1 vs Sample mean 2
•Two sample means when the population standard deviation is known, that is Sample mean 1 vs Sample mean 2

Testing the Significance of Difference Between Means "n is large, or n ≥ 30, and σ is unknown"

z-test: n ≥ 30 → σ is known
•Hypothesized/population mean vs Sample mean, and the population standard deviation is known
$$Z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$$
x̄ - is the sample mean
μ - is the population mean
n - is the sample size
σ - is the population standard deviation

z-test: n ≥ 30 → σ is known
•Sample mean 1 vs Sample mean 2, and the 2 sample standard deviations are known.
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
x̄1 - is the mean of sample 1
x̄2 - is the mean of sample 2
n1 & n2 - are the sample sizes
s1 & s2 - are the sample standard deviations

z-test: n ≥ 30 → σ is known
•Sample mean 1 vs Sample mean 2, and the population standard deviation is known.

•The second compares the p-value (the area to the right of the computed value) and α.

Lesson 4.3: The Z-Test

•A table is constructed so that you don't have to go back to the areas under the normal curve table. You will always refer to this table whenever you use the z-test in hypothesis testing.

Test          α = 0.01    α = 0.05
One-tailed    2.33        1.65
Two-tailed    2.58        1.96

Examples:
1. A supermarket owner believes that the mean income of its customers is P50,000 per month. One hundred customers are randomly selected and asked about their monthly income. The sample mean is P48,500 per month and the standard deviation is P3,200. Is there sufficient evidence to indicate that the mean income of the customers of the supermarket is P50,000 per month? Use α = 0.05.
Answer:
Since n = 100, and only one sample mean is given, use the z-test, and the test statistic is:
$$Z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$$
Substituting μ = 50000, x̄ = 48500, n = 100, σ = 3200 in the formula, you obtain
$$Z = \frac{(48500 - 50000)\sqrt{100}}{3200} = \frac{-1500(10)}{3200} = \frac{-15000}{3200} = -4.69$$
5-step solution
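The supermarket computation can be checked with a few lines of Python. This is only a minimal sketch: the function name `one_sample_z` and the dictionary of critical values are illustrative names, not part of the lesson, and treating the test as two-tailed is an assumption based on the question asking whether the mean "is" P50,000.

```python
import math


def one_sample_z(sample_mean, hypothesized_mean, sigma, n):
    """Compute Z = (x_bar - mu) * sqrt(n) / sigma for one sample mean."""
    return (sample_mean - hypothesized_mean) * math.sqrt(n) / sigma


# Two-tailed critical values copied from the z table in Lesson 4.3.
Z_CRITICAL_TWO_TAILED = {0.01: 2.58, 0.05: 1.96}

# Supermarket example: mu = 50000, x_bar = 48500, sigma = 3200, n = 100.
z = one_sample_z(48500, 50000, 3200, 100)

# Assumption: two-tailed test at alpha = 0.05 (reject when |z| >= 1.96).
reject_h0 = abs(z) >= Z_CRITICAL_TWO_TAILED[0.05]

print(f"z = {z:.2f}, reject H0: {reject_h0}")
```

Under these assumptions the computed |z| is well beyond 1.96, so the sketch reports that H0 would be rejected.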
Example:
A teacher wants to find out if the Team-based Instruction (TBI) method of teaching Statistics is more effective than the Individually-guided Instruction (IGI) method. Two classes of approximately equal intelligence were selected. From one class, he/she considered 15 students with whom he/she used the TBI method of teaching, and from the other class, he/she considered 14 students with whom he/she used the IGI method. After several sessions, a 30-item test was given. The scores are shown in the following table.

Student   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
TBI      30 28 29 20 18 19 16 27 22 24 26 28 30 29 18
IGI      25 27 20 30 16 21 15 25 28 21 19 17 18  1

Method   n    x̄       s
TBI      15   24.27   4.98
IGI      14   21.07   5.21

•Based on the result of the test, can you say that the TBI method of teaching is more effective than the IGI method? Use α = 0.05.
Solution: Since n1 = 15 and n2 = 14, and there are two independent samples, use the t-test, and the test statistic is:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad df = n_1 + n_2 - 2$$
Substituting:
x̄1 = 24.27, x̄2 = 21.07
s1 = 4.98, s2 = 5.21
n1 = 15, n2 = 14
$$t = \frac{24.27 - 21.07}{\sqrt{\dfrac{(15 - 1)(4.98)^2 + (14 - 1)(5.21)^2}{15 + 14 - 2}\left(\dfrac{1}{15} + \dfrac{1}{14}\right)}}$$
df = 15 + 14 − 2 = 27
5-step solution
1. H0: The TBI method of teaching Statistics is as effective as the IGI method. (H0: μTBI = μIGI)
Ha: The TBI method of teaching Statistics is more effective than the IGI method. (Ha: μTBI > μIGI)
2. α = 0.05; one-tailed; df = 27; t tab = 1.703

• Dependent samples are drawn from the same population, or are the same set of samples subjected to different experimental conditions, like weight before and after attending aerobics, pretest and posttest, etc.
• Correlated samples are two different sets of samples which are related, like mother and daughter, brother and sister, old and new machine, etc.
Example:
The following are the weights in pounds of 15 students before and after six months of attending aerobics.

Weights Before  243 179 201 165 183 153 170 180 212 169 178 209 158 192 144
Weights After   231 173 199 162 179 152 164 177 207 170 171 196 159 190 140
Difference       12   6   2   3   4   1   6   3   5  -1   7  13  -1   2   4

•Test at α = 0.05 if aerobics is effective in reducing weight.
Solution: Since the two sets of data are taken from the same set of samples, use the t-test, and the test statistic is:
$$t = \frac{\bar{d}\sqrt{n}}{s_d}, \qquad df = n - 1$$
•First, get d̄ and s_d. By using the formula or a calculator for finding the mean and standard deviation, you find that the mean of the differences is 4.40 and the standard deviation is 4.05.
Substituting:
d̄ = 4.40, s_d = 4.05, n = 15
$$t = \frac{4.40\sqrt{15}}{4.05} = 4.21; \qquad df = 15 - 1 = 14$$
If you let:
μB = the mean before attending aerobics
μA = the mean after attending aerobics
5-step solution
1. H0: Aerobics is not effective in reducing weight. (H0: μB = μA)
Ha: Aerobics is effective in reducing weight. (Ha: μB ≠ μA)
2. α = 0.05; two-tailed; df = 14; t tab = 2.145
3. Decision rule: Reject H0 if |t c| ≥ |t tab|, that is, if 4.21 ≥ 2.145.
4. Decision: Reject H0, since t c (4.21) > t tab (2.145).
5. Conclusion: Based on the sample evidence, aerobics is effective in reducing weight.

σ² = 1,120 / 5 = 224
s² = 1,120 / 4 = 280
STANDARD DEVIATION: The square root of the variance
BOYS
σ² = 224     s² = 280
σ = 14.97    s = 16.73
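Both t statistics in this part of the lesson can be reproduced in Python. The sketch below is illustrative only: `paired_t` and `pooled_t` are hypothetical helper names, the paired test uses the raw weights listed in the example, and the independent-samples test uses the summary statistics (means, standard deviations, sample sizes) because the computed value of that statistic is not stated in the text.

```python
import math
import statistics


def paired_t(before, after):
    """Dependent-samples t = d_bar * sqrt(n) / s_d, with df = n - 1."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    return d_bar * math.sqrt(n) / s_d, n - 1


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Independent-samples t with pooled variance, with df = n1 + n2 - 2."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, n1 + n2 - 2


# Weights (pounds) of the 15 students before and after aerobics.
before = [243, 179, 201, 165, 183, 153, 170, 180, 212, 169, 178, 209, 158, 192, 144]
after = [231, 173, 199, 162, 179, 152, 164, 177, 207, 170, 171, 196, 159, 190, 140]

t_paired, df_paired = paired_t(before, after)

# TBI vs IGI, from the summary table (x_bar, s, n for each method).
t_pooled, df_pooled = pooled_t(24.27, 4.98, 15, 21.07, 5.21, 14)

print(f"paired: t = {t_paired:.2f}, df = {df_paired}")
print(f"pooled: t = {t_pooled:.2f}, df = {df_pooled}")
```

The paired statistic agrees with the 4.21 obtained above. For TBI vs IGI, the pooled statistic from the rounded summary values comes out near 1.69, which can then be compared with the tabular value of 1.703 at df = 27.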
$$\rho\ (\text{rho}) = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Where:
1 and 6 = constants
d = the difference in ranks
n = the number of pairs

BSE Rank   BA Rank    d    d squared
1          1          0    0
6          5          1    1
5          6         -1    1
4          3          1    1
9          9          0    0
3          2          1    1
2          4         -2    4
8          8          0    0
7          7          0    0
                     Total: 8

$$\rho\ (\text{rho}) = 1 - \frac{6(8)}{9(9^2 - 1)} = 1 - \frac{48}{720} = 0.93$$

Definition:
• Regression Analysis deals with the estimation of one variable based on the changes or movements of the other variable.
• Regression Equation: Y = a + bx, where
$$a = \frac{\sum y - b\sum x}{N}, \qquad b = \frac{N\sum xy - \sum x\sum y}{N\sum x^2 - \left(\sum x\right)^2}$$

Linear Regression of Y on X
•In a regression equation Y = a + bx, "a", which is constant, is the y-intercept, while "b" is the slope of the regression line.
•The regression line is the line which traces the general direction of the points plotted in the scatter diagram. It is called the Trend Line or the Least Square Regression Line (LSRL) because it is the line which gives the minimum sum of squares of the differences from the actual values.
•Thus, Y = a + bx is the linear regression of Y on X, and is used to predict the value of Y from the knowledge of X.
•Two types of variables are involved in the regression equation:
1. The predictor (independent) variable, which is "x" in the regression equation (Y = a + bx).
2. The predictand (dependent) variable, which is "y" in the regression equation.
•Again, take as your example the hours spent in studying (x) and grades received (y), to predict the grades received (y) using the knowledge of the number of hours spent in studying (x).
•In getting Pearson's r, the same value can be obtained even if x and y are interchanged. Enter x then y, or y then x, and the same r (0.89) will be obtained.
•However, in regression you have to enter the independent variable first, in this case x, followed by the dependent variable y. They cannot be interchanged!

Now predict the grade (Y) of students from the number of hours they spent studying:
• First, set up the regression equation "Y = a + bx".
• Get "a" by pressing SHIFT then "A" or its equivalent, and "53.30508475" will display.
• Get "b" by pressing SHIFT then "B" or its equivalent, and "5.288135593" will display.
• Putting them together in the regression equation "Y = a + bx", you have "Y = 53.31 + 5.29x", that is, rounding off a and b to two decimal places.
• To predict the grade received (Y') when the number of hours spent in studying is:
a) 7 hours: substitute 7 for x in your equation, Y' = 53.31 + 5.29(7) = 90.34, or 90.
b) 1 hour: substitute 1 for x in your equation, Y' = 53.31 + 5.29(1) = 58.60, or 59.
c) 45 min: substitute 45/60 for x in your equation, Y' = 53.31 + 5.29(45/60) ≈ 57.28, or 57.

Linear Regression of X on Y
•If you want to predict the number of hours spent in studying given the grades, can you use the same equation "Y = 53.31 + 5.29x"? The answer is NO!
•This time, the independent variable is the grade, while the dependent variable is the number of hours spent in studying.
•This is now the linear regression of X on Y. The Least Square Regression Line (LSRL) this time is the line which gives the minimum sum of squares of the differences measured parallel to the x axis.
•Thus, "X = a + by" is the linear regression of X on Y, and is used to predict the value of X from the knowledge of Y. The formulas are:
• Regression of X on Y: X = a + by
$$b = \frac{N\sum xy - \sum x\sum y}{N\sum y^2 - \left(\sum y\right)^2} = \frac{10(2914) - (38)(734)}{10(54718) - (734)^2} = 0.1481$$
$$a = \frac{\sum x - b\sum y}{N} = \frac{38 - 0.1481(734)}{10} = -7.07$$
• Thus, X = -7.07 + 0.15Y. This is the equation which will be used to predict the number of hours spent in studying from the knowledge of the grades received. If the grades received are:
a) 87: X' = -7.07 + 0.15(87) = 5.98 ≈ 6 hours
b) 60: X' = -7.07 + 0.15(60) = 1.93 ≈ 2 hours
• Now, can you use the calculator to get "a and b" for the regression of X on Y the way you did for the regression of Y on X? YES! But you have to interchange "x and y"; thus, the grade takes the role of X while the number of hours spent in studying becomes Y.
• Notice that you get exactly the same values for "a and b" as what you got using the formula. It gives you the same equation X = -7.07 + 0.15Y. Therefore, to make your job easier, and to avoid using the formula, you can just interchange X and Y as you did above.
• Now, since you interchanged X and Y, your equation becomes: Y = -7.07 + 0.15X.
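The regression coefficients for X on Y and the Spearman rho in this lesson can be recomputed from the sums and ranks given in the text. This is a minimal sketch, not the calculator procedure described above: `regression_from_sums` and `spearman_rho` are hypothetical helper names, and the inputs are the sums N = 10, Σx = 38, Σy = 734, Σxy = 2914, Σy² = 54718 together with the BSE and BA rank columns.

```python
def regression_from_sums(n, sum_pred, sum_resp, sum_prod, sum_pred_sq):
    """Return (a, b) for the least-squares line: response = a + b * predictor."""
    b = (n * sum_prod - sum_pred * sum_resp) / (n * sum_pred_sq - sum_pred**2)
    a = (sum_resp - b * sum_pred) / n
    return a, b


def spearman_rho(ranks_1, ranks_2):
    """Spearman rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied ranks."""
    n = len(ranks_1)
    sum_d_sq = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d_sq / (n * (n**2 - 1))


# Regression of X on Y: the grade (y) is the predictor, hours (x) the response.
a, b = regression_from_sums(
    n=10, sum_pred=734, sum_resp=38, sum_prod=2914, sum_pred_sq=54718
)

# Rank columns from the Spearman table.
bse_ranks = [1, 6, 5, 4, 9, 3, 2, 8, 7]
ba_ranks = [1, 5, 6, 3, 9, 2, 4, 8, 7]
rho = spearman_rho(bse_ranks, ba_ranks)

print(f"X = {a:.2f} + {b:.4f}Y")
print(f"rho = {rho:.2f}")
```

Because the sketch keeps b unrounded, predictions made with it will differ slightly from those made with the rounded equation X = -7.07 + 0.15Y.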