Downloaded by HY ER (erinhaneko@gmail.com)
lOMoARcPSD|2665969
Terminology
Notations
Populations/Greek = μ, σ, N
Sample/Roman = x̄, s, n
Content
Types of data
Categorical/Qualitative
Numerical/Quantitative
Cross-sectional
Terminology
Ordered Array: Data in Order (e.g. 1,2,3,4,5)
Number Frequency
1 4
2 2
3 3
Content
Graphs
Class Intervals
Puts data into class intervals/bins
o E.g. Instead of 1,6,15,18,19,22,30
Class Interval Frequency
1-10 2
11-20 3
21-30 2
Every class grouping has same width
o Determined using below formula
Width of interval = Range/number of desired class groupings
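The class-interval steps above can be sketched in Python. This is a minimal illustration, not the course's method: the function names are made up for the example, and rounding the width up to a whole number is an assumption that happens to reproduce the 1-10, 11-20, 21-30 classes used in the notes.

```python
import math

def class_width(data, n_classes):
    # Width of interval = range / number of desired class groupings.
    # Rounding up to a whole number is an assumption made for this sketch.
    return math.ceil((max(data) - min(data)) / n_classes)

def frequency_table(data, lower, width, n_classes):
    # Count how many whole-number values fall into each class interval.
    table = []
    for k in range(n_classes):
        start = lower + k * width
        end = start + width - 1
        count = sum(1 for x in data if start <= x <= end)
        table.append((f"{start}-{end}", count))
    return table

data = [1, 6, 15, 18, 19, 22, 30]
width = class_width(data, 3)
table = frequency_table(data, lower=1, width=width, n_classes=3)
for label, count in table:
    print(label, count)
```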
Histograms
Uses midpoint of class intervals
Vertical axis = frequency or relative frequency or percentage
Contingency Tables
Can be in frequency or percentage
Used for bar charts
o Frequency
                      Cola Preference
Ethnicity       Regular    Diet    Total
Asian              12        3       15
Caucasian          12       13       25
Other               6        4       10
Total              30       20       50
o Percentage (each row sums to 100%)
                      Cola Preference
Ethnicity       Regular        Diet    Total
Asian           12/15 = 80%    20%     100%
Caucasian       48%            52%     100%
Other           60%            40%     100%
Total           60%            40%     100%
Scatter Diagrams
Examine relationships between two numerical variables (X, Y)
o X = Independent variable [Horizontal/X axis]
o Y = Dependent variable [Vertical/Y axis]
Time-Series Plot
Studies patterns in values over time
Examines one variable
o Y = the variable [vertical axis]
o X = Time [horizontal axis]
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
x̄: X-bar/Mean/Average
∑: Sum of/add up
Position of Median $= \dfrac{n + 1}{2}$
E.g. 1,2,3,3,3,3,4,5,5
3 has the largest frequency, occurring 4 times
o Therefore, mode = 3
E.g. 3,5,7,8,9,11,15
Range = Largest Number – Smallest number
Range = 15 – 3
Range = 12
Named Q1, Q2, Q3, Q4
Position of Q1 = (n + 1)/4
Position of Q2 = (n + 1)/2
Position of Q3 = 3 × (n + 1)/4
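The mode, range and quartile-position rules above can be checked with a short Python sketch. It reuses the two example data sets from the notes; the helper names `value_range` and `quartile_positions` are made up for illustration.

```python
from statistics import median, mode

def value_range(data):
    # Range = largest number - smallest number.
    return max(data) - min(data)

def quartile_positions(n):
    # Positions (not values) of Q1, Q2 and Q3 in an ordered array of n values.
    return ((n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4)

mode_data = [1, 2, 3, 3, 3, 3, 4, 5, 5]
range_data = [3, 5, 7, 8, 9, 11, 15]

most_common = mode(mode_data)
spread = value_range(range_data)
positions = quartile_positions(len(range_data))
middle = median(range_data)
print(most_common, spread, positions, middle)
```

For the seven-value data set the positions come out as whole numbers, so Q1, Q2 and Q3 are simply the 2nd, 4th and 6th values of the ordered array.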
Terminology
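Variance: Literally just the standard deviation squared.
Note: Remember to convert to St Dev by taking the square root (√).
Standard Deviation/Sigma/σ/s: Roughly the typical deviation of each number from the average (the square root of the variance).
Not the same as MAD (mean absolute deviation), which averages absolute rather than squared deviations.
The variance is the mean squared deviation.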
Content
Describing data
Measures of Dispersion
Standard deviation
Unit used to count how far away a number is from the mean
Formula
o Square root of: the sum of (each number minus the average) squared, divided by (the total number of numbers minus 1)
o $S = \sqrt{\dfrac{\sum (X_i - \bar{x})^2}{n - 1}}$
S = Standard deviation
∑ = Sum of all
X_i = any of the numbers/results
x̄ = The average
n = total number of numbers/results
For example
o Standard deviation of: 2, 5, 10, 15, 18, 21, 44
o Average/x̄ = 115/7 = 16.43
o $\sum (X_i - \bar{x})^2 = 1165.71$
Explanation: (2 − 16.43)² + (5 − 16.43)² + (10 − 16.43)² + …
o n − 1 = 7 − 1 = 6
Therefore
$S = \sqrt{\dfrac{1165.71}{6}} = \sqrt{194.29} = 13.94$
Standard deviation = 13.94
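The worked standard deviation example can be reproduced with a few lines of Python. This is only a sketch of the sample formula (dividing by n − 1); the function name is made up for illustration.

```python
import math

def sample_std_dev(values):
    # S = sqrt( sum((x_i - mean)^2) / (n - 1) )
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 5, 10, 15, 18, 21, 44]
s = sample_std_dev(data)
print(round(s, 2))
```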
Variance
Definition = The mean squared deviation
Difference to standard deviation
o It is in squared units, as it doesn't have the square root.
o It is expressed using specific Greek/Roman (population/sample) characters
o Population variance does not use n − 1; it divides by N
o Sample variance uses n − 1 to adjust for the bias of sample statistics
o Sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
o Population variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
EXAM NOTE: Be aware of whether the question gives you
o Standard deviation or variance
o If they only give variance
Remember to take its square root to get the standard deviation
o Coefficient of variation: CV = (S / x̄) × 100%
S = Standard Deviation
x̄ = Average (X-bar)
Z-Scores
Difference between a given observation (X) and the mean (x̄), divided by the standard deviation
Formula
o $Z = \dfrac{x - \bar{x}}{S}$
X = given observation/number/result/value
x̄ = Mean/average
S = Standard deviation
Z = the score that you will be using to look up on the z table
Table value = a probability (area) to the left of Z on the graph.
Also used to describe the number of standard deviations a value is from the mean
o E.g. a z-score of 2.0 means that a value is 2.0 standard deviations above the mean
Distribution Shape
Describes how data are distributed
o Left Skewed
Mean<Median
o Symmetric
Mean= Median
o Right Skewed
Median< Mean
$\operatorname{cov}(X, Y) = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
S_x = Standard deviation of x values
S_y = Standard deviation of y values
Standard deviation formula: $S = \sqrt{\dfrac{\sum (X_i - \bar{x})^2}{n - 1}}$
Example
Terminology
Regression analysis: Explains and measures the relationship between two variables.
Content
Simple Linear Regression
Helps with prediction
The relationship between Y and X is treated as a causal relationship (X explains Y)
Probability
Interpreting probability
Probability
o Chance that an outcome is achieved in a particular experiment
o Notation for the probability of an event, A, occurring is P(A)
Notation for event = A
Notation for probability = P
Notation for probability of an event = P(A)
o Probability of an event ranges between 0 and 1
0% chance to 100% chance
o The probabilities of all simple events must sum to 1
E.g. probability of rolling a die and getting 1, 2, 3, 4, 5 or 6 = 100% = 1
Assessing Probabilities
A priori classical probability
o Based on prior knowledge
o E.g. based on the symmetrical nature of an experiment
Flipping a coin (1/2)
Rolling a die (1/6)
Empirical classical probability
o Based on observed data or repeated experimentation to assign probabilities
o $P(A) = \dfrac{N_A}{N}$
N_A = number of trials in which A occurred
N = number of trials
Subjective probability
o Based on individual judgement or opinion about the probability of occurrence
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
o Example – Probability that the individual likes coke or is female
P(Coke ∪ Female) = P(Coke) + P(Female) – P(Coke ∩ Female)
= 120/200 + 110/200 − 75/200
= 0.6 + 0.55 – 0.375
= 0.775
o BIGGEST NOTE: the key here is the words EITHER/OR. The biggest mistake is when people confuse or/either with AND.
E.g. Individual likes coke OR is female (∪)
vs
Individual likes coke AND is female (∩)
And has an 'n' in it… ∩
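As a quick check of the addition rule, here is a minimal Python sketch using the counts from the coke/female example. The function and variable names are made up for illustration.

```python
def prob_union(p_a, p_b, p_a_and_b):
    # Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
    return p_a + p_b - p_a_and_b

total = 200
p_coke = 120 / total
p_female = 110 / total
p_coke_and_female = 75 / total

p_coke_or_female = prob_union(p_coke, p_female, p_coke_and_female)
print(round(p_coke_or_female, 3))
```

Subtracting the AND probability stops the 75 people who are both female and like coke from being counted twice.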
Conditional Probability
o Probability of one event conditional upon the state of another event
o KEY WORD: given
The conditional probability of A given we know about B
o P(A | B)
Note: B is the condition/given
o Example
Given a person is male, what is the probability that they prefer Pepsi
P(Pepsi | Male)
What is the probability that an individual is female given they prefer coke
P(Female | Coke)
o EXAM NOTE
They might not give us a contingency table (as above) and instead ask a full text question.
Pay attention to the words used and use the following formula
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ (Remember B is the GIVEN)
EXAMPLE
Situation
The probability that Mark has lunch in the tea room is 0.6 and the probability that Gary has lunch in the tea room is 0.5. However, Mark and Gary don't like each other and the probability that they have lunch together in the tea room is 0.1
Mark tea room = 0.6
Gary tea room = 0.5
Both tea room = 0.1
Question
What is the probability that Mark will have lunch in the tea room given Gary is having lunch in the tea room?
Or
Given Gary is having lunch in the tea room, what is the probability that Mark will have lunch in the tea room?
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
o B = Given = Gary = 0.5
o A = Probability event = Mark = 0.6 (NOT NEEDED IN THIS Q)
o P(A ∩ B) = Gary AND Mark = 0.1
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.1}{0.5} = 0.2$
REMEMBER
B = GIVEN/the condition
A = The EVENT we are trying to find the probability for
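The Mark and Gary example can be written as a small Python sketch of the conditional probability formula. The function name is made up for illustration.

```python
def conditional_probability(p_a_and_b, p_b):
    # P(A | B) = P(A and B) / P(B); B is the GIVEN event.
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b

p_gary = 0.5   # B, the given
p_both = 0.1   # A and B
p_mark_given_gary = conditional_probability(p_both, p_gary)
print(p_mark_given_gary)
```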
Mutually exclusive Events
o If events are mutually exclusive they cannot occur together.
E.g. male and female are mutually exclusive; you can't be both
E.g. in rolling a die, events of 1 and 2 are mutually exclusive
Collectively exhaustive events
o One of the events must occur
o The set of events covers the entire sample space
E.g. male and female are collectively exhaustive
E.g. in rolling a die, 1, 2, 3, 4, 5 and 6 are collectively exhaustive
Learning objectives
Random variable
Probability distributions
o Discrete
o Continuous
Binomial distribution
Terminology
Random Variable:
A variable that assumes numerical values according to the random outcome of an experiment.
Notation = X
Content
The Addition Rule
The formula can be rearranged
o This enables us to use an EITHER/OR probability to find an AND probability and vice versa
The below will find AND
$P(A \cap B) = P(A) + P(B) - P(A \cup B)$
Example
o P(A ∪ B) = 0.8 = Probability that A OR B occurs
P = Probability
A = Event (that you're trying to find the probability of)
B = Given/Condition (of A happening)
P(A ∩ B) = Probability A and B occur = N_AB
Note: the lecturers used N and P interchangeably here.
Example
o Situation
In a recent study it was found that the probability that a speeding driver was male, given they were driving a 'big car', was 0.6. The probability that the speeding car was a 'big car' was 0.8.
o Question
What is the probability that the speeding car was a 'big car' AND driven by a male?
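One way to answer this question is to rearrange the conditional probability formula P(A | B) = P(A ∩ B) / P(B) into P(A ∩ B) = P(A | B) × P(B). The Python sketch below applies that rearrangement to the figures in the situation; the function name is made up for illustration.

```python
def joint_probability(p_a_given_b, p_b):
    # Rearranged conditional formula: P(A and B) = P(A | B) * P(B)
    return p_a_given_b * p_b

p_male_given_big_car = 0.6   # A | B
p_big_car = 0.8              # B, the given
p_big_car_and_male = joint_probability(p_male_given_big_car, p_big_car)
print(round(p_big_car_and_male, 2))
```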
Random Variable
Random variable
o Notation = X
o Is a variable that assumes numerical values according to the random outcome of an experiment
o Can be discrete or continuous
Remember:
Discrete = countable values, no decimals (e.g. 3)
Continuous = any value in a range, decimals allowed (e.g. 3.14159)
The probability distribution of a random variable, X, describes the probability that X takes on each of its possible values.
Probability distributions can be discrete OR continuous
o Discrete distribution, e.g. Binomial distribution
o Continuous distribution, e.g. Normal distribution
Discrete Distributions:
Expected Value:
Formula
o $\mu = E(X) = \sum_{i=1}^{N} x_i \, p(x_i)$
Variance
o Weighted average of the squared difference between each observation and the expected value of the distribution
o Measure of dispersion
o An indicator of RISK in investment
o Formula
$\sigma_x^2 = V(X) = \sum_{i=1}^{N} [x_i - E(X)]^2 \, p(x_i)$
EXAMPLE
o Calculate the variance for each stock as a measure of the stock's risk
o Stock A
$\sigma_x^2 = V(X) = \sum_{i=1}^{N} [x_i - E(X)]^2 \, p(x_i)$
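The expected value and variance formulas for a discrete distribution can be sketched in Python as below. The stock figures from the original example are not available here, so the returns and probabilities are purely hypothetical numbers chosen for illustration, as are the function names.

```python
import math

def expected_value(values, probabilities):
    # E(X) = sum of x_i * p(x_i)
    return sum(x * p for x, p in zip(values, probabilities))

def variance(values, probabilities):
    # V(X) = sum of (x_i - E(X))^2 * p(x_i)
    mu = expected_value(values, probabilities)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probabilities))

# Hypothetical returns (%) and their probabilities; not the figures from the lecture.
returns = [-10, 5, 20]
probs = [0.2, 0.5, 0.3]

mu = expected_value(returns, probs)
var = variance(returns, probs)
std_dev = math.sqrt(var)
print(mu, var, std_dev)
```

A stock with a larger variance (or standard deviation) around the same expected return would be read as the riskier one.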
Binomial Distribution
Discrete distribution where the underlying experiment has only two outcomes
o Success or failure
A binomial experiment possesses the following properties
o Fixed number of trials, n
o The probability π of success is constant for each trial
o Each trial is independent of the other trials
The binomial random variable (X)
o Number of successes in n trials of the binomial experiment
Example: if we toss a coin 3 times what is the probability…
Formulas
o Mean
$\mu = E(X) = np$
o Variance and standard deviation
$\sigma^2 = np(1 - p)$ = Variance
$\sigma = \sqrt{np(1 - p)}$ = Standard deviation
n = Sample size (number of trials)
p = probability of success (also written π)
(1 − p) = probability of failure
Shape of the binomial distribution depends on the values of p and n
Example
o Situation
If the sausage machine is working properly, no more than 10% of the output produced will be defective
No. sausages sampled, n = 10
Inspect and record the no. of defectives, x
o n = no. sampled = 10
o X = no. defectives
o π = prob. of defective = 10%
Questions
P(X = 1) = 0.387
P(X ≥ 2)
o = P(X=2) + P(X=3) + … + P(X=10)
o = 0.194 + 0.057 + 0.011 + 0.001 + …
o = 0.263
P(X ≥ 4)
o = P(X=4) + P(X=5) + … + P(X=10)
o = 0.011 + 0.001 + …
o = 0.012
How to do the above questions
Use your binomial tables
o Find the table where n = what you have
In this example n = 10
o Go down the table to the 'x' you need e.g. 2
o Go across the table to the π/probability you have e.g. 0.1
Interpreting your answers
P(X ≥ 2) = 0.263
o We expect to find 2 or more defective sausages in 26.3% of samples of 10 sausages, if the machine is producing 10% defective.
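If binomial tables are not to hand, the same probabilities can be computed from the binomial formula P(X = x) = C(n, x) π^x (1 − π)^(n − x). The Python sketch below does this for the sausage example; the function names are made up for illustration, and the binomial formula itself is standard rather than quoted from these notes.

```python
from math import comb

def binomial_pmf(x, n, p):
    # P(X = x) for n independent trials with success probability p.
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_at_least(x, n, p):
    # P(X >= x) = P(X = x) + P(X = x + 1) + ... + P(X = n)
    return sum(binomial_pmf(k, n, p) for k in range(x, n + 1))

n, p = 10, 0.1
p_exactly_1 = binomial_pmf(1, n, p)
p_at_least_2 = binomial_at_least(2, n, p)
p_at_least_4 = binomial_at_least(4, n, p)
print(round(p_exactly_1, 4), round(p_at_least_2, 4), round(p_at_least_4, 4))
```

The exact values differ slightly in the third decimal place from sums of table entries, because each table entry has already been rounded.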
Standardisation
Z-tables
Inverse Z-Tables
Rearranging Z Formula
Terminology
Continuous random variable: variable that can assume any value on a continuum
Normal distribution: where results are spread symmetrically around the mean (50% above, 50% below)
Content
Summary
Normal Distribution
o Symmetric around the mean
o Area under curve = 1
o Area under curve represents probability
o Determined by mean and standard deviation
Standard normal (Z)
o Mean = 0
o Standard deviation = 1
o Transform: $Z = \dfrac{X - \mu}{\sigma}$
Using the tables:
o P(Z < cutoff) = Probability
Types of Q's:
o Less than
o Greater than
o In between
o Inverse problems
o Solve for the 4th variable in z = (x – mean)/standard deviation
Normal distributions
Characteristics
o Continuous random variable
o −∞ < X < +∞
o Area under curve = 1
o Mean and standard deviation uniquely determine a normal distribution
Inverse normal table
o Use it when you need to find a Z-score or X value (plug the Z-score into the inverse formula)
You are provided with a percentage
That is the area in the right-hand tail (upper tail)
o For example
Find the minimum weight for the heaviest 1% of cans by weight
Mean = 400g
St dev = 10g
Area given = 1% (to the right, as it says heaviest)
Formula
X = μ + Zσ
X = 400g + Z × 10g
o Two unknowns still exist (X and Z)
o Use inverse table to find Z
X = 400g + 2.3263 × 10g
X = 423.26g
Golden rule
You can use the formula to find X or mean or st dev
o They will give you all the other variables (use the table to find Z)
o Then just rearrange formula as needed
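The inverse-table lookup in the cans example can be mimicked with Python's standard library, where `statistics.NormalDist().inv_cdf` plays the role of the inverse normal table. This is a sketch only; the function name is made up, and note that `inv_cdf` takes the area to the LEFT, so the upper-tail area has to be subtracted from 1 first.

```python
from statistics import NormalDist

def cutoff_for_upper_tail(mean, std_dev, upper_tail_area):
    # inv_cdf expects the area to the left of Z, i.e. 1 - upper tail area.
    z = NormalDist().inv_cdf(1 - upper_tail_area)
    x = mean + z * std_dev   # X = mu + Z * sigma
    return z, x

z, min_weight = cutoff_for_upper_tail(mean=400, std_dev=10, upper_tail_area=0.01)
print(round(z, 4), round(min_weight, 2))
```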
Terminology
Statistical inference: when a sample is selected to draw conclusions regarding a population
Probability distribution: all the possible values (here, sample means) that can occur, with their probabilities, often shown as a graph
Sampling distribution: the specific name for the probability distribution of a sample statistic
Uniform distribution: all outcomes have equal probability (like a coin has ½ for T or H)
Summary
Fundamentally the exact same as week 6 (Finding probabilities)
Content is based around the Z formula $Z = \dfrac{X - \mu}{\sigma}$
Adds an extra step
o Find the probability that a sample will have a certain mean, instead of that a single observation will have a certain value
o Observation -> Sample & value -> mean
Z formula is changed to $Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
We use the above formula for:
o Normal distributions of populations
o Non-normal distributions of populations when n ≥ 30
(Central limit theorem)
Content
Sampling Distributions
Distribution of the possible values a sample statistic may take, i.e. how it spreads around the population parameter of interest (sampling inaccuracy)
o Every sample statistic calculated is a random variable
o Every random variable will have a distribution
o Formula: $\mu_{\bar{x}} = \dfrac{\sum \bar{x}_i}{N}$
x̄_i = Any of the sample means (18, 19, 20, …, 24)
N = Total number of sample means (16 here)
∑ = Sum of all
Plain English = Mean of the sample means = sum of all sample means divided by the number of sample means.
o $\mu_{\bar{x}} = \dfrac{18+19+19+20+20+20+21+21+21+21+22+22+22+23+23+24}{16}$
o $\mu_{\bar{x}} = 21$
We can also now calculate the standard deviation (σ) of the sample means (x-bar)
o Notation = σ_x̄
Formula
o Plain English
Standard deviation of the sample means equals the square root of: the sum of all (sample mean minus mean of the sample means) squared, divided by the total number of sample means
o $\sigma_{\bar{x}} = \sqrt{\dfrac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}}$
o $\sigma_{\bar{x}} = \sqrt{\dfrac{(18-21)^2 + (19-21)^2 + \cdots + (24-21)^2}{16}} = \sqrt{\dfrac{40}{16}} = 1.58$
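The mean and standard deviation of the sample means can be verified with a short Python sketch using the 16 sample means listed in the notes. Because all possible sample means are known, the formula divides by N rather than N − 1.

```python
import math

sample_means = [18, 19, 19, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 23, 23, 24]

# Mean of the sample means: sum of all sample means / number of sample means.
mu_xbar = sum(sample_means) / len(sample_means)

# Standard deviation of the sample means (divide by N, not N - 1).
sum_sq_dev = sum((m - mu_xbar) ** 2 for m in sample_means)
sigma_xbar = math.sqrt(sum_sq_dev / len(sample_means))

print(mu_xbar, sum_sq_dev, round(sigma_xbar, 2))
```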
Notice
With the above we have….
o Something that looks like/is a normal distribution
o A mean
o A standard deviation
o X values… (which are the X-bars)…
Even though it is pretty much a normal distribution, we call it a sampling distribution
We can now answer probability questions using $Z = \dfrac{X - \mu}{\sigma}$
o A step back to what we have found (Population vs Sampling distribution)
Before we said $\sigma_{\bar{x}} = \sqrt{\dfrac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}}$
o This gets us our exact standard deviation
But we can only do it when we know the whole population and all the possible sample means
If population is normal
If OUR POPULATION IS NORMAL we can say
o $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ and $\mu_{\bar{x}} = \mu$
σ = Standard deviation of population
n = sample size
The greater our n (the larger each sample), the smaller the standard deviation of the sample means is
o which is what we want because it shows greater accuracy as there is less deviation from the mean
o Our Z formula!!!
Original (week 6)
$Z = \dfrac{X - \mu}{\sigma}$
Z formula for sampling distribution of the mean
$Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Comparison
They share the exact same format
μ → μ_x̄ = μ: the mean becomes the mean of the samples' averages
σ → σ_x̄ = σ/√n: the st dev becomes the st dev of the samples' averages (the standard error)
EXAMPLE
o Question
How likely is it that a sample of 25 bottles would have a mean fill of 598 ml or less?
n = 25
X-bar = 598
µ = 600
σ = 10
P(x-bar < 598)
o Answer
$Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{598 - 600}{10 / \sqrt{25}} = \dfrac{-2}{10/5} = \dfrac{-2}{2} = -1$
P(Z < −1) = 0.1587
Note: −1.0 on the normal table is 0.1587
If population mean = 600, there is a 15.87% chance a sample of 25 bottles would produce a sample mean of less than 598 ml.
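The bottle-fill example can be reproduced in Python, with `statistics.NormalDist().cdf` standing in for the normal table. This is a minimal sketch; the function name is made up for illustration.

```python
import math
from statistics import NormalDist

def prob_sample_mean_below(x_bar, mu, sigma, n):
    # Standard error of the mean: sigma / sqrt(n)
    standard_error = sigma / math.sqrt(n)
    z = (x_bar - mu) / standard_error
    # cdf gives the area to the left of z, like the normal table.
    return z, NormalDist().cdf(z)

z, p = prob_sample_mean_below(x_bar=598, mu=600, sigma=10, n=25)
print(z, round(p, 4))
```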
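$Z = \dfrac{p - \mu_p}{\sigma_p} = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$
o State rules
$\mu_p = \pi$
$\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
o Identify known values
o Find µp and σp
o Complete original formula
o Use table to convert z-scores to probabilities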
EXAMPLE
o Question
The proportion of voters who support proposition A is 0.4. What is the probability that a sample of size 200 yields a sample proportion between 0.4 and 0.45?
o Formula
$Z = \dfrac{p - \mu_p}{\sigma_p} = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$ where $\mu_p = \pi$ and $\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
o Known information
π = µp = 0.4
n = 200
P(0.40 ≤ p ≤ 0.45)
$\sigma_p = \sqrt{\dfrac{0.4(1 - 0.4)}{200}} = \sqrt{\dfrac{0.24}{200}} = \sqrt{0.0012} = 0.03464$
$P(0.40 \le p \le 0.45) = P\left(\dfrac{0.4 - 0.4}{0.03464} \le Z \le \dfrac{0.45 - 0.4}{0.03464}\right) = P(0 \le Z \le 1.44)$
Z-score 0 = 0.5000
Z-score 1.44 = 0.9251
$P(0 \le Z \le 1.44) = P(Z < 1.44) - P(Z < 0) = 0.9251 - 0.5 = 0.4251$
Answer
42.51% chance that a sample of size 200 yields a sample proportion between 0.4 and 0.45.
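The proposition A example can be checked with the Python sketch below. The function name is made up for illustration. Because the code keeps the unrounded Z-score (about 1.443 instead of the table's 1.44), its answer differs from the table-based 0.4251 in the fourth decimal place.

```python
import math
from statistics import NormalDist

def prob_proportion_between(pi, n, low, high):
    # Standard error of the proportion: sqrt(pi * (1 - pi) / n)
    sigma_p = math.sqrt(pi * (1 - pi) / n)
    z_low = (low - pi) / sigma_p
    z_high = (high - pi) / sigma_p
    std_normal = NormalDist()
    # P(z_low <= Z <= z_high) = P(Z < z_high) - P(Z < z_low)
    return sigma_p, std_normal.cdf(z_high) - std_normal.cdf(z_low)

sigma_p, prob = prob_proportion_between(pi=0.4, n=200, low=0.40, high=0.45)
print(round(sigma_p, 5), round(prob, 4))
```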
Week 8 – Estimation
Learning objectives
Confidence intervals for the population mean (µ)
o When population standard deviation σ is known
$\bar{x} \pm Z\dfrac{\sigma}{\sqrt{n}}$ (Use inverse Z table with α/2 in each tail)
o When population standard deviation σ is unknown (when s is given)
$\bar{x} \pm t_{n-1}\dfrac{S}{\sqrt{n}}$ (Use inverse T table with α/2 in each tail)
o Answer format __________≤ µ ≤ _________ with ___% confidence
o Formula rearrangement to find sample size:
$n = \dfrac{z^2 \sigma^2}{e^2}$ (Always round answer up to next whole number)
Confidence intervals for population proportion (π)
o $p \pm z\sqrt{\dfrac{p(1 - p)}{n}}$ (Use inverse Z table with α/2 in each tail)
Answer format __________≤ π ≤ _________ with ___% confidence
o Formula rearrangement to find sample size:
$n = \dfrac{z^2 \pi(1 - \pi)}{e^2}$ (Always round answer up to next whole number)
Terminology
Point estimate: value of a single statistic (best guess of the mean)
Confidence interval: range of values around the point estimate, limited by upper and lower boundaries. The range is created by choosing a particular level of confidence, e.g. a 95% confidence interval means we are 95% confident the mean is within the interval.
Confidence level and Alpha (α): You can rearrange the formula (just remember the area under the graph always equals 1). Therefore:
1 = Confidence level + α
1 − Confidence level = α
1 − α = Confidence level
Content
Point estimate
We can estimate a population parameter with a sample statistic (point estimate)
              Population parameter    Sample statistic (point estimate)
Mean          µ                       x̄ (X-bar)
Proportion    π                       p
Confidence Interval
An interval gives a range of values
o Takes into consideration variation in sample statistics from sample to sample
o Based on observations from 1 sample
o Gives information about closeness to unknown population parameters
o Stated in terms of level of confidence
The general formula for all confidence intervals is:
o Point estimate ± (critical value) × (standard error)
Level of confidence = (1 – α)
o Common confidence levels = 90%, 95% or 99%
Also written (1 – α) = 0.90, 0.95 or 0.99
o A relative frequency interpretation
In the long run, 90%, 95% or 99% of all the confidence intervals that can be constructed (in repeated samples) will contain the unknown true parameter.
o For example, if we were to randomly select 100 samples and use the results of each sample to construct 95% confidence intervals, approximately 95 out of 100 would contain the population mean.
Example
o Question
Form a 95% confidence interval for µ
o Given information
n = 25
x-bar = 50
s = 8
t_{n−1}: use the table and locate = 2.0639
Degrees of freedom = n − 1 = 25 − 1 = 24 (for T table)
Area (α)
o Upper tail = (1 – 95% confidence)/2 = 0.025 (for T table)
o Lower tail = (1 – 95% confidence)/2 = 0.025 (for T table)
o Formula
$\bar{x} \pm t_{n-1}\dfrac{S}{\sqrt{n}}$
o Working
95% confidence interval $= \bar{x} \pm t_{n-1}\dfrac{S}{\sqrt{n}} = 50 \pm \left(2.0639 \times \dfrac{8}{\sqrt{25}}\right) = 50 \pm \left(2.0639 \times \dfrac{8}{5}\right) = 50 \pm (2.0639 \times 1.6) = 50 \pm 3.30224$
Upper limit = 50 + 3.30224
o 53.30224
Lower limit = 50 – 3.30224
o 46.69776
o Answer
95% confident the population mean is between 46.698 and 53.302
o Remember
Draw your graphs and use visualisations
Start by finding what T is
o Find degrees of freedom
o Find the area of EACH tail
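The confidence interval working above can be sketched in Python. Python's standard library has no inverse t table, so the critical value 2.0639 (24 degrees of freedom, 0.025 in each tail) is passed in by hand exactly as it was read from the table; the function name is made up for illustration.

```python
import math

def t_confidence_interval(x_bar, s, n, t_critical):
    # Point estimate +/- (critical value) * (standard error)
    margin = t_critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# t_critical is taken from the T table, not computed here.
lower, upper = t_confidence_interval(x_bar=50, s=8, n=25, t_critical=2.0639)
print(round(lower, 3), round(upper, 3))
```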
$p \pm z\sqrt{\dfrac{p(1 - p)}{n}} = 0.25 \pm 1.96\sqrt{\dfrac{0.25(1 - 0.25)}{100}} = 0.25 \pm 1.96 \times 0.0433 = 0.25 \pm 0.0849$
$\bar{x} \pm Z\dfrac{\sigma}{\sqrt{n}}$
Remember that $Z\dfrac{\sigma}{\sqrt{n}}$ was the sampling error
We denote sampling error as 'e'
Therefore,
Sampling error $= e = Z\dfrac{\sigma}{\sqrt{n}}$
If you don't have 'n' we have to rearrange the formula
o Swap e with n (so we isolate n)
$e = Z\dfrac{\sigma}{\sqrt{n}} \;\rightarrow\; n = \dfrac{z^2 \sigma^2}{e^2}$
FORMULA TO FOCUS ON
$n = \dfrac{z^2 \sigma^2}{e^2}$
For example 'Finding sample size, n, when the question doesn't give it to us'
o Question
Find sample size, n
o Given information
Standard deviation = 45
± 5 with 90% confidence interval
α = total tail area = 1 − confidence level = 1 − 0.9 = 0.1
Upper tail = area/2 = 0.05 (Use inverse normal table)
Lower tail = area/2 = 0.05 (Use inverse normal table)
o Z = 1.6449
e = 5
Remember that the final step to finding the true mean was $\pm Z\dfrac{\sigma}{\sqrt{n}}$
o Therefore $\pm 5 = \pm Z\dfrac{\sigma}{\sqrt{n}} = e$
o Formula
$n = \dfrac{z^2 \sigma^2}{e^2} = \dfrac{1.6449^2 \times 45^2}{5^2} = \dfrac{2.7057 \times 2025}{25} = \dfrac{5479.0344}{25} = 219.16$
n = 220
Remember to round up (you can't have a part sample e.g. having 219.16 students in a class)
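The sample-size formula and the round-up rule can be sketched in Python as below. `math.ceil` does the rounding up to the next whole number; the function name is made up for illustration.

```python
import math

def sample_size_for_mean(z, sigma, e):
    # n = z^2 * sigma^2 / e^2, always rounded UP to the next whole number.
    return math.ceil((z ** 2 * sigma ** 2) / e ** 2)

n_required = sample_size_for_mean(z=1.6449, sigma=45, e=5)
print(n_required)
```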
Sampling error = e = $z\sqrt{\dfrac{\pi(1 - \pi)}{n}}$
o Question
How large a sample would be necessary to estimate the true proportion of defectives in a large population?
o Given information
±3% with 95% confidence
p = 0.12
o Appropriate formula
$n = \dfrac{z^2 \pi(1 - \pi)}{e^2}$
o Thought process
We have p (sample), used in place of π (population)
= 0.12
We have e
e = sampling error = 3% = 0.03
We need to find Z before we can use the formula
$n = \dfrac{z^2 \times 0.12(1 - 0.12)}{0.03^2}$
Find Z
Z = value that cuts off the upper and lower tails for 95% confidence = 0.95
o Using inverse normal table (which gives us area to the right)
1 − 0.95 = 0.05
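A possible way to finish this calculation is sketched below in Python. It assumes the 0.05 is split into 0.025 per tail, which corresponds to the Z-score of 1.96 used elsewhere in these notes for a 0.025 tail; here that value is obtained from `statistics.NormalDist().inv_cdf` rather than from the table. The function name is made up for illustration.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(pi, e, confidence):
    # Assumption: two-tailed interval, so each tail holds (1 - confidence) / 2.
    tail_area = (1 - confidence) / 2
    z = NormalDist().inv_cdf(1 - tail_area)
    # n = z^2 * pi * (1 - pi) / e^2, rounded UP to the next whole number.
    return z, math.ceil(z ** 2 * pi * (1 - pi) / e ** 2)

z, n_required = sample_size_for_proportion(pi=0.12, e=0.03, confidence=0.95)
print(round(z, 2), n_required)
```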
Hypothesis Testing
All we are doing is testing a 'claim' about a population, to see if it is true or not. We do this by taking a SAMPLE and making a conclusion about the population.
o To find the p-value, ask: if H0 is true, what is the probability of getting a sample statistic (mean or proportion) at least as extreme as the one observed?
Remember the significance level (alpha) is a percentage/probability
Therefore your p-value is also a percentage/probability
Notations guide:
H0: Null hypothesis (what the current hypothesis/claim is)
H1/Ha: Alternative hypothesis (what you are trying to find evidence for)
α: alpha, area of rejection (level of significance = total size of the rejection region)
D.R: Decision Rule – when do you reject H0 (null hypothesis)
T.S: Test Statistic – the formula you use
T.S.V: Test Statistic Value – the value calculated from the test statistic
Con: Conclusion – are you rejecting or keeping H0 (null hypothesis)? State the significance level and contextualise
or
Is there enough evidence to support Ha? Yes/no. Do you reject H0? Yes/no.
n: Sample size
T-Test (mean, sigma unknown)
Test statistic: $t_{n-1} = \dfrac{\bar{x} - \mu}{S / \sqrt{n}}$
Proportion Test
H0: π = ?    H1: π ≠ ?
Z-Test (Proportion): $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$
Two Sample Test (Means), sigma unknown, variances assumed equal
Assumptions: n = minimum 30, or populations normally distributed
Lower: H0: µ1 ≥ µ2 (i.e., µ1 − µ2 ≥ 0)    H1: µ1 < µ2 (i.e., µ1 − µ2 < 0)
Upper: H0: µ1 ≤ µ2 (i.e., µ1 − µ2 ≤ 0)    H1: µ1 > µ2 (i.e., µ1 − µ2 > 0)
Two: H0: µ1 = µ2 (i.e., µ1 − µ2 = 0)    H1: µ1 ≠ µ2 (i.e., µ1 − µ2 ≠ 0)
Pooled variance test: $T_{STAT} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
Where $s_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$
Confidence interval (extra): $(\bar{x}_1 - \bar{x}_2) \pm t_{n_1 + n_2 - 2}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
Degrees of freedom = n1 + n2 − 2
Two Sample Test (Means), sigma unknown, variances NOT assumed equal
Assumptions:
- populations are normally distributed, or both sample sizes are at least 30
- population variances are unknown and cannot be assumed to be equal
Lower: H0: µ1 ≥ µ2 (i.e., µ1 − µ2 ≥ 0)    H1: µ1 < µ2 (i.e., µ1 − µ2 < 0)
Upper: H0: µ1 ≤ µ2 (i.e., µ1 − µ2 ≤ 0)    H1: µ1 > µ2 (i.e., µ1 − µ2 > 0)
Two: H0: µ1 = µ2 (i.e., µ1 − µ2 = 0)    H1: µ1 ≠ µ2 (i.e., µ1 − µ2 ≠ 0)
Separate-variance t-test (also known as Welch's t-test): $T_{STAT} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$
T_STAT has degrees of freedom v, where $v = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$
F-Test for difference between two variances (USE THE F TABLE)
Lower: H0: σ1² ≥ σ2²    H1: σ1² < σ2²
Upper: H0: σ1² ≤ σ2²    H1: σ1² > σ2²
Two: H0: σ1² = σ2²    H1: σ1² ≠ σ2²
Test statistic: $F = \dfrac{S_1^2}{S_2^2}$
Where:
S1² = variance of sample 1 (larger sample variance)
n1 = sample size of sample taken from population 1
S2² = variance of sample 2
n2 = sample size of sample taken from population 2
n1 − 1 = numerator degrees of freedom (from sample 1)
n2 − 1 = denominator degrees of freedom (from sample 2)
Con : Do not reject H0 : not sufficient evidence that the true mean cost is different from $168
Question:
o A marketing company claims that it receives 8% responses from its mailing. To test this claim, a random sample of 500 were surveyed with 25 responses.
o Test at α = 0.05 significance level
H0 : π = 0.08
H1 : π ≠ 0.08
α : 0.05 = significance level, area of each of the two tails = α/2 = 0.025, Z-score = 1.96
D.R : Reject H0 if Z < −1.96 or Z > 1.96
T.S :
o $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$
Where
p = Sample proportion (Responses/total) = 25/500 = 0.05
π = µp = mean of the sample proportions = 0.08
n = sample total = 500
T.S.V : $Z = \dfrac{0.05 - 0.08}{\sqrt{\dfrac{0.08(1 - 0.08)}{500}}} = \dfrac{-0.03}{\sqrt{\dfrac{0.0736}{500}}} = \dfrac{-0.03}{\sqrt{0.0001472}} = \dfrac{-0.03}{0.012133} = -2.4727$
Con : Z = −2.47 is less than −1.96, so reject H0.
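The arithmetic in this proportion test is easy to slip on (the square root of 0.0001472 is about 0.0121, not 0.121), so a short Python check helps. The sketch below follows the Z formula for proportions and applies the two-tailed decision rule with the 1.96 cutoff; the function name is made up for illustration.

```python
import math

def proportion_z_statistic(successes, n, pi_0):
    # Z = (p - pi) / sqrt(pi * (1 - pi) / n), with pi taken from H0.
    p = successes / n
    standard_error = math.sqrt(pi_0 * (1 - pi_0) / n)
    return (p - pi_0) / standard_error

z = proportion_z_statistic(successes=25, n=500, pi_0=0.08)
z_critical = 1.96                      # two-tailed, alpha = 0.05
reject_h0 = z < -z_critical or z > z_critical
print(round(z, 4), reject_h0)
```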
                  NYSE    NASDAQ
Number             21       25
Sample Mean       3.27     2.53
Sample Std Dev    1.30     1.16
Note: variance = (St dev)²
H0 : µNYSE = µNASDAQ (µ1 = µ2 i.e., µ1 − µ2 = 0)
H1 : µNYSE ≠ µNASDAQ (µ1 ≠ µ2 i.e., µ1 − µ2 ≠ 0)
α : 0.05
α/2 : 0.025
o T value: look up the degrees of freedom and 0.025 in the T table
Degrees of freedom = n1 + n2 – 2 = 21 + 25 – 2 = 44
T values = 2.0154, −2.0154
D.R : If T_STAT < −2.0154 or T_STAT > 2.0154, reject the null hypothesis
T.S :
o $T_{STAT} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
Where
X-bar 1 = NYSE mean = 3.27
X-bar 2 = NASDAQ mean = 2.53
µ1 − µ2 = 0 (the hypothesised difference under H0; µ1 and µ2 themselves are unknown)
n1 = NYSE number = 21
n2 = NASDAQ number = 25
s1 = NYSE st dev = 1.30
s2 = NASDAQ st dev = 1.16
o $s_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$
T.S.V :
o $T_{STAT} = \dfrac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = \dfrac{0.74}{\sqrt{1.5021 \times 0.08762}} = \dfrac{0.74}{\sqrt{0.1316}} = \dfrac{0.74}{0.3627} = 2.040$
WHERE
o $s_p^2 = \dfrac{(21 - 1)1.30^2 + (25 - 1)1.16^2}{(21 - 1) + (25 - 1)} = \dfrac{(20 \times 1.69) + (24 \times 1.3456)}{20 + 24} = \dfrac{33.8 + 32.2944}{44} = \dfrac{66.0944}{44} = 1.5021$
Con : T_STAT = 2.040 > 2.0154, so there is sufficient evidence to support the alternative hypothesis. Reject the null
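The pooled-variance test above can be reproduced with the Python sketch below. The critical value 2.0154 is taken from the T table as in the notes, since the standard library has no t distribution; the function names are made up for illustration.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    # s_p^2 = ((n1-1)*s1^2 + (n2-1)*s2^2) / ((n1-1) + (n2-1))
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1))

def pooled_t_statistic(x1, x2, s1, s2, n1, n2, hypothesised_diff=0.0):
    sp2 = pooled_variance(s1, n1, s2, n2)
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return ((x1 - x2) - hypothesised_diff) / standard_error

sp2 = pooled_variance(1.30, 21, 1.16, 25)
t_stat = pooled_t_statistic(3.27, 2.53, 1.30, 1.16, 21, 25)
t_critical = 2.0154                    # from the T table, 44 d.f., 0.025 per tail
reject_h0 = abs(t_stat) > t_critical
print(round(sp2, 4), round(t_stat, 3), reject_h0)
```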
                  NYSE    NASDAQ
Number             21       25
Sample Mean       3.27     2.53
Sample Std Dev    1.30     1.16
Note: variance = (St dev)²
H0 : µNYSE = µNASDAQ (µ1 = µ2 i.e., µ1 − µ2 = 0)
H1 : µNYSE ≠ µNASDAQ (µ1 ≠ µ2 i.e., µ1 − µ2 ≠ 0)
α : 0.05
α/2 : 0.025
o T value: look up the degrees of freedom and 0.025 in the T table
Degrees of freedom = n1 + n2 – 2 = 21 + 25 – 2 = 44
T values = 2.0154, −2.0154
D.R : If the hypothesised difference µ1 − µ2 = 0 falls below the lower limit or above the upper limit of the confidence interval, reject the null hypothesis
T.S :
o $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
Where
X-bar 1 = NYSE mean = 3.27
X-bar 2 = NASDAQ mean = 2.53
µ1 − µ2 = 0 (the hypothesised difference under H0; µ1 and µ2 themselves are unknown)
n1 = NYSE number = 21
n2 = NASDAQ number = 25
s1 = NYSE st dev = 1.30
s2 = NASDAQ st dev = 1.16
o $s_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$
T.S.V :
o $(3.27 - 2.53) \pm 2.0154\sqrt{1.5021(0.0476 + 0.04)} = 0.74 \pm (2.0154 \times 0.3627) = 0.74 \pm 0.7310$, giving the interval (0.009, 1.471)
WHERE
o $s_p^2 = \dfrac{(21 - 1)1.30^2 + (25 - 1)1.16^2}{(21 - 1) + (25 - 1)} = \dfrac{(20 \times 1.69) + (24 \times 1.3456)}{20 + 24} = \dfrac{33.8 + 32.2944}{44} = \dfrac{66.0944}{44} = 1.5021$
Con : 0 lies outside the interval (0.009, 1.471), so there is sufficient evidence to support the alternative hypothesis. Reject the null
NYSE NASDAQ
Number 21 25
Sample Mean 3.27 2.53
Sample Std Dev 1.30 1.16
Is there a difference in the variances between NYSE and NASDAQ at the 0.05 level?
H0 : σ1² = σ2² Note: (Standard deviation)² = Variance
H1 : σ1² ≠ σ2²
α : 0.05
α/2 : 0.025
o Degrees of freedom 1 = n1 − 1 = 20 (when using the F table [columns])
o Degrees of freedom 2 = n2 − 1 = 24 (when using the F table [rows])
F-value upper = 2.33
F-value lower: look up with the d.f. FLIPPED (column 24 and row 20) = 2.41
You must invert 2.41 (because it is for the lower limit)
1/2.41 = 0.41 = F-value lower
D.R : If F < 0.41 or F > 2.33, reject the null hypothesis
T.S : $F = \dfrac{S_1^2}{S_2^2}$
o Where:
S1² = variance of sample 1 (larger sample variance)
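As a sketch of how the F statistic would be evaluated against this decision rule, the Python below squares the two sample standard deviations from the table and compares the ratio with the critical values 0.41 and 2.33 read from the F table above. The function name is made up for illustration, and the conclusion it prints is only what follows from those figures.

```python
def f_statistic(s1, s2):
    # F = S1^2 / S2^2, with the larger sample variance on top.
    return s1 ** 2 / s2 ** 2

f_stat = f_statistic(s1=1.30, s2=1.16)   # NYSE over NASDAQ
f_lower, f_upper = 0.41, 2.33            # critical values from the F table
reject_h0 = f_stat < f_lower or f_stat > f_upper
print(round(f_stat, 3), reject_h0)
```

With these numbers F falls between the two critical values, so the decision rule would not reject the null hypothesis of equal variances.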
$\bar{p} = \dfrac{36 + 31}{72 + 50} = \dfrac{67}{122} = 0.5492$
Con : There is not significant evidence of a difference between men and women in the proportions who will vote yes. Do not reject the null hypothesis
Confidence interval
$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$
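The confidence interval formula for two proportions can be sketched in Python as below. The counts are inferred from the pooled proportion above (36 out of 72 in one group and 31 out of 50 in the other); which group is men and which is women is not stated in the surviving text, and the 95% level (Z = 1.96) is an assumption made for the illustration. The function name is made up.

```python
import math

def two_proportion_interval(x1, n1, x2, n2, z):
    # (p1 - p2) +/- z * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)
    p1, p2 = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * standard_error
    return (p1 - p2) - margin, (p1 - p2) + margin

# Assumed inputs: counts inferred from the pooled proportion, z assumed for 95%.
lower, upper = two_proportion_interval(x1=36, n1=72, x2=31, n2=50, z=1.96)
contains_zero = lower < 0 < upper
print(round(lower, 4), round(upper, 4), contains_zero)
```

Under these assumptions the interval contains 0, which agrees with the conclusion of not rejecting the null hypothesis.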
$\chi^2 = \sum_{\text{all cells}} \dfrac{(f_0 - f_e)^2}{f_e}$
Where
f0 = observed frequency in a particular cell of the r × c table
fe = expected frequency in a particular cell
Con :
3.905 is < 5.991
Therefore, we don’t reject the null.
Where
K = Number of categories or classes remaining after combining classes
f0 = Observed frequency
fe = Expected frequency
p = Number of parameters estimated from the data
T.S.V :
Con :