
Applied Statistics Cheat Sheet_2017

Probability

Both… and… = A∩B
Either… or… = A∪B
If A∩B = ∅, A and B are Mutually Exclusive
If A∪B is the whole sample space, so that P(A∪B) = 1, A and B are Collectively Exhaustive
P(A) = no. of outcomes in event A / total no. of outcomes in the sample space

Joint Probability, Marginal Probability

∩     B1          B2
A1    P(A1∩B1)    P(A1∩B2)    P(A1)
A2    P(A2∩B1)    P(A2∩B2)    P(A2)
      P(B1)       P(B2)       1

Regression Analysis with more than one independent variable

There can be more than one independent variable (coefficients β2, β3, β4, …), and those variables can be compared by their t-stats.
If the absolute value of the t-stat is less than 2, the variable is “Statistically Insignificant”.
If it is higher than 2, the variable is “Statistically Significant”.

Discrete Random Variables

Probability Function (probability mass function)---- P(x) = P(X=x), ΣP(x) = 1
Weighted average---- $\bar{X} = \frac{\sum_{i=1}^{n} W_i X_i}{n} = \frac{W_1 X_1 + W_2 X_2 + \dots + W_n X_n}{n}$, where n is the sum of the weights $W_i$
Cumulative probability function---- $F(x_0) = P(X \le x_0) = \sum_{x \le x_0} P(x)$
Expected value of a function---- $E[g(X)] = \sum_x g(x) P(x)$ (similar concept to an average), $E(X) = \sum_x x P(x)$
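The cumulative probability and expected value formulas for a discrete random variable can be sketched in a few lines of Python. The probability table `pmf` below is a made-up example, not data from the sheet, and the helper names are illustrative.

```python
# Hypothetical probability function P(x) for a discrete random variable X.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def cumulative(pmf, x0):
    """F(x0) = P(X <= x0): add up P(x) for every x not larger than x0."""
    return sum(p for x, p in pmf.items() if x <= x0)

def expected_value(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(x); with the default g this is E(X)."""
    return sum(g(x) * p for x, p in pmf.items())

total = sum(pmf.values())                               # must equal 1
mean = expected_value(pmf)                              # E(X)
mean_of_square = expected_value(pmf, lambda x: x ** 2)  # E(X^2)
```

Here `cumulative(pmf, 1)` adds P(0) and P(1), and `expected_value` weights each outcome by its probability, which is why the sheet calls it "similar to an average".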
Conditional Probability

P(A|B) = P(A∩B) / P(B) → P(A∩B) = P(A|B) P(B)
If P(A∩B) = P(A)P(B), A and B are Statistically Independent → P(A|B) = P(A) and P(B|A) = P(B)
P(A̅) = 1 − P(A) (Complement Rule)
P(A∪B) = P(A) + P(B) − P(A∩B)
P(A∪B) = P(A) + P(B) − P(A) * P(B), if Statistically Independent
P(A∪B) = P(A) + P(B), if P(A∩B) = 0
De Morgan’s law---- (A∪B)’ = A̅ ∩ B̅ and (A∩B)’ = A̅ ∪ B̅

Discrete Random Variables (continued)

Variance---- $\sigma_x^2 = E[(X-\mu_x)^2] = \sum_x (x-\mu_x)^2 P(x)$
Linear Function of a Random Variable (Y = a + bX)---- mean $\mu_y = a + b\mu_x$, variance $\sigma_y^2 = b^2 \sigma_x^2$, standard deviation $\sigma_y = |b| \sigma_x$
Standardization of a Random Variable---- $Z = \frac{X-\mu_x}{\sigma_x}$, E(Z) = 0, Var(Z) = 1
Bernoulli trial---- P(success) = π, P(failure) = 1 − π, Mean = π, Variance = π(1 − π)
Combination---- $C_x^n = \frac{n!}{x!(n-x)!}$, in calculator: nCr
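The conditional probability, addition rule, independence check, and De Morgan identities above can be verified on a small sample space. The sketch below assumes one roll of a fair die with two illustrative events; the event definitions are not from the sheet.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (illustrative)
A = {2, 4, 6}                       # event "even number"
B = {4, 5, 6}                       # event "greater than 3"

def prob(event):
    """P(event) = outcomes in the event / outcomes in the sample space."""
    return Fraction(len(event), len(sample_space))

def complement(event):
    return sample_space - event

p_a_and_b = prob(A & B)                          # P(A∩B)
p_a_given_b = p_a_and_b / prob(B)                # P(A|B) = P(A∩B) / P(B)
p_a_or_b = prob(A) + prob(B) - p_a_and_b         # addition rule
independent = p_a_and_b == prob(A) * prob(B)     # independence check

# De Morgan: (A∪B)' = A'∩B' and (A∩B)' = A'∪B'
de_morgan_union = complement(A | B) == (complement(A) & complement(B))
de_morgan_inter = complement(A & B) == (complement(A) | complement(B))
```

Using `Fraction` keeps every probability exact, so the equality checks are not affected by floating-point rounding.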

Population size = N, Population mean = μ
Sample size = n, Sample mean = X̅
“Mean > Median” = positive or right skewed (the tail trails off to the right)
“Mean = Median” = symmetric
“Mean < Median” = negative or left skewed

Mean---- population mean $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Variance
Population variance = $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance = $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}$

Binomial distribution (BD)---- $P(x) = C_x^n \pi^x (1-\pi)^{n-x}$
Mean of BD $\mu_x = E(X) = n\pi$, Variance of BD $\sigma_x^2 = E[(X-\mu_x)^2] = n\pi(1-\pi)$

Joint probability function---- P(x,y) = P(X=x ∩ Y=y)

            Y
∩           y1          y2
X   x1      P(x1,y1)    P(x1,y2)    P(x1)
    x2      P(x2,y1)    P(x2,y2)    P(x2)
            P(y1)       P(y2)       1

Marginal probability function---- $P(x) = \sum_y P(x,y)$, $P(y) = \sum_x P(x,y)$
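As a check on the binomial formulas, the sketch below builds the whole binomial probability function and recovers the mean nπ and variance nπ(1−π) from the general definitions of E(X) and Var(X). The values n = 10 and π = 0.3 are arbitrary choices for illustration; `math.comb` (Python 3.8+) plays the role of the calculator's nCr key.

```python
import math

def binomial_pmf(x, n, pi):
    """P(x) = C(n, x) * pi^x * (1 - pi)^(n - x)."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

n, pi = 10, 0.3   # illustrative number of trials and success probability
probs = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# Mean and variance from the general discrete definitions.
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

# Closed-form values quoted on the sheet.
mean_formula = n * pi
variance_formula = n * pi * (1 - pi)
```

Both routes should agree up to rounding, which is a quick way to catch a mistyped exponent or combination.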
Standard deviation (SD) = √Variance
Coefficient of variation (CV) = SD / mean
Range = x (largest) − x (smallest)

Covariance between X and Y (not CV)
Population covariance = σxy or Cov(x,y) = $\frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample covariance = sxy or Cov(x,y) = $\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$

Correlation coefficient (CC)---- $r_{xy} = \frac{\text{Covariance between X and Y}}{\text{SD of X} \times \text{SD of Y}}$

Covariance of two random variables---- $Cov(X,Y) = E[(X-\mu_x)(Y-\mu_y)] = \sum_x \sum_y (x-\mu_x)(y-\mu_y) P(x,y)$
or $Cov(X,Y) = E(XY) - \mu_x\mu_y = \sum_x \sum_y xy\,P(x,y) - \mu_x\mu_y$
Correlation---- $Corr(X,Y) = \frac{Cov(X,Y)}{\sigma_x \sigma_y}$
>>> If two random variables are statistically independent, P(x,y) = P(x)P(y) and Cov is 0
Conditional probability function of X, given Y=y---- $P(x|y) = \frac{P(x,y)}{P(y)}$

Portfolio Analysis---- W = aX + bY, Mean $\mu_w = E[aX+bY] = a\mu_x + b\mu_y$
Variance $\sigma_w^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,Cov(X,Y)$
(or) $\sigma_w^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,Corr(X,Y)\,\sigma_x \sigma_y$
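The joint-table formulas (marginals, covariance, correlation) and the portfolio mean and variance can be chained together as in the sketch below. The joint probabilities and the weights a = 2, b = 3 are invented for illustration only.

```python
import math

# Hypothetical joint probability function P(x, y), keyed by (x, y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def marginal(joint, axis):
    """Sum P(x, y) over the other variable (axis 0 gives P(x), axis 1 gives P(y))."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

px, py = marginal(joint, 0), marginal(joint, 1)
mu_x = sum(x * p for x, p in px.items())
mu_y = sum(y * p for y, p in py.items())
var_x = sum((x - mu_x) ** 2 * p for x, p in px.items())
var_y = sum((y - mu_y) ** 2 * p for y, p in py.items())

# Cov(X,Y) = E(XY) - mu_x * mu_y, Corr = Cov / (sigma_x * sigma_y)
cov = sum(x * y * p for (x, y), p in joint.items()) - mu_x * mu_y
corr = cov / math.sqrt(var_x * var_y)

# Portfolio W = aX + bY
a, b = 2, 3
mean_w = a * mu_x + b * mu_y
var_w = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov
```

A positive covariance, as in this made-up table, raises the portfolio variance above the sum of the two weighted variances.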

The correlation coefficient is between -1 and 1, where -1 means perfectly negatively correlated, 1 means perfectly positively correlated, and 0 means no (linear) correlation at all.

Regression Analysis

Important: Y = Dependent variable (e.g. exam score) … X = Independent variable (e.g. study hours)
Y = β0 + β1 (X)

Continuous Random Variables

Probability Density Function---- P(X=x) = 0, total area under f(x) = 1, f(x) ≥ 0; for a uniform distribution on [a, b], $f(x) = \frac{1}{b-a}$
Cumulative Distribution Function---- F(x) = P(X≤x), $P(a<X<b) = F(b) - F(a) = \int_a^b f(u)\,du$
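For a continuous random variable, the relation P(a<X<b) = F(b) − F(a) = area under f can be checked numerically. The sketch below assumes a uniform distribution on [0, 10] and uses a simple midpoint rule as the integrator; both choices are illustrative.

```python
def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) on [a, b], and 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """F(x) = P(X <= x) for the uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

a, b = 0.0, 10.0
by_cdf = uniform_cdf(5.0, a, b) - uniform_cdf(2.0, a, b)          # F(5) - F(2)
by_area = integrate(lambda u: uniform_pdf(u, a, b), 2.0, 5.0)     # area under f
total_area = integrate(lambda u: uniform_pdf(u, a, b), a, b)      # should be 1
```

The probability of any single point is zero because the interval it integrates over has zero width, which is why only intervals carry probability here.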

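Expectation for continuous random variables---- $E(X) = \mu_x = \int_{x_{lowest}}^{x_{highest}} x f(x)\,dx$, $E[g(X)] = \int_{x_{lowest}}^{x_{highest}} g(x) f(x)\,dx$
$Var(X) = \sigma_x^2 = \int_{x_{lowest}}^{x_{highest}} (x-\mu_x)^2 f(x)\,dx$, $SD(X) = \sigma_x = \sqrt{Var(X)}$
Uniform Distribution---- mean $\mu = \frac{a+b}{2}$, variance $\sigma^2 = \frac{(b-a)^2}{12}$

(σ² = known) $Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ = Margin of Error (ME), $\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ = Upper Confidence Limit (UCL), $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ = Lower Confidence Limit (LCL)

(σ² = unknown) t-distribution---- $P(t_v > t_{v,\alpha/2}) = \alpha/2$, with α/2 in each tail when both tails are used
The confidence interval estimator for the population mean
$\bar{X} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$, s = sample standard deviation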
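The t-based confidence interval can be sketched with the standard library alone if the critical value is looked up in a t table. In the example below the sample is invented, and `T_CRIT = 2.776` is the tabulated value of t with 4 degrees of freedom and α/2 = 0.025, supplied by hand because the standard library has no t-distribution.

```python
import math
import statistics

def t_confidence_interval(sample, t_crit):
    """Return (LCL, UCL) = X̄ ∓ t_crit * s / sqrt(n) for the population mean."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample SD, divisor n - 1
    margin = t_crit * s / math.sqrt(n)    # margin of error
    return xbar - margin, xbar + margin

sample = [8.0, 10.0, 12.0, 9.0, 11.0]     # hypothetical observations
T_CRIT = 2.776                            # t_{4, 0.025}, read from a t table
lcl, ucl = t_confidence_interval(sample, T_CRIT)
```

Passing the critical value as an argument keeps the function honest about where that number comes from; with a statistics package installed it could be computed instead.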
Linear Functions of Variables---- W = a + bX, mean $\mu_w = a + b\mu_x$, $Var(W) = \sigma_w^2 = b^2\sigma_x^2$, $SD(W) = \sigma_w = |b|\sigma_x$
Standardized random variable---- $Z = \frac{X-\mu_x}{\sigma_x}$, mean = 0, variance = 1
Normal Distribution---- $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2}$ (e = 2.71828, π = 3.14159), X~N(μ, σ²)
Excel Function---- NORM.DIST(x, μ, σ, FALSE)
>>> if the standard deviation is greater, the distribution is flatter

Cumulative Distribution Function for Normal Distribution
F(x0) = P(X ≤ x0), P(a<X<b) = F(b) − F(a)

Standard Normal Distribution---- Z~N(0,1), f(Z) for PDF, F(Z) for CDF
F(-2) = P(Z<-2) = 1 − P(Z<2), P(Z>-2.25) = P(Z<2.25)

Hypothesis (population distribution is normal with unknown mean and variance.)
The null hypothesis $H_0: \mu = 8\%$
The alternative hypothesis $H_1: \mu \ne 8\%$

(σ² = known) Z-test---- $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$
one sided test = $Z_{\alpha}$ ($H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$ or $\mu > \mu_0$)
two sided test = $Z_{\alpha/2}$ ($H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$)

(σ² = unknown) t-statistic---- $t = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$, one sided test = $t_{n-1,\alpha}$, two sided test = $t_{n-1,\alpha/2}$
Reject $H_0$ if the absolute value of the t-statistic is greater than $t_{n-1,\alpha/2}$, otherwise accept (fail to reject) $H_0$
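Python's `statistics.NormalDist` (Python 3.8+) offers `pdf`, `cdf`, and `inv_cdf`, which cover the same ground as the NORM.DIST lookups on this sheet. The sketch below checks the symmetry rule F(−2) = 1 − F(2) and runs a two-sided Z-test; the sample figures (X̄ = 8.6, μ0 = 8, σ = 1.5, n = 25) are invented for illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)   # Z ~ N(0, 1)

# Symmetry of the standard normal: F(-2) = 1 - F(2)
left_tail = std_normal.cdf(-2.0)
by_symmetry = 1.0 - std_normal.cdf(2.0)

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Z = (X̄ - mu0) / (sigma / sqrt(n)); reject H0 when |Z| > Z_{alpha/2}."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return z, z_crit, abs(z) > z_crit

# Hypothetical sample summary, sigma assumed known.
z, z_crit, reject = z_test_two_sided(xbar=8.6, mu0=8.0, sigma=1.5, n=25)
```

For a one-sided test the same function would compare Z with `inv_cdf(1 - alpha)` instead, without taking the absolute value.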
Transform any normal distribution to N(0,1)---- $Z = \frac{X-\mu}{\sigma}$, $X = \mu + Z\sigma$
$P(a<X<b) = P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = F\left(\frac{b-\mu}{\sigma}\right) - F\left(\frac{a-\mu}{\sigma}\right)$

Covariance---- $Cov(X,Y) = E[(X-\mu_x)(Y-\mu_y)]$ or $E(XY) - \mu_x\mu_y$
>>> if X and Y together conform to a (joint) normal distribution, then we can say: if Cov(X,Y) = 0, then X, Y are independent.
Correlation---- $\rho = Corr(X,Y) = \frac{Cov(X,Y)}{\sigma_x \sigma_y}$

Portfolio Analysis II---- $E(aX+bY) = a\mu_x + b\mu_y$
$Var(aX+bY) = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,Cov(X,Y) = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,Corr(X,Y)\,\sigma_x\sigma_y$

Testing the difference in the population means between two different samples
(distributions for both groups are normal, population means may differ but population σ² is the same.)
one sided test = $H_0: \mu_x - \mu_y = 0$, $H_1: \mu_x - \mu_y > 0$
two sided test = $H_0: \mu_x - \mu_y = 0$, $H_1: \mu_x - \mu_y \ne 0$
pooled sample variance $s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}$, t-stat---- $t = \frac{(\bar{X}-\bar{Y}) - 0}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}}$
cutoff point $t_{n_x+n_y-2,\alpha}$
Reject the null hypothesis $H_0$ if the t-statistic is greater than $t_{n_x+n_y-2,\alpha}$ (one sided test)
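The pooled-variance test for two population means can be sketched as follows. The two samples are invented, and the cutoff `T_CRIT = 2.015` is the tabulated one-sided value of t with 5 degrees of freedom at α = 0.05, entered by hand as an assumption.

```python
import math
import statistics

def pooled_t_statistic(x, y):
    """Return (s_p^2, t) for H0: mu_x - mu_y = 0 with equal population variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 / nx + sp2 / ny)
    return sp2, t

x = [10.0, 12.0, 14.0]          # hypothetical sample from group X
y = [7.0, 9.0, 11.0, 13.0]      # hypothetical sample from group Y
sp2, t = pooled_t_statistic(x, y)

T_CRIT = 2.015                  # t_{5, 0.05}, one sided, read from a t table
reject = t > T_CRIT             # one-sided rule: reject H0 if t exceeds the cutoff
```

With these made-up numbers the statistic falls below the cutoff, so the one-sided test does not reject H0.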

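Sampling Distribution

$E(\bar{X}) = \mu$, $\sigma_{\bar{x}}^2 = \sigma^2 / n$ (σ² is the population variance), $\sigma^2 > \sigma_{\bar{x}}^2$ for n > 1
>>> $\sigma_{\bar{x}}^2$ decreases as the sample size increases. Random variable X can be any distribution.
Bernoulli distribution---- success probability π is estimated by the sample proportion $\bar{p}$
The distribution of the normal sample mean---- $\bar{X} \sim N(\mu, \sigma^2/n)$

(σ² = known) Significance level = α, Confidence level = 100*(1−α)%
$P(Z > Z_{\alpha/2}) = \alpha/2$, with α/2 in each tail when both tails are used
The confidence interval estimator for the population mean
$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$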
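The sampling-distribution facts E(X̄) = μ and σ²x̄ = σ²/n can be confirmed exactly on a tiny population by listing every possible sample. The sketch assumes a population of the six faces of a die and samples of size 2 drawn with replacement; these choices are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]   # illustrative population
n = 2                             # sample size, drawn with replacement

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    """Population variance: average squared deviation from the mean (divisor N)."""
    mu = mean(values)
    return sum((Fraction(v) - mu) ** 2 for v in values) / len(values)

# Every equally likely ordered sample of size n, reduced to its sample mean.
sample_means = [mean(s) for s in product(population, repeat=n)]

mean_of_means = mean(sample_means)          # should equal mu
variance_of_means = variance(sample_means)  # should equal sigma^2 / n
```

Because the population here is far from normal, the check also illustrates the note that X can follow any distribution for these two results to hold.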
