
Basic Statistics

Estimation
Population - A set of entities on which some statistical
inference is to be drawn. Typically, the population is very
large, making a census or a complete enumeration of all the
values in the population impractical or impossible.

Parameter - A characteristic of the population, generally
captured by quantities such as its mean and variance.

Sample - A subset of a population which represents the
population. The sample is a subset of manageable size.

Sampling Technique - A method of selecting a subset of
individuals from within a population to estimate
characteristics of the whole population. For example: Random
Sampling, Systematic Sampling, Stratified Sampling.
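The three techniques named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (the integers 1 to 100), the two strata and the function names are hypothetical and not part of the original slides.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Random sampling: every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, k, seed=0):
    """Systematic sampling: random start, then every step-th item."""
    step = len(population) // k
    rng = random.Random(seed)
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(k)]

def stratified_sample(strata, fraction, seed=0):
    """Stratified sampling: draw the same fraction at random from each group."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population and strata, for illustration only.
population = list(range(1, 101))
strata = {"low": population[:50], "high": population[50:]}

print(simple_random_sample(population, 10))
print(systematic_sample(population, 10))
print(stratified_sample(strata, 0.1))
```

Systematic sampling needs only one random draw (the start), while stratified sampling guarantees that each group is represented in the sample.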

Statistic - A characteristic of the sample, such as its mean
or variance, comparable to the corresponding population
characteristic. For another sample, the values may vary.

Estimator - Any quantity calculated from the sample data
which is used to give information about an unknown quantity
in the population. For example, the sample mean is an
estimator of the population mean. If the value of the
estimator in a particular sample is found to be 5, then 5 is
the estimate of the population mean.
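A small simulation may help separate the estimator (the rule "take the sample mean") from the estimate (the number it returns for one sample). The population below is hypothetical, generated only for illustration; in practice the population mean is unknown.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population; its mean is the parameter we pretend not to know.
population = [rng.gauss(5.0, 2.0) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sample_mean_estimate(population, n, rng):
    """Estimator: draw a random sample of size n and return its mean."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample)

# Two different samples give two different estimates of the same parameter.
estimate_1 = sample_mean_estimate(population, 50, rng)
estimate_2 = sample_mean_estimate(population, 50, rng)

print(f"population mean (parameter): {population_mean:.3f}")
print(f"estimate from sample 1:      {estimate_1:.3f}")
print(f"estimate from sample 2:      {estimate_2:.3f}")
```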

Random/Stochastic Variable - A variable whose values vary,
each with a chance/probability associated with it; hence it
follows a distribution.

Distribution - The sampling distribution describes
probabilities associated with a statistic when a random
sample is drawn from a population.
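One way to see a sampling distribution is to draw many random samples and record the statistic each time. The sketch below does this for the sample mean; the uniform population, the sample size of 25 and the 2,000 repetitions are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population (uniform on 0..10), for illustration only.
population = [rng.uniform(0, 10) for _ in range(20_000)]
n = 25  # size of each sample

# The statistic (sample mean) recomputed over many random samples.
means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print(f"mean of sample means:       {statistics.fmean(means):.3f}")
print(f"population mean:            {pop_mean:.3f}")
print(f"spread of sample means:     {statistics.stdev(means):.3f}")
print(f"population sd / sqrt(n):    {pop_sd / n ** 0.5:.3f}")
```

The sample means cluster around the population mean, with a spread close to the population standard deviation divided by the square root of n.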

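Degrees of Freedom (dof) - The number of degrees of freedom
is the number of values in the final calculation of a
statistic that are free to vary. As a rule, one degree of
freedom is lost for each parameter estimated from the sample:
if the sample size is n and 2 parameters are estimated, a
statistic that depends on both estimates has n-2 dof. For
example, the sample variance uses one estimated parameter
(the sample mean) and so has n-1 dof.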
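The simplest case, one estimated parameter, can be checked by hand. In the sketch below (the five data values are made up), the deviations from the sample mean must sum to zero, so only n-1 of them are free to vary, which is why the sample variance divides by n-1.

```python
import math
import statistics

sample = [4.0, 7.0, 6.0, 3.0, 5.0]  # made-up data
n = len(sample)

mean = sum(sample) / n                        # one estimated parameter
deviations = [x - mean for x in sample]       # constrained to sum to zero
sum_of_squares = sum(d * d for d in deviations)

dof = n - 1                                   # n values minus 1 estimated parameter
variance = sum_of_squares / dof

print(f"sum of deviations: {sum(deviations)}")
print(f"dof: {dof}, sample variance: {variance}")
print(f"statistics.variance agrees: {math.isclose(variance, statistics.variance(sample))}")
```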

Hypothesis Testing
Null Hypothesis H0 - Formulation of what one wants to test
about the occurrence/prevalence/existence of some property
in the population from which the sample is drawn. Generally,
the statement is written in a way that it assumes the
non-occurrence of the event. For example: two samples are
not different from each other.

Alternative Hypothesis H1 - The hypothesis set against the
null hypothesis. For example, in the above case it is: two
samples are different from each other.

Critical Values - Used as a reference point for acceptance
or rejection of the null hypothesis. If the absolute value of
the test statistic computed under the null hypothesis is less
than the critical value, the null is accepted; otherwise it
is rejected.

Type II error - Accepting a FALSE null. Its probability is
usually denoted β.

Level of Significance - The significance level of a
statistical hypothesis test is a fixed probability
(0.05/0.02/0.01) of wrongly rejecting the null hypothesis H0,
if it is in fact true. It is represented as α.
P(Type I error) = α. 1-α is the confidence coefficient.
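The link between the significance level and the critical value of a two-sided test can be sketched as follows. The example assumes a standard normal test statistic, which is only a large-sample stand-in for the t distribution; the function name is illustrative.

```python
from statistics import NormalDist

def two_sided_critical_value(alpha):
    """Return z such that P(|Z| > z) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.02, 0.01):
    z = two_sided_critical_value(alpha)
    print(f"alpha = {alpha:.2f}  confidence = {1 - alpha:.2f}  critical value = {z:.3f}")
```

A smaller significance level pushes the critical value further into the tail, so stronger evidence is needed before the null is rejected.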

p-values - The probability, computed assuming the null H0 is
true, of obtaining a test statistic at least as extreme as
the one actually observed. Unlike the significance level it
is not fixed in advance: it is the smallest significance
level at which the null would be rejected. It is compared
with the chosen significance level of the test and, if it is
smaller, the result is significant. That is, if the null
hypothesis were to be rejected at the 5% significance level,
this would be reported as "p < 0.05". Small p-values suggest
that the null is unlikely to be true. The smaller it is, the
stronger the evidence against the null.
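As a rough illustration, a two-sided p-value can be computed from a test statistic and compared with the significance level. The sketch assumes the statistic is standard normal under the null (a large-sample approximation); the function names are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. under the null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def is_significant(z, alpha=0.05):
    """The result is significant when the p-value is below alpha."""
    return two_sided_p_value(z) < alpha

for z in (1.0, 1.96, 2.5):
    print(f"z = {z:.2f}  p-value = {two_sided_p_value(z):.4f}  "
          f"significant at 5%: {is_significant(z)}")
```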

Power - It measures the test's ability to reject the null
hypothesis when it is actually FALSE, that is, to make a
correct decision. In other words, it is the probability of
not committing a type II error. It is calculated by
subtracting the probability of a type II error from 1,
usually expressed as: Power = 1 - P(type II error) = 1-β.
The maximum power a test can have is 1, the minimum is 0.
Ideally we want a test to have high power, close to 1.
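The formula Power = 1 - P(type II error) can be made concrete for a two-sided test. The sketch assumes a normally distributed test statistic with unit variance; `shift` is a hypothetical parameter meaning (true value - null value) / standard error.

```python
from statistics import NormalDist

def power_two_sided_z(shift, alpha=0.05):
    """Power of a two-sided z-test when the statistic is centred at `shift`."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    # Type II error: the statistic falls inside the acceptance region
    # [-crit, crit] although the null is false.
    p_type_2 = z.cdf(crit - shift) - z.cdf(-crit - shift)
    return 1 - p_type_2

for shift in (0.0, 1.0, 2.0, 3.0):
    print(f"shift = {shift:.1f}  power = {power_two_sided_z(shift):.3f}")
```

With no shift the null is true and the rejection probability equals the significance level; power grows towards 1 as the true value moves away from the null value.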

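Critical Region = 5%

[Figure: density f(t) of the t statistic. The central
Acceptance Region covers 95% of the area, between -t_{0.05/2}
and t_{0.05/2}. Each tail is a Critical Region of 2.5%
(0.025); Total = 5% Level of Significance.]

Accept null H0 : β = 0 if t lies between -t_{0.05/2} and
t_{0.05/2} at 5% level of significance, else reject at 5%.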

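Critical Region = 5%

[Figure: density f(β), with 0 marked on the β axis. The
central Acceptance Region covers 95% of the area, between
b - t_{0.05/2} se(b) and b + t_{0.05/2} se(b). Each tail is a
Critical Region of 2.5% (0.025); Total = 5% Level of
Significance.]

Accept null H0 : β = 0 if the hypothesized value 0 lies
between b - t_{0.05/2} SE(b) and b + t_{0.05/2} SE(b) at 5%
level of significance, else reject at 5%.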
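The interval form of the rule can be sketched as below. The estimate b, its standard error and the choice of 20 degrees of freedom are hypothetical; 2.086 is the tabulated t value with 2.5% in each tail for 20 dof.

```python
T_CRIT_5PCT_DF20 = 2.086  # t table: 2.5% in each tail, 20 dof (assumed dof)

def confidence_interval(b, se_b, t_crit):
    """Return (b - t_crit * se_b, b + t_crit * se_b)."""
    half_width = t_crit * se_b
    return b - half_width, b + half_width

def accept_null(b, se_b, t_crit, null_value=0.0):
    """Accept H0 when the hypothesized value lies inside the interval."""
    lower, upper = confidence_interval(b, se_b, t_crit)
    return lower <= null_value <= upper

# Hypothetical estimates, for illustration only.
for b, se_b in ((0.9, 0.5), (1.5, 0.5)):
    lower, upper = confidence_interval(b, se_b, T_CRIT_5PCT_DF20)
    verdict = "accept" if accept_null(b, se_b, T_CRIT_5PCT_DF20) else "reject"
    print(f"b = {b}: interval = ({lower:.3f}, {upper:.3f}) -> {verdict} H0 at 5%")
```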

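Critical Region = 1%

[Figure: density f(t)/f(β), with 0 marked on the horizontal
axis. The central Acceptance Region covers 99% of the area.
Each tail is a Critical Region of 0.5% (0.005); Total = 1%
Level of Significance. Limits: b - t_{0.01/2} se(b) and
b + t_{0.01/2} se(b) on the β axis, or -t_{0.01/2} and
t_{0.01/2} on the t axis.]

Accept null H0 : β = 0 if 0 lies between b +/- t_{0.01/2} SE(b),
OR, |t| = |b/SE(b)| <= t_{0.01/2}, at 1% level of
significance, else reject at 1%.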

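Critical Region = 10%

[Figure: density f(t)/f(β), with 0 marked on the horizontal
axis. The central Acceptance Region covers 90% of the area.
Each tail is a Critical Region of 5% (0.05); Total = 10%
Level of Significance. Limits: b - t_{0.10/2} se(b) and
b + t_{0.10/2} se(b) on the β axis, or -t_{0.10/2} and
t_{0.10/2} on the t axis.]

Accept null H0 : β = 0 if 0 lies between b +/- t_{0.10/2} SE(b),
OR, |t| = |b/SE(b)| <= t_{0.10/2}, at 10% level of
significance, else reject at 10%.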
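The same estimate can lead to different decisions at the 10%, 5% and 1% levels, because the acceptance region widens as the level shrinks. The sketch below applies the t-ratio rule at all three levels; the 20 degrees of freedom and the values of b and SE(b) are hypothetical, and the critical values are the standard t-table entries for 20 dof.

```python
# Two-sided critical values from a t table for 20 dof (assumed dof).
T_CRIT_DF20 = {0.10: 1.725, 0.05: 2.086, 0.01: 2.845}

def decisions(b, se_b, table=T_CRIT_DF20):
    """Accept H0: beta = 0 at each level when |b / se_b| <= critical value."""
    t_ratio = b / se_b
    return {
        alpha: ("accept" if abs(t_ratio) <= t_crit else "reject")
        for alpha, t_crit in table.items()
    }

# Hypothetical estimate: t = 1.0 / 0.5 = 2.0
for alpha, verdict in decisions(1.0, 0.5).items():
    print(f"level {alpha:.0%}: {verdict} H0")
```

Here t = 2.0 exceeds the 10% critical value but not the 5% or 1% values, so the null is rejected only at the 10% level.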

Chi-sq, t & F statistics
