
Statistics S1 Summary

Remember to use 3 significant figures unless the question says otherwise.


If you are given individual raw data, you are expected to use a calculator to find $\bar{x}$, $s$, $r$ and the regression line $y = a + bx$.
If you are given summarised data, e.g. $\sum x$ or $\sum x^2$, you are required to use the appropriate formula from the booklet.
Population

A collection of individuals or items.

Estimators

An estimator is a sample statistic used to estimate a population parameter.

Unbiased estimator

An estimator is unbiased if the mean of its distribution is equal to the parameter that it is estimating.
The unbiased estimator of the population mean is $\bar{x}$.
The unbiased estimator of the population variance is:
$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Variance & Standard Deviation

$\sigma$ is used to denote the population standard deviation.
$s = \sqrt{S^2}$ is the sample standard deviation, used as an estimate of $\sigma$ (strictly, it is the variance $S^2$ that is an unbiased estimate of the population variance $\sigma^2$).
If you are unsure whether to use $\sigma$ or $s$, it is best to use $s$.
Variance and standard deviation are measures of spread.
They are never negative.
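
To make the difference between dividing by n − 1 and dividing by n concrete, here is a minimal Python sketch. The data values are made up for illustration, and the variable names are assumptions of this example rather than anything from the formula booklet.

```python
import statistics

# Hypothetical raw data, made up for this illustration.
data = [4, 7, 7, 8, 9, 13]

n = len(data)
x_bar = statistics.mean(data)  # unbiased estimate of the population mean

# Unbiased estimate of the population variance: divide by n - 1.
s_squared = sum((x - x_bar) ** 2 for x in data) / (n - 1)
s = s_squared ** 0.5

# Dividing by n instead treats the data as the whole population.
sigma_squared = sum((x - x_bar) ** 2 for x in data) / n

# The standard library agrees with the hand-written formulas.
assert abs(s_squared - statistics.variance(data)) < 1e-12
assert abs(sigma_squared - statistics.pvariance(data)) < 1e-12

print(f"mean = {x_bar:.3g}, s^2 = {s_squared:.3g}, s = {s:.3g}")
```

For this data the mean is 8 and the squared deviations sum to 44, so the unbiased variance estimate is 44/5 = 8.8, while dividing by n gives the smaller value 44/6.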

Probability
$P(A') = 1 - P(A)$, the probability that A does not happen
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cap B) = P(A) \times P(B \mid A)$ (conditional probability)
$P(A \cap B) = P(A) \times P(B)$ (independent events)
$P(A \cap B) = 0$ (mutually exclusive events)
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ (conditional probability)
Questions on conditional probability are sometimes worded "given that a particular event has happened (or not happened), what is the probability that some other event will happen?" For $P(A \mid B)$, B is the event that has already occurred and A is the event that is going to take place.
$P(A' \cap B') = 1 - P(A \cup B)$
A and B are independent $\iff P(A \mid B) = P(A \mid B') = P(A)$
Questions can often make more sense when the probabilities are written on a tree
diagram.
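
As a quick check of how these rules fit together, the following Python sketch works through them with exact fractions. The three starting probabilities are hypothetical values chosen for illustration, not taken from any exam question.

```python
from fractions import Fraction

# Hypothetical probabilities chosen for illustration only.
p_a = Fraction(1, 2)
p_b = Fraction(2, 5)
p_a_and_b = Fraction(1, 5)

p_not_a = 1 - p_a                          # P(A') = 1 - P(A)
p_a_or_b = p_a + p_b - p_a_and_b           # addition rule
p_a_given_b = p_a_and_b / p_b              # P(A | B) = P(A and B) / P(B)
p_neither = 1 - p_a_or_b                   # P(A' and B') = 1 - P(A or B)

# P(A | B') uses P(A and B') = P(A) - P(A and B) and P(B') = 1 - P(B).
p_a_given_not_b = (p_a - p_a_and_b) / (1 - p_b)

# Independence: P(A and B) = P(A) x P(B), equivalently P(A | B) = P(A).
independent = p_a_and_b == p_a * p_b
mutually_exclusive = p_a_and_b == 0

print(p_a_or_b, p_a_given_b, p_a_given_not_b, independent, mutually_exclusive)
```

With these values P(A | B) and P(A | B′) both equal P(A), so the events are independent but not mutually exclusive.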

Correlation and Regression

When making a comment about correlation, remember to use the words strong, fairly
strong, weak etc as well as positive and negative.
Other comments can include what the relationship is in everyday language, being
careful to be precise, and the presence of Anomalous Values (freak results or outliers)
and Influential Data Points (when the x value is much bigger or smaller than all of the
other x values).
Consider the relationship between two variables carefully in case it is spurious, i.e. ask whether there really is a cause and effect.
The vertical distances between points on a scatter diagram and the regression line are called residuals.
The regression line, $y = a + bx$, is a line of best fit and is sometimes called the least squares regression line.
The point $(\bar{x}, \bar{y})$ always lies on the regression line. You can obtain the equation directly from your calculator.
Extrapolation: predicting y from x when x is outside the range of the given data. Not very reliable.
Interpolation: predicting y from x when x is inside the range of the given data. More reliable.
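
In an exam the regression line comes from a calculator, but a short Python sketch can show where it comes from. The data below are made up, the names `s_xx` and `s_xy` follow the common $S_{xx}$, $S_{xy}$ notation (an assumption, since these notes do not define them), and the helper functions are hypothetical.

```python
# Hypothetical paired data, made up for this illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Summary statistics in the usual S_xx, S_xy notation.
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = s_xy / s_xx          # gradient of the least squares line
a = y_bar - b * x_bar    # intercept, so the line passes through (x_bar, y_bar)


def predict(x):
    """Predicted y on the regression line y = a + bx."""
    return a + b * x


# Residual = observed y minus predicted y (a vertical distance).
residuals = [y - predict(x) for x, y in zip(xs, ys)]


def is_interpolation(x):
    """True when x lies inside the range of the observed x values."""
    return min(xs) <= x <= max(xs)


print(f"y = {a:.3g} + {b:.3g}x")
```

Because the intercept is computed from the means, `predict(x_bar)` returns `y_bar`, and the residuals sum to zero apart from rounding.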

Binomial distribution

$X \sim B(n, p)$: X is a discrete random variable, the number of trials n is fixed, the trials are independent of one another, and the probability p is the same for each trial. There are two outcomes.
The probability function, mean, variance and the cumulative probability tables can be found in the formulae booklet.
Be careful with the wording of the questions
e.g.
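
One wording trap worth illustrating is the difference between "at least", "at most", "more than" and "fewer than": cumulative tables give P(X ≤ x), so the other forms have to be rewritten in terms of it. The sketch below is a minimal Python illustration with hypothetical parameters n = 10 and p = 0.3; the function names are assumptions of this example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k), the quantity given by cumulative binomial tables."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Hypothetical parameters chosen for illustration only.
n, p = 10, 0.3

mean = n * p                  # E(X) = np
variance = n * p * (1 - p)    # Var(X) = np(1 - p)

p_at_most_3 = binomial_cdf(3, n, p)          # P(X <= 3)
p_fewer_than_3 = binomial_cdf(2, n, p)       # P(X < 3)  = P(X <= 2)
p_at_least_3 = 1 - binomial_cdf(2, n, p)     # P(X >= 3) = 1 - P(X <= 2)
p_more_than_3 = 1 - binomial_cdf(3, n, p)    # P(X > 3)  = 1 - P(X <= 3)

print(f"{p_at_most_3:.3g} {p_fewer_than_3:.3g} "
      f"{p_at_least_3:.3g} {p_more_than_3:.3g}")
```

The four probabilities differ only in which table entry is looked up and whether it is subtracted from 1, which is why the exact wording matters.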

Normal distribution

$X \sim N(\mu, \sigma^2)$: X is a continuous random variable.
To find probabilities, first standardise using $z = \frac{x - \mu}{\sigma}$ and use the tables for $Z \sim N(0, 1)$.
Use clear diagrams.
The distribution is symmetrical about $\mu$. The mode, median and mean are all equal.
The cumulative normal distribution function tables give $P(Z < z)$. If the area you want includes the mean then use $P(Z < |z|)$, otherwise use $1 - P(Z < |z|)$.
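
The standardise-then-look-up routine can be sketched in Python, with `statistics.NormalDist` standing in for the printed tables. The mean, standard deviation and x value are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical distribution and value, chosen for illustration only.
mu, sigma = 50, 10
x = 62

z = (x - mu) / sigma          # standardise: z = (x - mu) / sigma
phi = NormalDist().cdf        # P(Z < z) for Z ~ N(0, 1), as in the tables

p_below = phi(abs(z))         # P(X < 62): area includes the mean
p_above = 1 - phi(abs(z))     # P(X > 62): area does not include the mean

# By symmetry the same two numbers answer questions about x = 38 (z = -1.2).
p_below_38 = 1 - phi(abs(z))  # P(X < 38): does not include the mean
p_above_38 = phi(abs(z))      # P(X > 38): includes the mean

print(f"z = {z:.3g}, P(X < 62) = {p_below:.3g}, P(X > 62) = {p_above:.3g}")
```

Here z = 1.2, and the same table value serves all four probabilities. The only decision is whether the required area includes the mean.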

Distribution of the sample mean

If $X \sim N(\mu, \sigma^2)$, then the distribution of the sample mean is $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
The standard deviation of the distribution of sample means is $\frac{\sigma}{\sqrt{n}}$ or $\sqrt{\frac{\sigma^2}{n}}$. This is called the standard error.

The central limit theorem

For large enough n, the distribution of the sample mean is approximately normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$, i.e. $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately, regardless of the distribution of the parent population.
If the population variance is unknown use the unbiased estimator.
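
The central limit theorem can be checked informally by simulation. The Python sketch below draws repeated samples from a uniform parent population on [0, 1] (mean 0.5, variance 1/12) and compares the spread of the sample means with σ²/n. The sample size, number of repeats and seed are arbitrary choices for this example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 30             # size of each sample (hypothetical choice)
repeats = 5000     # number of samples drawn (hypothetical choice)

# Parent population: uniform on [0, 1], which is clearly not normal.
mu = 0.5
sigma_squared = 1 / 12

sample_means = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(repeats)
]

observed_mean = statistics.fmean(sample_means)
observed_variance = statistics.variance(sample_means)

standard_error = (sigma_squared / n) ** 0.5   # sigma / sqrt(n)

print(f"mean of sample means     = {observed_mean:.4f} (theory {mu})")
print(f"variance of sample means = {observed_variance:.5f} "
      f"(theory {sigma_squared / n:.5f})")
print(f"standard error           = {standard_error:.4f}")
```

The observed mean and variance of the sample means should land close to μ and σ²/n even though the parent population is flat rather than bell-shaped.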

Confidence intervals

The sample mean has a normal distribution, or an approximately normal distribution provided n is large enough. From a sample we can obtain a symmetrical confidence interval for the population mean. E.g. the 95% C.I. is $\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$; note: 95% of such intervals will contain $\mu$. If the population variance is unknown use the unbiased estimator.
To find the appropriate z value to use, first change the percentage into a decimal fraction, then subtract from one. Divide this by 2. Now add the original decimal fraction and look up this value in the tables (using RH tables is easier). For 95%: $1 - 0.95 = 0.05$, $0.05 \div 2 = 0.025$ and $0.025 + 0.95 = 0.975$.
For a 95% CI use the z value for p = 0.975.
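
The z value recipe translates directly into a few lines of Python, with `NormalDist().inv_cdf` playing the role of the inverse table look-up. This is a sketch under stated assumptions: the function names and the sample summary (mean 20, σ = 4, n = 64) are hypothetical.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """z value for a symmetric confidence level given as a decimal (e.g. 0.95)."""
    tail = (1 - level) / 2            # subtract from one, then divide by 2
    p = tail + level                  # add the original decimal fraction
    return NormalDist().inv_cdf(p)    # inverse look-up of P(Z < z) = p


def confidence_interval(x_bar, sigma, n, level=0.95):
    """Symmetric interval x_bar +/- z * sigma / sqrt(n)."""
    half_width = z_for_confidence(level) * sigma / n ** 0.5
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample summary, made up for this illustration.
lower, upper = confidence_interval(x_bar=20.0, sigma=4.0, n=64)

print(f"z = {z_for_confidence(0.95):.3g}, 95% CI = ({lower:.4g}, {upper:.4g})")
```

For a 95% level the look-up value is p = 0.975, which gives z ≈ 1.96. Other levels such as 0.99 follow the same steps.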

Do not get confused between:
Confidence Intervals, which give a range of values relating to the population mean. The width of a Confidence Interval is $2z\frac{\sigma}{\sqrt{n}}$, where n is the sample size.
Intervals, which are a range of values. The width of an interval is $2z\sigma$.
If a question mentions a sample size use $\frac{\sigma}{\sqrt{n}}$.
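
To see how much the sample size matters, the sketch below compares the two kinds of width in Python. It assumes an interval for individual values has the form μ ± zσ and a confidence interval for the mean has the form x̄ ± zσ/√n. The function names and the value σ = 4 are hypothetical.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)   # z value for a 95% level
sigma = 4.0                       # hypothetical population standard deviation


def interval_width(sigma, z):
    """Width of mu +/- z * sigma, a range for individual values."""
    return 2 * z * sigma


def confidence_interval_width(sigma, n, z):
    """Width of x_bar +/- z * sigma / sqrt(n), a range for the mean."""
    return 2 * z * sigma / n ** 0.5


width_n16 = confidence_interval_width(sigma, 16, z)
width_n64 = confidence_interval_width(sigma, 64, z)   # four times the sample size

print(f"individual values: {interval_width(sigma, z):.3g}")
print(f"mean, n = 16: {width_n16:.3g}; mean, n = 64: {width_n64:.3g}")
```

Multiplying the sample size by four halves the width of the confidence interval, while the interval for individual values does not depend on n at all.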

Normal distribution diagrams


[Diagrams: standard normal curves with shaded areas labelled $P(Z < z)$, $P(Z < |z|)$ and $1 - P(Z < |z_1|)$, and, for areas between two values $z_1$ and $z_2$, $P(Z < z_2)$ and $P(Z < z_1)$.]
