species present in a defined area is called the population.
SAMPLE
A finite subset of statistical individuals in a population is called a sample.
DEFINITIONS
RANDOM SAMPLING
The best way to get a representative sample is
usually to choose a proportion of the population at random – without bias, with every possible experimental unit having an equal chance of being selected.
A random sample is one in which each unit of the population has an equal chance of being selected.
PROBLEMS WITH RANDOM SAMPLING
First, even a random sample may not be a good
representative of the population from which it has been taken.
(See Next Slide)
By chance, sample 1 contains a group of
relatively large fish, while those in sample 2 are relatively small.
So, if you take a random sample from each of two similar populations, the samples may differ from each other simply by chance.
On that basis, you might mistakenly conclude that the two populations are very different.
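The effect described above can be seen in a small simulation. The sketch below is illustrative only: the population of 1,000 fish lengths, the function name draw_sample_means and the random seeds are assumptions introduced here, not part of the original material. It draws several random samples from one and the same population and prints their means, which differ from one another purely by chance.

```python
import random
import statistics


def draw_sample_means(population, sample_size, n_samples, seed=0):
    """Draw several simple random samples and return the mean of each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population: 1,000 fish lengths in cm (assumed values).
pop_rng = random.Random(42)
fish_lengths = [pop_rng.gauss(30.0, 5.0) for _ in range(1000)]

sample_means = draw_sample_means(fish_lengths, sample_size=10, n_samples=5)
print("population mean:", round(statistics.mean(fish_lengths), 2))
print("sample means:   ", [round(m, 2) for m in sample_means])
```

Every sample comes from the same list, so any spread among the printed means is sampling variation and nothing else.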
Second, even if two populations are very different,
samples from each may be similar and give the misleading impression that the populations are also similar.
(See Next Slide)
Simply by chance, sample 1 and sample 2 are similar.
Third, natural variation among individuals within
a sample may obscure any effect of an experimental treatment.
For example, if tomato plants treated with a new fertilizer yielded from 1.5 to 9 kg of fruit per plant, compared with 1.5 to 7.5 kg per plant in an untreated group, can you conclude that the fertilizer really had an effect?
It is clear from the above discussion that it is often
difficult to make a decision about a difference between samples from different populations or different experimental treatments.
Is it the sort of difference you would expect by chance, or are the populations really different? Is the experimental treatment having an effect?
PARAMETER AND STATISTIC
For a population, the values of mean (μ), standard deviation (σ) and variance (σ²) are called parameters.
For a sample, the values of mean (x̄), standard deviation (s) and variance (s²) are called sample statistics.
TESTS OF SIGNIFICANCE
Tests of significance enable us to decide, on the basis of sample results, whether
the deviation between the observed sample statistic and the hypothetical parameter value, or
the deviation between two independent sample statistics
is significant or might be attributed to chance.
If you take many samples of a given size (n) at random from a normal population and calculate the mean of each sample, the means are unlikely to be the same, but they will be dispersed around the population mean μ.
The distribution of these sample means is also normal, with its own mean (which is also μ) and standard deviation.
The standard deviation of the distribution of sample means is called the standard error of the mean (abbreviated as SEM or SE).
SEM = σ / √n
where σ = standard deviation of the population
n = sample size
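As a rough check on this definition, the sketch below compares σ / √n with the standard deviation of many simulated sample means. The population values (μ = 50, σ = 10), the sample size of 25, the number of simulated samples and both function names are assumptions chosen for illustration.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


def simulated_sd_of_means(mu, sigma, n, n_samples=5000, seed=1):
    """Standard deviation of the means of many simulated samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


mu, sigma, n = 50.0, 10.0, 25  # assumed population and sample size
print("SEM from the formula:    ", standard_error(sigma, n))  # 2.0
print("SD of simulated means:   ", round(simulated_sd_of_means(mu, sigma, n), 3))
```

The two printed values should agree closely, and the agreement tightens as n_samples is increased.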
As the sample size n increases, the standard error of the mean decreases, and therefore the sample mean becomes a more precise estimate of the population mean.
So, the distribution of the means of samples of a particular size (n) taken from a normal population will also be normal, with a mean of μ and a standard error of the mean of σ / √n.
THE 95% CONFIDENCE INTERVAL
95% of the means of samples of size n, taken from a population with known μ and σ, would be expected to fall within the range μ ± (1.96 x SEM).
This range is called the 95% confidence interval, and μ - (1.96 x SEM) and μ + (1.96 x SEM) are called the 95% confidence limits.
USING ‘Z’ STATISTIC TO COMPARE A SAMPLE MEAN AND POPULATION MEAN WHEN POPULATION STATISTICS ARE KNOWN
Set up the Null Hypothesis, H0 – there is no
significant difference between the sample mean and the population mean, μ
Set up the Alternative Hypothesis, H1.
Compute the standard normal variate:
Z = (x̄ − μ) / (σ / √n)
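The steps above can be sketched in a few lines of Python, assuming a two-tailed test at the 5% level with cutoffs of -1.96 and +1.96. The function names are hypothetical. The figures passed in are those of the first worked example in this section (sample mean 3.4, μ = 3.25, σ = 2.61, n = 900).

```python
import math


def z_statistic(sample_mean, mu, sigma, n):
    """Standard normal variate for a sample mean when mu and sigma are known."""
    sem = sigma / math.sqrt(n)  # standard error of the mean
    return (sample_mean - mu) / sem


def two_tailed_z_test(sample_mean, mu, sigma, n, critical=1.96):
    """Return (Z, reject_h0) using the +/-1.96 cutoff for the 5% level."""
    z = z_statistic(sample_mean, mu, sigma, n)
    return z, abs(z) > critical


def confidence_limits_95(mu, sigma, n):
    """Lower and upper 95% confidence limits: mu -/+ 1.96 x SEM."""
    sem = sigma / math.sqrt(n)
    return mu - 1.96 * sem, mu + 1.96 * sem


z, reject_h0 = two_tailed_z_test(sample_mean=3.4, mu=3.25, sigma=2.61, n=900)
lower, upper = confidence_limits_95(mu=3.25, sigma=2.61, n=900)
print("Z =", round(z, 2), "| reject H0:", reject_h0)  # Z is about 1.72, H0 not rejected
print("95% limits:", round(lower, 3), "to", round(upper, 3))
```

Checking whether |Z| exceeds 1.96 is equivalent to checking whether the sample mean falls outside the 95% confidence limits around μ.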
If the value of Z falls between the limits -1.96 and +1.96, there is no significant difference between the sample mean and the population mean at the 5% level of significance. If the value of Z is < -1.96 or > +1.96, the null hypothesis is rejected and there is a significant difference between the sample mean and the population mean.
EXAMPLE 1
A sample of 900 members has a mean of 3.4 cm and standard deviation 2.61 cm. Is the sample from a population of mean 3.25 cm and s.d. 2.61 cm?
SOLUTION
Null Hypothesis, H0: The sample has been drawn
from a population with mean μ = 3.25 cm and σ = 2.61 cm
Alternative Hypothesis, H1: μ ≠ 3.25 cm
x̄ = 3.4 cm, n = 900, μ = 3.25 cm, σ = 2.61 cm
Z = (x̄ − μ) / (σ / √n) = (3.4 − 3.25) / (2.61 / √900) = 0.15 / 0.087 ≈ 1.72
Since the value of Z lies between -1.96 and +1.96, the sample data do not provide any evidence against the null hypothesis at the 5% level of significance.
EXAMPLE 2
A sample of 400 male students is found to have a mean height of 67.47 inches. Can it be reasonably regarded as a sample from a large population with mean height 67.3 inches and standard deviation 1.30 inches? Test at the 5% level of significance.
TWO-TAILED & ONE-TAILED TESTS
A two-tailed test rejects the null hypothesis if the
sample mean is significantly higher or lower than the hypothesized value of the mean of the population.
Symbolically, a two-tailed test is appropriate when we have:
H0: μ = μH0 and H1: μ ≠ μH0
which may mean μ > μH0 or μ < μH0
A one-tailed test is used when we want to test whether the population mean differs from some hypothesized value in one specified direction only, i.e. whether it is significantly lower, or significantly higher.
For example, if we have:
H0: μ = μH0 and H1: μ < μH0
then it is called a left-tailed test, in which the rejection region lies only in the left tail of the curve.
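The choice of tail changes where the rejection region sits, and therefore the cutoff that Z is compared with. The sketch below is a hedged illustration covering the two-tailed and left-tailed cases described so far, plus the mirror-image right-tailed case. It takes its cutoffs from the standard normal distribution in Python's statistics module, about 1.96 for a two-tailed test and about 1.645 for a one-tailed test at the 5% level. The one-tailed figure is not quoted in the text above, and the function name rejects_h0 and the sample value of Z are assumptions.

```python
from statistics import NormalDist


def rejects_h0(z, alpha=0.05, tail="two"):
    """Decide whether Z falls in the rejection region for the chosen tail."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "two":
        # alpha is split equally between both tails
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "left":
        # the whole of alpha lies in the left tail
        return z < std_normal.inv_cdf(alpha)
    if tail == "right":
        # the whole of alpha lies in the right tail
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left' or 'right'")


z = -1.8  # hypothetical value of Z
for tail in ("two", "left", "right"):
    print(tail, "tailed: reject H0 =", rejects_h0(z, tail=tail))
```

With Z = -1.8 the two-tailed test does not reject H0 but the left-tailed test does, which is why the direction of H1 has to be fixed before the data are examined.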
If we have:
H0: μ = μH0 and H1: μ > μH0
then it is called a right-tailed test, in which the rejection region lies only in the right tail of the curve.
EXAMPLE 3
The average age and standard deviation of policy holders insured by all insurance agents are 30.5 yrs and 6.35 yrs respectively. An insurance agent claims that the average age of policy holders who insure through him is less than the average age for all agents. A random sample of 100 policy holders who insured through him had a mean age of 28.8 yrs. Test his claim at the 5% level of significance.
t - test
The t-test is used to test whether the sample mean, x̄, differs significantly from the hypothetical value, μ, of the population mean when the population standard deviation is not known.
Here, we use the standard deviation of the sample as an estimate of the population standard deviation.
Under the null hypothesis, H0: the sample has been drawn from the population with mean μ, i.e., there is no significant difference between the sample mean, x̄, and the population mean, μ, the test statistic is
t = (x̄ − μ) / (s / √n), where s² = Σ(X − x̄)² / (n − 1)
with (n − 1) degrees of freedom.
The calculated value of ‘t’ is compared with the tabulated value at a chosen level of significance. If calculated |t| > tabulated t, the null hypothesis is rejected, and if calculated |t| < tabulated t, H0 may be accepted at the level of significance adopted.
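The comparison described above might be sketched as follows. This is a minimal illustration, assuming t = (x̄ − μ) / (s / √n) with s computed using the divisor (n − 1). The tabulated value is passed in by hand from a t-table, because the Python standard library has no t-distribution. The function names and the three-value data set are hypothetical.

```python
import math
import statistics


def one_sample_t(data, mu):
    """t statistic for testing a sample mean against a hypothesized mu."""
    n = len(data)
    sample_mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    return (sample_mean - mu) / (s / math.sqrt(n))


def reject_h0(t, tabulated_t):
    """Reject H0 when the calculated |t| exceeds the tabulated t."""
    return abs(t) > tabulated_t


# Hypothetical data: three observations, tested against mu = 1.
t = one_sample_t([2, 4, 6], mu=1)
# 4.303 is the tabulated two-tailed 5% value for 2 d.f. (from a t-table).
print("t =", round(t, 3), "| reject H0:", reject_h0(t, tabulated_t=4.303))
```

The degrees of freedom, (n − 1), decide which row of the t-table supplies tabulated_t.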
NOTE: The t-test applies in the case of small samples when the population variance is unknown.
EXAMPLE 4
A random sample of 10 boys had the following I.Q.’s: 70, 120, 110, 101, 88, 83, 95, 98, 107, 100. Do these data support the assumption of a population mean I.Q. of 100?
SOLUTION – EXAMPLE 4
Null Hypothesis, H0: The data are consistent with
the assumption of a mean I.Q. of 100 in the population, i.e., μ = 100
Alternative Hypothesis, H1: μ ≠ 100
X          X − x̄      (X − x̄)²
70         −27.2      739.84
120         22.8      519.84
110         12.8      163.84
101          3.8       14.44
88          −9.2       84.64
83         −14.2      201.64
95          −2.2        4.84
98           0.8        0.64
107          9.8       96.04
100          2.8        7.84
Total: 972            1833.60
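The column totals in the table, and the t value they lead to, can be checked with a short script. This is a sketch that assumes the usual small-sample formulas, s² = Σ(X − x̄)² / (n − 1) and t = (x̄ − μ) / (s / √n). The variable names are illustrative.

```python
import math

iq = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]  # data from Example 4
mu = 100                                            # hypothesized population mean

n = len(iq)
mean = sum(iq) / n                                  # total 972 divided by 10
sum_sq_dev = sum((x - mean) ** 2 for x in iq)       # last column of the table
s = math.sqrt(sum_sq_dev / (n - 1))                 # sample standard deviation
t = (mean - mu) / (s / math.sqrt(n))

print("mean =", round(mean, 2))
print("sum of squared deviations =", round(sum_sq_dev, 2))
print("s =", round(s, 2), "| t =", round(t, 2))
```

The sign of t only shows that the sample mean lies below 100. For a two-tailed test it is |t| that is compared with the tabulated value.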
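x̄ = ΣX / n = 972 / 10 = 97.2
s = √[Σ(X − x̄)² / (n − 1)] = √(1833.60 / 9) ≈ 14.27
t = (x̄ − μ) / (s / √n) = (97.2 − 100) / (14.27 / √10) ≈ −0.62, so |t| ≈ 0.62
Tabulated t for (10 – 1), i.e. 9 d.f., for a two-tailed test at the 5% significance level is 2.262.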
Conclusion: Since calculated |t| is less than tabulated t for 9 d.f., H0 may be accepted at the 5% level of significance, and we may conclude that the data are consistent with the assumption of a mean I.Q. of 100 in the population.
EXAMPLE 5
A machinist is making engine parts with axle diameters of 0.700 inch. A random sample of 10 parts shows a mean diameter of 0.742 inch with a standard deviation of 0.040 inch. Compute the statistic you would use to test whether the work is meeting the specifications.
SOLUTION - EXAMPLE 5
Null Hypothesis, H0: μ = 0.700, i.e., the product is
conforming to specifications
Alternative hypothesis, H1: μ ≠ 0.700
Here, μ = 0.700 inch, x̄ = 0.742 inch, s = 0.040 inch and n = 10.
We will use a two-tailed t-test.
t = (x̄ − μ) / (s / √n) = (0.742 − 0.700) / (0.040 / √10) ≈ 3.32
Tabulated t for (10 – 1), i.e. 9 d.f., at the 5% significance level is 2.262. Since calculated ‘t’, i.e. 3.32 > tabulated t, we can say the value of ‘t’ is significant. This means that x̄ differs significantly from μ, and thus H0 is rejected at the 5% level of significance. So, the product is not meeting the specifications.
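When only summary figures are given, as in Example 5, the statistic can be computed directly from them. The sketch below assumes t = (x̄ − μ) / (s / √n). The function name t_from_summary is hypothetical. Without intermediate rounding the result is about 3.32. Rounding the standard error to 0.013 before dividing gives about 3.23. Either way it is well above the tabulated 2.262.

```python
import math


def t_from_summary(sample_mean, mu, s, n):
    """t statistic from a sample mean, sample standard deviation and size."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu) / standard_error


t = t_from_summary(sample_mean=0.742, mu=0.700, s=0.040, n=10)
tabulated_t = 2.262  # 9 d.f., two-tailed, 5% level (value quoted in the text)

print("t =", round(t, 2))
print("reject H0:", abs(t) > tabulated_t)
```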