If we have a small data set (n < 30), we should test the assumption of normality. Unless the sample
is particularly small, drawing a histogram of the data will give us an idea of whether the data
come from a normal distribution.
The appropriate test is the Kolmogorov-Smirnov test, which can be found under the
Nonparametric Tests option in the Analyze menu (choose Legacy Dialogs and then 1-Sample K-S).
The default option is to test for normality, but by highlighting another option, we can test
whether the data fit another distribution. We should highlight the Exact option from the Exact
menu. The null hypothesis is that the data come from a normal distribution. We use the p-value
(from Exact Sig. 2-tailed) to make our conclusion. By default we test at a 5% level of significance.
The following results were obtained for testing whether height (from the data file used in the
classes) follows a normal distribution.
The p-value (Exact Sig. 2-tailed) is 0.983. Since p = 0.983 > α = 0.05, we do not reject the
hypothesis that height has a normal distribution, i.e. it is reasonable to assume that height
comes from a normal distribution.
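For readers working outside SPSS, the idea behind the test can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the `heights` values are hypothetical (they are not the class data file), the helper name `ks_normal` is made up for illustration, the normal distribution is fitted using the sample mean and standard deviation, and the p-value comes from the asymptotic Kolmogorov series rather than the Exact option described above, so it will not reproduce the SPSS figure.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_normal(data):
    """One-sample K-S test against a normal distribution fitted to the data.

    Returns (D, p) where D is the K-S statistic and p is the asymptotic
    two-sided p-value (an approximation, not an exact p-value).
    """
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both gaps.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    lam = math.sqrt(n) * d
    # Asymptotic Kolmogorov series, truncated after 100 terms.
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2 * series))


# Hypothetical heights in cm, for illustration only.
heights = [158, 161, 163, 165, 166, 168, 169, 170, 171, 172,
           173, 174, 176, 177, 179, 180, 182, 184, 187, 191]

d_stat, p_value = ks_normal(heights)
print(f"D = {d_stat:.3f}, asymptotic p = {p_value:.3f}")
```

As in the SPSS output, a p-value above 0.05 would mean that we do not reject the hypothesis of normality at the 5% level.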
SPSS does not explicitly have an option for carrying out tests for a proportion (or difference
between two proportions). However, if we have a large sample, we can carry out approximate
versions of such tests as follows:
Suppose we wish to test the hypothesis that a proportion p of some population exhibits a certain
trait. We can create a variable called TRAIT, which takes the value 0 if the trait is absent and 1 if the
trait is present. The sum of all the observations of TRAIT is thus simply the number of individuals
in the sample with the trait. Hence, the sample mean of this variable (this sum divided by the
sample size) is simply the proportion of individuals in the sample with the trait. The test
H0 : p = p0
HA : p ≠ p0,
is (approximately) equivalent to the test of whether the population mean for the variable TRAIT is
equal to p0 or not.
Example: Suppose that we want to test the hypothesis that 60% of all Irish students are taller than
170cm. Firstly, we can categorise height into two categories: 0 if height ≤ 170 (i.e. trait is absent)
and 1 if height > 170 (i.e. trait is present). Suppose this categorical variable is called trait. We carry
out the following test
H0 : p = 0.6
HA : p ≠ 0.6.
To do this, from the Compare Means option on the Analyze menu choose One Sample t-test. The
variable of interest is trait and the test value is 0.6. Using the data from lab.xls, the following results
are obtained
The first table gives us the proportion of individuals in the sample who are taller than 170cm
(sample mean 0.47). The p-value for the test is 0.011. Hence, we reject the null hypothesis at the 5%
level, but not at the 1% level. We thus have evidence that the proportion of all Irish students that are
taller than 170cm is not 60% (it seems to be lower).
The (approximate) 95% confidence interval is given for the difference between the population
proportion and the hypothetical value, i.e. for p-0.6. Adding the test value (0.6) to both ends of the
confidence interval, we obtain a 95% confidence interval for the proportion of all Irish students
taller than 170cm. This is
Note that since 0 is not in the confidence interval for the difference (equivalently, 0.6 is not in the
confidence interval for the population proportion), we reject the null hypothesis at the 5%
significance level (corresponding to the 95% confidence level).
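The arithmetic behind this test can be sketched in Python with the standard library. The data here are an assumption: 47 ones among 100 observations, chosen only because that gives the sample mean of 0.47 quoted above (the text does not state the sample size). The p-value and confidence interval use the normal approximation to the t distribution, which is reasonable for large samples but will differ slightly from the SPSS figures.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical 0/1 variable: 1 = taller than 170cm, 0 = otherwise.
trait = [1] * 47 + [0] * 53
p0 = 0.6  # test value from the null hypothesis

n = len(trait)
p_hat = mean(trait)               # sample mean = sample proportion
se = stdev(trait) / math.sqrt(n)  # standard error of the mean
t_stat = (p_hat - p0) / se        # one-sample t statistic

z = NormalDist()
# Two-sided p-value, normal approximation to the t distribution.
p_value = 2 * (1 - z.cdf(abs(t_stat)))

# Approximate 95% confidence interval for the difference p - p0 ...
z_crit = z.inv_cdf(0.975)
ci_diff = (p_hat - p0 - z_crit * se, p_hat - p0 + z_crit * se)
# ... and for p itself, obtained by adding the test value to both ends.
ci_prop = (ci_diff[0] + p0, ci_diff[1] + p0)

print(f"proportion = {p_hat:.2f}, t = {t_stat:.3f}, p = {p_value:.4f}")
print(f"CI for p - p0: ({ci_diff[0]:.3f}, {ci_diff[1]:.3f})")
print(f"CI for p:      ({ci_prop[0]:.3f}, {ci_prop[1]:.3f})")
```

With these assumed data, 0 lies outside the interval for the difference, which corresponds to rejecting the null hypothesis at the 5% level.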
We can carry out tests for a difference between two proportions in a similar way. Suppose we wish
to carry out the following test
H0 : the proportion of male students taller than 170cm is equal to the proportion of female students
taller than 170cm
HA : the proportion of male students taller than 170cm differs from the proportion of female
students taller than 170cm
We define the variable TRAIT in the same way as above. Then, from the Compare Means option
on the Analyze menu, choose Independent Samples t-test. The test variable is TRAIT and the
grouping variable is SEX.
The first table indicates that 68% of the males in the sample are taller than 170cm, while only 26%
of the females are taller than 170cm. The significance for the test for a difference between
proportions gives a p-value of approximately 0. Hence, we have very strong evidence that the
proportion of males taller than 170cm differs from the proportion of females taller than 170cm. It
seems clear that a larger proportion of males are tall.
In addition, we have an approximate 95% confidence interval for the difference between these two
proportions: [0.238, 0.602].
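The same calculation can be sketched in Python with the standard library. The group sizes are an assumption: 34 of 50 males and 13 of 50 females, chosen only because they give the 68% and 26% quoted above (the text does not state the group sizes). The sketch uses the pooled-variance form of the independent-samples t statistic and, again, the normal approximation for the p-value and interval, so the interval will be close to, but not identical to, the one reported by SPSS.

```python
import math
from statistics import NormalDist, mean, variance

# Hypothetical 0/1 data: 1 = taller than 170cm.
males = [1] * 34 + [0] * 16
females = [1] * 13 + [0] * 37

n1, n2 = len(males), len(females)
diff = mean(males) - mean(females)  # difference between sample proportions

# Pooled variance, as in the equal-variances independent-samples t-test.
pooled_var = ((n1 - 1) * variance(males) + (n2 - 1) * variance(females)) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = diff / se

z = NormalDist()
# Two-sided p-value and 95% interval, normal approximation to the t distribution.
p_value = 2 * (1 - z.cdf(abs(t_stat)))
z_crit = z.inv_cdf(0.975)
ci = (diff - z_crit * se, diff + z_crit * se)

print(f"difference = {diff:.2f}, t = {t_stat:.2f}, p = {p_value:.2g}")
print(f"approximate 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```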
The output is

sex * music Crosstabulation

                                        music
                            Franz F.  Radiohead     Sting     Total
sex    f  Count                   17         12        21        50
          Expected Count        16.5       18.0      15.5      50.0
       m  Count                   16         24        10        50
          Expected Count        16.5       18.0      15.5      50.0
Total     Count                   33         36        31       100
          Expected Count        33.0       36.0      31.0     100.0

Chi-Square Tests

                        Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square      7.934a     2   .019
Likelihood Ratio        8.097      2   .017
N of Valid Cases        100

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 15.50.

Approximate p-values are given for these tests. The approximation is reasonable as long as the
expected count in each cell is at least 5. Information regarding the proportion of cells in which
the expected count is less than 5 is given below the table giving the results of the test. In this
case all the expected counts are well above 5 (the minimum expected count is 15.5), hence the
p-value given will be a good estimate. We use the Pearson Chi-square test. The realisation of the
test statistic is given under value (here 7.934). This is a measure of the difference between what
is observed and what we expect under the null hypothesis that there is no association between
sex and musical preference. The number of degrees of freedom (df) is (r-1)(c-1), where r is the
number of rows and c is the number of columns (here r = 2, c = 3, hence (r-1)(c-1) =
(2-1)(3-1) = 1 × 2 = 2). The p-value of the test is approximately 0.019 (as explained above this is
a good estimate). Since 0.01<p<0.05, we have evidence that the null hypothesis is false (i.e. we
have evidence that there is an association between sex and musical preference). In order to
describe such an association, it is necessary to compare the observed and expected counts in the
contingency table. It can be seen that more males than expected prefer Radiohead (24 observed
against 18.0 expected) and more females than expected prefer Sting (21 observed against 15.5
expected). Hence, we conclude that males are more likely to be Radiohead fans than females and
females are more likely to be Sting fans than males.
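The figures in the output can be checked by hand. The following Python sketch (standard library only) recomputes the expected counts, the Pearson chi-square statistic and the degrees of freedom from the observed counts in the crosstabulation. The variable names are illustrative. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the upper tail probability of the chi-square distribution is exp(-x/2); this shortcut does not apply to tables of other sizes.

```python
import math

# Observed counts from the crosstabulation (columns: Franz F., Radiohead, Sting).
observed = {"f": [17, 12, 21], "m": [16, 24, 10]}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
n = sum(row_totals)

# Expected count under no association = row total * column total / n.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pearson chi-square: sum over all cells of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(rows, expected)
           for o, e in zip(obs_row, exp_row))

df = (len(rows) - 1) * (len(col_totals) - 1)

# Valid for df = 2 only: the chi-square upper tail probability is exp(-x/2).
p_value = math.exp(-chi2 / 2)

print(expected)            # expected counts for rows f and m
print(round(chi2, 3), df)  # 7.934 2
print(round(p_value, 3))   # 0.019
```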