
Statistics with the SPSS Package

5.3 Testing the assumption of normality

If we have a small data set (n<30), we should test the assumption of normality. Unless the sample is
particularly small, drawing a histogram of the data will also give us an idea of whether the data
come from a normal distribution.

The appropriate test is the Kolmogorov-Smirnov test, which can be found under the
Nonparametric Tests option in the Analyze menu (choose Legacy Dialogs and then 1-Sample K-S).
The default option is to test for normality, but by highlighting another option, we can test
whether the data fit another distribution. We should highlight the Exact option from the Exact
menu. The null hypothesis is that the data come from a normal distribution. We use the p-value
(from Exact Sig. 2-tailed) to draw our conclusion. By default we test at a 5% level of significance.
The following results were obtained when testing whether height (from the data file used in the
classes) follows a normal distribution.

The p-value for the test

H0 : height has a normal distribution

HA : height does not have a normal distribution

is 0.983. Since p = 0.983 > α = 0.05, we do not reject the hypothesis that height has a normal
distribution, i.e. it is reasonable to assume that height comes from a normal distribution.
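
To make the idea behind the test concrete, here is a minimal Python sketch of the Kolmogorov-Smirnov statistic for normality. It rests on several assumptions: the heights are hypothetical values invented for the illustration, the mean and standard deviation of the reference normal distribution are estimated from the sample, and the p-value comes from the asymptotic Kolmogorov series rather than the Exact option described above, so it will not reproduce the SPSS figure.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(data):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # parameters estimated from the data
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def asymptotic_p_value(d, n, terms=100):
    """Asymptotic (not exact) two-sided p-value from the Kolmogorov series."""
    lam = math.sqrt(n) * d
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2 * series))

# Hypothetical heights in cm (not the data file used in the classes).
heights = [158, 162, 165, 167, 168, 170, 171, 173, 175, 178, 181, 185]

d = ks_statistic_normal(heights)
p = asymptotic_p_value(d, len(heights))
print(f"D = {d:.3f}, asymptotic p = {p:.3f}")
```

A small value of D (and hence a large p-value) means the empirical distribution stays close to the fitted normal curve, which is the situation in which we do not reject H0.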

5.4 Tests for a proportion

SPSS does not explicitly have an option for carrying out tests for a proportion (or difference
between two proportions). However, if we have a large sample, we can carry out approximate
versions of such tests as follows:

Suppose we wish to test the hypothesis that a proportion p of some population exhibits a certain
trait. We can create a variable called TRAIT, which takes the value 0 if the trait is absent and 1 if the
trait is present. The sum of all the observations of TRAIT is thus simply the number of individuals
in the sample with the trait. Hence, the sample mean of this variable (this sum divided by the
sample size) is simply the proportion of individuals in the sample with the trait. The test
H0 : p = p0
HA : p ≠ p0,

is (approximately) equivalent to the test of whether the population mean for the variable TRAIT is
equal to p0 or not.
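
The coding step can be illustrated with a short Python sketch. The list `has_trait` is hypothetical data invented for the illustration; the sketch only shows that the mean of a 0/1 variable equals the sample proportion.

```python
from statistics import mean

# Hypothetical observations: True if the individual has the trait.
has_trait = [True, False, True, True, False, False, True, False]

# Code the trait as 1 (present) or 0 (absent), as for the variable TRAIT.
trait = [1 if t else 0 for t in has_trait]

count_with_trait = sum(trait)      # number of individuals with the trait
sample_proportion = mean(trait)    # the sum divided by the sample size

print(count_with_trait, sample_proportion)
```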

Example: Suppose that we want to test the hypothesis that 60% of all Irish students are taller than
170cm. Firstly, we categorise height into two categories: 0 if height ≤ 170 (i.e. trait is absent)
and 1 if height > 170 (i.e. trait is present). Suppose this categorical variable is called trait. We carry
out the following test

H0 : p = 0.6
HA : p ≠ 0.6.

To do this, from the Compare Means option on the Analyze menu choose One Sample t-test. The
variable of interest is trait and the test value is 0.6. Using the data from lab.xls, the following results
are obtained

The first table gives us the proportion of individuals in the sample who are taller than 170cm
(sample mean 0.47). The p-value for the test is 0.011. Hence, we reject the null hypothesis at the 5%
level, but not at the 1% level. We thus have evidence that the proportion of all Irish students that are
taller than 170cm is not 60% (it seems to be lower).
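
As a rough check on the logic of this test, the one-sample t statistic for a 0/1 variable can be computed by hand. The sketch below assumes a sample of 100 students of whom 47 are taller than 170cm; the sample size is not stated in the text, so these counts are an assumption chosen only to match the sample mean of 0.47. The p-value uses a normal approximation rather than the t distribution, so it should be close to, but not identical with, the SPSS output.

```python
import math
from statistics import NormalDist, mean, stdev

def one_sample_t(values, test_value):
    """Return the t statistic and a normal-approximation two-sided p-value."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    t_stat = (mean(values) - test_value) / standard_error
    p_approx = 2 * NormalDist().cdf(-abs(t_stat))  # normal, not t, tail areas
    return t_stat, p_approx

# Assumed data: 47 ones and 53 zeros (hypothetical sample size of 100).
trait = [1] * 47 + [0] * 53

t_stat, p_approx = one_sample_t(trait, 0.6)
print(f"t = {t_stat:.3f}, approximate p = {p_approx:.4f}")
```

Under these assumptions the statistic is about -2.59, and the approximate p-value lies between 0.01 and 0.05, which agrees with the conclusion of rejecting at the 5% level but not at the 1% level.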

The (approximate) 95% confidence interval is given for the difference between the population
proportion and the hypothetical value, i.e. for p-0.6. Adding the test value (0.6) to both ends of the
confidence interval, we obtain a 95% confidence interval for the proportion of all Irish students
taller than 170cm. This is

[0.6-0.2295, 0.6-0.0305] = [0.3705, 0.5695].

Note that since 0 is not in the confidence interval for the difference (equivalently, 0.6 is not in the
confidence interval for the population proportion), we reject the null hypothesis at the 5%
significance level (corresponding to the 95% confidence level).
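
The arithmetic, and the link between the interval and the test, can be written out as a short Python sketch. The endpoints of the interval for the difference are those implied by the calculation above; the variable names are illustrative.

```python
test_value = 0.6

# 95% confidence interval for the difference p - 0.6, as used above.
diff_lower, diff_upper = -0.2295, -0.0305

# Shift both ends by the test value to get an interval for p itself.
p_lower = test_value + diff_lower
p_upper = test_value + diff_upper

# The two-sided test at the 5% level rejects H0 exactly when the
# test value lies outside the 95% interval for p.
reject_h0 = not (p_lower <= test_value <= p_upper)

print(f"[{p_lower:.4f}, {p_upper:.4f}]", reject_h0)
```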

5.5 Tests for a difference between proportions

We can carry out tests for a difference between two proportions in a similar way. Suppose we wish
to carry out the following test

H0 : the proportion of male students taller than 170cm is equal to the proportion of female students
taller than 170cm

HA : the proportion of male students taller than 170cm differs from the proportion of female
students taller than 170cm

We define the variable TRAIT in the same way as above. Then, from the Compare Means option
on the Analyze menu, choose Independent Samples t-test. The test variable is TRAIT and the
grouping variable is SEX.

The first table indicates that 68% of the males in the sample are taller than 170cm, while only 26%
of the females are taller than 170cm. The significance for the test for a difference between
proportions gives a p-value of approximately 0. Hence, we have very strong evidence that the
proportion of males taller than 170cm differs from the proportion of females taller than 170cm. It
seems clear that a larger proportion of males are tall.

In addition, we have an approximate 95% confidence interval for the difference between these two
proportions: [0.238, 0.602].
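
The calculation behind such a comparison can be sketched in Python. The group sizes are not stated in the text, so the sketch assumes 50 males (34 taller than 170cm) and 50 females (13 taller than 170cm), chosen only to match the reported 68% and 26%. It uses the pooled-variance t statistic, with normal tail areas and the normal critical value in place of the t distribution, so the results are approximations to the SPSS output.

```python
import math
from statistics import NormalDist, mean, variance

def two_sample_t(group_a, group_b):
    """Pooled-variance t statistic, approximate p-value and 95% interval."""
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = diff / se
    z = NormalDist()
    p_approx = 2 * z.cdf(-abs(t_stat))    # normal approximation
    half_width = z.inv_cdf(0.975) * se    # normal critical value, about 1.96
    return diff, t_stat, p_approx, (diff - half_width, diff + half_width)

# Assumed 0/1 data for each sex (hypothetical group sizes of 50).
males = [1] * 34 + [0] * 16
females = [1] * 13 + [0] * 37

diff, t_stat, p_approx, interval = two_sample_t(males, females)
print(f"difference = {diff:.2f}, t = {t_stat:.2f}, approximate p = {p_approx:.6f}")
print(f"approximate 95% interval: [{interval[0]:.3f}, {interval[1]:.3f}]")
```

Under these assumptions the interval is close to the one reported above; the small discrepancy comes from using the normal critical value instead of the t critical value.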

5.6 χ² Tests of Independence


Such tests are used to see whether there is an association between two qualitative (categorical)
variables (for example sex and musical preference). In the Analyze menu, choose Descriptive
Statistics followed by Crosstabs. We choose sex to be the row variable and musical preference
to be the column variable (or vice versa). Choose the Statistics option at the bottom of the
Crosstabs window and highlight the Chi-square option. Also, highlight the Expected option in
the Cells window. We test between the hypotheses

H0: There is no association between sex and musical preference


H1: There is an association between sex and musical preference

The output is
sex * music Crosstabulation

                                      music
                          Franz F.   Radiohead   Sting    Total
sex     f   Count            17         12         21       50
            Expected Count   16.5       18.0       15.5     50.0
        m   Count            16         24         10       50
            Expected Count   16.5       18.0       15.5     50.0
Total       Count            33         36         31      100
            Expected Count   33.0       36.0       31.0    100.0

Chi-Square Tests

                        Value     df    Asymp. Sig. (2-sided)
Pearson Chi-Square      7.934a     2    .019
Likelihood Ratio        8.097      2    .017
N of Valid Cases        100

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 15.50.

Approximate p-values are given for these tests. The approximation is reasonable as long as the
expected count in each cell is at least 5. Information regarding the proportion of cells in which
the expected count is less than 5 is given below the table giving the results of the test. In this
case all the expected counts are well above 5 (the minimum expected count is 15.5), hence the
p-value given will be a good estimate. We use the Pearson Chi-square test. The realisation of the
test statistic is given under Value (here 7.934). This is a measure of the difference between what
is observed and what we expect under the null hypothesis that there is no association between
sex and musical preference. The number of degrees of freedom (df) is (r-1)(c-1), where r is the
number of rows and c is the number of columns (here r = 2, c = 3, hence (r-1)(c-1) =
(2-1)(3-1) = 1×2 = 2). The p-value of the test is approximately 0.019 (as explained above this is
a good estimate). Since 0.01<p<0.05, we have evidence that the null hypothesis is false (i.e. we
have evidence that there is an association between sex and musical preference). In order to
describe such an association, it is necessary to compare the observed and expected counts in the
contingency table. It can be seen that more males than expected prefer Radiohead and more
females than expected prefer Sting. Hence, we conclude that males are more likely to be
Radiohead fans than females and females are more likely to be Sting fans than males.
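
The quantities in the output can be reproduced from the observed counts with a short Python sketch. The function name is illustrative. The p-value line relies on the fact that, for 2 degrees of freedom only, the upper tail probability of the χ² distribution is exp(-x/2); for other degrees of freedom a statistical library would be needed.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Observed counts from the crosstabulation (rows: f, m; columns:
# Franz F., Radiohead, Sting).
observed = [[17, 12, 21],
            [16, 24, 10]]

stat, df, expected = chi_square_independence(observed)

# Closed form valid for df = 2 only.
p_value = math.exp(-stat / 2) if df == 2 else None

# Observed minus expected: positive values mark cells seen more often
# than expected under the null hypothesis.
differences = [[o - e for o, e in zip(obs_row, exp_row)]
               for obs_row, exp_row in zip(observed, expected)]

print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
print(differences)
```

The positive differences for females and Sting (+5.5) and for males and Radiohead (+6.0) are the cells that drive the association described above.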
