
Chapter 11

Correlation and regression


11.1

Learning activity A11.1


Question:
Draw a rough scatter diagram for each of the following situations:
a) r = 0
b) r is very high and positive
c) r is very high and negative
d) r is quite high but negative.

Solution:
a) When r = 0, there is no linear relationship between the
variables, so no one line stands out. The scatter diagram would
look like this:

You can see that any straight line would be as good (or bad) as
another if you were to try to draw a line of best fit!
b) If r is very high and positive, the line will slope upwards from
left to right and the points in the scatter diagram will lie close to
it. A typical diagram might be:

Note that the following could give the same r. The angle of slope of
the line is immaterial; it is the closeness of the observations to
the line which is pertinent.

Statistics 1 Solutions to learning activities

c) If r is very high and negative, the slope is in the other direction,
but again the points must be close to the line.

d) If r is quite high but negative, the slope may be the same as in
(c), but the points will be more spread out about the line.
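The point that the slope does not affect r can be checked numerically. The following is a minimal Python sketch, not part of the original solution: the data values are hypothetical, chosen only to lie close to a straight line, and `pearson_r` is an illustrative helper name.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient of paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical observations lying close to an upward-sloping line
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r_original = pearson_r(x, y)
# Multiplying y by 5 makes the line five times steeper; r is unchanged
r_steeper = pearson_r(x, [5 * yi for yi in y])
# Reversing the sign of y reverses the slope and the sign of r
r_flipped = pearson_r(x, [-yi for yi in y])

print(r_original, r_steeper, r_flipped)
```

Stretching the vertical scale changes the gradient but not how tightly the points cluster about the line, which is why the first two values agree.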

11.2

Learning activity A11.2


Question:
Think of an example where you feel the correlation is clearly
spurious and explain how it might arise.
Now think of a clear correlation and the circumstances in which
you might accept causality.

Solution:
Here you should think about two things which have risen or fallen
over time together, but have no obvious connection. Examples might
be the number of examination successes and the number of films
shown per year in a town.


A clear correlation might be recovery rates from a particular illness
in relation to the amount of medicine given. You might accept this
correlation as perhaps being causal if, other things being equal,
everyone who got the disease were given medicine as soon as it was
diagnosed (control) and also if recovery began only after the
medicine was administered (time effect).


11.3

Learning activity A11.3


Question:
The following figures give examination and project results (in
percentages) for eight students.
a) Find Spearman's rank correlation coefficient, r_s, for the data.
b) Compare it with the sample correlation coefficient, r.

Students' examination and project marks

Student       1   2   3   4   5   6   7   8
Examination  95  80  70  40  30  75  85  50
Project      65  60  55  50  40  80  75  70

Solution:
a) Spearman's rank correlation coefficient is:

\[
\begin{aligned}
r_s &= 1 - \frac{6(3^2 + 2^2 + 1^2 + 0^2 + 0^2 + 3^2 + 0^2 + 3^2)}{8(8^2 - 1)} \\
    &= 1 - \frac{6(9 + 4 + 1 + 9 + 9)}{8 \times 63} \\
    &= 1 - 0.3810 \\
    &= 0.6190.
\end{aligned}
\]

b) This compares with the sample correlation coefficient of 0.6012,
which is very near.
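The rank calculation in part (a) can be reproduced with a short Python sketch. It is an illustration only: it assumes the marks as read from the table in the question, ranks the highest mark as 1, and uses a hypothetical helper `ranks_descending` that only works when there are no tied marks (as is the case here).

```python
def ranks_descending(values):
    """Rank 1 = highest value. Assumes no ties among the values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

# Marks for students 1 to 8, as given in the question
examination = [95, 80, 70, 40, 30, 75, 85, 50]
project = [65, 60, 55, 50, 40, 80, 75, 70]

exam_ranks = ranks_descending(examination)
project_ranks = ranks_descending(project)

# Squared differences between each student's two ranks
d_squared = [(e - p) ** 2 for e, p in zip(exam_ranks, project_ranks)]

n = len(examination)
r_s = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))

print(d_squared)
print(round(r_s, 4))
```

The squared rank differences printed by the sketch are the terms that appear in the numerator of the formula for r_s.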


11.4

Learning activity A11.4


Question:
Work out the b and a in the activity on page 172 given advertising
cost as the dependent variable. Now predict advertising costs when
sales are 460,000. Make sure you understand how and why your
results are different from the example in the subject guide!


Solution:
You are asked to find b and a if the sales are x and the advertising
costs are y. So,
\[
b = \frac{12 \times 191325 - 410 \times 5445}{12 \times 2512925 - 5445^2} = 0.1251
\]

\[
a = \frac{410 - 0.1251 \times 5445}{12} = -22.5975.
\]

So the regression equation, if we decide that advertising costs
depend on sales, is

\[
y = 0.1251x - 22.5975.
\]
We are assuming that as sales rise, the company (or companies)
concerned spends more on advertising. When sales are 460,000 we
get advertising costs:
\[
y = 0.1251 \times 460 - 22.5975 = 34.9485
\]

or 34,948.50.
Note that the x and y values were given in thousands, so be careful!
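The arithmetic can be checked with a short Python sketch. It is an illustration under stated assumptions: the variable names are hypothetical labels for the sums used in the solution (n = 12 observations, with 5445 and 410 read as the totals of x and y, 191325 as the sum of xy and 2512925 as the sum of x squared), and b is rounded to four decimal places before a is computed, as the worked solution does.

```python
# Summary figures used in the solution (x = sales, y = advertising costs,
# both in thousands); the names below are illustrative labels only
n = 12
sum_x = 5445
sum_y = 410
sum_xy = 191325
sum_x2 = 2512925

# Least-squares slope, rounded to 4 d.p. to match the worked solution
b = round((n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2), 4)

# Intercept computed from the rounded slope
a = (sum_y - b * sum_x) / n

def predict_advertising(sales_thousands):
    """Predicted advertising costs (thousands) for sales given in thousands."""
    return a + b * sales_thousands

# Sales of 460,000 are entered as 460 because the data are in thousands
prediction = predict_advertising(460)

print(b, round(a, 4), round(prediction, 4))
```

Entering 460000 instead of 460 would ignore the units of the data and give a meaningless figure, which is the point of the warning above.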


11.5

Learning activity A11.5


Question:
Try to think of a likely linear relationship between x and y which
would probably work over some of the data but then break down
like that in the anthropologist activity on page 173.
This should make sure you understand the difference between
interpolation (which statisticians do all the time) and extrapolation
(which they should not).

Solution:
Another example might be an equation linking national income as
an independent variable to percentage employed as the dependent
variable. Clearly it is impossible to employ more than 100% of the
population.
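A small Python sketch can make the breakdown concrete. Everything in it is hypothetical: the intercept, slope and observed income range are invented purely for illustration and do not come from any real data or from the subject guide.

```python
# Hypothetical fitted line: percentage employed = INTERCEPT + SLOPE * income
INTERCEPT = 40.0
SLOPE = 0.5
# Hypothetical range of national income values seen in the data
OBSERVED_INCOME_RANGE = (60.0, 100.0)

def predict_percentage_employed(income):
    """Prediction from the hypothetical fitted line."""
    return INTERCEPT + SLOPE * income

def is_interpolation(income):
    """True if the income lies within the range of the observed data."""
    low, high = OBSERVED_INCOME_RANGE
    return low <= income <= high

for income in (90.0, 150.0):
    kind = "interpolation" if is_interpolation(income) else "extrapolation"
    print(income, kind, predict_percentage_employed(income))
```

Within the observed range the prediction is a plausible percentage; far outside it, the same line predicts more than 100% employment, which is impossible.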



11.6

Learning activity A11.6


Question:
Draw a rough scatter diagram with the line of best fit for the
following a, b and r.
a) a = 2, b = 1/2 and r = 0.9
b) a = 3, b = -2 and r = -0.3.

Solution:
a)

Note how close the observations are to the line of best fit.
b)

Note how scattered the points are about the steep (gradient 2:1)
negatively sloping line.
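Points for rough diagrams like these can also be generated numerically. The Python sketch below is one possible construction, not the method used in the solution: it assumes r is non-zero and has the same sign as b, takes part (b) as the negatively sloping case described above (b = -2, r = -0.3), and uses hypothetical helper names `make_points` and `fit_line`. Random noise is adjusted so that it is exactly uncorrelated with x, then scaled to give the required r.

```python
import random
from math import sqrt

def make_points(a, b, r, n=30, seed=0):
    """Build n points whose least-squares line is y = a + b*x and whose
    sample correlation is r (r non-zero, same sign as b)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    e = [rng.gauss(0, 1) for _ in range(n)]
    mean_x = sum(x) / n
    mean_e = sum(e) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxe = sum((xi - mean_x) * (ei - mean_e) for xi, ei in zip(x, e))
    # Remove from the noise its mean and any linear dependence on x
    e = [(ei - mean_e) - (sxe / sxx) * (xi - mean_x) for xi, ei in zip(x, e)]
    see = sum(ei ** 2 for ei in e)
    # Scale the noise so that the correlation comes out as r
    scale = sqrt(b * b * sxx * (1 - r * r) / (r * r * see))
    y = [a + b * xi + scale * ei for xi, ei in zip(x, e)]
    return x, y

def fit_line(x, y):
    """Return intercept, slope and correlation for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy / sqrt(sxx * syy)

for a, b, r in [(2, 0.5, 0.9), (3, -2, -0.3)]:
    x, y = make_points(a, b, r)
    print(fit_line(x, y))
```

Plotting the two sets of points would show the tight cluster of part (a) and the wide scatter of part (b) about their respective lines.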

Solutions prepared by Dr James Abdey.
