ST104A ZA

This paper is not to be removed from the Examination Halls

UNIVERSITY OF LONDON

ST104A

BSc degrees and Diplomas for Graduates in Economics, Management, Finance
and the Social Sciences, the Diplomas in Economics and Social Sciences and
Access Route

Statistics 1

Wednesday, 3 June 2015 : 10:00 to 12:00

Candidates should answer THREE of the following FOUR questions: QUESTION 1 of
Section A (50 marks) and TWO questions from Section B (25 marks each). Candidates
are strongly advised to divide their time accordingly.
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.

PLEASE TURN OVER


University of London 2015
UL15/0850


SECTION A
Answer all parts of Question 1 (50 marks in total).
(a) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as either nominal or ordinal.
Justify your answer. (Note that no marks will be awarded without a justification.)
i. The manufacturer of a car.
ii. The amount of money in a bank account.
iii. The Gross Domestic Product (GDP) of a country.
iv. The rating of a hotel according to the number of stars it has.
[8 marks]

(b) Consider the following sample dataset:

4,  x,  8,  7,

You are told that the value of the sample mean is 5.
i. Calculate the value of x.
ii. Find the sample variance.
[4 marks]

(c) The salaries of the employees of a company are normally distributed with a mean
of 25,000 and a standard deviation of 10,000.
i. What is the proportion of employees with a salary of at least 20,000?
ii. What is the proportion of employees with salaries between 15,000 and 35,000?
[4 marks]
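
The proportions asked for in part (c) can be checked numerically. The sketch below uses Python's standard-library `NormalDist` with the mean and standard deviation stated in the question; the variable names are illustrative only.

```python
from statistics import NormalDist

# Salary model from the question: mean 25,000 and standard deviation 10,000.
salary = NormalDist(mu=25_000, sigma=10_000)

# (i) P(X >= 20,000): the upper tail beyond z = (20,000 - 25,000) / 10,000 = -0.5.
p_at_least_20k = 1 - salary.cdf(20_000)

# (ii) P(15,000 < X < 35,000): within one standard deviation of the mean.
p_between = salary.cdf(35_000) - salary.cdf(15_000)

print(round(p_at_least_20k, 4), round(p_between, 4))
```

In an examination the same values would be read from the standard normal table after standardising with Z = (X - mean) / standard deviation.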

(d) Suppose that $x_1 = 3$, $x_2 = 5$, $x_3 = 5$, $x_4 = 1$, $x_5 = 2$, and $y_1 = 1$, $y_2 = 4$,
$y_3 = 5$, $y_4 = 1$, $y_5 = 2$. Calculate the following quantities:

i. $\sum_{i=3}^{5} 2x_i$

ii. $\sum_{i=2}^{4} 3(y_i - 3)$

iii. $y_4^2$ [operator illegible] $\sum_{i=1}^{3} (2x_i + y_i^2)$.

[6 marks]
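
The summation notation in part (d) can be illustrated with a short Python sketch. It assumes the limits read as i = 3 to 5 in (i), i = 2 to 4 in (ii) and i = 1 to 3 in (iii), and that the bracket in (ii) is y_i minus 3. The two terms of (iii) are evaluated separately. The list layout and variable names are illustrative.

```python
# Values from part (d); index 0 is unused so that x[i] corresponds to x_i.
x = [None, 3, 5, 5, 1, 2]
y = [None, 1, 4, 5, 1, 2]

# (i) sum of 2 * x_i for i = 3, 4, 5
sum_i = sum(2 * x[i] for i in range(3, 6))

# (ii) sum of 3 * (y_i - 3) for i = 2, 3, 4
sum_ii = sum(3 * (y[i] - 3) for i in range(2, 5))

# (iii) the two terms, evaluated separately
y4_squared = y[4] ** 2
sum_iii = sum(2 * x[i] + y[i] ** 2 for i in range(1, 4))

print(sum_i, sum_ii, y4_squared, sum_iii)
```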

UL15/0217


(e) The variable X takes the values 2, 4, 6 and 8 according to the following distribution:

x         2     4     6     8
p_X(x)   0.3   0.2   0.1   0.4

i. What is the probability that X is an odd number?
ii. Find E(X), the expected value of X.
iii. Find the probability that X/2 > 3.
[5 marks]
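
The quantities in part (e) follow directly from the probability distribution. A minimal Python sketch, with the distribution stored in a dictionary whose name is chosen here for illustration:

```python
# Probability distribution of X from part (e): value -> probability.
pmf = {2: 0.3, 4: 0.2, 6: 0.1, 8: 0.4}

# (i) Sum the probabilities of the odd values (X takes none).
p_odd = sum(p for x, p in pmf.items() if x % 2 == 1)

# (ii) E(X) is the probability-weighted sum of the values.
expected_value = sum(x * p for x, p in pmf.items())

# (iii) X/2 > 3 is the same event as X > 6.
p_half_exceeds_3 = sum(p for x, p in pmf.items() if x / 2 > 3)

print(p_odd, round(expected_value, 2), p_half_exceeds_3)
```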

(f) You toss two fair dice independently.
i. What is the probability that both numbers are sixes?
ii. What is the probability that both numbers are odd?
iii. You are now told that the first one of them shows a two. What is the probability
in this case that both are twos?
[4 marks]

(g) It is stated in a consumer magazine that the average price of football shirts in
London is 19.00. A random sample is taken by obtaining a single football shirt
from each of 16 randomly chosen London retailers. The sample mean is 20.20
and the sample standard deviation is 2.40. Carry out a hypothesis test, at two
appropriate significance levels, to determine whether the price of football shirts in
London is more expensive than the price stated in the consumer magazine. State
your hypotheses, the test statistic and its distribution under the null hypothesis,
and your conclusion in the context of the problem.
[7 marks]
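
The test statistic for part (g) can be sketched in Python as follows. The sketch assumes a one-sided test of H0: mean = 19.00 against H1: mean > 19.00, with prices normally distributed so that the statistic follows a t distribution with n - 1 = 15 degrees of freedom under H0. The critical values are upper-tail points for 15 degrees of freedom taken from standard t tables, and the 5% and 1% levels are one possible choice of the two significance levels.

```python
import math

# Figures quoted in the question.
n = 16
sample_mean = 20.20
sample_sd = 2.40
mu_0 = 19.00  # value claimed by the consumer magazine

# t statistic for a single mean with unknown population standard deviation.
t_stat = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))
df = n - 1

# Upper-tail critical values of t with 15 degrees of freedom (from tables).
critical_values = {0.05: 1.753, 0.01: 2.602}
reject_h0 = {alpha: t_stat > c for alpha, c in critical_values.items()}

print(round(t_stat, 3), df, reject_h0)
```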

(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. The chance that a normal random variable is less than two standard deviations
from its mean is 99%.
ii. The lower the regression coefficient in absolute value the weaker the correlation.
iii. Increasing the sample size will increase the width of a confidence interval for a
population mean (assuming that everything else remains constant).
iv. When testing a hypothesis, we use a two-tailed test if we want to test whether
the parameter is greater than what is stated in the null hypothesis.
v. A population list is needed in order to conduct quota sampling.
vi. The regression of the variable Y on the variable X will always have the same
slope as the regression of the variable X on the variable Y.
[12 marks]


SECTION B
Answer two questions from this section (25 marks each).
2. (a) Questionnaires were mailed to 300 households, in three different areas of a city,
to assess the level of local sporting facilities. The collected data are shown in
the table below:

                  Sporting Facilities Level
          Very good   Fairly good   Poor   Total
Area 1        44           30        26     100
Area 2        29           26        45     100
Area 3        45           28        27     100
Total        118           84        98     300

i. Based on the data in the table, and without conducting a significance test,
would you say there is an association between areas and level of local
sporting facilities?
ii. Calculate the $\chi^2$ statistic and use it to test for independence, using two
appropriate significance levels. What do you conclude?
[14 marks]
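
The chi-squared statistic for the table in 2(a) can be computed as in the following Python sketch. Expected counts are taken as (row total × column total) / grand total, and the statistic sums (O - E)² / E over all nine cells. The critical values quoted are the 5% and 1% upper-tail points of the chi-squared distribution with 4 degrees of freedom from standard tables. Variable names are illustrative.

```python
# Observed counts: rows are Areas 1 to 3, columns are Very good, Fairly good, Poor.
observed = [
    [44, 30, 26],
    [29, 26, 45],
    [45, 28, 27],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over every cell, with E = row total * column total / grand total.
chi_squared = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (o - e) ** 2 / e

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail critical values for 4 degrees of freedom (from tables).
critical_values = {0.05: 9.488, 0.01: 13.277}
reject_h0 = {alpha: chi_squared > c for alpha, c in critical_values.items()}

print(round(chi_squared, 3), df, reject_h0)
```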

(b) i. Provide the definition of simple random sampling and cluster sampling
designs.
ii. Why might a researcher prefer cluster sampling rather than simple random
sampling?
iii. Name one other random sampling scheme, provide its definition and one
of its advantages.
[11 marks]


3. The following data show the recorded times (y) in seconds taken by 10 international
athletes to run 100 metres together with the corresponding wind speeds (x) at
the time of running. A positive wind speed indicates the wind is in the direction
of running and therefore considered to be helpful whereas a negative wind speed
indicates the wind is against the runner.

Athlete    #1     #2     #3     #4     #5     #6     #7     #8     #9    #10
x        -2.45  -1.23  -0.78  -0.33  -0.37   0.34   0.53   1.17   2.35   2.91
y        10.52  10.47  10.41  10.25  10.54  10.09  10.30   9.99   9.92   9.87

The summary statistics for these data are:

Sum of x data: 2.14
Sum of the squares of x data: 24.13
Sum of y data: 102.36
Sum of the squares of y data: 1048.34
Sum of the products of x and y data: 18.56

(a) i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted
time for a runner for a wind speed of 1.5? Will you trust this value? Justify
your answer.
[13 marks]
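
The correlation coefficient and least squares line in 3(a) can be obtained from the summary statistics alone, using the formula-sheet expressions for r, b and a. A minimal Python sketch follows. The variable names are illustrative, and the results depend on the rounded summary figures quoted in the question.

```python
import math

# Summary statistics quoted in the question (n = 10 athletes).
n = 10
sum_x, sum_x2 = 2.14, 24.13
sum_y, sum_y2 = 102.36, 1048.34
sum_xy = 18.56

x_bar = sum_x / n
y_bar = sum_y / n

# Corrected sums of squares and products.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2

# Sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Least squares estimates for the line y = a + b x.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Predicted time at a wind speed of 1.5, which lies inside the observed range of x.
predicted_time = a + b * 1.5

print(round(r, 3), round(a, 3), round(b, 4), round(predicted_time, 2))
```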

(b) Behavioural researchers have developed an index designed to measure
managerial success. Of interest is whether there is a difference in average
managerial success based on the level of interaction with people outside a
manager's immediate work unit. Managers in group 1 engage in a high
volume of interactions with people outside their work unit, while those in group
2 rarely do. The data are summarised in the table below:

           Sample size   Sample mean   Sample standard deviation
Group 1        22           65.33               6.61
Group 2        25           61.58               5.37

i. Carry out a hypothesis test to determine whether the mean managerial
success index scores are different between the two groups. Test at two
suitable significance levels, stating clearly the hypotheses, the test statistic
and its distribution under the null hypothesis. Comment on your findings.
ii. State clearly any assumptions you made in (i.).
iii. Adjust the procedure above to determine whether the mean managerial
success for managers who have a high volume of interactions with people
outside their work unit is higher than that of those who rarely do.
[12 marks]
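
One way to compute the test statistic for 3(b) is the pooled-variance t test from the formula sheet. The Python sketch below assumes equal population variances and independent samples from normal populations, so that under the null hypothesis of equal means the statistic has a t distribution with n1 + n2 - 2 degrees of freedom. Variable names are illustrative.

```python
import math

# Summary data from the table: sample size, sample mean, sample standard deviation.
n1, mean1, sd1 = 22, 65.33, 6.61
n2, mean2, sd2 = 25, 61.58, 5.37

# Pooled variance estimator.
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)

# Standard error of the difference in sample means.
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic under H0: the two population means are equal.
t_stat = (mean1 - mean2) / std_error
df = n1 + n2 - 2

print(round(pooled_var, 3), round(t_stat, 3), df)
```

The same statistic serves for the one-sided version in part (iii.); only the alternative hypothesis and the critical value change.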


4. (a) The following data show the length (in inches) of fish caught in one day in a
river:

10.1   10.4   10.5   10.9   11.1
11.2   11.2   11.5   11.7   11.9
12.1   12.1   12.2   12.2   12.3
12.4   12.5   12.6   12.8   12.9
13.2   13.4   13.5   13.6   13.7
14.3   14.5   14.8   15.2   15.5

i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean (given that the sum of the data is 376.3), the median and
the modal group.
iii. Comment on the data given the shape of the histogram and the measures
you have calculated.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
[12 marks]
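
The summary measures in 4(a) can be checked with Python's standard library. The modal group depends on the class intervals chosen for the histogram. The sketch below assumes intervals of width one inch starting at 10, that is [10, 11), [11, 12) and so on, which is only one reasonable choice.

```python
import math
from collections import Counter
from statistics import mean, median

# The 30 fish lengths (inches) from the question.
lengths = [
    10.1, 10.4, 10.5, 10.9, 11.1,
    11.2, 11.2, 11.5, 11.7, 11.9,
    12.1, 12.1, 12.2, 12.2, 12.3,
    12.4, 12.5, 12.6, 12.8, 12.9,
    13.2, 13.4, 13.5, 13.6, 13.7,
    14.3, 14.5, 14.8, 15.2, 15.5,
]

mean_length = mean(lengths)
median_length = median(lengths)  # average of the 15th and 16th ordered values

# Assumed class intervals [10, 11), [11, 12), ...: group by the integer part.
class_counts = Counter(math.floor(v) for v in lengths)
modal_lower, modal_count = class_counts.most_common(1)[0]

print(round(mean_length, 2), round(median_length, 2), modal_lower, modal_count)
```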

(b) In order to estimate the percentage of city households that have high speed
internet access, a random sample of 140 city households was taken. Of these,
70 had high speed internet access. A similar sample of 170 rural households
was also taken and it was found that 61 of them had high speed internet access.
The data are summarised in the table below:

                           City Households   Rural Households
With high speed internet          70                61
Total                            140               170

i. Give a 95% confidence interval for the difference between the proportions
of high speed internet access in city and rural households.
ii. Carry out a hypothesis test, at two suitable significance levels, to determine
whether city households are more likely to have high speed internet access
compared to rural households. State the test hypotheses, and specify your
test statistic and its distribution under the null hypothesis. Comment on
your findings.
iii. State any assumptions you made in (ii.).
[13 marks]
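
The interval and test statistic in 4(b) can be sketched in Python using the formula-sheet expressions. The confidence interval uses the unpooled standard error, and the test of equal proportions uses the pooled proportion. The sketch assumes independent random samples that are large enough for the normal approximation. Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Counts from the table: households with high speed internet, and sample sizes.
r1, n1 = 70, 140   # city
r2, n2 = 61, 170   # rural

p1, p2 = r1 / n1, r2 / n2
diff = p1 - p2

# (i) 95% confidence interval with the unpooled standard error.
se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_crit = NormalDist().inv_cdf(0.975)  # approximately 1.96
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

# (ii) z statistic under H0: equal proportions, using the pooled proportion.
pooled = (r1 + r2) / (n1 + n2)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_stat = diff / se_pooled

print(tuple(round(v, 4) for v in ci), round(z_stat, 3))
```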

END OF PAPER


ST104a Statistics 1
Examination Formula Sheet
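Expected value of a discrete random variable:
$$\mu = E(X) = \sum_{i=1}^{N} p_i x_i$$

Standard deviation of a discrete random variable:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} p_i (x_i - \mu)^2}$$

The transformation formula:
$$Z = \frac{X - \mu}{\sigma}$$

Finding Z for the sampling distribution of the sample mean:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Finding Z for the sampling distribution of the sample proportion:
$$Z = \frac{P - \pi}{\sqrt{\pi (1 - \pi) / n}}$$

Confidence interval endpoints for a single mean ($\sigma$ known):
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

Confidence interval endpoints for a single mean ($\sigma$ unknown):
$$\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$$

Confidence interval endpoints for a single proportion:
$$p \pm z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$$

Sample size determination for a mean:
$$n \geq \frac{z_{\alpha/2}^2 \sigma^2}{e^2}$$

Sample size determination for a proportion:
$$n \geq \frac{z_{\alpha/2}^2 \, p(1 - p)}{e^2}$$

z test of hypothesis for a single mean ($\sigma$ known):
$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

t test of hypothesis for a single mean ($\sigma$ unknown):
$$T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

z test of hypothesis for a single proportion:
$$Z = \frac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}}$$

z test for the difference between two means (variances known):
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$$

t test for the difference between two means (variances unknown):
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2 (1/n_1 + 1/n_2)}}$$

Confidence interval endpoints for the difference between two means:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1 + n_2 - 2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

Pooled variance estimator:
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$

t test for the difference in means in paired samples:
$$T = \frac{\bar{X}_d - \mu_d}{S_d / \sqrt{n}}$$

Confidence interval endpoints for the difference in means in paired samples:
$$\bar{x}_d \pm t_{\alpha/2,\, n-1} \frac{s_d}{\sqrt{n}}$$

z test for the difference between two proportions:
$$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P (1 - P) (1/n_1 + 1/n_2)}}$$

Pooled proportion estimator:
$$P = \frac{R_1 + R_2}{n_1 + n_2}$$

Confidence interval endpoints for the difference between two proportions:
$$(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$

$\chi^2$ test of association:
$$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Sample correlation coefficient:
$$r = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sqrt{\left( \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \right) \left( \sum_{i=1}^{n} y_i^2 - n \bar{y}^2 \right)}}$$

Spearman rank correlation:
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}$$

Simple linear regression line estimates:
$$b = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}, \qquad a = \bar{y} - b \bar{x}$$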

Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) Cambridge University Press, reproduced with permission.
