
Paper Reference(s)

6683/01
Edexcel GCE
Statistics S1
Gold Level G4
Time: 1 hour 30 minutes
Materials required for examination: Mathematical Formulae (Green)
Items included with question papers: Nil
Candidates may use any calculator allowed by the regulations of the Joint
Council for Qualifications. Calculators must not have the facility for symbolic
algebra manipulation, differentiation and integration, or have retrievable
mathematical formulas stored in them.
Instructions to Candidates
Write the name of the examining body (Edexcel), your centre number, candidate number, the unit title (Statistics S1), the paper reference (6683), your surname, initials and signature.
Information for Candidates
A booklet 'Mathematical Formulae and Statistical Tables' is provided.
Full marks may be obtained for answers to ALL questions.
There are 7 questions in this question paper. The total mark for this paper is 75.
Advice to Candidates
You must ensure that your answers to parts of questions are clearly labelled.
You must show sufficient working to make your methods clear to the Examiner. Answers without working may gain no credit.
Suggested grade boundaries for this paper:
A*   A    B    C    D    E
57   50   43   36   29   22
Gold 4: This publication may only be reproduced in accordance with Edexcel Limited copyright policy.
©2007–2013 Edexcel Limited.
1. A meteorologist believes that there is a relationship between the height above sea level, h m, and the air temperature, t °C. Data is collected at the same time from 9 different places on the same mountain. The data is summarised in the table below.
h    1400   1100   260   840   900   550   1230   100   770
t    3      10     20    9     10    13    5      24    16
[You may assume that $\sum h = 7150$, $\sum t = 110$, $\sum h^2 = 7\,171\,500$, $\sum t^2 = 1716$, $\sum th = 64\,980$ and $S_{tt} = 371.56$]
(a) Calculate $S_{th}$ and $S_{hh}$. Give your answers to 3 significant figures.
(3)
(b) Calculate the product moment correlation coefficient for this data.
(2)
(c) State whether or not your value supports the use of a regression equation to predict the air temperature at different heights on this mountain. Give a reason for your answer.
(1)
(d) Find the equation of the regression line of t on h, giving your answer in the form t = a + bh.
(4)
(e) Interpret the value of b.
(1)
(f) Estimate the difference in air temperature between a height of 500 m and a height of 1000 m.
(2)
May 2013
2. A group of office workers were questioned for a health magazine and $\frac{2}{5}$ were found to take regular exercise. When questioned about their eating habits $\frac{2}{3}$ said they always eat breakfast and, of those who always eat breakfast, $\frac{9}{25}$ also took regular exercise.
Find the probability that a randomly selected member of the group
(a) always eats breakfast and takes regular exercise,
(2)
(b) does not always eat breakfast and does not take regular exercise.
(4)
(c) Determine, giving your reason, whether or not always eating breakfast and taking regular exercise are statistically independent.
(2)
January 2009
Gold 4: 12/12
3. An agriculturalist is studying the yields, y kg, from tomato plants. The data from a random sample of 70 tomato plants are summarised below.
Yield (y kg)    Frequency (f)    Yield midpoint (x kg)
0 ≤ y < 5       16               2.5
5 ≤ y < 10      24               7.5
10 ≤ y < 15     14               12.5
15 ≤ y < 25     12               20
25 ≤ y < 35     4                30
(You may use $\sum fx = 755$ and $\sum fx^2 = 12\,037.5$)
A histogram has been drawn to represent these data.
The bar representing the yield 5 ≤ y < 10 has a width of 1.5 cm and a height of 8 cm.
(a) Calculate the width and the height of the bar representing the yield 15 ≤ y < 25.
(3)
(b) Use linear interpolation to estimate the median yield of the tomato plants.
(2)
(c) Estimate the mean and the standard deviation of the yields of the tomato plants.
(4)
(d) Describe, giving a reason, the skewness of the data.
(2)
(e) Estimate the number of tomato plants in the sample that have a yield of more than 1 standard deviation above the mean.
(2)
May 2013 (R)
4. A researcher believes that parents with a short family name tended to give their children a long first name. A random sample of 10 children was selected and the number of letters in their family name, x, and the number of letters in their first name, y, were recorded.
The data are summarised as:
$\sum x = 60$, $\sum y = 61$, $\sum y^2 = 393$, $\sum xy = 382$, $S_{xx} = 28$
(a) Find $S_{yy}$ and $S_{xy}$.
(3)
(b) Calculate the product moment correlation coefficient, r, between x and y.
(2)
(c) State, giving a reason, whether or not these data support the researcher's belief.
(2)
The researcher decides to add a child with family name "Turner" to the sample.
(d) Using the definition $S_{xx} = \sum (x - \bar{x})^2$, state the new value of $S_{xx}$, giving a reason for your answer.
(2)
Given that the addition of the child with family name "Turner" to the sample leads to an increase in $S_{yy}$,
(e) use the definition $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$ to determine whether or not the value of r will increase, decrease or stay the same. Give a reason for your answer.
(2)
May 2013 (R)
5.
Figure 2
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.
(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in the sample.
(4)
(b) Estimate the value of the mean speed of the cars in the sample.
(3)
(c) Estimate, to 1 decimal place, the value of the median speed of the cars in the sample.
(2)
(d) Comment on the shape of the distribution. Give a reason for your answer.
(2)
(e) State, with a reason, whether the estimate of the mean or the median is a better representation of the average speed of the traffic on the road.
(2)
May 2012
6. The discrete random variable X can take only the values 2, 3 or 4. For these values the cumulative distribution function is defined by
$F(x) = \frac{(x + k)^2}{25}$ for x = 2, 3, 4
where k is a positive integer.
(a) Find k.
(2)
(b) Find the probability distribution of X.
(3)
a3 2!
7. The distances travelled to work, D km, by the employees at a large company are normally distributed with $D \sim N(30, 8^2)$.
(a) Find the probability that a randomly selected employee has a journey to work of more than 20 km.
(3)
(b) Find the upper quartile, $Q_3$, of D.
(3)
(c) Write down the lower quartile, $Q_1$, of D.
(1)
An outlier is defined as any value of D such that D < h or D > k where
$h = Q_1 - 1.5 \times (Q_3 - Q_1)$ and $k = Q_3 + 1.5 \times (Q_3 - Q_1)$.
(d) Find the value of h and the value of k.
(2)
An employee is selected at random.
(e) Find the probability that the distance travelled to work by this employee is an outlier.
(3)
May 2010
TOTAL FOR PAPER: 75 MARKS
END
Question Number    Scheme    Marks
1. (a) $S_{th} = 64980 - \frac{7150 \times 110}{9} = -22408.9\ldots$ awrt −22 400    M1 A1
$S_{hh} = 7171500 - \frac{7150^2}{9} = 1491222.2\ldots$ awrt 1 490 000    A1
(3)
(b) $r = \frac{-22408.9}{\sqrt{1491222 \times 371.56}} = -0.95200068\ldots$ awrt −0.952    M1 A1
(2)
(c) Yes, as r is close to −1 (if −1 < r < −0.5) or Yes, as r is close to 1 (if 1 > r > 0.5)    B1ft
(1)
(d) $b = \frac{-22408.9}{1491222.2} = -0.015027\ldots$ (allow $-\frac{56}{3725}$) awrt −0.015    M1 A1
$a = \frac{110}{9} - \text{'their } b\text{'} \times \frac{7150}{9}$ $(= 12.2 + 0.015 \times 794.4) = 24.1604\ldots$
so t = 24.2 − 0.015h    M1 A1
(4)
(e) 0.015 is the drop in temp (in °C) for every 1 (m) increase in height above sea level.    B1
(1)
(f) Change = ('24.2 − 0.015' × 500) − ('24.2 − 0.015' × 1000) or 500 × '0.015'    M1
= ±7.5 (awrt ±7.5)    A1ft
(2)
[13]
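The Question 1 scheme above can be checked numerically. The following is a minimal sketch that uses only the summary statistics quoted in the question; the variable names are illustrative choices, not part of the mark scheme.

```python
import math

# Summary statistics quoted in Question 1 (n = 9 places on the mountain).
n = 9
sum_h, sum_t = 7150, 110
sum_hh, sum_th = 7171500, 64980
S_tt = 371.56  # given in the question

S_th = sum_th - sum_h * sum_t / n   # part (a)
S_hh = sum_hh - sum_h ** 2 / n      # part (a)
r = S_th / math.sqrt(S_hh * S_tt)   # part (b): product moment correlation coefficient
b = S_th / S_hh                     # part (d): gradient of the regression line of t on h
a = sum_t / n - b * sum_h / n       # part (d): intercept
diff = 500 * abs(b)                 # part (f): temperature change over 500 m

print(round(S_th, 1), round(S_hh, 1), round(r, 3), round(a, 1), round(b, 3), round(diff, 1))
```

Keeping b unrounded until the intercept has been found avoids the premature-rounding error that costs the final accuracy mark in part (d).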
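2. (a) E = take regular exercise, B = always eat breakfast
$P(E \cap B) = P(E \mid B) \times P(B)$    M1
$= \frac{9}{25} \times \frac{2}{3} = 0.24$ or $\frac{6}{25}$ or $\frac{18}{75}$    A1
(2)
(b) $P(E \cup B) = \frac{2}{3} + \frac{2}{5} - \frac{6}{25}$ or $P(E' \mid B')$ or $P(B' \cap E)$ or $P(B \cap E')$    M1
$= \frac{62}{75}$    $= \frac{13}{25}$    $= \frac{12}{75}$    $= \frac{32}{75}$    A1
$P(E' \cap B') = 1 - P(E \cup B) = \frac{13}{75}$ or $0.17\dot{3}$    M1 A1
(4)
(c) $P(E \mid B) = 0.36 \neq 0.40 = P(E)$ or $P(E \cap B) = \frac{6}{25} \neq \frac{2}{5} \times \frac{2}{3} = P(E) \times P(B)$    M1
So E and B are not statistically independent    A1
(2)
[8]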
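The Question 2 working can be reproduced with exact fractions. This is a small sketch under the scheme's own definitions of E and B; the variable names are assumptions made for illustration.

```python
from fractions import Fraction as Fr

p_E = Fr(2, 5)            # takes regular exercise
p_B = Fr(2, 3)            # always eats breakfast
p_E_given_B = Fr(9, 25)   # takes exercise, given always eats breakfast

p_E_and_B = p_E_given_B * p_B          # part (a): multiplication rule
p_E_or_B = p_E + p_B - p_E_and_B       # addition rule
p_neither = 1 - p_E_or_B               # part (b)
independent = p_E_and_B == p_E * p_B   # part (c): test P(E and B) = P(E)P(B)

print(p_E_and_B, p_neither, independent)
```

Treating 9/25 as a conditional probability rather than as P(E ∩ B) is the step that most candidates missed, according to the examiner report for this question.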
3. (a) Width = 2 × 1.5 = 3 (cm)    B1
Area = 8 × 1.5 = 12 cm², Frequency = 24 so 1 cm² = 2 plants (o.e.)    M1
Frequency of 12 corresponds to area of 6 so height = 2 (cm)    A1
(3)
(b) $[Q_2 =]\ (5 +)\ \frac{19}{24} \times 5$ or (use of (n + 1)) $(5 +)\ \frac{19.5}{24} \times 5$    M1
= 8.9583… awrt 8.96 or 9.0625… awrt 9.06    A1
(2)
(c) $[\bar{x} =]\ \frac{755}{70}$ or awrt 10.8    B1
$[\sigma_x =]\ \sqrt{\frac{12037.5}{70} - \bar{x}^2} = \sqrt{55.6326\ldots}$    M1 A1ft
= awrt 7.46 (Accept s = awrt 7.51)    A1
(4)
(d) $\bar{x} > Q_2$    B1ft
So positive skew    dB1
(2)
(e) $\bar{x} + \sigma \approx 18.3$ so number of plants is e.g. $\frac{(25 - \text{'18.3'})}{10} \times 12 + 4$ (o.e.)    M1
= 12.04 so 12 plants    A1
(2)
[13]
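The grouped-data estimates in Question 3 can be sketched as follows. The class boundaries and frequencies come from the question's table; the helper names `interpolate` and `count_above` are hypothetical, and both assume that yields are spread evenly within each class.

```python
import math

# (lower boundary, upper boundary, frequency) from the Question 3 table
classes = [(0, 5, 16), (5, 10, 24), (10, 15, 14), (15, 25, 12), (25, 35, 4)]
n = sum(f for _, _, f in classes)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)

def interpolate(target):
    """Yield below which `target` plants lie, by linear interpolation."""
    cum = 0
    for lo, hi, f in classes:
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    raise ValueError("target exceeds the sample size")

def count_above(t):
    """Estimated number of plants with yield greater than t."""
    total = 0.0
    for lo, hi, f in classes:
        if t <= lo:
            total += f
        elif t < hi:
            total += f * (hi - t) / (hi - lo)
    return total

median = interpolate(n / 2)                # part (b)
mean = sum_fx / n                          # part (c)
sd = math.sqrt(sum_fx2 / n - mean ** 2)    # part (c)
plants = count_above(mean + sd)            # part (e)
print(round(median, 2), round(mean, 1), round(sd, 2), round(plants))
```

The sketch works with the unrounded value of the mean plus one standard deviation, so its intermediate figure for part (e) differs slightly from the scheme's 12.04, which uses 18.3; both round to 12 plants.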
4. (a) $S_{yy} = 393 - \frac{61^2}{10} = 20.9$    M1 A1
$S_{xy} = 382 - \frac{61 \times 60}{10} = 16$    A1
(3)
(b) $[r =]\ \frac{\text{'16'}}{\sqrt{\text{'20.9'} \times 28}}$    M1
= 0.66140… awrt 0.661    A1
(2)
(c) Researcher's belief suggests negative correlation, data suggests positive correlation    B1
So data does not support researcher's belief    dB1
(2)
(d) New x equals $\bar{x} = 6$    B1
Since $S_{xx} = \sum (x - \bar{x})^2$ the value of $S_{xx}$ is the same = 28    dB1
(2)
(e) $S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum (x - \bar{x})y$ so the new term will be zero (since mean = x) and since $S_{yy}$ increases    B1
So r will decrease    dB1
(2)
[11]
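The argument in parts (d) and (e) of Question 4 can be illustrated numerically. This is a sketch only: the paper does not give the new child's first-name length, so `new_y = 3` is a made-up value chosen so that $S_{yy}$ increases, and the helper `pmcc` is a hypothetical name.

```python
import math

n, sum_x, sum_y, sum_yy, sum_xy, S_xx_given = 10, 60, 61, 393, 382, 28
sum_xx = S_xx_given + sum_x ** 2 / n   # recover the raw sum of x^2 from S_xx

def pmcc(n, sx, sy, sxx, syy, sxy):
    """Return (S_xx, S_yy, S_xy, r) from raw sums."""
    S_xx = sxx - sx ** 2 / n
    S_yy = syy - sy ** 2 / n
    S_xy = sxy - sx * sy / n
    return S_xx, S_yy, S_xy, S_xy / math.sqrt(S_xx * S_yy)

before = pmcc(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy)

# "Turner" has 6 letters, which equals the mean of x.
# new_y = 3 is an ASSUMED first-name length, not data from the paper.
new_x, new_y = 6, 3
after = pmcc(n + 1, sum_x + new_x, sum_y + new_y,
             sum_xx + new_x ** 2, sum_yy + new_y ** 2, sum_xy + new_x * new_y)

print(before)
print(after)
```

Because the new x value equals the mean, $S_{xx}$ and $S_{xy}$ are unchanged while $S_{yy}$ grows, so r falls, which matches the scheme's conclusion.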
5. (a) One large square = 450 ÷ '22.5' or one small square = 450 ÷ '562.5' (o.e.)    M1
One large square = 20 cars or one small square = 0.8 cars or 1 car = 1.25 squares    A1
No. > 35 mph is: 4.5 × '20' or 112.5 × '0.8' (o.e. e.g. using fd)    dM1
= 90 (cars)    A1
(4)
(b) $[\bar{x} =]\ \frac{30 \times 12.5 + 240 \times 25 + 90 \times 32.5 + 30 \times 37.5 + 60 \times 42.5}{450} = \frac{12975}{450}$    M1 M1
= 28.83… or $\frac{173}{6}$ awrt 28.8    A1
(3)
(c) $[Q_2 =]\ 20 + \frac{195}{240} \times 10$ (o.e.)    M1
= 28.125 [Use of (n + 1) gives 28.145…] awrt 28.1    A1
(2)
(d) $Q_2 < \bar{x}$ [Condone $Q_2 \approx \bar{x}$]    B1ft
So positive skew [so (almost) symmetric]    dB1ft
(2)
(e) [If chose skew in (d)] median ($Q_2$)
[If chose symmetric in (d)] mean ($\bar{x}$)    B1
Since the data is skewed or median not affected by extreme values
Since it uses all the data    dB1
(2)
[13]
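The Question 5 estimates can be reproduced from the frequencies and midpoints that appear in the scheme for part (b). The histogram itself is not reproduced in this document, so the class boundaries below are an assumption inferred from those midpoints and from the median working in part (c).

```python
# (lower, upper, frequency); boundaries are INFERRED from the scheme's
# midpoints 12.5, 25, 32.5, 37.5, 42.5 and the median class 20 to 30.
classes = [(5, 20, 30), (20, 30, 240), (30, 35, 90), (35, 40, 30), (40, 45, 60)]

n = sum(f for _, _, f in classes)
over_35 = sum(f for lo, _, f in classes if lo >= 35)          # part (a)
mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n    # part (b)

# part (c): linear interpolation for the median
cum = 0
for lo, hi, f in classes:
    if cum + f >= n / 2:
        median = lo + (n / 2 - cum) / f * (hi - lo)
        break
    cum += f

print(n, over_35, round(mean, 1), round(median, 1))
```

With these values the mean exceeds the median, which is the comparison the scheme uses in part (d) to justify positive skew.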
6. (a) F(4) = 1
$(4 + k)^2 = 25$    M1
k = 1 as k > 0    A1
(2)
(b)
x           2      3      4
P(X = x)    9/25   7/25   9/25
B1ft B1 B1
(3)
[5]
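The step from the cumulative distribution function to the probability distribution in Question 6 can be sketched as below. The search range for k is an arbitrary illustrative bound; the only facts used are F(4) = 1 and P(X = x) = F(x) − F(previous value).

```python
from fractions import Fraction

values = [2, 3, 4]

# Part (a): the largest value must have F = 1; search small positive integers.
k = next(k for k in range(1, 10) if Fraction((4 + k) ** 2, 25) == 1)

def F(x):
    """Cumulative distribution function from the question."""
    return Fraction((x + k) ** 2, 25)

# Part (b): successive differences of F give the probabilities.
pmf = {}
previous = Fraction(0)
for x in values:
    pmf[x] = F(x) - previous
    previous = F(x)

print(k, pmf)
```

The probabilities sum to 1 because the final cumulative value is 1; adding the values of F themselves, the error described in the examiner report, does not give a probability distribution.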
7. (a) $P(D > 20) = P\left(Z > \frac{20 - 30}{8}\right)$    M1
= P(Z > −1.25)    A1
= 0.8944 awrt 0.894    A1
(3)
(b) $P(D < Q_3) = 0.75$ so $\frac{Q_3 - 30}{8} = 0.67$    M1 B1
$Q_3$ = awrt 35.4    A1
(3)
(c) 35.4 − 30 = 5.4 so $Q_1$ = 30 − 5.4 = awrt 24.6    B1ft
(1)
(d) $Q_3 - Q_1 = 10.8$ so $1.5(Q_3 - Q_1) = 16.2$ so $Q_1 - 16.2 = h$ or $Q_3 + 16.2 = k$    M1
h = 8.4 to 8.6 and k = 51.4 to 51.6 both    A1
(2)
(e) 2P(D > 51.6) = 2P(Z > 2.7)    M1
= 2[1 − 0.9965] = awrt 0.007    M1 A1
(3)
[12]
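The Question 7 values can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch rather than the method candidates are expected to use: the scheme works from printed tables with z = 0.67, whereas `inv_cdf` returns the unrounded quantile, so the figures agree only to the accuracy the scheme allows.

```python
from statistics import NormalDist

D = NormalDist(mu=30, sigma=8)   # D ~ N(30, 8^2)

p_over_20 = 1 - D.cdf(20)        # part (a)
q3 = D.inv_cdf(0.75)             # part (b): upper quartile
q1 = D.inv_cdf(0.25)             # part (c): lower quartile, by symmetry 60 - q3
iqr = q3 - q1
h = q1 - 1.5 * iqr               # part (d)
k = q3 + 1.5 * iqr               # part (d)
p_outlier = D.cdf(h) + (1 - D.cdf(k))   # part (e): both tails

print(round(p_over_20, 4), round(q3, 1), round(q1, 1),
      round(h, 1), round(k, 1), round(p_outlier, 3))
```

Since h and k are symmetric about the mean, the two tail probabilities are equal, which is why the scheme writes the answer as 2P(D > 51.6).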
Examiner reports
Question 1
Part (a) was, as usual, answered very well but a number of candidates lost the final mark because they did not round their answers to 3 significant figures or, more worryingly, they thought that $S_{hh}$ = 149 to 3 significant figures.
Most knew how to calculate r in part (b) too but few gave a full answer to part (c). Many stated that there was negative correlation (although some thought this meant that the use of a regression equation was not suitable) but few stated clearly that the use of a regression equation was suitable because there was strong correlation. Some simply said that "the points were close to a straight line" but there was no scatter diagram to support this and, without a clear statement that the strong correlation suggests this, the examiners could not award the mark.
Most candidates (even those who felt that a regression equation was not appropriate!) could carry out the calculations in part (d), although a sizeable minority used $S_{tt}$ instead of $S_{hh}$, which gave them a somewhat unrealistic gradient of −60.3. Most found a correct gradient but often rounded their answer before calculating the intercept, and the final mark was frequently lost.
Full interpretations in part (e) were rare, with candidates failing to mention the drop in temperature or the rise in height above sea level, or to give their value. The final part was answered quite well, with most candidates substituting values of 500 and 1000 into their equation; only the better candidates realized that the answer was easily found from 500b. A number of candidates seemed perfectly content with a final answer of around 30 000 °C here (due to their incorrect gradient in part (d)) and lost the final mark. Candidates should be encouraged to try and engage with the context of the questions, and this can help them both in interpreting their statistical calculations and assessing the reasonableness of their answers.
Question 2
This question was not answered well. It was encouraging to see many attempting to use a diagram to help them but there were often some false assumptions made and only the better candidates sailed through this question to score full marks.
The first problem was the interpretation of the probabilities given in the question. Many thought $P(E \cap B) = \frac{9}{25}$ rather than $P(E \mid B)$. All possible combinations of products of two of $\frac{2}{3}$, $\frac{2}{5}$ and $\frac{9}{25}$ were offered for part (a) but $P(E \cap B) = \frac{9}{25}$ was the most common incorrect answer. In part (b) a variety of strategies were employed. Probably the most successful involved the use of a Venn diagram which, once part (a) had been answered, could easily be constructed. Others tried using a tree diagram but there were invariably false assumptions made about $P(E \mid B')$, with many thinking it was equal to $1 - \frac{9}{25}$. A few candidates assumed independence in parts (a) or (b) and did not trouble the scorers. The usual approach in part (c) involved comparing their answer from part (a) with the product of P(E) and P(B), although a few did use P(E|B) and P(E). Despite the question stressing that we were looking for statistical independence here, many candidates wrote about healthy living and exercise!
The large number of candidates who confused $P(E \cap B)$ and $P(E \mid B)$ suggests that this is an area where students would benefit from more practice.
Question 3
In part (a) some candidates could not calculate the widths of the intervals and therefore lost all the marks. In part (b) the technique of linear interpolation is understood well but a number of candidates could not find the correct end-points. Candidates should look carefully at tables of grouped data and determine carefully the end points and widths of the intervals.
Parts (c) and (d) were answered well, without quite so many false attempts at standard deviation as is often the case on S1. Part (e) was not answered so well, as many candidates didn't appreciate the need to interpolate. Those who did usually arrived at the correct answer quite efficiently.
Question 4
Parts (a) and (b) were answered very well, with only minor slips causing a loss of marks in a few cases. In part (c) most candidates realized there was positive correlation but some went on to state that this suggested support for the researcher's belief, with only the more astute explaining that the researcher should have been expecting a negative correlation and that these data therefore did not offer support. Parts (d) and (e) were challenging. In part (d) many stated that $S_{xx}$ would remain the same but they were unable to provide an adequate reason. In part (e) most thought that r would stay the same, giving the text book reason that "it is not affected by coding", but a few did realize that $S_{xy}$ would stay the same and so the increase in $S_{yy}$ meant that r would in fact decrease.
Question 5
This question was not answered very well. Many candidates either made a poor attempt at part (a) and then abandoned the question or just left it blank and moved on. Those who correctly formed a frequency table often scored well, whilst others who failed to complete part (a) struggled to make any headway with the remainder of the question.
In part (a) there needed to be some attempt to count squares, and 22.5, 562.5 or 112.5 (small squares greater than or equal to 35 mph) were frequently seen. However, many candidates did not appreciate the key idea that area is proportional to frequency and there was no attempt to combine this figure with the total frequency of 450. Those who did combine their figures and were able to come up with a correct relationship between area and number of cars (e.g. 1 large square represents 20 cars) were usually able to complete this part successfully, although a few found the number of cars speeding above 30 mph instead of 35 mph as required. A few candidates stumbled upon 90 by dividing the 450 cars by the 5 bars of the histogram but this of course received no credit.
In part (b) most attempts tried to use mid-points but many struggled to find suitable frequencies and a few were unsure of the class widths (using 6 and 11). Some used the number of squares as frequencies but they rarely had a compatible denominator for their expression for the mean.
Most attempts at part (c) realised that interpolation was required but many promising solutions used 19.5 or 20.5 as class boundaries rather than 20.
Although they may not have had the correct values for the mean and median, many had some values which they could use to answer part (d). A simple comparison of their values (e.g. mean greater than median) earned them the first mark and then an appropriate comment about the skewness (such as positive skew) the second. Some attempted to calculate the quartiles and invariably these were incorrect. Candidates should consider the amount of work involved in finding these values and compare it with the tariff for the question: 2 marks for a comment and a reason should not involve half a page of calculations. Other candidates tried to justify their comment from the shape of the histogram, ignoring the calculations in parts (b) and (c).
In part (e) we required a choice of mean or median that was compatible with their conclusion in part (d). Some candidates who had correctly concluded that the distribution was skewed in part (d) still chose the mean because it uses all the data, but there were many correct answers seen to this part.
Question 6
This question was an excellent example of why students should revise the syllabus and not just from past papers. Only a minority of candidates tackled this question effectively; some candidates seemed to have no idea at all as to how to tackle the question. Those who gave correct solutions often made many incorrect attempts in their working. The vast majority showed an understanding of discrete random variables but most missed, or did not understand, the word "cumulative" and consequently spent a lot of time manipulating quadratic expressions, trying to make them into a probability distribution. The majority view was that F(1) + F(2) + F(3) = 1, which led to a lot of incorrect calculations.
Question 7
This question proved to be quite challenging for a high proportion of candidates. A significant number either made no attempt at the question or offered very little in the way of creditable solutions, with many unable to progress beyond part (a). Time issues may have been a contributing factor in some cases.
The majority of candidates, however, were able to earn some credit, at least in part (a) for their standardisation, although whilst this was often completely correct, a fairly common mistake was to give 1 − 0.8944 = 0.1056 as their final answer.
Many students did not recognise that they needed to actually use the normal distribution in part (b) and part (c), giving rise to extremely poor attempts by numerous candidates. Of these, many merely gave 45 and 15 as their quartiles, whilst others calculated $\frac{3}{4}$ of some value as their upper quartile (for example $\frac{3}{4} \times 60$) and $\frac{1}{4}$ of the same value as their lower quartile.
Alternatively, of those who understood that they were required to use the normal distribution, most attempts were successful, though there were some instances of their setting their standardisation equal to a probability, usually 0.75 or P(Z < 0.75), and not a z-value. Unfortunately 0.68 was used fairly frequently as the z value. The majority of candidates were, however, able to follow through their value of the upper quartile to find their lower quartile using symmetry, though some performed a second calculation involving standardisation. Some candidates miscalculated their lower quartile as $\frac{1}{3}$ of their upper quartile.
Despite previous errors, most candidates tended to be successful in substituting their values correctly into at least one of the given formulae. However, a few seemed unaware of the order of the operations.
The final part of the question also proved difficult for many candidates, with some running into trouble as a consequence of previous errors in part (b), part (c) and part (d) and others providing no attempt at all. Indeed, for numerous candidates incorrect values for h and k led to probabilities of 0 being calculated from results such as P(Z > 7), and thus many creditable attempts lost marks through earlier inaccuracies.
Statistics for S1 Practice Paper Gold Level G4
Qu   Max Score   Modal score   Mean %   ALL     A*      A       B       C       D       E       U
1    13          11            65       8.46    11.72   11.08   9.61    8.61    7.61    6.70    4.73
2    8                         33       2.62            3.82    1.98    1.53    1.31    1.01    0.64
3    13                        67       8.73    10.84   10.13   8.54    7.53    6.43    4.72    3.11
4    11                        59       6.50    7.82    7.18    6.29    5.70    5.56    5.14    4.50
5    13                        39       5.02    10.63   9.16    5.68    3.90    2.78    2.12    1.48
6    5                         25       1.23            2.87    0.98    0.55    0.32    0.16    0.06
7    12                        43       5.20    10.39   8.98    6.01    4.31    3.10    2.08    0.98
     75                        50       37.76           53.22   39.09   32.13   27.11   21.93   15.50