
ST 305 Reiland PRACTICE PROBLEMS FOR FINAL EXAM

Topics covered on final exam: Chapters 24-27 in text.


This material is covered in webassign homework assignments 12 through 14
Exam information: 3 hour time limit; materials allowed: calculator (no laptops), one 8 1/2 x 11 sheet
of paper (2 sided) with notes, definitions, formulas, etc. Normal, t, and chi-square tables will be
provided with the exam.
The questions on the final exam will be multiple choice format.
Answers are at the end of the document.
1. To investigate the possible link between fluoride content of drinking water and cancer, the cancer
death rates (number of deaths per 100,000 population) from 1952-1969 in 20 selected U. S. cities - the
ten largest fluoridated cities and the ten largest cities not fluoridated by 1969 - were recorded. These
data were used to calculate for each city the annual rate of increase in cancer death rate over this 18
year period. The data are given below:
            FLUORIDATED                            NONFLUORIDATED
                 Annual Increase in                     Annual Increase in
City             Cancer Death Rate      City            Cancer Death Rate
Chicago               1.0640            Los Angeles           .8875
Philadelphia          1.4118            Boston               1.7358
Baltimore             2.1115            New Orleans          1.0165
Cleveland             1.9401            Seattle               .4923
Washington            3.8772            Cincinnati           4.0155
Milwaukee             -.4561            Atlanta             -1.1744
St. Louis             4.8359            Kansas City          2.8132
San Francisco         1.8875            Columbus             1.7451
Pittsburgh            4.4964            Newark               -.5676
Buffalo               1.4045            Portland             2.4471
a. Construct a 95% confidence interval for the difference between the mean annual increases in
cancer death rates for fluoridated and nonfluoridated cities. (Use 18 degrees of freedom; do not
assume equal variances).
b. What assumption(s) must be satisfied for the confidence interval in part a) to be valid?
2a. We are interested in comparing the average supermarket prices of two leading colas. Our sample was
taken by randomly selecting eight supermarkets and recording the price of a six-pack of each brand of
cola at each supermarket. The data are shown in the following table:
Supermarket        1      2      3      4      5      6      7      8
Brand 1 cola     $2.25   2.47   2.38   2.27   2.15   2.25   2.36   2.37
Brand 2 cola     $2.30   2.45   2.44   2.29   2.25   2.25   2.42   2.40
Difference      $-0.05   0.02  -0.06  -0.02  -0.10   0     -0.06  -0.03
Summary statistics: $\bar{d} = -0.0375$, $s_d = 0.0381$.
Find a 98% confidence interval for the difference in mean price of brand 1 and brand 2.
i) $-0.0375 \pm 0.0404$   ii) $-0.0375 \pm 0.1393$   iii) $-0.0375 \pm 0.0471$   iv) $-0.0375 \pm 0.0347$
2b. Perform a hypothesis test to test if the difference in mean prices of brand 1 and brand 2 colas is
different from 0. Use $\alpha = .02$. Define the parameters, state hypotheses, calculate test statistic and
P-value.
3. A new weight-reducing technique, consisting of a liquid protein diet, is currently undergoing tests by
the Food and Drug Administration (FDA) before its introduction into the market. The weights of a
random sample of five people are recorded before they are introduced to the liquid protein diet. The
five individuals are then instructed to follow the liquid protein diet for 3 weeks. At the end of this
period, their weights (in pounds) are again recorded. The results are listed in the table. Let $\mu_1$ be the
true mean weight of individuals before starting the diet and let $\mu_2$ be the true mean weight of
individuals after 3 weeks on the diet.
                       1     2     3     4     5
Weight before diet    156   201   194   203   210
Weight after diet     149   196   191   197   206
Test to determine if the diet is effective at reducing weight. Use $\alpha = .10$.
4. A cell phone company wants to determine if the use of text messaging is independent of age. The
following data has been collected from a random sample of customers.
               Regularly use     Do not regularly use
               text messaging    text messaging
Under 21            82                 38
21-39               57                 34
40 and over          6                 83
4a. What is the expected value for the "Under 21" and "Regularly use text messaging" cell?
4b. To conduct a hypothesis test using a 0.01 level of significance, the value of the critical value is:
i) 16.812 ii) 15.086 iii) 9.210 iv) 2.576 v) 2.33
4c. What contribution does the cell 21-39 and Do not regularly use text messaging make to the value
of the test statistic?
4d. The value of the test statistic is 88.3. The appropriate conclusion is
i) reject $H_0$ and conclude the variables have a curvilinear relationship;
ii) reject $H_0$ and conclude the variables are not related, that is, conclude that the variables are
independent.
iii) do not reject $H_0$ and conclude the variables are independent.
iv) reject $H_0$ and conclude the variables are related.
4e. Use the values of the standardized residuals $\frac{\text{observed} - \text{expected}}{\sqrt{\text{expected}}}$ to select the correct statement regarding
the relationship between text messaging and age.


i) The Under 21 age group uses text messaging less than the other age groups.
ii) For each age group the standardized residual has a negative value and a positive value, so text
message and age are not related.
iii) The older the age group, the less that text messaging is used.
5. A simple linear regression was used to predict the score $y$ on a final exam from the score $x$ on the first
exam. The slope of the least squares regression line is .75. The standard error of the slope is .11 and
the sample size is 200. A 90% confidence interval for the true slope is (though the df is $n - 2$, use df =
200 in the t-table)
a. .64 to .86   b. .53 to .97   c. .57 to .93   d. .64 to .86   e. none of the above
6. Refer to the previous problem. To test the null hypothesis that the slope is zero versus the one-sided
alternative that the slope is positive, we use the test statistic t
a. 6.82 b. .05 c. .15 d. .95 e. .75
7. When two competing teams are equally matched, the probability that each team wins any game is
0.5. The National Basketball Association (NBA) championship goes to the team that wins four
games in a best-of-seven series. If the teams were evenly matched, the probability that the final series
ends with one of the teams sweeping four straight games would be $2(0.5)^4 = 0.125$ [team A wins in 4
games with probability $(0.5)^4$; team B can also win in 4 games with the same probability, so the
probability the series ends in 4 games is $2(0.5)^4$].
Similarly team A can win in 5 games if team A wins 3 of the first 4 games and then wins game 5. So
team A wins in 5 games with probability $\binom{4}{3}(0.5)^3(0.5)(0.5) = 0.125$. But team B can also win in
5 games with the same probability, so the probability that the series ends in 5 games is 0.25. Similar
probability calculations show that the probability is 0.3125 that the series lasts six games, and the
probability is 0.3125 that the series lasts the full seven games. The table below shows the number of
games it took to decide each of the last 57 NBA champs. Do you think the teams are usually equally
matched? Give statistical evidence to support your conclusion.
Length of series 4 games 5 games 6 games 7 games
NBA finals 7 13 22 15
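The series-length probabilities quoted in the problem can be checked with a short calculation. This is only an illustrative sketch: the function name `series_length_probs` and its parameters are assumptions made here, not part of the course materials.

```python
from math import comb

def series_length_probs(p=0.5, wins_needed=4):
    """Return {n: P(series ends in exactly n games)} for a first-to-`wins_needed` series.

    p is the probability that team A wins any single game (hypothetical parameter).
    """
    q = 1 - p
    probs = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The eventual champion wins game n and exactly wins_needed - 1 of the first n - 1 games.
        ways = comb(n - 1, wins_needed - 1)
        a_wins = ways * p**wins_needed * q**(n - wins_needed)
        b_wins = ways * q**wins_needed * p**(n - wins_needed)
        probs[n] = a_wins + b_wins
    return probs

probs = series_length_probs()
for games, prob in probs.items():
    print(games, prob)
```

With p = 0.5 this gives 0.125, 0.25, 0.3125 and 0.3125 for series of 4, 5, 6 and 7 games, matching the values stated above.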
8. The president of a large university has been studying the relationship between male/female
supervisory structures in his institution and the level of employees' job satisfaction. The results of a
recent survey are shown in the table below. Conduct a test at the 5% significance level to determine
whether the level of job satisfaction depends on the boss/employee gender relationship.

                                        Boss/Employee
Level of Satisfaction   Male/Female   Female/Male   Male/Male   Female/Female
Satisfied                   60            15           50            15
Neutral                     27            45           48            50
Dissatisfied                13            32           12            55
9. (True or false) In a hypothesis test, a P-value of 0.03 means that there is only 0.03 probability that the
null hypothesis is true.
10. A chi-squared test for independence with 6 degrees of freedom results in a test statistic 13.61. Using
the tables, the most accurate statement that can be made about the P-value for this test is that:
a. P-value >.10 b. P-value >.05 c. .05 <P-value <.10 d. .025 <P-value <.05
11. Sociologists are of the opinion that there has been a decrease in the difference in ages at first marriage
for men and women since 1975. We want to examine data to determine if this decrease is significant.
The following data summary and regression results were obtained, where the $x$ variable is year and the
$y$ variable is the age difference (husband age - wife age) at first marriage.
Variable                     Count    Mean      StDev
Year ($x$)                     24     1986.5    7.071
husband-wife age ($y$)         24     2.3125    0.249

a. Interpret the value of the least squares slope $b_1$.
b. What is the value of the test statistic for testing $H_0: \beta_1 = 0$?
c. For the hypothesis test $H_0: \beta_1 = 0$ vs $H_a: \beta_1 < 0$ select the choice below that gives the
correct P-value and correct conclusion.
i. The P-value is 0.68; do not reject $H_0: \beta_1 = 0$; there is no linear relationship since 1975
between year and age difference between husband and wife at first marriage.
ii. The P-value is 0.000152; reject $H_0: \beta_1 = 0$; there is evidence that since 1975 the age
difference (husband age - wife age) has increased.
iii. The P-value is 0.0001275; reject $H_0: \beta_1 = 0$; there is evidence that since 1975 the age
difference (husband age - wife age) has decreased.
iv. The P-value is 0.0001275; do not reject $H_0: \beta_1 = 0$; there is no linear relationship since 1975
between year and age difference between husband and wife at first marriage.
d. What is a 95% confidence interval for the slope?
e. Find a 95% confidence interval for the mean difference (husband age - wife age) at first marriage
in 1998.
f. Find a 95% prediction interval for the difference (husband age - wife age) for a particular couple
getting married for the first time in 1998.
12. Over 6 decades the Gallup Organization has periodically asked the following question:
If your party nominated a generally well-qualified person for president who happened to
be a woman, would you vote for that person?
Below is a table showing the percentage answering yes and the year of the century (37 =
1937).
% Yes 92 82 78 80 76 73 66 53 57 55 57 54 52 48 33 33
Year 99 87 84 83 78 75 71 69 67 63 59 58 55 49 45 37
summary statistics:
$\bar{x} = 67.44$, $s_x = 16.7$, $\bar{y} = 61.81$, $s_y = 17.19$, $r = .971$
a. Determine the estimates $b_0$ and $b_1$ of the parameters $\beta_0$ and $\beta_1$ in the linear model
$y = \beta_0 + \beta_1 x + \varepsilon$,
where $x$ is year and $y$ is the percentage who respond yes.
b. Use the least squares line to estimate the percentage of respondents that would say yes in 1997.
c. Determine the estimate $s_e$ of the standard deviation $\sigma$ of the error component $\varepsilon$
(note that the sum of squares of residuals $\sum_{i=1}^{16} (y_i - \hat{y}_i)^2 = 255.748$).
d. Calculate a 95% confidence interval for the slope $\beta_1$. (Note that $SE(b_1) = \frac{s_e}{\sqrt{n-1}\, s_x}$.)
e. Conduct an appropriate hypothesis test (use $\alpha = .05$) to determine if the year of the century is
useful for predicting the percentage of respondents that would answer yes to the above question.
State the hypotheses, find the value of the test statistic, and state your conclusion based on the P-value
or the rejection region.
f. What assumptions must be (approximately) satisfied for the above statistical procedures to be
valid? Perform 2 diagnostics to check these assumptions.
SOLUTIONS
1. a. $\bar{x}_1 = 2.2573$, $s_1^2 = 2.753$, $n_1 = 10$; $\bar{x}_2 = 1.3411$, $s_2^2 = 2.429$, $n_2 = 10$; $t_{.025,18} = 2.101$;
$(2.2573 - 1.3411) \pm 2.101 \sqrt{\frac{2.753}{10} + \frac{2.429}{10}} = .9162 \pm 1.5124 = (-.5962, 2.4286)$.
b. (i) The two samples must be independently selected (ii) The distributions of increases in cancer
death rates for fluoridated and nonfluoridated cities can be approximated by normal models. (iii) The
two groups we are comparing must be independent of each other.
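The interval in 1a can be reproduced from the raw data in the problem. The sketch below uses only the Python standard library; the critical value 2.101 is taken from the solution (18 degrees of freedom as instructed), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

fluoridated = [1.0640, 1.4118, 2.1115, 1.9401, 3.8772,
               -0.4561, 4.8359, 1.8875, 4.4964, 1.4045]
nonfluoridated = [0.8875, 1.7358, 1.0165, 0.4923, 4.0155,
                  -1.1744, 2.8132, 1.7451, -0.5676, 2.4471]

T_STAR = 2.101  # t critical value for 95% confidence with 18 df, as used in the solution

# Difference in sample means and its unpooled (unequal-variance) standard error.
diff = mean(fluoridated) - mean(nonfluoridated)
se = sqrt(variance(fluoridated) / len(fluoridated)
          + variance(nonfluoridated) / len(nonfluoridated))
margin = T_STAR * se
ci = (diff - margin, diff + margin)

print(f"difference = {diff:.4f}, margin = {margin:.4f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the interval contains 0, these data do not show a significant difference between the two groups of cities at the 5% level.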
2a. i) $\bar{d} \pm t_{.01,7} \frac{s_d}{\sqrt{n}} = -0.0375 \pm 2.9980 \cdot \frac{0.0381}{\sqrt{8}} = -0.0375 \pm 0.0404$.
2b. $H_0: \mu_1 - \mu_2 = \mu_d = 0$
$H_A: \mu_1 - \mu_2 = \mu_d \ne 0$, $\alpha = .02$
where $\mu_1$ is the mean price of brand 1 cola and $\mu_2$ is the mean price of brand 2 cola.
test statistic: $t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{-0.0375}{0.0381/\sqrt{8}} = -2.784$.
Since P-value $= 2P(t_7 > 2.784) = .027 > .02$, do not reject the null hypothesis
$H_0: \mu_1 - \mu_2 = \mu_d = 0$. There appears to be no significant difference in the mean price of the 2
brands of soda.
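The paired-data calculations in 2a and 2b can be checked from the price table. This is a minimal sketch using the standard library; the critical value 2.998 is the one used in the solution to 2a, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

brand1 = [2.25, 2.47, 2.38, 2.27, 2.15, 2.25, 2.36, 2.37]
brand2 = [2.30, 2.45, 2.44, 2.29, 2.25, 2.25, 2.42, 2.40]

# Work with the per-supermarket differences (brand 1 minus brand 2),
# rounded to cents to suppress floating-point noise.
d = [round(a - b, 2) for a, b in zip(brand1, brand2)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)
se = s_d / sqrt(n)

T_STAR = 2.998  # t critical value for 98% confidence with 7 df, as used in the solution
margin = T_STAR * se
t_stat = d_bar / se  # test statistic for H0: mu_d = 0

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}")
print(f"98% CI: {d_bar:.4f} +/- {margin:.4f}")
print(f"t = {t_stat:.3f}")
```

The test statistic computed from unrounded values differs from the solution's -2.784 only in the third decimal place, since the solution rounds $s_d$ to 0.0381 first.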
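3. $H_0: \mu_1 - \mu_2 = \mu_d = 0$; $H_A: \mu_1 - \mu_2 = \mu_d > 0$; $\alpha = .10$
$\bar{d} = 5$, $s_d = 1.58$
$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{5}{1.58/\sqrt{5}} = 7.08$; P-value $= P(t_4 > 7.08) \approx .001$
Conclusion: Reject $H_0$ and conclude that the mean weight of individuals after the diet is less than the
mean weight of individuals before the diet.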
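As a check on solution 3, the one-sided paired test can be run on the weights directly. In this sketch the critical value 1.533 (upper-tail t value for $\alpha = .10$ with 4 df) is a standard table value supplied here as an assumption; it does not appear in the solution, which uses the P-value instead.

```python
from math import sqrt
from statistics import mean, stdev

before = [156, 201, 194, 203, 210]
after = [149, 196, 191, 197, 206]

# Differences are before minus after, so a positive mean indicates weight loss.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
t_stat = d_bar / (s_d / sqrt(n))

T_CRIT = 1.533  # assumed table value: t_{.10} with 4 df (not taken from the solution)
reject_h0 = t_stat > T_CRIT

print(f"d_bar = {d_bar}, s_d = {s_d:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The unrounded statistic is about 7.07; the solution's 7.08 comes from rounding $s_d$ to 1.58 before dividing.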
4. Regularly use Do not regularly use
text messaging text messaging
Under 21 82 (58) 38 (62)
21-39 57 (43.983) 34 (47.017)
40 and over 6 (43.017) 83 (45.983)
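4a. expected count: $\frac{\text{row total} \times \text{column total}}{\text{grand total}} = 58$
4b. iii) 9.210 (df $= (3 - 1)(2 - 1) = 2$)
4c. $\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(34 - 47.017)^2}{47.017} = 3.604$
4d. iv) reject $H_0$ and conclude the variables are related.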
4e. iii) The older the age group, the less that text messaging is used. (see table below)
standardized residuals $\frac{\text{obs} - \text{exp}}{\sqrt{\text{exp}}}$:
               Regularly use     Do not regularly use
               text messaging    text messaging
Under 21            3.15              -3.05
21-39               1.96              -1.90
40 and over        -5.64               5.46
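The expected counts, test statistic and standardized residuals for problem 4 can all be derived from the observed table. The following is a minimal sketch with illustrative variable names, using only the standard library.

```python
from math import sqrt

# Rows: Under 21, 21-39, 40 and over. Columns: regularly use, do not regularly use.
observed = [[82, 38],
            [57, 34],
            [6, 83]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Standardized residual for each cell: (observed - expected) / sqrt(expected).
std_resid = [[(o - e) / sqrt(e) for o, e in zip(orow, erow)]
             for orow, erow in zip(observed, expected)]

# The chi-square statistic is the sum of the squared standardized residuals.
chi_sq = sum(z * z for row in std_resid for z in row)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"chi-square = {chi_sq:.1f}, df = {df}")
for row in std_resid:
    print([round(z, 2) for z in row])
```

Reading down the first column, the residuals go from strongly positive to strongly negative, which is the pattern behind answer 4e iii).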
5. c  $b_1 \pm t^* SE(b_1) = .75 \pm 1.6525(.11)$
6. a  $t = \frac{b_1}{SE(b_1)} = \frac{.75}{.11} = 6.82$
7. (chap 26)
$H_0$: teams are evenly matched
$H_a$: teams are not evenly matched
The table below shows the observed values in each cell and the expected cell values in parentheses if
the teams are evenly matched.
Length of series 4 games 5 games 6 games 7 games
NBA finals 7 (7.125) 13 (14.25) 22 (17.8125) 15 (17.8125)
Test statistic:
$\chi^2 = \frac{(7 - 7.125)^2}{7.125} + \frac{(13 - 14.25)^2}{14.25} + \frac{(22 - 17.8125)^2}{17.8125} + \frac{(15 - 17.8125)^2}{17.8125} = 1.54$
Rejection region: if the null hypothesis is true, the test statistic $\chi^2$ has a chi-square distribution with
df $= k - 1 = 4 - 1 = 3$. If $\alpha = .05$, the RR is $\chi^2 > 7.815$. Note that P-value $= 0.67$.
Conclusion: do not reject $H_0$. There is no evidence that the NBA championship series are inconsistent
with the conjecture that the teams are evenly matched.
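The goodness-of-fit statistic in solution 7 can be recomputed as follows. This is a sketch with illustrative names; the critical value 7.815 is the one quoted in the solution for $\alpha = .05$ and 3 df.

```python
observed = [7, 13, 22, 15]                  # series decided in 4, 5, 6, 7 games
null_probs = [0.125, 0.25, 0.3125, 0.3125]  # probabilities if teams are evenly matched

n = sum(observed)
expected = [n * p for p in null_probs]

# Chi-square goodness-of-fit statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL = 7.815  # chi-square critical value for alpha = .05, 3 df (from the solution)
reject_h0 = chi_sq > CRITICAL

print(f"expected = {expected}")
print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {reject_h0}")
```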
8. (chap 26)
Expected cell counts are in parentheses:

                                        Boss/Employee
Level of Satisfaction   Male/Female    Female/Male    Male/Male      Female/Female
Satisfied               60 (33.175)    15 (30.521)    50 (36.493)    15 (39.81)
Neutral                 27 (40.284)    45 (37.062)    48 (44.313)    50 (48.341)
Dissatisfied            13 (26.54)     32 (24.417)    12 (29.194)    55 (31.848)
$H_0$: Boss/employee relationship and job satisfaction are independent
$H_a$: Boss/employee relationship and job satisfaction are dependent
Test statistic: $\chi^2 = \sum_{i=1}^{12} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = 92.709$; degrees of freedom $(3 - 1)(4 - 1) = 6$;
P-value $= .000$. Conclusion: reject $H_0$ and conclude that boss/employee relationship and job
satisfaction are related.
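The test of independence in solution 8 follows the same recipe for any two-way table. The helper below is a hypothetical convenience function written for this illustration (its name and return shape are assumptions), applied to the observed counts from the problem.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for orow, erow in zip(table, expected)
                    for o, e in zip(orow, erow))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Rows: Satisfied, Neutral, Dissatisfied.
# Columns: Male/Female, Female/Male, Male/Male, Female/Female.
satisfaction = [[60, 15, 50, 15],
                [27, 45, 48, 50],
                [13, 32, 12, 55]]

stat, df, expected = chi_square_independence(satisfaction)
print(f"chi-square = {stat:.3f}, df = {df}")
```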
9. false 10. d
11. a. $b_1 = -0.024$ (approximately); this means that each year since 1975 the average difference
(husband age - wife age) has decreased by .024.
b. $t = \frac{b_1}{SE(b_1)} = \frac{-0.023956}{0.005503} = -4.3531$
c. iii. The P-value given in the Excel output is always for a 2-tail test; since we are conducting a 1-tail
test ($H_a: \beta_1 < 0$), the P-value is $\frac{0.000255}{2} = 0.0001275$.
d. $(-0.0353697, -0.01254335)$ from the output; notice that the interval is entirely negative.
e. use $\hat{y}_{1998} \pm t^*_{n-2}\, SE(\hat{\mu}_{1998})$;
for the calculations below, note that from the output we have $s_e = 0.18662649$;
$\hat{y}_{1998} = 49.90213043 - 0.023956522(1998) = 2.037$
$SE(\hat{\mu}_{1998}) = \sqrt{SE^2(b_1)(x - \bar{x})^2 + \frac{s_e^2}{n}} = \sqrt{(.005503315)^2 (1998 - 1986.5)^2 + \frac{(0.18662649)^2}{24}} = .07386889$
$t^*_{n-2}\, SE(\hat{\mu}_{1998}) = 2.074(.07386889) = .153204$
so $\hat{y}_{1998} \pm t^*_{n-2}\, SE(\hat{\mu}_{1998}) = 2.037 \pm .153204 = (1.883796, 2.190204)$
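f. use $\hat{y}_{1998} \pm t^*_{n-2}\, SE(\hat{y}_{1998})$;
for the calculations below, note that from the output we have $s_e = 0.18662649$;
$\hat{y}_{1998} = 49.90213043 - 0.023956522(1998) = 2.037$
$SE(\hat{y}_{1998}) = \sqrt{SE^2(b_1)(x - \bar{x})^2 + \frac{s_e^2}{n} + s_e^2} = \sqrt{(.005503315)^2 (1998 - 1986.5)^2 + \frac{(0.18662649)^2}{24} + (0.18662649)^2} = .200714$
$t^*_{n-2}\, SE(\hat{y}_{1998}) = 2.074(.200714) = .41628$
so $\hat{y}_{1998} \pm t^*_{n-2}\, SE(\hat{y}_{1998}) = 2.037 \pm .41628 = (1.62072, 2.45328)$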
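The two intervals in 11e and 11f differ only in the extra $s_e^2$ term under the square root. The sketch below recomputes both from the values quoted in the solution (intercept, slope, $SE(b_1)$, $s_e$, and the critical value 2.074); the variable names are illustrative.

```python
from math import sqrt

# Values quoted in the solution (read from the regression output).
b0 = 49.90213043
b1 = -0.023956522
se_b1 = 0.005503315
s_e = 0.18662649
n = 24
x_bar = 1986.5
T_STAR = 2.074  # t critical value for 95% confidence with n - 2 = 22 df

x_new = 1998
y_hat = b0 + b1 * x_new

# Standard error for the mean response at x_new.
se_mean = sqrt(se_b1**2 * (x_new - x_bar) ** 2 + s_e**2 / n)
# Standard error for an individual response adds the error variance s_e^2.
se_pred = sqrt(se_mean**2 + s_e**2)

ci_mean = (y_hat - T_STAR * se_mean, y_hat + T_STAR * se_mean)
pi_indiv = (y_hat - T_STAR * se_pred, y_hat + T_STAR * se_pred)

print(f"y_hat = {y_hat:.3f}")
print(f"95% CI for the mean:   ({ci_mean[0]:.4f}, {ci_mean[1]:.4f})")
print(f"95% prediction interval: ({pi_indiv[0]:.4f}, {pi_indiv[1]:.4f})")
```

The prediction interval is much wider than the confidence interval because it must cover the variation of a single couple around the mean, not just the uncertainty in the fitted line.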
12. a. $b_1 = r \frac{s_y}{s_x} = .971 \cdot \frac{17.19}{16.7} = .99949$; $b_0 = \bar{y} - b_1 \bar{x} = 61.81 - .99949(67.44) = -5.59358$
b. $\hat{y}_{97} = -5.59358 + .99949(97) = 91.36$
*(
; c. = %#(%
/
WWI #&&(%)
8# "%

; d. WI, !''"
"
=
8" = "&"'(
%#(% /
B

=
confidence interval is , > WI, ***%* #"%&!''" ***%* "%"()
" "

8#
)&((" ""%"#(
e. $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \ne 0$
test statistic $t = \frac{b_1}{SE(b_1)} = \frac{.99949}{.0661} = 15.12$
for $\alpha = .05$ the rejection region is $t < -2.145$ and $t > 2.145$ (df $= n - 2 = 16 - 2 = 14$);
the P-value is 0 to nine decimal places.
Conclusion: since the test statistic is in the rejection region, reject $H_0: \beta_1 = 0$ and conclude that year
is useful for predicting the percentage of respondents that will answer yes to the question.
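Parts a through e of problem 12 can also be worked directly from the raw data rather than from the rounded summary statistics. The sketch below does this with the standard library only; results agree with the solution to within rounding (for example, the slope comes out near 0.9994 rather than .99949, and the intercept near -5.58). Variable names are illustrative.

```python
from math import sqrt
from statistics import mean

year = [99, 87, 84, 83, 78, 75, 71, 69, 67, 63, 59, 58, 55, 49, 45, 37]
pct_yes = [92, 82, 78, 80, 76, 73, 66, 53, 57, 55, 57, 54, 52, 48, 33, 33]

n = len(year)
x_bar, y_bar = mean(year), mean(pct_yes)

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in year)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(year, pct_yes))

b1 = sxy / sxx             # least squares slope
b0 = y_bar - b1 * x_bar    # least squares intercept

# Residual sum of squares and the estimate s_e of the error standard deviation.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(year, pct_yes))
s_e = sqrt(sse / (n - 2))

# sqrt(sxx) equals sqrt(n - 1) * s_x, so this matches the SE(b1) formula in part d.
se_b1 = s_e / sqrt(sxx)
t_stat = b1 / se_b1

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"SSE = {sse:.3f}, s_e = {s_e:.3f}")
print(f"SE(b1) = {se_b1:.4f}, t = {t_stat:.2f}")
print(f"predicted % yes for year 97: {b0 + b1 * 97:.2f}")
```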
f. In the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, the error terms $\varepsilon_i$ are assumed to be independent and normally
distributed with mean 0 and standard deviation $\sigma$ for all $x$, that is, $\varepsilon_i$ iid $N(0, \sigma)$ for all $x_i$.
Below is the least squares line on the scatterplot, a plot of the residuals against the $x$-values, and a
histogram of the residuals. The first 2 plots cast some doubt on the assumption of independent errors
since the residuals appear to occur in groups of positive and negative residuals. The histogram casts
some doubt on the normality of the errors since the residuals show some left-skewness.
[Figure: scatterplot "% Yes vs Year" with least squares line $y = 0.9994x - 5.5827$, $R^2 = 0.9423$]
[Figure: "Residual Plot", residuals vs. Year]