e) A manufacturer of toys claims that less than 3% of his toys are defective. When 100
toys were drawn from one production run of 5,000 toys, 5% were found to be defective.
For each term on the left, select the matching answer from the list to the right, and write
the number in the blank.
___ Population
1 The 3% value
___ Sample
2 The 5% value
___ Parameter
___ Statistic
MT2013: Question 2
                                  Total
Male       55      99     196      350
Female     87     150     113      350
Total     142     249     309      700
a) Which of the following charts would be appropriate for displaying the marginal
distribution of cell phone brand?
__A. Histogram
__B. Boxplot
__C. Bar Chart
__D. Line Graph
__E. Stem and Leaf Display
b) What percent of teenagers preferred Call Me Maybe?
__A. 50%
__B. 41%
__C. 25%
__D. 16%
__E. 20%
MT2013: Question 3
a) You have a set of 30 numbers. The standard deviation from these numbers is reported
as zero. You can be certain that:
__A. Half of the numbers are above the mean
__B. All of the numbers in the set are zero
__C. All of the numbers in the set are equal
__D. The numbers are evenly spaced below and above the mean
b) Here is the five-number summary of the hourly wages ($) for sales managers.
Min      Q1       Median   Q3       Max
20.94    37.64    44.77    49.24    67.11
(i) The shape of this distribution is best described as:
__A. Symmetric
__B. Skewed to the right
__C. Skewed to the left
__D. Not enough information to tell
(ii) The IQR for these data is: ______________
(iii) Compute the lower and upper inner fences:
Decrease
Increase
b. Median
Decrease
Increase
c. Range
Decrease
Increase
d. IQR
Decrease
Increase
_____
_____
e) An office supply chain has stores in Toronto and Vancouver. One of these stores is to
be closed within the coming year, and to help make the decision, management reviews
sales data. Below are boxplots for monthly unit sales for both locations.
MT2013: Question 4
a) A consumer research group investigating the relationship between the price of meat
(per kg) and the fat content (grams) gathered data that produced the following scatterplot.
(i) Which best describes the association between the price of meat and fat content?
__A. Negative, moderately strong
__B. Negative, weak
__C. Positive, strong
__D. Positive, weak
__E. No apparent association
(ii) If the point in the lower left-hand corner ($2.00 per kilogram, 6 grams of fat) is
removed, the correlation would most likely
__A. remain the same
__B. become stronger negative
__C. become weaker negative
__D. become positive
__E. become zero
b) For each of the following pairs of variables, would you expect a large negative
correlation, a large positive correlation, or a small correlation? Circle your choices.
1. The age of a used car and its price
Large Neg.
Large Pos.
Small
Large Neg.
Large Pos.
Small
Large Neg.
Large Pos.
Small
c) For each of the following statements, about the correlation coefficient, r, decide
whether it is True or False. Circle your choices as appropriate.
1. r equals the proportion of times two variables lie on a straight line.    True    False
2. r will be +1.0 only if all the data lie exactly on a horizontal straight line.    True    False
3. r measures the fraction of outliers that appear in a scatterplot.    True    False
4. If the correlation between X and Y is r, the correlation between Y and X is r.    True    False
5. r is a unitless number and must always lie between -1.0 and +1.0 inclusive.    True    False
MT2013: Question 5
_____________________________
(Use two decimals only for each value)
c) Complete this sentence: For each additional unit on the stress scale, the productivity
level _________________________________________ parts per hour.
d) What percentage of the variation in productivity levels can be explained by
the stress level variable? Give your answer here, to the nearest whole percent: _________
__________
f) Suppose the employee in part e) has an actual productivity level of 60 parts per hour.
Compute the residual and use the fact that the standard deviation of the residuals is 4.3 to
decide whether this data point would be considered an outlier. Explain why in one
sentence only.
Residual = ________
Outlier? Yes
No
Explanation:
h) Give an interval range in which the productivity level of 95% of employees would be
expected to fall. Report to the nearest whole numbers. ____________ to ____________
MT2013: Question 6
SD = ______________
(ii) What is the probability that the solving time is between 15 and 25 minutes?
__A. 0.38
__B. 0.17
__C. 0.68
__D. 0.06
__E. 0.12
__F. 0.50
c) A soft drink machine dispenses a cup, syrup and carbonated water, hopefully in that
order! The amount of syrup injected is normally distributed with mean 15 ml and
variance 10 ml². The amount of water injected is normally distributed with mean 80 ml
and variance 15 ml². The two amounts are independent of one another.
(i) Find the mean and standard deviation of the total amount of syrup and water
dispensed.
Mean = ____________
SD = ______________
(ii) If 25 drinks are dispensed in a day, what are the mean and standard deviation of the
total amount of liquid (syrup and water) that are required?
Mean = ____________
SD = ______________
d) Suppose the time it takes for a purchasing agent to complete an online ordering
process is normally distributed with a mean of 8 minutes and a standard deviation of 2
minutes. Suppose a random sample of 25 ordering processes is selected.
(i) The standard deviation of the sampling distribution of mean times is
__A. 0.4 minutes
__B. 2 minutes
__C. 0.08 minutes
__D. 1.6 minutes
__E. 0.12 minutes
(ii) What is the probability that the sample mean will be less than 7.5 minutes?
__A. 0.3944
__B. 0.1056
__C. 0.2114
__D. 0.4013
__E. 0.8944
e) The mean height of male UBC students is 70 inches, with SD 3 inches. The mean
height of female UBC students is 65 inches, with SD 4 inches. You measure the heights
of random samples of 100 males and 100 females. Which result is the most unlikely? To
decide, compute the z-score for each result and write the values in the spaces provided.
__A. One randomly chosen male having a height of 79 inches or more
__B. One randomly chosen female having a height of 74 inches or more
__C. All females in your sample having an average height of 68 inches or more
__D. All males in your sample having an average height of 73 inches or more
z-score for A = _______
MT2013: Question 7
a) EU (European Union) countries report that 46% of their labour force is female. Is the
percentage of females in the Canadian labour force the same? Statscan plans to check a
random sample selected from more than 10,000 employment records on file to estimate
the percentage of females in the Canadian labour force.
(i) Statscan wants to estimate the percentage of females in the Canadian labour force to
within 5% with 90% confidence. How many employment records should be sampled?
__A. 121
__B. 269
__C. 451
__D. 382
__E. 1000
(ii) Suppose that Statscan wants to be 90% confident of estimating the percentage of
females in the labour force to within 2% of the true percentage. Which of the following
would they have to do?
__A. Decrease the sample size
__B. Select the same number of employment records
__C. Increase the sample size
__D. Decrease the precision
__E. Increase the sampling error
(iii) They actually select a random sample of 525 employment records, and find that 229
of the people are females. The 90% confidence interval is closest to:
__A. 40.1% to 47.2%
__B. 27.5% to 59.7%
__C. 17.8% to 69.4%
__D. 42.4% to 56.8%
__E. 12.4% to 71.0%
b) For each of the following statements about a 95% confidence interval (CI) for the
mean, decide whether it is True or False. Circle your answers at the right.
1. Results from 95% of all samples will lie in this interval.
True
False
True
False
True
False
True
False
True
False
True
False
MT2013: Question 8
Hypothetically speaking
Suppose that a report indicates that 28% of Canadians have experienced difficulty in
making mortgage payments. Further suppose that a news organization randomly sampled
400 Canadians from 10 cities and found that 136 reported such difficulty. Does this
indicate that the problem is more severe among these cities?
a) The correct null and alternative hypotheses are
__A. H0 : p = 0.28 and Ha : p > 0.28
__B. H0 : p = 0.28 and Ha : p < 0.28
__C. H0 : p = 0.28 and Ha : p ≠ 0.28
__D. H0 : p ≠ 0.28 and Ha : p = 0.28
__E. H0 : p > 0.28 and Ha : p = 0.28
b) The correct value of the test statistic is:
__A. 1.28
__B. -2.67
__C. 2.67
__D. 1.96
__E. 1.28
MT2013: Question 9
Insurance companies track life expectancy information to assist in determining the cost of
life insurance policies. Last year the average life expectancy of all policyholders was 77
years. ABI Insurance wants to determine if their clients now have a longer life
expectancy, on average, so they randomly sample some of their recently paid policies.
The insurance company will only change their premium structure if there is evidence that
people who buy their policies are living longer than before. The sample has a mean of
78.6 years and a standard deviation of 4.48 years.
86 75 83 84 81 77 78 79 79 81
76 85 70 76 79 81 73 74 72 83
a) The appropriate null and alternative hypotheses are:
H0: _________________
Ha: _________________
b) Give the formula for the appropriate test statistic and compute its value.
Formula: ________________
e) Suppose ABI randomly samples 100 recently paid policies. This sample yields a mean
of 77.7 years and a standard deviation of 3.6 years. Compute a 95% confidence interval.
Report it in the format [xx.x , xx.x] with one decimal place. [_________ , _________]
MT2013: Answer 2
a) C. b) E. c) A.
d) A. e) B.
MT2013: Answer 3
a) C.
b) (i) C. (ii) 11.6 (iii) Lower inner fence = 20.24 Upper inner fence = 66.64
(iv) B. (v) Decrease, Stay the same, Increase, Stay the same
c) D. d) 15, 189, 138 e) E.
Details and Comments:
a) Look at the formula for standard deviation. If all numbers are equal, then they are also
all equal to the mean, so all the deviations are zero. This is the only way the standard
deviation can be zero.
b) (i) The median is closer to Q3 than to Q1 so the distribution is skewed to the left.
(ii) IQR = Q3 - Q1 = 49.24 - 37.64 = 11.6
(iii) Lower inner fence = 37.64 - 1.5 × 11.6 = 20.24; Upper inner fence = 49.24 + 1.5 × 11.6 = 66.64
(iv) Yes, only on the right side of the distribution since the maximum exceeds 66.64.
(v) Decreasing the lowest data value decreases the sum, and hence the mean. But it
doesn't really affect which is the middle value or the quartiles. The range increases.
c) Quartiles divide the area of the distribution into four equal sections.
d) (i) Count up the number of data values. Don't forget to attach the leaf to the stem for
the maximum and median.
e) Monthly sales are more variable in Vancouver compared to Toronto since the box is
taller.
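The fence arithmetic in part b) can be checked with a short Python sketch. The function name `inner_fences` is illustrative only (it is not part of the exam), and the 1.5 × IQR multiplier is the one used in the answer above.

```python
def inner_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence) using the k * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr


# Five-number summary of hourly wages from Question 3 b)
minimum, q1, median, q3, maximum = 20.94, 37.64, 44.77, 49.24, 67.11

iqr, lower, upper = inner_fences(q1, q3)
print(round(iqr, 2), round(lower, 2), round(upper, 2))

# A minimum or maximum beyond a fence means outliers exist on that side.
print("outliers on the low side:", minimum < lower)
print("outliers on the high side:", maximum > upper)
```

Only the maximum falls outside its fence, which agrees with the comment in part (iv).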
MT2013: Answer 4
a) (i) A. Negative, moderately strong
b) Large Neg.; Large Pos.; Small
MT2013: Answer 5
a) ŷ = 74.73 - 3.19x b) -0.95 c) decreases by 3.19
d) 90% e) 49
f) Residual = 11; Yes, it is an outlier since the residual is more than 2.5 SDs away from 0.
g) 57.5 h) 35 to 80
Details and Comments:
a) b0 = ȳ - b1·x̄ = 57.5 - (-3.19)(5.4) = 74.73
b) Rearrange the formula for the slope, b1 = r(s_y/s_x), to get
r = b1(s_x/s_y) = (-3.19)(3.3/11.1) = -0.95
c) Interpretation of slope.
d) r² = (-0.95)² = 0.90 or 90%
e) ŷ(8) = 74.73 - 3.19(8) = 49.21 (Round to 49)
f) Residual = y - ŷ = 60 - 49 = 11; remember the 68-95-99.7 Rule for identifying
outliers/unusual observations.
g) Since x is unknown, just use the mean of y.
h) Use the 68-95-99.7 Rule, i.e. 57.5 ± 2(11.1) = 35.3, 79.7
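As a rough check of the chain of calculations in Answer 5, the following Python sketch recomputes each part from the summary values quoted above. The variable names are illustrative, and the stress level of 8 is taken from the working for part e).

```python
# Summary values quoted in Answer 5 (x = stress level, y = productivity)
slope = -3.19
x_mean, y_mean = 5.4, 57.5
s_x, s_y = 3.3, 11.1
resid_sd = 4.3                      # SD of the residuals, given in part f)

intercept = y_mean - slope * x_mean     # a) b0 = y-bar - b1 * x-bar
r = slope * s_x / s_y                   # b) rearranged from b1 = r * s_y / s_x
r_squared = r ** 2                      # d) share of variation explained
predicted = intercept + slope * 8       # e) prediction at stress level 8
residual = 60 - round(predicted)        # f) observed minus (rounded) predicted
z_resid = residual / resid_sd           # residual measured in residual SDs

print(round(intercept, 2), round(r, 2), round(r_squared, 2))
print(round(predicted), residual, round(z_resid, 2))
```

The residual lands a little more than 2.5 residual SDs from zero, which is the basis for calling the point an outlier.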
MT2013: Answer 6
a) A. I and II only
b) (i) Mean = 20; SD = 10
(ii) A.
c) (i) Mean = 95 ; SD = 5
(ii) Mean = 2375; SD = 25
d) (i) A. 0.4 minutes
(ii) B. 0.1056
e) D. z-scores: 3, 2.25, 7.5, 10;
D has the highest z-score and therefore is the most unlikely.
Details and Comments:
a) First-year students' ages will vary only slightly since most are within a year or two in
age. There might be some older students, i.e. those returning to school etc., but it is
highly unlikely to have students who are much younger than 18 or 19!
b) (i) Computations: Pr(Z > z) = 0.5 => z = 0, so X = μ + zσ => 20 = μ + 0·σ => μ = 20
Pr(Z > z) = 0.1587 => z = 1, so X = μ + zσ => 30 = 20 + 1·σ => σ = 10
(ii) Computations: Pr(15 < X < 25) = Pr([15-20]/10 < Z < [25-20]/10)
= Pr(-0.5 < Z < 0.5) = 1 - 2(0.3085) = 0.383
c) (i) Computations: E(X+Y) = E(X) + E(Y) = 15 + 80 = 95;
Var(X+Y) = Var(X) + Var(Y) (since indep.) = 10 + 15 = 25, so SD = √25 = 5
(ii) Computations: E(T) = 25(95) = 2375; Var(T) = 25(25) = 625; SD = √625 = 25
d) (i) σ/√n = 2/√25 = 0.4
(ii) Pr(x̄ < 7.5) = Pr(Z < [7.5-8]/0.4) = Pr(Z < -1.25) = 0.1056
e) Computations:
z-score for A = [79-70]/3 = 3
z-score for B = [74-65]/4 = 2.25
z-score for C = [68-65]/[4/√100] = 7.5
z-score for D = [73-70]/[3/√100] = 10
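The computations for parts b) to e) can be reproduced with Python's standard-library `NormalDist`. This is only a sketch for checking the table lookups; the variable names are illustrative and not part of the exam.

```python
from math import sqrt
from statistics import NormalDist

# b)(ii) Solving time ~ Normal(mean 20, SD 10)
solve = NormalDist(20, 10)
p_between = solve.cdf(25) - solve.cdf(15)

# c) Syrup and water are independent, so means add and variances add.
total_mean = 15 + 80
total_sd = sqrt(10 + 15)
day_mean = 25 * total_mean             # 25 independent drinks
day_sd = sqrt(25 * (10 + 15))

# d) Sample mean of n = 25 ordering times from Normal(8, 2)
se = 2 / sqrt(25)
p_below = NormalDist(8, se).cdf(7.5)

# e) z-scores: single observations (A, B) versus means of n = 100 (C, D)
z = {
    "A": (79 - 70) / 3,
    "B": (74 - 65) / 4,
    "C": (68 - 65) / (4 / sqrt(100)),
    "D": (73 - 70) / (3 / sqrt(100)),
}
most_unlikely = max(z, key=z.get)

print(round(p_between, 3), total_mean, total_sd, day_mean, day_sd)
print(round(se, 2), round(p_below, 4), most_unlikely)
```

Dividing the SD by √100 for the sample means is what pushes options C and D so far out compared with the single-observation cases.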
MT2013: Answer 7
a) (i) B. 269 (ii) C. Increase the sample size
(iii) A. [ 40.1% , 47.2% ]
b) 1. False; 2. False; 3. True; 4. False; 5. False; 6. True
Details and Comments:
a) (i) n = (1.645²)(0.46)(0.54)/(0.05²) = 269
(ii) Look at the formula for the CI. The sample size is in the denominator of the margin of
error, so increasing the sample size decreases the margin of error.
(iii) p̂ = 229/525 = 0.4362;
90% CI: 0.4362 ± 1.645 √(0.4362 × 0.5638/525)
= 0.4362 ± 0.0356 or [0.4006, 0.4718]
b) 1. The interval changes from sample to sample
2. Population parameters don't vary; sample statistics vary
3. Higher confidence requires wider intervals
4. CIs are not about individual data values; they are about estimates
5. All CIs for mean include the sample mean; only 95% include the population mean
6. Definition of a CI
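For part a), the sample-size and confidence-interval formulas can be sketched in Python as follows. The helper names (`z_star`, `sample_size`, `proportion_ci`) are hypothetical, and the critical value comes from `NormalDist.inv_cdf` rather than a printed table, so it is slightly more precise than 1.645.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_star(confidence):
    """Two-sided critical value, e.g. about 1.645 for 90% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size(p, margin, confidence):
    """Smallest n whose margin of error is at most `margin`."""
    return ceil(z_star(confidence) ** 2 * p * (1 - p) / margin ** 2)


def proportion_ci(successes, n, confidence):
    """Large-sample z interval for a single proportion."""
    p_hat = successes / n
    me = z_star(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


n_5pct = sample_size(0.46, 0.05, 0.90)   # part (i)
n_2pct = sample_size(0.46, 0.02, 0.90)   # part (ii): tighter margin
low, high = proportion_ci(229, 525, 0.90)  # part (iii)

print(n_5pct, n_2pct, round(low, 4), round(high, 4))
```

Tightening the margin from 5% to 2% raises the required sample size, which is why (ii) is answered "Increase the sample size".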
MT2013: Answer 8
a) A. H0: p = 0.28 and Ha: p > 0.28
b) C. 2.67
c) E. 0.0038 d) A. e) C.
f) B. 2000
= 1.597
A sole practitioner
ASW, a regional shoe chain, has recently launched an online store. Sales via the Internet
have been sluggish compared to their brick and mortar stores, and management suspects
that its regular customers have concerns regarding the security of online transactions. To
determine if this is the case, they plan to survey a sample of their regular customers.
a) Suppose that ASW's regular customers belong to a rewards program and have a
customer rewards ID number. ASW decides to randomly select 100 numbers. This
sampling plan is called:
__A. Simple Random Sampling
__B. Stratified Sampling
__C. Cluster Sampling
__D. Systematic Sampling
__E. Convenience Sampling
b) Suppose that ASW has an alphabetized list of regular customers who belong to their
rewards program. After randomly selecting a customer on the list, every 25th customer
from that point on is chosen to be in the sample. This sampling plan is called:
__A. Simple Random Sampling
__B. Stratified Sampling
__C. Cluster Sampling
__D. Systematic Sampling
__E. Convenience Sampling
c) All regular ASW customers is known as the ________ of the study.
__A. Parameter
__B. Statistic
__C. Target Population
__D. Sampling Frame
__E. Sample
d) Which of the following is the parameter of interest in the ASW study?
__A. All regular ASW customers
__B. % of regular ASW customers who have concerns about online security
__C. ASW customers who belong to the rewards program
__D. % of ASW customers who belong to the rewards program but don't shop online
__E. None of the above
e) One member of the management team at ASW suggests that their survey could be
done online. Customers logging on to the online store would be asked to complete the
survey and offered a coupon as incentive to participate. Which statement is true?
__A. This is a voluntary response sample
__B. This would result in an unbiased random sample
__C. This would result in a biased sample
__D. Both A and B
__E. Both A and C
Planning ahead
MT2012: Question 2
A brokerage firm gathered information on how their clients were investing for retirement.
Here is a small sample of the data they collected.
Respondent Number    Age    Gender    Household Income
1001                 45     Male      $155,000
1002                 53     Female    $160,000
1003                 58     Female    $210,000
a) Place an X in the space beside each variable that is best described as Quantitative.
__ Respondent Number
__ Age
__ Gender
__ Household Income
__ Self-directed RRSP
__ Book value of portfolio
Based on age, clients were categorized according to where the largest percentage of their
retirement portfolio was invested and shown in the table below.
                Age 50 or Younger    Over Age 50    Total
Mutual Funds           30                34           64
Stocks                 37                45           82
Bonds                  19                23           42
Total                  86               102          188
b) The percentage of clients who are over age 50 and invest in mutual funds is:
__A. 53.1% __B. 33.3% __C. 18.1% __D. 34.0% __E. 54.3%
c) Of the clients over age 50, the percentage who invest in mutual funds is:
__A. 53.1% __B. 33.3% __C. 18.1% __D. 34.0% __E. 54.3%
d) Of the clients who invest in mutual funds, the percentage over age 50 is:
__A. 53.1% __B. 33.3% __C. 18.1% __D. 34.0% __E. 54.3%
e) The percentage of clients over age 50 is:
__A. 53.1% __B. 33.3% __C. 18.1% __D. 34.0% __E. 54.3%
f) Consider the following side-by-side bar chart for the data below:
Does the chart indicate that mode of
investment is independent of age?
Yes
No
MT2012: Question 3
Here is a histogram and the five-number summary for salaries (in $) for a sample of 48
marketing managers.
Min      Q1       Median   Q3       Max
46360    69693    77020    91750    129420
MT2012: Question 4
To determine whether the cash bonus paid by a company is related to annual pay, data
were gathered for 10 account executives at Outstanding Management Group (OMG) who
received cash bonuses in 2007. The data and summary statistics are shown below.
                      ANNUAL PAY    CASH BONUS
                      $ 70,609      $ 11,225
                      $ 58,487      $ 6,238
                      $ 104,561     $ 14,194
                      $ 43,922      $ 4,188
                      $ 82,613      $ 11,863
                      $ 116,250     $ 13,671
                      $ 76,751      $ 7,758
                      $ 68,513      $ 20,760
                      $ 137,000     $ 55,000
                      $ 94,469      $ 34,368
Mean                  $ 85,318      $ 17,927
Standard Deviation    $ 28,077      $ 15,618
Correlation           0.735
b) What would the correlation be if the Dollars were converted to Euros at the current
conversion rate of (1 Canadian Dollar = 0.76 Euros)?
c) Estimate the linear regression model that relates the response variable (cash bonus) to
the predictor variable (annual pay).
Slope of the regression line:
___________________________
d) From the equation, in part c), estimate the cash bonus for an executive at OMG earning
$82,613 a year, and compute the residual for this estimate.
Estimated cash bonus: ___________
Residual: ____________
e) Would you be confident in using your regression equation to estimate the cash bonus
for an executive at OMG earning $200,000 a year?
Yes
No
Reason:
f) Below is a plot showing residuals versus fitted values for the estimated regression
equation relating cash bonus to pay for the account executives at OMG.
Circle the conditions for linear regression which are violated, if any.
None are violated
Linearity
Normality
Constant Variance (Equal spread)
Independence
Parts (g) through (i) are unrelated to parts (a) through (f):
g) In commenting on the increase in home foreclosures (i.e. banks repossessing homes), a
news reporter stated there appears to be a strong correlation between home foreclosures
and job loss of the head of household. Comment on this statement; use one sentence
only.
h) A research study investigated the relationship between number of hours individuals
spend on the Internet and age. Which is the predictor variable? Circle your choice.
Hours on Internet
Age
1.00
-1.00
0.50
-0.50
0.00
MT2012: Question 5
The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures
academic motivation and study habits. Females score higher, on average, than males. The
distribution of SSHA scores among the female students at a university has mean 120 and
standard deviation 28; the distribution among male students has mean 105 and standard
deviation 35. Scores are normally distributed. Assume also that scores are independent.
a) What percentage of female students have SSHA scores greater than 162? Report your
percentage to one decimal place only.
b) What SSHA score is exceeded by only 10% of female students? Round your answer
to the nearest whole number.
c) Compute the lower and upper quartiles for the distribution of scores of female students.
Round your answers to the nearest whole numbers.
d) Suppose you select a single female student and a single male student at random and
give them the SSHA test. What are the mean and the standard deviation of the difference
(female minus male) between their scores. Report to one decimal place.
Mean = __________
e) Using your answers from part d), compute the probability that the chosen female has a
higher score than the chosen male.
f) Suppose Angelina (a female) scores 78 on the SSHA, while Brad (a male) scores 70 on
the SSHA. Use an appropriate calculation to determine who did worse compared to the
average for their gender. Circle the name of the person who did worse.
Angelina
Brad
Explanation:
MT2012: Question 6
A convenient truth
Part I. A convenience store owner suspects that only 10% of the customers buy
magazines and thinks that he might be able to sell something more profitable. In order to
decide whether he should stop selling them, he tracks the number of customers who buy
magazines on a given day.
a) On that day he had 300 customers. Assuming it was a typical day and that his estimate
is correct, what are the mean and standard deviation of the number of customers who buy
magazines each day? Report your answers to one decimal place.
Mean = ___________
b) What is the probability that 25 to 35 customers (inclusive) bought magazines that day?
c) How many magazine sales would you consider to be very strong evidence that his 10%
estimate was too low. That is, what number of sales would be extremely unusually high?
Hints: Use The Empirical (68-95-99.7) Rule. Remember to give a whole number answer.
Part II. Past records indicate that the magazines he sells on any day have an average
revenue of $150 with a standard deviation of $30. Suppose he takes a random sample of
36 past days' sales receipts and records the dollar value of magazine sales.
a) Describe the sampling distribution for the sample mean by naming the model and
telling its mean and standard deviation.
b) Suppose the resulting sample mean is $130. Do you think that this sample result is
unusually small? Explain.
MT2012: Question 7
One division of a telecommunications equipment company reports that 12% of non-electrical components are reworked. Management wants to determine if this percentage is
the same as the percentage rework for electrical components manufactured by the
company. The Quality Control Department plans to check a random sample of the over
10,000 electrical components manufactured across all divisions.
a) The Quality Control Department wants to estimate the true percentage of rework for
electrical components to within 4%, with 99% confidence. How many components
should they sample?
__A. 651
__B. 1000
__C. 344
__D. 438
__E. 579
b) They actually select a random sample of 450 electrical components and find that 46 of
those had to be reworked. The 99% confidence interval is closest to:
__A. [ 0.0654 , 0.1390 ]
__B. [ 0.0432 , 0.1608 ]
__C. [ 0.0763 , 0.1277 ]
__D. [ 0.0541 , 0.1499 ]
__E. Cannot be determined with the given information.
c) The 95% confidence interval based on these data is 0.0742 to 0.1302. Which one of
the following is the correct interpretation?
__A. The percentage of electronic components that are reworked is
between 7.4% and 13.0%.
__B. We are 95% confident that between 7.4% and 13.0% of electrical
components are reworked.
__C. The margin of error for the true percentage of electrical components
that are reworked is between 7.4% and 13.0%.
__D. All samples of size 450 will yield a percentage of reworked electrical
components that falls within 7.4% and 13.0%.
__E. There is a 95% chance that 7.4% to 13.0% of the electrical components
have to be reworked.
d) Based on the 95% confidence interval, should the Quality Control Department
conclude that the percentage of rework for the electrical components is lower than the
rate of 12% for non-electrical components?
__A. Yes, because the lower limit of the confidence interval is 7.4%.
__B. Yes, because 12% is contained within the 95% confidence interval.
__C. No, because 12% is contained within the 95% confidence interval.
__D. No, because the upper limit of the confidence interval is 13.0%.
__E. We cannot say since the sample size is not large enough.
e) All else being equal, increasing the level of confidence desired will...:
__A. ...tighten the confidence interval
__B. ...decrease the margin of error
__C. ...increase precision
__D. ...increase the margin of error
__E. ...increase the margin of error and tighten the confidence interval
MT2012: Question 8
A dip in chips
A company manufacturing computer chips finds that 8% of all chips manufactured are
defective. Management is concerned that high employee turnover is partially responsible
for the high defect rate. In an effort to decrease the percentage of defective chips,
management decides to provide additional training to those employees hired within the
last year. After training was implemented, a sample of 450 chips revealed only 27 with
defects. Was the additional training effective in lowering the defect rate?
a) The appropriate null and alternative hypotheses are:
H0: ______________
Ha: ______________
b) Give the formula for the appropriate test statistic and compute its value.
c) Assume that the value of the test statistic is -1.4. Don't use your computed value from
part b). The P-value associated with the given test statistic is closest to:
__A. 0.0404
__B. 0.05
__C. 0.0808
__D. 0.1616
__E. 0.9192
d) From the P-value in part c), and using a 1% significance level (i.e. α = .01), which of
the following is true?
__A. Conclude that additional training significantly lowered the defect rate.
__B. Conclude that additional training did not significantly lower the defect rate.
__C. Conclude that additional training significantly increased the defect rate.
__D. Conclude that additional training did not affect the defect rate.
__E. No conclusion can be made with the given information.
MT2012: Question 9
A large software development firm recently relocated its facilities. Top management has
encouraged their professional employees to engage in local service activities. They
believe that the firm's professionals volunteer an average of more than 15 hours per
month. If this is not the case, they will institute an incentive program to increase it. A
random sample of 24 professionals reported the following number of hours:
12 13 14 14 15 15 15 16 16 16 16 16
17 17 17 18 18 18 19 19 19 20 20 22
The sample has a mean of 16.75 hours and a standard deviation of 2.40 hours.
a) The correct null and alternative hypotheses are:
__A. H0 : = 15 and Ha : > 15
__B. H0 : μ = 15 and Ha : μ > 15
__C. H0 : μ = 15 and Ha : μ < 15
__D. H0 : μ ≠ 15 and Ha : μ = 15
__E. H0 : μ = 15 and Ha : μ ≠ 15
b) The correct value of the test statistic is closest to:
__A. 3.572
__B. -3.572
__C. 1.327
__D. -1.327
__E. 0.729
c) Which of the following conclusions is correct?
__A. We reject the alternative hypothesis at the 5% significance level.
__B. We fail to reject the null hypothesis at the 5% significance level.
__C. An incentive program is needed since the evidence indicates professional
employees volunteer an average of no more than 15 hours per month.
__D. We reject the null hypothesis; the firm shouldn't need to institute an
incentive program since the evidence indicates that professional
employees volunteer an average of more than 15 hours per month.
__E. No conclusion can be reached about the hypothesis with the information
that is given.
d) It is appropriate to test the mean because:
__ A. The data are a simple random sample from the population of interest
__ B. The distribution of the sample data appears to be approximately normal
__ C. Volunteer hours is likely to be independent across employees
__ D. All of the above
e) A 95% confidence interval for the true mean number of hours of volunteer time is
closest to:
__A. 16.75 ± 1.016
__B. 16.75 ± 0.840
__C. 16.75 ± 4.966
__D. 16.75 ± 4.114
__E. 2.40 ± 7.074
MT2012 END OF QUESTIONS; ANSWERS AND EXPLANATIONS FOLLOW
e) E.
MT2012: Answer 2
a) Age, Household Income, Book value of portfolio
b) C. 18.1% c) B. 33.3% d) A. 53.1% e) E. 54.3%
f) Yes: The age distribution (ratio of younger to older) is about the same for each mode
(i.e. type) of investment.
Details and Comments:
a) Age (yrs), Household Income ($), and Book Value ($) all have units and are measured
on a continuum, so they are quantitative.
b) 34/188 = 0.181
c) 34/102 = 0.333
d) 34/64 = 0.531
e) 102/188 = 0.543
f) Look for differences across the clusters of bars.
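Parts b) to e) differ only in which total goes in the denominator. A minimal Python sketch, using the counts from the Question 2 table (the variable names are illustrative), makes the distinction explicit:

```python
# Counts from the Question 2 table: rows are investment type,
# columns are (age 50 or younger, over age 50).
counts = {
    "Mutual Funds": (30, 34),
    "Stocks": (37, 45),
    "Bonds": (19, 23),
}

grand_total = sum(sum(row) for row in counts.values())
over_50_total = sum(row[1] for row in counts.values())
mutual_total = sum(counts["Mutual Funds"])
cell = counts["Mutual Funds"][1]          # over 50 AND mutual funds

joint = cell / grand_total                # b) share of all clients
given_over_50 = cell / over_50_total      # c) share of clients over 50
given_mutual = cell / mutual_total        # d) share of mutual-fund clients
marginal = over_50_total / grand_total    # e) share of clients over 50 overall

for label, value in [("b", joint), ("c", given_over_50),
                     ("d", given_mutual), ("e", marginal)]:
    print(label, f"{100 * value:.1f}%")
```

The numerator is the same cell count (34) in b), c) and d); only the conditioning group changes.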
MT2012: Answer 3
a) C. Skewed to the right
b) A. Mode < Median < Mean
c) B. $ 13,843
d) B. $22,057
e) Lower inner fence = $36,607.50; Upper inner fence = $124,835.50
f) B. g) A. h) B. i) D.
Details and Comments:
a) Long right-hand tail: more of the area is piled up to the left.
b) The mode is the peak and it is clearly to the left of the median value of 77020. The
median is less than the mean for a right-skewed distribution.
c) Use the rule of thumb: s ≈ Range/6
d) IQR = Q3 - Q1 = 91750 - 69693 = 22,057
e) Lower inner fence = 69,693 - 1.5 × 22,057 = $36,607.50
Upper inner fence = 91,750 + 1.5 × 22,057 = $124,835.50
f) The maximum is larger than the upper fence but the minimum is not smaller than the
lower fence.
g) The sum is increased so the mean is increased.
h) The median is the line in the interior of the box.
i) Variability is shown by the length of the box.
MT2012: Answer 4
a) r² = 0.735² = 0.5402 or 54%
b) Unchanged at 0.735
c) Slope: b1 = r(s_y/s_x) = 0.735(15,618/28,077) = 0.409;
Intercept: b0 = ȳ - b1·x̄ = 17,927 - (0.409)(85,318) = -16,968;
Regression line: ŷ = -16,968 + 0.409x
d) ŷ(82,613) = -16,968 + 0.409(82,613) = $16,821
Residual = 11,863 - 16,821 = -$4,958
e) No; a prediction at $200,000 requires extrapolation beyond the range of data.
f) Constant Variance (V-shape indicates violation of this assumption)
g) The two variables are categorical, not quantitative, so correlation is not appropriate.
h) Age
i) E. 0.00
Details and Comments:
a) This is the definition of r-squared.
b) The correlation coefficient has no units; it doesn't change if the measurement units
change.
c) Straightforward application of least squares regression line formulas.
d) Substitute the x-value into the regression equation to get the predicted y. The residual
is the observed y minus the predicted y.
h) Age precedes and therefore predicts Hours on Internet.
i) The best-fitting straight line is horizontal.
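As a cross-check on parts b) to d), the correlation and least-squares line can be recomputed from the ten (annual pay, cash bonus) pairs in Question 4. This Python sketch assumes the two columns pair up in the order listed; because it keeps the slope unrounded, its intercept differs slightly from the rounded value in the answer, while the prediction and residual agree to the nearest few dollars.

```python
from math import sqrt

pay = [70609, 58487, 104561, 43922, 82613,
       116250, 76751, 68513, 137000, 94469]
bonus = [11225, 6238, 14194, 4188, 11863,
         13671, 7758, 20760, 55000, 34368]


def fit(xs, ys):
    """Return (r, slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return sxy / sqrt(sxx * syy), slope, y_bar - slope * x_bar


r, slope, intercept = fit(pay, bonus)

# b) Converting both variables to Euros rescales them but leaves r unchanged.
r_euro, _, _ = fit([0.76 * x for x in pay], [0.76 * y for y in bonus])

# d) Prediction and residual for the executive earning $82,613
predicted = intercept + slope * 82613
residual = 11863 - predicted

print(round(r, 3), round(r_euro, 3), round(slope, 3))
print(round(predicted), round(residual))
```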
MT2012: Answer 5
a) 6.7%
b) 156
c) Q1 = 101; Q3 = 139
d) Mean = 15; SD = 44.8
e) 0.6293 or 0.63 or 63%
f) Angelina; Z-score for Angelina = -1.5; Z-score for Brad = -1.0;
Details and Comments:
a) Standardize the X-value; 162 is 1.5 SDs above the average. Find the area to the right of
1.5 on the Z-curve.
Pr(X > 162) = Pr(Z > [162 - 120]/28) = Pr(Z > 1.5) = 0.0668.
b) Find the value of Z that has an area of 10% to the right; then unstandardize.
z = 1.28; X = 120 + 1.28(28) = 155.8.
c) Find z-values that have an area of 25% to the right and to the left; then
unstandardize. Since the Z is symmetric, the z-value on the left is the negative of the z-value on the right.
Q1: z = -0.675; X = 120 + (-0.675)(28) = 101
Q3: z = 0.675; X = 120 + (0.675)(28) = 139
d) Mean = 120 - 105 = 15; SD = √(28² + 35²) = 44.8
e) Pr(F - M > 0) = Pr(Z > [0 - 15]/44.8) = Pr(Z > -0.33) = 0.6293 or 0.63 or 63%
f) Z-score for Angelina = (78 - 120)/28 = -1.5; Z-score for Brad = (70 - 105)/35 = -1.0;
Angelina did worse relative to the reference populations since her Z-score is more negative.
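The standardize/unstandardize steps in Answer 5 can also be checked with `statistics.NormalDist`. This is only a sketch; the object names `female` and `male` are illustrative. Because it does not round z to two decimals, the part e) probability comes out marginally different from the table-based 0.6293 but still rounds to 0.63.

```python
from statistics import NormalDist

female = NormalDist(mu=120, sigma=28)
male = NormalDist(mu=105, sigma=35)

pct_above_162 = 100 * (1 - female.cdf(162))            # a)
top_10_cutoff = female.inv_cdf(0.90)                   # b) exceeded by 10%
q1, q3 = female.inv_cdf(0.25), female.inv_cdf(0.75)    # c) quartiles

# d) Difference of independent normals: means subtract, variances add.
diff = female - male
p_female_higher = 1 - diff.cdf(0)                      # e)

# f) z-scores relative to each person's own group
z_angelina = (78 - female.mean) / female.stdev
z_brad = (70 - male.mean) / male.stdev

print(round(pct_above_162, 1), round(top_10_cutoff), round(q1), round(q3))
print(diff.mean, round(diff.stdev, 1), round(p_female_higher, 2))
print(z_angelina, z_brad)
```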
MT2012: Answer 6
Part I.
a) Mean = np = 300 × 0.10 = 30.0; SD = √(np(1 - p)) = √(300 × 0.10 × 0.90) = 5.2
b) Pr(25 ≤ X ≤ 35) = Pr([25 - 30]/5.2 < Z < [35 - 30]/5.2) = Pr(-0.96 < Z < 0.96)
= 1 - 2(0.1685) = 0.663.
c) From the Empirical Rule, 3 SDs above the mean is extremely unusual;
μ + 3σ = 30 + 3(5.2) = 45.6. Sales of 46 or more would be extremely unusual.
Part II.
a) Normal: Mean = 150 and SD = 30/√36 = 5
b) Pr(x̄ < 130) = Pr(Z < [130 - 150]/5) = Pr(Z < -4) < 0.0001
There is an extremely small probability of getting a sample mean this small.
Details and Comments:
Part I.
a) Use the mean and standard deviation of a count.
b) Use the normal sampling distribution of a count. (Note: Continuity correction was not
needed, but if you used it correctly you would get an answer of 0.711.)
Part II.
a) Use the mean and standard deviation of a mean. (Note: The CLT applies here, but it is
not necessary to say this in the answer.)
b) Use the normal sampling distribution of a mean.
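As a rough check of both parts, the sketch below uses the normal approximation for a count (Part I) and for a sample mean (Part II). The values n = 300 and p = 0.10 are read from the answer's arithmetic, and the sample size of 36 in Part II is inferred from 30/5 = 6; treat both as assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Part I: normal approximation to a count (n and p read from the answer).
n, p = 300, 0.10
mean_count = n * p
sd_count = sqrt(n * p * (1 - p))
count = NormalDist(mu=mean_count, sigma=sd_count)

# Probability of 25 to 35 without and with the continuity correction.
p_plain = count.cdf(35) - count.cdf(25)
p_corrected = count.cdf(35.5) - count.cdf(24.5)

# "Extremely unusual" threshold: three SDs above the mean.
threshold = mean_count + 3 * sd_count

# Part II: sampling distribution of a mean (sample size 36 is inferred).
sigma, n_mean = 30, 36
xbar = NormalDist(mu=150, sigma=sigma / sqrt(n_mean))
p_below_130 = xbar.cdf(130)

print(round(sd_count, 1), round(p_plain, 3), round(p_corrected, 3))
print(round(threshold, 1), p_below_130)
```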
MT2012: Answer 7
a) D. b) A. c) B.
d) C. e) D.
MT2012: Answer 8
a) H0 : p = 0.08 and Ha : p < 0.08
b) Formula and computed value: p̂ = 27/450 = 0.06
z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.06 - 0.08)/√(0.08 × 0.92/450) = -1.56
c) C.
d) B.
as in the
confidence interval.
c) Find the area to the left of -1.4 on the standard normal curve.
d) Since the P-value is not less than 0.05, the evidence is not statistically significant.
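A minimal sketch of the one-proportion z-test in this answer, using the counts given in part b) (27 out of 450) and the null value 0.08. The standard error uses the null value p0, and the P-value is the lower-tail area because the alternative is p < 0.08.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.08          # null-hypothesis proportion
x, n = 27, 450     # observed count and sample size from part b)

p_hat = x / n
se_null = sqrt(p0 * (1 - p0) / n)   # standard error under H0
z = (p_hat - p0) / se_null

# Lower-tail P-value for Ha: p < p0.
p_value = NormalDist().cdf(z)

print(round(p_hat, 2), round(z, 2), round(p_value, 3))
```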
MT2012: Answer 9
a) B. b) A. c) D. d) D. e) A.
Details and Comments:
a) H0: μ = 15 and Ha: μ > 15.
Use one-tailed alternative since the question is about increasing the volunteer time.
b) t = (x̄ - μ0)/(s/√n) = (16.75 - 15)/(2.40/√24) = 3.572
c) The P-value is much smaller than 0.05 so reject the null hypothesis. The volunteer time
is greater than 15 hours. So no incentive program is needed to get past 15 hours.
d) These are the assumptions/conditions for a one-sample t-test.
e) 16.75 ± 2.069 × 2.40/√n = 16.75 ± 1.016
END OF ANSWERS AND EXPLANATIONS TO MIDTERM 2012
MT2011: Question 1
a) At the beginning of the term we asked all Commerce 291 students to complete our online survey. This survey was most likely designed to be:
__A. a random sample of all C291 students
__B. a census of all C291 students
__C. a random sample of business students
__D. a random sample of 2nd year UBC students
__E. all of the above
b) The survey asked a wide range of questions. For each variable, circle the description
which best describes the type of data the variable represents.
Ethnic background       Categorical   Quantitative   Identifier
Height                  Categorical   Quantitative   Identifier
C290 grade              Categorical   Quantitative   Identifier
# hrs online per day    Categorical   Quantitative   Identifier
c) From the survey results, we can estimate that, on average, students spent 15.2 hours
per week studying. This number seems high given that for a course load of 4 courses the
students spend 12 hours per week in the classroom and nearly half of the students
reported doing paid work. What is the most likely explanation?
__A. the data are very skewed and the median is a better numerical summary
__B. the data are bimodal, the two groups are those that work and those that don't
__C. women study more than men
__D. none of the above
d) Unfortunately, not every C291-registered student responded to the survey. If it were
true that students who didn't respond also spend less time studying, then our estimate of
study time from the survey is:
__A. a good estimate of average study time of C291 students
__B. biased above the true average study time of C291 students
__C. biased below the true average study time of C291 students
__D. not a good estimate for study time of C291 students, but
we can't say whether it is too high or too low.
e) From the survey we find that the Commerce 290 Grade (call this variable, X) has a
symmetric, bell-shaped distribution. Also, 95% of the grades fall in the range 53 to 93.
Use that information to compute the mean and standard deviation of X. Report to at most
one decimal place.
Mean of X = _____
SD of X = _____
MT2011: Question 2
Stock answers are sufficient here
a) The following data are the price-to-earnings ratios (P/E ratio) for a random sample of
25 stocks traded on the NYSE. The data values have been sorted from smallest to largest.
Data: 4 8 11 11 12 13 13 14 14 15 16 17 17 17 19
21 22 22 24 24 26 28 33 35 39
The mean of these values is 19.0 and the standard deviation is 8.5.
i) Find the following:
Median = ______
Q1 = ______
Q3 = ______
IQR = ______
Inner fences = ________________
Outliers: = __________________ (If there are no outliers, write None)
ii) Is the distribution symmetric or skewed? (Note: You do not have to draw a graph to
answer this.) Circle your choice. Then give your reason.
Symmetric
Skewed
Reason:
iii) Sketch a boxplot of these data. Use the version based on the five-number summary;
do not use the modified version using fences.
b) Determine whether each statement is true or false. Circle your choice. No explanation
is required.
1. If the mean and SD are equal for a measurement variable that only takes positive values, the distribution is symmetric.   True   False
2. If the mean and median are equal, the distribution must be normal.   True   False
3. If the mean and median are equal, the mode must also equal the mean and median.   True   False
4. The SD and IQR are always equal for a symmetric distribution.   True   False
5. The SD of a set of data values can never be zero.   True   False
MT2011: Question 3
To-fu or not to-fu, that is the question
Read the following survey design plan and then answer the questions after it.
Get Healthy, a producer of health foods, conducts a survey of the Lower Mainland to
determine how receptive high school students would be to its TOFU BURGH product and
what market potential (sales) it could expect. It plans the survey as follows:
i. From the list of all schools in the area, two groups are defined, public and private high
schools, called PUBS and PRIS.
ii. From the PUBS, four schools are chosen randomly.
iii. From the PRIS, one school is chosen randomly.
iv. In the PUBS schools selected, on one day, researchers give every fifteenth student to
exit the school a TOFU BURGH and a stamped, self-addressed postcard (like the one here).
v. In the PRIS school, researchers set up a stand outside the school and give a free
TOFU BURGH and the postcard to any student who comes to the stand.
a) The overall survey sampling design planned by the company can best be described as:
__A. convenience sampling
__B. multi-stage sampling
__C. stratified sampling
__D. simple random sampling
__E. cluster sampling
b) In the PUBS selected, the sampling design uses:
__ A. systematic sampling
__ B. voluntary response strategy
__ C. unacceptable bribery of students
__ D. anecdotal responses
c) In the PRIS selected, the sampling design uses:
__ A. systematic sampling
__ B. voluntary response strategy
__ C. unacceptable bribery of students
__ D. anecdotal responses
d) One parameter of interest is likely to be:
__ A. the total number of students who replied to the survey
__ B. the number of high school students in the Lower Mainland
__ C. the number of students who replied they would buy at least one
TOFU BURGH in a typical week
__ D. the proportion of students who replied they would buy at least one
TOFU BURGH in a typical week
e) Which of the two samples is likely to have non-response bias?
__ A. PUBS schools only
__ B. PRIS school only
__ C. Both PUBS and PRIS schools
__ D. Neither will have non-response bias
MT2011: Question 4
(ii) How is non-response related to the size of the business? Use percents to make your
statement precise.
c) A study shows that there is a positive correlation between the size of a hospital
(measured by its number of beds, x) and the median number of days, y, that patients
remain in the hospital. Does this mean that you can shorten a hospital stay by choosing a
small hospital? Explain your answer choice.
Yes
No
Reason:
MT2011: Question 5
a) At a well-known business school the grade point averages (GPA) of its 1000
undergraduates are normally distributed with mean 2.84 and standard deviation 0.40.
(i) What percentage of the undergraduates have GPAs below 2.00 (i.e. on probation)?
Answer: ________
(ii) What GPA will be exceeded by only 20% of the student body?
Answer: ________
(iii) Compute the lower and upper quartiles, and the interquartile range for this
distribution.
Q1 = _______
Q3 = _______
IQR = ______
b) Bart scores 725 on the mathematics section of the Scholastic Aptitude Test (SAT). In a
reference population, SAT scores are normally distributed with mean 500 and standard
deviation 100. Lisa scores 33 on the American College Test (ACT) mathematics test;
ACT scores are normally distributed with mean 18 and standard deviation 6.
(i) What are the z-scores for each student?
Bart: _______
Lisa: _______
(ii) Circle either the name Bart or Lisa (above) based on who did better relative to the
reference populations.
MT2011: Question 6
a) To test the strength of building materials such as steel girders, engineers place
increasing loads on the girders until they break. The pressure exerted by the load that
eventually breaks the material is called the strength of the girder. Generally speaking, the
longer the girder, the less the strength. Your company makes steel girders. The engineer
in charge of testing tells you that he has tested 10 girders to breaking point and has
obtained data linking the length of each girder (in metres) to its strength (in kg per square
centimetre). But his computer crashed just after he ran a regression analysis on the data
and all he can remember is the lengths of the girders and a few strengths. He did manage
to record the means and standard deviations of all the lengths and strengths and the r² of
the regression, which was 0.719.
Mean
SD
Note: The means and standard deviations are calculated for the ENTIRE data set,
including those that are missing.
(i) What is the correlation between length and strength? Report to three decimal places.
(ii) Work out a regression equation that predicts strength from length.
Equation: ___________________________
(iii) You notice that the purchaser of your girders requires the 5 m girders to support an
average load of 75 kg per square centimetre. Do you feel confident your girders will do
that? Give a numerical rationale.
b) What is the correlation coefficient for the following three points in the X-Y plane?
(STOP AND THINK BEFORE YOU START!)
X   1   3   5
Y   4   3   2
Answer: __________
c) An American study found that the correlation between two-year-old children's heights
(measured in inches) and their weights (measured in pounds) was 0.46. What would the
correlation coefficient be if you converted their heights to centimetres and weights to
kilograms? (One inch = 2.54 cm and 1 pound = 0.454 kg.)
Answer: __________
d) An economist studied salaries of 321 bank employees with five or less years of
employment in a national bank. He found that the relationship between years of service
and salary was linear and that the regression equation predicting salary (in thousands of
dollars) was: Salary = 21.5 + 3.1 * Years.
He concludes that employees with 10 years of service should make an average salary of
$52,500. Is his conclusion correct? If not, say why.
e) In part d) the economist has used the regression equation to make a prediction. Which
of these numbers best measures the precision of this prediction?
__A. The slope of the line (b1)
__B. The standard deviation of y (sy)
__C. The standard deviation of x (sx)
__D. The square of the correlation coefficient (r²)
__E. The ratio of the two standard deviations (sy / sx)
f) An investigator measuring various characteristics of a large group of athletes found that
the correlation coefficient between the weight of the athlete and the weight that the
athlete could lift was r = 0.60. Determine whether each statement is true or false. Circle
your choice.
(i) If an athlete gains 5 kg, he/she will be able to lift an additional 3 kg.   True   False
(ii) The more an athlete can lift, on the average the more that athlete weighs.   True   False
(iii) 36 per cent of the athlete's lifting ability can be attributed to his or her weight alone.   True   False
(iv) 60 per cent of the athlete's lifting ability can be attributed to his or her weight alone.   True   False
MT2011: Question 7
b) Suppose it is also known that the repair time for a trouble call has a mean of 480
minutes and a standard deviation of 250 minutes. A random sample of 400 trouble calls
was taken and the repair times recorded. Compute the probability that the mean of the
400 repair times is less than 500 minutes.
MT2011: Question 8
MT2011: Question 9
You are the new Operations Manager of the local public transportation company and are
especially interested in the reliability of bus service. You plan, on a monthly basis, to take
a random sample of major bus stops and observe whether the buses depart on time or late
and how late they are. (Buses never leave early since, if they arrive early, they wait until
their departure will be exactly on time.)
a) The first month, you gather a random sample of 121 bus departures from a variety of
times of day, days of the week, routes and locations. The sample has an average lateness
of departure of 6.4 minutes with a standard deviation of 1.8 minutes. Which of the
following is closest to a 95% confidence interval for the average lateness of departures
for the entire bus system this month?
__ A. 6.4 ± 0.029
__ B. 6.4 ± 0.271
__ C. 6.4 ± 0.324
__ D. 6.4 ± 3.564
b) Which of the following would decrease the width of the confidence interval?
__ A. Reduce the confidence level
__ B. Increase the sample size
__ C. Reduce the sample standard deviation
__ D. All of the above
Five years ago, the system-wide mean lateness of departure was known to be 6.8 minutes.
Using a 5% level of significance and the sample results of part a), carry out a hypothesis
test to decide whether the system is improving; that is, whether the mean lateness has
decreased from five years ago.
c) The appropriate null and alternative hypotheses are:
H0: ____________
Ha: ____________
d) Give the formula for the appropriate test statistic and compute its value.
Formula: __________________
Computed value: ______________
(Show your work to the right ==>)
f) From the P-value associated with this test statistic, which of the following is correct?
__ A. Do not reject H0 at the 10% significance level
__ B. Reject H0 at the 10% significance level but not at the 5% significance level
__ C. Reject H0 at the 5% significance level but not at the 1% significance level
__ D. Reject H0 at the 1% significance level
g) Using the 5% significance level, state your conclusion in one clearly worded sentence
that the bus company management can understand.
BONUS: In what century did the equals sign first appear in print?
__ A. 1300s
__ B. 1400s
__ C. 1500s
__ D. 1600s
__ E. 1700s
__ F. 1800s
__ G. 1900s
MT2011 END OF QUESTIONS; ANSWERS AND EXPLANATIONS FOLLOW
MT2011: Answer 2
a) i) Median = 17, Q1 = 13, Q3 = 24, IQR = 11;
Inner fences = (-3.5, 40.5). [Accept also (0,40.5).] There are no outliers.
ii) The distribution is skewed since the mean is quite different from the median.
iii)
_________
|-----------|__|_______|-----------------|
___________________________________
0        10        20        30        40
b) All five statements are False.
Details and Comments:
a) i) With 25 data points, the median is the 13th value. The Q1 is between the 6th and 7th
values (which are equal here) and the Q3 is between the 19th and 20th values (which are
also equal here). IQR = Q3 - Q1. Since the lower inner fence (Q1 - 1.5 × IQR) is negative,
it is also acceptable to report it as 0 because P/E ratios cannot be negative.
ii) Actually, the distribution is skewed to the right, but that distinction was not needed in
the answer.
iii) The sketch must show the skewness, namely that the median is closer to the left side
of the box and the left whisker is shorter than the right whisker.
b)
1. The Empirical Rule wouldn't work here (mean - 2 SD would be negative, but all values are
positive), so the distribution is NOT symmetric.
2. A distribution can be symmetric without being normal; e.g. pyramid shape, or uniform.
3. A symmetric distribution can have two peaks; the mean and median are in the middle
but the modes are at either end (e.g. U-shaped)
4. There is no reason for this to be true.
5. SD = 0 if all data values are the same.
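The numbers in part a) i) can be reproduced with a short sketch. It assumes the quartile convention described above (Q1 and Q3 are the medians of the lower and upper halves, with the middle value excluded when the count is odd); other software may use a different quartile rule and give slightly different values.

```python
from statistics import mean, median

# P/E ratios from Question 2, already sorted.
data = [4, 8, 11, 11, 12, 13, 13, 14, 14, 15, 16, 17, 17, 17, 19,
        21, 22, 22, 24, 24, 26, 28, 33, 35, 39]
data = sorted(data)
n = len(data)

med = median(data)
half = n // 2
lower = data[:half]                               # values below the median position
upper = data[half + 1:] if n % 2 else data[half:]  # values above it
q1, q3 = median(lower), median(upper)
iqr = q3 - q1

# Inner fences and any outliers beyond them.
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outliers = [v for v in data if v < fences[0] or v > fences[1]]

print(med, q1, q3, iqr, fences, outliers, mean(data))
```

The mean (19.0) sitting above the median (17) is the same evidence of right skew cited in part a) ii).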
MT2011: Answer 3
a) B or C; b) A; c) B; d) D; e) C;
Details and Comments:
a) Both multi-stage sampling and stratified sampling are acceptable answers.
Technically, multi-stage sampling is the preferred answer, since for PUBS, four schools
are chosen randomly but the actual students are selected systematically.
b) Since every fifteenth student is selected, the selection is systematic, not random.
c) Since students are free to come, or not, to the stand, this is voluntary response.
d) Counts are not parameters because they are not adjusted for sample size; however,
proportions are parameters.
e) Cards are handed out either to every fifteenth student or to volunteers; however, in
each group not everyone who receives a card will mail the card in; that's non-response.
MT2011: Answer 4
a) (i) 52% (625/1200 = 0.52)
(ii) Non-response rates are: Small: 37.5%, Medium: 60%, Large: 80%.
The larger the company the higher the expected rate of non-response.
b) Correlation is not the same as slope. So a perfect correlation does not mean that the
slope is 1, hence a 1 unit increase in x does not mean a 1 unit increase in y.
c) No: Larger hospitals are more likely to take more serious cases requiring longer length
of stay.
Details and Comments:
a) (i) Sum across the columns to get the row totals of 575 Respondents and 625 Non-respondents. Then divide by the overall total of 1200.
(ii) Column percentages are needed here, not row percentages.
b) Remember the formula for slope: b1 = r(sy/sx). Even if r = 1, the slope is still the ratio of
the SDs, which need not be equal.
c) Look for lurking variables to explain unusual or nonsensical correlations.
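To illustrate the point in b), here is a small sketch with made-up data (the x and y values are hypothetical, chosen so the points lie exactly on a line). The correlation is 1, yet the slope equals the ratio of the SDs, which is 3 here, not 1.

```python
from statistics import mean, stdev

# Hypothetical data lying exactly on the line y = 2 + 3x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]

def correlation(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

r = correlation(x, y)
b1 = r * stdev(y) / stdev(x)   # slope = r * (sy / sx)

print(round(r, 6), round(b1, 6))
```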
MT2011: Answer 5
a) (i) Pr (X < 2.00) = Pr (Z < [2.00 - 2.84]/0.40) = Pr (Z < -2.10) = 0.0179 or 1.79%.
(ii) Z = 0.84; X = 2.84 + 0.84(0.40) = 3.18 (or 3.176)
(iii) Q1 = 2.57; Q3 = 3.11; IQR = 0.54
Q1 for Z = -0.675; X = 2.84 + (-0.675)(0.40) = 2.57
Q3 for Z = 0.675; X = 2.84 + (0.675)(0.40) = 3.11
IQR = 3.11 - 2.57 = 0.54
b) Bart: 2.25; Lisa: 2.50; circle Lisa
Z-score for Bart = (725 - 500)/100 = 2.25; Z-score for Lisa = (33 - 18)/6 = 2.50;
Lisa did better relative to the reference populations since her positive Z-score is higher.
Details and Comments:
a) Remember to make sketches of the required areas so that you get the correct parts of
the normal curve. In (i), standardize X to Z and find the corresponding area; in (ii) and
(iii), begin with the area, find Z and unstandardize to get X.
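The two directions described here (value to area, and area back to value) map onto the `cdf` and `inv_cdf` methods of Python's standard-library `NormalDist`. The sketch below applies them to the GPA distribution from the question; it is a check on the table-based answers, not a replacement for the sketches.

```python
from statistics import NormalDist

gpa = NormalDist(mu=2.84, sigma=0.40)

# (i) Value to area: proportion of GPAs below 2.00.
p_below_2 = gpa.cdf(2.00)

# (ii) Area to value: GPA exceeded by only 20% (80th percentile).
gpa_top20 = gpa.inv_cdf(0.80)

# (iii) Quartiles and interquartile range.
q1 = gpa.inv_cdf(0.25)
q3 = gpa.inv_cdf(0.75)
iqr = q3 - q1

print(round(p_below_2, 4), round(gpa_top20, 2))
print(round(q1, 2), round(q3, 2), round(iqr, 2))
```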
MT2011: Answer 6
a) (i) r =
(ii)
MT2011: Answer 7
a) Pr ( > 0.80) = Pr (Z >
MT2011: Answer 8
a) A; b) B; c) A; d) B; e) B; f) C
Details and Comments:
a) Reason: p̂ = 55/100 = 0.55
b) Reason: √(p̂(1 - p̂)/n) = √(0.55 × 0.45/100) = 0.050
c) Reason: 0.55 ± 1.96(0.050)
d) Reason: n = (2.576²)(0.5)(0.5)/(0.05²) = 664
e) Reason: Area to the right of 1.00 on the z-curve.
f) Reason: The P-value is not less than 0.05 (and not even less than 0.10).
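A sketch of the interval and sample-size arithmetic in reasons a) to d). The 99% confidence level and 0.05 margin of error in the sample-size step are inferred from the constants 2.576 and 0.05 in reason d), so treat them as assumptions; the question text itself is not reproduced here.

```python
from math import ceil, sqrt
from statistics import NormalDist

x, n = 55, 100
p_hat = x / n
se = sqrt(p_hat * (1 - p_hat) / n)       # standard error of p-hat

# 95% confidence interval: p-hat plus or minus z* times SE.
z95 = NormalDist().inv_cdf(0.975)        # about 1.96
ci_95 = (p_hat - z95 * se, p_hat + z95 * se)

# Sample size for a given margin of error, using the conservative p = 0.5.
z99 = NormalDist().inv_cdf(0.995)        # about 2.576
margin = 0.05
n_needed = ceil(z99 ** 2 * 0.5 * 0.5 / margin ** 2)

print(round(se, 3), tuple(round(v, 3) for v in ci_95), n_needed)
```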
MT2011: Answer 9
a) C b) D c) H0: μ = 6.8; Ha: μ < 6.8
d) t = (x̄ - μ0)/(s/√n) = (6.4 - 6.8)/(1.8/√121) = -2.44
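The test statistic and the interval in part a) follow from the sample values in Question 9 (n = 121, mean 6.4, SD 1.8). In the sketch below, the critical value 1.98 is an assumed table value for a t distribution with 120 degrees of freedom; it is hard-coded because the standard library has no t distribution.

```python
from math import sqrt

xbar, s, n = 6.4, 1.8, 121   # sample mean, sample SD, sample size
mu0 = 6.8                    # system-wide mean from five years ago

se = s / sqrt(n)             # standard error of the sample mean
t_stat = (xbar - mu0) / se   # one-sample t statistic

# 95% margin of error; 1.98 is an assumed t-table value for df = 120.
t_star = 1.98
margin = t_star * se

print(round(t_stat, 2), round(margin, 3))
```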
Categorical   Quantitative   Neither
Categorical   Quantitative   Neither
Categorical   Quantitative   Neither
Categorical   Quantitative   Neither
b) Credit card customers were divided into two groups: Canadian residents and visitors to
Canada. The average amount spent by all Canadian residents was $200. The average
amount spent by all visitors to Canada was $300. What must be true about the average
amount spent by all customers?
__ A. It must be $250
__ B. It must be larger than the median expenditure
__ C. It could be any number between $200 and $300
__ D. It must be larger than $250
c) A sample of 500 cash sales had a mean of $20 and a standard deviation of $40. The
histogram of the data would most likely be:
__ A. skewed to the left (i.e. long left-hand tail)
__ B. approximately symmetric
__ C. skewed to the right (i.e. long right-hand tail)
__ D. bimodal
d) Which of the following is likely to have a mean that is smaller than the median?
__ A. The salaries of all National Hockey League players
__ B. The grades of students (out of 100) on a very easy exam on which most
score very high or perfectly, but a few do very poorly
__ C. The prices of homes in Vancouver
__ D. The grades of students (out of 100) on a very difficult exam on which most
score poorly, but a few do very well
e) Here is the frequency distribution of the ages of a sample of 100 employees of the
Hudson's Bay Company.
Age (years)   Frequency
15-19         2
20-24         10
25-29         19
30-34         27
35-39         16
40-44         10
45-49         6
50-54         5
55-59         3
60-64         2
Total         100
(i) What percentage of the employees is 50 or older? _______
(ii) The median age of the employees is:
__ A. About 40
__ B. Between 30 and 34
__ C. Between 40 and 49
__ D. None of the above
(iii) The mean age of the employees is:
__ A. About 34 because about half are younger than 34 and half are older than 34
__ B. Above the median because the distribution is approximately symmetric
__ C. Above the median because the distribution is skewed to the right
__ D. None of the above
f) Based on the following figure, decide whether each of the statements below the figure
is more likely to be True or False. (Note: House income means "total household income"
and is referred to simply as "income" in the statements.)
[Figure: household income (vertical axis, 0 to 350,000) by car make: BMW, Cadillac, Lexus, Lincoln, Mercedes]
True   False
True   False
True   False
b) What percentage of values of Z lie outside 1.5 × IQR on each side of the median? That
is, find the total percentage below "Median - 1.5 × IQR" or above "Median + 1.5 × IQR".
c) Draw a boxplot that would represent data obtained from a large sample of values of Z.
____ Sarah
c) Are the answers in parts a) and b) contradictory? If so, how can you explain the
contradiction?
d) After disregarding gender, are admission rates different in the two programs? Support
your conclusion with an appropriate two-way table (i.e. admission decision by program).
[Scatterplot: Frolder Study. Vertical axis: # Eyes (0 to 20); horizontal axis: Weight (kg) (0 to 20)]
c) Which of the following values is the correct correlation coefficient for this data?
Note: You can reason this out without doing the calculation.
__ A. r = 0.5
__ B. r = 0.975
__ C. r = 0
__ D. r = -0.954
__ E. r = -0.5
d) Looking at the scatterplot, is the correlation coefficient an appropriate measure? Why
or why not?
e) A journalist reporting on this study claims that being heavier causes a frolder to grow
more eyes. What is wrong with this statement?
f) Do you think these five frolders represent a random sample? Why or why not?
[Scatterplot: vertical axis: Dam Wire (0 to 1000); horizontal axis: 0 to 800]
d) A new type of wire has a corrosion rate measure of 555. What does the model predict
for the corrosion measure of this type of wire used at a dam?
e) One of the data points is (220, 245). What is the value of the residual for this point?
g) Can the regression line be used to reliably estimate the dam wire corrosion rate for a
wire which has a rate of 2500 mil under normal use? Give a reason.
___ Yes
___ No
Reason:
h) Fill in each blank with the letter of the ending that fits best.
(i) If the x and y variables are switched, __________.
(ii) If the units are changed for both x and y variables, __________.
(iii) If the units are changed for just the x variable, __________.
(iv) If a constant is added to the y variable, __________.
Endings:
A. ...the slope will change but the averages and standard deviations will not change.
B. ...sx will change but
b) Complete silver medals (i.e. medal plus ribbon) weigh 38 grams on average with a
standard deviation of 5 grams. Find the mean, variance and standard deviation of a pair of
complete medals (gold and silver) combined.
c) You were instructed to assume that the weights of the gold medals, silver medals, and
lengths of ribbon are all independent. Is this a reasonable assumption? Explain why or
why not in one brief sentence at most.
d) In some winter Olympic events, such as the snowboard parallel giant slalom, the
winner is the rider with the best combined time over two runs. In some summer Olympic
events, such as the javelin throw, the winner is the athlete with the best single distance out of
four tries. Generally speaking, does the sum of two random times or the maximum of four
random distances have greater variability?
__ A. Sum of two random times
__ B. Maximum of four random distances
__ C. Cannot say because time and distance are unrelated variables
Why? Explain in one sentence maximum.
b) A quality control manager initially plans to take a random sample of size n from the
production line. If he were to double his sample size to 2n, the standard deviation of the
sampling distribution of the sample mean would be multiplied by:
__ A. 1/2
__ B. 1/√2
__ C. √2
__ D. 2
c) The quality control manager plans to take a random sample of size n from the
production line. How big should n be so that the sampling distribution of x̄ has standard
deviation 0.3 grams?
__ A. 10
__ B. 100
__ C. 1000
__ D. Cannot be determined unless we know that the population is normal.
d) If the quality control manager takes a random sample of nine chocolate bars from the
production line, what is the probability that the sample mean weight of the nine sample
chocolate bars will be less than 240 grams?
__ A. 0
__ B. 0.0013
__ C. 0.1587
__ D. 0.9987
Show your work:
d) How large a sample n would you need to estimate p with margin of error 0.01 with
95% confidence? Use the guess 0.6 as the value for p.
__ A. 6768
__ B. 9220
__ C. 9502
__ D. 9596
b) Give the formula for the appropriate test statistic and compute its value.
d) From the P-value associated with this test statistic, which of the following is correct?
__ A. Do not reject H0 at the 10% significance level
__ B. Reject H0 at the 10% significance level but not at the 5% significance level
__ C. Reject H0 at the 5% significance level but not at the 1% significance level
__ D. Reject H0 at the 1% significance level
e) Using the 5% significance level, state your conclusion in one clearly worded sentence
that Canada Post management can understand.
f) The 95% CI for the mean time the population of postal employees have spent with the
postal service is closest to:
__ A. 7.0 ± 0.2
__ B. 7.0 ± 0.4
__ C. 7.0 ± 2.0
__ D. 7.0 ± 4.0
Bonus Question: Just for Fun and Bragging Rights
Over the 17 days of the Winter Olympics you saw the Olympic rings logo countless
times. In the official logo, not the single-colour Vancouver 2010 version, each of the five
rings is a different colour. How well do you remember the order of the colours in the
rings? Write the colours in the blanks as indicated.
Ring 1: ________
Ring 2: ________
Ring 3: ________
Ring 4: ________
Ring 5: ________
(iii) C
MT2010: Answer 3
a) Yes: Percent of males admitted = 35/80 = 0.4375 or 43.75%
Percent of Females admitted = 20/60 = 0.33 or 33%
b) No: Half of engineers of either sex are admitted. One-quarter of English students of
either sex are admitted.
c) The English program is harder to get into, and that is where more females applied. This
is an illustration of Simpson's Paradox.
d)
                Engineering   English   Row Total
Admitted             40          15         55
Not Admitted         40          45         85
Column Total         80          60        140
Admitted to Engineering: 40/80 = 0.50 or 50%
Admitted to English: 15/60 = 0.25 or 25%
Details and Comments:
When a two-way table is provided, it is useful to add the row totals and the column totals.
They are needed to compute conditional probabilities. Simpson's Paradox is one of the
most revealing illustrations of the need to dig deeper into the relationship between
categorical variables. What might appear to be the result for a two-way table may well be
reversed when a third variable is incorporated.
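The reversal can be seen numerically in the sketch below. The four cell counts are not quoted from the question; they are reconstructed so that they reproduce every rate stated in answers a), b) and d) (35/80 males and 20/60 females admitted, 50% in Engineering and 25% in English for either sex), so treat them as an assumption.

```python
# (applicants, admitted) by gender and program; counts are reconstructed
# from the admission rates stated in the answers, not quoted from the exam.
counts = {
    ("Male", "Engineering"): (60, 30),
    ("Male", "English"): (20, 5),
    ("Female", "Engineering"): (20, 10),
    ("Female", "English"): (40, 10),
}

def admission_rate(cells):
    """Overall admission rate for a list of (applicants, admitted) pairs."""
    applicants = sum(a for a, _ in cells)
    admitted = sum(b for _, b in cells)
    return admitted / applicants

# Marginal rates: by gender (ignoring program) and by program (ignoring gender).
by_gender = {
    g: admission_rate([v for (gender, _), v in counts.items() if gender == g])
    for g in ("Male", "Female")
}
by_program = {
    p: admission_rate([v for (_, prog), v in counts.items() if prog == p])
    for p in ("Engineering", "English")
}

# Conditional rates: within each program, by gender.
within_program = {key: adm / app for key, (app, adm) in counts.items()}

print(by_gender)
print(by_program)
print(within_program)
```

Males look favoured overall, yet within each program the two rates are identical; the gap comes entirely from where each group applied.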
MT2010: Answer 4
a) [Scatterplot: # Eyes (vertical axis, 0 to 20) versus Weight (kg) (horizontal axis, 0 to 20)]
MT2010: Answer 5
a) A b) A
c) b1 = r(sy/sx) = 0.8691(286.6104/196.4466) = 1.268
MT2010: Answer 7
a) 68%
b) B
c) B
d) B
Details for d): Pr(x̄ < 240) = Pr(z < [240 - 243] / [3/√9]) = Pr(z < -3) = 0.0013
e) A
= -2.5
Can't tell
Female
Same size
b) Given the information provided, which of the following is most likely the mean age of
the female team? (Circle the correct response)
21     22     23     30
c) For each of the three measures below, fill in the numerical value in the blank provided
and then decide if each is a measure of shape, centre, spread, or none of these (circle one
choice for each measure):
                                   Value:       Is a measure of:
Interquartile range (for males):   _________    Shape   Centre   Spread   None
50th percentile (for females):     _________    Shape   Centre   Spread   None
                                   _________    Shape   Centre   Spread   None
f) The mean male age is 22.5 years. One of the members of the male team is 22 years old
and has a z-score of -0.25. What is the standard deviation of male ages?
g) If we assume that male ages are normally distributed, what proportion of males on the
team are 22 years of age or younger?
h) Which of the following is the best justification for the assumption of normality made
in part g)? (Check the best response)
__ A. The Law of Large Numbers
__ B. The Central Limit Theorem
__ C. Least squares regression
__ D. None of the above
i) Team members are required to take a course in the history of underwater basket-weaving. The professor records the values of several variables for each student. These
variables are listed below. For each one, decide whether it has been recorded as
quantitative or categorical.
Score on the final exam (out of 200 points)   Quantitative   Categorical
                                              Quantitative   Categorical
                                              Quantitative   Categorical
                                              Quantitative   Categorical
Question A2 (MT2008Q1) There are two kinds of data -- good and bad!
a) Here is a small part of the data set in which CyberStat Corporation records information
about its employees.
Employee #   Surname   Age   Gender   Salary    Job Type
11234        Smith     39    Female   $62,100   Management
23467        Jones     27    Male     $47,350   Technical
98543        Chan      22    Female   $25,250   Clerical
76548        Wong      48    Male     $77,600   Management
Circle the names of the variables below which are recorded as quantitative scale variables
in the data set above.
Employee #     Surname     Age     Gender     Salary     Job Type
b) Three small Statistics classes all took the same test. Histograms of the scores for each
class are shown below.
(i)     1   2   3
(ii)    1   2   3
(iii)   1   2   3
(iv)    1   2   3
c) For each of these variables, decide whether its distribution is more likely symmetric or
skewed right (i.e. long right-hand tail) or skewed left (i.e. long left-hand tail). Circle one
choice for each variable.
Individual incomes in the United States   Symmetric   Skewed right   Skewed left
                                          Symmetric   Skewed right   Skewed left
                                          Symmetric   Skewed right   Skewed left
                                          Symmetric   Skewed right   Skewed left
Mean
Transformed
Data (X*)
5.6
Median
Range
Q1
Q3
IQR
Std dev
11.7
c) Are there any outliers? Use the inner fences definition of outliers and the
original data (not the transformed data) to identify any outliers.
Lower inner fence = ___________________
Upper inner fence = ___________________
Observation numbers of outliers = _______
Obs#   Margin
1      -22
2      -21
3      -15
4      -10
5      -9
6      -7
7      -7
8      -7
9      -6
10     -5
11     -4
12     -3
13     -3
14     -3
15     -2
16     -2
17     1
18     2
19     2
20     4
21     4
22     4
23     5
24     5
25     5
26     6
27     7
28     7
29     8
30     8
31     8
32     9
33     10
34     10
35     10
36     11
37     11
38     11
39     11
40     11
41     14
42     15
43     15
44     16
45     19
46     19
47     20
48     20
49     22
50     24
51     31
52     33
Question A4 (MT2007Q1) "Data, data, data! I can't make bricks without clay!"
(Sherlock Holmes)
a) A sample of shoppers at a mall was asked the following questions. Decide whether the
type of data are more likely to be quantitative or categorical. (Circle your choice)
What is your age (in years)?   Categorical   Quantitative
                               Categorical   Quantitative
                               Categorical   Quantitative
                               Categorical   Quantitative
b) Here is a table of sources of electricity in Canada and the US and the percentage of
electricity generated by each. Construct a bar graph to compare Canada and the US.
Do NOT use separate sets of axes for each graph.
Source        Canada   US
Hydropower    65       6
Coal          16       51
Nuclear       16       21
Natural Gas   1        16
Other         2        6
c) A news article reports that, "Of the 411 players on National Basketball Association
rosters in February 1998, only 139 made more than the league _______ salary of $2.36
million." Which word should go in the blank, "mean" or "median"? That is, is $2.36 million
the mean or median salary for NBA players? Explain why, in one sentence only.
d) A study was made of the age of entering first-year university students. Which of the
following is most likely to be the standard deviation? Explain why, in one sentence only.
__A. 1 month
__B. 1 year
__C. 5 years
e) The following histogram displays the December 2000 percentage unemployment rates
in the 50 U.S. states and Puerto Rico. The labels on the horizontal axis should be
interpreted as follows: the bar labelled 1 represents rates of 1.0% to 1.9%, the bar
labelled 2 represents rates of 2.0% to 2.9%, etc.
[Histogram: Number of States (vertical axis, 0 to 24) versus Unemployment Rate (horizontal axis)]
(i) What percentage of the rates (out of a total of 51 observations) is 5.0% or greater?
(ii) Estimate the median unemployment rate.
f) You have decided to sell your home. The market is booming now with the 2010
Olympic Games preparations, and therefore most sellers of houses with similar
characteristics have received extremely good deals in the past few months. You ask the
realtor for a summary of net prices of homes sold in your neighborhood. The realtor
hands you the following two density curves, one of them of the prices of homes sold in
the past few months in your neighborhood, and the other of the prices of homes sold
during a deep economic recession.
Curve A
Curve B
(i) Under the given assumptions, which of the two curves better represents the
distribution of prices of homes sold in the past few months? Circle your answer choice.
Curve A
Curve B
(ii) A potential buyer offers to give you the mean, the median or the mode of the prices of
all the homes sold in the past few months in your neighborhood. Assuming that the
density curve is the one you chose in (i) directly above, which numerical measure would
you prefer? Circle your answer choice.
If you chose Curve A:
OR:
If you chose Curve B:
Mean
Median
Mode
Mean
Median
Mode
(iii) You are told that the mean price of 50 houses sold is $700,000. However, you notice
that there was a mistake in the calculation, and that one of the buyers paid $500,000
instead of the $800,000 that was used when making this calculation. What is the actual
mean price of the 50 houses sold?
b) Are there any outliers, as defined by the 1.5 IQR rule (a.k.a. inner fences)?
Explain.
No
Yes
____
Median
1. Will be multiplied by
____
IQR
____
Range
____
Standard Deviation
13    19
14    20
16    20
17    21
17    21
18    22
18    24
19    25
19    25
Q1 = ___
Median = ___
Q3 = ___
Max = ___
[Figure: Histograms (a) and (b), each with Count on the vertical axis]
d) Which of these two histograms describes the dataset given at the start of the question?
__A. Histogram (a)
__B. Histogram (b)
__C. Both of them
__D. Neither one of them
e) Assuming that a workday in the call centre is 8 hours long and the workers are on the
phone 60% of the time, what is the mean length of a call?
__A. About 15 minutes
__B. About 19 minutes
__C. About 25 minutes
__D. Cannot tell, since the dataset does not contain data about individual calls
c) Incomes are skewed right because fewer people have very large incomes, more people
have incomes at the lower end or middle.
Age of heart attack victims is skewed left because heart attacks are much more likely in
older people.
Lifetimes of bulbs are skewed right because most bulbs last the amount of time they are
engineered to last but some will last much longer; that is, quality is designed in. Only a
few will fail early. Lifetimes in general are skewed right.
Answer to Question A3 (MT2008Q2)
a) and b)
            Original Data (X)    Transformed Data (X*)
Mean        5.6                  8.2
Median      6.5                  10
Range       55                   110
Q1          -3                   -9
Q3          11                   19
IQR         14                   28
Std dev     11.7                 23.4
c) Lower inner fence = -3 - 1.5(14) = -24
Upper inner fence = 11 + 1.5(14) = 32
Observation numbers of outliers = 52
Details and Comments:
Note that the question asked for the observation number(s), not the margin!
For part b): Suppose the data are transformed (linearly) as follows: X* = a + bX; that is,
multiply the original observations by b and then add a. That shifts all the values of X
up or down by the amount a and changes the size of the unit of measurement by a factor of b
(assume b > 0).
Mean(X*) = a + b × Mean(X);
Median(X*) = a + b × Median(X);
Range(X*) = b × Range(X); [the effect of a is cancelled]
Q1(X*) = a + b × Q1(X);
Q3(X*) = a + b × Q3(X);
IQR(X*) = b × IQR(X); [the effect of a is cancelled]
SD(X*) = b × SD(X); [the effect of a is cancelled]
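The rules above and the inner-fence calculation in part c) can be checked numerically. The sketch below is only an illustration: it uses the 52 margins listed in the question, and it assumes the transformation was X* = 2X - 3, which is inferred from the answer table rather than stated in the surviving question text. Quartiles come from Python's `statistics.quantiles` with its default method; other software may use slightly different quartile conventions.

```python
from statistics import mean, median, quantiles, stdev

# Margins for observations 1..52, in the order listed in the question.
margins = [-22, -21, -15, -10, -9, -7, -7, -7, -6, -5, -4, -3, -3, -3, -2, -2,
           1, 2, 2, 4, 4, 4, 5, 5, 5, 6, 7, 7, 8, 8, 8, 9, 10, 10, 10,
           11, 11, 11, 11, 11, 14, 15, 15, 16, 19, 19, 20, 20, 22, 24, 31, 33]

def summary(xs):
    """Return the summary statistics used in the answer table."""
    q1, _, q3 = quantiles(xs, n=4)  # default 'exclusive' method
    return {
        "mean": mean(xs),
        "median": median(xs),
        "range": max(xs) - min(xs),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "sd": stdev(xs),
    }

original = summary(margins)

# Assumed transformation (inferred from the answer table): X* = a + bX.
a, b = -3, 2
transformed = summary([a + b * x for x in margins])

# Inner fences are computed on the original data.
lower_fence = original["q1"] - 1.5 * original["iqr"]
upper_fence = original["q3"] + 1.5 * original["iqr"]

# Observation numbers are 1-based positions in the list.
outliers = [i for i, x in enumerate(margins, start=1)
            if x < lower_fence or x > upper_fence]

print(original)
print(transformed)
print(lower_fence, upper_fence, outliers)
```

Measures of location (mean, median, quartiles) pick up both a and b, while measures of spread (range, IQR, SD) are only scaled by b.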
Quantitative
Quantitative
Categorical
Categorical
b)
OR
c) "Of the 411 players on National Basketball Association rosters in February 1998, only
139 made more than the league MEAN salary of $2.36 million." If it were the median,
then about half of the 411 players (i.e. 205 or 206) would exceed the value.
d) 1 year is the typical difference in age between entering first-year university students.
e) (i) 5/51 = 0.098, so 9.8%. It is also acceptable to round to 10%.
(ii) The median is in the 3.0-3.9 interval, so the median is best estimated as the midpoint
of that interval at 3.5%.
Comment: It is also acceptable to give the range 3.0-3.9. It is not acceptable to estimate
the median as 3.0%.
f) (i) Curve B
(ii) If you chose Curve A: Mean
If you chose Curve B: Mode
Note: The two choices offered in part (ii) are to give you a chance to get the correct
answer to part (ii) even if you made the wrong choice in part (i).
(iii) [(50 × 700,000) - 300,000]/50 = $694,000
Comment: Use the formula for mean and adjust accordingly.
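As a small illustration of the adjustment in part (iii), the correction can be done by rebuilding the total, swapping the wrong value for the right one, and dividing again. The variable names are illustrative only.

```python
n = 50
reported_mean = 700_000
wrong_value = 800_000   # price used by mistake
right_value = 500_000   # price actually paid

# Recover the (incorrect) total, replace the bad entry, then re-average.
corrected_total = n * reported_mean - wrong_value + right_value
corrected_mean = corrected_total / n

print(corrected_mean)
```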
13    19
14    20
16    20
17    21
17    21
18    22
18    24
19    25
19    25
Decrease
g) A Canadian member of the research team measured the speed of the cars in kilometres
per hour (1 km ≈ 0.62 miles) and the traffic density in cars per kilometre. What is the
value of his calculated correlation between speed and traffic density?
h) Does this study demonstrate that traffic density is a causal factor in explaining the
average speed of the traffic on the thoroughfare?
i) Suppose another researcher got confused about which was the response variable and
which was the explanatory variable and computed a linear regression model to predict
Cars from Speed. What would the slope of this line be? Report a maximum of two
decimal places.
Describe the shape, direction and strength of this relationship by circling the best choice.
Shape:
Linear
Curved
No Pattern
Direction:
Positive
Negative
Neither
Strength:
Very strong
Fairly strong
Quite weak
b) The correlation between heart disease death rates and national wine consumption is
r = -0.843. What does a negative correlation say about wine consumption and heart
disease deaths? Answer this question by circling the appropriate italicized words below:
High wine consumption goes with (more / fewer) heart disease deaths,
while low consumption goes with (more / fewer) deaths.
c) Do you think these data give good evidence that drinking wine causes a reduction in
heart disease deaths? Explain why, in one sentence only.
d) About what percent of the variation among countries in heart disease rates is explained
by the straight-line relationship with wine consumption? Report the percentage to the
nearest whole number.
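For part d), the fraction of variation explained by a straight-line relationship is the square of the correlation. A minimal sketch, assuming the correlation is r = -0.843 (the sign does not affect the result):

```python
r = -0.843  # correlation between wine consumption and heart disease death rate

# R-squared: proportion of variation explained by the linear relationship.
r_squared = r ** 2
percent_explained = round(100 * r_squared)

print(percent_explained)
```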
e) The least squares regression line for predicting heart disease death rate from wine
consumption is:
Use this equation to predict the heart disease rate in another country where adults average
4 litres of alcohol from drinking wine each year.
f) What is the predicted heart disease rate for a country that drinks enough wine to supply
150 litres of alcohol per person? Can this result be true? Explain why using the least-squares regression line for this prediction is not justified.
g) Which of the three figures below corresponds to the plot of least-squares residuals
versus national wine consumption? Hint: Look at the vertical axis.
Graph (a)
Graph (b)
Graph (c)
If the least-squares regression line fits the data poorly, the residual
plot will exhibit a systematic pattern.
Yes
No
d) Which set of two variables is most likely to have a cause and effect relationship?
__ A. The height of a person and their corresponding weight
__ B. The weight of a box and the postage required to ship the box to Toronto
__ C. The make of a car and the fuel efficiency (miles per gallon) of the car
__ D. The age of a teacher and their corresponding yearly income
c) For each of the eight sections of last year's C291 course, the average midterm exam
mark and the average final exam mark were calculated. The correlation for the eight pairs
of averages was +0.97. Does this mean that the relationship between a student's midterm
and final exam marks for all students in the eight classes is almost exactly a
straight line? Explain, in one sentence only!
__Yes
__No
d) A study is made of people who stutter. Each subject is asked to read two passages of
equal length, and the number of times they stutter while reading each passage is recorded.
The researchers discover that the subjects who stuttered many times on the first passage
tended to stutter fewer times on the second passage. They conclude that the subjects who
stuttered many times on the first passage must have been nervous the first time and more
relaxed the second time, so that they tended to stutter less. Do you agree? Explain, in one
sentence only.
__Yes
__No
e) Studies show that in the period from 1850 to 1900 in the United States, the average
marriage lasted only 12 years. Does this show that the divorce rate was high in that
period? Explain, in one sentence only!
__Yes
__No
Mean = 50
Mean = 140
SD = 20
SD = 40
She computed the least squares regression equation and found that for a hospital with 100
FTEs, the estimated number of open beds was 32.
a) Use this information to compute the value of the correlation coefficient.
c) From the available data, what would you predict the number of open beds to be for a
hospital with an unknown number of FTEs?
d) What fraction of the variation in number of open beds is explained by the number of
FTEs?
e) Another expert consultant, this one in hospital administration, claims that the
regression was done the wrong way around, and that the number of FTEs required in a
hospital should be estimated from the number of open beds in the hospital. What would
the value of the correlation coefficient be if the analysis were done this way?
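One way to approach parts a), c), d) and e) is to use the fact that the least-squares line passes through the point of means with slope r × SD(y)/SD(x). The sketch below assumes that open beds have mean 50 and SD 20 and that FTEs have mean 140 and SD 40; the surviving text does not label the columns, but the opposite assignment would give a correlation outside the range -1 to 1.

```python
# Assumed labelling of the summary statistics (see note above).
mean_beds, sd_beds = 50, 20
mean_ftes, sd_ftes = 140, 40

# Given: a hospital with 100 FTEs has a predicted 32 open beds.
x, predicted = 100, 32

# predicted = mean_beds + r * (sd_beds / sd_ftes) * (x - mean_ftes); solve for r.
r = (predicted - mean_beds) / ((sd_beds / sd_ftes) * (x - mean_ftes))

# c) With no information about FTEs, the best prediction is the mean of beds.
prediction_without_ftes = mean_beds

# d) Fraction of variation in open beds explained by FTEs.
r_squared = r ** 2

# e) Correlation is symmetric, so swapping the roles of x and y leaves r unchanged.
r_swapped = r

print(r, prediction_without_ftes, r_squared, r_swapped)
```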
Innsbruck
1976
Lake
Placid
1980
Sarajevo
1984
Calgary
1988
Albertville
1992
Lillehammer
1994
Nagano
1998
13
37
40
49
86
Here is a scatterplot of the number of team medals Statland has won since Innsbruck
versus the corresponding individual medals.
a) What is the correlation, r, between these
two variables?
__A.
0.98
__B.
0.98
__C.
0.58
__D.
0.58
[Scatterplot: Medals: Individuals (vertical axis) versus Medals: Team (horizontal axis)]
b) How would r change if Statland had won exactly 20 more individual medals in each of
these eight Winter Olympics games?
__A. r would be 20/8=2.5 times larger
__B. r would be 20/8=2.5 times smaller
__C. r would be the same
__D. r would increase, but I am not sure how much larger
c) Which word best describes the type of relationship between team and individual
medals?
__A. Causation
__B. Correlation
__C. Confounding
__D. Some other word starting with C
The number of medals Statland won in Winter Olympics has grown dramatically over the
30 years. Here is the total number of medals Statland has won in each of the past eight
Winter Olympics:
Site              Year       Medals
Innsbruck         1976       3
Lake Placid       1980       2
Sarajevo          1984       4
Calgary           1988       8
Albertville       1992       42
Lillehammer       1994       45
Nagano            1998       57
Salt Lake City    2002       99
Mean              1989.25    32.5
SD                8.9        34.8
[Scatterplot: Medals (vertical axis, 0 to 100) versus Year (horizontal axis, 1976 to 2004)]
= (-0.945)(27/10) = -2.55
b) If I were willing to be late for class 3% of the time in the long run, how far ahead of
class time should I leave home?
c) Well, I admit that I am late very occasionally, but I have found over time that there is a
0.2 probability of any student being late to class. Assume that students are independent
from one another with respect to being late. (Actually, this probability was made up for
the purposes of this question; you all are much better than this! ) In a class of 100
students, what is the approximate probability of at least 75% of students being on time?
(Hint: Be careful in setting up the start of the question.)
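Part c) can be set up by counting on-time students, so the success probability is 1 - 0.2 = 0.8. The sketch below shows the normal approximation without a continuity correction, alongside the exact binomial probability for comparison. Whether the course expects a continuity correction is not stated here, so treat the approximation as one reasonable choice.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 100
p_on_time = 1 - 0.2          # probability a student is on time
threshold = 75               # "at least 75%" of 100 students

# Normal approximation to the binomial count of on-time students.
mu = n * p_on_time
sd = sqrt(n * p_on_time * (1 - p_on_time))
approx = 1 - NormalDist(mu, sd).cdf(threshold)

# Exact binomial probability P(X >= 75), for comparison.
exact = sum(comb(n, k) * p_on_time**k * (1 - p_on_time)**(n - k)
            for k in range(threshold, n + 1))

print(round(approx, 4), round(exact, 4))
```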
c) What is the probability that the average, x̄, of the 100 claims is larger than $1000?
__ A. 0.9200
__ B. 0.8212
__ C. 0.0800
__ D. 0.1788
In order to get full marks for part c), show your work:
d) The Central Limit Theorem justifies some of the calculations above. What does the
Central Limit Theorem say? Complete the following sentence by selecting the most
appropriate phrase from the choices.
When a sample of size n is to be drawn from any population with mean mu and standard
deviation sigma, then when n is sufficiently large
__ A. the standard deviation of the sample mean is
__ B. the distribution of the population is exactly normal
__ C. the distribution of the population is approximately normal
__ D. the distribution of the sample mean is exactly normal
__ E. the distribution of the sample mean is approximately normal
b) Instead of using the numbers you found in part a), assume that the mean haircut price
is 26.50 and the standard deviation is 5. Suppose a visiting tourist from Paris is so
desperate for a haircut that he walks into the first barber shop he sees and sits right down
in the chair. What is the probability that his haircut will be 25 or less?
c) Two twin sisters are registered in different MBA programs in the United States. The
sister registered at Harvard got 87% on her final comprehensive exam. The average mark
on that exam was 73% and the standard deviation was 7%. The sister registered at
Stanford got 84% on her final comprehensive exam. Its mean was also 73% but its
standard deviation was 5%. Assume exam marks are normally distributed. Which twin's
result ranked higher within her own class? What are their respective percentiles?
Percentile of Harvard sister = ______
Percentile of Stanford sister = ______
Which twin ranked higher within her own class?
__ Harvard sister
__ Stanford sister
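The comparison in part c) comes down to standardizing each mark within its own class and converting the z-score to a percentile. A minimal sketch using the normal model stated in the question:

```python
from statistics import NormalDist

# Each exam: mean 73; Harvard SD 7, Stanford SD 5.
harvard_z = (87 - 73) / 7
stanford_z = (84 - 73) / 5

# Percentile = proportion of the class scoring below the sister's mark.
harvard_pct = 100 * NormalDist(73, 7).cdf(87)
stanford_pct = 100 * NormalDist(73, 5).cdf(84)

higher = "Stanford sister" if stanford_pct > harvard_pct else "Harvard sister"
print(harvard_z, stanford_z, round(harvard_pct, 1), round(stanford_pct, 1), higher)
```

The raw mark is higher at Harvard, but the smaller standard deviation at Stanford makes an 84 more unusual there.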
b) Assume that the mean of T is 70 and the standard deviation of T is 8. (Those aren't
actually the correct answers for part a), but will allow you to do part b) independently.)
Let S be the time required to produce 9 widgets. What is the probability that S will exceed
660 seconds? Hint: Define S in terms of T from part a).
Answer: _____
Show your work:
________
________
and .
Mean of :
SD of :
________
________
(ii) What is the approximate shape of the distribution of , the difference between the
mean TV-watching times in the two samples? Give a reason why.
(iii) What is the probability that is greater than 30 minutes? Report your answer with
no more than two decimal digits. Hint: You will need the mean and standard deviation of
to solve this. You can easily compute the mean; we'll give you the standard
deviation, which is 6.88. You can trust us!
b) The Grocery Manufacturers of Canada reported that 72% of consumers read the
ingredients listed on a product's label. Assume the population proportion is p = 0.72 and
a sample of 250 consumers is selected from the population. What is the approximate
probability that the percentage of consumers in the sample who read the ingredients on a
label will exceed 76%?
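For part b), the sample proportion is approximately normal with mean p and standard deviation sqrt(p(1 - p)/n) when n is large. The sketch below applies that approximation to the numbers in the question; no continuity correction is used.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.72, 250
cutoff = 0.76

# Standard deviation of the sample proportion.
se = sqrt(p * (1 - p) / n)

# Standardize the cutoff and take the upper-tail area.
z = (cutoff - p) / se
prob_exceeds = 1 - NormalDist().cdf(z)

print(round(se, 4), round(z, 2), round(prob_exceeds, 2))
```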
b) Now suppose that the final exam scores are normally distributed, also with a mean of
60 and a standard deviation of 20. The instructor wishes to give 20% A's, 30% B's and
the rest C or lower.
(i) What final score should be the cutoff between A and B?
(ii) What final score should be the cutoff between B and C? (This one's easy!)
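The two cutoffs are percentiles of the normal model: the top 20% get an A, so the A/B cutoff is the 80th percentile, and A's plus B's make up the top 50%, so the B/C cutoff is the 50th percentile. A short sketch:

```python
from statistics import NormalDist

scores = NormalDist(mu=60, sigma=20)

# A = top 20%  -> cutoff at the 80th percentile.
cutoff_a_b = scores.inv_cdf(1 - 0.20)

# A + B = top 50% -> cutoff at the 50th percentile, which is the mean.
cutoff_b_c = scores.inv_cdf(1 - 0.20 - 0.30)

print(round(cutoff_a_b, 1), round(cutoff_b_c, 1))
```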
c) A company has two manufacturing plants, one that uses low-tech machines and
another that uses high-tech machines. From recent history, the number of defects per
week observed at each plant is normally distributed with the following parameters.
Low-tech:
Mean = 15
SD = 3
High-tech:
Mean = 10
SD = 1
Last week, the low-tech plant produced ten defects, while the high-tech plant produced
eight defects. Which plant performed better relative to past performance? Explain why.
___ Low-tech
___ High-tech
d) Refer to part c). The two plants work independently. Compute the mean and standard
deviation of the total number of defects from the two plants.
Mean = ___________
SD = ____________
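Parts c) and d) can be sketched as follows: compare the plants by z-score (fewer defects is better, so the more negative z-score wins), and for the total of two independent plants add the means and add the variances.

```python
from math import sqrt

low_mean, low_sd, low_observed = 15, 3, 10
high_mean, high_sd, high_observed = 10, 1, 8

# c) Standardize last week's defect counts against each plant's history.
z_low = (low_observed - low_mean) / low_sd
z_high = (high_observed - high_mean) / high_sd
better = "High-tech" if z_high < z_low else "Low-tech"

# d) Total defects from two independent plants.
total_mean = low_mean + high_mean
total_sd = sqrt(low_sd**2 + high_sd**2)  # variances add, not SDs

print(round(z_low, 2), z_high, better, total_mean, round(total_sd, 2))
```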
b) Refer to part a). A survey of a random sample of 1200 undergraduate business students
indicates that there are 336 students who plan to major in accounting. What does this tell
you about the professor's claim?
c) The restaurant in a large commercial building provides coffee for the building's
occupants. The restaurateur has determined that the mean number of cups of coffee
consumed in a day by each occupant is 2.0 with a standard deviation of 0.6. A new tenant
of the building intends to have a total of 125 new employees. What is the probability that
the new employees will consume more than 240 cups per day? (Hint: You can answer
this using either of two different but related methods.)
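The hint in part c) refers to working either with the total number of cups or with the mean cups per employee. The sketch below does both under the Central Limit Theorem approximation and shows that they agree; it assumes the 125 employees behave like an independent sample of building occupants.

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 125, 2.0, 0.6
total_cups = 240

# Method 1: distribution of the total, N(n*mu, sigma*sqrt(n)).
total_dist = NormalDist(n * mu, sigma * sqrt(n))
p_total = 1 - total_dist.cdf(total_cups)

# Method 2: distribution of the sample mean, N(mu, sigma/sqrt(n)).
mean_dist = NormalDist(mu, sigma / sqrt(n))
p_mean = 1 - mean_dist.cdf(total_cups / n)

print(round(p_total, 3), round(p_mean, 3))
```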
Winner is _________
d) Men's 4x100m freestyle relay: The Dutch relay team has four swimmers, each of
whose past times are normally distributed as follows: N(49.5,2.5), N(50,3.0), N(51.5,5),
and N(53,2.5). What are the mean and standard deviation of the total time needed by the
four swimmers to complete the 400m swim? Assume racers' times are independent.
Mean = _____
SD = _____
e) Refer to part d): What is the chance that the Dutch team will complete the swim in
under 200 seconds (the time needed to qualify)? (Report only 2 decimal places!)
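Parts d) and e) can be sketched as a sum of independent normal variables. This example assumes the N(mean, SD) notation, that is, the second number in each pair is a standard deviation rather than a variance; if the course uses N(mean, variance), the SD of the total changes accordingly.

```python
from math import sqrt
from statistics import NormalDist

# (mean, SD) for each swimmer, assuming N(mean, SD) notation.
swimmers = [(49.5, 2.5), (50, 3.0), (51.5, 5), (53, 2.5)]

# d) Means add; for independent times, variances add.
total_mean = sum(m for m, _ in swimmers)
total_sd = sqrt(sum(s**2 for _, s in swimmers))

# e) Probability that the total time is under 200 seconds.
p_qualify = NormalDist(total_mean, total_sd).cdf(200)

print(total_mean, round(total_sd, 2), round(p_qualify, 2))
```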
Find bounds on the P-value (Assume the test statistic = 1.5 rather than the answer you
got immediately above.)
State your conclusion (Base it on the test statistic = 1.5 and the P-value you found
immediately above. Reminder: Don't just say "Accept H0" or "Reject H0".)
b) Construct a 95% confidence interval for the mean weight of the population of all pucks
made by Thunderbird.
d) State a conclusion about the manager's claim, in one complete sentence. Don't just say
"Accept H0" or "Reject H0".
e) Assuming the same sample mean of 19 years and standard deviation of 4 years, what is
the smallest sample size that would still reject the null hypothesis at the 5% significance
level? Hint. Find the value of the test statistic that would give a P-value as close as
possible to 0.05.
b) Answer each of the following with: Yes, No, or Can't Tell. Circle your choices.
Does the sample mean lie in
the 95% confidence interval?
Yes
No
Can't Tell
Yes
No
Can't Tell
Yes
No
Can't Tell
Yes
No
Can't Tell
c) A radio talk show invites listeners to enter a dispute about a proposed pay increase for
city council members. "What yearly pay do you think council members should get? Call
us with your number." In all, 958 people call. The station calculates the 95% confidence
interval for the mean pay, μ, that all citizens would propose for council members to be
$9669 to $9811. Is this result trustworthy? Explain your answer.
Question D4 (MT2007Q8) I owe, I owe, it's off to work I go (go ahead & sing it)
The National Association of Independent Colleges and Universities took a random
sample of 64 college graduates and found that their average debt upon graduation was
$12,000, with a standard deviation of debt upon graduation of $1800.
a) Construct a 95% confidence interval for the mean debt of all college graduates.
b) True or False: The confidence interval you obtained in part a) means that
approximately 95% of sample averages obtained from repeated random samples of 64
college graduates will fall in that interval. (No explanation is necessary.)
___True
___False
c) Calculate the sample size required to have a 99% confidence interval with the same
margin of error as that found in part a).
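Parts a) and c) can be sketched with a large-sample normal approximation. This is an assumption made to keep the example in the Python standard library: the answer key elsewhere in this booklet uses Table T (the t distribution), which would give a slightly wider interval in part a) with 63 degrees of freedom.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, xbar, s = 64, 12_000, 1_800

# a) 95% interval using the normal critical value (approximation to t with 63 df).
z_95 = NormalDist().inv_cdf(0.975)
margin = z_95 * s / sqrt(n)
ci_95 = (xbar - margin, xbar + margin)

# c) Sample size for a 99% interval with the same margin of error.
z_99 = NormalDist().inv_cdf(0.995)
n_99 = ceil((z_99 * s / margin) ** 2)

print(round(margin), tuple(round(v) for v in ci_95), n_99)
```

Always round the required sample size up, since rounding down would give a margin slightly larger than the target.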
d) A college president says that the sample of 64 graduates in part a) resulted in an
overestimate and that the actual mean debt is $11,500. Test whether the actual mean debt
exceeds $11,500 by forming the appropriate hypotheses, obtaining a P-value, and
interpreting it.
e) Decide whether each statement is more likely to be True or False:
(i) The larger the sample size, the more likely you will get
statistical significance using a t-test (assuming the sample
mean does not change).
True
False
True
False
(iii) As the P-value gets smaller, the evidence against the null
hypothesis gets stronger.
True
False
True
False
(v) The smaller the P-value, the less likely the null hypothesis
is true.
True
False
= 1.53
Degrees of freedom = 49
Two-tail P-value (from Table T) is between 0.10 and 0.20.
There is not enough evidence to conclude a difference from the target of 170 g.
b)
= -2.5
c) P-value = Pr(t99 < -2.5): From Table T, this is between 0.005 and 0.01.
d) There is enough evidence to support the manager's claim that the average tenure is less
than 20 years.
e) From Table T, the critical value of t100 corresponding to a one-tail probability of 0.05 is
1.660. We are working with the left-hand tail, so t = -1.660 =
Hence
= 2.22
P-value = Pr (t63 > 2.22): from Table T, this probability is between 0.01 and 0.025.
Conclusion: Reject the null hypothesis. There is evidence that mean debt is greater than
$11,500. The college president's claim is rejected.
d) True, True, True, False, False
Details and Comments:
d) (i) Examine the formula for the z-test statistic; with n in the denominator of the
denominator, increasing it will increase the value of the test statistic.
(ii) The level of significance (i.e. alpha) is chosen before the P-value is calculated and
does not enter into the calculation of P-value.
(iii) This is precisely the interpretation of P-value.
(iv) and (v) The P-value assumes the null hypothesis is true so it can't be a statement
about the chance that the null hypothesis is true. It is a statement about the data and the
consistency of the data with the null hypothesis.
= -2.1
c) Use Table T to find the area (i.e. probability) to the left of -2.1, with 35 degrees of
freedom. Remember to look up the one-tail probability.
d) Note that it is not sufficient simply to say, "Reject H0."
SECTION E: MISCELLANEOUS
Question E1 (MT2009Q5) Drinking and de-riving a sample
A study of the number of years that employees work for food-and-drink businesses in the
Lower Mainland was based on a sample from the telephone directory's Yellow Pages
listings of food-and-drink businesses in the Lower Mainland. The sample was drawn as
follows. The study investigator first drew a simple random sample of four municipalities
in the Lower Mainland. Then within each selected municipality, he randomly sampled 50
businesses. For various reasons, the study got no response from 40% of the 200
businesses chosen. Interviews were completed with 120 businesses that responded. Each
of the 120 businesses was asked for the typical number of years that an employee stayed
with the business.
a) The population of interest to the investigator is:
__ A. all food-and-drink businesses in the Lower Mainland that are listed under
the telephone directory's Yellow Pages
__ B. all food-and-drink businesses in the Lower Mainland
__ C. the 200 businesses that were chosen by the investigator
__ D. the 120 businesses that responded
b) What is the relevant statistic here?
__ A. The mean years of employment of all food-and-drink businesses in the
Lower Mainland listed under the telephone directory's Yellow Pages
__ B. The mean years of employment of all food-and-drink businesses in the
Lower Mainland
__ C. The mean years of employment of the 200 businesses that were chosen by
the investigator
__ D. The mean years of employment of the 120 businesses that responded
c) The sampling scheme that the investigator used in choosing the 200 businesses is:
__ A. Simple random sampling
__ B. Stratified random sampling
__ C. Multistage sampling (also known as Cluster sampling)
__ D. Systematic sampling
d) The main source of bias in this study is due to the fact that:
__ A. only four municipalities were sampled
__ B. only 50 businesses in each municipality were sampled
__ C. only 120 of the 200 businesses sampled actually responded
__ D. not all food-and-drink businesses are listed in the Yellow Pages
e) This study is an example of
__ A. an experiment
__ B. a double-blind study
__ C. a census
__ D. a survey
f) In testing hypotheses, which of the following would be strong evidence against the null
hypothesis?
__ A. Using a small level of significance
__ B. Using a large level of significance
__ C. Obtaining data with a small P-value
__ D. Obtaining data with a large P-value
g) In a statistical test of hypotheses, we say the data are statistically significant at level
α if
__A. α = 0.05
__B. α is small
__C. the P-value is less than α
__D. the P-value is larger than α
h) An engineer designs an improved light bulb. The previous design had an average
lifetime of 1200 hours. The new bulb has an average lifetime of 1201 hours, based on a sample of
2000 bulbs. Although the difference is quite small, it is statistically significant. The
explanation for the statistically significant difference is
__ A. that new designs typically have more variability than standard designs
__ B. that the sample size is very large
__ C. that the mean of 1200 is large
__ D. all of the above
SD = ______
b) The probability that a shooter hits the target 48 or more times is closest to:
__A. 0.95
__B. 0.26
__C. 0.74
__D. 0.05
Bobsleigh racing was developed in the 19th century by the Swiss in search of the ultimate
thrill. Race times are normally distributed with mean 53 seconds and standard deviation 3
seconds. In bobsleigh events, racers complete four runs.
c) A sample of four runs at a particular event gave a mean of 51 seconds, with the same
standard deviation of 3 seconds (as expected). Compute a 90% confidence interval for the
true mean run time. (Report 2 decimal places)
d) The observed margin of error for another sample of four runs was 8.76. What level of
confidence was chosen to compute that confidence interval?
__A. 90%
__B. 95%
__C. 99%
__D. None of the above
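Parts c) and d) can be sketched with a t interval on n - 1 = 3 degrees of freedom. That choice is an assumption: it is the one under which a margin of 8.76 corresponds to one of the listed confidence levels. The critical values below are standard t-table entries for 3 degrees of freedom, typed in by hand because the Python standard library has no t distribution.

```python
from math import sqrt

# Two-sided critical values of t with 3 degrees of freedom (from a t table).
T_CRIT_DF3 = {90: 2.353, 95: 3.182, 99: 5.841}

n, xbar, s = 4, 51, 3
se = s / sqrt(n)  # standard error of the mean = 1.5

# c) 90% confidence interval for the true mean run time.
margin_90 = T_CRIT_DF3[90] * se
ci_90 = (xbar - margin_90, xbar + margin_90)

# d) Which confidence level gives a margin of error of about 8.76?
observed_margin = 8.76
matches = [level for level, t in T_CRIT_DF3.items()
           if abs(t * se - observed_margin) < 0.01]

print(ci_90, matches)
```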
Question E4 (MT2006Q8)
a) The Environmental Protection Agency records data on the fuel economy of many
different makes of cars. Some of the variables collected are listed below. Identify each
variable as categorical or measurement (i.e. quantitative). (Circle your choice)
Manufacturer (GM, Ford, Toyota, etc.)
Gas mileage (miles per gallon)
Weight (in pounds)
Size (small, medium, full-size, truck, etc.)
Categorical
Categorical
Categorical
Categorical
Measurement
Measurement
Measurement
Measurement
b) A study of the caloric content of hot dogs was undertaken. As part of the study, the
number of calories in 20 brands of beef hot dogs were recorded and the five-number
summary computed as follows: Min = 110, Max = 190, Median = 152.5, Quartiles = 140,
180. The researchers did not provide the standard deviation. However, previous work has
shown that calorie count is approximately normally distributed. Which of the following is
the most reasonable estimate of the standard deviation?
__A. 10
__B. 20
__C. 40
__D. 80
c) A television station is interested in predicting whether or not voters in its listening area
are watching their coverage of the Winter Olympics. It asks its viewers to phone in and
report whether or not they have watched at least one hour of Olympic coverage in the
first week of the Games. Of the 1242 viewers who phoned in, 512 (41.22%) said Yes.
The number 41.22% is a:
__A. statistic
__B. parameter
__C. sample
__D. population
d) Refer to part c), immediately above. Choose the best statement from the following.
__A. The results are valid because the sample size is very large
__B. The results are valid because people who are undecided do not phone in
__C. The results are not valid because the response is voluntary
__D. The results are not valid because the question is poorly worded
Bonus Question 2:
Suppose a random variable X takes only two possible values: and +, each with
probability 0.5. What are the mean and standard deviation of X?
__A. Mean = , SD =
__B. Mean = 0, SD = 1
__C. Mean = , SD = 2
__D. Mean = 0, SD =
__E. You can't compute mean and SD without actual data.
g) C
h) B
i) D