
Chapter 5: 2, 7, 8, 11, 13, 17, 20, 21, 25, 26, 27, 29, 30, 35, 36, 37, 38, 39, and 40.



2) Find an article in a newspaper, magazine, or the Internet that shows a time plot.

A. Does the article discuss the Ws?
B. Is the timeplot appropriate for the data? Explain.
C. Discuss what the time plot reveals about the variable.
D. Does the article accurately describe and interpret the data? Explain.

7) Crowd Management Strategies monitors accidents at rock concerts. In their database, they
list the names and other variables of victims whose deaths were attributed to crowd crush at
rock concerts. Here are the histogram and boxplot of the victims' ages for data from 1999 to
2000:

A. What features of the distribution can you see in both the histogram and the boxplot?
a. Mostly symmetric, slightly skewed to the right, with two outliers at 36 and 48.
B. What features of the distribution can you see in the histogram that you could not see in
the boxplot?
a. The histogram shows a slight increase that the boxplot does not. It looks to be a second mode.
C. What summary statistic would you choose to summarize the center of this distribution?
Why?
a. Median because of the skew and outliers.
D. What summary statistic would you choose to summarize the spread of this distribution?
Why?
a. IQR because of skew and outliers.

8) The Men's Combined skiing event consists of a downhill and a slalom. Two displays of the
slalom times in the Men's Combined at the 2006 Winter Olympics are shown below.

A. What features of the distribution can you see in both the histogram and the boxplot?
a. Both show that the average slalom time is around 92 seconds and that the
range is more than 20 seconds. Both show that the distribution is skewed to the right.
B. What features of the distribution can you see in the histogram that you could not see in
the boxplot?
a. The histogram shows a mode at about 92 seconds. The boxplot shows two
possible outliers that the histogram does not.
C. What summary statistic would you choose to summarize the center of this distribution?
Why?
a. Because the distribution of slalom times is skewed and contains outliers, the median is the
better summary of the center.
D. What summary statistic would you choose to summarize the spread of this distribution?
Why?
a. The IQR is preferable to the standard deviation as a measure of spread.

11) Here is a back-to-back stem-and-leaf display that shows two data sets at once, one
going to the left, one to the right. The display compares the percent change in population for two
regions of the United States (based on census figures for 1990 and 2000). The fastest growing
states were Nevada at 66% and Arizona at 40%. To show the distributions better, this display
breaks each stem into two lines, putting leaves 0–4 on one stem and leaves 5–9 on the other.

A. Use the data displayed in the stem-and-leaf display to construct comparative boxplots.
a. (separate sheet of paper)
B. Write a few sentences describing the difference in growth rates for the two regions of the
United States.
a. The Northeast and Midwest states are clustered. The Southwest states are bimodal, with
modes around 14 and 22. All modes have symmetric distributions.

13) The U.S. National Center for Health Statistics compiles data on the length of stay by
patients in short-term hospitals and publishes its findings in Vital and Health Statistics. Data
from a sample of 39 male patients and 35 female patients on length of stay (in days) are
displayed in the histograms below.

A. What would you suggest be changed about these histograms to make them easier to
compare?
a. They should all be placed on the same scale.
B. Describe these distributions by writing a few sentences comparing the duration of
hospitalizations for men and women.
a. Men have a mode of 1 day and women have a mode of 5 days. Both men's and women's
lengths of stay vary a large amount.
C. Can you suggest a reason for the peak in women's length of stay?
a. Childbirth could be the reason for the peak.

17) In 1975, did men and women marry at the same age? Here are boxplots of the age at first
marriage for a sample of U.S. citizens then. Write a brief report discussing what these data
show.
A. The distributions are similar in shape and spread, but women seem to marry younger than
men.

20) Ozone levels (in parts per billion, ppb) were recorded at sites in New Jersey monthly between
1926 and 1971. Here are boxplots of the data for each month (over the 46 years), lined up in
order (January = 1):

A. In what month was the highest ozone level ever recorded?
a. April, 440
B. Which month has the largest IQR?
a. February, 50
C. Which month has the smallest range?
a. August, 50
D. Write a brief comparison of the ozone levels in January and June.
a. January had a lower median ozone level than June (340 vs. 350).
E. Write a report on the annual patterns you see in the ozone levels.
a. The ozone levels were highest in the spring and were lowest in the fall. The
ozone levels were consistent in the summer and were the most variable in the
winter.

21) Three Statistics classes all took the same test. Histograms and boxplots of the scores for
each class are shown below. Match each class with the corresponding boxplot.
A. A= 1, B= 2, C= 3

25) A student study of the effects of caffeine asked volunteers to take a memory test 2 hours
after drinking soda. Some drank caffeine-free cola, some drank regular cola (with caffeine), and
others drank a mixture of the two (getting a half-dose of caffeine). Here are the 5-number
summaries for each group's scores (number of items recalled correctly) on the memory test:

A. Describe the Ws for these data.
a. Who: Volunteers
What: Memory test
Where, when: Not specified
How: Volunteers took a memory test 2 hours after drinking caffeine-free, half-caffeine,
or high-caffeine cola.
Why: Caffeine is thought to make people more alert.
B. Name the variables and classify each as categorical or quantitative.
a. Test score: quantitative. Drink: categorical.
C. Create side-by-side boxplots to display these results as best you can with this
information.
a. (separate paper)
D. Write a few sentences comparing the performances of the three groups.
a. The high-caffeine group was lower than the other two groups in every value of the
5-number summary. The other two groups both had medians of 21 points.

26) Here are the summary statistics for Verbal SAT scores for a high school graduating class:

A. Create side-by-side boxplots comparing the scores of boys and girls as best you can
from the information given.
a. (separate paper)
B. Write a brief report on these results. Be sure to discuss the shape, center, and spread of
the scores.
a. Females had higher first and third quartiles, and the females in the graduating
class scored higher on the Verbal SAT (median 625). The IQR of the male scores
was smaller. The overall spread of the male scores was greater than that of the female
scores. Both distributions were skewed left.

27) How fast do horses run? Kentucky Derby winners top 30 miles per hour, as shown in this
graph. The graph shows the percentage of Derby winners that have run slower than each given
speed. Note that few have won running less than 33 miles per hour, but about 86% of the
winning horses have run less than 37 miles per hour. (A cumulative frequency graph like this is
called an ogive.)

A. Estimate the median winning speed.
a. ~36 MPH
B. Estimate the quartiles.
a. Q1 = ~35 MPH, Q3 = ~36 MPH
C. Estimate the range and the IQR.
a. Range = ~7 MPH (from 31 to 38).
IQR = ~2 MPH
D. Create a boxplot of these speeds.
a. (separate paper)
E. Write a few sentences about the speeds of the Kentucky Derby winners.
a. The max was around 38 MPH and the min was about 31 MPH. The median
winning speed was around 36 MPH.

29) A class of fourth graders takes a diagnostic reading test, and the scores are reported by
reading grade level. The 5-number summaries for the 14 boys and 11 girls are shown:

A. Which group had the highest score?
a. Boys
B. Which group had the greater range?
a. Boys
C. Which group had the greater interquartile range?
a. Girls
D. Which group's scores appear to be more skewed? Explain.
a. The boys' scores had more skew; the girls' quartiles are the same distance from the
median.
E. Which group generally did better on the test? Explain.
a. Girls. Their upper quartile and median were larger.
Their lower quartile is lower.
F. If the mean reading level for boys was 4.2 and for girls was 4.6, what is the overall mean
for the class?
a. (14(4.2) + 11(4.6)) / 25 = 109.4 / 25 = 4.376 ≈ 4.38

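
The overall mean in part F is a weighted mean: each group's mean counts in proportion to the group's size. A minimal Python sketch of that arithmetic follows. The function name `combined_mean` is made up for illustration; the numbers are the ones given in the exercise.

```python
def combined_mean(group_sizes, group_means):
    """Weighted mean of group means, weighting each mean by its group size."""
    total = sum(n * m for n, m in zip(group_sizes, group_means))
    return total / sum(group_sizes)

# 14 boys with mean 4.2 and 11 girls with mean 4.6 (values from the exercise)
overall = combined_mean([14, 11], [4.2, 4.6])
print(round(overall, 2))  # 4.38
```

Simply averaging 4.2 and 4.6 would give 4.4, which ignores that there are more boys than girls in the class.
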
30) In an experiment to determine whether seeding clouds with silver iodide increases rainfall,
52 clouds were randomly assigned to be seeded or not. The amount of rain they generated was
then measured (in acre-feet). Here are the summary statistics:

A. Which of the summary statistics are most appropriate for describing these distributions.
Why?
a. The IQR, median, and quartiles are the most appropriate. The rainfall amounts have
high outliers, and the skewness can be seen by comparing the median and the
mean.
B. Do you see any evidence that seeding clouds may be effective? Explain.
a. Yes, the seeded clouds had more rain. Their median is higher than
that of the unseeded clouds: the median for the seeded clouds is 5
times the median for the unseeded clouds.

35) Researchers tracked a population of 1,203,646 fruit flies, counting how many died each day
for 171 days. Here are three time plots offering different views of these data. One shows the
number of flies alive on each day, one the number who died that day, and the third the mortality
rate, the fraction of the number alive who died. On the last day studied, the last 2 flies died, for
a mortality rate of 1.0.

A. On approximately what day did the most flies die?
a. Day 18
B. On what day during the first 100 days did the largest proportion of flies die?
a. Day 63
C. When did the number of fruit flies alive stop changing very much from day to day?
a. Around day 52


36) Accidents involving drunk drivers account for about 40% of all deaths on the nation's
highways. The table tracks the number of alcohol-related fatalities for 26 years.
(www.alcoholalert.com)

A. Create a stem-and-leaf display or a histogram of these data.
a. (separate paper)
B. Create a timeplot.
a. (separate paper)
C. Using features apparent in the stem-and-leaf display (or histogram) and the timeplot,
write a few sentences about deaths caused by drunk driving.
a. The distribution is bimodal, with one cluster between 22 and 25 thousand deaths and
another between 16 and 17 thousand deaths. The number of deaths was high and has
decreased since around 1994.

37) Here is a histogram of the assets (in millions of dollars) of 79 companies chosen from the
Forbes list of the nation's top corporations:

A. What aspect of this distribution makes it difficult to summarize, or to discuss, center and
spread?
a. The distribution is skewed to the right. Most of the data is on the left side.
B. What would you suggest doing with these data if we want to understand them better?
a. I would suggest finding another way to display the data.

38) Students were asked how many songs they had in their digital music libraries. Here's a
display of the responses:

A. What aspect of this distribution makes it difficult to summarize, or to discuss, center and
spread?
a. The distribution is skewed to the right, so the center is hard to find. Most of the
song counts fall in the first bar of the histogram.
B. What would you suggest doing with these data if we want to understand them better?
a. The data should be re-expressed. By using square roots the graph would be
easier to understand and the center could be more easily determined.

39) Here are the same data you saw in Exercise 37 after re-expressions as the square root of
assets and the logarithm of assets:

A. Which re-expression do you prefer? Why?
a. The log re-expression: it makes the distribution a bit more symmetric, and the center is around 3.5 in log(assets).
B. In the square root re-expression, what does the value 50 actually indicate about the
company's assets?
a. Around $2,500 million, since 50 squared is 2,500.
C. In the logarithm re-expression, what does the value 3 actually indicate about the
company's assets?
a. Around $1,000 million, since 10 to the power 3 is 1,000.
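
The answers to parts B and C come from undoing each re-expression. The short Python sketch below shows that step, assuming the logarithm is base 10 (the exercise does not state the base here). The helper names `from_sqrt` and `from_log10` are hypothetical.

```python
def from_sqrt(value):
    """Undo a square-root re-expression by squaring."""
    return value ** 2

def from_log10(value):
    """Undo a base-10 logarithm re-expression by raising 10 to the value."""
    return 10 ** value

# Assets are measured in millions of dollars, so the results are too.
print(from_sqrt(50))   # 2500
print(from_log10(3))   # 1000
```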

40) The table lists the amount of rainfall (in acre-feet) from the 26 clouds seeded with silver
iodide discussed in Exercise 30:

A. Why is acre-feet a good way to measure the amount of precipitation produced by cloud
seeding?
a. One acre-foot is about 326,000 gallons, so acre-feet give more manageable numbers
than gallons in this situation.
B. Plot these data, and describe the distribution.
a. (separate paper)
C. Create a re-expression of these data that produces a more advantageous distribution.
a. Using logarithms, the distribution is more symmetric, and the center of the distribution
is around 2 to 2.5 in log(acre-feet).
D. Explain what your re-expressed scale means.
a. The re-expressed values are powers of ten: raising 10 to a re-expressed value converts
it back to acre-feet. For example, a value of 2 means 100 acre-feet.
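
Two pieces of arithmetic in this exercise can be checked with a short Python sketch: the size of an acre-foot in gallons (part A) and the meaning of values on the re-expressed scale (part D). It assumes US liquid gallons and a base-10 logarithm. The function names are made up for illustration.

```python
SQ_FT_PER_ACRE = 43_560                  # one acre is 43,560 square feet
CUBIC_INCHES_PER_CUBIC_FOOT = 12 ** 3    # 1,728
CUBIC_INCHES_PER_US_GALLON = 231         # definition of the US liquid gallon

def acre_feet_to_gallons(acre_feet):
    """Convert a volume in acre-feet to US gallons."""
    cubic_feet = acre_feet * SQ_FT_PER_ACRE  # an acre covered 1 foot deep
    return cubic_feet * CUBIC_INCHES_PER_CUBIC_FOOT / CUBIC_INCHES_PER_US_GALLON

def log10_to_acre_feet(log_value):
    """Convert a value on the log10 scale back to acre-feet."""
    return 10 ** log_value

print(round(acre_feet_to_gallons(1)))  # 325851

# Each step of 1 on the log scale is a tenfold increase in rainfall.
for tick in (1, 2, 3):
    print(tick, "->", log10_to_acre_feet(tick), "acre-feet")
```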
