2) Find an article in a newspaper, magazine, or the Internet that shows a time plot.
A. Does the article discuss the Ws? B. Is the timeplot appropriate for the data? Explain. C. Discuss what the time plot reveals about the variable. D. Does the article accurately describe and interpret the data? Explain.
7) Crowd Management Strategies monitors accidents at rock concerts. In their database, they list the names and other variables of victims whose deaths were attributed to crowd crush at rock concerts. Here are the histogram and boxplot of the victims' ages for data from 1999 to 2000:
A. What features of the distribution can you see in both the histogram and the boxplot? a. Mostly symmetric, slightly skewed to the right, with two outliers at 36 and 48. B. What features of the distribution can you see in the histogram that you could not see in the boxplot? a. The histogram shows a slight second rise that the boxplot does not; it looks like a second mode. C. What summary statistic would you choose to summarize the center of this distribution? Why? a. The median, because of the skew and outliers. D. What summary statistic would you choose to summarize the spread of this distribution? Why? a. The IQR, because of the skew and outliers.
8) The Men's Combined skiing event consists of a downhill and a slalom. Two displays of the slalom times in the Men's Combined at the 2006 Winter Olympics are shown below.
A. What features of the distribution can you see in both the histogram and the boxplot? a. Both show that a typical slalom time is around 92 seconds and that the range is more than 20 seconds. Both show that the distribution is skewed to the right. B. What features of the distribution can you see in the histogram that you could not see in the boxplot? a. The histogram shows a mode near 92 seconds. The boxplot, in turn, flags two possible outliers that the histogram does not. C. What summary statistic would you choose to summarize the center of this distribution? Why? a. The slalom times are skewed and contain outliers, so the median is the better summary of the center. D. What summary statistic would you choose to summarize the spread of this distribution? Why? a. The IQR is preferred to the standard deviation as a measure of spread, for the same reasons.
11) Here is a back-to-back stem-and-leaf display that shows two data sets at once, one going to the left, one to the right. The display compares the percent change in population for two regions of the United States (based on census figures for 1990 and 2000). The fastest growing states were Nevada at 66% and Arizona at 40%. To show the distributions better, this display breaks each stem into two lines, putting leaves 0-4 on one stem and leaves 5-9 on the other.
A. Use the data displayed in the stem-and-leaf display to construct comparative boxplots. a. (separate sheet of paper) B. Write a few sentences describing the difference in growth rates for the two regions of the United States. a. The northeast and midwest states are clustered. The southwest states are bimodal, with modes around 14 and 22. Each cluster is roughly symmetric.
13) The U.S. National Center for Health Statistics compiles data on the length of stay by patients in short-term hospitals and publishes its findings in Vital and Health Statistics. Data from a sample of 39 male patients and 35 female patients on length of stay (in days) are displayed in the histograms below.
A. What would you suggest be changed about these histograms to make them easier to compare? a. Both histograms should be drawn on the same scale. B. Describe these distributions by writing a few sentences comparing the duration of hospitalizations for men and women. a. Men have a mode of 1 day and women have a mode of 5 days. Both men's and women's stays vary a great deal. C. Can you suggest a reason for the peak in women's length of stay? a. Childbirth could be the reason for the peak.
17) In 1975, did men and women marry at the same age? Here are boxplots of the age at first marriage for a sample of U.S. citizens then. Write a brief report discussing what these data show. A. The distributions are similar in shape and spread, but women seem to marry younger than men.
20) Ozone levels (in parts per billion, ppb) were recorded at sites in New Jersey monthly between 1926 and 1971. Here are boxplots of the data for each month (over the 46 years), lined up in order (January = 1):
A. In what month was the highest ozone level ever recorded? a. April, 440 B. Which month has the largest IQR? a. February, 50 C. Which month has the smallest range? a. August, 50 D. Write a brief comparison of the ozone levels in January and June. a. January had a lower median ozone level than June (about 340 vs. 350). E. Write a report on the annual patterns you see in the ozone levels. a. The ozone levels were highest in the spring and lowest in the fall. They were most consistent in the summer and most variable in the winter.
21) Three Statistics classes all took the same test. Histograms and boxplots of the scores for each class are shown below. Match each class with the corresponding boxplot. A. A = 1, B = 2, C = 3
25) A student study of the effects of caffeine asked volunteers to take a memory test 2 hours after drinking soda. Some drank caffeine-free cola, some drank regular cola (with caffeine), and others drank a mixture of the two (getting a half-dose of caffeine). Here are the 5-number summaries for each group's scores (number of items recalled correctly) on the memory test:
A. Describe the Ws for these data. a. Who: Volunteers. What: Memory test. Where, when: Not specified. How: Volunteers took a memory test 2 hours after drinking caffeine-free, half-caffeine, or regular (full-caffeine) cola. Why: To see whether caffeine makes people more alert. B. Name the variables and classify each as categorical or quantitative. a. Test score: quantitative. Drink: categorical. C. Create side-by-side boxplots to display these results as best you can with this information. a. (separate paper) D. Write a few sentences comparing the performances of the three groups. a. The caffeine-free and half-dose groups both had medians of 21. The high-caffeine group was lower than the other two groups on every value of the 5-number summary.
26) Here are the summary statistics for Verbal SAT scores for a high school graduating class:
A. Create side-by-side boxplots comparing the scores of boys and girls as best you can from the information given. a. (separate paper) B. Write a brief report on these results. Be sure to discuss the shape, center, and spread of the scores. a. Females had higher first and third quartiles, and the females in the graduating class scored higher on the verbal SAT (median 625). The IQR of the male scores was smaller. The overall range of the male scores was greater than that of the female scores. Both distributions were skewed left.
27) How fast do horses run? Kentucky Derby winners top 30 miles per hour, as shown in this graph. The graph shows the percentage of Derby winners that have run slower than each given speed. Note that few have won running less than 33 miles per hour, but about 86% of the winning horses have run less than 37 miles per hour. (A cumulative frequency graph like this is called an ogive.)
A. Estimate the median winning speed. a. ~36 mph B. Estimate the quartiles. a. Q1 ≈ 34.5 mph, Q3 ≈ 36.5 mph C. Estimate the range and the IQR. a. Range ≈ 7 mph (from 31 to 38); IQR ≈ 2 mph D. Create a boxplot of these speeds. a. (separate paper) E. Write a few sentences about the speeds of the Kentucky Derby winners. a. The maximum was around 38 mph and the minimum was about 31 mph. The median winning speed was around 36 mph.
29) A class of fourth graders takes a diagnostic reading test, and the scores are reported by reading grade level. The 5-number summaries for the 14 boys and 11 girls are shown:
A. Which group had the highest score? a. Boys B. Which group had the greater range? a. Boys C. Which group had the greater interquartile range? a. Girls D. Which group's scores appear to be more skewed? Explain. a. The boys' scores are more skewed; the girls' quartiles are the same distance from the median. E. Which group generally did better on the test? Explain. a. Girls. Their median and upper quartile were larger, although their lower quartile was lower. F. If the mean reading level for boys was 4.2 and for girls was 4.6, what is the overall mean for the class? a. (14(4.2) + 11(4.6)) / 25 = 109.4 / 25 ≈ 4.38
30) In an experiment to determine whether seeding clouds with silver iodide increases rainfall, 52 clouds were randomly assigned to be seeded or not. The amount of rain they generated was then measured (in acre-feet). Here are the summary statistics:
A. Which of the summary statistics are most appropriate for describing these distributions? Why? a. The median, quartiles, and IQR are the most appropriate. The rainfall amounts have high outliers, and the skewness shows in the gap between the median and the mean. B. Do you see any evidence that seeding clouds may be effective? Explain. a. Yes, the seeded clouds produced more rain. Their median is higher than that of the unseeded clouds, about 5 times as large.
35) Researchers tracked a population of 1,203,646 fruit flies, counting how many died each day for 171 days. Here are three time plots offering different views of these data. One shows the number of flies alive on each day, one the number who died that day, and the third the mortality rate, the fraction of the number alive who died. On the last day studied, the last 2 flies died, for a mortality rate of 1.0.
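The overall class mean in Exercise 29, part F, is a weighted mean: each group's mean counts in proportion to the group's size. A minimal sketch of that arithmetic follows; the function name `combined_mean` is only an illustrative choice, and the inputs are the group sizes and means given in the exercise.

```python
def combined_mean(group_sizes, group_means):
    """Overall mean of several groups, weighting each group mean by its size."""
    total = sum(n * m for n, m in zip(group_sizes, group_means))
    return total / sum(group_sizes)


# 14 boys with mean 4.2 and 11 girls with mean 4.6 (values from Exercise 29 F)
overall = combined_mean([14, 11], [4.2, 4.6])
print(round(overall, 2))  # 4.38
```

Simply averaging 4.2 and 4.6 would give 4.4, which ignores that there are more boys than girls in the class.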
A. On approximately what day did the most flies die? a. Day 18 B. On what day during the first 100 days did the largest proportion of flies die? a. Day 63 C. When did the number of fruit flies alive stop changing very much from day to day? a. Around day 52
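The mortality rate in Exercise 35 is the number of flies that died on a day divided by the number alive at the start of that day. The sketch below only restates that definition; the function name `mortality_rate` is a hypothetical helper, and the only data used is the last-day figure quoted in the exercise.

```python
def mortality_rate(died, alive):
    """Fraction of the flies alive at the start of a day that died that day."""
    if alive <= 0:
        raise ValueError("need at least one fly alive to compute a rate")
    return died / alive


# Last day of the study: the last 2 flies alive both died.
print(mortality_rate(2, 2))  # 1.0
```

This is why the day with the most deaths (part A) and the day with the largest proportion of deaths (part B) can differ: the rate depends on how many flies are still alive.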
36) Accidents involving drunk drivers account for about 40% of all deaths on the nation's highways. The table tracks the number of alcohol-related fatalities for 26 years. (www.alcoholalert.com)
A. Create a stem-and-leaf display or a histogram of these data. a. (separate paper) B. Create a timeplot. a. (separate paper) C. Using features apparent in the stem-and-leaf display (or histogram) and the timeplot, write a few sentences about deaths caused by drunk driving. a. The distribution is bimodal, with one cluster between 22 and 25 thousand deaths and another between 16 and 17 thousand deaths. The number of deaths was high in the earlier years and has decreased since around 1994.
37) Here is a histogram of the assets (in millions of dollars) of 79 companies chosen from the Forbes list of the nation's top corporations:
A. What aspect of this distribution makes it difficult to summarize, or to discuss, center and spread? a. The distribution is skewed to the right. Most of the data are on the left side. B. What would you suggest doing with these data if we want to understand them better? a. I would suggest finding another way to display the data.
38) Students were asked how many songs they had in their digital music libraries. Here's a display of the responses:
A. What aspect of this distribution makes it difficult to summarize, or to discuss, center and spread? a. It is skewed to the right, so the center is hard to find. Most students' song counts fall in the first bar of the histogram. B. What would you suggest doing with these data if we want to understand them better? a. The data should be re-expressed. Using square roots would make the graph easier to understand and the center easier to determine.
39) Here are the same data you saw in Exercise 37 after re-expressions as the square root of assets and the logarithm of assets:
A. Which re-expression do you prefer? Why? a. The logarithm, because it makes the distribution a bit more symmetric; the center is around 3.5 on the log scale. B. In the square root re-expression, what does the value 50 actually indicate about the company's assets? a. Around $2500 million C. In the logarithm re-expression, what does the value 3 actually indicate about the company's assets? a. Around $1000 million
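The answers to parts B and C of Exercise 39 come from undoing each re-expression: squaring reverses a square root, and raising 10 to a power reverses a base-10 logarithm. A small sketch of that back-transformation follows, assuming the logarithm in the exercise is base 10; the helper names are illustrative only.

```python
def from_sqrt(value):
    """Undo a square-root re-expression."""
    return value ** 2


def from_log10(value):
    """Undo a base-10 logarithm re-expression."""
    return 10 ** value


# Assets are in millions of dollars.
print(from_sqrt(50))             # 2500
print(from_log10(3))             # 1000
print(round(from_log10(3.5)))    # 3162
```

The last line converts the approximate center of the log display, 3.5, back to roughly $3162 million in assets.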
40) The table lists the amount of rainfall (in acre-feet) from the 26 clouds seeded with silver iodide discussed in Exercise 30:
A. Why is acre-feet a good way to measure the amount of precipitation produced by cloud seeding? a. One acre-foot is about 326,000 gallons, so acre-feet are easier to manage than gallons in this situation. B. Plot these data, and describe the distribution. a. (separate paper) C. Create a re-expression of these data that produces a more advantageous distribution. a. After taking logarithms, the distribution is more symmetric, and its center is around 2 to 2.5 on the log scale. D. Explain what your re-expressed scale means. a. Each value on the re-expressed scale is a power of ten; raising 10 to that value converts it back to acre-feet.
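The gallon figure in part A of Exercise 40 can be checked from standard unit definitions: an acre is 43,560 square feet, so an acre-foot is 43,560 cubic feet, and a US gallon is 231 cubic inches. The sketch below assumes US liquid gallons; the constant and function names are illustrative choices.

```python
SQ_FT_PER_ACRE = 43_560                 # square feet in one acre
CUBIC_INCHES_PER_CUBIC_FOOT = 12 ** 3   # 1728
CUBIC_INCHES_PER_US_GALLON = 231        # definition of the US liquid gallon


def acre_feet_to_gallons(acre_feet):
    """Convert a volume in acre-feet to US gallons."""
    cubic_feet = acre_feet * SQ_FT_PER_ACRE  # one foot of depth over the acreage
    return cubic_feet * CUBIC_INCHES_PER_CUBIC_FOOT / CUBIC_INCHES_PER_US_GALLON


print(round(acre_feet_to_gallons(1)))  # 325851
```

Because a single acre-foot is already hundreds of thousands of gallons, reporting rainfall in acre-feet keeps the numbers at a manageable size.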