Chapter 1, Section 1
Introduction
Chapter 1 Introduction & Data Analysis: Making Sense of Data
1.1: Analyzing Categorical Data
After this section, you should be able to…
Favorite Course      Count    Percentage
English              8        16%
Foreign Language     4        8%
History              11       22%
[Figure: bar graph "Count of Stations" and pie chart "Percent of Stations" by radio format (Adult Contemporary, Adult Standards, Contemporary hit, Country, News/Talk, Rock, Spanish, Other)]
• What proportion of students that ride the school bus are members of two or more clubs?
• What proportion of students that are members of no clubs do not ride the school bus?
• What proportion of students that do not ride the school bus are members of at least one club?
• What proportion of males have “a good chance” at being rich?
• What proportion of females have a “50-50 chance” at being rich?
• What proportion of young adults that have an “almost certain” chance of being rich are male?
One 0 0 4 4
Two 1 3 12 16
Three 4 7 6 17
Four 7 4 8 19
Five 2 0 3 5
Total 14 14 33 61
Section 1.2
Displaying Quantitative Data with Graphs
1.2: Displaying Quantitative Data with Graphs
After this section, you should be able to…
Dotplots
– Each data value is shown as a dot above its location on a number line.
How to Make a Dotplot
1. Draw a horizontal axis (a number line) and label it with the variable name.
2. Scale the axis from the minimum to the maximum value.
Number of Goals Scored Per Game by the 2004 US Women’s Soccer Team
3 0 2 7 8 2 4 3 5 1 1 4 5 3 1 1 3
3 3 2 1 2 2 2 4 3 5 6 1 5 5 1 1 5
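The dotplot steps can be sketched in code. This is a minimal text version, not the slide's own graphic: the function name text_dotplot is hypothetical, and the plot is rotated so that each axis value gets its own row of dots.

```python
from collections import Counter

# Goals per game, copied from the slide (34 games).
goals = [3, 0, 2, 7, 8, 2, 4, 3, 5, 1, 1, 4, 5, 3, 1, 1, 3,
         3, 3, 2, 1, 2, 2, 2, 4, 3, 5, 6, 1, 5, 5, 1, 1, 5]

def text_dotplot(values):
    """Return one line per axis value, min to max, with one dot per observation."""
    counts = Counter(values)
    lines = []
    # Scale the axis from the minimum to the maximum value.
    for v in range(min(values), max(values) + 1):
        lines.append(f"{v:>2} | " + "." * counts[v])
    return lines

for line in text_dotplot(goals):
    print(line)
```

Values with no observations still get a row, so gaps in the distribution stay visible.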
Shape Definitions:
Symmetric: if the right and left sides of the graph are approximately mirror images of each other.
Skewed to the right (right-skewed) if the right side of the graph is much longer than the left side.
Skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.
[Figure: three dotplots, with axes DiceRolls (0 to 12), Score (70 to 100), and Siblings (0 to 7)]
Outliers
Definition: Values that differ from the overall pattern are outliers.
We will learn specific ways to find outliers in a later chapter. For now, we can only identify “potential outliers.”
[Figure: dotplot of MPG, axis from 14 to 34]
Center
We can describe the center by finding a value that divides the observations so that about half take larger values and about half take smaller values.
Ways to describe center:
• Calculate median (best when distribution is skewed)
• Calculate mean (best when distribution is symmetric)
3) Label and scale your axes and draw the histogram. The height of the bar equals its frequency. Adjacent bars should touch, unless a class contains no individuals.
Percent of foreign-born residents    Number of States
0 to <5                              20
5 to <10                             13
10 to <15                            9
15 to <20                            5
20 to <25                            2
25 to <30                            1
Total                                50
Section 1.3
Describing Quantitative Data with Numbers
After this section, you should be able to…
✓ MEASURE center with the mean and median
Measuring Center: The Mean
To find the mean x̄ (pronounced “x-bar”) of a set of observations, add their values and divide by the number of observations. If the n observations are x1, x2, x3, …, xn, their mean is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum x_i$$
$$\text{variance} = s_x^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$$
$$\text{standard deviation} = s_x = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$
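The variance and standard deviation formulas can be traced step by step in code. This is a sketch under an assumption: the nine values in pets are hypothetical, chosen only so that the mean is 5 and the squared deviations sum to 52, which matches the 52/(9-1) = 6.5 arithmetic shown on the slides.

```python
import math

# Hypothetical data (assumption): 9 observations with mean 5.
pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]

n = len(pets)
mean = sum(pets) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in pets)

# Divide by n - 1 for the sample variance, then take the square root.
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)

print(mean, sum_sq_dev, variance, round(std_dev, 3))
```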
8 52/(9-1) = 6.5 8 8 - 5 = 3
0 2 4 6 8 10
NumberOfPets 9 Variance = 6.5 9 9 - 5 = 4
Below are dotplots of three different distributions, A, B, and C. Which one has the largest standard deviation? Justify your answer.
TI-NSpire: Calculate standard deviation and mean.
1. Select “Lists & Spreadsheet” (blue/green button at bottom of home screen)
2. Type the values into list1.
Standard
Deviation
Find and Interpret the IQR…
Travel times to work for 20 randomly selected New Yorkers
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45
Sorted:
5 10 10 15 15 15 15 20 20 20 25 30 30 40 40 45 60 60 65 85
IQR = Q3 – Q1
= 42.5 – 15
= 27.5 minutes
Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes.
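The IQR calculation can be reproduced with a short sketch. It assumes the "median of each half" convention for quartiles, which is consistent with the Q1 = 15 and Q3 = 42.5 shown on the slide; the function name quartiles is hypothetical.

```python
from statistics import median

# Travel times (minutes) for the 20 New Yorkers, copied from the slide.
times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
         15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # observations below the median position
    upper = data[half + (n % 2):]  # observations above the median position
    return median(lower), median(data), median(upper)

q1, med, q3 = quartiles(times)
iqr = q3 - q1
print(q1, med, q3, iqr)
```

Other software may use a different quartile convention and report slightly different values for the same data.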
Identifying Outliers
In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.
In the New York travel time data, we found Q1=15 minutes, Q3=42.5 minutes, and IQR=27.5 minutes. Calculate the outlier cutoffs using the IQR rule.
For these data, 1.5 x IQR = 1.5(27.5) = 41.25
Q1 - 1.5 x IQR = 15 – 41.25 = -26.25
Q3 + 1.5 x IQR = 42.5 + 41.25 = 83.75
5 10 10 15 15 15 15 20 20 20 25 30 30 40 40 45 60 60 65 85
The Five-Number Summary
The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
Minimum Q1 M Q3 Maximum
[Figure: two plots of TravelTime, axis from 0 to 90]
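The outlier cutoffs can be applied to the sorted travel times in a few lines. This sketch takes Q1 and Q3 as given on the slide and assumes the usual form of the rule: a value is flagged when it falls more than 1.5 x IQR below Q1 or above Q3.

```python
# Sorted travel times (minutes), copied from the slide.
times = [5, 10, 10, 15, 15, 15, 15, 20, 20, 20,
         25, 30, 30, 40, 40, 45, 60, 60, 65, 85]

q1, q3 = 15, 42.5      # quartiles as stated on the slide
iqr = q3 - q1

low_cutoff = q1 - 1.5 * iqr
high_cutoff = q3 + 1.5 * iqr

# Values outside the cutoffs are flagged as outliers.
outliers = [t for t in times if t < low_cutoff or t > high_cutoff]

print(low_cutoff, high_cutoff, outliers)
```

With these cutoffs, the 85-minute commute is the only value flagged.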
Best Measure of
Center
Best Measure of
Spread
Chapter 2, Section 1
Section 2.1
Describing Location in a Distribution
2.1: Describing Location in a Distribution
After this section, you should be able to…
Other Percentiles: Deciles and Quartiles
Example: Deciles
The following are test scores (out of 100) for a particular math class.
44 56 58 62 64 64 70 72 72 72 74 74 75 78 78 79 80 82 82
84 86 87 88 90 92 95 96 96 98 100
Find the sixth decile.
Sixth decile = 60%
60% = 0.6
0.6(30) = 18
The average of the 18th and 19th items represents the 6th decile (D6).
60% of the scores were at or below 82.
Quartiles
For any set of data (ranked in order from least to greatest):
The second quartile, Q2 (50%) is the median.
The first quartile, Q1 (25%) is the median of items below Q2.
The third quartile, Q3 (75%) is the median of items above Q2.
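The decile calculation can be sketched as a small function. The whole-number case follows the slide (average the item at that position and the next one). The rule for a non-whole position, rounding up to the next item, is an assumption, since the slides only show the whole-number case; the name value_at_fraction is hypothetical.

```python
import math

# Test scores from the slide, already in order (30 scores).
scores = [44, 56, 58, 62, 64, 64, 70, 72, 72, 72, 74, 74, 75, 78, 78,
          79, 80, 82, 82, 84, 86, 87, 88, 90, 92, 95, 96, 96, 98, 100]

def value_at_fraction(sorted_data, fraction):
    """Locate the value at a given fraction (e.g. 0.6 for the 6th decile)."""
    position = fraction * len(sorted_data)
    if abs(position - round(position)) < 1e-9:
        k = round(position)
        # Whole-number position: average the k-th and (k+1)-th items.
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Assumption: otherwise round the position up to the next item.
    return sorted_data[math.ceil(position) - 1]

d6 = value_at_fraction(scores, 0.6)
print(d6)
```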
Other Percentiles: Deciles and Quartiles
Example: Quartiles
The following are test scores (out of 100) for a particular math class.
44 56 58 62 64 64 70 72 72 72 74 74 75 78 78 79 80 82 82
84 86 87 88 90 92 95 96 96 98 100
Age of First 44 Presidents When They Were Inaugurated
Table columns: Age, Frequency, Relative frequency, Cumulative frequency, Cumulative relative frequency
[Figure: graph with x-axis "Age at inauguration" (40 to 70)]
Section 2.2
2.2: Normal Distributions
Normal Distributions
After this section, you should be able to…
• Many statistical inference procedures are based on Normal distributions.
In the Normal distribution with mean µ and standard deviation σ:
• Approximately 68% of the observations fall within σ of µ.
• Approximately 95% of the observations fall within 2σ of µ.
• Approximately 99.7% of the observations fall within 3σ of µ.
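The 68-95-99.7 percentages can be checked against the standard Normal curve. This is a small sketch using Python's statistics.NormalDist; the variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} sd: {area:.4f}")
```

The exact areas are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule says "approximately."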
The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for 7th grade students in Gary, Indiana, is close to Normal. Suppose the distribution is N(6.84, 1.55) and the range is between 0 and 12.
a) Sketch the Normal density curve for this distribution.
b) Using the Empirical Rule, what percent of ITBS vocabulary scores are less than 3.74?
c) Using the Empirical Rule, what percent of the scores are between 5.29 and 9.94?
3. Determine the p-value by looking up the z-score in the Standard Normal table.
P(z < 0.81) = .7910
Z    .00    .01    .02
0.7  .7580  .7611  .7642
0.8  .7881  .7910  .7939
0.9  .8159  .8186  .8212
4. Conclude in context.
We expect that 79.1% of Nadal’s first serves will be less than 120 mph.
We expect that 20.9% of Nadal’s first serves will be greater than 120 mph.
Let’s Practice
When Tiger Woods hits his driver, the distance the ball travels can be described by N(304, 8). What percent of Tiger’s drives travel between 305 and 325 yards?
Step 1: Draw Distribution
Step 2: Z-Scores
When x = 325, $$z = \frac{325 - 304}{8} = 2.63$$
When x = 305, $$z = \frac{305 - 304}{8} = 0.13$$
Step 3: P-values
Using Table A, we can find the area to the left of z=2.63 and the area to the left of z=0.13.
0.9957 – 0.5517 = 0.4440.
Step 4: Conclude In Context
About 44% of Tiger’s drives travel between 305 and 325 yards.
Normal Calculations on Calculator
NormalCDF
Calculates: Probability of obtaining a value BETWEEN two values
Example: What percent of students scored between 70 and 95 on the test?
NormalPDF
Calculates: Probability of obtaining PRECISELY or EXACTLY a specific x-value
Example: What is the probability that Suzy scored a 75 on the test?
InvNorm
Calculates: X-value given probability or percentile
Example: Tommy scored a 92 on the test; what proportion of students did he score better than?
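The Tiger Woods calculation (z-scores, then area between them) can be checked in code. This sketch uses Python's statistics.NormalDist in place of Table A or a calculator's NormalCDF, so the result differs slightly from the table answer, which rounds each z-score to two decimals.

```python
from statistics import NormalDist

drive = NormalDist(mu=304, sigma=8)  # N(304, 8) from the problem

# Z-scores for the two boundaries.
z_low = (305 - 304) / 8
z_high = (325 - 304) / 8

# Area between the boundaries: area left of 325 minus area left of 305.
p_between = drive.cdf(325) - drive.cdf(305)

print(z_low, z_high, round(p_between, 4))
```

The unrounded answer is about 0.446, in line with the slide's conclusion of about 44%.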
Suzy bombed her recent AP Stats exam; she scored at the 25th percentile. The class average was a 170 with a standard deviation of 30. Assuming the scores are normally distributed, what score did Suzy earn on the exam?
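Suzy's problem goes from a percentile back to a score, which is the InvNorm direction. A minimal sketch, using inv_cdf from Python's statistics.NormalDist as a stand-in for the calculator function:

```python
from statistics import NormalDist

exam = NormalDist(mu=170, sigma=30)  # class average 170, standard deviation 30

# Score with 25% of the distribution below it (the 25th percentile).
suzy_score = exam.inv_cdf(0.25)

print(round(suzy_score, 1))
```

Under the Normal assumption, Suzy's score comes out a little under 150.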
When Can I Use Normal Calculations?!
• Whenever the distribution is Normal.
• Ways to Assess Normality:
– Plot the data.
• Make a dotplot, stemplot, or histogram and see if the graph is approximately symmetric and bell-shaped.
– Check whether the data follow the 68-95-99.7 rule.
– Construct a Normal probability plot.
Finding Areas Under the Standard Normal Curve
Find the proportion of observations from the standard Normal distribution that are between -1.25 and 0.81.
Step 1: Look up area to the left of 0.81 using table A.
Step 3: Subtract.
Section 3.1
3.1: Scatterplots & Correlation
Scatterplots and Correlation
After this section, you should be able to…
Scatterplots
1. Decide which variable should go on each axis.
• Remember, the eXplanatory variable goes on the X-axis!
2. Label and scale your axes.
3. Plot individual data values.
Scatterplots
Make a scatterplot of the relationship between body weight and pack weight. Body weight is our eXplanatory variable.
Body weight (lb)      120 187 109 103 131 165 158 116
Backpack weight (lb)  26 30 26 24 29 35 31 28
Describe and interpret the scatterplot below. The y-axis refers to backpack weight in pounds and the x-axis refers to body weight in pounds.
Sample Answer:
There is a moderately strong, positive, linear relationship between body weight and pack weight. There is one possible outlier: the hiker with the body weight of 187 pounds seems to be carrying relatively less weight than the other group members. It appears that lighter students are carrying lighter backpacks.
Describe and interpret the scatterplot below. The y-axis refers to a school’s mean SAT math score. The x-axis refers to the percentage of students at a school taking the SAT.
Sample Answer:
There is a moderately strong, negative, curved relationship between the percent of students in a state who take the SAT and the mean SAT math score.
Further, there are two distinct clusters of states and at least one possible outlier that falls outside the overall pattern.
Chapter 3, Section 1
• 0.235
• -0.456
• 0.975
• -0.784
$$r = \frac{1}{n - 1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
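The correlation formula can be applied directly to the body weight and backpack weight data from the scatterplot slide. This is a sketch that follows the formula term by term (standardize each value, multiply, sum, divide by n - 1); the function name correlation is hypothetical.

```python
from statistics import mean, stdev

# Hiker data from the scatterplot slide.
body = [120, 187, 109, 103, 131, 165, 158, 116]   # body weight (lb)
pack = [26, 30, 26, 24, 29, 35, 31, 28]           # backpack weight (lb)

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations (n - 1)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

r = correlation(body, pack)
print(round(r, 3))
```

The value is positive and fairly close to 1, consistent with the "moderately strong, positive, linear" description of this scatterplot.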
Section 3.2
3.2: Least Squares Regressions
Least-Squares Regression
After this section, you should be able to…
Format 2:
Predicted back pack weight= 16.3 +
0.0908(student’s weight)
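The backpack equation can be reproduced from data. This sketch assumes the equation was fit to the eight hikers listed on the scatterplot slide, and it uses the standard least-squares formulas (slope = Sxy / Sxx, intercept = ȳ minus slope times x̄), which are not spelled out on the slides.

```python
from statistics import mean

# Hiker data from the scatterplot slide (assumed source of the equation).
body = [120, 187, 109, 103, 131, 165, 158, 116]   # student's weight (lb)
pack = [26, 30, 26, 24, 29, 35, 31, 28]           # backpack weight (lb)

x_bar, y_bar = mean(body), mean(pack)

# Sums of cross-products and squared deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(body, pack))
s_xx = sum((x - x_bar) ** 2 for x in body)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

def predict(weight):
    """Predicted backpack weight for a given student weight."""
    return intercept + slope * weight

print(round(intercept, 1), round(slope, 4))
```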
= 37(12) + 270
Predicted Score: 714 points
Self Check Quiz! A crazy professor believes that a child with IQ 100 should
have a reading test score of 50, and that reading score should
increase by 1 point for every additional point of IQ. What is
the equation of the professor’s regression line for predicting
reading score from IQ? Be sure to identify all variables used.
Self Check Quiz: Calculate the Regression Equation
Answer:
ŷ = 50 + x
ŷ = predicted reading score
x = number of IQ points above 100
Self Check Quiz: Interpreting Regression Lines & Predicted Value
Data on the IQ test scores and reading test scores for a group of fifth-grade children resulted in the following regression line:
predicted reading score = −33.4 + 0.882(IQ score)
(a) What’s the slope of this line? Interpret this value in context.
(b) What’s the y-intercept? Explain why the value of the intercept is not statistically meaningful.
(c) Find the predicted reading scores for two children with IQ scores of 90 and 130, respectively.
TI-NSpire: LSRL
1. Enter x data into list 1 and y data into list 2.
2. Press MENU, 4: Statistics, 1: Stat Calculations
3. Select Option 4: Linear Regression.
4. Insert either name of list or a[] for x and name of list or b[] for y. Press ENTER.
residual
Negative residuals
(below line)
Pattern in residuals
Linear model not
appropriate
2.
$$s = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$
Answer:
S = 0.740
S represents the typical or average error (residual).
Interpretation: When using the least-squares regression line, predictions of fat gain are typically off by about 0.740 kilograms.
Positive residual = UNDER predicts
Negative residual = OVER predicts
Self Check Quiz!
The data is a random sample of 10 trains comparing number of cars on the train and fuel consumption in pounds of coal.
• What is the regression equation? Be sure to define all variables.
• What is r2 telling you?
• Define and interpret the slope in context. Does it have a practical interpretation?
• Define and interpret the y-intercept in context.
• What is s telling you?
1. ŷ = 2.1495x + 10.667
ŷ = predicted fuel consumption in pounds of coal
x = number of rail cars
2. 96.7% of the variation in fuel consumption is explained by the linear relationship with the number of rail cars.
3. Slope = 2.1495. With each additional car, the predicted fuel consumption increases by 2.1495 pounds of coal, on average. This makes practical sense.
4. Y-intercept = 10.667. When there are no cars attached to the train, the predicted fuel consumption is 10.667 pounds of coal. This has no practical interpretation because there is always at least one car, the engine.
5. S = 4.361. When using the least-squares regression line, predictions of fuel consumption are typically off by about 4.361 pounds of coal.
The left graph is perfectly linear. In the right graph, the last value was
changed from (5, 5) to (5, 8)…clearly influential, because it changed
the graph significantly. However, the residual is very small.
NO!!!
Section 4.1
4.1: Samples & Surveys
Samples and Surveys
After this section, you should be able to…
Methods of Selecting an SRS
• Draw names from a hat
• Assign each person in the group a number and randomly generate the chosen numbers
– Ways to randomly generate numbers
• Computer
• Random Table of Digits
• Calculator
How to Make a Simple Random Sample
Use of a Random Digits Table
This is just a portion of an entire random digits table. Ordinarily, the starting point is randomly selected. For this example we start at the left end of row 2.
We read digits in groups of two since that is the number of digits in our labels. (If we had one digit labels we would read one digit at a time; for three digit labels, read three digits at a time, etc.)
(Spaces between digits are there simply to make the chart easier to read.)
Our selection is {21,09,08,20,18}.
Comparing our selection {21,09,08,20,18} and our student list (sampling frame) we determine our SRS.
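Reading a random digits table can be sketched as a short function. Everything specific here is an assumption for illustration: the row of digits is made up (the slide's actual table row is not reproduced in the text), the class size of 25 is hypothetical, and skipping out-of-range labels and repeats is the usual convention rather than something the slide states.

```python
def srs_from_digits(digit_row, population_size, sample_size):
    """Read labels from a row of random digits and return the chosen labels."""
    width = len(str(population_size))          # digits per label
    digits = digit_row.replace(" ", "")        # spaces are only for readability
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip labels outside 1..population_size and labels already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

# Hypothetical row of digits (assumption), read two digits at a time.
row = "21930 90844 20211 877"
sample = srs_from_digits(row, population_size=25, sample_size=5)
print(sample)
```

Each label in the result is then matched against the student list (sampling frame) to name the students in the sample.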
Chapter 4, Section 1
Errors?!
• How much do you weigh?
• Will you not vote for President Obama’s reelection?
• Why should guns be outlawed?
• How often do you exercise?
• How many cigarettes do you smoke each week?
• How often should Mrs. Daniel give quizzes?
4.2: Experiments
SAT Survey vs. SAT Experiment
Describe a survey and an experiment that can be used to determine the relationship between SAT scores and hours studied.
• Survey: Asking students about how many hours they studied for the SAT and their resulting scores.
• Experiment: Selecting a group of same IQ students and assigning each student a different random number of hours to study for the SAT. The student is ONLY allowed to study the mandated number of hours. Then, compare their resulting scores.
Group Treatment
2 2
Block Design
• A block is a group of experimental units or
subjects that are known before the
experiment to be similar in some way that is
expected to affect the response to the
treatments.
• In a block design, the random assignment of
units to treatments is carried out separately
within each block.
• Helps control for lurking variables.
Block Design
• Experiments are often blocked by
– Age
– Gender
– Race
– Achievement Level (Regular, Honors, AP, IQ level,
etc.)
Data Ethics
• All planned studies must be reviewed in advance by an
institutional review board charged with protecting the
safety and well-being of the subjects.
• All individuals who are subjects in a study must give their
informed consent before data are collected.
• All individual data must be kept confidential. Only
statistical summaries for groups of subjects may be made
public.
Chapter 5, Section 1
Section 5.1
5.1: Randomness, Probability and Simulation
Randomness, Probability and Simulation
After this section, you should be able to…
Myths about Randomness
The myth of short-run regularity:
The idea of probability is that randomness is predictable in the long run (1 million plus occurrences). Probability does not allow us to make short-run predictions.
The myth of the “law of averages”:
Probability tells us random behavior evens out in the long run. Future outcomes are not affected by past behavior.
Women have a 50% chance of having a boy with each pregnancy; the gender of any previous children does not matter!
Performing a Simulation
The imitation of chance behavior, based on a model that accurately reflects the situation, is called a simulation. Simulations are usually done with a table of random digits, calculator random number generator (RandInt) or computer software.
State: Identify the probability calculation of interest.
Plan: Describe how to use a chance device/tool to implement one repetition of the process. Explain clearly how to identify the outcomes of the chance process.
Do: Perform many (at least 20) repetitions of the simulation.
Conclude: Use the results of your simulation to answer the question of interest, in context.
**See 5.1 WS
STATE:
• What is the probability that the lottery would result in two winners from the AP Stats class?
• P(X=2), where X is the number of winners from AP Stats
PLAN:
Using the table of random digits, we will randomly assign each student a two digit number from 01 to 95. We’ll label the students in the AP Statistics class from 01 to 28, and the remaining students from 29 to 95. (Numbers from 96 to 00 will be skipped.) Starting at the randomly selected row 139 and moving left to right across the row, we’ll look at pairs of digits until we come across two different values from 01 to 95. The two students with these labels will win the prime parking spaces. We will record whether both winners are members of the AP Statistics class (Yes or no). We will conduct the simulation 18 times.
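The parking-lottery plan can also be run with computer software instead of a random digits table. This is a sketch, not the slide's procedure: it uses Python's random module, a fixed seed, and many more repetitions than 18. The function name estimate_both_ap is hypothetical.

```python
import random

def estimate_both_ap(reps, seed=0):
    """Estimate P(both winners are AP Stats students) by simulation."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    both_ap = 0
    for _ in range(reps):
        # Labels 1-28 are AP Stats students; 29-95 are everyone else.
        winners = rng.sample(range(1, 96), 2)   # two different labels
        if all(label <= 28 for label in winners):
            both_ap += 1
    return both_ap / reps

estimate = estimate_both_ap(100_000)

# Exact value for comparison: first winner AP, then second winner AP.
exact = (28 / 95) * (27 / 94)

print(round(estimate, 4), round(exact, 4))
```

With only 18 repetitions, an estimate can land well away from the exact value of roughly 8.5%, which is one reason to run more repetitions when software makes that cheap.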
• What about repeat digits or ignored digits? ✓|✓ Sk|X|X X|✓ ✓|X X|X X|X ✓|✓ X|Sk|X Sk|✓|✓
CONCLUDE:
Based on 18 repetitions of our simulation, both winners came from the AP Statistics class 3 times, so the probability is estimated as 16.67%. Therefore, it is definitely possible for two AP Stats students to be selected in a “fair” drawing.
NASCAR
In an attempt to increase sales, a breakfast cereal company decides to offer a NASCAR promotion. Each box of cereal will contain a collectible card featuring one of these NASCAR drivers: Jeff Gordon, Dale Earnhardt, Jr., Tony Stewart, Danica Patrick, or Jimmie Johnson. The company says that each of the 5 cards is equally likely to appear in any box of cereal. A NASCAR fan decides to keep buying boxes of the cereal until she has all 5 drivers’ cards. She is surprised when it takes her 23 boxes to get the full set of cards. Should she be surprised? Design and carry out a simulation to help answer this question.
STATE:
What is the probability of needing to buy 23 or more cereal boxes to obtain one card from each driver?
PLAN:
Using the calculator's random number generator (RandInt) we are going to simulate 50 trials. We will assign each driver a unique number 1 through 5. In each trial, we will generate digits until all five values (drivers) have appeared, and record the total number of digits required.
Driver Label
Jeff Gordon 1
Dale Earnhardt, Jr. 2
Tony Stewart 3
Danica Patrick 4
Jimmie Johnson 5
DO:
Dotplot of 50 Trials
CONCLUDE:
We never had to buy more than 22 boxes to get the full set of cards in 50 repetitions of our simulation. Our estimate of the probability that it takes 23 or more boxes to get a full set of drivers' cards is roughly 0. Therefore, she should be surprised that it took 23 cereal box purchases to find all 5 driver cards.
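The NASCAR simulation can be sketched with computer software in place of RandInt. The function names, the fixed seed, and the 20,000 repetitions are all assumptions for illustration; the labels 1 through 5 for the drivers follow the slide.

```python
import random

def boxes_needed(rng, n_cards=5):
    """Buy boxes until every card label 1..n_cards has appeared; return the count."""
    seen = set()
    boxes = 0
    while len(seen) < n_cards:
        seen.add(rng.randint(1, n_cards))   # each card equally likely
        boxes += 1
    return boxes

def estimate_p_at_least(threshold, reps, seed=0):
    """Estimate P(number of boxes needed >= threshold)."""
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(reps) if boxes_needed(rng) >= threshold)
    return hits / reps

estimate = estimate_p_at_least(23, reps=20_000)
print(round(estimate, 4))
```

A longer run like this typically gives a small but nonzero estimate (a few percent), so needing 23 boxes is unusual but not impossible; 50 trials are too few to see such a rare outcome reliably.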
Section 5.2
5.2: Probability Rules
Probability Rules
After this section, you should be able to…
Sample Space
36 Outcomes
Since the dice are fair, each outcome is equally likely. Each outcome has probability 1/36.
Event Space:
There are 4 different combinations of dice rolls that sum to 5.
Solution:
Since each outcome has probability 1/36:
P(A) = 4/36 or 1/9.
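The two-dice sample space is small enough to list completely. A short sketch that enumerates all 36 outcomes and counts those in the event "sum is 5":

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die) for two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two dice sum to 5.
event_a = [outcome for outcome in sample_space if sum(outcome) == 5]

# Equally likely outcomes: P(A) = (outcomes in A) / (outcomes in sample space).
p_a = Fraction(len(event_a), len(sample_space))

print(len(sample_space), event_a, p_a)
```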
2014 AP Statistics Exam Scores Probabilities:
Score 1 2 3 4 5
Online-learning courses are rapidly gaining popularity among college students. Randomly select an undergraduate student who is taking online-learning courses for credit and record the student’s age. Here is the probability model:
Age group (yr): 18 to 23 24 to 29 30 to 39 40 or over
Probability: 0.57 0.17 0.14 0.12
(a) Is this a legitimate probability model? Justify.
Each probability is between 0 and 1 and
0.57 + 0.17 + 0.14 + 0.12 = 1
(b) Find the probability that the chosen student is not in the traditional college age group (18 to 23 years).
P(not 18 to 23 years) = 1 – P(18 to 23 years)
= 1 – 0.57 = 0.43
What is the relationship between educational achievement and home ownership? A random sample of 500 people was selected, and each member of the sample was identified as a high school graduate (or not) and as a home owner (or not). The two-way table displays the data.
                    High School Graduate    Not a High School Graduate    Total
Homeowner           221                     119                           340
Not a Homeowner     89                      71                            160
Total               310                     190                           500
What is the probability that a randomly selected person…
(a) is a high school graduate = 310/500
(b) is a high school graduate and owns a home = 221/500
(c) is a high school graduate or owns a home = (310 + 119)/500 = 429/500
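The three answers can be checked from the table counts. This sketch computes part (c) with the general addition rule, P(A or B) = P(A) + P(B) - P(A and B), which gives the same 429/500 as adding the graduates to the non-graduate homeowners. The dictionary keys are hypothetical labels.

```python
from fractions import Fraction

# Counts from the two-way table: (education, ownership) -> count.
counts = {
    ("grad", "owner"): 221, ("not grad", "owner"): 119,
    ("grad", "not owner"): 89, ("not grad", "not owner"): 71,
}
total = sum(counts.values())

p_grad = Fraction(sum(c for (edu, _), c in counts.items() if edu == "grad"), total)
p_owner = Fraction(sum(c for (_, own), c in counts.items() if own == "owner"), total)
p_both = Fraction(counts[("grad", "owner")], total)

# General addition rule: subtract the overlap so it is not counted twice.
p_either = p_grad + p_owner - p_both

print(p_grad, p_both, p_either)
```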
5.3: Conditional Probability After this section, you should be able to…
A. What is the probability of getting an even number? 4/8 or 1/2
B. What is the probability of getting a prime number? 4/8 or 1/2
C. What is the probability of getting a multiple of 3? 2/8 or 1/4
A. What is the probability of getting 2 even spins in a row? 1/4
B. What is the probability of getting a prime number or an odd number? 5/8
C. What is the probability of getting a multiple of 3 or an even spin? 5/8
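The spinner answers can be checked by listing the sections. This sketch assumes the spinner has 8 equal sections numbered 1 to 8, as described on the Consecutive Probability slide, and treats 1 as not prime. The helper name prob is hypothetical.

```python
from fractions import Fraction

sections = range(1, 9)   # assumed spinner: equal sections numbered 1 to 8

def prob(event):
    """P(event) for one spin, with all sections equally likely."""
    favorable = sum(1 for s in sections if event(s))
    return Fraction(favorable, len(sections))

def is_even(s): return s % 2 == 0
def is_odd(s): return s % 2 == 1
def is_prime(s): return s in {2, 3, 5, 7}   # 1 is not prime
def is_mult3(s): return s % 3 == 0

p_even = prob(is_even)
p_prime = prob(is_prime)
p_mult3 = prob(is_mult3)
p_two_even = p_even * p_even                       # independent spins
p_prime_or_odd = prob(lambda s: is_prime(s) or is_odd(s))
p_mult3_or_even = prob(lambda s: is_mult3(s) or is_even(s))

print(p_even, p_prime, p_mult3, p_two_even, p_prime_or_odd, p_mult3_or_even)
```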
$$P(B \mid A) = \frac{0.05}{0.40} = 0.125$$
There is a 12.5% chance that a randomly selected resident
who reads USA Today also reads the New York Times.
Sample Space:
HH HT TH TT
Example: Teens with Online Profiles
The Pew Internet and American Life Project finds that 93% of teenagers (ages 12 to 17) use the Internet, and that 55% of online teens have a Facebook profile.
What percent of teens are online and have a Facebook profile?
P(online) = 0.93
P(profile | online) = 0.55
P(online and have profile) = P(online) • P(profile | online)
= (0.93)(0.55)
= 0.5115
Consecutive Probability
Assume a spinner has 8 equal sized sections;
each section is numbered a unique number
from 1 to 8. You spin the spinner three times.
B. What proportion of adults are 18 to 29 year old Internet users that visit video-sharing sites?
.27 x .7 = .189
C. What proportion of adults are 30 to 49 year old Internet users that visit video-sharing sites?
.45 x .51 = .2295
D. What proportion of adults are 50 and over year old Internet users that visit video-sharing sites?
.28 x .26 = .0728
E. What proportion of all adult Internet users visit video-sharing sites? Do most Internet users visit YouTube and/or similar sites? Justify your answer.
P(video yes ∩ 18 to 29) = 0.27 • 0.7 = 0.1890
P(video yes ∩ 30 to 49) = 0.45 • 0.51 = 0.2295
P(video yes ∩ 50+) = 0.28 • 0.26 = 0.0728
P(video yes) = 0.1890 + 0.2295 + 0.0728 = 0.4913
49.13% of all adult Americans that use the Internet watch videos online. While 49.13% represents a large proportion of the population, it is not a majority, so it is not fair to say “most” adult American Internet users visit video-sharing sites.
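Parts B through E follow one pattern: multiply each age group's share by its conditional probability of visiting video-sharing sites, then add the pieces. A small sketch of that calculation; the dictionary layout and names are illustrative only.

```python
# Age group -> (share of adult Internet users, P(visits video sites | age group)),
# using the values from the slide.
groups = {
    "18 to 29": (0.27, 0.70),
    "30 to 49": (0.45, 0.51),
    "50+": (0.28, 0.26),
}

# General multiplication rule for each branch of the tree.
joint = {age: share * p_video for age, (share, p_video) in groups.items()}

# Add the branches to get the overall probability.
p_video_yes = sum(joint.values())

for age, p in joint.items():
    print(age, round(p, 4))
print("P(video yes) =", round(p_video_yes, 4))
```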
P(joint1 OK and joint 2 OK and joint 3 OK and joint 4 OK and joint 5 OK and joint 6 OK)
=P(joint 1 OK) • P(joint 2 OK) • … • P(joint 6 OK)
=(0.977)(0.977)(0.977)(0.977)(0.977)(0.977) = 0.87
Chapter 6, Section 1
Value: 0 1 2 3 4 5 6 7 8 9 10
Probability: 0.001 0.006 0.007 0.008 0.012 0.020 0.038 0.099 0.319 0.437 0.053
= 0.9868 – 0.9306
= 0.0562
Transformations on Random Variables
… on the shape, center, and spread of a distribution of data.
Remember:
1. Adding (or subtracting) a constant, a, to each observation:
• Adds a to measures of center and location.
• Does not change the shape or measures of spread.
2. Multiplying (or dividing) each observation by a constant, b:
• Multiplies (divides) measures of center and location by b.
• Multiplies (divides) measures of spread by |b|.
• Does not change the shape of the distribution.
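The two transformation rules can be seen on a small data set. The values, the constant a = 10, and the constant b = -3 are all hypothetical, chosen only to show the effect on the mean and the standard deviation (a negative b makes the |b| in the spread rule visible).

```python
import math
from statistics import mean, stdev

data = [2, 4, 4, 6, 9]     # hypothetical observations
a, b = 10, -3              # hypothetical constants

shifted = [x + a for x in data]   # add a to every observation
scaled = [x * b for x in data]    # multiply every observation by b

print(mean(data), round(stdev(data), 3))
print(mean(shifted), round(stdev(shifted), 3))   # center moves by a, spread unchanged
print(mean(scaled), round(stdev(scaled), 3))     # center times b, spread times |b|
```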
The mean of C is
$562.50 and the
standard deviation is
$163.50.
Adding the same number a (which could be negative) to Probability pi 0.15 0.25 0.35 0.20 0.05
Compare the shape, center, and spread of the two probability distributions.
Bottom Line:
Whether we are dealing with data or random variables, the effects of a linear transformation are the same!!!
Combining Random Variables
Passengers yi     75   76   77   78
Probability pi    0.3  0.4  0.2  0.1
Mean µA = 76.1 Standard Deviation σA = 0.943
E(T) = µT = µX + µY
In general, the mean of the sum of several random variables is the sum of their means.
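The mean and standard deviation quoted for the passenger distribution can be recomputed from the table. This sketch uses the standard definitions for a discrete random variable (mean = sum of value times probability, variance = sum of squared deviation times probability), which are not written out on the slide.

```python
import math

# Passenger counts and their probabilities, from the slide.
values = [75, 76, 77, 78]
probs = [0.3, 0.4, 0.2, 0.1]

# Mean: weight each value by its probability.
mu = sum(v * p for v, p in zip(values, probs))

# Variance: probability-weighted squared deviations from the mean.
variance = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
sigma = math.sqrt(variance)

print(round(mu, 1), round(sigma, 3))
```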
CONCLUDE:
There is an 84.8% chance that Mrs. Daniel’s iced coffee will taste right.
6. Final calculations:
0.9633 − 0.1867 = 0.7766.
CONCLUDE:
There is a 35.94% chance that Mrs. Daniel will score higher than
Mrs. Cooper on any given night.
Binomial Distribution: Mean and Standard Deviation
If a count X has the binomial distribution with number of trials n and probability of success p, the mean and standard deviation of X are
$$\mu_X = np$$
$$\sigma_X = \sqrt{np(1 - p)}$$
Find the mean and standard deviation of X.
X is a binomial random variable with parameters n = 21 and p = 1/3.
pi 0.2373 0.3955 0.2637 0.0879 0.0147 0.00098
$$\mu_X = np = 21(1/3) = 7$$
$$\sigma_X = \sqrt{np(1 - p)} = \sqrt{21(1/3)(2/3)} = 2.16$$
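The binomial mean and standard deviation formulas take one line each in code. A sketch for the n = 21, p = 1/3 example:

```python
import math

n, p = 21, 1 / 3   # binomial parameters from the example

mu = n * p                            # mean: np
sigma = math.sqrt(n * p * (1 - p))    # standard deviation: sqrt(np(1 - p))

print(round(mu, 2), round(sigma, 2))
```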
$$\mu_X = np \qquad \sigma_X = \sqrt{np(1 - p)}$$
As a rule of thumb, we will use the Normal approximation
when n is so large that np ≥ 10 and n(1 – p) ≥ 10. That is, the
expected number of successes and failures are both at least
10. We use the normal approximation more in Chapters 8-10.
Consider the normal approximation for this setting. Binomcdf(2500, 0.6, 1520, 2500) = 0.213139
OR:
Normal: Since np = 2500(0.60) = 1500 and n(1 – p) = 2500(0.40) = 1000 are both at least 10, we may use the Normal approximation.
Calculate the mean. µ = np = 2500(0.60) = 1500
Calculate standard deviation. $\sigma = \sqrt{np(1-p)} = \sqrt{2500(0.60)(0.40)} = 24.49$
Use Calculator
Normalcdf (1520, 2500, 1500, 24.49) = 0.207061
CONCLUDE:
There is a 20.71% chance that 1520 or more of the people in the sample agree.

Geometric Settings
A geometric setting arises when we perform independent trials of the same chance process and record the number of trials until a particular outcome occurs. The four conditions for a geometric setting are
B: Binary? The possible outcomes of each trial can be classified as “success” or “failure.”
I: Independent? Trials must be independent; that is, knowing the result of one trial must not have any effect on the result of any other trial.
T: Trials? The goal is to count the number of trials until the first success occurs.
S: Success? On each trial, the probability p of success must be the same.
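The exact binomial probability and its Normal approximation for the n = 2500, p = 0.60 survey can be compared in Python. This is a sketch using only the standard library; the helper names are hypothetical, and the binomial terms are computed on the log scale because the raw powers of p underflow for n this large.

```python
from math import erfc, exp, lgamma, log, sqrt

def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed on the log scale to avoid underflow."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def normal_upper_tail(x, mu, sigma):
    """P(X >= x) for a Normal(mu, sigma) variable."""
    return 0.5 * erfc((x - mu) / (sigma * sqrt(2)))

n, p, cutoff = 2500, 0.60, 1520
exact = sum(binom_pmf(k, n, p) for k in range(cutoff, n + 1))
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = normal_upper_tail(cutoff, mu, sigma)

print(round(exact, 4), round(approx, 4))
```

The two answers differ in the third decimal place, which is the usual size of the error when no continuity correction is applied.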
Binomial formula labels: number of arrangements of k successes; probability of k successes; probability of n − k failures.
Chapter 7, Section 1
Section 7.1
What Is a Sampling Distribution?
After this section, you should be able to…
✓ DISTINGUISH between a parameter and a statistic
✓ DEFINE sampling distribution
✓ DISTINGUISH between population distribution, sampling distribution, and the distribution of sample data
✓ DETERMINE whether a statistic is an unbiased estimator of a population parameter
✓ DESCRIBE the relationship between sample size and the variability of an estimator
Population → Sample: Collect data from a representative Sample... Make an Inference about the Population.
Sampling Distribution
The sampling distribution of a statistic is the distribution of
values taken by the statistic in all possible samples of the
same size from the same population.
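To make "all possible samples" concrete, here is a minimal Python sketch that lists every sample of size 2 from a tiny population and computes the sample mean for each. The population values and the sample size are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population, used only to make "all possible samples" concrete.
population = [2, 4, 6, 8, 10]
n = 2  # assumed sample size

# The statistic (sample mean) evaluated on every possible sample of size n.
sample_means = [mean(s) for s in combinations(population, n)]

print(len(sample_means))      # number of possible samples
print(sorted(sample_means))   # the sampling distribution of the sample mean
print(mean(sample_means), mean(population))
```

The last line compares the center of the sampling distribution with the population mean, which is the check behind calling the sample mean an unbiased estimator.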
Section 7.2
Sample Proportions
After this section, you should be able to…
n =100 n =400
p (1 - p ) conditions:
Normal Approximation & Sample Proportions
A polling organization asks an SRS of 1500 first-year college students how far away their home is. Suppose that 35% of all first-year students actually attend college within 50 miles of home.
What is the probability that the random sample of 1500 students will give a result within 2 percentage points of this true value?

We have an SRS of size n = 1500 drawn from a population in which the proportion p = 0.35 attend college within 50 miles of home.
$\mu_{\hat{p}} = 0.35$, $\sigma_{\hat{p}} = \sqrt{\frac{(0.35)(0.65)}{1500}} = 0.0123$
Conditions:
Independence: It is reasonable to assume that there are more than 15,000 college freshmen and therefore the sample represents less than 10% of the population.
Normality: Additionally, np = 1500(0.35) = 525 and n(1 – p) = 1500(0.65) = 975 are both greater than 10, so it is reasonable to assume normality.
Normalcdf (0.33, 0.37, 0.35, 0.0123) = 0.896054
CONCLUDE: There is an 89.61% chance that the sample will yield results within 2 percentage points of the true value.

What is the probability that the proportion in an SRS of 100 students is as small as or smaller than the result of the administration’s sample?
$\mu_{\hat{p}} = 0.67$, $\sigma_{\hat{p}} = \sqrt{\frac{(0.67)(0.33)}{100}} = 0.04702$
Be sure to include labels!
Conditions:
Independence: It is reasonable to assume that there are more than 1000 college freshmen and therefore the sample represents less than 10% of the population.
Normality: Additionally, np = 100(0.67) = 67 and n(1 – p) = 100(0.33) = 33 are both greater than 10, so it is reasonable to assume normality.
Normalcdf (0, 0.62, 0.67, 0.04702) = 0.143805
CONCLUDE: There is a 14.38% chance that the sample will yield results at or below 62% given that the true population proportion is 67%.
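Both Normalcdf calculations can be checked with Python's standard library. This is a sketch, not the calculator procedure: the helper name is hypothetical, and it keeps the unrounded standard deviation, so the first answer differs slightly from the calculator value in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def phat_distribution(p, n):
    """Normal approximation to the sampling distribution of a sample proportion."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# Example 1: p = 0.35, n = 1500, result within 2 percentage points of p.
college = phat_distribution(0.35, 1500)
within_2_points = college.cdf(0.37) - college.cdf(0.33)

# Example 2: p = 0.67, n = 100, sample proportion at or below 0.62.
admin = phat_distribution(0.67, 100)
at_or_below_62 = admin.cdf(0.62)

print(round(within_2_points, 4), round(at_or_below_62, 4))
```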
$\mu_{\hat{p}} = \frac{1}{n}(np) = p$, so $\hat{p}$ is an unbiased estimator of p.
$\sigma_{\hat{p}} = \frac{1}{n}\sqrt{np(1-p)} = \sqrt{\frac{np(1-p)}{n^2}} = \sqrt{\frac{p(1-p)}{n}}$
As sample size increases, the spread decreases.
2017. 12. 10.
Section 8.2
Estimating a Population Proportion
After this section, you should be able to…
✓ CONSTRUCT and INTERPRET a confidence interval for a population proportion
✓ DETERMINE the sample size required to obtain a level C confidence interval for a population proportion with a specified margin of error
✓ DESCRIBE how the margin of error of a confidence interval changes with the sample size and the level of confidence C
Interval:
We are 90% confident that the interval from 0.375 to 0.477 will capture the true proportion of red beads.
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, here $0.426 \pm 1.645 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Parameter: p = true proportion of vaccine eligible people receiving flu shot
Assess Conditions:
Random: Random sample, stated.
Sample Size: Since n𝑝̂ = 978 and n(1 – 𝑝̂) = 1372 are both at least 10, our sample size is considered large enough.
Independent: Since the sample of 2350 is less than 10% of the population (at least 23,500 adults), it is reasonable to assume independence when sampling without replacement.
Name Interval: 1-proportion z-Interval
Interval:
We are 99% confident that the interval from 0.38998 to 0.44236 will capture the true proportion of vaccine eligible adults receiving the flu shot.
Conclude in Context:
Since 0.45 (45%) is not contained within the interval, we have reasonable evidence that the true proportion of eligible adults who got vaccinated is not 45%.
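The flu-shot interval can be reproduced with a short Python sketch of the one-proportion z interval. The function name is hypothetical; the counts 978 and 2350 are the ones used in the conditions check.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence):
    """Return (low, high) for a one-proportion z interval."""
    p_hat = successes / n
    # Critical value z* leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(978, 2350, 0.99)
print(round(low, 5), round(high, 5))
```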
✓ DESCRIBE how the margin of error of a confidence interval changes with the sample size and the level of confidence C
✓ DETERMINE sample statistics from a confidence interval
One-Sample z Interval for a Population Mean
Choose an SRS of size n from a population having unknown mean µ and known standard deviation σ. As long as the Normal and Independent conditions are met, a level C confidence interval for µ is
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
The critical value z* is found from the standard Normal distribution.
We can use Table B in the back of the book to determine critical values t* for t
distributions with different degrees of freedom.
Section 9.1
Significance Tests: The Basics
After this section, you should be able to…
• H0: innocent
• Ha: guilty
The probability of making a Type I error is ALWAYS equal to the significance level (𝛼).
Power
• The probability of NOT making a Type II error.
• The higher the power, the less likely a Type II error is.
More:
https://onlinecourses.science.psu.edu/stat414/book/export/html/245
Error Probabilities
We can assess the performance of a significance test by looking at the probabilities of
the two types of error. That’s because statistical inference is based on asking,
“What would happen if I did this many times?”
Error Probabilities
Assuming H0: p = 0.08 is true, the sampling distribution of $\hat{p}$ will have:
Shape: Approximately Normal because 500(0.08) = 40 and 500(0.92) = 460 are both at least 10.
Center: $\mu_{\hat{p}} = p = 0.08$
Spread: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.08(0.92)}{500}} = 0.0121$
The shaded area in the right tail is 5%. Sample proportion values to the right of the green line at 0.0999 ($\hat{p}$ = 0.0999) will cause us to reject H0 even though H0 is true. This will happen in 5% of all possible samples. That is, P(making a Type I error) = 0.05.

Type 2 Errors Investigation WS
www.rossmanchance.com/applets
Select: Improved Batting Averages (Power)
Or direct link:
http://statweb.calpoly.edu/chance/applets/power/power.html
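The rejection cutoff of 0.0999 and the 5% Type I error rate can be recovered in Python. The alternative value p = 0.11 used for the power calculation is hypothetical; it is assumed only to show how power would be computed once a specific alternative is chosen.

```python
from math import sqrt
from statistics import NormalDist

p0, n, alpha = 0.08, 500, 0.05
null_dist = NormalDist(mu=p0, sigma=sqrt(p0 * (1 - p0) / n))

# Smallest sample proportion that leads to rejecting H0 in a right-tailed test.
cutoff = null_dist.inv_cdf(1 - alpha)
type_i_error = 1 - null_dist.cdf(cutoff)

# Hypothetical alternative value of p, assumed only to illustrate power.
p_alt = 0.11
alt_dist = NormalDist(mu=p_alt, sigma=sqrt(p_alt * (1 - p_alt) / n))
power = 1 - alt_dist.cdf(cutoff)      # P(reject H0 | p = p_alt)
type_ii_error = 1 - power             # P(fail to reject H0 | p = p_alt)

print(round(cutoff, 4), round(type_i_error, 2), round(power, 3))
```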
Section 9.2
Tests About a Population Proportion
After this section, you should be able to…
$\text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$
$z = \frac{0.64 - 0.80}{0.0566} = -2.83$
Z-score: -2.83
P-value: 0.0023
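The z-score and P-value above can be checked with a minimal Python sketch of a lower-tailed one-proportion z test. The function name is hypothetical, and the sample size n = 50 is an assumption taken from the standard deviation calculation (0.0566) used for this example elsewhere on these slides.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(p_hat, p0, n):
    """Return (z, lower-tail P-value) for H0: p = p0 against Ha: p < p0."""
    se = sqrt(p0 * (1 - p0) / n)   # standard deviation of p-hat under H0
    z = (p_hat - p0) / se
    return z, NormalDist().cdf(z)

# n = 50 is assumed; it reproduces the 0.0566 in the denominator above.
z, p_value = one_prop_z_test(0.64, 0.80, 50)
print(round(z, 2), round(p_value, 4))
```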
Example: One Potato, Two Potato
A potato-chip producer has just received a truckload of potatoes from its main supplier. If the producer determines that more than 8% of the potatoes in the shipment have blemishes, the truck will be sent away to get another load from the supplier. A supervisor selects a random sample of 500 potatoes from the truck. An inspection reveals that 47 of the potatoes have blemishes. Carry out a significance test at the α = 0.10 significance level. What should the producer conclude?

Example: One Potato, Two Potato
State Parameter & State Hypothesis
α = 0.10 significance level
Parameter: p = actual proportion of potatoes in this shipment with blemishes.
Hypothesis:
H0: p = 0.08
Ha: p > 0.08
Example: One Potato, Two Potato
Test Statistic (Calculate) and Obtain P-value
Make a Decision and State Conclusion
One Proportion Z-Test by Hand
$\mu_{\hat{p}} = p = 0.80$ and standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.8)(0.2)}{50}} = 0.0566$
Basketball: Carrying Out a Significance Test
Step 3b: Calculate Test Statistic
$\mu_{\hat{p}} = p = 0.80$ and standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.8)(0.2)}{50}} = 0.0566$
$\text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$
$z = \frac{0.64 - 0.80}{0.0566} = -2.83$
Then, using Table A, we find that the P-value is P(z ≤ –2.83) = 0.0023.

Example: One Potato, Two Potato
Step 3: Calculations
The sample proportion of blemished potatoes is $\hat{p}$ = 47/500 = 0.094.
Test statistic $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.094 - 0.08}{\sqrt{\frac{0.08(0.92)}{500}}} = 1.15$
P-value: Using Table A, the desired P-value is P(z ≥ 1.15) = 1 – 0.8749 = 0.1251

Test statistic $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.60 - 0.50}{\sqrt{\frac{0.50(0.50)}{150}}} = 2.45$
P-value: To compute this P-value, we find the area in one tail and double it. Using Table A or normalcdf(2.45, 100) yields P(z ≥ 2.45) = 0.0071 (the right-tail area). So the desired P-value is 2(0.0071) = 0.0142.

Section 9.3
Tests About a Population Mean
State Parameter & State Hypothesis
Parameter: µ = the true mean lifetime of the new deluxe AAA batteries.
Assess Conditions
Random, Normal, and Independent.
✓ Random: The company tests an SRS of 15 new AAA batteries.
✓ Normal: With such a small sample size (n = 15), we need to inspect the data for any departures from Normality.
Name Test, Test statistic (Calculation) and Obtain P-value
One sample t-test
t: 1.5413 df = 14
p-value: 0.072771
Make a Decision & State Conclusion
Make a Decision: Since the p-value of 0.073 exceeds our α = 0.05 significance level, we fail to reject the null hypothesis.
Make a Conclusion: We can’t conclude that the company’s new AAA batteries last longer than 30 hours, on average.
• If the sample size is large (n ≥ 30), we can safely carry out a significance test (due to the central limit theorem).
• If the sample size is small, we should examine (create graph on calculator and then DRAW on paper) the sample data for any obvious departures from Normality, such as skewness and outliers.
• T-score table includes probabilities only for t distributions with degrees of freedom from 1 to 30 and then skips to df = 40, 50, 60, 80, 100, and 1000. (The bottom row gives probabilities for df = ∞, which corresponds to the standard Normal curve.)
• If the df you need isn’t provided in Table B, use the next lower df that is available.
Since no significance level is given, we’ll use α = 0.05.
✓ Independent: We aren’t sampling, so it isn’t necessary to check the 10% condition. We will assume that the changes in depression scores for individual subjects are independent. This is reasonable if the experiment is conducted properly.
Name Test, Test Statistic (Calculate) and Obtain P-value
Make Decision & State Conclusion:
Make Decision: Since the P-value of 0.0027 is much less than our chosen α = 0.05, we have convincing evidence to reject H0: µd = 0.

Using Tests Wisely: Statistical Significance and Practical Importance
Statistical significance is valued because it points to an effect that is unlikely to occur simply by chance.
When a null hypothesis (“no effect” or “no difference”) can be rejected at the usual levels (α = 0.05 or α = 0.01), there is good evidence of a difference. But that difference may be very small. When large samples are available, even tiny deviations from the null hypothesis will be significant.

Using Tests Wisely: Don’t Ignore Lack of Significance
There is a tendency to infer that there is no difference whenever a P-value fails to attain the usual 5% standard. In some areas of research, small differences that are detectable only with large sample sizes can be of great practical significance. When planning a study, verify that the test you plan to use has a high probability (power) of detecting a difference of the size you hope to find.
P-value: The P-value is the area to the left of t = -0.94 under the t distribution curve with df = 15 – 1 = 14.
Upper-tail probability p
df    .25    .20    .15
13    .694   .870   1.079
14    .692   .868   1.076
15    .691   .866   1.074
      50%    60%    70%
Confidence level C

P-value: The P-value for this two-sided test is the area under the t distribution curve with 50 - 1 = 49 degrees of freedom. Since Table B does not have an entry for df = 49, we use the more conservative df = 40. The upper tail probability is between 0.005 and 0.0025 so the desired P-value is between 0.01 and 0.005.
Upper-tail probability p
df    .005   .0025  .001
30    2.750  3.030  3.385
40    2.704  2.971  3.307
50    2.678  2.937  3.261
      99%    99.5%  99.8%
Confidence level C
P-value: According to technology, the area to the right of t = 3.53 on the t distribution curve with df = 11 – 1 = 10 is 0.0027.

Because the population standard deviation σ is usually unknown, we use the sample standard deviation $s_x$ in its place. The resulting test statistic has the standard error of the sample mean in the denominator:
$t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$
When the Normal condition is met, this statistic has a t distribution with n - 1 degrees of freedom.
$t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}} = \frac{33.9 - 30}{9.8 / \sqrt{15}} = 1.54$
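The t statistic and its upper-tail P-value can be reproduced without a table. The sketch below uses only the Python standard library, so the tail area is found by numerically integrating the t density with Simpson's rule; this is a stand-in for the calculator's tcdf, not the method used on the slides, and the helper names are hypothetical.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Battery example: sample mean 33.9, H0 mean 30, s_x = 9.8, n = 15.
x_bar, mu0, s_x, n = 33.9, 30, 9.8, 15
t_stat = (x_bar - mu0) / (s_x / sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)

print(round(t_stat, 4), round(p_value, 4))
```

Calling `t_upper_tail(3.53, 10)` gives the area quoted for the t = 3.53, df = 10 example.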
Section 10.1
10.1 Comparing Two Proportions
After this section, you should be able to…
Introduction
• Helps us compare the proportions of individuals
with a certain characteristic in two different
populations.
• We can compare proportions in both surveys and
experiments.
• Neither the sample sizes nor the population sizes need to be the same.
Interval:
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (0.73 - 0.47) \pm 1.96 \sqrt{\frac{0.73(0.27)}{800} + \frac{0.47(0.53)}{2253}}$
= 0.26 ± 0.037
= (0.223, 0.297)
Note: $n_2 \hat{p}_2$ = 2253 × 0.47 = 1058.91. Round to nearest whole number.
We are 95% confident that the interval from 0.223 to 0.297 captures the true difference in the proportion of all U.S. teens and adults who use social-networking sites.
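The interval arithmetic can be checked with a short Python sketch of the two-proportion z interval. The function name is hypothetical; the inputs are the sample proportions and sample sizes shown above.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(p1, n1, p2, n2, confidence):
    """Return (low, high) for a confidence interval for p1 - p2."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error: each sample contributes its own p-hat(1 - p-hat)/n term.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

low, high = two_prop_z_interval(0.73, 800, 0.47, 2253, 0.95)
print(round(low, 3), round(high, 3))
```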
Use $\hat{p}_C$ in place of both $p_1$ and $p_2$ in the expression for the denominator of the test statistic:
$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\hat{p}_C(1-\hat{p}_C)}{n_1} + \frac{\hat{p}_C(1-\hat{p}_C)}{n_2}}}$
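A minimal Python sketch of this pooled test statistic follows. The counts passed in at the bottom are hypothetical, not data from these slides, and the pooled proportion is taken to be total successes over total sample size, which is the usual definition of $\hat{p}_C$.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Return (z, two-sided P-value) for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)   # pooled (combined) sample proportion
    se = sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 30 of 100 successes in sample 1, 20 of 100 in sample 2.
z, p_value = two_prop_z_test(30, 100, 20, 100)
print(round(z, 3), round(p_value, 4))
```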
Hungry Children
Parameters & Hypotheses:
H0: p1 = p2
Ha: p1 ≠ p2
Standard Error
Because we don't know the values of the parameters p1 and p2, we replace them in the standard deviation formula with the sample proportions. The result is the standard error of the statistic $\hat{p}_1 - \hat{p}_2$:
$\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
If the Normal condition is met, we find the critical value z* for the given confidence level from the standard Normal curve. Our confidence interval for p1 – p2 is:
statistic ± (critical value) × (standard deviation of statistic)
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Section 10.2
Comparing Two Means
After this section, you should be able to…
✓ DESCRIBE the characteristics of the sampling distribution of the difference between two sample means
✓ CALCULATE probabilities using the sampling distribution of the difference between two sample means
✓ DETERMINE whether the conditions for performing inference are met
✓ USE two-sample t procedures to compare two means based on summary statistics or raw data
✓ INTERPRET computer output for two-sample t procedures
✓ PERFORM a significance test to compare two means
✓ INTERPRET the results of inference procedures
Center: $\mu_{\bar{x}} = \mu$
Spread: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ if the sample is no more than 10% of the population
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Conditions: Two Mean T-Interval
1) Random: Both sets of data should come from well-designed random samples or randomized experiments.
2) Normal: Both samples should be large enough for the Central Limit Theorem (CLT) to apply (sample sizes of at least 30); for smaller samples, graph the data to check Normality.

Big Trees, Small Trees, Short Trees, Tall Trees
The Wade Tract Preserve in Georgia is an old-growth forest of longleaf pines that has survived in a relatively undisturbed state for hundreds of years. One question of interest to foresters who study the area is “How do the sizes of longleaf pine trees in the northern and southern halves of the forest compare?” To find out, researchers took random samples of 30 trees from each half and measured the diameter at breast height (DBH) in centimeters. Comparative boxplots of the data and summary statistics from Minitab are shown below. Construct and interpret a 90% confidence interval for the difference in the mean DBH for longleaf pines in the northern and southern halves of the Wade Tract Preserve.
Parameters:
µ1 = the true mean DBH of all trees in the northern half of the forest
µ2 = the true mean DBH of all trees in the southern half of the forest.
Assess Conditions:
✓ Random: Random samples of 30 trees each from the northern and southern halves of the forest.
✓ Normal: Reasonable to assume Normality, since the sample sizes are each 30.
✓ Independent: Researchers took independent samples from the northern and southern halves of the forest.
✓ 10% Condition: Since we are sampling without replacement, there have to be at least 10(30) = 300 trees in each half of the forest. This is pretty safe to assume.
Name Test: Two-sample t interval for the difference µ1 – µ2
df = 30 - 1 = 29 OR df = 55.72 (calculator)
Interval:
(-17.7238, -3.93617)
We are 90% confident that the interval from -17.7238 to -3.93617 centimeters captures the difference µ1 – µ2 between the actual mean DBH of the northern trees and the actual mean DBH of the southern trees.
Conclude: This interval suggests that the mean diameter of the southern trees is between 3.94 and 17.72 cm larger than the mean diameter of the northern trees.
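Formula Derivations
Therefore,
$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$
$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \left(\frac{\sigma_1}{\sqrt{n_1}}\right)^2 + \left(\frac{\sigma_2}{\sqrt{n_2}}\right)^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Since we don't know the values of the parameters $\sigma_1$ and $\sigma_2$, we replace them in the standard deviation formula with the sample standard deviations. The result is the standard error of the statistic $\bar{x}_1 - \bar{x}_2$:
$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$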
Chapter 11, Section 1
Chapter 11:
Inference for
Distributions of
Categorical Data
Calculating Expected Values
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
$\chi^2 = \frac{(9 - 14.40)^2}{14.40} + \frac{(8 - 12.00)^2}{12.00} + \frac{(12 - 9.60)^2}{9.60} + \cdots$
$\chi^2 = 2.025 + 1.333 + 0.600 + 5.186 + 0.621 + 0.415 = 10.180$

$\chi^2 = 10.180$. Because all of the expected counts are at least 5, the $\chi^2$ statistic will follow a chi-square distribution with df = 6 (number of categories) - 1 = 5 reasonably well when H0 is true.
To find the P-value, use Table C and look in the df = 5 row.
        P
df    .15    .10    .05
4     6.74   7.78   9.49
5     8.12   9.24   11.07
The value $\chi^2 = 10.180$ falls between the critical values 9.24 and 11.07. The corresponding areas in the right tail of the chi-square distribution with df = 5 are 0.10 and 0.05.
Since our P-value is between 0.05 and 0.10, it is greater than α = 0.05. Therefore, we fail to reject H0. We don’t have sufficient evidence to conclude that the company’s claimed color distribution is incorrect.
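The chi-square arithmetic can be sketched in Python. Only the three observed/expected pairs written out above are available, so the first call reproduces just those three components; the P-value uses the full statistic 10.180. The tail area is found by numerically integrating the chi-square density (valid here for df greater than 2), which is an assumption of this sketch rather than the Table C lookup used on the slide.

```python
from math import exp, gamma

def chi_square_stat(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_pdf(x, df):
    """Chi-square density; this sketch assumes df > 2 so the density is 0 at x = 0."""
    if x <= 0:
        return 0.0
    return x ** (df / 2 - 1) * exp(-x / 2) / (2 ** (df / 2) * gamma(df / 2))

def chi_square_upper_tail(x, df, steps=4000):
    """P(chi-square >= x) via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = chi_square_pdf(0, df) + chi_square_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi_square_pdf(i * h, df)
    return 1 - total * h / 3

# The three categories whose counts appear on the slide.
partial = chi_square_stat([9, 8, 12], [14.40, 12.00, 9.60])
p_value = chi_square_upper_tail(10.180, 5)

print(round(partial, 3), round(p_value, 4))
```

The computed P-value falls between 0.05 and 0.10, in agreement with the Table C bracket.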
Name Test, (Calculate) Test Statistic & Obtain P-value
Name: Chi-square goodness-of-fit test.
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = 6.476$
Make a Decision & State Conclusion
Because the P-value, 0.0392, is less than α = 0.05, we will reject H0. We have convincing evidence that the biologists’ hypothesized distribution for the color of tobacco plant offspring is incorrect.
To find the expected counts, we start by assuming that H0 is true. We can see from the two-way table that 99 of the 243 bottles of wine bought during the study were French wines.
If the specific type of music that’s playing has no effect on wine purchases, the proportion of French wine sold under each music condition should be 99/243 = 0.407.

The overall proportion of Italian wine bought during the study was 31/243 = 0.128. So the expected counts of Italian wine bought under each treatment are:
No music: $\frac{31}{243} \times 84 = 10.72$   French music: $\frac{31}{243} \times 75 = 9.57$   Italian music: $\frac{31}{243} \times 84 = 10.72$

The overall proportion of Other wine bought during the study was 113/243 = 0.465. So the expected counts of Other wine bought under each treatment are:
No music: $\frac{113}{243} \times 84 = 39.06$   French music: $\frac{113}{243} \times 75 = 34.88$   Italian music: $\frac{113}{243} \times 84 = 39.06$
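All nine expected counts follow the same pattern, (row total × column total) / table total, which a short Python sketch makes explicit. The row totals (99, 31, 113) and column totals (84, 75, 84) are the ones appearing in the calculations above; the dictionary layout is only illustrative.

```python
# Totals taken from the wine and music calculations above.
row_totals = {"French": 99, "Italian": 31, "Other": 113}
col_totals = {"No music": 84, "French music": 75, "Italian music": 84}
table_total = sum(row_totals.values())

# Expected count under H0: (row total * column total) / table total.
expected = {
    wine: {music: r * c / table_total for music, c in col_totals.items()}
    for wine, r in row_totals.items()
}

for wine, row in expected.items():
    print(wine, {music: round(v, 2) for music, v in row.items()})
```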
Name Test & (Calculate) Test Statistic
Since the conditions are satisfied, we can perform a chi-square test for homogeneity.
Test statistic:
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = 3.22$
USE CALCULATOR!!!
Obtain P-value, Make a Decision & State Conclusion
Because the P-value, 0.20, is greater than α = 0.05, we fail to reject H0. There is not enough evidence to conclude that the distribution of party affiliation differs in the cell-only and landline user populations.
More About the Chi-Square Test for Association/Independence
We often gather data from a random sample and arrange them in a two-way table to see if two categorical variables are associated. The sample data are easy to investigate: turn them into percents and look for a relationship between the variables.

The Chi-Square Test for Association/Independence
Hypothesis:
H0: There is no association between two categorical variables in the population of interest.
Ha: There is an association between two categorical variables in the population of interest.

The Chi-Square Test for Association/Independence
Background: Angry People and Heart Disease
A study followed a random sample of 8474 people with normal blood pressure for about four years. All the individuals were free of heart disease at the beginning of the study. Each person took the Spielberger Trait Anger Scale test, which measures how prone a person is to sudden anger. Researchers also recorded whether each individual developed coronary heart disease (CHD). This includes people who had heart attacks and those who needed medical treatment for heart disease. Here is a two-way table that summarizes the data:
Name Test & (Calculate) Test Statistic
Name: Chi-square test for association/independence.
Obtain P-value, Make a Decision and State Conclusion
P-Value: 0.00032.
Chapter 12
Section 12.1
Inference for Linear Regression
✓ CHECK conditions for performing inference about the slope β of the population regression line
✓ CONSTRUCT and INTERPRET a confidence interval for the slope β of the population regression line
✓ PERFORM a significance test about the slope β of a population regression line
✓ INTERPRET computer output from a least-squares regression analysis
“Standard Error”
s: The typical error when using the regression line to predict calorie consumption is about 23.4 calories.
r2: Approximately 42.1% of the variation in calorie consumption can be explained by the linear relationship with the time spent at the table.
Standard Error of Slope: If samples like this were taken many times, the estimated slope would typically differ from the population slope by about 0.8498.

Theory: Inference for Linear Regression
The least-squares regression line for this population of data has been added to the graph. It has slope 10.36 and y-intercept 33.97. We call this the population regression line (or true regression line) because it uses all the observations that month.
Suppose we take an SRS of 20 eruptions from the population and calculate the least-squares regression line $\hat{y} = a + bx$ for the sample data. How does the slope of the sample regression line (also called the estimated regression line) relate to the slope of the population regression line?
Remember “S” is standard error and formula is:
$s = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$
Constructing a Confidence Interval for the Slope
Because we use the statistic b as our estimate, the confidence interval is
b ± t* SEb
We call this a t interval for the slope.

Helicopter Experiment
Construct and interpret a 95% confidence interval for the slope of the population regression line.
Does Fidgeting Keep you Slim?
Parameter:
β = slope of true regression line relating fat gain and NEA.
Assess Conditions:
• Linear: The scatterplot shows a clear linear pattern. Also, the residual plot shows a random scatter of points about the “residual = 0” line.
Does Fidgeting Keep you Slim?
Name the Interval: T-Interval for β
We use the t distribution with 16 - 2 = 14 degrees of freedom to find the critical value. For a 90% confidence level, the critical value is t* = 1.761. So the 90% confidence interval for β is
b ± t* SEb = −0.0034415 ± 1.761(0.0007414)
= −0.0034415 ± 0.0013056
= (−0.004747, −0.002136)
Interval: We are 90% confident that the interval from -0.004747 to -0.002136 kg captures the actual slope of the population regression line relating NEA change to fat gain for healthy young adults.

Performing a Significance Test for the Slope
We can also perform a significance test to determine whether a specified value of β is plausible. The null hypothesis has the general form H0: β = hypothesized value. To do a test, standardize b to get the test statistic:
$\text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$
$t = \frac{b - \beta_0}{SE_b}$
To find the P-value, use a t distribution with n - 2 degrees of freedom. Here are the details for the t test for the slope.
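Both slope calculations are one-liners once b, SEb, and t* are known. The Python sketch below reproduces the fidgeting interval; the function names are hypothetical, and the critical value t* = 1.761 is passed in as given on the slide rather than computed.

```python
def slope_t_interval(b, se_b, t_star):
    """Return (low, high) for the t interval b ± t* SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

def slope_t_statistic(b, se_b, beta0=0.0):
    """Standardize the sample slope: t = (b - beta0) / SE_b."""
    return (b - beta0) / se_b

# Fidgeting example: b, SE_b, and t* (df = 14, 90% confidence) from the slide.
low, high = slope_t_interval(-0.0034415, 0.0007414, 1.761)
t_stat = slope_t_statistic(-0.0034415, 0.0007414)

print(round(low, 6), round(high, 6))
```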
Crying and IQ
Infants who cry easily may be more easily stimulated than others. This may be
a sign of higher IQ. Child development researchers explored the relationship
between the crying of infants 4 to 10 days old and their later IQ test scores. A
snap of a rubber band on the sole of the foot caused the infants to cry. The
researchers recorded the crying and measured its intensity by the number of
peaks in the most active 20 seconds. They later measured the children’s IQ at
age three years using the Stanford-Binet IQ test. A scatterplot and Minitab
output for the data from a random sample of 38 infants is below.
Name Test, (Calculate) Test Statistic & Obtain P-value
T-Test for the slope β
$t = \frac{b - \beta_0}{SE_b} = \frac{1.4929 - 0}{0.4870} = 3.07$
Make Decision & State Conclusion
The P-value, 0.002, is less than our α = 0.05 significance level, so we have enough evidence to reject H0 and conclude that there is a positive linear relationship between intensity of crying and IQ score in the population of infants.
Spread: $\sigma_b = \frac{\sigma}{\sigma_x \sqrt{n-1}} = \frac{6.159}{1.083\sqrt{20-1}} = 1.30$
The value of σ determines whether the points fall close to the population regression line (small σ) or are widely scattered (large σ).
In practice, we don’t know σ for the population regression line. So we estimate it with the standard deviation of the residuals, s. Then we estimate the spread of the sampling distribution of b with the standard error of the slope:
$SE_b = \frac{s}{s_x \sqrt{n-1}}$
Mrs. Daniel- AP Stats
1.1 WS
1. What variables are measured? Identify each as categorical or quantitative. In what units were the quantitative
variables measured?
State  Number of Family Members  Age  Gender  Marital Status  Total Income  Travel time to work
Kentucky 2 61 Female Married 21000 20
Florida 6 27 Female Married 21300 20
Wisconsin 2 27 Male Married 30000 5
California 4 33 Female Married 26000 10
Michigan 3 49 Female Married 15100 25
Virginia 3 26 Female Married 25000 15
Pennsylvania 4 44 Male Married 43000 10
Virginia 4 22 Male Never married/ single 3000 0
California 1 30 Male Never married/ single 40000 15
New York 4 34 Female Separated 30000 40
2. A sample of 200 children from the United Kingdom ages 9-17 was selected from the CensusAtSchool website
(www.censusatschool.com). The gender of each student was recorded along with which super power they would most
like to have: invisibility, super strength, telepathy (ability to read minds), ability to fly, or ability to freeze time. Here are
the results:
c. What proportion of children that want the power of telepathy are male?
d. What proportion of children that want the power of fly are female?
Female Male Total
Invisibility 17 13 30
Super Strength 3 17 20
Telepathy 39 5 44
Fly 36 18 54
Freeze Time 20 32 52
Total 115 85 200
3. Create a well labeled segmented bar graph of the marginal distributions of power preference and gender.
Be sure to include a key.
Female:
Males:
Key:
4. Based on the graphs above, can we conclude that boys and girls differ in their preference of superpower? Give
appropriate evidence to support your answer.
Mrs. Daniel- AP Stats
1.2 WS
1. Describe the shape, center, and spread of the distribution. Are there any (potential) outliers?
[Dotplots of EnergyCost for two groups labeled “bottom” and “top”; axis from 56 to 140]
2. Compare the distributions of annual energy costs for these two types of refrigerators.
Who’s Taller?
Which gender is taller? A sample of 14-year-olds from the United Kingdom was randomly selected using the
CensusAtSchool website.
Male: 154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175, 174, 165, 165, 183, 180
Female: 160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158, 153, 161, 165, 165, 159, 168, 153, 166, 158, 158,
166
  Female  Stem  Male
     332   15   14
   98887   15   79
  433100   16   23
99876655   16   5579
           17   4
           17   567
           18   03
           18   7
Key: 15|1 represents a student who is 151 cm tall.
Mrs. Daniel- AP Stats
2.1 WS
1. The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2009.
 5  9
 6  2455
 7  00455589
 8  0345667778
 9  123557
10  3
Key: 5|9 represents a team with 59 wins.
(a) The Colorado Rockies, who won 92 games. (b) The New York Yankees, who won 103 games.
(c) The Kansas City Royals and Cleveland Indians, who both won 65 games.
2. Here is a table showing the distribution of median household incomes for the 50 states and the District of Columbia.
Calculate the relative frequency and cumulative relative frequency.
Median Income ($1000s) | Frequency | Relative Frequency | Cumulative Frequency | Cumulative Relative Frequency
35 to < 40 | 1 | | 1 |
40 to < 45 | 10 | | 11 |
45 to < 50 | 14 | | 25 |
50 to < 55 | 12 | | 37 |
55 to < 60 | 5 | | 42 |
60 to < 65 | 6 | | 48 |
65 to < 70 | 3 | | 51 |
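One way to check the two blank columns is a short Python sketch. It divides each frequency by the total (51) and keeps a running sum. The function name `relative_and_cumulative` is an illustrative choice, not something from the worksheet.

```python
# Frequencies copied from the income table (classes 35-40 through 65-70, in $1000s).
frequencies = [1, 10, 14, 12, 5, 6, 3]

def relative_and_cumulative(freqs):
    """Return (relative frequencies, cumulative relative frequencies)."""
    total = sum(freqs)
    relative = [f / total for f in freqs]
    cumulative = []
    running = 0
    for f in freqs:
        running += f                      # cumulative frequency so far
        cumulative.append(running / total)
    return relative, cumulative

relative, cumulative = relative_and_cumulative(frequencies)
for f, r, c in zip(frequencies, relative, cumulative):
    print(f"{f:>3}  {r:.3f}  {c:.3f}")
```

The last cumulative relative frequency should come out to 1, which is a quick sanity check on the arithmetic.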
3. Use the cumulative relative frequency graph for the state income data to answer each question.
4. Miami-Dade County Public Schools employs teachers at salaries between $40,000 and $71,000. The teachers’ union
and the school board are negotiating the form of next year’s increase in the salary schedule.
(a) If every teacher is given a flat $1000 raise, what will this do to the mean salary? To the median salary? Explain your
answers.
(b) What would a flat $1000 raise do to the extremes and quartiles of the salary distribution? To the standard deviation
of teachers’ salaries? Explain your answers.
Mrs. Daniel- AP Stats
2.2 WS
1. In 2009, the mean number of wins was 81 with a standard deviation of 11.4 wins.
Batting Averages
2. In the previous alternate example about batting averages for Major League Baseball players in 2009, the mean of the
432 batting averages was 0.261 with a standard deviation of 0.034. Suppose that the distribution is exactly Normal with
μ = 0.261 and σ = 0.034.
(a) Sketch a Normal density curve for this distribution of batting averages. Label the points that are 1, 2, and 3 standard
deviations from the mean.
(b) What percent of the batting averages are above 0.329? Show your work.
(c) What percent of the batting averages are between 0.227 and 0.295? Show your work.
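The batting-average questions can be checked with Python's standard-library `statistics.NormalDist`. This is only a sketch under the problem's stated assumption that the distribution is exactly Normal; the variable names are illustrative. Since 0.329 is two standard deviations above the mean and 0.227 to 0.295 is within one, the results should agree closely with the 68-95-99.7 rule.

```python
from statistics import NormalDist

# Model stated in the problem: exactly Normal with mean 0.261 and SD 0.034.
batting = NormalDist(mu=0.261, sigma=0.034)

# (a) Points 1, 2, and 3 standard deviations on either side of the mean.
labels = [round(batting.mean + k * batting.stdev, 3) for k in range(-3, 4)]

# (b) Proportion above 0.329.
above_329 = 1 - batting.cdf(0.329)

# (c) Proportion between 0.227 and 0.295.
between = batting.cdf(0.295) - batting.cdf(0.227)

print(labels)
print(f"above 0.329: {above_329:.4f}")
print(f"between 0.227 and 0.295: {between:.4f}")
```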
3. According to the CDC, the heights of 3-year-old females are approximately Normally distributed with a mean of 94.5
cm and a standard deviation of 4 cm. Be sure to draw curves for each calculation!
(c) If a 3-year-old female was 91.7 cm tall, what percentile would she be in?
(d) If a mother knew her daughter was at the 91st percentile in height, how tall is her daughter?
(e) If a 3-year-old female was 96.4 cm tall, what percentile would she be in?
4. Scores on the Wechsler Adult Intelligence Scale (a standard IQ test) for the 20 to 34 age group are approximately
Normally distributed with μ = 110 and σ = 25. For each part, follow the four-step process.
(b) What percent of people aged 20 to 34 have IQs between 125 and 150?
(c) MENSA is an elite organization that admits as members people who score in the top 2% on IQ tests. What score on
the Wechsler Adult Intelligence Scale would an individual have to earn to qualify for MENSA membership?
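For the IQ problem, a forward calculation (area between two scores) and an inverse calculation (score at a given percentile) might be sketched as follows. It uses `statistics.NormalDist` with the μ and σ given above and reads "top 2%" as the 98th percentile; the variable names are illustrative.

```python
from statistics import NormalDist

# Wechsler scores for ages 20 to 34, as given in the problem.
iq = NormalDist(mu=110, sigma=25)

# (b) Proportion of scores between 125 and 150.
between_125_150 = iq.cdf(150) - iq.cdf(125)

# (c) Score that cuts off the top 2%, i.e. the 98th percentile.
mensa_cutoff = iq.inv_cdf(0.98)

print(f"between 125 and 150: {between_125_150:.4f}")
print(f"MENSA cutoff: {mensa_cutoff:.1f}")
```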
Mrs. Daniel- AP Stats
3.1 WS
1. The table below shows data for 13 students in a statistics class. Each member of the class ran a 40-yard sprint and
then did a long jump (with a running start).
Sprint Time (s) 5.41 5.05 9.49 8.09 7.01 7.17 6.83 6.73 8.01 5.68 5.78 6.31 6.04
Long Jump Distance (in) 171 184 48 151 90 65 94 78 71 130 173 143 141
A. How would r change if all the men were 6 inches shorter than the heights given in the table? Does the correlation tell
us if women tend to date men taller than themselves?
B. If heights were measured in centimeters rather than inches, how would the correlation change? (There are 2.54
centimeters in an inch.)
Rank the correlations between these pairs of variables from highest to lowest. Explain your reasoning.
Mrs. Daniel- AP Stats
3.2 WS
1. The following data show the number of miles driven and advertised price for 11 used Honda CR-Vs from the 2005-
2009 model years (prices found at www.carmax.com). The scatterplot below shows a strong, negative linear association
between number of miles and advertised cost. The line on the plot is the regression line for predicting advertised price
based on number of miles.
Thousand Miles Driven    Cost (dollars)
22 17998
29 16450
35 14998
39 13998
45 14599
49 14988
55 13599
56 14599
69 11998
70 14450
86 10998
A. Calculate the correlation. What does this value mean in plain English? What is the relative strength of the association?
B. What is the least squares regression equation for this association? Define any variables used.
C. Determine the y-intercept of the regression equation and interpret the value in context. Does the value have any real-
world implications?
D. Determine the slope of the regression equation and interpret the value in context. Does the value have any real-
world implications?
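Parts A through D can be checked numerically. The sketch below computes the correlation and the least-squares line from the 11 data pairs using the usual sums of squares. The helper name `least_squares` is hypothetical, and a calculator's LinReg output should give the same values.

```python
from math import sqrt

miles = [22, 29, 35, 39, 45, 49, 55, 56, 69, 70, 86]   # thousands of miles
price = [17998, 16450, 14998, 13998, 14599, 14988,
         13599, 14599, 11998, 14450, 10998]            # dollars

def least_squares(x, y):
    """Return (r, slope, intercept) for predicting y from x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)            # correlation
    slope = sxy / sxx                    # dollars per thousand miles
    intercept = y_bar - slope * x_bar    # predicted price at 0 miles
    return r, slope, intercept

r, slope, intercept = least_squares(miles, price)
print(f"r = {r:.3f}")
print(f"predicted price = {intercept:.0f} + ({slope:.2f})(thousand miles)")
```

The slope is in dollars per thousand miles, which matters when interpreting it in context.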
2. For a project, two AP Statistics students decided to investigate the effect of sugar on the life of cut flowers. They
went to the local grocery store and randomly selected 12 carnations. All the carnations seemed equally healthy when
they were selected. When they got home, the students prepared 12 identical vases with exactly the same amount of
water in each vase. They put 1 tablespoon of sugar in three vases, 2 tablespoons of sugar in three vases, and 3
tablespoons of sugar in three vases. In the remaining 3 vases, they put no sugar. After the vases were prepared and
placed in the same location, the students randomly assigned one flower to each vase and observed how many hours
each flower continued to look fresh. A scatterplot, residual plot, and computer output from the regression are shown.
Only 10 points appear on the scatterplot and residual plot since there were two observations at (1, 204) and two
observations at (2, 210).
A. What is the equation of the least-squares line? Be sure to define any variables you use.
B. Is a line an appropriate model for these data? Justify your answer. (You need at least two sentences.)
1. In May 2015, the Los Angeles City Council voted to ban most travel and contracts with the state of Arizona to protest
Arizona’s new immigration enforcement law. The Los Angeles Times conducted an online poll that asked if the City
Council was right to pass a boycott of Arizona. The results showed that 96% of the 41,068 people in the sample said
“No.” Does this result represent the opinions of all Los Angeles residents? Explain.
2. The manager of a beach-front hotel wants to survey guests in the hotel to estimate overall customer satisfaction. The
hotel has two towers, an older one to the south and a newer one to the north. Each tower has 10 floors of standard
rooms (40 rooms per floor) and 2 floors of suites (20 suites per floor). Half of the rooms in each tower face the beach,
while the other half of the rooms face the street. This means there are (2 towers)(10 floors)(40 rooms) + (2 towers)(2
floors)(20 suites) = 880 total rooms.
C. Explain why selecting 2 of the 24 different floors would not be a good way to obtain a cluster sample.
3. Which of the following are sources of sampling error and which are sources of nonsampling error? Explain your
answers.
(c) Data are gathered by asking people to mail in a coupon printed in a newspaper.
Mrs. Daniel- AP Stats
4.2 WS
1. A study published in the New England Journal of Medicine compared two medicines to treat head lice: an oral
medication called ivermectin and a topical lotion containing malathion. Researchers studied 812 people in 376
households in seven areas around the world. Of the 185 households randomly assigned to ivermectin, 171 were free from head lice
after two weeks, compared to only 151 of the 191 households randomly assigned to malathion.
Identify the experimental units, explanatory and response variables, and the treatments in this experiment.
2. Does adding fertilizer affect the productivity of tomato plants? How about the amount of water given to the plants?
To answer these questions, a gardener plants 24 similar tomato plants in identical pots in his greenhouse. He will add
fertilizer to the soil in half of the pots. Also, he will water 8 of the plants with 0.5 gallons of water per day, 8 of the
plants with 1 gallon of water per day and the remaining 8 plants with 1.5 gallons of water per day. At the end of three
months he will record the total weight of tomatoes produced on each plant.
Identify the explanatory and response variables, experimental units, and list all the treatments.
3. Suppose you have a class of 30 students who volunteer to be subjects in an experiment involving caffeine. Explain
how you would randomly assign 15 students to each of the two treatments.
4. A cell phone company is considering two different keyboard designs (A and B) for its new line of cell phones.
Researchers would like to conduct an experiment using subjects who are frequent texters and subjects who are not
frequent texters. The subjects will be asked to text several different messages in 5 minutes. The response variable will
be the number of correctly typed words.
A. Explain why a randomized block design might be preferable to a completely randomized design for this experiment.
B. Outline a randomized block experiment using 100 frequent texters and 200 novice texters.
Mrs. Daniel- AP Stats
5.2 WS
Languages in Canada
Canada has two official languages, English and French. Choose a Canadian at random and ask, “What is your mother
tongue?” Here is the distribution of responses, combining many separate languages from the broad Asia/Pacific region:
(b) What is the probability that a Canadian’s mother tongue is not English?
(c) What is the probability that a Canadian’s mother tongue is a language other than English or French?
U.S. Senators
The two-way table below describes the members of the U.S. Senate in a recent year.
(a) Who are the individuals? What variables are being measured?
(b) If we select a U.S. senator at random, what’s the probability that we choose
According to the National Center for Health Statistics, in December 2008, 78% of US households had a traditional
landline telephone, 80% of households had cell phones, and 60% had both. Suppose we randomly selected a household
in December 2008.
(a) Construct a Venn diagram to represent the outcomes of this chance process.
(b) Find the probability that the household has at least one of the two types of phones.
(c) Find the probability the household has a cell phone only.
Mrs. Daniel- AP Stats
5.3 WS #1
1. A poker player holds a flush when all 5 cards in the hand belong to the same suit. Remember that a deck contains 52
cards, 4 suits (hearts, diamonds, spades and clubs) with 13 cards of each suit. When the deck is well shuffled, each card
dealt is equally likely to be any of those that remain in the deck.
(a) We will concentrate on spades. What is the probability that the first card dealt is a spade? What is the conditional
probability that the second card is a spade given that the first is a spade?
(b) Continue to count the remaining cards to find the conditional probabilities of a spade on the third, the fourth, and
the fifth card given in each case that all previous cards are spades.
(c) The probability of being dealt 5 spades is the product of the five probabilities you have found. Why? What is this
probability?
(d) The probability of being dealt 5 hearts or 5 diamonds or 5 clubs is the same as the probability of being dealt 5 spades.
What is the probability of being dealt a flush?
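The chain of conditional probabilities in parts (a) through (d) can be multiplied out exactly with fractions. This is a sketch using the worksheet's definition of a flush (all 5 cards in one suit); the function name `all_one_suit_probability` is illustrative.

```python
from fractions import Fraction
from math import comb

def all_one_suit_probability(hand_size=5, suit_size=13, deck_size=52):
    """Multiply P(spade), P(spade | spade), ... one card at a time."""
    p = Fraction(1)
    for k in range(hand_size):
        # k spades already dealt: (13 - k) spades left among (52 - k) cards.
        p *= Fraction(suit_size - k, deck_size - k)
    return p

p_spades = all_one_suit_probability()
p_flush = 4 * p_spades        # same probability for each of the 4 suits

print(p_spades, float(p_spades))
print(p_flush, float(p_flush))
```

The product agrees with the counting approach, C(13, 5) / C(52, 5), which is a useful cross-check.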
2. Internet sites often vanish or move, so that references to them can’t be followed. In fact, 13% of Internet sites
referenced in major scientific journals are lost within two years after publication. If a paper contains seven Internet
references, what is the probability that at least one of them doesn’t work two years later? Show your work!
3. A recent census at Emory University revealed that 40% of its students mainly used Macintosh computers (Macs). The
rest mainly used PCs. At the time of the census, 67% of the school’s students were undergraduates. The rest were
graduate students. In the census, 23% of the respondents were graduate students who said that they used PCs as their
primary computers. Suppose we select a student at random from among those who were part of the census and learn
that the student mainly uses a Mac. Find the probability that this person is a graduate student. Show your work.
4. Many employers require prospective employees to take a drug test. A positive result on this test indicates that the
prospective employee uses illegal drugs. However, not all people who test positive actually use drugs. Suppose that 4%
of prospective employees use drugs, the false positive rate is 5% and the false negative rate is 10%. What percent of
people who test positive actually use illegal drugs?
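The drug-test question is a tree-diagram (Bayes) calculation. The sketch below assumes the usual reading of the rates: the false positive rate is P(positive | non-user) and the false negative rate is P(negative | user). The function name is hypothetical.

```python
def prob_user_given_positive(prevalence, false_positive_rate, false_negative_rate):
    """P(uses drugs | tests positive) from the two branches that end in 'positive'."""
    true_positive = prevalence * (1 - false_negative_rate)     # user and positive
    false_positive = (1 - prevalence) * false_positive_rate    # non-user and positive
    return true_positive / (true_positive + false_positive)

p_user = prob_user_given_positive(prevalence=0.04,
                                  false_positive_rate=0.05,
                                  false_negative_rate=0.10)
print(f"P(user | positive) = {p_user:.4f}")
```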
Mrs. Daniel- AP Stats
5.3 WS #2
a. Draw a tree diagram or chart that shows the sample space for this chance process.
b. Find the probability that both students selected suffer from allergies.
c. Find the probability that neither student selected suffers from allergies.
d. Find the probability that at least one student selected suffers from allergies.
e. Find the probability that only one student selected suffers from allergies.
Media Usage and Good Grades
In January 2010, the Kaiser Family Foundation released a study about the influence of media in the lives of young people
ages 8-18 (http://www.kff.org/entmedia/mh012010pkg.cfm). In the study, 17% of the youth were classified as light
media users, 62% were classified as moderate media users and 21% were classified as heavy media users. Of the light
users who responded, 74% described their grades as good (A’s and B’s), while only 68% of the moderate users and 52%
of the heavy users described their grades as good.
a. According to this study, what percent of young people ages 8-18 described their grades as good? Use a tree
diagram or chart to calculate the probability.
b. According to the tree diagram you constructed above, what percent of students with good grades are heavy
users of media?
NHL Goals
In 2010, there were 1319 games played in the National Hockey League’s regular season. Imagine selecting one of these
games at random and then randomly selecting one of the two teams that played in the game. Define the random
variable X = number of goals scored by a randomly selected team in a randomly selected game. The table below gives
the probability distribution of X:
Goals: 0 1 2 3 4 5 6 7 8 9
Probability: 0.061 0.154 0.228 0.229 0.173 0.094 0.041 0.015 0.004 0.001
(b) Make a histogram of the probability distribution. Describe what you see.
(c) What is the probability that the number of goals scored by a randomly selected team in a randomly selected game is
at least 6?
NHL Goals, continued.
Previously, we defined the random variable X to be the number of goals scored by a randomly selected team in a
randomly selected game. The table below gives the probability distribution of X:
Goals: 0 1 2 3 4 5 6 7 8 9
Probability: 0.061 0.154 0.228 0.229 0.173 0.094 0.041 0.015 0.004 0.001
(d): Compute the mean of the random variable X and interpret this value in context.
(e): Compute and interpret the standard deviation of the random variable X.
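The mean and standard deviation of a discrete random variable follow directly from the probability table. A minimal sketch for the NHL goals distribution, with illustrative variable names:

```python
from math import sqrt

goals = list(range(10))
probs = [0.061, 0.154, 0.228, 0.229, 0.173, 0.094, 0.041, 0.015, 0.004, 0.001]

# Mean: sum of (value * probability).
mean = sum(x * p for x, p in zip(goals, probs))

# Variance: sum of (squared deviation from the mean * probability).
variance = sum((x - mean) ** 2 * p for x, p in zip(goals, probs))
sd = sqrt(variance)

# (c) P(X >= 6): add the probabilities for 6, 7, 8, and 9 goals.
at_least_6 = sum(p for x, p in zip(goals, probs) if x >= 6)

print(f"mean = {mean:.3f}, sd = {sd:.3f}, P(X >= 6) = {at_least_6:.3f}")
```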
2. The weights of three-year-old females closely follow a Normal distribution with a mean of μ = 30.7 pounds and a
standard deviation of 3.6 pounds. Randomly choose one three-year-old female and call her weight X.
A. Find the probability that the randomly selected three-year-old female weighs at least 30 pounds.
B. Should the pediatrician be concerned if a 3-year-old girl weighs only 25 pounds?
Mrs. Daniel- AP Stats
6.2 WS
Study Habits
The academic motivation and study habits of female students as a group are better than those of males. The Survey of
Study Habits and Attitudes (SSHA) is a psychological test that measures these factors. The distribution of SSHA scores
among the women at a college has mean 120 and standard deviation 28, and the distribution of scores among male
students has mean 105 and standard deviation 35. You select a single male student and a single female student at
random and give them the SSHA test.
(a) Explain why it is reasonable to assume that the scores of the two students are independent.
(b) What are the expected value and standard deviation of the difference (female minus male) between their scores?
(c) From the information given, can you find the probability that the woman chosen scores higher than the man? If so,
find this probability. If not, explain why you cannot.
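For part (b), the means subtract but the variances add, because the two scores are independent. A short sketch (the function name is illustrative):

```python
from math import sqrt

def difference_mean_sd(mean_a, sd_a, mean_b, sd_b):
    """Mean and SD of A - B for independent random variables A and B."""
    mean_diff = mean_a - mean_b
    sd_diff = sqrt(sd_a ** 2 + sd_b ** 2)   # variances add, even for a difference
    return mean_diff, sd_diff

mean_diff, sd_diff = difference_mean_sd(120, 28, 105, 35)   # female minus male
print(f"mean = {mean_diff}, sd = {sd_diff:.2f}")
```

These two numbers alone do not give a probability for part (c); that would require knowing the shape of the distributions, which the problem does not state.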
Swim Team
Alonzo & Tracy Mourning Sr. High School has one of the best women’s swimming teams in the region. The 400-meter
freestyle relay team is undefeated this year. In the 400-meter freestyle relay, each swimmer swims 100 meters. The
times, in seconds, for the four swimmers this season are approximately Normally distributed with means and standard
deviations as shown. Assuming that the swimmers’ individual times are independent, find the probability that the total
team time in the 400-meter freestyle relay is less than 220 seconds. Show your work!
Mrs. Daniel- AP Stats
6.3 WS #2: Binomial Distributions Practice
For each problem, be sure that the situation fits the criteria for binomial distributions. If so, answer the questions (show
the formula) and then find the mean and standard deviation of the distribution.
1) 80% of the graduates of Northeast High who apply to Penn State University are admitted. Last year, there were 6
graduates from Northeast who applied to Penn State. What is the probability that
2) Tires from the Apex Tire Corp. are traditionally 5% defective. A truck carries 10 tires, 8 in use and 2 spares. If 10 tires
are chosen from Apex, what is the probability that not more than two defective tires are chosen?
3) Studies indicate that in 70% of the families of Blue Bell, both the husband and wife work. If 7 families are randomly
selected from Blue Bell, what is the probability that
5) According to FBI statistics, only 52% of all rape cases are reported to the police. If 10 rape cases are randomly
selected, what is the probability that at least one is reported to the police?
6) In a school, typically only 1/10 of the student body returns surveys. 20 students are chosen randomly to receive a
survey. What is the probability that
a) they get no surveys back. b) they get 4 or more surveys back.
7) The probability that a driver making a gas purchase will pay by credit card is 0.60. If 50 cars pull up to the station to
buy gas, what is the probability that at least half of the drivers will pay by credit card?
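The binomial formula behind these problems can be written out directly. The sketch below defines small helper functions (hypothetical names, standard library only) and applies them to problems 2, 5, and 7, assuming each situation meets the binomial conditions.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def binom_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p))."""
    return n * p, sqrt(n * p * (1 - p))

# Problem 2: at most 2 defective tires out of 10, with p = 0.05.
tires_at_most_two = binom_cdf(2, 10, 0.05)

# Problem 5: at least 1 of 10 cases reported, with p = 0.52.
at_least_one_reported = 1 - binom_pmf(0, 10, 0.52)

# Problem 7: at least 25 of 50 drivers pay by credit card, with p = 0.60.
at_least_half_credit = 1 - binom_cdf(24, 50, 0.60)

print(tires_at_most_two, at_least_one_reported, at_least_half_credit)
print(binom_mean_sd(50, 0.60))
```

"At least" questions are handled through the complement, which is why problem 7 uses `1 - binom_cdf(24, ...)` rather than 25.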
1. Patients receiving artificial knees often experience pain after surgery. The pain is measured on a subjective scale with
possible values of 1 to 5. Assume that X is a random variable representing the pain score for a randomly selected patient.
The following table gives part of the probability distribution for X.
(b) Find the probability that the pain score is less than 3.
(c) Find the probability that the pain score is greater than 3.
2. Amarillo Slim, a professional dart player, has an 80% chance of hitting the bull’s-eye on a dartboard with any throw.
Suppose that he throws 10 darts, one at a time, at the dartboard.
(a) Find the probability that Slim hits the bull’s-eye exactly six times.
(b) Find the probability that he hits the bull’s-eye at least four times.
(e) Find the probability that it takes Amarillo more than 2 throws to hit the bull’s-eye.
3. Harlan comes to class one day, totally unprepared for a pop quiz consisting of ten multiple-choice questions. Each
question has five answer choices, and Harlan answers each question randomly.
(a) Find the probability that Harlan gets more than 5 questions right out of 10.
(b) Find the probability that Harlan’s first correct answer occurs after the fourth question.
(c) Find the expected number of questions required for Harlan to get his first correct answer.
(d) Find the probability that Harlan guesses more answers correctly than would be expected by chance.
Mrs. Daniel- AP Stats
WS 7.1
Tall Girls
According to the National Center for Health Statistics, the distribution of heights for 16-year-old females is modeled well
by a Normal density curve with mean μ = 64 inches and standard deviation σ = 2.5 inches. To see if this distribution
applies at their high school, an AP Statistics class takes an SRS of 20 of the 300 16-year-old females at the school and
measures their heights. What values of the sample mean x̄ would be consistent with the population distribution being
N(64, 2.5)? To find out, we used Fathom software to simulate choosing 250 SRSs of size n = 20 students from a
population that is N(64, 2.5). The figure below is a dotplot of the sample mean height x̄ of the students in the sample.
(c) Suppose that the average height of the 20 girls in the class’s actual sample is x̄ = 64.7. What would you conclude
about the population mean height μ for the 16-year-old females at the school? Explain.
Choosing Cards
We used Fathom (statistical computer software) to simulate choosing 500 SRSs of size 5 from the deck of cards
described in the Alternate Activity on the previous page. The graph below shows the distribution of the sample median
for these 500 samples.
(a) Is this the sampling distribution of the sample median? Justify your answer.
(b) Suppose that another student prepared a different deck of cards and claimed that it was exactly the same as the one
used in the activity. However, when you took an SRS of size 5, the median was 4. Does this provide convincing evidence
that the student’s deck is different?
Mrs. Daniel- AP Stats
7.2 WS
1. Suppose a VERY large candy machine has 15% orange candies. Imagine taking an SRS of 25 candies from the machine
and observing the sample proportion 𝑝̂ of orange candies.
(d) Is the sampling distribution of 𝑝̂ approximately Normal? Check to see if the Normal condition is met.
(e) If the sample size were 75 rather than 25, how would this change the sampling distribution of 𝑝̂ ? How would this
impact the Normal condition?
Planning for College
2. The superintendent of Miami-Dade County Public Schools wants to know what proportion of middle school students
in his district are planning to attend a four-year college or university. Suppose that 80% of all middle school students in
his district are planning to attend a four-year college or university. What is the probability that an SRS of size 125 will give
a result within 7 percentage points of the true value?
3. Harley-Davidson motorcycles make up 14% of all the motorcycles registered in the United States. You plan to
interview an SRS of 500 motorcycle owners. How likely is your sample to contain 20% or more who own Harleys?
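The sampling distribution of 𝑝̂ in these problems has mean p and standard deviation sqrt(p(1 − p)/n), and it is approximately Normal when np and n(1 − p) are both at least 10. A sketch of that check and the two probability calculations follows; the helper name `p_hat_distribution` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def p_hat_distribution(p, n):
    """Return (Normal model for p-hat, whether the Normal condition holds)."""
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n)), normal_ok

# Problem 1: candies, p = 0.15, n = 25 versus n = 75.
_, candy_25_ok = p_hat_distribution(0.15, 25)
_, candy_75_ok = p_hat_distribution(0.15, 75)

# Problem 2: p = 0.80, n = 125, within 7 percentage points of the truth.
college, college_ok = p_hat_distribution(0.80, 125)
within_7_points = college.cdf(0.87) - college.cdf(0.73)

# Problem 3: p = 0.14, n = 500, sample proportion of 20% or more.
harley, harley_ok = p_hat_distribution(0.14, 500)
at_least_20_percent = 1 - harley.cdf(0.20)

print(candy_25_ok, candy_75_ok)
print(f"{within_7_points:.4f}  {at_least_20_percent:.6f}")
```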
Mrs. Daniel- AP Stats
7.3 WS
1. At the P. Nutty Peanut Company, dry roasted, shelled peanuts are placed in jars by a machine. The distribution of
weights in the jars is approximately Normal, with a mean of 16.1 ounces and a standard deviation of 0.15 ounces.
(a) Without doing any calculations, explain which outcome is more likely, randomly selecting a single jar and finding the
contents to weigh less than 16 ounces or randomly selecting 10 jars and finding the average contents to weigh less than
16 ounces.
(b) Find the probability of each event described above. Since the distribution is Normal, you can use “normalcdf” on your
calculator.
2. Suppose that the number of texts sent during a typical day by a randomly selected high school student follows a right-
skewed distribution with a mean of 15 and a standard deviation of 35. Assuming that students at your school are typical
texters, how likely is it that a random sample of 50 students will have sent more than a total of 1000 texts in the last 24
hours?
Bad carpet
3. The number of flaws per square yard in a type of carpet material varies with mean 1.6 flaws per square yard and
standard deviation 1.2 flaws per square yard. The population distribution cannot be Normal, because a count takes only
whole-number values. An inspector studies 200 square yards of the material, records the number of flaws found in each
square yard, and calculates x̄, the mean number of flaws per square yard inspected. Find the probability that the mean
number of flaws exceeds 2 per square yard.
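For sample means, the sampling distribution has mean μ and standard deviation σ/√n. It is Normal when the population is Normal (the peanut jars) and approximately Normal for large n by the central limit theorem (the carpet). The sketch below applies this to problems 1 and 3; the helper name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Normal model for x-bar: mean mu, standard deviation sigma / sqrt(n)."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Problem 1: one jar versus the mean of 10 jars, each compared to 16 ounces.
p_single_jar = NormalDist(mu=16.1, sigma=0.15).cdf(16)
p_mean_of_10 = sample_mean_distribution(16.1, 0.15, 10).cdf(16)

# Problem 3: mean flaws per square yard over n = 200 square yards.
flaws = sample_mean_distribution(1.6, 1.2, 200)
p_mean_exceeds_2 = 1 - flaws.cdf(2)

print(f"{p_single_jar:.4f}  {p_mean_of_10:.4f}  {p_mean_exceeds_2:.2e}")
```

Averages vary less than individual observations, so the single jar is the more likely of the two events in problem 1.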
Mrs. Daniel- AP Stats
8.1 WS
Losing Weight
A Gallup Poll in November 2014 found that 59% of the people in its sample said “Yes” when asked,
“Would you like to lose weight?” Gallup announced: “For results based on the total sample of national
adults, one can say with 95% confidence that the margin of (sampling) error is ±3 percentage points.”
The admissions director from Florida International University found that (107.8, 116.2) is a 95%
confidence interval for the mean IQ score of all freshmen. Comment on whether or not each of the
following explanations is correct.
(a) There is a 95% probability that the interval from 107.8 to 116.2 contains μ.
(b) There is a 95% chance that the interval (107.8, 116.2) contains x̄.
(c) This interval was constructed using a method that produces intervals that capture the true mean in
95% of all possible samples.
(d) 95% of all possible samples will contain the interval (107.8, 116.2).
(e) The probability that the interval (107.8, 116.2) captures μ is either 0 or 1, but we don’t know which.
(a) How does the shape of the confidence interval change if the confidence level increases from 90% to
95%?
(b) How would the shape of a confidence interval change if the sample size were decreased? Assume the new,
smaller sample size still meets all of the normality conditions.
Mrs. Daniel- AP Stats
8.2 WS
According to an article in the San Gabriel Valley Tribune, “Most people are kissing the ‘right way’.” That
is, according to the study, the majority of couples tilt their heads to the right when kissing. In the study,
a researcher observed a random sample of 124 couples kissing in various public places and found that
83/124 (66.9%) of the couples tilted to the right. Construct and interpret a 95% confidence interval for
the proportion of all couples who tilt their heads to the right when kissing.
Tattoos
Suppose that you wanted to estimate p = the true proportion of students at your school that have a
tattoo with 95% confidence and a margin of error of no more than 0.10. What’s the minimum number
of students you would need to survey?
Ms. Garcia wants to estimate how much time students spend on homework, on average, during a typical
week. She wants to estimate at the 90% confidence level with a margin of error of at most 30 minutes.
A pilot study indicated that the standard deviation of time spent on homework per week is about 154
minutes. What’s the minimum number of students she would need to survey?
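Both sample-size questions solve the margin-of-error formula for n and round up. The sketch below assumes the conservative guess p = 0.5 for the tattoo proportion (no prior estimate is given) and uses the pilot-study standard deviation for the homework mean. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_proportion(margin, confidence, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1 - p) / n) <= margin."""
    return ceil((z_star(confidence) / margin) ** 2 * p_guess * (1 - p_guess))

def sample_size_mean(sigma, margin, confidence):
    """Smallest n with z* * sigma / sqrt(n) <= margin."""
    return ceil((z_star(confidence) * sigma / margin) ** 2)

n_tattoo = sample_size_proportion(margin=0.10, confidence=0.95)
n_homework = sample_size_mean(sigma=154, margin=30, confidence=0.90)
print(n_tattoo, n_homework)
```

Rounding always goes up, since rounding down would leave the margin of error slightly larger than requested.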
Mrs. Daniel- AP Stats
8.3 WS
High school students who take the SAT Math exam a second time generally score higher than on their
first try. Past data suggest that the score increase has a standard deviation of about 50 points. How large
a sample of high school students would be needed to estimate the mean change in SAT score to within 2
points with 95% confidence? Show your work.
A study of commuting times reports the travel times to work of a random sample of 20 employed adults
in New York State. The mean (x-bar) is 31.25 minutes, and the standard deviation is 21.88 minutes.
What is the standard error of the mean? Interpret this value in context.
Vitamin C Content
Several years ago, the U.S. Agency for International Development provided 238,300 metric tons of corn-
soy blend (CSB) for emergency relief in countries throughout the world. CSB is a highly nutritious, low-
cost fortified food. As part of a study to evaluate appropriate vitamin C levels in this food,
measurements were taken on samples of CSB produced in a factory.
The following data are the amounts of vitamin C, measured in milligrams per 100 grams (mg/100 g) of
blend, for a random sample of size 8 from one production run:
Construct and interpret a 95% confidence interval for the mean amount of vitamin C μ in the CSB from
this production run.
Mrs. Daniel- AP Stats
9.1 WS
Mike is an avid golfer who would like to improve his play. A friend suggests getting new clubs and lets Mike try out his 7-
iron. Based on years of experience, Mike has established that the mean distance that balls travel when hit with his old
7-iron is µ = 175 yards with a standard deviation of σ = 15 yards. He is hoping that this new club will make his shots
with a 7-iron more consistent (less variable), so he goes to the driving range and hits 50 shots with the new 7-iron.
Based on 50 shots with the new 7-iron, the standard deviation was s_x = 10.9 yards. A significance test using the sample
data produced a P-value of 0.002.
(d) Do the data provide convincing evidence against the null hypothesis? Explain.
2. A Gallup Poll report on a national survey of 1028 teenagers revealed that 72% of teens said they seldom or never
argue with their friends. Yvonne wonders whether this national result would be true in her large high school. So she
surveys a random sample of 150 students at her school.
Hypothesis:
For Yvonne’s survey, 96 students in the sample said they rarely or never argue with friends. A significance test yields a P-
value of 0.0291.
(b) Do the data provide convincing evidence against the null hypothesis? Explain.
2. Hemoglobin is a protein in red blood cells that carries oxygen from the lungs to body tissues. People with less than 12
grams of hemoglobin per deciliter of blood (g/dl) are anemic. A public health official in Jordan suspects that Jordanian
children are at risk of anemia. He measures a random sample of 50 children.
Hypothesis:
For the study of Jordanian children, the sample mean hemoglobin level was 11.3 g/dl and the sample standard deviation
was 1.6 g/dl. A significance test yields a P-value of 0.0016.
(b) What conclusion would you make if α = 0.05? α = 0.01? Justify your answer.
Mrs. Daniel- AP Stats
9.1 WS #2
The manager of a fast-food restaurant wants to reduce the proportion of drive-through customers who have to wait
more than 2 minutes to receive their food once their order is placed. Based on store records, the proportion of
customers who had to wait at least 2 minutes was p = 0.63. To reduce this proportion, the manager assigns an
additional employee to assist with drive-through orders. During the next month the manager will collect a random
sample of drive-through times and test the following hypotheses:
H₀: p = 0.63
Hₐ: p < 0.63
where p = the true proportion of drive-through customers who have to wait more than 2 minutes after their order is
placed to receive their food.
Describe a Type I and a Type II error in this setting and explain the consequences of each.
Suppose that the manager decided to carry out this test using a random sample of 250 orders and a significance level of
α = 0.10. What is the probability of making a Type I error?
Mrs. Daniel- AP Stats
10.1 WS
Presidential approval
Many news organizations conduct polls asking adults in the United States if they approve of the job the president is
doing. How did President Obama’s approval rating change from August 2009 to September 2010? According to a CNN
poll of 1024 randomly selected U.S. adults on September 1-2, 2010, 50% approved of Obama’s job performance. A CNN
poll of 1010 randomly selected U.S. adults on August 28-30, 2009 showed that 53% approved of Obama’s job
performance. Use the results of these polls to construct and interpret a 90% confidence interval for the change in
Obama’s approval rating among all US adults. Based on your interval, is there convincing evidence that Obama’s job
approval rating has changed?
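The interval arithmetic for the approval-rating change can be sketched as a two-sample z interval for a difference in proportions (2010 minus 2009). The function name is hypothetical, and the Random, Normal, and Independent conditions still need to be checked separately.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(p1, n1, p2, n2, confidence):
    """Confidence interval for p1 - p2 from two independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se

# September 2010 (50% of 1024) minus August 2009 (53% of 1010).
lower, upper = two_proportion_interval(0.50, 1024, 0.53, 1010, confidence=0.90)
print(f"({lower:.4f}, {upper:.4f})")
```

Whether 0 falls inside the interval is what decides the "convincing evidence" question.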
Hearing Loss
In a study of 3000 randomly selected teenagers in 1988-1994, 15% showed some hearing loss. In a similar study of 1800
teenagers in 2012-2013, 19.5% showed some hearing loss. Is there convincing evidence that the proportion of all teens
with hearing loss has increased? Additionally, between the two studies, Apple introduced the iPod/iPhone/Beats
headphones. If the results of the test are statistically significant, can we blame iPods/iPhones/Beats headphones for the
increased hearing loss in teenagers?
Mrs. Daniel- AP Stats
10.1 WS # 2
In an effort to reduce health care costs, General Motors sponsored a study to help employees stop smoking. In the
study, half of the subjects were randomly assigned to receive up to $750 for quitting smoking for a year while the other
half were simply encouraged to use traditional methods to stop smoking. None of the 878 volunteers knew that there
was a financial incentive when they signed up. At the end of one year, 15% of those in the financial rewards group had
quit smoking while only 5% in the traditional group had quit smoking. Do the results of this study give convincing
evidence that a financial incentive helps people quit smoking?
10.2 WS Solutions
Plastic Bags
Parameters:
µT = the mean capacity of plastic bags from Target (in grams)
µP = the mean capacity of plastic bags from Publix (in grams).
Assess Conditions:
Random: The students selected a random sample of bags from each store.
Normal: Since the sample sizes are small, we must graph the data to see if it is reasonable to assume that the
population distributions are approximately Normal.
[Graphs of the Target and Publix data omitted]
Interval:
By hand: For these data, x̄T = 12825.8, sT = 1912.5, x̄P = 9234, sP = 1474.2. Using the conservative df of 5 – 1 = 4, the
critical value for 99% confidence is t* = 4.604. Thus, the confidence interval is:
(12826 – 9234) ± 4.604 √(1912.5²/5 + 1474.2²/5) = 3592 ± 4972 = (–1380, 8564).
Using technology: With df ≈ 7.5, the interval is (–101, 7285).
We are 99% confident that the interval from –101 to 7285 grams captures the true difference in the mean capacity for
plastic grocery bags from Target and from Publix.
Conclude in Context: Since the interval includes 0, it is plausible that there is no difference in the two means. Thus, we
do not have convincing evidence that there is a difference in mean capacity.
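The by-hand interval can be reproduced from the summary statistics quoted in the solution. The sketch below takes t* = 4.604 (conservative df = 4) directly from the solution rather than computing it, and the variable names are chosen for this example only. It also computes the Welch-Satterthwaite degrees of freedom, which is the kind of df that technology typically uses for a two-sample t interval.

```python
from math import sqrt

# Summary statistics as given in the solution (capacity in grams)
x_target, s_target, n_target = 12825.8, 1912.5, 5
x_publix, s_publix, n_publix = 9234.0, 1474.2, 5
t_star = 4.604  # 99% critical value for df = 4, taken from the solution

# Variance of each sample mean
var_target = s_target ** 2 / n_target
var_publix = s_publix ** 2 / n_publix

# Standard error of the difference in sample means
se = sqrt(var_target + var_publix)
diff = x_target - x_publix
low, high = diff - t_star * se, diff + t_star * se
print(f"Conservative 99% CI: ({low:.0f}, {high:.0f})")

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df = (var_target + var_publix) ** 2 / (
    var_target ** 2 / (n_target - 1) + var_publix ** 2 / (n_publix - 1)
)
print(f"Welch df = {welch_df:.1f}")
```

Because the Welch df is larger than the conservative df of 4, the matching critical value is smaller and the interval from technology is narrower.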
The better picker-up?
Solution:
(a) The five-number summary for the Bounty paper towels is (103, 114, 116.5, 124, 128) and the five-number summary
for the generic paper towels is (77, 84, 88, 90, 103). Here are the boxplots:
Both distributions are roughly symmetric, but the generic brand has two high outliers. The center of the Bounty
distribution is much higher than the center of the generic distribution. Although the range of each distribution is
roughly the same, the interquartile range of the Bounty distribution is larger.
Since the centers are so far apart and there is almost no overlap in the two distributions, the Bounty mean is almost
certain to be significantly higher than the generic mean. If the means were really the same, it would be virtually
impossible to get so little overlap.
(b)
Parameters:
µB = the mean number of quarters a wet Bounty paper towel can hold
µG = the mean number of quarters a wet generic paper towel can hold.
Hypotheses:
H0: µB = µG
Ha: µB > µG
Assess Conditions:
Random: The students used a random sample of paper towels from each brand.
Normal: Even though there were two outliers in the generic distribution, both distributions were reasonably
symmetric and the sample sizes are both at least 30, so it is safe to use t procedures.
Independent: The samples were selected independently and it is reasonable to assume there are more than
10(30) = 300 paper towels of each brand.
Test statistic:
t = [(117.6 – 88.1) – 0] / √(6.64²/30 + 6.30²/30) = 17.64
Obtain P-value: Using either the conservative df = 30 – 1 = 29 or the df from technology (57.8), the P-value is
approximately 0.
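The test statistic and the technology df can be checked from the rounded summary statistics. In this sketch the helper names are hypothetical. Because the inputs are rounded, the computed t is about 17.65, not the 17.64 reported above; that small gap presumably comes from rounding in the summary values.

```python
from math import sqrt


def two_sample_t(x1, s1, n1, x2, s2, n2, null_diff=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = null_diff (illustrative helper)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((x1 - x2) - null_diff) / se


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom for a two-sample t procedure."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Bounty: mean 117.6, SD 6.64, n = 30; generic: mean 88.1, SD 6.30, n = 30
t_stat = two_sample_t(117.6, 6.64, 30, 88.1, 6.30, 30)
df = welch_df(6.64, 30, 6.30, 30)
print(f"t = {t_stat:.2f}, Welch df = {df:.1f}")
```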
Cheap Dice?
Mrs. Daniel purchased a bunch of inexpensive dice from Amazon. She is now wondering if the dice are fair. So, Mrs.
Daniel randomly selects 6 dice to roll 10 times each, for a total of 60 observations to evaluate. Do the data give
convincing evidence that the dice are not fair?
Outcome Observed
1 13
2 11
3 6
4 12
5 10
6 8
Total 60
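A chi-square goodness-of-fit statistic for this table can be sketched as follows. The code assumes the null hypothesis that all six outcomes are equally likely and compares the statistic with 11.07, the standard table critical value for df = 5 at α = 0.05. The choice of α = 0.05 is an assumption, since the problem does not state a significance level.

```python
# Observed counts for outcomes 1 through 6, from the table above
observed = [13, 11, 6, 12, 10, 8]

# Under H0 (fair dice) each outcome is expected 60 / 6 = 10 times
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

critical_05 = 11.07  # chi-square table value for df = 5, alpha = 0.05 (assumed alpha)
print(f"chi-square = {chi_sq:.2f}, df = {df}")
print("Reject H0" if chi_sq > critical_05 else "Fail to reject H0")
```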
Landline surveys
According to the 2010 Census, of all US residents age 20 and older, 19.1% are in their 20’s, 21.5% are in their 30’s,
21.1% are in their 40’s, 15.5% are in their 50’s, and 22.8% are 60 and older. The table below shows the age distribution for a
sample of US residents age 20 and older. Members of the sample were chosen by randomly dialing landline telephone
numbers.
Category Count
20-29 141
30-39 186
40-49 224
50-59 211
60+ 286
Total 1048
Do these data provide convincing evidence that the age distribution of people who answer landline telephone surveys is
not the same as the age distribution of all US residents?
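The goodness-of-fit calculation for this problem can be sketched by turning the Census percentages into expected counts for a sample of 1048. The comparison value 9.488 is the standard table critical value for df = 4 at α = 0.05. That α is an assumption, because the problem does not give a significance level.

```python
# Census proportions and observed sample counts, by age category
census = {"20-29": 0.191, "30-39": 0.215, "40-49": 0.211, "50-59": 0.155, "60+": 0.228}
observed = {"20-29": 141, "30-39": 186, "40-49": 224, "50-59": 211, "60+": 286}

n = sum(observed.values())

# Expected count in each category if the sample matched the Census distribution
expected = {age: n * p for age, p in census.items()}

# Contribution of each category to the chi-square statistic
components = {
    age: (observed[age] - expected[age]) ** 2 / expected[age] for age in census
}
chi_sq = sum(components.values())
df = len(census) - 1

critical_05 = 9.488  # chi-square table value for df = 4, alpha = 0.05 (assumed alpha)
for age in census:
    print(f"{age}: expected {expected[age]:.1f}, component {components[age]:.2f}")
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

The per-category components show which age groups contribute most to the statistic, which is useful for any follow-up analysis.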
11.2 WS
Ibuprofen or Acetaminophen?
In a study reported by the Annals of Emergency Medicine, researchers conducted a randomized, double-blind clinical
trial to compare the effects of ibuprofen and acetaminophen plus codeine as a pain reliever for children recovering from
arm fractures. There were many response variables recorded, including the presence of any adverse effect, such as
nausea, dizziness, and drowsiness. Here are the results:
                     Ibuprofen   Acetaminophen plus Codeine   Total
Adverse effects      36          57                           93
No adverse effects   86          55                           141
Total                122         112                          234
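The problem as printed does not say which procedure to use. Assuming the intent is a chi-square test comparing the adverse-effect rates of the two treatments, the statistic can be sketched from the two-way table. The row and column layout in the code is an assumption about how the table columns line up, but the chi-square value is the same whichever treatment the 122-subject column belongs to. The critical value 3.841 is the table value for df = 1 at an assumed α = 0.05.

```python
# Two-way table of counts: rows = adverse effects / no adverse effects,
# columns = the two treatment groups (column totals 122 and 112)
table = [
    [36, 57],
    [86, 55],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / table total
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(table))
    for j in range(len(table[0]))
)
df = (len(table) - 1) * (len(table[0]) - 1)

critical_05 = 3.841  # chi-square table value for df = 1, alpha = 0.05 (assumed alpha)
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```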
Tide vs New Tide
Before bringing a new product to market, firms carry out extensive studies to learn how consumers react to the product
and how best to advertise its advantages. Here are data from a study of a new laundry detergent. The participants are a
random sample of people who don’t currently use the established brand that the new product will compete with. Give
subjects free samples of both detergents. After they have tried both for a while, ask which they prefer. The answers may
depend on other facts about how people do laundry.
Determine whether the sample provides convincing evidence of an association between laundry practices and product
preference in the population of interest.
12.1 WS #2
Here are data on the time (in minutes) Professor Moore takes to swim 2000 yards and his pulse rate (beats per minute)
after swimming on a random sample of 23 days:
(a) Calculate and interpret a 95% confidence interval for the slope β of the population regression line. All conditions
have been checked.
(b) Is there statistically significant evidence of a negative linear relationship between Professor Moore’s swim time and
his pulse rate in the population of days on which he swims 2000 yards? Carry out an appropriate significance test at the
α = 0.05 level. All conditions have been checked.