
Statistics for
Decision-
Making in
Business
1st Edition
Milos Podmanik
Foreword: What is This Book Good For?
You're probably thinking to yourself, “Who does this guy think he is by trying to write his own
book?”
The answer is both satisfying and deceiving to those who expect the traditional math course with
the traditional instructor. I write this course manual to most closely match my personal teaching
philosophy. What might that be? Well, I firmly believe that math education focuses too much on
processes, templates, and repetitive, mundane computational skills. Is this of any importance? To
some extent, yes, they are important. For the most part, however, students fail to make
connections from math to the real world and vice versa. We tend to teach students how to “do”
and not how to “think.” As a result, I believe it is far more important to promote a deep level of
understanding, engagement, and connections to the planet we live on. After all, do you really
want to become a calculator? If your answer is “yes,” then this will come as a major
disappointment: a computer could calculate faster and more accurately than you decades ago!
Not to mention, computers will only continue to get faster and better than you at computing.
Here's the good news: computers don't understand why they're doing what they're doing! They
are simply computing machines. It takes (and most likely will always take) a rational, deep-
thinking human being to provide a contextual and meaningful analysis of the inputs and outputs
of a numerical process. And that, my friends, is what this book is all about.
Statistics for Decision-Making in Business © Milos Podmanik Page 2

A Note to Students
This book is far from perfect. In fact, it will never be perfect. There is, however, a lot of blood,
sweat, and tears put into this book (paper cuts hurt!). I spent much of my 2012 winter break
thinking, writing, and rewriting contents in this book to make it feel “right” for both you and me.
As such, I don't believe it's that much to ask for you to read the book.
What's my point?
… Read this book!

Table of Contents

Chapter 1: Fundamentals of Statistics
1.1 Data and Their Uses
1.2 Descriptive VS. Inferential Statistics
1.3 Statistics in Excel

Chapter 2: Visual Representations of Data
2.1 Visualizing Categorical Data
2.2 Visualizing Quantitative Data
2.3 Descriptive Statistics – Center and Position
2.4 Descriptive Statistics – Variability

Chapter 3: Probability and Decision-Making
3.1 The Idea of Probability
3.2 Joint Probability
3.3 Probability of Unions
3.4 Conditional Probability
3.5 Combinations and Permutations
3.6 Expected Value

Chapter 4: Discrete Probability Distributions
4.1 The Binomial Distribution

Chapter 5: Continuous Probability Distributions
5.1 The Ideas Behind the Continuous Distribution
5.2 The Normal Distribution

Chapter 6: Sampling Distributions and Estimation
6.1 Sampling Distribution for x̄
6.2 Confidence Interval for x̄
6.3 Confidence Interval for p̂

Chapter 7: Hypothesis Testing
7.1 The Concept Behind Hypothesis Testing

Appendices
APPENDIX A: Answers to Select Problems

Chapter 1
Fundamentals of Statistics
1.1 Data and Their Uses
Our lives are filled with information. While at one point we didn't have enough data in the
world, now we have so much of it that computers need to be revamped continually in order to
keep up with it. Facebook records rich information about hundreds of millions of users. Studies
are revealing new conclusions that allow us to make decisions about choosing the right type of
treatment for medical conditions. Scientific data is establishing the strong correlation between
humans' interaction with the planet and changes in climate. The power of data is limitless.
However, because those who report on studies often lack statistical expertise, results are frequently
miscommunicated or misunderstood. In order to extract the full meaning
of data, we must understand how to analyze them. We must be accurate and precise in what we
measure and how we measure it.
1.1.1 Three Good Reasons to Study Statistics
In no particular order, these are:

1. To be informed
2. To be able to make good decisions based on data and to understand current issues
3. To be able to evaluate decisions that affect the operations of a business and our personal
lives
1. To be informed
What does it mean to be informed? To be informed we should be able to understand and interpret
tables, charts, and graphs. We should be able to make sense of the conclusions of others' research
based on their numerical results. Moreover, we should have insight into the gathering,
summarization, and analysis of data, and so we should always approach numerical results with a
slight bit of doubt. In other words, we ideally want to adopt the attitude of "doubt until there is
enough evidence to trust." Let's take a look at some examples of where statistics have helped inform
society.
Examples:
- Does it matter how long children are bottle-fed? An experiment was run to determine
differences in iron deficiency and the length of time that a child is bottle-fed.
- In 2005, Medicare candidates faced a decision of which prescription medication plan to choose.
A program called PlanFinder was made available online to compare available options. But, are
senior citizens online?
- A study in 2005 attempted to answer the question, are students ruder today than in the past? A
survey was conducted.
- Is domestic violence common? A study in 2005 interviewed about 24,000 women to attempt to
answer this question.
- What factors are involved in student achievement in school? Is study-time the most important
factor in answering this question? A study concluded that things such as prioritizing student
achievement and encouraging teacher collaboration may have some impact.
- Do the accounts receivable reported by a business accurately reflect the true accounts
receivable? The IRS randomly audits businesses to try and answer this question.
- A stock’s share value change has fluctuated between -1.2% and 8.9% over the last year. What
predictions should an investor make about the stock over the coming year in order to decide
whether to purchase?
- CVS Pharmacy sells 5 lb. bags of 100% Pure Cane Granulated Sugar. As a quality control
measure, the company would like to know the amount of variability in the true weight of sugar
placed into each of the bags.
2. Making Good Decisions

How can we ever be sure that the results we're seeing or reading are truly the ones we should
believe? Although it is assumed that those who talk about data are supposed to understand
statistics, you'd be surprised how poor some of their conclusions are. We'll definitely see why by
the time this course is over. You'll learn how to summarize data, how to analyze it, and, most
importantly, how not to make conclusions about it. The title "Making Good Decisions" should
not be new to you, hopefully.
3. Evaluating Decisions that Affect Our Lives

Are you satisfied that the Food and Drug Administration (FDA) has allowed a new patent for the
drug Zoloft, which is now also useful for Social Anxiety Disorder (in addition to depression), but
which has undergone no additional research to prove the claim? Do you know why you're paying
$720 for car insurance every six months, while your roommate is paying only $450? If a
mammogram comes back positive for breast cancer, is there any chance that this is a false
positive? Should you be surprised that no Hispanic applicants were hired by a company if three
applicants were to be selected, when 15 were Caucasian and 5 were Hispanic? Is there a reason
to suspect inequality? It should not surprise you that these can be answered with probability and
statistics.
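The hiring question above can be explored with a quick calculation. The following is only a minimal sketch, assuming the three hires are chosen completely at random from the 20 applicants (an assumption made for illustration, along with the variable names, none of which come from the text). It uses `math.comb`, available in Python 3.8 and later.

```python
from math import comb

# Applicant counts taken from the question in the text
caucasian = 15
hispanic = 5
hires = 3

# Number of ways to pick 3 hires from all 20 applicants
total_ways = comb(caucasian + hispanic, hires)

# Number of ways to pick all 3 hires from the 15 Caucasian applicants only
no_hispanic_ways = comb(caucasian, hires)

# Assumption: every group of 3 applicants is equally likely to be hired
p_no_hispanic = no_hispanic_ways / total_ways
print(f"P(no Hispanic applicant hired) = {p_no_hispanic:.3f}")
```

Under purely random selection, an all-Caucasian group of hires would occur roughly 40% of the time, so this outcome alone would not be strong evidence of unequal treatment.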
1.1.2 Types of Data
In order to be able to reach the goals mentioned above, we need to have some sort of information
about which to make our decisions – we call this information data.
Data comes in two main categories: quantitative and qualitative/categorical.
Quantitative variables, as the name implies, deal with numerical quantities. For example, the
average revenue of a Whole Foods market store is considered a quantitative variable, since the
measurement is a number.
Qualitative variables, on the other hand, deal with qualities. For example, the type of television
that a customer is likely to purchase is considered a qualitative variable, since its value will be,
for instance, plasma, LED, LCD, etc.
1.1.3 Not All Quantitative Variables Are As They Appear!
Just because a variable is stated as a numerical value doesn't mean that it can be treated as a
numerical value. A variable must be classified according to its scale of measurement.
For instance, suppose you are to test three marketing tactics on customers. You call these tactics,
Tactics 1, 2, and 3, respectively. These tactics have numerical values, but the numbers do not
have any ordering significance. That is, tactic 1 is not necessarily better than tactic 3. These
numbers serve simply as names for the values of the variables and cannot be numerically
compared. We call this a variable of nominal scale.
Suppose that a business magazine reports the top three new businesses in the city each month.
That is, we have businesses 1, 2, and 3, where 1 is considered the best of the three, 2 the second
best, and 3 the third best. In this case, we can talk about 1 being better than 2 and 3 and 3 being
worse than 1 and 2. This type of variable has the properties of a nominal scaled variable, but also
has the property of order. We call this a variable of ordinal scale.
In another example, consider the variable IQ. Suppose two people have IQs of 100 and 120.
Based on this information, we can say that the person with 120 has a higher IQ. However, we
can also say that the second person has an IQ that is 20 points higher than the first person. We
couldn't really say this for the example above. In addition to being nominal (a person can be
identified by their value) and ordinal (can rank the scores), we can also talk about the differences
in scores. This type of variable is of interval scale.

The most powerful type of variable is one that contains all of the above properties, but whose
ratio between two values is also meaningful and whose value of zero means a complete absence of
the characteristic. While IQ is of an interval scale, it does not make much sense to say that the
person with the 120 IQ is 20% smarter than the person with the 100 IQ.
Certainly we cannot say that a person with 0 IQ has no intelligence at all (this person is probably
not even alive!). Consider, however, the median salary of different types of employees. One
employee makes $100,000 and another makes $120,000. We can definitely say that the second
person makes 20% more than the first person, and we can also say that a value of $0 would
indicate a person makes no money at all (total absence of that variable). This variable is of ratio
scale.
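The four scales build on one another, which can be summarized in a short sketch. The dictionary and the `supports` helper below are hypothetical names introduced only for illustration; they simply encode the properties described above.

```python
# Each scale keeps the properties of the ones before it and adds one more.
SCALE_PROPERTIES = {
    "nominal":  {"labels"},
    "ordinal":  {"labels", "order"},
    "interval": {"labels", "order", "differences"},
    "ratio":    {"labels", "order", "differences", "ratios"},
}

def supports(scale, operation):
    """Return True if the given comparison is meaningful on this scale."""
    return operation in SCALE_PROPERTIES[scale]

# IQ (interval): the 20-point difference is meaningful, the ratio is not.
print(supports("interval", "differences"))
print(supports("interval", "ratios"))

# Salary (ratio): $120,000 is 1.2 times $100,000, i.e. 20% more.
salary_ratio = 120_000 / 100_000
print(supports("ratio", "ratios"), salary_ratio)
```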
1.1.4 How We Obtain Data

The first question we have after knowing a bit about data is, how do we get it?
Existing Data
In some instances, this data already exists and is available to the researcher. For instance, one
can easily go online and find existing data on the U.S. public. We can view things like the
average credit card debt per person by state, pounds of grains produced in the United States since
1950, etc. This data is usually available through a number of websites, such as:
- U.S. Statistical Abstract (U.S. Census) - http://www.census.gov/compendia/statab/
- Federal Reserve Board - http://www.federalreserve.org
- Office of Management and Budget - http://www.whitehouse.gov/omb
- Department of Commerce - http://www.doc.gov
- Bureau of Labor Statistics - http://www.bls.gov
- FedStats - http://www.fedstats.gov/
There are literally thousands of other repositories for existing data. Sometimes a little bit of
research unveils a plethora of results.
If a company is doing a study of its clients, it may already have a myriad of existing internal
data.
Conducting a Study to Obtain Data

We hear a lot of things coming from our failing media sources. Data is blindly reported, while
the method of data collection is ignored. Why do you think there are so many conflicting
conclusions reached? One week coffee is linked to cancer, while the next it fights cancer. Which
is it?
Many times, observational studies are conducted. There is no experimenter manipulation in this
type of study. For example, a zoologist might study elephant eating patterns in various climates
to determine whether climate has an effect on caloric intake (the response variable, which is what
is measured). He probably cannot manipulate the climate (the predictor variable, which serves to
predict responses) in which the elephant lives, for many reasons, not the least of which is the
difficulty of transporting such an animal. Not to mention, there are startling ethical concerns with
such an action! He probably cannot dictate how much food is in the environment, either. Certainly, he
can get an accurate reading of the elephant's food intake by following the animal for several
days. At the end of the day, the zoologist is merely observing what happens. His conclusions are
limited.
An experiment, on the other hand, is a type of study in which the experimenter is able to control
and manipulate most, if not all, environmental factors. If the experimenter is studying the effects
of caffeine on math test scores, for instance, he would have a control group of, perhaps, students
to whom he gives no coffee and another, experimental group, to which he gives coffee with 60
mg of caffeine. He then measures each group on test score performance (% of total correct).
Suppose the experimental group does poorly compared to the control group. Can we be sure that
it was due to the caffeine? As long as test conditions were the same in each group, yes. If,
however, there was something different between the two groups in addition to the
presence/absence of caffeine, then the results are not so clear. What if, for instance, they played
music with the control group and none with the experimental group? How do we know better
performance in the control group wasn't an effect of soothing music calming the nerves? It could
even have been a combination of no caffeine and music.
Punchline: In an experiment, we manipulate one factor and hold all other conditions constant.
Most of the time it is desirable to run an experiment. The number one reason for this is that we
can usually collect evidence that leads to a cause-and-effect relationship, assuming the
experiment is conducted properly. In an observational study it is impossible to do this as there
are many confounding variables, or variables that might be related to the explanatory and
response variable. Consider this classic example: a researcher counts the number of crimes
committed in a city and then the number of churches in that city. She does this for quite a few
cities. It is found that there is a positive relationship between the number of crimes committed
and the number of churches. That is, as crime increases, so does the number of churches. What
gives? Do these people just repent more often for their guilty consciences?

It may not come as a large shock that we're dealing with potentially many confounding variables.
The simplest one is population. As a city's population increases, more crime is committed and
more churches are needed. This is but one possible explanation.
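To see how a confounding variable alone can create such a relationship, consider the following sketch. Everything in it is an assumption made for illustration: the city populations are made-up numbers, and both crimes and churches are generated so that they depend only on population plus random noise. The `correlation` helper computes the Pearson correlation coefficient, a standard measure of how strongly two variables move together.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical city populations (made-up numbers, not real data)
populations = [20_000, 50_000, 100_000, 250_000, 500_000, 1_000_000, 2_000_000]

# Assumption: both counts depend only on population, plus some random noise.
crimes = [p * 0.04 * (1 + rng.gauss(0, 0.1)) for p in populations]
churches = [p * 0.001 * (1 + rng.gauss(0, 0.1)) for p in populations]

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

r = correlation(crimes, churches)
print(f"Correlation between crimes and churches: {r:.2f}")
```

Even though neither quantity influences the other in this simulation, the two move together strongly because both grow with population.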
Example 1: An educational researcher finds that there is a strong relationship between the
number of hours a student studies and his/her grade point average (GPA). List a few possible
confounding variables.
SOLUTION: There is no guarantee that studying more causes a higher GPA. There are many
factors that might influence a higher GPA:
- More sleep
- Less stress (maybe due to lack of job)
- Less television viewing
- Better study environment
- More support from family/friends
Issues in Planning a Study
There are many. Let's consider the following scenario to help illustrate a few.
Scenario: Suppose we want to test whether or not a newly designed Freud circular saw blade
runs at a lower temperature, and hence causes fewer burn marks in the wood, than the old blade at
7200 revolutions per minute (RPM).
Can we just run the cuts, take the temperatures, and compare? I think you know the answer to
this.
First off, we face many extraneous factors, or variables that are not of interest in the current
study but that are thought to affect the response variables. Examples? The person doing the
cutting with each blade (same or not?). The type of wood being cut (is one pine and
the other oak?). The type of saw (low-power Craftsman, or professional Jet?).
In order to avoid having these types of factors affect our measurement, we must control them.
We can do this by having the same person do the cutting, making sure both boards being cut are
exactly the same, and using the same saw for both tests.
Secondly, is it sufficient to cut just one board using each blade? Definitely not. We must expect
that there will be some variation or variability in the temperatures we measure. That is, if I run
the cut with the old blade four times, I may read temperatures of 205°, 202°, 209°, and 219°. This
difference among the measurements is called variability. Thus, to take into account the
variability, we must take several replications, or repeated measurements. Then, we would likely
use the mean, or average of the replications.
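As a small illustration of combining replications, the sketch below averages the four temperature readings quoted above and also reports how far apart the highest and lowest readings are. The variable names are hypothetical.

```python
from statistics import mean

# Temperatures from four replications with the old blade (from the text)
old_blade_temps = [205, 202, 209, 219]

# Combine the replications into a single representative value
average_temp = mean(old_blade_temps)

# A simple look at variability: highest reading minus lowest reading
spread = max(old_blade_temps) - min(old_blade_temps)

print(f"Mean temperature: {average_temp}")
print(f"Largest minus smallest reading: {spread}")
```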

Although far from last, we will consider here one more important concept. You might not think
anything of it at first, but do you suppose that it's a good idea to use just two saw blades for the
experiment - one old, one new? What if we happened to get a faulty blade out of the
batch? If we run 4 replications with each blade, we might consider having 4 of the old blades and
4 of the new blades.
If you have a total of 8 sheets of wood to be cut, is it okay to cut the first 4 with the old blade and
the last 4 with the new blade? Surprisingly, the answer is "no." Why not? Suppose the sheets
were delivered freshly cut, and still moist. Well, moisture is subject to gravity, and so the bottom
four boards might be more moist than the top four. Thus, we must randomize each board to one
of the two types of saw blades. In other words, we randomly assign each board to a blade. We
will not consider this any further at this point.
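One way to carry out such a random assignment is sketched below. This is only an illustration: the board numbering, the seed, and the rule "first four shuffled boards get the old blade" are assumptions, not a procedure prescribed by the text.

```python
import random

rng = random.Random(42)  # fixed seed only so the example is repeatable

boards = list(range(1, 9))  # boards numbered 1 (top of stack) to 8 (bottom)
rng.shuffle(boards)         # put the boards in a random order

# First four boards in the shuffled order get the old blade, the rest the new.
assignment = {board: "old" for board in boards[:4]}
assignment.update({board: "new" for board in boards[4:]})

for board in sorted(assignment):
    print(f"Board {board}: {assignment[board]} blade")
```

Because the order is shuffled before the split, a board's position in the stack no longer determines which blade cuts it.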
Homework Problems - 1.1
1. Classify each of the following variables as nominal, ordinal, interval, or ratio scale.
Justify your answer.
a. Favorite flavor of ice cream
b. Temperature (°F)
c. Accounts Receivable Balance
d. Ranking of Presidential Candidates According to Preference
2. Based on a study of 2121 children between the ages of one and four, researchers at the
Medical College of Wisconsin concluded that there was an association between iron
deficiency and the length of time that a child is bottle-fed (Milwaukee Journal Sentinel,
November 26, 2005).
a. How many elements does this dataset contain?
b. Is the variable categorical or quantitative? Explain.
3. The student senate at a university with 15,000 students is interested in the proportion of
students who favor a change in the grading system to allow for plus and minus grades
(e.g., B+, B, B-, rather than just B). Two hundred students are interviewed to determine
their attitude toward this proposed change.
a. How many elements does this dataset contain?
b. Is the variable categorical or quantitative? Explain.
4. An article titled “Guard Your Kids Against Allergies: Get Them a Pet” (San Luis Obispo
Tribune, August 28, 2002) described a study that led researchers to conclude that “babies
raised with two or more animals were about half as likely to have allergies by the time
they turned six.”
a. Is this study an observational study or an experiment? Explain.

b. Describe a potential confounding variable that illustrates why it is unreasonable to
conclude that being raised with two or more animals is the cause of the observed
lower allergy rate.

5. The article “Television's Value to Kids: It's All in How They Use It” (Seattle Times, July
6, 2005) described a study in which researchers analyzed standardized test results and
television viewing habits of 1700 children. They found that children who averaged more
than two hours of television viewing per day when they were younger than 3 tended to
score lower on measures of reading ability and short term memory.
a. Is the study described an observational study or an experiment?

b. Is it reasonable to conclude that watching two or more hours of television is the
cause of lower reading scores? Explain.
6. “More than half of California's doctors say they are so frustrated with managed care they
will quit, retire early, or leave the state within three years.” This conclusion from an
article titled “Doctors Feeling Pessimistic, Study Finds” (San Luis Obispo Tribune, July
15, 2001) was based on a mail survey conducted by the California Medical Association.
Surveys were mailed to 19,000 California doctors, and 2000 completed surveys were
returned.
a. Is this study an observational study or an experiment? Explain.
b. Describe any concerns you have regarding the conclusion drawn.
1.2 Descriptive VS. Inferential Statistics
1.2.1 The Purpose of Statistics and “Statistics”
Statistics is a branch of mathematics that deals with the analysis of data. This is often confusing
to some people, since the lower-case version of this word, statistic, actually means: a piece of
data. So, we have statistics, which are the data themselves, and we have Statistics, which deals
with the analysis of statistics. Confusing, huh? We generally use the word statistics loosely to
mean “data.”
A statistician is a special type of mathematician who deals with the analysis of data. Many
people confuse the profession of the statistician with a person who simply has many statistics
memorized. While some certainly may, most do not.
Needless to say, our purpose in the field of Statistics is to understand data. Depending on one's
goal, statistics may be used to simply describe an obtained set of data or to extrapolate the data to
describe something much larger. These two goals are called, respectively, descriptive and
inferential statistics.
1.2.2 Descriptive Statistics
Suppose you work in the accounting department and have collected the following data on
revenues earned from new and existing customers over the past day:

Account Type Revenue ($)
New $5,296
Old $2,230
Old $7,643
Old $3,897
Old $9,590
Old $2,689
Old $5,890
Old $9,561
New $3,643
New $8,861
Old $3,946
Your goal is to summarize the data in some meaningful way(s). Descriptive statistics is the
method of describing or summarizing data. How could this be done?
We first consider the types of variables we have present:
- Account type - Categorical
  - New, Old
- Revenue - Quantitative
  - Ranges from $2,230 to $9,590
With categorical variables, we cannot mathematically manipulate the observed values, or
observations (here we have “New” and “Old” for observations). We can only provide
descriptions of the values.
We can provide the relative frequency of these values. A relative frequency is a ratio of the
number of observations of a given value to the total number of observations. Here, we could
summarize by saying:
Account Type    Relative Frequency
New             3/11 ≈ 0.27
Old             8/11 ≈ 0.73
This allows us to conclude that 27% of the sales came from new clients while 73% came from
existing clients. This is very valuable information! This information demonstrates that the
company has grown over the course of this one day.
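The relative frequencies can be checked with a few lines of code. This is a minimal sketch using the account types from the revenue table; the variable names are hypothetical.

```python
from collections import Counter

# Account types in the order they appear in the revenue table
account_types = ["New", "Old", "Old", "Old", "Old", "Old",
                 "Old", "Old", "New", "New", "Old"]

counts = Counter(account_types)   # how many times each value was observed
total = len(account_types)        # total number of observations

# Relative frequency = observations of a value / total observations
relative_frequency = {kind: count / total for kind, count in counts.items()}

for kind, freq in relative_frequency.items():
    print(f"{kind}: {counts[kind]}/{total} = {freq:.0%}")
```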
We could present these two descriptive statistics to management by either providing the raw
percentages, or by some visual display, such as a pie chart or a bar graph. A pie chart shows
the ratios (or all parts of one whole) of the categorical variable and thus the entire circle
represents 100% of all account types (100% of the categorical variable values):

[Figure: pie chart titled “Account Type” showing New 27% and Old 73%]
This literally shows the “ingredients” of the pie. A corresponding bar graph might be:
[Figure: bar graph titled “Account Type”; horizontal axis: Type (New, Old); vertical axis: Frequency (0 to 9)]
In a similar way, we could describe Revenue, the quantitative variable. Typically, quantitative
variables are described by:
- Central tendency - a measure of the “typical” or center-most observation. Examples are
  mean (average), median (the value that is literally the middle number), and mode (the most
  frequently occurring number, which is typically not used since data sets often do not have one).
- Variability - a measure of how spread out the data values are. A number of possible
  measures exist including (but not limited to): range, interquartile range, and standard
  deviation.

For the present time, we'll proceed to describe one of each of the above descriptive statistics.
The rest will be discussed in later sections.
Since we're most used to finding a simple average, or the mean, we will do that here. Recall that
the mean can be found by summing the observations and dividing by the number of
observations:
\[ \bar{x} = \frac{5{,}296 + 2{,}230 + \cdots + 3{,}946}{11} = \frac{63{,}246}{11} \approx 5{,}750 \]
Recall that when we find an average, we are placing all values into a common “pot.” We then
divide the pot into equal parts. That is to say, if each company had spent the same amount of
money on each purchase, they would each spend $5,750. We like to think of this as a measure of
the center value. Spending less than this amount puts a company below the average and spending
more puts the company above the average.
Mean (Simple Average)
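The mean, or simple average, of a quantitative variable is expressed as:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
where x_1, x_2, ..., x_n are the observations and n is the number of observations.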
This value represents the amount allocated to each observation, if each observation were to
receive an equal share of the total. We think of this as the “center” value.
In conjunction with measures that summarize the center, it is critical to focus also on how spread
out the data is. One such measure is the range. The range is simply the difference between the
minimum and maximum values in the dataset. In this instance, we have:
Minimum: $2,230
Maximum: $9,590
The difference is:
\[ 9{,}590 - 2{,}230 = 7{,}360 \]
Thus, the range of the dataset is $7,360. This tells us that the amount spent varied by as much as
$7,360 from company-to-company.
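The mean and range for the revenue data can be verified with a short sketch. The function names `simple_mean` and `value_range` are hypothetical and chosen only for this illustration.

```python
# Revenues ($) from the accounting table, in the order listed
revenues = [5296, 2230, 7643, 3897, 9590, 2689, 5890, 9561, 3643, 8861, 3946]

def simple_mean(values):
    """Sum the observations and divide by how many there are."""
    return sum(values) / len(values)

def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

print(f"Mean:  ${simple_mean(revenues):,.0f}")
print(f"Range: ${value_range(revenues):,}")
```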
Range
Range, a measure of the variability (or spread) of a dataset, is measured by taking the difference
between the largest and smallest observed values. That is,
\[ \text{Range} = \text{maximum} - \text{minimum} \]

Example 1: For the example considered above, summarize the center and spread of revenue
by account type. Describe any information revealed by splitting up the data in this fashion.
SOLUTION: We are being asked to look at values specific to the account type. Thus, we will
have two means and two ranges.
For “New” accounts:

New $5,296
New $3,643
New $8,861
For “Old” accounts:

Old $2,230
Old $7,643
Old $3,897
Old $9,590
Old $2,689
Old $5,890
Old $9,561
Old $3,946
We summarize this information in a table:

Row Labels     Average of Revenue ($)    Max of Revenue ($)    Min of Revenue ($)    Range
New            5933                      8861                  3643                  5218
Old            5681                      9590                  2230                  7360
Grand Total    5750                      9590                  2230
We see that both account types tend to have about the same average purchase amount. However, it
appears that the amount spent by old customers is prone to more fluctuation than that of new
customers. This might be due simply to the fact that there are only three new customers.
Technology Note: All of the information above was generated using Microsoft Excel.
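For readers without Excel, the same summary by account type can be reproduced in code. The sketch below is not the method used in the text (which relied on Excel); it is simply one way to group the revenues and compute the mean, maximum, minimum, and range for each group.

```python
from collections import defaultdict

# (account type, revenue in $) pairs from the accounting table
data = [("New", 5296), ("Old", 2230), ("Old", 7643), ("Old", 3897),
        ("Old", 9590), ("Old", 2689), ("Old", 5890), ("Old", 9561),
        ("New", 3643), ("New", 8861), ("Old", 3946)]

# Group revenues by account type
groups = defaultdict(list)
for account_type, revenue in data:
    groups[account_type].append(revenue)

# Compute center and spread within each group
summary = {}
for account_type, values in groups.items():
    summary[account_type] = {
        "mean": round(sum(values) / len(values)),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
    }
    print(account_type, summary[account_type])
```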
1.2.3 Inferential Statistics
Descriptive statistics is a great way to describe what you have, but how can we describe data that
we do not have?
Let's consider an example. You are the manager of the production branch at Healthy Heart
Organic Foods. Due to recent workload increases, you are concerned that your employees' team
morale has decreased. You have 864 employees working in your department. You would like to
conduct a survey, but you do not have the means to investigate the data in each of the surveys
provided. Certainly, you could pay your assistant overtime to analyze them for you, but that
would be costly in both his time and payroll. Instead, you decide to randomly survey 50 of the
employees in your department in order to get an idea of the overall morale. This process of
collecting data on a smaller portion of the whole in order to generalize to the whole is known as
statistical inference. This branch of statistics is called inferential statistics.
It is of utmost importance to draw appropriate conclusions when reporting findings of any study,
whether a survey or an experiment. For example, if we find that rats die after ingestion of 20 mg of
caffeine, does that mean caffeine will kill a human, as well? This brings up the worthwhile
discussion of a population versus the sample. Let's consider the figure below:

First off, a researcher must decide who his target population is. That is, is he trying to describe
all people in the United States? All Asian children between the ages of 2 and 5? All elk in
Minnesota? The population is the set of all people, creatures, things, etc., that we wish to
describe.
It is often quite time-consuming and costly to conduct a study based on whole populations. Even
national presidential polls often involve only about a thousand participants. Through one of a
variety of processes, only a select number of elements of the target population will be selected.
This select number is referred to as the sample. The process of selecting a sample from the
population that we will consider is simple random sampling (SRS). This process helps to
ensure that any differences that we notice among sample elements are entirely due to chance and,
importantly, that every element in the target population has an equally likely chance of being in
the sample.
Simple random sampling can be done by many means. You’ve probably heard of the random
process of drawing a name from a box to declare the winner of a raffle. More sophisticated
means of this are done by a random number generator on a computer, wherein every element of
the population is assigned a whole number. Then, a series of random numbers is drawn by a
computer and those elements are selected to be in the sample.
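The random-number approach can be sketched for the employee morale survey described earlier. This is a minimal illustration, assuming the 864 employees are simply numbered 1 through 864; the seed is fixed only so that the example is repeatable.

```python
import random

rng = random.Random(2012)  # fixed seed only so the example is repeatable

# Assign every employee in the department a whole number from 1 to 864
employee_ids = range(1, 865)

# Draw 50 distinct employees; each one is equally likely to be chosen
sample = rng.sample(employee_ids, k=50)

print(sorted(sample))
```

Sampling is done without replacement here, so no employee can be surveyed twice.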
We can see in the illustration above that our goal is to then make inferences about the population
based on our observations of the sample. Just as you might hear from Gallup: “55% of voters
plan on voting for Candidate X,” we try to make generalizations about the target population.
As another example, consider a lighting company that is hoping to manufacture a light bulb with
a new type of filament. As with any light bulb, a consumer would want to know how long the
light bulb is expected to last. Unfortunately, not every light bulb will last equally long as every
other light bulb. This means that an average will have to be taken. To add to this, it is not
possible to test every single light bulb to determine how long it will last. So, the company
decides to randomly test 200 bulbs that come through the assembly line. They hope to use this
sample, since it is random and is assumed to be representative of all light bulbs, to estimate the
true average lifespan of a light bulb with this new filament. Here is an overview of their
inferential statistics process:

(SOURCE: Essentials of Modern Business Statistics, 4th Edition, Anderson, et al.)
Though it might seem simple enough to conclude that the average light bulb survives for 76
hours, we have to take into account the variability in the lifetimes. That is to say, we need some
way to produce a reasonable interval for the true average, since it is the entire population we are
looking to describe. A discussion of this inference process is left for future sections.
Homework Problems - 1.2
1. Over its first week in the Box Office (12/14/2012 to 12/20/2012), the movie The Hobbit:
An Unexpected Journey grossed the following amounts, in millions of dollars (no
particular order):
6.9 9.2 1.6 1.9 1.9 1.6 4.9
(SOURCE: www.the-numbers.com)
a. Calculate the mean.

b. Explain the real-world meaning of the mean.
c. Calculate the range.
d. Explain the real-world meaning of the range.
e. Provide a brief written report (summary) to the producers of the film on how the
film is doing and the stability of gross revenues.
2. A marketing firm conducts a focus group with eighteen randomly selected college
students to determine their preference for a variety of clothing lines.
a. Describe the sample.
b. Describe the population.
c. What variables might the marketing firm want to measure?

d. Is the firm's goal to conduct descriptive or inferential statistics?
3. In a quality control process, 250 packages of cheese are randomly selected from an
assembly line. Each package of cheese will be described as either “pass” or “fail,”
depending on whether or not it passes the inspection.
a. Describe the sample.
b. Describe the population.
c. Quality control will fail if more than 1% of the packages fail. How many
packages must pass?
4. Two datasets each have a range of 30. Describe how it is possible that one dataset is
considered to be more spread out than the other dataset.
5. One hundred randomly selected CGCC students are surveyed and asked, “Do you believe
that racism is an issue in the college setting?” The survey makers would like to generalize
to college students. What is wrong with their study?

1.3 Statistics in Excel
When conducting an analysis of realistic amounts of data, it is tiresome, mundane, and often
infeasible to carry out computations by hand. Microsoft Excel is a powerful and
accessible piece of software that does all of this for us. As such, we seek to better understand how it
works in this section. All images below come from the most recent version of Microsoft Excel.
Excel is spreadsheet-based software. This means that each entry, or cell, holds one piece
of information and is part of a larger grid of cells. A cell may contain numerical or textual
information.
1.3.1 Sum(), Average(), Min(), and Max()
Eventually, you will learn to make beautiful spreadsheets, but we are now only concerned with
some basic features. Let's begin by entering the following accounting data from Section 1.2:

New $5,296
Old $2,230
Old $7,643
Old $3,897
Old $9,590
Old $2,689
Old $5,890
Old $9,561
New $3,643
New $8,861
Old $3,946
We can choose any cell we want to begin entering data. Let's choose cell A1 to type in the
header. This cell reference means that we are looking at column A and row 1. We will enter our
second column's label into cell B1. We will list the data vertically, as shown in the table above.
After clicking on a cell and typing in each entry, simply press ENTER or TAB to move to the
next cell. Do not press ESC, or the data you are typing will be cancelled.
In order to see the entire labels in cells A1 and B1, we can expand the column by placing the
cursor between the grey-shaded labels for columns A and B, clicking, holding, and dragging the
window to an appropriate size.

We can make it a bit more presentable by centering and by bolding the labels.
Excel is extremely useful due to the fact that it allows us to create formulas based on the values
of existing cells or cell ranges (a collection of one or more cells).
A formula can either act on a provided value or on a provided set of cells. For example, suppose
we want to add up the total revenue. We want the result to appear in cell D3. To initiate a
formula, we must begin with = in the desired formula cell. Thus, we could click cell D3 and
type:
This, however, would defeat the purpose of having entered all the data in already! So, we will
use the built in sum function. To use this, we type:

= sum(B2:B12)
This tells Excel to sum up the range of values from B2 to B12. The colon indicates that we want
the full range and not just the two cells B2 and B12. If we were only to have wanted to sum cells
B2 and B12 (no in between), then we would have replaced the colon with a comma.
NOTE: Excel is not case-sensitive when it comes to formulas. You can type SUM or Sum or
even sUm and Excel will recognize what you are asking it to do. The same holds when Excel
matches categorical entries in functions such as countif(): “New” is treated the same as “new.”
We get:
(NOTE: It is highly recommended that you label your spreadsheet values. Before or after
inserting the sum into D3, it is a good idea to label that cell's content, perhaps in cell C3 as
shown above. This will be very helpful when your spreadsheet is loaded with information.)
To get the proper formatting, highlight cell D3 and select “Currency” from the Number column
in the Home Tab. This formatting only applies to the selected cell(s).
To find the average revenue, we would simply type the following into the desired cell (we'll use
D4):
= average(B2:B12)

Excel does not have a built-in function for the range. It does, however, have
functions to locate the maximum and minimum values in a range of cells. Into cell D5, we will
type in:
= max(B2:B12) - min(B2:B12)
This will find the maximum value from B2 to B12 and subtract away the minimum from B2 to
B12, giving us precisely the range. If it is desirable to see the max or the min, you can choose a
cell and simply type in the max portion or the min portion without doing the subtraction, as
shown below:

Suppose that this company expects to earn (roughly) the same revenue of $63,246 each day
over the next 30-day month. To get the month's revenue, we would like to
multiply this amount by 30. To do this, we would simply type into our desired output cell:
= 30*D3
NOTE: To indicate multiplication in Excel formulas, you must use the asterisk (*).
Writing the factors side by side, as in 30(D3), will produce an error.
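For readers who want to check the spreadsheet results outside of Excel, the same four calculations can be sketched in Python. The variable names are ours; the revenue values are the eleven amounts from the table above, and the comments name the Excel formula each line mirrors.

```python
# Revenue column from the accounting data above (dollars).
revenues = [5296, 2230, 7643, 3897, 9590, 2689, 5890, 9561, 3643, 8861, 3946]

total = sum(revenues)                        # mirrors = sum(B2:B12)
average = total / len(revenues)              # mirrors = average(B2:B12)
data_range = max(revenues) - min(revenues)   # mirrors = max(B2:B12) - min(B2:B12)
monthly = 30 * total                         # mirrors = 30*D3

print(total, round(average, 2), data_range, monthly)
# prints: 63246 5749.64 7360 1897380
```

The total of 63,246 agrees with the figure quoted in the text.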
There are literally hundreds of functions available through Excel. A very useful way to learn
how to do new things in Excel is to Google what you are trying to accomplish. For example, if I
wanted to find the standard deviation of revenues, I might search Google for “standard deviation
in Excel.” Thousands of results are bound to pop up. Why stop there? Try YouTube for many
useful videos.
1.3.2 Countif()
It is nice to know that Excel has formulas to operate on quantities, but it would still be
tedious to have to count categorical values by hand.
The countif() function is useful for such a task. This function works as follows: you provide a
range of cells for the function to evaluate. You then provide a condition that it should search for
and it counts the number of such instances. Suppose we want to count the number of new
accounts in cells B2 to B12. We would enter:
= countif(B2:B12, "New")
NOTE: We separate the cell range from the search criterion with a comma. After the comma, we type
the word to search for in quotation marks. Excel ignores case when matching the criterion, so
“New” and “new” are counted alike, but the spelling must otherwise match exactly.
We get:
We can do the same for Old.
A neat little trick is to modify our formula. Let's say that we want to minimize the number of
areas in our spreadsheet that we would need to change if, say, we began calling “New” accounts
“NB” for “New Business.” We would need to change all the account type names, as well as the
search criteria in the formula. To make this easier, we can tell our formula to search for
something that is already typed into an existing cell. Since C10 contains the actual word we want
to search for, we will simply put C10 after the comma instead of the word “New.”
= countif(B2:B12, C10)
This tells Excel which cells to count and which cell holds the search criterion. We still
get the same result. A word of caution: if you modify the entry in C10, your result in D10 will
change accordingly (or it might produce an error).
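The idea of counting matches, and of reading the search word from one place instead of retyping it, can be sketched in Python as follows. The helper name countif and the variable label are our own stand-ins (label plays the role of the criteria cell); this sketch uses a plain exact comparison and is not a reproduction of Excel's matching rules.

```python
# Account types from the accounting data, in the order they were listed.
account_types = ["New", "Old", "Old", "Old", "Old", "Old",
                 "Old", "Old", "New", "New", "Old"]

def countif(values, criterion):
    """Count how many entries are equal to the criterion."""
    return sum(1 for value in values if value == criterion)

label = "New"                        # stands in for the criteria cell
new_count = countif(account_types, label)
old_count = countif(account_types, "Old")
print(new_count, old_count)          # prints: 3 8
```

Renaming the category then means changing the data and the single label value, not every formula that uses it.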
1. A telemarketing company enforces a new policy prohibiting employees from sending personal
emails. A climate survey was then conducted to ask a number of randomly selected employees
whether they agree with the policy and how long they have been with the company.
The results are below:

Agrees w/ Policy Change?   Years at Company
Y 4
Y 8
N 3
Y 10
N 3
N 3
N 6
Y 3
Y 5
N 8
N 1
Y 8
Y 10
Y 5
Y 8
Y 3
N 8
N 8
Y 9
a. Determine the mean number of years this sample has been with the company.
b. Determine the minimum and maximum number of years a person from this
sample has been with the company.
c. Determine the combined overall number of years this sample has been with the
company.
d. Determine the frequency with which people within this sample agreed and
disagreed with the policy change.
e. Calculate the mean, the minimum and maximum, and the range for each of the
two groups (agree and disagree).
f. Describe any patterns that emerged when considering the two groups separately.

Chapter 2
Visual Representations of Data
2.1 Visualizing Categorical Data
When summarizing data, it goes without saying that there are appropriate and inappropriate ways to
display the data. For example, if you collected a person's age and income, you might be
interested in studying income as a function of age. In this case, you probably would not want to
build a pie chart, since you're studying quantitative variables (two of them, at that).
In the previous chapter, the main types of categorical data visualizations were mentioned – bar
graphs and pie charts. Our aim here is simply to summarize and to show how to use them in
conjunction with Excel. We'll create three types of representations:
- Pie Chart
- Frequency Bar Graph – Vertical axis keeps track of the number of instances of each
observation
- Relative Frequency Bar Graph – Vertical axis keeps track of the ratio of instances of each
observation (decimal or percentage, typically)
2.1.1 Creating a Pie Chart Using Excel
Suppose a hotel owner asks 20 randomly selected recent guests to respond to the following
statement regarding their experiences at the new hotel lounge:
“The dining experience in Harlan’s Hotel Lounge is worth revisiting.”
Respondents circle one of the following letter combinations:
- SD - Strongly Disagree
- D - Disagree
- A - Agree
- SA - Strongly Agree
The resulting data is shown below:

Participant 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Opinion D A SD A SD SA A A A A A A D A A A A A A A

To represent the data to his shareholders, his marketing team constructs the visual
representations listed above.
Since the participant number is not important, it is okay to ignore that line of the dataset. Our
focus is on the Opinion row. This is a categorical variable, so we'll begin by counting the
number of SD, D, A, and SA responses by using Excel's countif() function. Further, we'll calculate
the relative frequency of each response by dividing the number of responses for each category by
the total number of observations, which we tally below all the individual frequencies:
One new trick worth mentioning is Excel's ability to recognize patterns in our formulas. Let's
say that we typed in our countif() formula for SD in G7 as follows.

We now have to enter a formula for the three remaining opinions. This can get time-consuming.
So, we attempt to copy cell G7 and paste it in G8:
This does work! Note that, since we shifted the formula down one row, F7 turned into F8. That
is, the search criterion is now being “pulled” from F8, the cell corresponding to an opinion of 'D'.
However, we have one problem: the counting region also shifted from D6:D25 to D7:D26. We
don't want that! To tell Excel that we still want the counting region to be D6:D25 and to not
change when we copy our formula, we “lock” the rows and columns by putting a dollar sign ($)
before the column letter and before the row number, so that the range reads $D$6:$D$25, as shown below:
(HINT: If you place your cursor on one of the cell names in the formula and press F4
on your keyboard, you will notice the dollar signs toggle for you.)
Notice that F7 contains no dollar-signs, so as to indicate to Excel that we wish for the criteria cell
to adjust down one row (still in column F) as we move down one row. We can now copy-paste
the formula down the remaining cells:

In G11, we would like the sum of the frequencies, so we type:
= sum(G7:G10)

We know from the data that this value is correct!
To get the relative frequencies, we want to divide each frequency by the constant 20. For
instance, the relative frequency of 'SD' would be 2/20 = 0.1. Instead of telling Excel to divide 2
by 20, we will type the following formula into H7:
= G7/$G$11
Note that we lock cell G11 so that, when we copy this formula to the remaining cells, we
continue to divide by 20, the value in G11.
It is neat to note that we can copy the formula all the way down to H11, since it will simply take
20 and divide it by 20, indicating that the total is 1 or 100% of the data.
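As a cross-check on the spreadsheet, the frequency and relative frequency columns can be sketched in Python. The variable names are ours; the opinions are the twenty responses from the data table above.

```python
from collections import Counter

# The Opinion row from the guest survey above.
opinions = "D A SD A SD SA A A A A A A D A A A A A A A".split()

counts = Counter(opinions)               # frequency of each response
total = sum(counts.values())             # plays the role of the Total cell
relative = {category: counts[category] / total
            for category in ["SD", "D", "A", "SA"]}

for category, share in relative.items():
    print(category, counts[category], share)
# prints:
# SD 2 0.1
# D 2 0.1
# A 15 0.75
# SA 1 0.05
```

Dividing every frequency by the same total mirrors the locked reference in the relative frequency formula, and the shares add up to 1.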

We are now prepared to construct visuals.
To build a pie chart, we can simply highlight the four opinions and the corresponding
frequencies (click and drag from cell F7 to G10), select the Insert tab, click on Pie in the
Charts column, and select the desired pie chart. We'll select the first one.

Alternatively, it is possible to insert a blank pie chart and to then select the data afterwards. The
above process saves a couple of steps.
Now we would like to label the chart. It would be nice to see a title and the percentages for each
of the slices. To do this, select the chart and click on Design in the Chart Tools tab that appears.
In the Chart Layouts column, we can select the style of chart most appropriate to our needs. For
demonstration purposes, the first option will be shown below:

To add a suitable title, click “Chart Title” and overwrite it with an appropriate name. If the pie
chart becomes distorted or labels are moved undesirably, the chart box can be adjusted by dragging
out its corners.
There are many options when it comes to formatting graphs and charts. This will be left for
exploration. Note also that many online sources, such as YouTube, offer tutorials on professional
formatting within Excel.
2.1.2 Creating a Bar Graph Using Excel
Depending on what one would like to emphasize, a bar graph may be suitable to meet that need.
We can create either a frequency bar graph or a relative frequency bar graph, depending on whether we
want to display the number of times an observation appears or the percentage of observations
resulting in each of the possible variable values.
Using our example from above, since the frequencies are in the column adjacent to the opinion
value, we can simply highlight all observations and frequencies and select the Insert tab, the
Charts column, and select the first 2-D Column graph from Column. Be careful not to select the
Total row.

(FIGURE: column chart of frequencies for the opinions SD, D, A, and SA, with a “Series1” legend)
Since there is only one variable here, we can click on “Series1” in the legend and press DELETE.
This will free up some space.
(FIGURE: the same column chart for SD, D, A, and SA with the legend removed)
With the graph selected, choose the Layout tab that appears in the Chart Tools area.

You can label the graph by selecting appropriate options from “Chart Title” and “Axis Titles” on
the left side of the selected tab.
(FIGURE: frequency bar graph titled “Guest Opinions of Harlan's Lounge”; vertical axis: Frequency,
0 to 16; horizontal axis: Opinion, with bars for SD, D, A, and SA)
In the relative frequency bar graph, we wish only to change the measurement on the vertical axis.
We want to draw the proportions from the third column of our data.
We can update our current bar graph to reflect this. If you do not want to lose the information in
your frequency bar graph, you can copy the graph and paste it beside the existing graph. This
will allow us to modify the data that is being drawn in.

Select the copied graph. In Chart Tools, select the Design tab. From there, click on Select
Data.
Select the “Edit” option above the “Legend Entries” box.
Beside the “Series values” box, click the icon. This will now allow you to select the values
of the dependent variable. Click and drag to select all the relative frequencies, except the total
frequency. Then press the icon to close the dialogue box. After relabeling the vertical axis,
you should now see:

(FIGURE: relative frequency bar graph titled “Guest Opinions of Harlan's Lounge”; vertical axis:
Relative Frequency, 0 to 0.8; horizontal axis: Opinion, with bars for SD, D, A, and SA)
We notice that both graphs look nearly identical. This is due to the fact that the relative
frequencies are proportional to the frequencies (they are the frequencies multiplied by 1/20!).
2.1.3 Conclusions
The owner of the hotel can reasonably conclude that 80% of his recent guests enjoyed the lounge
(enough to consider revisiting!). He can conclude that 20% of his guests either did not care for it
or absolutely hated it! If he is interested in additional repeat visitors, perhaps he might like to
determine how to make the experience better for those who seem to be highly dissatisfied. Are
these descriptive measures demonstrative of the entire population of visitors? To a greater or
lesser extent – perhaps.
1. The following dataset represents the meat selection made by individuals at a dinner
banquet. Attendees selected from beef (B), chicken (C), veal (V), or pork (P).
B C B C V B C
C C P P B B C
a. Is this data categorical or quantitative?

b. Create a table that shows the frequency and relative frequency for each of the
choices. Use Excel.
c. Create a frequency bar graph. Label all axes.
d. Create a relative frequency bar graph. Label all axes.
e. Create a pie chart. Label all axes.
f. Write a brief report (summary) describing the meal preferences of these attendees.
Describe any general trends. Use specific data and make appropriate conclusions.

2. The following data represents per capita meat consumption (pounds per person) in 2009
for a variety of meats (SOURCE: U.S. Statistical Abstract, Table 217).
Pounds per
Meat Person
Beef 58.1
Veal 0.3
Lamb and mutton 0.7
Pork 46.6
Chicken 56.0
Turkey 13.3
a. Using Excel, find the mean and range of the data.

b. Explain the real-world meaning of the mean you found.
c. Explain the real-world meaning of the range you found.
d. What conclusions can be made about the center and spread of per-capita meat
consumption?
3. On opening day, the owners of Green Heart Restaurant invited 29 food critics to be a part
of the culinary experience. Each critic gave a grade of A (Best), B, C, D, or F (Worst) to
reflect the quality of the overall dining experience. The scores are shown below:
A B B A C B C B B
D C B B A A C C C
C B A D C C B B B
A B
a. Generate a relative frequency bar chart.

b. Generate a pie chart.
c. What should the owners take away from the experiences of the critics?
4. Consider the scenario in problem 1.

a. What is the sample?
b. What is the population of interest?
c. What other variable(s) might be of interest to the data analyst to better study
attendees' eating preferences?
5. Consider the scenario in problem 3.

a. What is the sample?
c. What other variable(s) might be of interest to the data analyst to better study the
target demographic?

6. Suppose you are the owner of an accounting firm. You would like to better understand
the employment of the residents within ten miles of your firm.
a. What variables would you collect? Which are quantitative and which are
qualitative?
c. How would you go about collecting data for this study? Be specific.

2.2 Visualizing Quantitative Data
To make an assessment of how efficient the technical support department is in helping customers
solve software issues, management keeps track of the length of each phone call taking place over
the day. They find the following:
Length of Call (min.)

1 2 13 4 12
4 10 6 6 9
4 3 4 0 12
6 4 4 13 15
0 4 10 4 10
7 2 10 8 4
7 0 4 4 4
Since this data is quantitative, the visual displays discussed in Section 2.1 are not appropriate. However,
management still would like to visualize the 35 observations.
One quick, by-hand technique to visualize how the times appear would be a dot plot, or a simple
number line, with any repeats stacked above others. Given the presence of great technology, we
will use Excel to create a histogram, which is a graph similar to a bar graph (can be either
frequency or relative frequency). The difference is that, instead of having nominal categories on
the horizontal axis, we will create numerical categories. For example, we could simply create
tick marks for each observation value present in the table and then display the number of times
it appears. Often, with small amounts of data, the graph may appear spread out. In this case, we
might decide to create a bar representing, say, all calls that fall between 0 and 3 minutes. Let's
demonstrate both:
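(FIGURE: frequency histogram titled “Call Times”; vertical axis: Frequency, 0 to 14; horizontal axis:
Length (min.), with one bar for each observed value: 0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 15)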

We clearly see that most calls are between about 4 and 10 minutes (a 4-minute call is most
frequent – the mode). Alternatively, we might choose to create equal-width categories. Let's say
we have categories that show the times as 3-minute blocks:
(FIGURE: frequency histogram titled “Call Times”; vertical axis: Frequency, 0 to 14; horizontal axis:
Length (min.), with bins 0-2, 3-5, 6-8, 9-11, 12-15)
Beautiful! Now it is clearer how call times are distributed. This visualization is a bit simpler
than the one above, as it groups times into more manageable categories. Note that the bars are
touching. This is what distinguishes a histogram from a bar graph – we want to emphasize that
times are continuous and that every time length between 0 and 15 is accounted for (even
fractions of a minute, potentially).
We can make these categories as wide or narrow as we'd like. We call these categories bins.
Think about this as you would about sorting recycling materials into one of several bins.
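The sorting of observations into bins can be sketched in Python. The variable names are ours; the data are the 35 call lengths above, and the bins are written as inclusive (low, high) pairs to match the labels used in the text, including the wider last bin.

```python
from collections import Counter

# The 35 call lengths (minutes) from the table above.
call_minutes = [
    1, 2, 13, 4, 12,
    4, 10, 6, 6, 9,
    4, 3, 4, 0, 12,
    6, 4, 4, 13, 15,
    0, 4, 10, 4, 10,
    7, 2, 10, 8, 4,
    7, 0, 4, 4, 4,
]

# One count per observed value, as in the first histogram.
value_counts = Counter(call_minutes)
mode_value, mode_count = value_counts.most_common(1)[0]

# Bins as (low, high) pairs, both ends inclusive.
bins = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 15)]
bin_counts = {
    f"{low}-{high}": sum(1 for m in call_minutes if low <= m <= high)
    for low, high in bins
}

print(mode_value, mode_count)   # prints: 4 12
print(bin_counts)
# prints: {'0-2': 6, '3-5': 13, '6-8': 6, '9-11': 5, '12-15': 5}
```

Every observation lands in exactly one bin, so the bin counts add up to 35.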
2.2.1 Creating a Frequency Histogram Using Excel
The most time-consuming part of building a histogram by hand is organizing the data and
counting the number of observations. Excel does this quite easily via the use of a pivot table. A
pivot table is a “live” table whose values can be formatted in many different ways.
We must first begin with the dataset in Excel as a raw column or row of data:
To insert a pivot table, highlight the entire set of data, including the data label. Click on the
Insert tab and choose the PivotTable option from the Tables column. A data prompt should
appear with the table range already appearing in the box:

You can either choose to have Excel place the table within the same worksheet, or you can have
it create a new one. This choice is up to you. If you choose “Existing Worksheet” you will have
to specify a cell to paste it to. Choose a cell that is out of the way of any existing data so that it
doesn't “bump” into it if the pivot table becomes quite large.
You should now see something similar to the table below:

When highlighted, a “PivotTable Field List” window should appear to the right of your screen
with the name(s) of the variable(s) in the “Choose fields to add to report” box.
This generic template will now allow us to construct a table. From the PivotTable Field List
window, we will drag the Times variable into the Row Labels box. This will create a series of
rows with each of the observations appearing, only once. Thus, we will not have to see repeats!

If we had additional variables, the row labels can be any variable desired. For each of these rows,
we would like to see a frequency count. This is where the “Values” box comes in handy. Drag
the Times variable into the “Values” box:
The values of time are, by default, the sums of the times for each of the row labels. This is not
what we want. We want “Count of Times.” To change the type of value, click the arrow on the
“Sum of Times” button. Choose “Value Field Settings.” Change “Summarize value field by”
option to “Count” and close the dialogue box:
We can double-check that these values are correct by noting that the Grand Total is 35, the same
as the number of observations. We would like a histogram to show the “Row Labels” along the
horizontal axis and the “Count of Times” along the vertical axis. To do this, select the pivot table
and choose the Options tab from the PivotTable Tools menu.

Select PivotChart and select the first graphing option:

We make a few adjustments: delete the legend, re-label the chart title, and remove the two grey
boxes. Now that a graph has been inserted, a PivotChart Tools menu appears when the graph is
highlighted. This is very similar to inserting a regular graph. Select Layout to add axis labels. To
remove the grey boxes, right-click either box and select “Hide All Field Buttons on Chart.”
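(FIGURE: chart titled “Histogram of Call Times”; vertical axis: Frequency, 0 to 14; horizontal axis:
Times (min.), with one bar for each observed value: 0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 15)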
To make the gaps between bars disappear, select the graph and choose the eighth graph option
from the Design tab in the PivotChart Tools menu shown below (NOTE: this option will
automatically put in axis labels):

To make solid black lines appear as the outlines for each bar, change the bar styles from “Chart
Styles.”

(FIGURE: the same histogram with no gaps between bars; vertical axis: Frequency, 0 to 14;
horizontal axis: Times (min.))
We now would like to adjust the bin widths. Doing this is simple!
Select the pivot table. From the Options tab under the PivotTable Tools menu, choose “Group
Selection” from the Group column. In the dialogue box that appears, the “Starting at” and
“Ending at” boxes should reflect the smallest and largest values of the variable. You can adjust
these to be wider or narrower, if you choose to show less than the full dataset. In the “By:” box,
put the width of the classes. In this case, we chose 3. Press “OK” and you should then see the
updated pivot table and graph!

To change frequency to relative frequency, we must now change “Count of Times” in the
“Values” box of the “PivotTable Field List.” Click on “Count of Times” and select “Value Field
Settings.” Within the dialogue box, choose the “Show Value As” tab and choose values to show
as “% of Grand Total.” Press “OK.” Adjust the vertical axis label accordingly.
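The “% of Grand Total” setting simply divides each bin's count by the total number of observations. A minimal Python sketch of that step, using the same call lengths and bins as in the text (the variable names are ours):

```python
# The 35 call lengths (minutes) from Section 2.2.
call_minutes = [
    1, 2, 13, 4, 12, 4, 10, 6, 6, 9, 4, 3, 4, 0, 12, 6, 4, 4, 13, 15,
    0, 4, 10, 4, 10, 7, 2, 10, 8, 4, 7, 0, 4, 4, 4,
]
bins = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 15)]  # inclusive (low, high) pairs

grand_total = len(call_minutes)
percent_of_total = {}
for low, high in bins:
    count = sum(1 for m in call_minutes if low <= m <= high)
    percent_of_total[f"{low}-{high}"] = 100 * count / grand_total

for label, percent in percent_of_total.items():
    print(f"{label}: {percent:.2f}%")
# prints:
# 0-2: 17.14%
# 3-5: 37.14%
# 6-8: 17.14%
# 9-11: 14.29%
# 12-15: 14.29%
```

The percentages add up to 100, which is a quick check that no observation was left out of the bins.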

(FIGURE: relative frequency histogram; vertical axis: Relative Frequency, 0.00% to 40.00%;
horizontal axis: Times (min.), with bins 0-2, 3-5, 6-8, 9-11, 12-15)
1. An instructor grades a math test and produces the following histogram:

(FIGURE: histogram titled “Histogram of Test Percentages”; vertical axis: Frequency, 0 to 10;
horizontal axis: Percentage Earned, with bins 60-64, 65-69, 70-74, 75-79, 85-90)
a. What can the instructor conclude about the fairness of the test?
b. What appears to be the mean score, based on the histogram?
c. What is the approximate range of scores, and why is it only possible to
approximate this from the given information?
2. A cashier at a mall retail clothing outlet asked customers their age for an anonymous
survey. The ages he collected can be found below:
31 34 30 30 31 27 33 36
33 30 29 28 20 32 24 30
32 30 30 22 31 38 28 31
25 24 25 31 25 24 36 32
24 31 31 32 31 31 28 31
33 20 32 32 52 31 27 30

d. Create a relative frequency histogram for age. Leave your bin width as 1 year.
e. Create a relative frequency histogram for age with bin width 5 years.
f. Describe any trends in the age of shoppers at this store.
g. Based on your answer to e), which age group(s) can be omitted from the
company's marketing tactics, in an effort to focus only on the regular shoppers?
3. The total number of people (in millions) working in all of the various industries in the
United States in 2010 is given in the table below:
2.206 0.731 9.077 14.081 8.789 5.293 3.805 15.934

7.134 5.88 1.253 3.149 9.35 6.605 2.745 15.253
9.115 6.138 32.062 13.155 18.907 6.249 9.406 3.252
12.53 2.966 9.564 6.769 6.102 0.667 6.983
(SOURCE: U.S. Statistical Abstract, Table 619)

d. Create a relative frequency histogram for the number of workers. Leave your bin width as 2 million
people.
e. Create a relative frequency histogram for the number of workers with bin width 5 million people.
f. The federal government regularly publishes reports on employment across the
many industries. Using the information you have gathered, generate a brief report
detailing your findings, including any trends in employment.
4. A resort chain that wishes to expand is constantly searching for new sites to add
properties that will be profitable. A good place to start is by considering climates.
Suppose Starwood Hotels and Resorts Worldwide obtains the following data from the
U.S. Census Bureau on highest temperatures ever recorded in various cities in the United
States:
112 100 128 120 134 114 106 110

109 112 100 118 117 116 118 121
114 114 105 109 107 112 115 115
118 117 118 125 106 110 122 108
110 121 113 120 119 111 104 111
120 113 120 117 107 110 118 112
114 115
(SOURCE: U.S. Statistical Abstract, Table 391)

d. Create a relative frequency histogram for temperature. Leave your bin width as 5 degrees.
e. Create a relative frequency histogram for temperature with bin width 10 degrees.
f. What percentage of states can be eliminated from consideration if the company
will not take any risks with states that have had a record high over 115 °F?
g. Summarize the distribution of high temperatures in the U.S.

2.3 Descriptive Statistics – Center and Position
Histograms provide us with a great visualization of the overall distribution of values. A
distribution describes the layout of the values of a quantitative or categorical variable. To further
describe the differences between two similar distributions, it is helpful to use statistics that
describe center, location, and spread.
2.3.1 Mean and Median
To make peace with some regularly occurring notation in statistics, we will use \(\Sigma\) (“sigma”) to
mean “the sum of.” For instance,
let's say that we have a set of \(n\) variable values. To distinguish each of these “\(x\)'s” we'll use
subscripts, denoting them:
\[ x_1, x_2, \ldots, x_n \]
Then, to indicate that we want to sum these values across all subscripts, we would write:
\[ \sum x_i \]
which means, “sum up all values in the dataset,” or
\[ x_1 + x_2 + \cdots + x_n \]
Using this new notation, we already know how to calculate the mean:
Mean – x-bar notation
The mean value, or average, of a dataset containing \(n\) values can be written as:
\[ \bar{x} = \frac{\sum x_i}{n} \]
\(\bar{x}\) is used to denote the mean of a sample and can be read as “x-bar.”
A common point of confusion for students is the difference between the subscript \(i\) and the
denominator \(n\). Many people think that the subscript should be \(n\) to match the number of
elements in the dataset. However, \(x_n\) specifically refers to the very last value in the dataset. We
treat the \(i\) as an index that goes across all subscripts from 1 all the way up to and including \(n\). To
account for this discrepancy, mathematicians usually write where the index should start below
sigma and the maximum value above the sigma. For example, if there are 3 values in the dataset,
we would write the mean as:
\[ \bar{x} = \frac{\sum_{i=1}^{3} x_i}{3} \]
As you can see, the sigma notation can quickly become convoluted, and so we typically just
write \(\sum x\) to indicate the sum of all \(x\)-values.
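As a worked instance of the mean formula, consider the eleven revenue values entered in Section 1.3.1, whose total was quoted there as $63,246. Rounding to two decimal places is our choice:

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{63{,}246}{11} \approx 5{,}749.64 \]

That is, the average revenue per account in that dataset is roughly $5,749.64.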
Median
The median value of a dataset is the value that represents the physical center of the data set. To
locate the median:
Organize the data values from smallest to largest. Then,
- If there is an odd number of values in the data set, the center value can be located by counting in
\(\frac{n+1}{2}\) positions from the smallest value, including the smallest value. Alternatively, one can count in an
equal number of values from the left and right endpoints to locate the center value.
- If there is an even number of values in the data set, average the two middle-most values together.
The locations of the two middle-most values are
\(\frac{n}{2}\) and \(\frac{n}{2} + 1\) positions from the smallest value, including the smallest value. Once again, these values can be
found by counting from the left and the right endpoints of the dataset.
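The counting rule for the median can be sketched as a short Python function. The function name and the two small datasets are hypothetical and serve only to exercise the odd and even cases; positions are counted from 1, as in the text, while Python lists are indexed from 0.

```python
def median_by_position(values):
    """Locate the median by sorting and counting in from the smallest value."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2, counting from 1.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the values in positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_median = median_by_position([7, 3, 9])       # sorted: 3, 7, 9
even_median = median_by_position([7, 3, 9, 5])   # sorted: 3, 5, 7, 9
print(odd_median, even_median)                   # prints: 7 6.0
```

With an even number of values the result may fall between two observed values, as it does here.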
Example 1: Find the mean and median salaries for the company represented by the following
dataset (in thousands). Explain which measure better reflects the overall company
demographic.
SOLUTION: We first find the mean:
This means that, on average, employees earn $148,200 per year.
To find the median, we begin by listing the salaries in ascending order:
The two middle values are 48 and 50 (these values are four values in from either side). These
represent the \(10/2 = 5\)th and \(10/2 + 1 = 6\)th values in the dataset. To find the median, we average them
together to get
\[ \frac{48 + 50}{2} = 49 \]
The median salary is $49,000 per employee per year.
The median is clearly a more viable measure. The mean takes into account all values, including
the outlier, or “extreme” salary of $1.1 million per year. The median is not influenced by
extreme outliers.
To find the mean and median salaries in Excel we use the functions average() and median().
The parameter for both functions is the cell range corresponding to the dataset.

2.3.2 Percentile
Another useful tool for describing the location of data points is a percentile.
Percentile
The $p$th percentile is a value such that $p$ percent of the values in a dataset (of $n$ values) are less
than or equal to this value.
To find the location of this value, that is, the index, $i$, first arrange the data in ascending order.
The index can be calculated by:
$$i = \left(\frac{p}{100}\right) n$$
That is, find $p$ percent of the number of observations. If $i$ is a decimal, round up and take the value
in that position. If $i$ is an integer, take the average of the values in positions $i$ and $i + 1$.
Exactly one of these two actions will be taken.
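The index rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `percentile_by_index` and the dataset are hypothetical, the index is taken to be $p$ percent of the number of observations, and the sketch handles only whole-number percentiles from 1 to 99.

```python
def percentile_by_index(values, p):
    """Return the p-th percentile using the index rule i = (p / 100) * n."""
    if not values:
        raise ValueError("the dataset must contain at least one value")
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("this sketch handles whole-number p from 1 to 99")
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids rounding surprises when checking
    # whether i = p * n / 100 is a whole number.
    whole, remainder = divmod(p * n, 100)
    if remainder != 0:
        # i is a decimal: round up and take the value in that position.
        position = whole + 1
        return ordered[position - 1]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


# Hypothetical dataset of 10 values, not taken from the text.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
p50 = percentile_by_index(scores, 50)   # i = 5, so average positions 5 and 6
p25 = percentile_by_index(scores, 25)   # i = 2.5, so round up to position 3
print(p50, p25)
```

Other software may use a different convention (interpolating between neighboring values, for instance), so results from this rule need not match a spreadsheet exactly.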
Example 2: Find the 50th percentile for the salaries in Example 1. Interpret the real-world
meaning of this value.
The values, in ascending order, are:
We take $i = \left(\frac{50}{100}\right)(10) = 5$. Since this is an integer, we average together the values in positions 5 and
6, giving us a value of 49. This means that 50% of employees represented in this dataset make
$49,000 or less.

Not surprisingly, the 50th percentile is actually the median of the dataset! This is always true.
In Excel, we can use the Percentile() function. The set-up of this function's parameters is:
=percentile(cell range, p/100)
Thus, for this dataset, we would have:
2.3.3 Quartiles
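Oftentimes, data analysts like to think about data in terms of quartiles, or quarters. There are 4
quartiles, which can be represented as follows:
• Quartile 1 = 25th Percentile
• Quartile 2 = 50th Percentile (the median)
• Quartile 3 = 75th Percentile
• Quartile 4 = 100th Percentile (the largest value)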
2.3.4 Rank
What if, on the other hand, an employee wants to know the percentile rank of his salary (he knows
his salary, but not its percentile)? This requires reverse-engineering the idea of a percentile. Without the
use of any mathematical formulas, we would need to count the number of values that are less than
or equal to the salary in question. To make this easier, we can use Excel's Rank() function. The
parameters we will use are as follows:
= rank(value, cell range, 1)
This will return the number of values that are less than or equal to the value in question (provided
that value is not repeated in the dataset). If we changed the parameter of 1 to a 0, Excel would
return the ranking of that value with the largest value ranked first, treating rankings
as being similar to the ranks of, say, runners in a race.
We will then need to divide this output by the number of observations in the dataset. To make
the counting process more automated, we can take this output and divide it by the output of the
count() function. This function will simply count the number of entries in the specified range,
and has the following parameter:
= count(cell range)
Let's say the employee making $24,000 would like to know his salary's rank. To calculate, we
would type the following:
= rank(24, cell range, 1)/count(cell range)
Giving us:
0.3
Thus, his salary is in the 30th percentile. This means that 30% of people represented in this
dataset make $24,000 or less.
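The same counting idea can be sketched in Python: count the values less than or equal to the one in question and divide by the number of observations. The function name `percentile_rank` and the salary list below are hypothetical (they are not the dataset from Example 1); they are chosen only so that three of the ten values are at or below 24.

```python
def percentile_rank(values, target):
    """Share of the values that are less than or equal to target."""
    if not values:
        raise ValueError("the dataset must contain at least one value")
    at_or_below = sum(1 for v in values if v <= target)
    return at_or_below / len(values)


# Hypothetical salaries in thousands of dollars, not the book's dataset.
salaries = [15, 21, 24, 33, 40, 46, 58, 67, 80, 95]
share = percentile_rank(salaries, 24)
print(f"{share:.0%} of salaries are $24,000 or less")
```

Unlike a rank function, this count treats repeated values naturally: every copy of the target value is included in the numerator.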
Another approach would be to use the “Rank and Percentile” tool in an Excel add-in called the
Analysis ToolPak. This method will show the ranks and percentiles of all values in the dataset
and is only useful for relatively small, manageable datasets. The Analysis ToolPak will be
important later on, so we'll describe its installation here.
2.3.5 Analysis ToolPak
To install the Analysis ToolPak, select the File tab within Excel. Then select Options from the
ribbon that appears. Select the Add-Ins option. Click Analysis ToolPak and press Go.

Check the “Analysis ToolPak” and “Analysis ToolPak – VBA” features from the pop-up
window and press OK.
You now have the ToolPak installed.

To use the “Rank and Percentile” tool, select the Data tab. Choose Data Analysis from the
Analysis column. Pick “Rank and Percentile” from the pop-up window and press OK.
Select the input range:
You can either specify an output range, or have Excel create a new worksheet with the results.
This is up to your preferences. Check “Labels in First Row” and be sure that the data label has
been selected.
The results are shown below:

You'll immediately notice that a salary of $24,000 is shown as being in the 22.2nd percentile,
which does not agree with our calculation. Each software package, and even each tool within a
package, uses its own technique to conduct this calculation; no single agreed-upon method exists.
(The 22.2% figure is consistent with dividing the 2 salaries below $24,000 by the 9 other values in
the dataset.) Fortunately, both results are in the same “ballpark.”
1. Suppose your instructor releases scores on a recent project. The scores are as follows:
83 89 76 41 92 85 76 71
95 92 80 84 77 78 81 75
64 30 80 79 78 70 75 81
99 85 80 82 70 69 71 70
a. Generate a relative frequency histogram and comment on any interesting
observations of the distribution.
b. Compare the mean and median. What causes them to be different in this particular
way?
c. What score would be required in order to be in the 80th percentile?
d. In what percentile is a person who scores 71% on this project?
2. In order to make way for new products, a grocery store chain would like to determine
whether the Lunch Pack or Family Pack of Flaxem Crackers generates more revenue. The
following two datasets show the revenue generated by each over a 10-month period:
Lunch    450  510  550  330  400  500  550  290  310  300
Family   500  400  600  310  350  600  200  200  600  430
a. Compare the mean and median of each dataset. What can be said about the
middle-most revenues?
b. Find all four quartiles for each dataset. Use this information to make an argument
for why this grocer should hang on to the Family Pack.
c. For each of the datasets, determine the top 10% of revenues that can be expected.
d. Find the range of the data. Comment on how this might influence the grocer's
decision.
3. Suppose that Budget Car Rentals assesses a variety of new 2012 and 2013 sedans for its
new line of rental cars. It finds the following information on city and highway fuel
efficiencies (mpg) for eight vehicles in consideration:
Year   Make      Model            City   Highway
2012   Toyota    Prius Hyb.       51     49
2013   Ford      Fusion Hyb.      47     47
2013   Ford      C-Max Hyb.       44     41
2012   Honda     Insight          41     44
2012   Toyota    Camry LE Hyb.    43     39
2012   Toyota    Camry XLE Hyb.   40     38
2012   Hyundai   Sonata Hyb.      34     39
2012   VW        Passat           31     43
(SOURCE: www.fueleconomy.gov)
a. Find the mean and median fuel efficiency for city and highway mileages of the
vehicles being considered. Comment on any differences between the two values.
b. What is the rank percentage of a vehicle that has 43 city mpg?
c. If the company makes its choice based on the top 15% of city and highway for the
vehicles being considered, what will be the minimum city and highway mileages
they should consider?
d. Make a recommendation for which vehicle(s) should be purchased, if any.

2.4 Descriptive Statistics – Variability
The measure of center is always a good start. But what does a sample mean not tell us? It fails to
describe how far apart the data are from one another. In other words, we need to assess the
variability, or variance, of the numbers we have collected.
The simplest way we might go about describing the variability is by simply looking at the range
of the data, such that:
Range = largest observation - smallest observation
However, this still does not help us identify how spread out the data are. For example, suppose we
find our range to be 110 units (see dataset below). This might seem rather daunting at first, but
what if all but one of the values were clumped between 0 and 10, and there existed an outlier of 110?
Clearly, the range is often determined by outliers alone.
0 1 3 10 8 7 4 110
2.4.1 Standard Deviation
To create a better measure of variability that takes all data points into account, just like the mean
does, statisticians established the standard deviation. As the title implies, this is a standard tool
that measures the average deviation (that is, by how much each value deviates) from the mean. This
requires us to find the deviation, $x - \bar{x}$, for each point in our dataset.
Let's demonstrate with the above dataset:
Value ($x$)    Deviation ($x - \bar{x}$)
0              -17.875
1              -16.875
3              -14.875
10             -7.875
8              -9.875
7              -10.875
4              -13.875
110            92.125
Mean: 17.875
The values that lie below the mean produce negative deviations, and the one
above the mean has a positive deviation. To find an average deviation, we would ideally add
them. However, observe that the sum of the deviations is 0! This is true of any dataset, since the
mean represents the “balance” point of the dataset. Due to mathematical concerns that we won't state
here, mathematicians decided to square these values, since squaring removes the negative signs.
Value ($x$)    Deviation ($x - \bar{x}$)    Squared deviation ($(x - \bar{x})^2$)
0              -17.875                      319.5156
1              -16.875                      284.7656
3              -14.875                      221.2656
10             -7.875                       62.01563
8              -9.875                       97.51563
7              -10.875                      118.2656
4              -13.875                      192.5156
110            92.125                       8487.016
Mean: 17.875                                Sum: 9782.88
Great, now they can be summed up to give 9782.88! Thus, we have found the following:
$$\sum (x - \bar{x})^2 = 9782.88$$
One would think that dividing by 8 would now be appropriate to find the average. Due to
mathematical properties that are beyond the scope of this course, the division will be by 7, which
is $n - 1$. Thus:
$$\frac{\sum (x - \bar{x})^2}{n - 1} = \frac{9782.88}{7} \approx 1397.55$$
This value that we have found is called the variance.
NOTE: The division by $n - 1$ has to do with the fact that we are often dealing with a sample in
inferential statistics and hope to make conclusions about a population.
Sample Variance
The variance of a sample, an uninterpretable measure of variability denoted by $s^2$, can be found
by the following formula:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
To make all of these calculations more meaningful (to have a true average), we should probably
“unsquare” the value that we have. When we do this, we get the sample standard deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{9782.88}{7}} \approx \sqrt{1397.55} \approx 37.38$$
This is what we can think of as the average deviation of each point from the mean. It is clearly
high for this dataset. What is causing it? The outlier of 110!
Conclusion: On average, values in the dataset deviate from the mean by about 37 units.
Sample Standard Deviation
The standard deviation of a sample, denoted $s$, is given by the following formula:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Note that this is simply the square root of the variance.
In Excel, the standard deviation can be calculated simply by using the function below:
= stdev(cell range)
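The step-by-step calculation above can be reproduced in Python as a check on the arithmetic. This is a sketch for illustration only; the variable names are our own, and the final line compares the hand calculation against `statistics.stdev` from Python's standard library, which also divides by $n - 1$.

```python
import math
import statistics

# The dataset used in this section.
data = [0, 1, 3, 10, 8, 7, 4, 110]
n = len(data)

mean = sum(data) / n                           # 17.875
deviations = [x - mean for x in data]          # x minus the mean, for each value
sum_of_squares = sum(d ** 2 for d in deviations)

variance = sum_of_squares / (n - 1)            # divide by n - 1, not n
std_dev = math.sqrt(variance)                  # "unsquare" the variance

print(f"sum of deviations: {sum(deviations)}")
print(f"sum of squared deviations: {sum_of_squares}")
print(f"variance: {variance:.2f}, standard deviation: {std_dev:.2f}")

# Cross-check against the library routine for the sample standard deviation.
library_std_dev = statistics.stdev(data)
```

The deviations sum to zero, which is why they are squared before being averaged.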
Example 1: A river with mild current is known to have an average depth of 3 feet with a
standard deviation of 3 feet. The bottom is not visible. Is the river safe to cross by foot? Also,
what is the variance?
SOLUTION: Since there is a standard deviation of 3 feet, we can conclude that, on average, the
river depth deviates by 3 feet from the mean. It would not be unusual to encounter a part of the
river with a depth of 6 or more feet. Therefore, the river should not be crossed by foot.
Since the standard deviation is the square root of the variance, the variance is the square of the
standard deviation. That is,
$$s^2 = 3^2 = 9$$
Thus, the variance is 9 (in square feet). The variance does not have a valuable interpretation.
2.4.2 How Do We Interpret the Value We Get?
Think about this: $n$ is a fixed value for a given sample (for the dataset above, 8). The only thing that could make
$s^2$ large or small is the numerator. Thus, if the deviations are large (a bad thing!), then the
squared deviations will be large, and so the sum of squares will be large. This implies a large
standard deviation.

If the deviations are small (good thing!), then the squared deviations will be small, and so the
sum of squares will be small. This implies a small standard deviation.
So, a large standard deviation means that there is a lot of variability, or that the values are vastly
different from one another. A small standard deviation means the values in the data set are quite
alike. In the near future, you'll see why it is important to have a small standard deviation. In
general, as the variance and standard deviation get larger, our ability to make precise statements
about the population quickly evaporates.
We will be using variance and standard deviation consistently for the rest of the semester. It is
important to get comfortable with it.
2.4.3 Do Population Variances and Standard Deviations Fall into Play?
Indeed they do. Do you think that we can find them? Usually not! The population variance
requires the use of the population mean, $\mu$. How do we get $\mu$? We take the average of all the
values in the entire population. Since we typically don't know this value, we also typically don't
know the population variance, so certainly we don't know the population standard deviation
(since it's the square root of the population variance).
The table below summarizes the notations we need to recognize:
             Variance      Standard Deviation
Sample       $s^2$         $s$
Population   $\sigma^2$    $\sigma$
The population parameter, $\sigma$, is the lowercase Greek letter “Sigma.” (This is as opposed to the
sample statistic, $s$.)
2.4.4 Interquartile Range
The standard deviation, much like the mean, is easily skewed by excessively small or large
values. We noticed this in the first example in this section. Using the idea of medians and
percentiles is a safe bet for outlier-proofing our spread estimates. The interquartile range is the
difference between the 3rd quartile and the 1st quartile. Remember, these are simply the 75th and
25th percentiles, respectively. This difference is the width of the middle 50% of the dataset:
$$\text{IQR} = Q_3 - Q_1$$
This gives us a nice measure of how spread out the data is about the median.
Example 2: Consider the following home prices and find both the standard deviation and the
interquartile range. Describe what conclusions can be drawn from these values.
Values (thous. $) 95 875 96 89 87 88 93 91
SOLUTION: Using Excel, we find the following:
The standard deviation indicates that home prices, on average, vary by $277,100 from the mean
value. However, we see from the interquartile range that the middle 50% of homes only vary by
$6,500. The standard deviation is being skewed by the home that is priced at $875,000. The
interquartile range tells us that the majority of home values stay pretty close to the median value.
Additionally, we see that most home values are between $88,000 and $96,000.
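For readers who want to check these figures outside of Excel, here is a short Python sketch using the standard library. The quartile convention is an assumption on our part: `method="inclusive"` is chosen because it reproduces the $6,500 interquartile range quoted above, and other conventions (including the index rule from Section 2.3.2) can give slightly different quartiles.

```python
import statistics

# Home prices from Example 2, in thousands of dollars.
home_prices = [95, 875, 96, 89, 87, 88, 93, 91]

# Sample standard deviation (divides by n - 1).
std_dev = statistics.stdev(home_prices)

# Quartiles; the "inclusive" method interpolates between neighboring values.
q1, q2, q3 = statistics.quantiles(home_prices, n=4, method="inclusive")
iqr = q3 - q1

print(f"standard deviation: {std_dev:.1f} thousand dollars")
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```

Removing the $875,000 home from the list and rerunning the sketch is a quick way to see how strongly one outlier drives the standard deviation while barely moving the interquartile range.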
2.4.5 Descriptive Statistics: Analysis ToolPak in Excel
To generate most of the measures we have discussed up until now, we turn to Excel's Analysis
ToolPak for a more automated approach.
Let's consider the house data above:

Values (thous. $)
95
875
96
89
87
88
93
91
Access the Data Analysis tool from the Data tab in Excel. Select “Descriptive Statistics” from
the menu and select the data from the spreadsheet containing the data.
Be sure that you check “Summary Statistics.”

We can immediately see the mean and the median of the dataset. Additionally, we see the
standard deviation, variance, range, min/max, sum of the values, and the number of values in the
dataset, among other measures that we will ignore for now. We see, as expected, that the dataset does not have
a mode, or most frequently occurring value.

2.4.6 Shapes of Distributions
Now that we have a basis for measuring data in terms of its center and spread, we turn back to
making connections with the visual shape of the distribution.
There are many different shapes that we encounter for distributions. Let's discuss a few. First,
note that the following do not look like the rectangular histograms from earlier on. These are
smoothed out forms of what we experienced earlier. They are often used to describe the general
shape of a distribution. And, of course, they are much easier to sketch.
A histogram is said to be (a) unimodal if it has a single peak, (b) bimodal if it has two peaks,
and (c) multimodal if it has more than two peaks.
If we follow the curves from left to right, we begin at the lower tail, move over the peak(s), and
arrive back down to what is called the upper tail.
A unimodal histogram is said to be symmetric, if we are able to draw a line down the center
such that the left side of the line is a mirror image of the right side. Consider the following
unimodal symmetric histograms:
A unimodal histogram that is not symmetric is said to be skewed. If the upper tail of the
histogram stretches out much farther than the lower tail, then the distribution of values is
positively (right) skewed. On the other hand, if the lower tail is much longer than the upper tail,
the histogram is negatively (left) skewed. Can you identify the following unimodal histograms
as positively or negatively skewed?

Lastly, a normal curve is the most desired type, due to its (in general) nice properties. A normal
curve occurs quite frequently. It has a bell shape and is sometimes called the Gaussian curve.
Here are examples of normal curves:
2.4.7 Skewness
Excel also produces a nice measure that allows us to make conclusions about the general shape
of the distribution. This measure is called skewness.
If the skewness measure is:
• Positive, then the distribution is skewed right
• Negative, then the distribution is skewed left
• Zero, then the distribution is symmetric
The farther from 0 that the skewness measure is, the more skewed in the respective direction the
distribution will be.
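The text does not state how the skewness measure is computed. As a sketch, the Python function below uses a common sample-skewness formula, $\frac{n}{(n-1)(n-2)} \sum \left(\frac{x - \bar{x}}{s}\right)^3$; that this is the formula behind Excel's output is an assumption, and the function name `sample_skewness` is hypothetical. The point is only to show how the sign of the result tracks the direction of the longer tail.

```python
import statistics

def sample_skewness(values):
    """Sample skewness: n / ((n - 1)(n - 2)) times the sum of cubed z-scores."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


right_tailed = [0, 1, 3, 10, 8, 7, 4, 110]   # dataset from Section 2.4
left_tailed = [-x for x in right_tailed]     # mirror image of the same data
balanced = [1, 2, 3, 4, 5]                   # hypothetical symmetric data

for name, data in [("right", right_tailed), ("left", left_tailed), ("balanced", balanced)]:
    print(name, round(sample_skewness(data), 3))
```

Cubing the standardized deviations keeps their signs, so a few large values far above the mean push the total positive, and a few far below push it negative.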
Consider the following data showing the number of televisions owned by randomly sampled
individuals in a big city:

3 4 3 2 3 2 1 1 0
4 0 4 4 4 3 1 0 1
4 3 3 0 4 2 1 2 4
2 4 2 4 0 3 4 3 3
2 2 0 2 1 1 3 2 2
0 0 3 1 0 3 4 3 3
0 1 4 4 2 1 2 0 2
4 3 2 4 2 4 3 3 3
1 2 0 3 0 2 3 2 0
0 2 0 4 4 3 4 1 0
Using Excel, we produce descriptive statistics using the Analysis ToolPak:
We notice that the Skewness measure is positive: 0.51. This means the dataset is slightly skewed
to the right:

[Figure: Histogram of TV's Owned. Horizontal axis: Number of TV's; vertical axis: Frequency.]
2.4.8 Outlier Detection
After analyzing a dataset, how do we assess likely values for data and deem other values as
outliers?
One approach is to determine how many standard deviations above (positive value) or below the
mean (negative value) a given data value is.
For instance, suppose we have a dataset with mean 20 and standard deviation 3. We have an
observation of 14. In terms of units, this value is 6 units below the mean. Thus, it has a deviation
of -6. This deviation tells us that the data value in question is 2 standard deviations below the
mean, since:
$$\frac{14 - 20}{3} = \frac{-6}{3} = -2$$
This measure is often called a z-score. Let's recap:
$z$-Score
A $z$-score tells us the number of standard deviations a data point, $x$, is from its mean, $\bar{x}$.
Mathematically,
$$z = \frac{x - \bar{x}}{s}$$
The idea of a $z$-score is quite helpful, in that it tells us how far a value is from the mean, relative to the
size of the standard deviation (the average spread). If $z$ is very close to 0, then the value is not far
from the mean. If $z$ is very large in absolute value, the value is very far from the mean.
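As a quick illustration, the z-score calculation can be written as a one-line Python function. The function name `z_score` is our own; the numbers are those from the example above (mean 20, standard deviation 3, observation 14).

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / std_dev


z = z_score(14, mean=20, std_dev=3)
print(z)  # negative, because 14 lies below the mean of 20
```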
A very useful theorem established by the Russian mathematician Pafnuty Lvovich Chebyshev allows us to
determine how large is very large. Chebyshev established the following theorem:
Chebyshev’s Theorem
For any $k > 1$, at least $\left(1 - \frac{1}{k^2}\right)$ of the data values must be within $k$
standard deviations of the mean (to the left and the right).
This works for any and all distributions.
Example 3: A data value is 3 standard deviations above the mean. Is this an extreme value?
SOLUTION: Chebyshev's Theorem states that, for $k = 3$,
$$1 - \frac{1}{3^2} = \frac{8}{9} \approx 0.89$$
At least 89% of all data points in this distribution will lie between -3 and +3 standard deviations from the
mean. Thus, there is, at most, an 11% chance of observing something higher than +3 standard
deviations. This data value is fairly unlikely and might be considered a mild outlier.
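Chebyshev's bound is easy to tabulate, and it can be compared against an actual dataset. The Python sketch below is illustrative only: the function names are hypothetical, the bound is taken to be $1 - 1/k^2$, and the dataset is the one used in Section 2.4.1.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def share_within(values, k):
    """Observed share of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = sum(1 for x in values if abs(x - mean) <= k * s)
    return inside / len(values)


data = [0, 1, 3, 10, 8, 7, 4, 110]
bound_2 = chebyshev_lower_bound(2)
bound_3 = chebyshev_lower_bound(3)
observed_2 = share_within(data, 2)

print(f"guaranteed within 2 standard deviations: at least {bound_2:.0%}")
print(f"guaranteed within 3 standard deviations: at least {bound_3:.0%}")
print(f"observed within 2 standard deviations: {observed_2:.1%}")
```

The observed share is typically well above the guaranteed minimum; the theorem is a worst-case statement that holds for any shape of distribution.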
1. The Connecticut Agricultural Experiment Station conducted a study of the calorie content
of different types of beer. The calorie content (calories per 100 mL) for 26 brands of
light beer are:
29 28 33 31 30 33 30 28 27 41 39 31 29
23 32 31 32 19 40 22 34 31 42 35 29 43
a. Find the standard deviation. Explain the real-world meaning of this value.
b. Find the interquartile range. Explain the real-world meaning of this value.
c. Find the skewness. What type of shape does this distribution have?
2. The UNICEF report “Progress for Children” (April, 2005) included the accompanying
data on the percentage of primary-school-age children who were enrolled in school for 23
countries in Central Africa.

58.3 34.6 35.5 45.4 38.6 63.8 53.9 61.9 69.9 43 85 63.4
58.4 61.9 40.9 73.9 34.8 74.4 97.4 61 66.7 79.6 98.9
a. Find the range, standard deviation, and interquartile range. Explain what these
three values tell us about the shape of the distribution.
b. Explain the real-world meaning of the standard deviation and the interquartile
range.
c. Produce descriptive statistics for this dataset with the Analysis ToolPak in Excel.
d. Is the distribution skewed? If so, in which direction?
e. Create a relative frequency histogram. Describe any trends in the data.
f. Is an observation of 79.6 an outlier? Use Chebyshev's Theorem to justify your
answer.
3. The article “Determination of Most Representative Subdivision” (Journal of Energy
Engineering [1993]: 43-55) gave data on various characteristics of subdivisions that
could be used in deciding whether to provide electrical power using overhead lines or
underground lines. Data on the variable x = total length of streets within a subdivision (in
feet) are as follows:
1280 5320 4390 2100 1240 3060 4770 1050

360 3330 3380 340 1000 960 1320 530
3350 540 3870 1250 2400 960 1120 2120
450 2250 2320 2400 3150 5700 5220 500
1850 2460 5850 2700 2730 1670 100 5770
3150 1890 510 240 396 1419 2109
a. Find the range, standard deviation, and interquartile range. Explain what these
three values tell us about the shape of the distribution.
b. Explain the real-world meaning of the standard deviation and the interquartile
range.
c. Produce descriptive statistics for this dataset with the Analysis ToolPak in Excel.
d. Is the distribution skewed? If so, in which direction?
e. Find the $z$-score for the observation 79.6. Explain what your answer means in
real-world terms.
f. Create a relative frequency histogram. Is an observation of 79.6 an outlier? Use
Chebyshev's Theorem to justify your answer.
4. Using the five class intervals 100 to 120, 120 to 140, . . ., 180 to 200, devise a frequency
distribution based on 70 observations whose histogram could be described as follows:
a. symmetric b. bimodal c. positively (right) skewed d. negatively (left) skewed

5. The Highway Loss Data Institute publishes data on repair costs resulting from a 5-mph
crash test of a car moving forward into a flat barrier. The following table gives data for
10 midsize luxury cars tested in October 2002:
Model Repair Cost

Audi A6 0
BMW 328i 0
Cadillac Catera 900
Jaguar X 1254
Lexus ES300 234
Lexus IS300 979
Mercedes C320 707
Saab 9-5 670
Volvo S60 769
Volvo S80 4194
a. Using Analysis ToolPak in Excel, generate all descriptive statistics. Discuss the
best measure of center and the best measure of spread based on what you see.
Justify why these measures were selected.
b. Find the $z$-score for the observation 4194. Explain what your answer means in
real-world terms.
c. Is $4,194 considered an extreme outlier? Also use Chebyshev's Theorem to
numerically reinforce your answer.
6. Cost-to-charge ratios were reported for the 10 hospitals in California with the lowest
ratios (San Luis Obispo Tribune, December 15, 2002). The 10 cost-to-charge values
were
8.81 10.26 10.2 12.66 12.86 12.96 13.04 13.14 14.7 14.84
Discuss relevant descriptive statistics and a relative frequency distribution. Use your
information to make a conclusion about the state of hospitals in California.
7. The technical report “Ozone Season Emissions by State” (U.S. Environmental Protection
Agency, 2002) gave the following nitrous oxide emissions (in thousands of tons) for 16
states in the continental United States:
76 22 40 7 30 5 6 136 72 33
0 89 136 39 92 40 13 27 1 63
Generate a brief report about the distribution of nitrous oxide emissions in the sampled
states. Use descriptive measures and visuals to justify your answer.

Chapter 3
Probability and Decision Theory
When you stop, I mean really stop, and think about how often you think in terms of probabilities, I am
confident you'll find you use it more often than not. Do you ever decide to get to work by taking one
route as opposed to another? Would you find yourself making health decisions based on your
doctor's advice instead of the advice you might receive from a ten-year-old child? Have you ever
purchased a birthday gift for someone after deep contemplation of what that person might like? Do
you trust one news network over another? What are your decisions based on in these situations?
Whether or not you're willing to give in to your inner nerd, you should admit that you think in terms
of chances and likelihood. I imagine that you do have a preferred route. I think that you do trust an
expert's medical opinion. I believe that you do make a gift purchase after considering what you think
the recipient enjoys. I should think there are some networks that you trust more than others.
In this chapter, we'll explore the nature of probabilistic thinking. You'll also notice the phrase
“Decision Theory” in the title. Instead of focusing on the trite probability questions involving
situations that we don't ever encounter, we'll concern ourselves with real-world situations where
probabilistic reasoning will help us make a decision.

3.1 The Idea of Probability
In this section, we'll address what probability is (and isn't).
Example 1: A weather report by the National Weather Service (NWS) stated on July 31, 2011
that, overnight, there was a 50% chance of precipitation in the 85225 zip code in which
Chandler-Gilbert Community College is located. What does this mean?
(SOURCE: www.crh.noaa.gov/)
SOLUTION: This is actually quite a loaded statement. One might want to say that, out of 100
times, it will rain 50 times. This is a very misleading approach for a couple of different reasons.
First off, what is meant by “times”? We are only concerned with one time: overnight on July 31,
2011.
A probability is actually a measure of how likely something is to occur in the long-run. That is,
if something were to be repeated in trials over and over again then, theoretically, the specified
outcome would occur a certain percentage of time. Importantly, it must be noted that the
conditions under which we are measuring a probability must be in place in order for the
probability to be a valid measure.
In our case, NWS states that, under the exact same environmental conditions taking place
throughout the night of July 31, 2011, it would be expected to rain 50% of the time.
The graph below shows a hypothetical scenario in which there is a 50% chance of precipitation
under the set of conditions that occurred on the above night. Notice that it rained on the initial
day and so immediately the proportion (or probability) of rainy days is 100%. As the same
conditions occur on different days, sometimes it rains and sometimes it does not. That said,
any given day has a 50% chance of precipitation. We notice that the proportion is quite
unstable at first, jumping from 100% down to nearly 40%; however, as many days with this
same set of conditions pass (in the long-run), we notice that the proportion becomes more stable
and approaches the theoretical probability of 50%.

[Figure: Proportion of Rainy Days Under July 31, 2011 Overnight Conditions. Horizontal axis: Day with
Specific Conditions; vertical axis: Proportion of Rainy Days.]
Graph: Based on a random simulation involving the true probability of a 50% chance of precipitation
and what occurs in the long-run.
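A simulation of this kind can be sketched in a few lines of Python. This is not the code behind the graph; the function name `running_proportions`, the seed, and the number of simulated days are all arbitrary choices of ours, made only to illustrate how the running proportion settles down in the long-run.

```python
import random

def running_proportions(p_rain, days, seed):
    """Simulate independent days and track the running proportion of rainy days."""
    rng = random.Random(seed)        # a fixed seed makes the run repeatable
    rainy = 0
    proportions = []
    for day in range(1, days + 1):
        if rng.random() < p_rain:    # it rains with probability p_rain
            rainy += 1
        proportions.append(rainy / day)
    return proportions


props = running_proportions(p_rain=0.5, days=10_000, seed=2011)

# Early proportions bounce around; later ones hover near the true probability.
for day in (1, 10, 100, 1_000, 10_000):
    print(f"after {day:>6} days: {props[day - 1]:.3f}")
```

Changing the seed changes the early, unstable part of the path, but the proportion after many days should stay close to 0.5.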
As an interesting note, NWS has sophisticated helium “balloons” that they send up into the air to
measure properties such as wind speed and direction, humidity, and barometric pressure. Then
physics is used based on theories of fluid mechanics to make the prediction.
Among the many others that we could state, there is one other major misconception about
probability: that if the probability that it rains is said to be very small and yet it rains, then the
probability must be wrong. This is incorrect. Probability is a measure of uncertainty. As in the
case of meteorology, the predictions are scientific and are based upon prior data. Just because it
has only rained, say, 10% of the time on days like today, this is not to say that it won't rain. In
fact, it very well might! The moral of the story is that probability talks about likelihood. Only in
the instance of 0% and 100% probabilities is anything guaranteed. If there are situations in which
something either never happens or always happens, then we're probably not concerned about
understanding probabilities.
Probability
Probability is a measure of uncertainty, typically expressed as a number between 0 (0%) and 1
(100%), that describes how likely it is that an event will or will not occur under a specified set of
conditions in the long-run.
Measuring Probabilities
While probability is considerably more complicated than we'll let on, the basic idea is that a
probability can be calculated by considering the number of times some event occurs relative to
the total number of “trials,” or observable situations. In simpler terms, it is the number of
“successes” out of the total number of trials.
Calculating Probability
The probability that event $E$ occurs, denoted $P(E)$, is the ratio (or fraction) of the number of successes
to the number of trials. Mathematically, we write the number of times $E$ occurs as $n(E)$ and the
total number of trials as $n(S)$. That is,
$$P(E) = \frac{n(E)}{n(S)}$$
This formula works when all elements in the sample space are equiprobable, that is, each
individual outcome in the sample space has the same probability of occurring as any other
outcome.
As a note, the $n(\,)$ notation stands for “the number of ways” the event in parentheses can occur.
The $S$ in the denominator stands for sample space, or the total number of things/situations/trials
being considered in the experiment.
Example 2: In a 2009 study of high-fructose corn syrup (HFCS), a corn-based sweetener used in
a wide variety of foods, beverages, and condiments, 20 samples of HFCS were analyzed. Of
those, nine of them were found to contain mercury by researchers. Based on the results of this
study, find the probability that a random sample of HFCS contains mercury and explain what this
result means.
SOURCE: http://www.washingtonpost.com/wp-dyn/content/article/2009/01/26/AR2009012601831.html
SOLUTION: The event in this scenario is that mercury is found. Out of the total 20 trials, nine of
them contained mercury. Therefore,
$$P(\text{mercury}) = \frac{9}{20} = 0.45$$
This means that if samples of HFCS were to be sampled randomly and repeatedly, it would be
found that 45% of all samples would contain traces of mercury. This does not guarantee that
exactly 45 samples out of 100 will contain mercury.
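The ratio of successes to trials is simple enough to compute by hand, but a small Python sketch makes the structure explicit. The function name `probability` is our own, and `Fraction` from the standard library is used so the result is kept as an exact ratio before being shown as a decimal.

```python
from fractions import Fraction

def probability(successes, trials):
    """Ratio of the number of successes to the number of trials."""
    if trials <= 0:
        raise ValueError("there must be at least one trial")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and the number of trials")
    return Fraction(successes, trials)


# Example 2: nine of the 20 HFCS samples contained mercury.
p_mercury = probability(9, 20)
print(p_mercury, "=", float(p_mercury))
```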
Example 3: In July 2011, temperatures in Gilbert, Arizona were above 100°F every day
(SOURCE: www.weather.com). Based on this data, a researcher concludes that the probability of
above-100°F temperatures in Arizona is 100%. Comment on his findings.
SOLUTION: Since temperatures in July 2011 were above 100°F on all 31 days of the
month, it is fair to make the experimental observation that approximately 100% of all days in
July have temperatures exceeding 100°F in the long-run (there have been days in the past
when temperatures were below 100°F); however, because we know that temperatures are
periodic, or that they go from low to high and back to low over the course of a year, 100% is not
a good estimate for temperatures in Arizona in general (temperatures are essentially never above
100°F in January!).
This example truly stresses the importance of critical thinking when using probabilities.
Probabilities are often used and abused in the media, education, and politics, just to name
a few. We want to make sure that we are as specific as possible.
It will often be considerably helpful to display probabilities in a tabular form, that is, through the
use of tables. This type of table is called a contingency table. This not only helps to organize
data, but to simultaneously see the big picture. Let's consider an example.
Example 4: A 1950 study considered 1,418 hospital patients in London, half with
and half without lung cancer, and recorded whether or not they smoked over the course of their lives. The
following was found:
               Lung cancer: Yes   Lung cancer: No
Smoker: Yes    688                650
Smoker: No     21                 59
Assuming this data can be used as a representation of the entire population of London residents,
analyze the data by discussing the following:
a. What is the probability that a randomly selected participant within this study develops
lung cancer?
b. Provided that a person was a smoker, what is the probability that he has lung cancer?
c. Provided that a person was not a smoker, what is the probability that he has lung cancer?
d. Given that a person has lung cancer, what is the probability that he smokes?
SOLUTION:
When answering these questions, it is fairly useful to fully organize the data by providing all
totals:
Smoker? \ Lung Cancer?      Yes      No     Smoker TOTALS
Yes                         688     650     1,338
No                           21      59        80
Lung Cancer TOTALS:         709     709     1,418
a. Since there is a total of 1,418 individuals being considered and, of those, 709 developed
lung cancer,
$P(\text{lung cancer}) = \frac{709}{1{,}418} = 0.50 = 50\%$
We must be careful in using this probability, as it reveals nothing about the link between
lung cancer and smoking: 709 patients with lung cancer and 709 without lung cancer were
chosen to participate in the study to begin with. This is a probability that was fixed by
the researchers.
b. There is a total of 1,338 individuals in the study that smoke (we are limited to the
smokers only, per the way the question is stated). Of those individuals, 688 have lung
cancer.
$P(\text{lung cancer, given smoker}) = \frac{688}{1{,}338} \approx 0.514 = 51.4\%$
Slightly over half of the patients who are smokers developed lung cancer. This number is
frighteningly large. Before we jump the gun in assuming that smoking is the culprit here,
we should probably consider what happens with nonsmokers.
c. There is a total of 80 nonsmokers in the group. Of them, 21 developed lung cancer.
$P(\text{lung cancer, given nonsmoker}) = \frac{21}{80} \approx 0.263 = 26.3\%$
Slightly more than one-fourth of nonsmokers developed lung cancer. This number is
noticeably lower than the one for smokers. We speculate (but did not prove) that smoking
increases the likelihood that one will develop lung cancer.
d. There are 709 patients with lung cancer. Of these, 688 smoke.
$P(\text{smoker, given lung cancer}) = \frac{688}{709} \approx 0.970 = 97.0\%$
About 97% of the lung cancer patients in this study smoke. Are we confident in accusing a
lung cancer patient of being a smoker? According to this data, perhaps.
The moral of the story is: analyze the situation from a variety of lenses. What appears to be true
might be an illusion of what we see immediately! Sometimes, however, it is about what the
naked eye does not detect. This is what makes good analysts.
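The four calculations in Example 4 all follow one pattern: pick the group you are restricted to (the denominator), then count the cases of interest inside that group (the numerator). The Python sketch below illustrates that pattern with the counts from the table; the dictionary layout, labels, and the helper name `total` are illustrative choices, not something taken from the study.

```python
from fractions import Fraction

# Counts from the 1950 London study table: (smoker?, lung cancer?) -> patients
counts = {
    ("smoker", "cancer"): 688,
    ("smoker", "no cancer"): 650,
    ("nonsmoker", "cancer"): 21,
    ("nonsmoker", "no cancer"): 59,
}

def total(smoker=None, cancer=None):
    """Add up the cells matching the given labels (None means 'any')."""
    return sum(
        n for (s, c), n in counts.items()
        if smoker in (None, s) and cancer in (None, c)
    )

# a. Whole study is the denominator.
p_cancer = Fraction(total(cancer="cancer"), total())
# b. Only smokers are in the denominator.
p_cancer_given_smoker = Fraction(total("smoker", "cancer"), total(smoker="smoker"))
# c. Only nonsmokers are in the denominator.
p_cancer_given_nonsmoker = Fraction(total("nonsmoker", "cancer"), total(smoker="nonsmoker"))
# d. Only lung cancer patients are in the denominator.
p_smoker_given_cancer = Fraction(total("smoker", "cancer"), total(cancer="cancer"))

for label, p in [
    ("a", p_cancer),
    ("b", p_cancer_given_smoker),
    ("c", p_cancer_given_nonsmoker),
    ("d", p_smoker_given_cancer),
]:
    print(label, p, f"{float(p):.1%}")
```

Notice that parts b and d share the same numerator (688) but have different denominators, which is why the two answers are so different.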
1. A classmate of yours was absent when this section was discussed. Explain to her what a
probability is in your own words.
2. In a study performed by Cambridge University in the United Kingdom, it was found that,
“One out of three people is overwhelmed by the latest breakthroughs in technology.”
(SOURCE: http://www.gev.com/2011/07/study-one-out-of-three-people-overwhelmed-by-technology/).
Primarily, individuals are overwhelmed by how much information is
available through the use of social networks and smartphones, to name just two. Explain
what is meant by this and explain in terms of probabilistic reasoning.
3. In a 2007 survey conducted by DDB Worldwide, an internationally known advertising
company, the following question was asked of a random group of 217 participants: “Is
consistency in branding becoming any more or less important?” The following table
displays the results:
Response             Number of respondents
More important       143
Less important        74
Find the probability that a respondent believes that consistency in branding is:
a. More important, then explain what this means.

b. Less important, then explain what this means.
4. The probability that a visit to a primary care physician’s (PCP) office results in neither
lab work nor referral to a specialist is 35%. Of those coming to a PCP’s office, 30% are
referred to specialists and 40% require lab work.
Determine the probability that a visit to a PCP’s office results in both lab work and
referral to a specialist. (Video Solution)
5. A public health researcher examines the medical records of a group of 937 men who died
in 1999 and discovers that 210 of the men died from causes related to heart disease.
Moreover, 312 of the 937 men had at least one parent who suffered from heart disease,
and, of these 312 men, 102 died from causes related to heart disease.

Determine the probability that a man randomly selected from this group died of causes
related to heart disease, provided that neither of his parents suffered from heart disease.
(PROBLEM SOURCE: SOA/CAS Exam P Sample Questions, Page 5) (Video Solution)

3.2 Joint Probability
In the previous section, we began computing probability using some fairly basic ideas. In
calculating probabilities, we made a huge assumption: that the found number represents what
will occur in the long-run. For instance, if we conduct a study and find that out of 100 people, 94
respond positively to a new energy drink, can we conclude the drink is effective in providing
added energy?
The answer to this question is humbling: it depends upon how the data was collected. Suppose
the participants are all college students who tend to consume a large amount of caffeine as it
is. Would it be fair for the advertisement to say, “There is a 94% chance that this energy drink
will energize you”? Not necessarily, since the result only appeared to be valid in a sample of
college students. This means that the population from which the sample was taken must be
specified. In this case, the population is the set of all college students and the sample is
the 100 students who were selected. Thus, perhaps the advertisement should say, “Are you a
college student? If so, there is a 94% chance that this energy drink will energize you.” That is,
provided that this sample was a random sample and not a group of college students hand-picked
from the respective population.
Okay, so you have a data sample collected from a specific population and your goal is to now
talk about probabilities.
Example 1: Imagine that you work for a marketing agency and your goal is to determine the
effectiveness of two different branding approaches to a new line of clothing. The first approach
involves establishing a group of Facebook followers by offering discounts on clothing as an
incentive to become a friend of the company. The company hypothesizes that, by seeing the
company logo on their Facebook account each week, followers will gain a strong familiarity and
comfort level with the company’s product. The second approach involves hiring Hollywood
actors to endorse the product at film festivals and celebrity appearances. The company then
tracks the degree of success of each branding tactic by measuring the number of retail outlets
that agree to stock the product based on the branding used. They find that, of the 6 companies
exposed to Tactic 1 (T1), 5 agreed to stock the product. Of the 7 companies exposed to Tactic 2
(T2), 5 agreed to stock the product.

Because of the amount of resources involved in selling the product to retail stores, a single
marketing analyst can only reach out to about 15 businesses per month; however, if successfully
sold, the result is a high level of profit for the clothing company, which, in turn, means you
might get that raise after all.
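SOLUTION: Let’s start with a simpler question, and first consider T1. We find that the
probability of a successful sale is:
$P(\text{sale with T1}) = \frac{5}{6} \approx 0.83$
To keep the arithmetic simple, we round this to about 80%. This means that we should expect
roughly 80% of all companies to sell the clothing line, in the long run.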
Suppose that a marketing analyst is to offer T1 to two different companies. He would like to
know: what is the probability that both companies agree to sell the product? Is the answer 80%?
Unfortunately, no. There is an 80% chance that each company, on its own, agrees to sell the
clothing line. We should expect the probability that both sign on to be less.
We know that about 8 out of 10 times, Company 1 (C1) will sign on and that 8 out of 10 times
Company 2 (C2) will sign on. Let’s compare the possibilities by using a tabular approach:
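                       Company 2 Choices
                       Y  Y  Y  Y  Y  Y  Y  Y  N  N
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
Company 1 Choices   Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    N  .  .  .  .  .  .  .  .  .  .
                    N  .  .  .  .  .  .  .  .  .  .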
Each cell in the table represents a particular combination of C1’s choice and C2’s choice. So,
the 1-1 entry (remember, this means first row, first column) of the table is the situation in which
it does indeed turn out that C1 and C2 both agree to sell the clothing line. The question was, what
is the probability that both sign on? Since the definition of probability is the ratio of the number
of ways the event can occur to the total number of possible outcomes, let’s do a bit of
counting by highlighting important features of the table:
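                       Company 2 Choices
                       Y  Y  Y  Y  Y  Y  Y  Y  N  N
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
Company 1 Choices   Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    N  .  .  .  .  .  .  .  .  .  .
                    N  .  .  .  .  .  .  .  .  .  .
(# = shaded cell, where both companies choose Y)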
The shaded region represents the number of ways in which we can get both companies to sign
on. This region is 8 × 8, which creates 64 possibilities. The total number of possibilities is simply
the total number of cells in the table. Since the table is 10 × 10, we have 100 possibilities.
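So,
$P(\text{C1 and C2 both sign on}) = \frac{64}{100} = 0.64 = 64\%$
This is, as speculated, less than the probability that a single company signs on. Let’s consider
what we really did here:
$P(\text{C1 signs on}) = \frac{8}{10}$
$P(\text{C2 signs on}) = \frac{8}{10}$
Notice that
$P(\text{C1 signs on}) \cdot P(\text{C2 signs on}) = \frac{8}{10} \cdot \frac{8}{10} = \frac{64}{100}$
$= P(\text{C1 and C2 both sign on})$
Or, in short,
$P(\text{C1 and C2}) = P(\text{C1}) \cdot P(\text{C2})$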
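The counting argument above can be checked by letting a computer build the 10 × 10 table and count the cells. The following Python sketch does that, under the same assumption used in the example (each company signs on in 8 of 10 equally likely cases, and one company's choice does not affect the other's). The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Each company signs on ("Y") in 8 of 10 equally likely cases.
company_1 = ["Y"] * 8 + ["N"] * 2
company_2 = ["Y"] * 8 + ["N"] * 2

# Every cell of the table is one (C1 choice, C2 choice) pair.
cells = list(product(company_1, company_2))

# Count the "shaded" cells, where both companies say Y.
both_sign_on = sum(1 for c1, c2 in cells if (c1, c2) == ("Y", "Y"))

p_counted = Fraction(both_sign_on, len(cells))      # counting cells
p_multiplied = Fraction(8, 10) * Fraction(8, 10)    # multiplying probabilities

print(both_sign_on, len(cells))    # 64 100
print(p_counted == p_multiplied)   # True
```

Counting cells and multiplying the two probabilities give the same answer, which is the point of the "in short" formula.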

Example 3: The idea of red-light cameras has been disputed quite often in Arizona and all across
the United States. While unable to find any specific details, the author will assume that red-light
runners have about a 70% chance of being caught by a red-light camera on any given instance.
Suppose that on a given day, two cars run through an intersection during separate red lights,
setting off the camera. What is the probability that both drivers are caught?
SOLUTION: We can fairly assume that the first driver being caught and the second driver being
caught (calling these events $D_1$ and $D_2$, respectively) constitute events that do not affect
one another. Thus,
$P(D_1 \text{ and } D_2) = P(D_1) \cdot P(D_2) = (0.70)(0.70) = 0.49$
There is a 49% chance that both drivers are caught. This is about the likelihood of getting heads
on the toss of a coin.
Example 5: In a crop of corn, the Food & Drug Administration (FDA) finds that two of the 20
bushels of corn are potentially contaminated with E. coli. Supposing that two bushels have
already gone out for shipment to county marketplaces, how likely is it that both of the
contaminated bushels have gone out?
SOLUTION: The question asks about the probability that both have been shipped, that is, the
first contaminated bushel and the second contaminated bushel. We will refer to these events as
simply $C_1$ and $C_2$. We will first write the “and” probability in the form of dependent
events and will then determine whether or not a dependency exists (see Independence Property
box above).
$P(C_1 \text{ and } C_2) = P(C_1) \cdot P(C_2 \text{ given } C_1)$
We know that $P(C_1) = \frac{2}{20} = 0.10$. Now, since the first shipment “removes” one of the
two contaminated bushels and one bushel out of the 20 available, the probability of shipping a
second contaminated bushel changes to:
$P(C_2 \text{ given } C_1) = \frac{1}{19} \approx 0.053$
Thus, the events are indeed dependent, and so the probability becomes:
$P(C_1 \text{ and } C_2) = \frac{2}{20} \cdot \frac{1}{19} = \frac{2}{380} \approx 0.0053$

There is less than a 1% chance that both contaminated bushels went out.
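As a check on the dependent-events calculation, we can list every possible pair of bushels that could have shipped and count how many of those pairs consist of the two contaminated bushels. This Python sketch assumes that every pair of bushels is equally likely to be the pair that shipped; numbering the bushels 0 through 19 and treating bushels 0 and 1 as the contaminated ones is an arbitrary labeling chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Label the 20 bushels 0..19; say bushels 0 and 1 are the contaminated ones.
contaminated = {0, 1}

# Every possible pair of bushels that could have shipped.
pairs = list(combinations(range(20), 2))

# Pairs in which both shipped bushels are contaminated.
both_bad = [pair for pair in pairs if set(pair) <= contaminated]

p_counted = Fraction(len(both_bad), len(pairs))   # counting pairs
p_rule = Fraction(2, 20) * Fraction(1, 19)        # "and" rule for dependent events

print(len(pairs))          # 190
print(p_counted, p_rule)   # 1/190 1/190
```

Only 1 of the 190 possible pairs is the "both contaminated" pair, which agrees with (2/20)(1/19).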
Does this outcome satisfy the farm producing these bushels of corn? Thinking in more detail, the
main concern is actually with one or more (at least one) contaminated bushel going out!
To see how to find this probability, it is useful to think about the following, perhaps obvious,
characteristics.
Basic Properties of Probability (Kolmogorov Axioms)
1) A particular event is: guaranteed to not occur, is guaranteed to occur, or lies somewhere
between these extremes.
2) In a given situation, or sample space, the likelihood of something occurring (however
small or insignificant), is guaranteed.
3) The summed probabilities of all the possible events in a situation constitute the entire, or
the whole of all possibilities.
Mathematically, suppose that a sample space consists of n events $E_1, E_2, \ldots, E_n$, no two
of which can occur at the same time. Then, the above verbal rules translate into:
1) For any arbitrary event between events 1 and n, let’s call this event $E_i$, then:
$0 \le P(E_i) \le 1$
2) Using $S$ to denote the sample space,
$P(S) = 1$
3) Summing the probabilities gives 100% of all possible outcomes:
$P(E_1) + P(E_2) + \cdots + P(E_n) = 1$
These basic properties are often referred to as the Kolmogorov axioms, named after the
mathematician Andrey Kolmogorov. An axiom can be thought of as a necessary assumption. For
instance, when physicists develop new concepts in physics, they assume that gravity follows
certain properties. Thus, they have gravity axioms.
The Kolmogorov axioms are extremely important in probability and the development of new
ideas.

In fact, recall Example 5 dealing with the contaminated corn crop. What are all the possibilities
for shipping out two bushels from the total of 20? Let’s list them out:
• 0 contaminated bushels and 2 uncontaminated bushels ship (call it $E_0$)
• 1 contaminated and 1 uncontaminated bushel ship (call it $E_1$)
• 2 contaminated bushels ship (call it $E_2$)
Are there any others? Not unless there is a possibility we have not considered. Since two bushels
are guaranteed to go out, the outcome must fall into one of the three categories listed.
Let’s calculate the probability for each of these by hand:
• $P(E_0) = P(\text{first uncontaminated}) \cdot P(\text{second uncontaminated, given the first was})$
$= \frac{18}{20} \cdot \frac{17}{19} = \frac{306}{380} \approx 0.805$
• $P(E_1)$: there are two possibilities; either the first is contaminated and the second is not, or
vice versa. We must consider both outcomes below:
o $P(\text{contaminated, then uncontaminated}) = \frac{2}{20} \cdot \frac{18}{19} = \frac{36}{380} \approx 0.095$
o $P(\text{uncontaminated, then contaminated}) = \frac{18}{20} \cdot \frac{2}{19} = \frac{36}{380} \approx 0.095$
These two possibilities give 9.5% + 9.5% = 19% of the sample space.
• $P(E_2) = \frac{2}{20} \cdot \frac{1}{19} = \frac{2}{380} \approx 0.005$ (from previous calculation)
(NOTE: Importantly, summing these three probabilities gives 1, as stated in the axioms!)

We can now see that the situation in which at least one contaminated bushel ships (exactly one,
or both) will occur about 19% + 0.5% = 19.5% of the time. This is much higher than when we
concerned ourselves with both going out! This is quite a frightening situation!
Needless to say, this was a lot of work; however, we can use the axioms to reduce the amount
of work we take on.
Writing $E_0$, $E_1$, and $E_2$ for the events that 0, 1, and 2 contaminated bushels ship,
axiom 3 gives:
$P(E_0) + P(E_1) + P(E_2) = 1$
Our earlier statement involved wanting to know the likelihood that at least one contaminated
bushel went out. That only involves $E_1$ and $E_2$! Solving for the sum of these two probabilities:
$P(E_1) + P(E_2) = 1 - P(E_0)$
That is,
$P(\text{at least one contaminated}) = 1 - P(\text{0 contaminated})$
$= 1 - \frac{18}{20} \cdot \frac{17}{19} \approx 1 - 0.805 = 0.195$
This is the same number we achieved taking the long route! We only had to find the probability
of shipping 0 contaminated bushels, which is a little bit of work as compared to a lot of work!
Probability of At Least One…
Given any number of events involving quantities, the probability of at least one in quantity is 1
minus the probability of 0 in quantity. That is:
$P(\text{at least one}) = 1 - P(\text{none})$
Mathematically, let subscripts represent quantity, where corresponding events are
denoted $E_0, E_1, E_2, \ldots, E_n$. Then,
$P(E_1) + P(E_2) + \cdots + P(E_n) = 1 - P(E_0)$
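To compare the "long route" with the "at least one" shortcut, here is a small Python sketch for the corn example (2 contaminated bushels out of 20, with 2 bushels shipped). The names `p0`, `p1`, and `p2` are illustrative labels for the probabilities of 0, 1, and 2 contaminated bushels shipping; exact fractions are used so that rounding does not hide anything.

```python
from fractions import Fraction

# Probability that 0, 1, or 2 contaminated bushels ship (2 of 20 are contaminated).
p0 = Fraction(18, 20) * Fraction(17, 19)            # clean, then clean
p1 = (Fraction(2, 20) * Fraction(18, 19)            # contaminated, then clean
      + Fraction(18, 20) * Fraction(2, 19))         # clean, then contaminated
p2 = Fraction(2, 20) * Fraction(1, 19)              # contaminated, then contaminated

# The three cases cover every possibility, so they must add to 1.
print(p0 + p1 + p2)                      # 1

# Long route: add up every case with at least one contaminated bushel.
long_route = p1 + p2
# Short route: 1 minus the probability of none.
short_route = 1 - p0

print(long_route, short_route)           # 37/190 37/190
print(round(float(short_route), 3))      # 0.195
```

Both routes give 37/190, or about 19.5%, but the short route needs only one of the three calculations.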
1. In 2009 the H1N1 virus, commonly referred to as the “Swine Flu,” reportedly infected an
estimated 10% of New Yorkers (SOURCE:
http://www.reuters.com/article/2009/08/30/us-flu-newyork-idUSTRE57T26Y20090830).

Suppose that an emergency room in New York City has two individuals with flu-like
symptoms. (Video Solution)
a. What condition(s) do you believe would make it appropriate to assume
independence in this situation?
b. By using the tabular approach and assuming independence, find the probability
that both people have the H1N1 virus.
c. By using the “and” rule, verify that you get the same answer that you found in
Part b.
d. Find the probability that neither of these individuals has the H1N1 virus.
e. Find the probability that at least one of them has the H1N1 virus.
f. Exposure to flu germs for even a short period of time can significantly increase
one‟s chances of catching the flu. Suppose that if one is exposed to an individual
with the flu virus, their chance of becoming infected is 15 percentage points
higher than normal. Find the probability that both individuals have the flu virus.
2. Many fire stations handle emergency calls for medical assistance as well as calls
requesting firefighting equipment. A particular station says that the probability that an
incoming call is for medical assistance is .85. This can be expressed as P(call is for
medical assistance) = .85.
a. Give a relative frequency interpretation of the given probability. That is, interpret
what the number .85 means based on the definition of probability.
b. What is the probability that a call is not for medical assistance?
c. Assuming that successive calls are independent of one another (i.e., knowing that
one call is for medical assistance doesn't influence our assessment of the
probability that the next call will be for medical assistance), calculate the
probability that both of the two successive calls will be for medical assistance.
d. Still assuming independence, calculate the probability that for two successive
calls, the first is for medical assistance and the second is not for medical
assistance.
e. Still assuming independence, calculate the probability that exactly one of the next
two calls will be for medical assistance. (Hint: There are two different
possibilities that you should consider. The one call for medical assistance might
be the first call, or it might be the second call.)
f. Do you think it is reasonable to assume that the requests made in successive calls
are independent? Explain.
3. "N.Y. Lottery Numbers Come Up 9-1-1 on 9/11" was the headline of an article that
appeared in the San Francisco Chronicle (September 13, 2002). More than 5600 people
had selected the sequence 9-1-1 on that date, many more than is typical for that sequence.
A professor at the University of Buffalo is quoted as saying, "I'm a bit surprised, but I
wouldn't characterize it as bizarre. It's randomness. Every number has the same chance of
coming up. People tend to read into these things. I'm sure that whatever numbers come up
tonight, they will have some special meaning to someone, somewhere." The New York
state lottery uses balls numbered 0-9 circulating in 3 separate bins. To select the winning
sequence, one ball is chosen at random from each bin. What is the probability that the
sequence 9-1-1 would be the one selected on any particular day?
4. On August 8, 2011, the Dow Jones Industrial Average fell 635 points (5.5%) to 10,810 points,
representing the 6th worst point loss ever experienced. On that day, President Obama’s
approval ratings also suffered tremendously; only 22% of the nation’s voters “Strongly
Approve” of how he is performing in the presidential role (SOURCE:
http://www.rasmussenreports.com/public_content/politics/obama_administration/daily_presidential_tracking_poll).
Suppose presidential hopeful Randall Terry (Democrat) speaks at a rally shortly
thereafter and assumes that his approval rating as a candidate will likely closely mirror
President Obama’s. Suppose there are 40 swing voters (voters that are “on the fence”
about who to vote for). (Video Solution)
a. What is the probability that all 40 voters will strongly approve of Terry’s plan?
b. What is the probability that none of the 40 voters will strongly approve of Terry’s
plan?
c. What is the probability that at least one voter will strongly approve of Terry’s plan?
5. The following case study is reported in the article "Parking Tickets and Missing
Women," which appears in an early edition of the book Statistics: A Guide to the
Unknown. In a Swedish trial on a charge of overtime parking, a police officer testified
that he had noted the position of the two air valves on the tires of a parked car: To the
closest hour, one valve was at the 1 o' clock position and the other was at the 6 o' clock
position. After the allowable time for parking in that zone had passed, the policeman
returned, noted the valves were in the same position, and ticketed the car. The owner of
the car claimed that he had left the parking place in time and had returned later. The
valves just happened by chance to be in the same positions. An "expert" witness
computed the probability of this occurring as (1/12)(1/12) = 1/144.
a. What reasoning did the expert use to arrive at the probability of 1/144?
b. Can you spot the error(s) in the reasoning that leads to the stated probability of
1/144? What effect do the error(s) have on the probability of occurrence? Do
you think that 1/144 is larger or smaller than the correct probability of occurrence?
6. Jeanie is a bit forgetful, and if she doesn't make a "to do" list, the probability that she
forgets something she is supposed to do is .1. Tomorrow she intends to run three errands,
and she fails to write them on her list.
a. What is the probability that Jeanie forgets all three errands? What assumptions did
you make to calculate this probability?
b. What is the probability that Jeanie remembers at least one of the three errands?
c. What is the probability that Jeanie remembers the first errand but not the second
or third?
7. One of the myths most commonly believed by students on multiple choice exams is that,
as long as they always use letter ‘C’ as their guess, they increase their chances of
guessing correctly. This, of course, is absurd, since there is not usually a set pattern used
by instructors in pairing correct answers with certain letters (certainly not for me,
anyhow).
Suppose that a multiple-choice quiz has two problems on it and that the student has no
idea how to answer them, so he guesses. Each problem has letters A-E corresponding to
the answers to choose from. Using counting techniques discussed in class, find and
explain how you found the following: (Video Solution)
a. What is the probability that both guesses are correct?

b. What is the probability that both guesses are incorrect?
c. What is the probability that he receives a 50% on the test?
d. How likely is it that he gets at least one problem correct?
e. What is the probability that he receives a 90% on the exam (assume no partial
credit is possible)?
f. How did the idea of “counting tables” allow you to answer these questions
without having to do additional work for each subsequent table?

3.3 Probability of Unions
Imagine that you toss a fair, two-sided quarter. You let it land and take a look at the side facing
up. What is the probability that you see heads or tails (assume the toss will be ignored if it
happens to land on its side)?
You can probably see fairly quickly that the outcome desired is guaranteed; when a coin is
tossed, it will result in one of two outcomes: heads or tails. If someone in a bet were to tell you
that he will win if the toss of a coin results in heads or tails, then you could probably tell him,
“Congratulations!”
Adding to our intuition (no pun intended), we will write the situation in the form of a
mathematical probability. The sample space will have two outcomes:
$H$ = the coin lands heads, $T$ = the coin lands tails
Then,
$P(H \text{ or } T) = 1$
Since we know that
$P(H) = 0.5$ and $P(T) = 0.5$
we can gladly write:
$P(H \text{ or } T) = P(H) + P(T) = 0.5 + 0.5 = 1$
Simple enough! We feel pretty satisfied and so we hope to tackle another problem:
Example 1: A large company offers a self-insured health insurance policy to its employees to
help them reduce premium and copay costs. Using its historical data from the last two years,
the company analyst considers the risk status of the employees (low or high) based on
preexisting conditions, and the type of claim filed (physical health or mental health). He finds
that 70% of employees have filed a mental health claim and that 40% of employees have been
categorized as high risk. Further, he finds that 20% of employees are low risk and have filed a
physical health claim. The company only insures the first claim. All claims thereafter are paid
for by a third-party insurer.

For reporting purposes, he would like to find the probability that a randomly selected employee
(or an employee that is to be hired in the future) is high risk or will file a mental health claim. As
he is writing his report, he hits a speed bump:
Letting $H$ = the employee is high risk and $M$ = the employee files a mental health claim,
$P(H \text{ or } M) = P(H) + P(M) = 0.40 + 0.70 = 1.10$
He quickly realizes that this probability is invalid because a probability cannot be greater than 1,
or 100%. What happened?
SOLUTION:
We first organize his data into a table to help us better see what is happening:
Claim \ Risk     Low     High    Total
Physical         .20
Mental                           .70
Total                    .40
The probabilities outside of the boxes represent totals for mental health claims and for high risk
claims. The probability in the 1-1 entry of the table represents the probability of being low risk
and filing a physical health claim. Since we know that this data represents all of those who have
filed claims, we know that 100% have filed one type or the other. Additionally, each employee
considered falls into one of the two risk categories. So we fill in more details:
Claim \ Risk     Low     High    Total
Physical         .20             .30
Mental                           .70
Total            .60     .40
We can also proceed to fill in the boxes in the table, since each person falls into exactly one of
the four positions (low physical, low mental, high physical, high mental):
Claim \ Risk     Low     High    Total
Physical         .20     .10     .30
Mental           .40     .30     .70
Total            .60     .40
Now, the analyst added the second row total to the second column total, as highlighted (in
brackets) in the table below:
Claim \ Risk     Low     High     Total
Physical         .20     .10      .30
Mental           .40     .30     [.70]
Total            .60    [.40]
The problem seems to be that the .40 and the .70 both include the probability of High Risk and
Mental Claim (the .30 cell)! In other words, it is being counted twice, hence the end probability
that is greater than 1.
Instead, let’s add up the individual box probabilities, as highlighted (in brackets) in the table
below:
Claim \ Risk     Low      High     Total
Physical         .20     [.10]     .30
Mental          [.40]    [.30]     .70
Total            .60      .40
We find that $P(\text{high risk or mental claim}) = 0.40 + 0.30 + 0.10 = 0.80$, which is a number
that rests between 0% and 100%. We conclude that, in fact, there is an 80% chance that a
claim-filing employee is high risk or files a mental claim (or both!!).
While this does not seem like a huge amount of work, suppose that we instead had three types of
claims and 3 different statuses. It would probably be convenient to have some sort of
mathematical approach to the solution.
Let’s go back to the table in which the double-count occurred:
Claim \ Risk     Low     High     Total
Physical         .20     .10      .30
Mental           .40     .30     [.70]
Total            .60    [.40]
We are free to add the two probabilities, $P(H)$ and $P(M)$ (high risk and mental health claim),
but we must be sure to take out the .30 one time, so that it is single-counted and not
double-counted:
$P(H \text{ or } M) = 0.40 + 0.70 - 0.30 = 0.80$
This is the same answer as before! Notice what we really did:
$P(H \text{ or } M) = 0.40 + 0.70 - 0.30$
$P(H \text{ or } M) = P(H) + P(M) - P(H \text{ and } M)$
Regardless of the context/application of the probability, this issue can be resolved as shown.
Probability of One Event “Or” the Other
Given two events, $A$ and $B$, the probability that one or the other occurs is the sum of the
individual probabilities with the double-count removed once. Mathematically,
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Typically, $\cup$ is used (called a union) to replace the word “or”, making the above equation
$P(A \cup B) = P(A) + P(B) - P(A \text{ and } B)$
At the beginning of this section, we addressed a coin-tossing problem that involved the
summation of the probability of heads ($H$) and the probability of tails ($T$). Let’s see why we
could get away with not subtracting away the double-count. We use the “Or” probability set-up:
$P(H \text{ or } T) = P(H) + P(T) - P(H \text{ and } T)$
We already know the first two probabilities on the right-hand side, but what is the third
probability value? Let’s analyze its meaning:
$P(H \text{ and } T)$ = the probability that a single toss lands on both heads and tails
Of course, it is impossible to get both heads and tails in one toss of a coin! Any impossible
outcome has a probability of 0%. That is:
$P(H \text{ and } T) = 0$
So,
$P(H \text{ or } T) = P(H) + P(T) - P(H \text{ and } T) = 0.5 + 0.5 - 0 = 1$
We simply “lucked out” when this problem worked out according to our intuition. In general,
you need only to remember the “Or” probability formula for the reasons given to solve any
problem involving the occurrence of one outcome or another.
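The double-counting issue in the insurance example can be illustrated with a short Python sketch. It stores the four box probabilities from the completed claim table and computes the "or" probability two ways: with the formula, and by adding the individual boxes directly. The dictionary layout and the helper name `prob` are illustrative choices, not part of the original example.

```python
from fractions import Fraction

def pct(n):
    """Turn a whole-number percentage into an exact fraction."""
    return Fraction(n, 100)

# The four box probabilities from the completed table: (claim type, risk level)
joint = {
    ("physical", "low"): pct(20),
    ("physical", "high"): pct(10),
    ("mental", "low"): pct(40),
    ("mental", "high"): pct(30),
}

def prob(condition):
    """Add up the boxes for which condition(claim, risk) is true."""
    return sum(p for (claim, risk), p in joint.items() if condition(claim, risk))

p_high = prob(lambda claim, risk: risk == "high")
p_mental = prob(lambda claim, risk: claim == "mental")
p_both = prob(lambda claim, risk: risk == "high" and claim == "mental")

# The analyst's mistake: the .30 box is inside both totals.
print(p_high + p_mental)                     # 11/10

# Formula: remove the double-count once.
p_either_formula = p_high + p_mental - p_both
# Direct: add each qualifying box exactly once.
p_either_direct = prob(lambda claim, risk: risk == "high" or claim == "mental")
print(p_either_formula, p_either_direct)     # 4/5 4/5

# Coin toss: heads and tails cannot both happen, so nothing is double-counted.
print(Fraction(1, 2) + Fraction(1, 2) - 0)   # 1
```

The formula and the direct count agree at 4/5, that is, 80%.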
Example 2: It is often interesting to note how political preference (Democrat or Republican)
varies within a married couple. Suppose that in a survey of 160 couples it is found that 60 of
the couples agree on a preference to vote Democrat and 40 are such that the husband votes
Democrat and the wife votes Republican. The total number of wives that vote Democrat is 90.
What is the probability that a couple has a husband or a wife that is Republican?

SOLUTION: We first arrange this information in a table:
Husband \ Wife    Democrat    Republican    Total
Democrat          60          40
Republican
Total             90                        160
Note that the bottom-right corner represents the table total.
We know that the number of husbands voting Democrat is 60 + 40 = 100. This means that the
number of husbands voting Republican is 160 − 100 = 60. Additionally, we conclude that the
number of couples where the husband votes Republican and the wife votes Democrat is
90 − 60 = 30. We fill this information in:
Husband \ Wife    Democrat    Republican    Total
Democrat          60          40            100
Republican        30                        60
Total             90                        160
This allows us to fill in the remaining details in the table:
Husband \ Wife    Democrat    Republican    Total
Democrat          60          40            100
Republican        30          30            60
Total             90          70            160
We convert the counts into proportions by dividing each cell entry by the total number of couples,
160:
Husband \ Wife    Democrat    Republican    Total
Democrat          .375        .25           .625
Republican        .1875       .1875         .375
Total             .5625       .4375         1
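Let $H_R$ = the husband votes Republican and $W_R$ = the wife votes Republican.
So,
$P(H_R \text{ or } W_R) = P(H_R) + P(W_R) - P(H_R \text{ and } W_R)$
$= 0.375 + 0.4375 - 0.1875 = 0.625$
We find that there is a 62.5% chance that in a couple either the husband votes Republican, the
wife votes Republican, or both vote Republican.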
At this point you might be wondering why we don‟t simply draw out the table and ignore the
mathematical formulas. When possible, tables are extremely useful, but they might not always be
available. Consider the following example.
Example 3: Testing has determined that a particular ballistic missile has an 80% chance of
hitting its intended target. Suppose that an enemy jet approaches a military base and so two
missiles are fired at the incoming jet. What is the probability that this threat is eliminated?
SOLUTION: This is the probability that one or both missiles hit the target.
We only have one probability, so filling out a table would not be possible.
Let $M_1$ = the first missile hits the target and $M_2$ = the second missile hits the target.
We want to know
$P(M_1 \text{ or } M_2) = P(M_1) + P(M_2) - P(M_1 \text{ and } M_2)$
We already know the first two probabilities on the right-hand side (.80), but we are not given
information on $P(M_1 \text{ and } M_2)$. We can fairly assume that the outcome of one missile
has no (or very minimal) impact on the outcome of another missile, and so we assume the events
are independent. This allows us to write:
$P(M_1 \text{ and } M_2) = P(M_1) \cdot P(M_2) = (0.80)(0.80) = 0.64$
And so,
$P(M_1 \text{ or } M_2) = 0.80 + 0.80 - 0.64 = 0.96$
We conclude that there is a 96% chance that the enemy jet is eliminated.
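The missile calculation combines the “Or” formula with the independence assumption, and it can be cross-checked with the "at least one" idea from Section 3.2 (the jet survives only if both missiles miss). The Python sketch below shows both routes; the function name is an illustrative choice, and independence of the two missiles is the same assumption made in the example.

```python
from fractions import Fraction

def p_at_least_one_hit(p_hit):
    """P(first hits or second hits), assuming the two missiles are independent."""
    p_both_hit = p_hit * p_hit             # "and" rule for independent events
    return p_hit + p_hit - p_both_hit      # "or" rule: remove the double-count

p_hit = Fraction(80, 100)

via_or_rule = p_at_least_one_hit(p_hit)
via_complement = 1 - (1 - p_hit) ** 2      # 1 minus P(both missiles miss)

print(via_or_rule, via_complement)         # 24/25 24/25
print(float(via_or_rule))                  # 0.96
```

Both routes give 24/25, or 96%.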

1. A gaming investor is considering becoming a financial partner in a new casino. In

deciding to go in on the deal, he reviews gaming revenues for previous years. From
experience and industry research, he decides that the gaming industry tends to be
successful when total gross revenues for card rooms are above $1 million or when gross
revenues for lotteries are above $20 billion. Between 2000 and 2009, he found that 50%
of the time, both sectors have been successful and that 0% of the time only card tables
were successful (and lotteries were not). Lotteries were unsuccessful 30% of the time
(SOURCE: 2011 U.S. Statistical Abstract, Table 1258). What is the probability that the
investor‟s conditions will be met? In your professional opinion, is it likely that he will
decide to become a partner in the proposal? (Video Solution)
2. A researcher conducts a study on a total of 600 cats to determine whether or not they tend
to be adaptive to danger and whether or not their time to respond to those dangers is fast
enough to avoid harm. The animals were exposed to non-harmful stimuli to assist in
answering the researcher‟s questions. In his report he details that, “207 non-adaptive cats
were studied and, of them, 180 were found to have response times that were simply not
fast enough. By comparison, a total of 300 cats were both adaptive and had response
times that were fast enough.” How likely is it that a cat is adaptive to environmental
physical dangers or has a response time that is fast enough? (Video Solution)
3. In the March 3, 2011 episode of the Dr. Oz Show entitled “Dangerous Doctors: Is Your
MD Hazardous to Your Health?” Dr. Oz mentioned that 20% of the time doctors order
scans to protect themselves from a lawsuit. Dr. Oz also said, “Up to 1/3 of all tests and
treatments are entirely unnecessary.” (Video Solution)
a. Two patients are given orders for scans from a particular doctor. What is the
probability that one patient or the other was given a scan to protect the doctor
against a lawsuit?
b. One patient is given orders for two different tests/treatments. What is the
probability that one or both of them was/were unnecessary?
c. A patient is prescribed a scan and a blood test. What is the probability that an
unnecessary prescription was made, through the patient‟s eyes?
4. In all of his Fall 2010 classes, Milos discovered that 44% of his students earned a ‘B’ or
better on their homework average. He also discovered that 50% of his students had a ‘B’
or better homework average or a ‘B’ or better overall grade in the class (SOURCE:
Milos’ Fall 2010 Grade Spreadsheet). If 30% of all his students received a ‘B’ or better
homework average and a ‘B’ or better class grade, what percentage of his students earned
a ‘B’ or better in the class? (Video Solution)
5. In all of his Fall 2010 classes, Milos discovered that, of all students who earned a ‘C’ or
better homework average, 87% earned a ‘C’ or better final class grade. 70% of all students
in his classes earned a ‘C’ or better homework average or earned a ‘C’ or better final class
grade (SOURCE: Milos’ Fall 2010 Grade Spreadsheet), while only 49% earned a ‘C’ or better
on homework and as a final class grade (some still did well in the class, but maybe failed to
turn in homework). What is the probability that a randomly selected student in his class
earned a ‘C’ or better final class grade? (Video Solution)

3.4 Conditional Probability
In many cases, a probability depends on what we already know. For instance, would we believe
that the likelihood of a car accident changes, provided that the roads are slick from snow? We
would probably agree that the likelihood increases if we already know the road conditions.
Suppose a fair, two-sided coin is tossed. You are told that the outcome is not a head. What is the
likelihood that the outcome is tails?
The answer is probably obvious… if you know the outcome was not heads, and the only two
possibilities are heads and tails, then there is a 100% chance the outcome is tails.
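This is a conditional probability. That is, let
$$H = \text{the outcome is heads}, \qquad T = \text{the outcome is tails}$$
Further, to indicate that the outcome is not one of the above, we often put a bar on top of the
event name:
$$\overline{H} = \text{the outcome is not heads}$$
$$\overline{T} = \text{the outcome is not tails}$$
Then,
$$P(H) = \frac{1}{2}$$
However, given that we know the outcome was not tails, the probability of heads jumped to 1.
We might write:
$$P(H \text{ given } \overline{T}) = 1$$
Instead of using the word “given” we often use a vertical line (called a “pipe”), |. That is,
$$P(H \mid \overline{T}) = 1$$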
Conditional Probability
The conditional probability of event $A$ provided that $B$ already occurred is written as
$$P(A \mid B)$$
and implies that the likelihood of $A$ may be different, knowing that $B$ already took place.
Example 1: Due to wars at sea, shipwrecks, and other such disasters, there are roughly
3,000,000 sunken vessels in all of the seas in the world! Suppose an area of the ocean
is mapped out due to the historic ships that have wrecked in that area. There is speculation that,
of the estimated 20 ships in that region, 11 are original pirate ships. Given that a pirate ship is the
first of the 20 recovered, what is the probability that the next one found will also be a pirate ship?
SOLUTION:
We would like to find the probability that a pirate ship is found, given that one pirate ship has
already been removed. If one ship is removed, there are 19 ships left. Since the ship removed
was a pirate ship, there are only 10 pirate ships remaining. That is,
$$P(\text{pirate ship} \mid \text{one pirate ship already recovered}) = \frac{10}{19} \approx 0.526$$
Note that this is different than
$$P(\text{pirate ship})$$
Why?
This probability has no condition placed on it. It assumes the very basic information: 20 ships, 11
pirate ships. So,
$$P(\text{pirate ship}) = \frac{11}{20} = 0.55$$
The conditional probability, in this case, is different than the unconditional probability.
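For readers who like to check this kind of reasoning with a few lines of code, here is a minimal Python sketch of Example 1. The function name `prob_next_is_pirate` and its parameters are made up for illustration, and the sketch assumes every ship still unrecovered is equally likely to be the next one found.

```python
from fractions import Fraction

def prob_next_is_pirate(total_ships, pirate_ships, pirates_already_found=0):
    """Chance the next recovered ship is a pirate ship, assuming each
    remaining ship is equally likely to be found next."""
    remaining_ships = total_ships - pirates_already_found
    remaining_pirates = pirate_ships - pirates_already_found
    return Fraction(remaining_pirates, remaining_ships)

# No condition: 11 pirate ships among 20.
unconditional = prob_next_is_pirate(20, 11)
# Condition: one pirate ship has already been recovered.
conditional = prob_next_is_pirate(20, 11, pirates_already_found=1)

print(unconditional, float(unconditional))
print(conditional, float(conditional))
```

Using `Fraction` keeps the answers as exact fractions, which makes them easy to compare with the hand calculation.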
Example 2: Determine whether or not the following situations represent $A$ and $B$ as
independent or dependent events.
a) $A$: It rains in Chandler today
   $B$: There is a car accident in Chandler
b) $A$: The Arizona Cardinals make it to the playoffs
   $B$: Subway runs out of whole wheat bread
c) $A$: Dow Jones Industrial reports an enormous loss
   $B$: Microsoft stocks plummet
d) ( ) ( ) ( )
e) ( ) ( ) ( ) )
f) ( ) ( ) ( ) )
SOLUTION:
a) Dependent; rain likely increases the likelihood of accidents
b) Independent; these events probably don’t have any impact on one another
c) Dependent; Microsoft is part of the Dow Jones Industrial and so there is a strong
relationship between the two
d) Independent; we see that the likelihood of $A$ does not change given that $B$ has occurred
– it is still .75
e) Dependent; the likelihood of $A$ does change given that $B$ has occurred – it drops to .3
f) If the product of the probabilities of the two given events equals the probability of $A$ and
$B$, then the events are independent, as this would mean that $P(A \mid B)$ is .75, which is the
same as $P(A)$. We see that $P(A) \cdot P(B) = P(A \text{ and } B)$, and so we conclude that the
events are independent.
Example 3: An aircraft radar system detects 30 aircraft in a 100-mile radius. Of these, 18 are
ally planes, 6 are cargo planes, and 6 are enemy planes. Given that a plane approaching the radar
is ruled out as being an enemy plane, what is the probability that it is a cargo plane?
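SOLUTION: First off, define:
$$C = \text{the plane is a cargo plane}, \qquad E = \text{the plane is an enemy plane}$$
We want to know
$$P(C \mid \overline{E})$$
Since it is not an enemy plane, it must be one of the remaining 24 aircraft. Of those, 6 are cargo
planes, so
$$P(C \mid \overline{E}) = \frac{6}{24} = 0.25$$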
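The idea of shrinking the sample space to what is still possible can be sketched in Python for Example 3. The list `aircraft` and the labels in it are illustrative names, and each aircraft is assumed equally likely to be the one approaching.

```python
from fractions import Fraction

# Radar contacts from Example 3, one label per aircraft.
aircraft = ["ally"] * 18 + ["cargo"] * 6 + ["enemy"] * 6

# Conditioning on "not an enemy plane" keeps only the non-enemy aircraft.
not_enemy = [plane for plane in aircraft if plane != "enemy"]

cargo_unconditional = Fraction(aircraft.count("cargo"), len(aircraft))
cargo_given_not_enemy = Fraction(not_enemy.count("cargo"), len(not_enemy))

print(cargo_unconditional, cargo_given_not_enemy)
```

The count of cargo planes stays at 6 in both fractions; only the denominator changes, from 30 to 24.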

Example 4: Suppose that Company 1 (C1) and Company 2 (C2) are competitors in the clothing
business. In fact, they both have locations within Chandler Fashion Center Mall. Given previous
business experience, the marketing analyst knows that each company has an 80% chance of
agreeing to sell a particular line of clothing; however, if C1 agrees to sell the clothing line, C2
wants to stay competitive and so definitely purchases the clothing line. How is the probability
that both will agree affected by this new knowledge?
SOLUTION: In this situation, the decision of C2 is dependent (conditional) upon the decision
of C1. Consider a table in which C2’s choices will reflect the decision of C1.
                        Company 2 Choices When C1 Agrees
                        Y   Y   Y   Y   Y   Y   Y   Y   Y   Y
Company 1 Choices   Y
                    Y
                    Y
                    Y
                    Y
                    Y
                    Y
                    Y
                    N
                    N
( )
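The difference is that C2’s decisions are all to agree, provided that C1 has agreed. If C1 does not
agree, then we’re not really sure how C2 will act, but we don’t really care, since the probability
we are in search of is when both companies agree!
Here we have:
$$P(\text{C1 and C2}) = P(\text{C1}) \cdot P(\text{C2} \mid \text{C1}) = \frac{8}{10} \cdot \frac{10}{10} = \frac{80}{100}$$
We could just as well have written
$$P(\text{C1 and C2}) = 0.8 \cdot 1 = 0.8$$
so as to be using the decimal form instead of the tabular fractions. Had the two decisions been
independent, the same probability would have been only $0.8 \cdot 0.8 = 0.64$.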
If you look back at the reasoning here, you’ll notice that we have bolded the word “dependent.”
In previous sections, we didn’t have to worry about dependency, since we assumed that the
choices of C1 and C2 were independent, that is, one outcome did not affect the other, and vice
versa.
How do we know whether events are dependent or independent? Oftentimes this is based upon
some knowledge of the situation or, perhaps, our intuition. Let’s set up the important ideas here
and then we’ll look at a few examples of dependence versus independence.
Probability of Two Events Occurring Simultaneously
Given two events, $A$ and $B$:
If $A$ and $B$ are independent events, then
$$P(A \text{ and } B) = P(A) \cdot P(B)$$
$$P(A \cap B) = P(A) \cdot P(B)$$
where $\cap$ is a symbol to represent the word “and”. We use this in mathematics often.
And if $A$ and $B$ are dependent events, then
$$P(A \text{ and } B) = P(A) \cdot P(B \text{ given } A)$$
$$P(A \cap B) = P(A) \cdot P(B \text{ given } A)$$
Or, as it is often written,
$$P(A \cap B) = P(A) \cdot P(B \mid A)$$
In either instance, the end result involves multiplication.
NOTE: $A$ and $B$ are generic names and thus can be attached to an event in an arbitrary order.
As an interesting note, we can make the following conclusion:

Independence Property
Given two events, $A$ and $B$, if $P(B \mid A) = P(B)$, then $B$ does not depend on $A$, and so the
dependence formula reduces to:
$$P(A \text{ and } B) = P(A) \cdot P(B)$$
$$P(A \cap B) = P(A) \cdot P(B)$$
This result is important, because it allows you to only have to remember the “and” rule for
dependent events. If the next event does not depend on the prior event, then the end probability is
just a product of the two individual probabilities.
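As a rough Python sketch of the “and” rule, the helper names `joint_probability` and `looks_independent` below are hypothetical, and the tolerance is an assumption made to absorb rounding in decimal arithmetic.

```python
import math

def joint_probability(p_a, p_b_given_a):
    """General "and" rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a

def looks_independent(p_a, p_b, p_a_and_b, rel_tol=1e-9):
    """True when P(A and B) matches P(A) * P(B) up to rounding."""
    return math.isclose(p_a * p_b, p_a_and_b, rel_tol=rel_tol)

# Example 4: C2 always agrees once C1 has agreed, so P(C2 | C1) = 1.
both_agree = joint_probability(0.8, 1.0)

# If C2 ignored C1, P(C2 | C1) would simply be P(C2) = 0.8.
both_agree_independent = joint_probability(0.8, 0.8)

print(both_agree, round(both_agree_independent, 2))
print(looks_independent(0.8, 0.8, both_agree))
```

Only the dependent form is coded as a formula; the independent case is the same call with $P(B \mid A)$ replaced by $P(B)$, which mirrors the point of the Independence Property.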
Though the ideas presented above might at first seem confusing, you’ll notice that the idea of
joint probabilities has not changed. The only new caution is to take care to acknowledge whether
the events are independent or not. We’ll consider a few more examples below.
Example 5: The probability that a resistor and capacitor both fail in a portable electronic
device in the fifth year of use is 0.95%. The probability that the resistor fails is 1.22% and the
probability that the capacitor fails is 1%. Are the events independent? If they are not
independent, what is the probability that the capacitor fails given that the resistor fails?
SOLUTION:
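Let
$$R = \text{the resistor fails}, \qquad C = \text{the capacitor fails}$$
If the two events are independent, then the product of unconditional probabilities should give us
the provided joint probability.
We have that
$$P(R) = 0.0122$$
$$P(C) = 0.01$$
If they are independent events, then
$$P(R \cap C) = P(R) \cdot P(C) = 0.0122 \cdot 0.01 = 0.000122$$
However, the joint probability under independence is 0.0122%, not the 0.95% we were given.
Thus,
$$P(R \cap C) = P(R) \cdot P(C \mid R)$$
That is, the probability that the capacitor fails is dependent upon the resistor failing. Filling in
what we know:
$$0.0095 = 0.0122 \cdot P(C \mid R)$$
Dividing gives
$$P(C \mid R) = \frac{0.0095}{0.0122} \approx 0.779$$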
Thus, there is a 77.9% chance the capacitor fails if the resistor fails. The resistor is an integral
part in this device. The likelihood of the capacitor failing increases, if the resistor fails first.
The example above brings up a useful result.
Calculating the Conditional Probability of A given B
Since $P(A \cap B) = P(B) \cdot P(A \mid B)$,
we have that
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Example 6: In a demographic study of a small town, it is found that 5% of the adult residents are
unemployed and living at or below poverty level. A total of 8% are unemployed. What is the
probability that a person in this town is living at or below the poverty level, given that they are
unemployed? Interpret the meaning of your answer.
SOLUTION:
Letting $A$ = a person lives at or below the poverty level and $B$ = a person is unemployed, we
would like to know $P(A \mid B)$.
We have that $P(A \cap B) = 0.05$ and $P(B) = 0.08$. Thus:
$$P(A \mid B) = \frac{0.05}{0.08} = 0.625$$
This says that, if a person is unemployed, there is a 62.5% chance they are living at or below the
poverty level. We would probably expect this figure to be quite high.
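The division rule can be tried out in a short Python sketch. The function name `conditional_probability` is an assumption for illustration; the numbers are the ones from Examples 5 and 6.

```python
def conditional_probability(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); only meaningful when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return p_a_and_b / p_b

# Example 5: capacitor fails given the resistor fails.
capacitor_given_resistor = conditional_probability(0.0095, 0.0122)

# Example 6: at or below poverty level given unemployed.
poverty_given_unemployed = conditional_probability(0.05, 0.08)

print(round(capacitor_given_resistor, 3))
print(round(poverty_given_unemployed, 3))
```

The guard on `p_b` is a design choice: conditioning on an event that cannot happen has no sensible answer, so the sketch refuses rather than dividing by zero.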

Conditional probability is quite useful when used in the correct way. The counterintuitive
problem below will allow us to shed light on how important it really is to think about
dependencies.
Example 7: As part of a narcotics checkpoint, officers randomly search freight trucks for
shipments of illegal drugs. The officers search a small number of crates in the trucks that are
chosen for random inspection. Suppose that, unbeknownst to the officers, there are two trucks
ahead, one of which contains one crate with illegal drugs. This truck has a total of 8 crates, while
the truck without drugs has a total of 5 crates. One of the two trucks will be randomly chosen.
What is the probability that the officers find the drugs?
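SOLUTION: At first, it is tempting to say that the probability is $\frac{1}{13}$; however, this is
not accurate. The probability that the officers find the crate with drugs is dependent on them
choosing the correct truck first!
Let
$$T = \text{the correct truck is chosen}, \qquad C = \text{the correct crate is chosen}$$
Two things must happen: they must choose the correct truck and they must choose the correct
crate. Randomly choosing one of the two trucks is equiprobable, $P(T) = \frac{1}{2}$. If the correct
truck is chosen, then the probability of choosing the correct crate (taking the search to be of a
single crate) is $\frac{1}{8}$, that is, $P(C \mid T) = \frac{1}{8}$. So,
$$P(T \cap C) = P(T) \cdot P(C \mid T) = \frac{1}{2} \cdot \frac{1}{8} = \frac{1}{16}$$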
Why is it not valid to say 1/13? It might appear that probability is simply pulling a “fast one” on
our intuition.
A simple way to think about it is as follows: there is not just one random process here. If all the
crates were in the same truck, there would indeed be a 1/13 chance that we’d get the right crate.
However, there are two random processes here. If you don’t choose the correct truck, then
choosing the correct crate is impossible. The likelihood of the second random process leading to
the correct crate is indeed deeply affected by the outcome of the first random process!
Example 8: Reconsider Example 7. Let’s say that the second truck (the one with 5 crates) had two
crates with shipments of drugs. As before, one of the two trucks will be randomly chosen. What is the
probability that the officers find the drugs?
SOLUTION:

This can happen in one of two ways:
• the truck with 8 crates ($T_1$) is selected and the one correct crate is chosen OR
• the truck with 5 crates ($T_2$) is selected and one of the two correct crates is chosen
We will first create a small tree diagram showing the possible outcomes.
    Truck 1 (1/2) ---- drugs found (1/8)
                  ---- drugs not found (7/8)
    Truck 2 (1/2) ---- drugs found (2/5)
                  ---- drugs not found (3/5)
The beauty of this diagram is that it displays the conditional probabilities on the right “stems” of
the tree for each initial choosing of the truck.
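The probability that drugs are found would thus be:
• Truck 1: $\frac{1}{2} \cdot \frac{1}{8} = \frac{1}{16} = 0.0625$
• Truck 2: $\frac{1}{2} \cdot \frac{2}{5} = \frac{1}{5} = 0.2$
Since these are distinct outcomes and cannot both occur (there is no overlap in the events), it is
okay to add them:
$$\frac{1}{16} + \frac{1}{5} = \frac{21}{80} = 0.2625$$
Thus, there is about a 26% chance that drugs are found between the two trucks. Again, note that the
probability is not simply $\frac{3}{13}$, as our intuition might falsely lead us to believe.
To formalize the tree above,
Let
$$T_1 = \text{the truck with 8 crates is chosen}, \quad T_2 = \text{the truck with 5 crates is chosen}, \quad D = \text{drugs are found}$$
$$P(D) = P\big((T_1 \cap D) \text{ or } (T_2 \cap D)\big)$$
$$P(D) = P(T_1 \cap D) + P(T_2 \cap D) - P\big((T_1 \cap D) \cap (T_2 \cap D)\big)$$
Since only one truck will be chosen, the probability of finding drugs in T1 and T2 is 0.
$$P(T_1 \cap D) = P(T_1) \cdot P(D \mid T_1) = \frac{1}{2} \cdot \frac{1}{8} = \frac{1}{16}$$
$$P(T_2 \cap D) = P(T_2) \cdot P(D \mid T_2) = \frac{1}{2} \cdot \frac{2}{5} = \frac{1}{5}$$
Summing these together yields $\frac{21}{80} = 0.2625$, as with the tree diagram.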
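The tree calculation can be sketched in Python as a sum over branches. The dictionary layout and the function name `prob_drugs_found` are illustrative choices, and the sketch assumes one truck is picked at random and then a single crate is opened.

```python
from fractions import Fraction

def prob_drugs_found(trucks):
    """Add up P(truck chosen) * P(drug crate opened | truck chosen) over all trucks."""
    total = Fraction(0)
    for p_truck, drug_crates, crates in trucks.values():
        total += p_truck * Fraction(drug_crates, crates)
    return total

# Each entry: (P(truck chosen), crates with drugs, total crates).
example_7 = {"T1": (Fraction(1, 2), 1, 8), "T2": (Fraction(1, 2), 0, 5)}
example_8 = {"T1": (Fraction(1, 2), 1, 8), "T2": (Fraction(1, 2), 2, 5)}

p_example_7 = prob_drugs_found(example_7)
p_example_8 = prob_drugs_found(example_8)

print(p_example_7, p_example_8, float(p_example_8))
```

Both results differ from the tempting "drug crates over all crates" shortcut, which would give 1/13 and 3/13.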
1. A deck of standard playing cards has 52 cards. There are four suits: clubs, diamonds,
hearts, and spades. There are two colors of cards – red and black. Diamonds and hearts
are red, and clubs and spades are black. The cards are labeled A (Ace), 2-10, J (Jack), Q
(Queen), and K (King). To better visualize, consider the illustration below:
Suppose you are given various conditions and that you must determine the probability of
the specified draw on the next card. Use the card descriptions above to find the
probability that: (Video Solution)
a. Given that one Jack is removed, a Jack is drawn
b. Given that all red cards are removed, a black card is drawn
c. Given that a red Queen is removed, a red Queen is drawn
d. Given that all red Queens are removed, a black Queen is drawn
e. Given that all Kings are removed, a red card is drawn
f. Given that all numerical red cards are removed, a King is drawn
g. Given that a red King is removed, a black King is drawn
2. An auto insurance company finds that there is an 18% chance that a teenager gets into a
car accident between ages 16 and 19. There is a 34% chance that a teenager gets a traffic
ticket during this same age range. They find that the chance of getting into a car accident
and getting a traffic ticket (not necessarily because of the accident) is 10%. (Video
Solution)
a. Based on the probabilities provided, are the two events independent? Perform a
calculation to justify your answer.
b. Given that a teenager gets into an accident, what is the probability that he gets a
traffic ticket?
c. Why did the probability change in this way, as compared to the unconditional
probability of getting a traffic ticket?
d. Given that a teenager gets a traffic ticket, what is the probability that he gets into
an accident?
e. Explain, in practical terms, what your answer in d) means.
3. Let $A$, $B$, and $C$ be events in a sample space. Do the following: a) explain whether
the events are independent or dependent, and b) answer the questions below regarding
these events with the information provided. Assume the first event listed in each
probability statement occurs first (e.g. $P(A \cap B)$ means $A$ occurs first). (Video
Solution)
( )
( )
( )
( )
( )
( )
a. ( )
b. ( )
c. ( )
4. Gregor Mendel was a monk who, in 1865, suggested a theory of inheritance based on the
science of genetics. He identified heterozygous individuals for flower color that had two
alleles (one r = recessive white color allele and one R = dominant red color allele). When
these individuals were mated, ¾ of the offspring were observed to have red flowers and
¼ had white flowers. The table summarizes this mating; each parent gives one of its
alleles to form the gene of the offspring.
                Parent 2
                r     R
Parent 1   r    rr    rR
           R    Rr    RR

We assume that each parent is equally likely to give either of the two alleles and that, if
either one or two of the alleles in a pair is dominant (R), the offspring will have red
flowers. (Problem source: Mathematical Statistics with Applications, 6th Ed., Wackerly,
et al.) (Video Solution)
a. What is the probability that an offspring has one recessive allele, given that the
offspring has red flowers?
b. What is the probability that an offspring has one dominant allele, given that the
offspring has white flowers?
c. What is the probability that an offspring has white flowers, given that it has one
recessive allele?
d. What is the probability that an offspring has white flowers, given that it has one
dominant allele?
e. What is the probability that an offspring has red flowers, given that it has one
dominant allele?
5. There are 5 candidates for 2 town council positions. Three of them are for the removal of
a landfill just outside of the city limits. The same candidate cannot fill both seats. (Video
Solution)
a. What is the probability that one randomly chosen candidate in the group is for the
removal of the landfill?
b. Given that one of the positions is filled with a candidate in favor of the landfill
removal, what is the probability that the second candidate chosen is also in favor?
c. What is the probability that two candidates in favor of the landfill removal are
chosen?
d. What is the probability that only one seat is filled by a candidate in favor of the
landfill removal?
e. What is the probability that at least one seat is filled by a candidate in favor of the
landfill removal?

3.5 Combinations and Permutations
Recall from Section 3.2 the problem faced by a corn growing business: the FDA determines that
two of the 20 bushels are potentially contaminated with E. coli. Two bushels had been shipped
out and the question was: what is the probability that both bushels that were shipped to the local
grocer were uncontaminated?
We wrote the simultaneous probability as
$$P(U_1 \cap U_2) = P(U_1) \cdot P(U_2 \mid U_1)$$
Due to the fact that one of the uncontaminated bushels was removed from the “pool”, there was
now only a 17/19 chance that the second uncontaminated bushel would be pulled. In short, we
wrote:
$$P(U_1 \cap U_2) = \frac{18}{20} \cdot \frac{17}{19} = \frac{18 \cdot 17}{20 \cdot 19}$$
We notice that the numerator and denominator both have a product of two sequential numbers.
Had they shipped, say, four bushels, the probability that all four were uncontaminated would be:
$$\frac{18 \cdot 17 \cdot 16 \cdot 15}{20 \cdot 19 \cdot 18 \cdot 17}$$
As you might imagine, this pattern continues.
How painful, though, would it be to have to multiply eight or nine probabilities of this nature
together? You could certainly do it, but you might think, “It sure would be nice to take advantage
of this pattern!” Well, we’re in luck!
Let’s define an important term:
A factorial is a descending product of whole numbers down to 1, beginning at a specified whole
number. To start with a generic whole number, $n$, we denote this product by $n!$, and write:
$$n! = n(n-1)(n-2) \cdots (2)(1)$$
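A small Python sketch can make the definition concrete. The loop below is one straightforward way to build the descending product; the value 5 is an arbitrary illustration and is not taken from the text. Python's standard library also offers `math.factorial`, which the sketch uses as a cross-check.

```python
import math

def factorial(n):
    """Descending product n * (n - 1) * ... * 2 * 1 for a whole number n >= 1."""
    product = 1
    for k in range(n, 0, -1):
        product *= k
    return product

# 5 is an arbitrary illustration value.
print(factorial(5))
print(factorial(5) == math.factorial(5))
```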
Example 1: Find .
SOLUTION: By definition of factorial, we write

This definition is great, but it still does not resolve our crisis: how do we multiply only a specific
number of sequential whole numbers?
Here’s a little trick: write the factorial out, then divide out the factors that are not needed. For us,
this means:
$$18 \cdot 17 = \frac{18 \cdot 17 \cdot 16 \cdot 15 \cdots 2 \cdot 1}{16 \cdot 15 \cdots 2 \cdot 1}$$
But this is the same thing as:
$$\frac{18!}{16!}$$
In a similar way, we can write the denominator of our probability by:
$$20 \cdot 19 = \frac{20!}{18!}$$
Before we push this too far and get ourselves into a trap, let’s consider a different example with a
smaller sample space.
Suppose that there are only 3 bushels of corn and that only one is contaminated with E. coli.
Again, let’s say that two are shipped out. Then,
$$P(U_1 \cap U_2) = \frac{2}{3} \cdot \frac{1}{2} = \frac{2}{6} = \frac{1}{3}$$
If you recall the tabular approach to thinking about this, we might show the possibilities for
uncontaminated bushels, U1 and U2, and the way in which they can appear:
                2nd Bushel
                U1    U2
1st Bushel  U1
            U2
We know that the pairs (U1, U1) and (U2, U2) for the 1st and 2nd bushels cannot be possible,
since that particular bushel is removed from the population. So, we denote that in the table by
blacking-out those cells:

                2nd Bushel
                U1    U2
1st Bushel  U1  ■
            U2        ■
Perfect! So we see the remaining two possibilities, right? Well, actually, is there a difference
between (U2, U1) and (U1, U2)? Not unless those two bushels are actually different than one
another! So, blacking out either one of these pairs leaves:
                2nd Bushel
                U1    U2
1st Bushel  U1  ■     ■
            U2        ■
One possibility!
You might be wondering why we’re bothering with this if we’ve already found the probability.
This is a good thing to wonder.
Recall that a probability is the number of ways an event can happen divided by the total number
of outcomes. To be consistent with this definition, we really should be putting 1 in the
numerator. Does that mean we miscomputed the probability? Not in this particular example, but
it can happen.
To make our denominator consistent, let’s look at the total number of possibilities for selecting
bushels, adding in the contaminated bushel, C:
                2nd Bushel
                U1    U2    C
1st Bushel  U1
            U2
            C
Again, it is not possible to select the same bushel twice, so we black-out the diagonals:
                2nd Bushel
                U1    U2    C
1st Bushel  U1  ■
            U2        ■
            C               ■
Are we done? Not unless we feel that (U2, U1) is different than (U1, U2). We notice that the
three cells to the right of our blacked out diagonal are duplicates of those to the left. Thus we can
cross them out, as well:

                2nd Bushel
                U1    U2    C
1st Bushel  U1  ■     ■     ■
            U2        ■     ■
            C               ■
This leaves us with three possibilities. So, our probability should be:
$$P(U_1 \cap U_2) = \frac{1}{3}$$
Wait! This is the same as our earlier calculation of
$$P(U_1 \cap U_2) = \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}$$
Since we get the same answer, one might think that it must not matter which approach we take.
Many times, it doesn’t; however, “many” is not satisfying enough, since this leaves us prone to
mistakes under different circumstances.
Let’s analyze the full situation (20 bushels, 18 of them uncontaminated) two different ways. We
found that if we don’t eliminate order differences, then we can write the probability as:
$$\frac{18!/16!}{20!/18!}$$
If we did (correctly) eliminate order differences, notice that we cut the number of possibilities in
half, that is, divided by 2. You’ll notice that 2 is the same thing as $2! = 2 \cdot 1$. So, let’s divide
out the number of duplicates from top and bottom:
$$\frac{18!}{16! \cdot 2!}$$
And
$$\frac{20!}{18! \cdot 2!}$$
Which, in its final state gives:
$$\frac{\dfrac{18!}{16! \cdot 2!}}{\dfrac{20!}{18! \cdot 2!}}$$
This does look rather complicated, but remember that it follows from some fairly simple things
that we have built up on. Also notice that both the top fraction and the bottom fraction have
$\frac{1}{2!}$.
Ah, yes! So that’s why the order-not-eliminated and order-eliminated answers are the same:
$$\underbrace{\frac{18!/16!}{20!/18!}}_{\text{order not eliminated}} = \underbrace{\frac{18!/(16! \cdot 2!)}{20!/(18! \cdot 2!)}}_{\text{order eliminated}}$$
While this works out beautifully in this example, it is not always true, and so we must take care
to observe whether order difference is important. We will see examples later where this
difference will come into play, but those situations are a bit more advanced.
Let’s simplify this horrid notation a bit. Suppose that there are a total of $n$ items and $r$ of those
are to be drawn.
Permutation – Order Does Matter
If order is not to be eliminated (in cases where order is important), then the number of ways to
select $r$ things from the given $n$ is called a permutation and is denoted:
$${}_{n}P_{r} = \frac{n!}{(n-r)!}$$
NOTE: $(n-r)! \neq n! - r!$, that is, factorial is not distributable!! Subtract first, then use
factorial.
For our numerator, we had selected 2 uncontaminated bushels from a total of 18 uncontaminated
bushels. According to our new notation, this can be written as:
$${}_{18}P_{2} = \frac{18!}{(18-2)!} = \frac{18!}{16!}$$
And this is precisely what we have written for the numerator!
For our denominator, we had selected 2 (general) bushels from a total of 20 (general) bushels,
since we want to know the total number of ways 2 objects can come out of 20.
$${}_{20}P_{2} = \frac{20!}{(20-2)!} = \frac{20!}{18!}$$
And this is precisely what we have written for the denominator!
In simplified notation,
$$P(U_1 \cap U_2) = \frac{{}_{18}P_{2}}{{}_{20}P_{2}}$$
Calculator Clinic – Using Permutations
To evaluate a permutation ${}_{n}P_{r}$,
1. First enter $n$ in your home screen.
2. Go to MATH and move to the left to the PRB tab.
3. Select 2: nPr. This will return you to your home screen.
4. Enter $r$ and press ENTER.
TIP: Sometimes the value of the numerator or denominator is so large that the calculator
throws an overflow error. It is advisable to enter the entire probability, numerator and
denominator, as a single expression to avoid this potential problem.
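If you happen to be working in software rather than on a calculator, the same count is available in Python (version 3.8 or later) as `math.perm`. The sketch below redoes the bushel probability; the variable names are illustrative.

```python
import math

# n items available, r drawn, order kept: nPr = n! / (n - r)!
numerator = math.perm(18, 2)     # ordered pairs of uncontaminated bushels
denominator = math.perm(20, 2)   # ordered pairs of any bushels
probability = numerator / denominator

# Cross-check against the factorial definition.
from_definition = math.factorial(18) // math.factorial(18 - 2)

print(numerator, denominator, round(probability, 4))
print(numerator == from_definition)
```

Python integers do not overflow, so the calculator TIP about very large numerators and denominators is less of a concern here.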
Let’s now consider the case where it is important to eliminate order differences.

Combination – Order Does NOT Matter (Eliminated)
If order is to be eliminated (in cases where order is not important), then the number of ways to
select $r$ things from the given $n$ is called a combination and is denoted:
$${}_{n}C_{r} = \frac{n!}{(n-r)! \cdot r!}$$
NOTE: $(n-r)! \neq n! - r!$, that is, factorial is not distributable!! Subtract first, then use
factorial. Additionally, the factorial of a product is not the product of factorials, that is,
$\big((n-r) \cdot r\big)! \neq (n-r)! \cdot r!$.
For our numerator, we had selected 2 uncontaminated bushels from a total of 18 uncontaminated
bushels, eliminating the number of repeats, which was 2, or $2!$. According to our new notation,
this can be written as:
$${}_{18}C_{2} = \frac{18!}{(18-2)! \cdot 2!} = \frac{18!}{16! \cdot 2!}$$
And this is precisely what we have written for the numerator!
For our denominator, we had selected 2 (general) bushels from a total of 20 (general) bushels,
since we want to know the total number of ways 2 objects can come out of 20, order aside.
$${}_{20}C_{2} = \frac{20!}{(20-2)! \cdot 2!} = \frac{20!}{18! \cdot 2!}$$
And this is precisely what we have written for the denominator!
In simplified notation,
$$P(U_1 \cap U_2) = \frac{{}_{18}C_{2}}{{}_{20}C_{2}}$$
Calculator Clinic – Using Combinations
Follow the steps for finding permutations, but in Step 3, use 3: nCr instead.
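As a software counterpart to the nCr key, Python (3.8 or later) provides `math.comb`. The sketch below, with illustrative variable names, checks that the order-kept and order-eliminated versions of the bushel probability agree.

```python
import math
from fractions import Fraction

ordered = Fraction(math.perm(18, 2), math.perm(20, 2))      # order kept
unordered = Fraction(math.comb(18, 2), math.comb(20, 2))    # order eliminated

# Each unordered pair corresponds to 2! ordered pairs.
pairs_from_perm = math.perm(18, 2) // math.factorial(2)

print(math.comb(18, 2), math.comb(20, 2))
print(ordered, unordered, ordered == unordered)
```

The two fractions match here because the same factor of $2!$ is divided out of both numerator and denominator.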
Example 2: Every week, Cori stops at Chipotle Mexican Grill for lunch with his colleagues. Each
time, he drops a business card into the fishbowl for a chance to win lunch for his entire office.
After the seventh visit, Cori begins to wonder about his chances of winning. He estimates that
there are approximately 40 cards in the bowl. If two were to be drawn, what is the probability
Cori wins both draws?
SOLUTION: We first think about what it is that we need to know. Per the question asked, the
event is that the first and second cards drawn are both Cori’s.
This event occurs when the 2 cards drawn both come out of the 7 he has put in thus far. Since the
order in which his two cards are drawn doesn’t matter (as the prize is the same), we would like to
know the value of
$${}_{7}C_{2} = \frac{7!}{5! \cdot 2!} = 21$$
The sample space is simply the total number of outcomes. Two cards will be drawn from the
stack of 40, and since order doesn’t matter,
$${}_{40}C_{2} = \frac{40!}{38! \cdot 2!} = 780$$
Thus, the probability of this event is
$$P(\text{both cards are Cori’s}) = \frac{21}{780} \approx 0.027$$
There is about a 3% chance that both of the cards drawn are Cori’s.
Example 3: Probability is often used in police investigations to help determine probable cause.
Suppose that in a gang-related report it was stated that three gang members were spotted. In an
interrogation room, 20 gang members are suspects, three of whom are certain to have committed
the crime. A detective has a suspicion that the three came from a gang of which 5 of its members
are present. Just by chance, how likely is it that the three members came from the gang he
believes to be behind the crime? Does this give him what you might consider “probable cause” to
pursue the group?
SOLUTION: The event is that the three criminals come from a group of five particular gang
members. There are
$${}_{5}C_{3} = \frac{5!}{2! \cdot 3!} = 10$$
such groups. The total number of three-criminal groups that can be formed out of the 20 suspects is
$${}_{20}C_{3} = \frac{20!}{17! \cdot 3!} = 1140$$
This means,
$$P(\text{all three come from the presumed gang}) = \frac{10}{1140} \approx 0.009$$
There is only a .9% chance that the three gang members all come from the presumed gang. The
detective should consider more evidence to narrow down the search results before making
assumptions.
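Examples 2 and 3 follow the same pattern: count the favorable groups, count all groups, and divide. A minimal Python sketch of that pattern is below; the function name `prob_all_from_group` is hypothetical, and it assumes every group of the given size is equally likely to be drawn.

```python
import math
from fractions import Fraction

def prob_all_from_group(group_size, total_size, drawn):
    """Chance that all `drawn` items come from one particular group,
    when order does not matter and items are not replaced."""
    favorable = math.comb(group_size, drawn)
    possible = math.comb(total_size, drawn)
    return Fraction(favorable, possible)

cori_wins_both = prob_all_from_group(7, 40, 2)      # Example 2
all_from_one_gang = prob_all_from_group(5, 20, 3)   # Example 3

print(cori_wins_both, round(float(cori_wins_both), 3))
print(all_from_one_gang, round(float(all_from_one_gang), 3))
```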
Example 4: A business creates a new system to keep track of client relations, such that
information about the client and particular orders placed can be accessed by a nonrepeating code
of four characters (letters or digits). For instance, KA23 and AK23 are possible codes. Any code
containing only letters will be reserved for large clients. How many such codes of non-repeating
letters can they make available and, assuming all such codes will eventually be used up, what
percentage of the company’s clients will be considered large clients?
SOLUTION: There are 26 letters in the alphabet and, of those, four will comprise a single,
large-client code. There are ${}_{26}P_{4} = 26 \cdot 25 \cdot 24 \cdot 23 = 358{,}800$ different codes
without the same letters being repeated, but where order does matter.
In order to know what percentage (or probability) of the total number of possible codes this
represents, we need to compute the total number of codes that can be formed, where no letter or
number is repeated, but where order does matter. This is precisely what permutations are for.
Since there are 26 letters and 10 digits, a total of 36 different “symbols” can be selected from.
The number of permutations is ${}_{36}P_{4} = 36 \cdot 35 \cdot 34 \cdot 33 = 1{,}413{,}720$
total different codes¹ without the same letters or numbers being repeated, but
where order does matter.
So, the percentage/probability, then, is:
$$P(\text{letters only}) = \frac{358{,}800}{1{,}413{,}720} \approx 0.254$$
We conclude that about 25% of all clients (the large clients) will have completely alphabetical codes.
¹ Notice that the increase in the number of possibilities after increasing the size of the sample space is not
proportional to the increase amount. The growth is much faster than linear.

Example 5: In Example 4, it was necessary that letters and numbers were not to be repeated.
Recalculate the number of large client codes and the percentage of them by assuming that
numbers and letters actually can be repeated.
SOLUTION: Recall that a permutation or a combination is intended to handle situations in which
repeats are not allowed. Recall from the beginning of this section that to find the number of ways
in which two bushels of corn could be selected from a crop of 20 (and after one is selected, the
sample space reduces in size), we wrote:
$$20 \cdot 19$$
In this situation, we are allowing repeats. For the number of ways to form a 4-letter code, we
have 26 possibilities for each position. That is 26 for the first, the second, the third, and the fourth.
Crossing all of these possibilities gives:
$$26 \cdot 26 \cdot 26 \cdot 26 = 26^4 = 456{,}976$$
which we expect to be larger than in the previous example since we are allowing repeats.
Similarly, the number of letter/number codes that are possible can be calculated by noting
that, in general, each piece of the code has 36 possibilities. So,
$$36 \cdot 36 \cdot 36 \cdot 36 = 36^4 = 1{,}679{,}616$$
The percentage/probability is
$$P(\text{letters only}) = \frac{456{,}976}{1{,}679{,}616} \approx 0.272$$
With repeats allowed, the percentage changes: about 27% of all codes will contain only letters.
Moral of the Story with Counting
When counting, it’s important to begin by determining some key pieces of information:
1. Are repeats/replacements allowed? If yes, permutations/combinations are likely
the incorrect approach.
2. Does order matter? If yes, permutations should be used. If no, combinations should
be used.
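The two questions above can be turned into a small decision helper. This is only a sketch: the function name `count_selections` and its flags are hypothetical, and the case of repeats allowed with order ignored is deliberately left out because this section does not cover it.

```python
import math

def count_selections(n, r, repeats_allowed, order_matters):
    """Number of ways to select r items from n, following the two questions above."""
    if repeats_allowed:
        if order_matters:
            return n ** r            # n choices for each of the r positions
        raise NotImplementedError("unordered selection with repeats is not covered here")
    if order_matters:
        return math.perm(n, r)       # permutation
    return math.comb(n, r)           # combination

# Example 4: no repeats, order matters.
letters_no_repeat = count_selections(26, 4, repeats_allowed=False, order_matters=True)
all_no_repeat = count_selections(36, 4, repeats_allowed=False, order_matters=True)

# Example 5: repeats allowed, order matters.
letters_repeat = count_selections(26, 4, repeats_allowed=True, order_matters=True)
all_repeat = count_selections(36, 4, repeats_allowed=True, order_matters=True)

print(round(letters_no_repeat / all_no_repeat, 3))
print(round(letters_repeat / all_repeat, 3))
```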
You Might Be Wondering:
You might be wondering why we must divide by $r!$ to remove all repeats. This was
probably somewhat obvious when working with two objects. Say there are 5 objects to
select from. One is now gone, so for the second selection there are only 4. We proceed to
cross out everything along and to the right of the diagonal since they are either not
possible or are simply repeats.
            Object 1   Object 2   Object 3   Object 4
Object 1       ■          ■          ■          ■
Object 2                  ■          ■          ■
Object 3                             ■          ■
Object 4                                        ■
Object 5
We have essentially multiplied the first five possibilities by the next number of possibilities,
which is only 4 (this is accounted for by crossing out the diagonals, since this subtracts out
five possibilities to give 20), and then divided that result by 2, since half of the table is a
repeat. That is,
$$\frac{5 \cdot 4}{2} = 10$$
What happens when we select a third object? We extend the above table as a multiple of 3,
since there are three objects left. Each table represents a pairing with one of the three
remaining objects, as shown in the upper-left corner:

[Three copies of the table above, labeled OBJECT 1, OBJECT 2, and OBJECT 3 in the upper-left corners; rows Object 1 to Object 5, columns Object 1 to Object 4.]
In the first table, we can cross out the first column (and first row, if it were there), since it is
not possible to select object 1 for a third time. In the second table, we can cross out the
second column/row and in the third table we can cross out the third column/row for the
same reason as table 1.

[The three tables again, with column/row 1 crossed out in table 1, column/row 2 in table 2, and column/row 3 in table 3.]
Also notice that the second column of table 1 and the last three rows of table 2 are the same:
(1, 2, 3), (1, 2, 4), and (1, 2, 5), so the second column of table 1 can be crossed out. For a
similar reason, the third column of table 1 can be crossed out, since it is a repeat of what we
have in column 1 of table 3.

[The three tables again, with the second and third columns of table 1 now also crossed out.]
Nothing else in table 1 can be eliminated, since (1, 4, 5) cannot be found in either of the two
remaining tables (this is a unique characteristic of the bottom, right-most entry).
In table 2, we will try to eliminate any entries that can be found in table 3. These
eliminations will involve any entries that contain Object 3. We can do so with the (2, 1, 3)
entry and the third column:

[The three tables again, with the (2, 1, 3) entry and the third column of table 2 now also crossed out; 10 white cells remain.]
Now, notice that we have 10 white spots left. This happens to be exactly one-third of what
we had after we tripled the table. That is,
\[ \underbrace{\frac{5 \cdot 4}{2} \cdot 3}_{\text{tripled table}} \cdot \underbrace{\frac{1}{3}}_{\text{one-third}} = 10 \]
Which can be simplified to,
\[ \frac{5 \cdot 4 \cdot 3}{3 \cdot 2 \cdot 1} = \frac{5!}{3!\,(5-3)!} = 10 \]
Selecting \(r\) items allows this process to repeat, ad nauseam, any number of times.
Mathematicians discovered that this tabular process could be reduced into the formula for the
"\(n\) choose \(r\)" general case (where we allow \(r\) to be any value between 0 and the number of
items we have to choose from), which tends to be discussed in more theoretical mathematics
courses such as Discrete Mathematical Structures (our MAT227).
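The crossing-out process can also be checked by brute force. The short Python sketch below lists every unordered selection of 3 objects out of 5 and compares the count with the standard counting formula n!/(r!(n-r)!), which is assumed here to be the formula the text refers to. The variable names are illustrative only.

```python
from itertools import combinations
from math import comb, factorial

objects = [1, 2, 3, 4, 5]

# Every unordered selection of 3 objects; (1, 2, 3) and (2, 1, 3) are counted once.
triples = list(combinations(objects, 3))

# The same count from the factorial formula n! / (r! (n - r)!).
n, r = len(objects), 3
by_formula = factorial(n) // (factorial(r) * factorial(n - r))

print(triples)
print(len(triples), by_formula, comb(n, r))
```

Each of the three counts should agree, and the selection (1, 4, 5) appears exactly once in the list.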
1. If possible, give an imaginary (but realistic) scenario for each of the following. If not
possible, state why.
a.
b.
c.
d.

e.
2. Your classmate was absent when permutations and combinations were covered. Explain when
he should and when he should not use permutations and combinations. (Video
Solution)
3. A police officer has been brought before the court on accusations of racial profiling.
This occurs when a person of a particular race has been pulled over or detained by
the police due to his race. The officer stopped 2 vehicles out of 10 that passed
through a freeway tollbooth. Both of the suspects were Asian and there were a total
of 3 Asian drivers in the 10. (Video Solution)
a. In how many ways could 2 drivers have been selected from the 10?
b. In how many ways could 2 Asian drivers have been selected from the 3?
c. How likely is it that the 2 selected drivers would both have been Asian if the
stops were truly random?
4. In the United States, 20 out of the 50 states spend more than 50% of their state park
and recreation areas revenue on keeping the state park operable (SOURCE: 2012
U.S. Statistical Abstract). Suppose a survey of 10 states is to be conducted next year
to see if anything has changed. (Video Solution)
a. In how many ways can 10 states be selected for the survey?
b. In how many ways can 10 states be drawn so that all 10 are operating on
more than 50% of their state park and recreation areas revenue?
c. What is the probability that all 10 of the states drawn are operating on more
than 50% of their state park and recreation areas revenue?
5. Ten pieces of furniture are to be arranged in a long row in a furniture store. In how
many ways can all 10 be arranged? (Video Solution)
6. At Chandler-Gilbert Community College high-school math competitions, students

enter into a raffle to win various prizes, including a graphing calculator. There are
typically around 200 students. Suppose there are 5 different types of calculators to
be given out and that the best is saved for last. (Video Solution)
a. In how many ways can the prizes be distributed among the 200 students?
b. Suppose a school has 5 attendees. In how many ways can all 5 students from
this school win a calculator?
c. What is the probability that all 5 students from this school win a calculator?
7. A frequent concern of cautious consumers is the idea of the last four digits of a credit
card number being displayed on receipts. Suppose a consumer has a Visa, which has a
total of 16 digits, each of which can be between 0 and 9. For the sake of simplicity,
suppose any combination is possible. A customer left the following receipt lying around
and is now concerned about his identity: (Video Solution)

a. First, how many different credit-card numbers are possible with 16 digits?
b. How many different credit-card numbers can be arranged with 6781 as the last
four digits?
c. On any one guess by a potential thief, what is the probability that he correctly
guesses this person‟s credit card number?

3.6 Expected Value
Imagine that you are an insurance salesperson with many years of experience. A new client has
requested that your business provide him with auto insurance. He is 20 years old and has never
been in an accident before. Considering age alone, you look at industry data and find that, as
recently as 2008, there was about a 15% chance that someone his age would get into an accident
(SOURCE: U.S. Statistical Abstract, Table 1113). Using your own expertise you find that, of your
20-year-old clients, the typical accident payment for his particular make and model of vehicle is
about $3,200. He brings forward a quote from another insurance agency for a $100/month premium
with no deductible (nothing to pay when an accident does occur except the running premium).
The question is, do you insure him?
Let‟s look at the possibilities in a tabular form. Since there‟s a 15% chance the driver will get
into an accident, there is an 85% chance he won‟t (since it either does happen or it doesn‟t). If
there is no accident, then the insurance company receives $1200 for the entire year. If an
accident does occur, the insurer pays out $3200 (hence a negative effect), but still receives the
year‟s premiums. Thus, the net difference is $2000, which the insurer is responsible for.
Action        Likelihood   Monetary Value to Insurer
Accident      15%          -$2000
No Accident   85%          +$1200
If we now consider 100 years, it is expected that in 15 of those years there would be an accident
and in 85 of them there would be no accident, assuming the probability stays constant. That means
the insurer would pay $2000 a total of 15 times and receive $1200 a total of 85 times. Let's
consider the net difference (in dollars):
\[ 85(1200) - 15(2000) = 102{,}000 - 30{,}000 = 72{,}000 \]
This amount looks very good! In fact, on average, the company received \(72{,}000 / 100 = 720\)
dollars per year. This customer is definitely profitable to the company in the long run. Of course,
we know that an accident could occur the first year, in which case a $2000 loss would be incurred
right away.
Notice what we really did here. We took the sum of the amounts and divided by 100:
\[ \frac{85(1200) + 15(-2000)}{100} = 720 \]
By properties of a common denominator we can write:
\[ \frac{85}{100}(1200) + \frac{15}{100}(-2000) \]
\[ = (0.85)(1200) + (0.15)(-2000) \]
\[ = 1020 + (-300) = 720 \]
In reality, we multiplied each monetary value by its respective probability. This idea is
known as expected value, since it is what we expect to happen in the long run.
Expected Value and Random Variable
Expected value is the expected, or average, quantity that should occur in the long run,
provided that each quantity occurs with a certain probability.
Suppose there are \(n\) quantities, \(x_1, x_2, \ldots, x_n\), each of which occurs with a certain
probability, \(p_1, p_2, \ldots, p_n\), respectively. Then the expected value, denoted \(E[X]\), is
\[ E[X] = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n \]
A capital letter, \(X\), is used to denote what is called a discrete random variable, a variable
that takes on one of \(n\) (a natural number of) values with a certain probability. This variable
is defined by what it measures in the given situation.
Importantly, \(p_1 + p_2 + \cdots + p_n = 1\); that is, we must account for 100% of all possible
outcomes in order for the expected value to be meaningful.
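As a minimal sketch of the definition, the Python function below multiplies each value by its probability and adds the results, after checking that the probabilities account for 100% of the outcomes. The function name expected_value is an illustrative choice, not something from the text. It is applied to the insurance scenario from the start of this section.

```python
def expected_value(values, probabilities):
    """Return the sum of each value times its probability."""
    if len(values) != len(probabilities):
        raise ValueError("each value needs exactly one probability")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probabilities))

# Insurance scenario: lose $2000 with probability 0.15, gain $1200 with probability 0.85.
insurer_ev = expected_value([-2000, 1200], [0.15, 0.85])
print(round(insurer_ev, 2))
```

The result should agree with the $720 yearly average found by the 100-year argument.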

An expected value is actually not something terribly new. To see this more explicitly,
suppose a student earns three test scores: 95%, 80%, and 85%. Then the average
percentage is:
\[ \frac{95 + 80 + 85}{3} = \frac{260}{3} \approx 86.7 \]
Observe that we can use properties of fractions to separate the sum as follows:
\[ \frac{1}{3}(95) + \frac{1}{3}(80) + \frac{1}{3}(85) \]
While one-third in this situation is not a probability (since the scores have already been
earned), each score carries a "weight" of one-third of the overall class grade.
Example 1: A company sells consumer electronics, such as televisions, stereos, and
computers. For each product, the company offers the consumer a warranty that protects
against any problems that might occur within the first two years, with the exception of
accidental damage and theft. For a particular television that runs $1200, it offers a
2-year warranty for $175. The company determines that 3% of these televisions malfunction
each year. Is the company offering the warranty at a profitable price?
Explain your answer and define the random variable.
SOLUTION: We should determine what will happen, on average. We first see that the
warranty is a 2-year warranty and the defect rate is for one year. If 3% malfunction each
year, then 6% of all televisions are expected to malfunction within the first two years.
This means that the company will make $175 with a 94% probability and will lose $1200-
$175=$1025 with a 6% probability, since it will still receive the payment, but will have to
either replace the product or offer a credit to the consumer.
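Letting \(X\) = the amount (in dollars) the company gains on one warranty sale, then the expected
amount to be gained, or \(E[X]\), is
\[ E[X] = 0.94(175) + 0.06(-1025) = 164.50 - 61.50 = 103 \]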
This means that, after selling this product for a while, it should earn an average of $103
from each consumer that purchases the warranty. This is a profitable outcome.
Example 2: The Arizona Lottery has a number of different lottery games that a person
can play. One in particular is Fantasy 5. The rules of the game are simple: pay $1 per
ticket and select five numbers between 1 and 41. Five numbers are then selected at
random. If you correctly selected two or more of these numbers, then you are
considered a winner. The following table describes the likelihood of winning:
(SOURCE: www.arizonalottery.com)
The estimated jackpot for the Wednesday, August 17, 2011 lottery was $54,000. Is the
game in your favor? Why or why not?
SOLUTION:
We must first consider the fact that these prizes do not take into account that $1 was lost to
purchase the ticket; we should subtract $1 from each of the prizes. Additionally, we note
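that the probabilities do not add to 1:
\[ \frac{1}{749{,}398} + \frac{1}{4163} + \frac{1}{119} + \frac{1}{11} \approx 0.0996 \]
The remainder of the time, it is simply the case that $1 is lost:
\[ 1 - 0.0996 = 0.9004 \]
We rebuild the table to show all of the values and probabilities:
x      53,999      499      4       0      -1
P(x)   1/749,398   1/4163   1/119   1/11   9,004/10,000

Where \(X\) = the net amount won (in dollars) on one ticket.
The expected value is:
\[ E[X] = 53{,}999\left(\tfrac{1}{749{,}398}\right) + 499\left(\tfrac{1}{4163}\right) + 4\left(\tfrac{1}{119}\right) + 0\left(\tfrac{1}{11}\right) + (-1)\left(\tfrac{9{,}004}{10{,}000}\right) \approx -0.67 \]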
This means that if one were to play time-after-time, taking into consideration the small
likelihood of winning occasionally, one would be expected to lose, on average, $0.67 per
ticket.
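The arithmetic in this example is easy to mistype on a calculator, so here is a hedged Python sketch that repeats it with exact fractions. The net winnings and probabilities are copied from the table above; the variable names are illustrative, and the losing probability is computed as whatever remains after the four winning outcomes.

```python
from fractions import Fraction

# Net winnings (prize minus the $1 ticket) mapped to their probabilities.
winning_outcomes = {
    53_999: Fraction(1, 749_398),
    499: Fraction(1, 4163),
    4: Fraction(1, 119),
    0: Fraction(1, 11),
}

# Whatever probability remains is the chance of simply losing the $1.
p_lose = 1 - sum(winning_outcomes.values())

expected = sum(x * p for x, p in winning_outcomes.items()) + (-1) * p_lose
print(round(float(p_lose), 4))
print(round(float(expected), 2))
```

Using exact fractions avoids rounding the losing probability to 0.9004 before the final sum, although the answer is the same to the nearest cent.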
Notice that we represented the outcomes by using a table, in which we listed the outcomes,
or the individual \(x\)-values, along with the probability that each occurs, \(P(x)\). This is one way in
which to display a probability distribution, or how all probabilities are distributed among
the various outcomes.
Example 3: A fair, six-sided die is tossed repeatedly. The number of dots that are facing
up after each throw is recorded. Define the random variable, find its probability
distribution, and find and interpret the expected value of the random variable.
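SOLUTION: We define the random variable, \(X\) = the number of dots facing up after a toss.
The different values that \(X\) can take on are \(\{1, 2, 3, 4, 5, 6\}\), since we know there are six
sides. Since this is a fair die, each of these six outcomes has an equally likely chance of
appearing, so \(P(x) = \frac{1}{6}\) for all values of \(x\). Our probability distribution is thus,
x      1     2     3     4     5     6
P(x)   1/6   1/6   1/6   1/6   1/6   1/6
The expected value is the sum of the products of each outcome value and its associated
probability.

\[ E[X] = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = \tfrac{21}{6} = 3.5 \]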
The average value of a die that is repeatedly tossed will be 3.5. If we were to conduct a
simulation we would probably see something similar as in the introductory section of this
chapter:
[Figure: Average Die Roll Outcome. Horizontal axis: Number of Times Die Has Been Tossed (0 to
140); vertical axis: Average Die Roll Outcome (0 to 6).]
As the number of tosses grows, we see that the average roll becomes more stable and seems to be
approaching 3.5, as we have shown mathematically.
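A simulation along these lines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name running_averages and the fixed seed are choices made here so the run is repeatable, and the exact averages printed depend on the random number generator.

```python
import random

def running_averages(num_rolls, seed=0):
    """Roll a fair die num_rolls times; return the average after each roll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    averages = []
    for roll_number in range(1, num_rolls + 1):
        total += rng.randint(1, 6)  # one toss of a fair six-sided die
        averages.append(total / roll_number)
    return averages

averages = running_averages(100_000)
for n in (10, 100, 1_000, 100_000):
    print(n, round(averages[n - 1], 3))
```

Early averages can wander noticeably, while the average after many tosses should sit close to the expected value of 3.5.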
Example 4: In hopes of understanding the directions in which married couples are naturally
inclined to walk at an outdoor mall in Arizona, a marketing group conducts a study. It is the
experience of the mall that men and women tend to walk in different directions once they
park (and catch up later). The first question is: how many individuals within a couple can
they expect to start their walk on a street that has one or more clothing stores?

SOLUTION: We first note that there are three paths out of five with one or more clothing stores.
We assume there are two people per couple and that each takes a different initial route. The
random variable we are interested in is: \(X\) = the number of individuals in a couple who start
on a route with a clothing store.
The random variable can take on values \(\{0, 1, 2\}\), since it is possible that neither of them
takes a clothing store route, only one does, or both do.
We need to find the probability for each of the three events. Having \(X = 0\)
individuals taking a route with a clothing store would occur when, from the three clothing
store routes, none are selected, and both routes without clothing stores are selected. We then
must compare this to the number of ways two routes can be chosen from five. That is,
\[ P(X = 0) = \frac{\binom{3}{0}\binom{2}{2}}{\binom{5}{2}} = \frac{(1)(1)}{10} = \frac{1}{10} \]
Similarly, for \(X = 1\), we want to know how many ways one clothing-store route and one
non-clothing-store route can be selected. That is,
\[ P(X = 1) = \frac{\binom{3}{1}\binom{2}{1}}{\binom{5}{2}} = \frac{(3)(2)}{10} = \frac{6}{10} \]
For \(X = 2\),
\[ P(X = 2) = \frac{\binom{3}{2}\binom{2}{0}}{\binom{5}{2}} = \frac{(3)(1)}{10} = \frac{3}{10} \]
Our probability distribution is:
x      0      1      2
P(x)   1/10   6/10   3/10
We can see that the probabilities sum to 1, which suggests that we have accounted for all
possibilities.
The number of individuals expected to take a clothing store route is the expected value of this
distribution,
\[ E[X] = 0\left(\tfrac{1}{10}\right) + 1\left(\tfrac{6}{10}\right) + 2\left(\tfrac{3}{10}\right) = \tfrac{12}{10} = 1.2 \]
Thus, it can be expected that, on average, about 1.2 individuals per couple (that is, at least one
person) will walk along a route that contains a clothing store.
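The three probabilities in this example follow one pattern: choose some clothing-store routes, choose the rest from the other routes, and divide by all ways to choose two routes from five. A small Python sketch of that pattern is below; the variable names are illustrative, and the counts (3 clothing routes, 2 other routes, 2 people) are taken from the solution's assumptions.

```python
from fractions import Fraction
from math import comb

clothing_routes, other_routes, people = 3, 2, 2
total_ways = comb(clothing_routes + other_routes, people)  # ways to pick 2 routes from 5

# P(X = x): x clothing-store routes and (people - x) other routes are chosen.
distribution = {
    x: Fraction(comb(clothing_routes, x) * comb(other_routes, people - x), total_ways)
    for x in range(people + 1)
}

expected = sum(x * p for x, p in distribution.items())
for x, p in distribution.items():
    print(x, p)
print("E[X] =", float(expected))
```

Note that Fraction reduces 6/10 to 3/5 when printing, which is the same probability.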
One additional way to represent a probability distribution is by using a probability histogram.

A histogram looks similar to a bar graph, except that it has a numerical horizontal axis and
measures the probability along the vertical axis. Additionally, the bars touch in order to show
continuity, where applicable. For the above situation, we would expect to see:

[Probability histogram: Clothing Store Route Probabilities. Horizontal axis: Number of
Individuals (0, 1, 2); vertical axis: Probability (0 to 0.7).]
This is a convenient visual way to view the distribution of probabilities. It is clear to us that
it is quite unlikely that neither of the individuals in the couple will walk a route with a
clothing store.
1. While working in downtown Phoenix, AZ, the author tracked the number of minutes that the
Blue Line bus was late in arriving at a specific downtown bus stop. He
discovered the following: (Video Solution)
Minutes late (x)   On time   1      2      3      4
P(x)               0.53      0.25   0.18   0.03   0.01
a. Construct a probability histogram.

b. What does the probability histogram reveal?
c. Find and interpret the expected value of the random variable.
(SOURCE: Author‟s data)
2. A Geico auto insurance policy for a 21-year-old Chandler male driver of a 2012 BMW
M5 with no previous tickets has a semi-annual premium of $312.41. In the instance of an
accident, there is a $1,000 deductible that the policyholder must pay before insurance will
cover the damages (SOURCE: www.geico.com). The vehicle costs about $115,000 to
replace. From past experience, suppose Geico knows there is a 2.5% chance (annually)

that this situation will result in an accident. Find the expected payout for Geico and
comment on its profitability in a situation like this. (Video Solution)
3. An insurance policy pays $100 per day for up to 3 days of hospitalization and $50 per
day for each day of hospitalization thereafter. (Video Solution)
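The number of days of hospitalization, \(X\), is a random variable with probability given by
the function
\[ P(X = k) = \begin{cases} \dfrac{6 - k}{15}, & k = 1, 2, 3, 4, 5 \\ 0, & \text{otherwise} \end{cases} \]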
a. Define the random variable.

b. Give the probability distribution for \(X\) by using a probability histogram.
c. What does the probability histogram tell you about hospitalization?
d. Determine the expected payment for hospitalization under this policy.
(SOURCE: Society of Actuaries (SOA), Spring 2003 Exam P, #36)
4. You work on a dairy farm and are in charge of quality control for eggs. Your primary
concern is that broken eggs do not go out. You know from past experience that about
25% of the outgoing boxes contain one or more broken eggs (based on complaints). If a
local restaurant purchases 4 boxes of eggs from you, what is the expected number of
boxes with broken eggs that this vendor should receive? (Video Solution)
5. At a major seafood restaurant, shrimp fettuccini is a popular dish. The company is
considering adding a family-sized fettuccini dish, but would first like to make sure that it
will be a profitable endeavor. The company randomly surveys customers who
purchase the original $14.99 dish and finds that 15% would purchase the larger family
dish. What should they charge for the family-sized dish so that average revenue from
shrimp fettuccini will be $17.00? (Video Solution)

Chapter 4
Discrete Probability Distributions
It might seem paradoxical to say that uncertainty occurs in certain ways, but the truth is that it
does, provided certain assumptions are satisfied. As we build a probability distribution,
whether in the form of a table or histogram, we can oftentimes save ourselves a lot of labor by
focusing on the type of experiment that lies before us. The purpose of this chapter is to
(hopefully) simplify some of our efforts.
4.1 The Binomial Distribution
4.1.1 Why Probability Distributions Are Useful
Suppose a friend of yours, let‟s call him Kyle, tells you that his brother is 6-feet, 9-inches tall.
You are most likely wide-eyed and surprised by what he just told you.
Why is this?
You likely have some idea of how tall people generally are. You would probably consider a
height of 6-feet, 9-inches to be uncommon in the environment you‟re used to. In fact, you might
even go as far as to call this height an outlier, or a value that falls outside the usual data range.
How can you be absolutely sure that this height is uncommon? What if you live in a region that
tends to have shorter people?
The statistician would say that it would be nice to see a probability distribution associated with
heights of all people living in the region, state, country, or continent on which you live. She
would argue that, if you are trying to describe the people in the U.S. based on people living in
Arizona, you are drawing from a biased sample.
While we will not discuss continuous random variables here (variables that can take on any
number in a specified range), we will show a theoretical distribution for heights in the U.S.
below:

For men, we see that the most frequently occurring height is near 70 inches (5-feet, 10-inches). It
is very uncommon to have someone who is 81 inches tall (6-feet, 9-inches). This type of
information allows us to conclude that your friend's brother is indeed very tall.
You might be wondering how we know that the shapes of the distributions should look like bells.
This is based on the data collection process. It is not unlikely in nature for distributions to have a
heavily loaded center with lower frequencies out towards the left and right tails. While the
histogram of all heights might not have a perfect bell shape as we indicate, having this shape
allows us to use mathematics to model the curve.
Although many variables do take on a continuous set of values, we will begin with discrete
random variables, as these are slightly simpler to describe.
4.1.2 The Binomial Distribution
When we talk about any variable that can take on a finite (as opposed to infinite) number of
possibilities, we are dealing with a discrete random variable.
Specifically, a binomial random variable counts the successes in a series of trials, each of which
takes on one of two possible outcomes, as indicated by the prefix "bi." We will simply refer to
each outcome as either a "success" or a "failure."
Consider this example: let's say that you and a friend are tossing a coin (since this is one of the
most exciting things to do). Your friend tosses 9 heads out of 10 tosses. Curious about this, you
begin to analyze the results: how likely is it that this type of event could take place?
By letting \(H\) and \(T\) represent the events that a head/tail is facing up on a coin toss,
respectively, we know that one possible way in which this can happen is:
HHHHHHHHHT
The probability of this particular sequence of 9 heads and 1 tail is:
\[ \left(\tfrac{1}{2}\right)^9 \left(\tfrac{1}{2}\right)^1 = \tfrac{1}{1024} \approx 0.000977 \]

This is definitely a small probability, but it is not the only way in which this can happen. The tail
can occur first, second, third, fourth, etc., with heads all around it. Another one would be:
HHHHHHHHTH
The probability of this sequence is the same: 9 heads, 1 tail. This is okay, since the probability of
tossing a certain sequence does not affect the probability of getting a head or tail on the next toss.
So,
\[ \left(\tfrac{1}{2}\right)^8 \left(\tfrac{1}{2}\right)^1 \left(\tfrac{1}{2}\right)^1 = \left(\tfrac{1}{2}\right)^9 \left(\tfrac{1}{2}\right)^1 \approx 0.000977 \]
Not surprisingly, there are 8 more places for the tail to have appeared. We‟ll summarize in the
table below:
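Arrangement of 9 H, 1 T   Probability
HHHHHHHHHT                (1/2)^9 (1/2)^1
HHHHHHHHTH                (1/2)^9 (1/2)^1
HHHHHHHTHH                (1/2)^9 (1/2)^1
HHHHHHTHHH                (1/2)^9 (1/2)^1
HHHHHTHHHH                (1/2)^9 (1/2)^1
HHHHTHHHHH                (1/2)^9 (1/2)^1
HHHTHHHHHH                (1/2)^9 (1/2)^1
HHTHHHHHHH                (1/2)^9 (1/2)^1
HTHHHHHHHH                (1/2)^9 (1/2)^1
THHHHHHHHH                (1/2)^9 (1/2)^1
Since these are 10 distinct ways of getting this outcome, each with probability 0.000977 (that is,
each takes up 0.0977% of the entire sample space), the probability of getting 9 heads and 1 tail
is:
\[ 10(0.000977) \approx 0.0098 \]
As suspected, this particular event is not very likely.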
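The count of 10 arrangements can be confirmed by listing every possible sequence of 10 tosses and keeping those with exactly 9 heads. The Python sketch below does this by brute force; it assumes a fair coin, so that all 1,024 sequences are equally likely.

```python
from itertools import product

# Every possible sequence of 10 tosses, written as strings such as "HHTHHHHHHH".
sequences = ["".join(tosses) for tosses in product("HT", repeat=10)]

# Keep only the sequences with exactly 9 heads (and therefore 1 tail).
nine_heads = [s for s in sequences if s.count("H") == 9]

# With a fair coin every sequence is equally likely, so we can simply count.
probability = len(nine_heads) / len(sequences)

print(len(sequences), len(nine_heads))
print(round(probability, 4))
for s in nine_heads:
    print(s)
```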
What if we complicated the problem a little more and asked: what would be the probability of
having two tails mixed up in 10 total tosses?
This gets more complicated, since the two tails could occur one after another, two tosses apart,
three tosses apart, etc. To simplify our lives, it can be shown that the total number of ways in
which a given number of "successes" can be arranged among the trials is found with the following
combination:
\[ \binom{\text{number of trials}}{\text{number of successes}} \]

So, we had 10 trials and wanted to know the number of ways in which 9 heads (successes) can be
included in the mix. We have:
\[ \binom{10}{9} = 10 \]
Then, we simply need to find the probability of just one of those arrangements and multiply it by
the number of different arrangements.
Since we defined a head as a success, what we just calculated was:
\[ \binom{10}{9} \left(\tfrac{1}{2}\right)^9 \left(\tfrac{1}{2}\right)^{10-9} \]
At first glance, it might seem a little confusing that the second exponent is the number of trials
less the number of successes.
Why is this?
Suppose there are 10 trials and you want 6 successes. This necessarily means that the other 4
trials would result in failures. This is precisely \(10 - 6 = 4\), or the number of trials less the
number of successes.
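Let's make this formula easier to consider. First off, let's define some variables:
Let
\(n\) = the number of trials,
\(x\) = the number of successes, and
\(p\) = the probability of a success on any one trial.
Now, in any trial, success and failure make up the whole sample space. That is:
\[ S = \{\text{success}, \text{failure}\} \]
Since they make up the sample space,
\[ P(\text{success}) + P(\text{failure}) = 1 \]
So,
\[ P(\text{failure}) = 1 - P(\text{success}) = 1 - p \]
We rewrite our formula with the above defined components:
\[ \binom{n}{x} p^x (1 - p)^{n - x} \]
This is known as the binomial probability density function, or binomial pdf.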
To make this more clear, we first define a random variable, \(X\). In the case of a binomial
experiment (one in which there are two possible outcomes for each trial), \(X\) takes its values
from the set listing all possible numbers of successes that can be achieved (between 0 and the
number of trials).
For example, if \(X\) = the number of heads in 10 coin tosses, then \(X \in \{0, 1, 2, \ldots, 10\}\).
That is, between 0 and 10 heads can possibly be achieved in 10 tosses of the coin (though not all
have the same probability). To indicate a binomial pdf calculation, we often write:
The probability that \(X\) takes on \(x\) successes is \(\binom{n}{x} p^x (1 - p)^{n - x}\), or,
\[ P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x} \]
We summarize a binomial pdf below, along with the necessary assumptions to use this.
Binomial Probability Density Function (pdf)
If the following assumptions are met:
1) An experiment is carried out with \(n\) trials,
2) Each trial can result in only one of two possible values: a success or a failure,
3) The probability of a success in each trial is \(p\) (it is always the same), and
4) Each trial is independent of all other trials (the outcome of one trial in no way affects the
outcome of any other trial),
then the experiment is a binomial experiment and the probability of \(x\) successes can be
calculated by
\[ P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x} \]
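For readers who like to check such calculations by computer, here is a minimal Python sketch of the binomial formula: the number of arrangements, times the probability of the successes, times the probability of the failures. The function name binomial_pdf and its argument order are illustrative assumptions; math.comb is Python's built-in "n choose x".

```python
from math import comb

def binomial_pdf(n, x, p):
    """Probability of exactly x successes in n independent trials,
    each with the same success probability p."""
    if not 0 <= x <= n:
        raise ValueError("x must be between 0 and n")
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability between 0 and 1")
    # (number of arrangements) * P(success)^x * P(failure)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# 9 heads in 10 tosses of a fair coin, as in the coin-toss discussion.
print(round(binomial_pdf(10, 9, 0.5), 5))
```

The function only makes sense when the four assumptions of a binomial experiment hold; it does not check independence or a constant success probability for you.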

Example 1: A fair, two-sided coin is tossed 10 times. The goal is to get 8 heads.
a) In how many different ways can this event occur?
b) Verify that all assumptions are met to conduct a binomial experiment.
c) What is the probability of this event?
SOLUTION:
a) Since there are 10 trials and 8 successes desired, there are:
\[ \binom{10}{8} = 45 \text{ ways} \]
b)
1) There are \(n = 10\) trials
2) Each outcome is either a head (success) or a tail (failure)
3) The probability of success on any trial is \(p = \tfrac{1}{2}\)
4) One toss does not influence the outcome of any other toss
Thus, all assumptions have been met.
c)
\[ P(X = 8) = \binom{10}{8} \left(\tfrac{1}{2}\right)^8 \left(\tfrac{1}{2}\right)^{2} = \tfrac{45}{1024} \approx 0.044 \]
Thus, there is about a 4.4% chance of tossing 8 heads in 10 tosses.
The fact that the probability of getting 8 heads in 10 tosses is higher than getting 9 heads in 10
tosses should not surprise us. Getting 9 heads is a rather extreme request. Getting 8 heads, while
still extreme, is a bit more likely.
Let's now build the probability distribution histogram for \(X\). We first display the probabilities
in a table below by applying the binomial pdf:

Successes Probability
0 0.001
1 0.010
2 0.044
3 0.117
4 0.205
5 0.246
6 0.205
7 0.117
8 0.044
9 0.010
10 0.001
Does this match our expectations? The table indicates that getting 5 heads has the highest
likelihood of all 11 possible events. Even more importantly, the probability of getting between 4
and 6 heads in 10 tosses is \(0.205 + 0.246 + 0.205 = 0.656\). The probability of getting very few
or very many successes is quite small. This data is displayed in the histogram below:
[Probability histogram: Tossing X Heads in 10 Tosses. Horizontal axis: Successes; vertical axis:
Probability (0.000 to 0.300).]
This further validates our argument above.
Additionally, note that the event probabilities sum to 1. This is necessary and
important in describing the distribution.
Sum of Success Probabilities in a Binomial Experiment
With \(n\) trials in a binomial experiment, the probabilities of 0 up to \(n\) successes must
together constitute the sample space and hence sum to 1.
That is,

\[ P(X = 0) + P(X = 1) + P(X = 2) + \cdots + P(X = n) = 1 \]
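As a quick check of both the table and the sum-to-1 property, the Python sketch below rebuilds the distribution for 10 tosses of a fair coin. It is a self-contained illustration; the names table, total, and middle are chosen here and do not come from the text.

```python
from math import comb

n, p = 10, 0.5  # 10 tosses of a fair coin

# Binomial probability of x heads for every possible x from 0 to n.
table = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

for x, prob in table.items():
    print(x, f"{prob:.3f}")

total = sum(table.values())               # should account for the whole sample space
middle = table[4] + table[5] + table[6]   # between 4 and 6 heads
print("sum of all probabilities:", total)
print("P(4 <= X <= 6):", round(middle, 3))
```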
Example 2: A fair, 6-sided die is rolled 8 times. The goal is to roll a 1 or a 2 four times during
the experiment.
a) Is this a binomial experiment?

b) In how many different ways can this event occur?
c) What is the probability of this event?
SOLUTION:
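a) A success is classified as rolling a 1 or a 2. A failure is classified as rolling a 3, 4, 5, or 6.
Thus, \(p = \tfrac{2}{6} = \tfrac{1}{3}\). There are \(n = 8\) trials and the probability of a success
is always \(\tfrac{1}{3}\), since the 8 outcomes are independent. Thus, this is indeed a binomial
experiment.
b) It is possible to have four successes occur in \(\binom{8}{4} = 70\) different ways.
c) Let \(X\) be the number of successes. Then \(X \in \{0, 1, 2, \ldots, 8\}\).
\[ P(X = 4) = \binom{8}{4} \left(\tfrac{1}{3}\right)^4 \left(1 - \tfrac{1}{3}\right)^{8-4} \]
\[ = \binom{8}{4} \left(\tfrac{1}{3}\right)^4 \left(\tfrac{2}{3}\right)^4 \approx 0.171 \]
There is about a 17% chance of getting a 1 or 2 on four out of 8 die rolls.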
A question that follows from Example 2 is: what does the distribution look like? Let's develop
the distribution in tabular form first. To do this, we calculate binomial probabilities for each of
the 9 possible outcomes (anywhere between 0 and 8 successes possible).
Successes Probability
0 0.039
1 0.156
2 0.273
3 0.273
4 0.171
5 0.068
6 0.017
7 0.002
8 0.000
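The table can be reproduced with exact fractions, which also makes it easy to see why 2 and 3 successes tie for the highest probability. This is a hedged sketch with illustrative variable names, using n = 8 rolls and p = 1/3 from Example 2.

```python
from fractions import Fraction
from math import comb

n, p = 8, Fraction(1, 3)  # 8 die rolls; success means rolling a 1 or a 2

# Exact binomial probability of x successes for x = 0, 1, ..., 8.
table = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

for x, prob in table.items():
    print(x, prob, f"{float(prob):.3f}")

highest = max(table.values())
most_likely = [x for x, prob in table.items() if prob == highest]
print("most likely number(s) of successes:", most_likely)
```

Both 2 and 3 successes have probability 1792/6561, so the list of most likely values contains both.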

We see clearly that the number of successes with the highest probability is 2 or 3. The histogram
follows:
[Probability histogram: Rolling a 1 or 2 in 8 Die Rolls. Horizontal axis: Successes; vertical
axis: Probability (0.000 to 0.300).]
Notice that this distribution is not symmetric. It is said to be skewed to the right, since
the distribution has its probabilities heavily concentrated towards the left and so has a tail to the
right (hence the name).
Distribution Types
There are three single-peaked (called unimodal) distributions, as illustrated below:
4.1.3 Expected Value

Expected Value of a Binomial Random Variable
It can be shown that the expected value of \(X\), or the average number of successes we expect to
see, given that \(X\) is a binomial random variable with \(n\) trials and success probability \(p\), is:
\[ E[X] = np \]
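The shortcut can be compared against the long way of computing an expected value, namely summing each number of successes times its binomial probability. The Python sketch below does this for a few (n, p) pairs that appear in this section; the function name binomial_mean_by_sum is an illustrative choice.

```python
from math import comb, isclose

def binomial_mean_by_sum(n, p):
    """Expected number of successes, computed as the sum of x * P(X = x)."""
    return sum(x * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))

# Compare the long sum with the shortcut n * p.
for n, p in [(10, 0.5), (8, 1 / 3), (10, 0.40)]:
    long_way = binomial_mean_by_sum(n, p)
    shortcut = n * p
    print(n, round(p, 3), round(long_way, 6), round(shortcut, 6), isclose(long_way, shortcut))
```

This is a numerical check for particular cases, not a proof of the formula.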
Example 3: Pristine Air Conditioning uses a digital phonebook to call homeowners in a large
city regarding a $55.99 A/C maintenance special. In an hour, a telemarketer can make about
10 calls. If the probability that a randomly called homeowner signs up for the maintenance
special is 0.40,
a. What is the probability that a telemarketer gets at least 80% of his hourly customers
to sign up?
b. Represent this probability in a histogram.
c. Find and explain the expected value of the random variable.
SOLUTION:
a) We first need to determine whether or not this is a binomial experiment. Since the
probability of success is 0.40 on every one of 10 trials and we assume that the population is
large enough that removing one potential customer from the callable pool does not
significantly change the probability of success, we conclude
that this is a binomial experiment. Thus, \(X\) = the number of called homeowners that
accept the offer.
We want to know the probability of getting business from 8, 9, or all 10 of the called
individuals. We want:
\[ P(X \ge 8) = P(X = 8) + P(X = 9) + P(X = 10) \]
because each of these accounts for disjoint pieces of the sample space.
With \(n = 10\) and \(p = 0.40\), we have:
\[ \binom{10}{8}(0.40)^8(0.60)^2 + \binom{10}{9}(0.40)^9(0.60)^1 + \binom{10}{10}(0.40)^{10}(0.60)^0 \approx 0.0123 \]
Thus, there is only about a 1.23% chance that the A/C company gets the business of 80%
or more of the homeowners called.
b) The histogram is below. The probability we are looking at is the sum of probabilities after
7 successes:

c) The expected value is \(E[X] = 10(0.40) = 4\). Thus, we expect that each hour 4 out of 10
homeowners accept the maintenance offer.
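Parts (a) and (c) of this example can be verified with a short Python sketch. It assumes, as the solution does, that the 10 calls form a binomial experiment with p = 0.40; the helper name binomial_pdf is illustrative.

```python
from math import comb

n, p = 10, 0.40  # 10 calls per hour, 40% chance each homeowner signs up

def binomial_pdf(x):
    """Probability that exactly x of the n called homeowners sign up."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# "At least 80%" of 10 calls means 8, 9, or 10 sign-ups; the events are disjoint, so add.
p_at_least_8 = sum(binomial_pdf(x) for x in (8, 9, 10))
expected_signups = n * p

print(f"P(X >= 8) = {p_at_least_8:.4f}")
print("E[X] =", expected_signups)
```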
Homework Problems –4.1
1. Determine whether or not each of the following experiments represents a binomial

experiment. (Video Solution)
a. A die is rolled 20 times and the number of 6‟s is counted.
b. A die is rolled until ten 6‟s show up.
c. In a stream with 1,500 fish, 700 are Rainbow Trout. A total of 20 fish are caught
and the number of Rainbow Trout is counted.
d. About 10% of the U.S. population is suspected to have a form of bacteria. A
sample of 100 people is drawn from the population and the number of people with
the strain of bacteria is counted.
e. A brand of LED light bulb has a 0.5% chance of going out prior to the advertised
life of 30,000 hours. In the testing phase, 850 bulbs are sampled for quality
assurance. The number of bulbs that don‟t die prior to the 30,000 hour life is
counted.
2. Suppose the outcome of random variable is conducted with trials each with
independent probability of success, . (Video Solution)
a. Is this a binomial experiment?
b. What is the probability that
c. What is the probability that
d. What is the probability that
e. What is the probability that

f. What is the probability that
g. What is \(E[X]\)? Does it coincide with the resulting \(x\) that has the highest
probability?
3. In preparing for a New Year's Eve celebration, police look at past records for arrests due
to driving under the influence (DUI). In the U.S., 10.5% of arrests made are for DUI
(SOURCE: U.S. Statistical Abstract, Table 324). If it is expected that each police officer
makes 10 arrests, what is the probability that all 10 arrests are for DUIs? (Video Solution)
4. Pancreatic cancer is a vicious killer. The 5-year survival rate between 2001 and 2007 was
only 5.9%, meaning that the majority of people with pancreatic cancer die within 5 years
of contracting the cancer. In a group of 25 patients, 5 survive beyond 5 years. How likely is
such an event? Assume that the survival of one person is independent of another person.
(SOURCE: U.S. Statistical Abstract, Table 182). (Video Solution)
5. A new herbal drink blend is being compared to an older blend via a blind taste-test
comparison. Four judges will taste each of the two drinks and will state their preference.
It is anticipated that both blends are equally impressive. (Video Solution)
a. Find the probability distribution for the number of judges that vote in favor of the
new blend.
b. Construct a probability histogram.
c. What is the probability that at least two of the judges prefer the new blend?
d. What is the expected value of this distribution and what is its real-world meaning?
6. Goranson and Hall (1980) explain that the probability of detecting a crack in an airplane
wing is the product of \(p_1\), the probability of inspecting a plane with a wing crack; \(p_2\), the
probability of inspecting the detail in which the crack is located; and \(p_3\), the probability
of detecting the damage. (Problem Source: Mathematical Statistics with Applications, 6th
Ed., Wackerly, et. al.) (Video Solution)
a. What assumptions justify the multiplication of these probabilities?
b. Suppose and for a certain fleet of planes. If three planes
are inspected from this fleet, find the probability that a wing crack will be
detected on at least one of them.
c. Find the probability distribution for the number of planes in this fleet with
detected wing cracks.
d. Construct a probability histogram.
e. What is the expected value of this distribution and what is its real-world meaning?

Chapter 5
Continuous Probability Distributions
Up until this point, we have only considered distribution that have discrete values – non-negative
integers. There are many variables, however, that are continuous in nature. In fact, almost every
variable you studied in algebra and calculus was continuous!
Take, for example, heights of NBA basketball players, hourly wage, response time of a database
server, temperature, depth of a lake, the value of a share of Intel stock, and the lifespan of a car
engine, to name just a very few. These are all variables that can take on infinitely many values,
even within a limited range. For example, the response time of a database could be anywhere between
0 seconds and 1 second. It could be 0.01 seconds, 0.00001 seconds, or 0.98727495 seconds.
5.1 The Ideas Behind the Continuous Distribution
5.1.1 Conceptual Approach to Continuous Distributions
Think back to a discrete distribution. The probability of a particular value was found by
observing the height of the relative frequency bar. While relative frequency represents the
percentage of observations found to have the value specified, it can also be thought of as a
probability, if we feel that it accurately models predictions that we might use it for. Consider the
example below showing the number of children in a classroom of 30 that are likely to
have the flu.
[Figure: bar chart "Number of Children with Flu in a Class". Horizontal axis: Number of Children w/Flu (0 to 5). Vertical axis: Probability (Relative Frequency).]

For instance, we see that the probability that exactly 2 children in a classroom have the flu is 0.2.
Let's call this random variable $X$ = # of children in a classroom of 30 that have the flu.
Then, we will write the probability that exactly 2 children have the flu as:
$P(X = 2)$
This reads, “the probability that the number of children that have the flu is 2.”
The output of this statement is:
$P(X = 2) = 0.2$
What would it mean to ask: What is $P(X \le 2)$?
This is asking us to find the probability that 2 or fewer children have the flu. In other words,
what is the probability that 0, 1, or 2 children have the flu? To answer this, we simply add the bar
heights corresponding to $X = 0, 1, 2$:
$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.74$
Thus, there is a 74% chance that 2 or fewer children in a class of 30 children have the flu.
With continuous distributions, we cannot simply read the “height of the bar!” For instance,
consider the following continuous probability distribution that shows the likelihood of various
wait times in line at a fast-food restaurant:
[Figure: "Time Spent Waiting in Line". Horizontal axis: Minutes (0 to 5). Vertical axis: Probability (0 to 0.25). A horizontal line at height 0.2 spans 0 to 5 minutes.]

In this case, $X$ = minutes spent waiting in line is a continuous random variable. The reason is
that a person doesn't wait a whole number of minutes! It is perfectly okay for a person to wait
1.42 minutes, for example.
In this example, suppose we wish to find $P(X = 2.5)$, that is, the probability that the wait time is
exactly 2-and-a-half minutes. At first glance, we might simply decide to locate 2.5 minutes and read off
the output. We would find:
$P(X = 2.5) = 0.2$
If this were the case, wouldn't all wait times have a probability of 0.2, based
on the graph? This, however, would be a logical pitfall: if there are infinitely many
different wait times between 0 and 5 minutes, then the sum of all probabilities would be a sum of
infinitely many 0.2's, which is far more than the largest possible total of 1. In other words, it is
only possible for the wait times to have individual
probabilities of 0.2 if the times were discrete. When we deal with continuous random variables,
we should actually consider the vertical axis to be density instead of probability. In and of itself,
density is not a meaningful value; however, in conjunction with what we mention next, it will
prove to be useful.
Without going into too much detail, an interval of densities is designed in such a way that the
area under the function is 1, or 100%. Let‟s reconsider the above graph:

[Figure: "Time Spent Waiting in Line", with the vertical axis now labeled Density (0 to 0.25). Horizontal axis: Minutes (0 to 5).]
We notice that $0 \le X \le 5$. The region underneath the blue line is rectangular.
To find the area of a rectangle, we must simply take
$\text{length} \times \text{width} = 5 \times 0.2 = 1$
And so we are able to confirm that $P(0 \le X \le 5) = 1$, which represents all possible wait times this particular
store has experienced.
As you might guess, if we wish to find the probability of a range of values, we would simply find
the area under the density between those two values of time.

One question does remain, however: what is the probability that the wait time is exactly
2.5 minutes?
The answer might not come as too much of a surprise: the probability is 0!
The probability of a single value in a continuous distribution is 0, since there are infinitely many
possible values. Thus, 2.5 represents 1 of infinitely many values. Informally, take $1/\infty$ and you get 0!
We can only find the probability of a non-zero range of values for a continuous random variable!
Continuous Random Variables
A continuous random variable is a random variable that has infinitely many possible values
within a range of real numbers.
As a result, the probability that a continuous random variable takes on any one specific value is
0.
Probability Density Function (PDF)
The PDF of a continuous random variable is a non-negative function such that the total area
between the function and the horizontal axis is 1. The function's input values are the values of
the random variable, while the output values are densities. Densities are individually meaningless
values designed so that the total area equals 1.
Reconsider the above wait-times example:
[Figure: "Time Spent Waiting in Line". Horizontal axis: Minutes (0 to 5). Vertical axis: Density (0 to 0.25).]

Suppose we wish to find $P(2.5 \le X \le 3.5)$, that is, the probability that the waiting time is
between 2.5 and 3.5 minutes. To find this, we simply find the area under the PDF between 2.5
and 3.5 minutes.
The area of the rectangular region is:
$(3.5 - 2.5) \times 0.2 = 0.2$
Thus,
$P(2.5 \le X \le 3.5) = 0.2$
We can expect to wait between 2.5 and 3.5 minutes with a 20% chance. Thus, in approximately one
in five visits, our wait time will be somewhere within this interval.
Similarly, suppose we wish to know:
$P(0.3 \le X \le 4.4)$
This is the probability that the wait time is between 0.3 and 4.4 minutes. The region is the
rectangle under the PDF between these two times.
The area of this region is:
$(4.4 - 0.3) \times 0.2 = 0.82$
Thus, there is an 82% chance that the wait time is between 0.3 and 4.4 minutes.
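The rectangle-area reasoning above can be sketched in a few lines of Python. This is only an illustration: the function name `flat_density_probability` and its parameters are made up for this sketch, and it assumes the flat density of 0.2 on the interval from 0 to 5 minutes used in the wait-time example.

```python
def flat_density_probability(lower, upper, start=0.0, end=5.0):
    """Probability that a wait time falls between lower and upper,
    assuming a flat (constant) density on [start, end]."""
    density = 1.0 / (end - start)      # height of the rectangle; 0.2 for [0, 5]
    lower = max(lower, start)          # ignore anything outside [start, end]
    upper = min(upper, end)
    width = max(0.0, upper - lower)    # a single point has width 0
    return width * density             # area = width x height


print(round(flat_density_probability(2.5, 3.5), 2))   # 0.2
print(round(flat_density_probability(0.3, 4.4), 2))   # 0.82
print(flat_density_probability(2.5, 2.5))             # 0.0
print(round(flat_density_probability(0.0, 5.0), 2))   # 1.0
```

The third call shows why a single value has probability 0: the "rectangle" at exactly 2.5 minutes has no width, so it has no area.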
5.1.2 Uniform Distribution
Continuous Uniform Distribution
When the PDF of a random variable is a constant, we call this a uniform distribution. That is,
values of the random variable are uniformly distributed.
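The PDF of a random variable, $X$, whose values are in the interval $[a, b]$ is:
$f(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$
The expected value of this random variable is:
$E[X] = \dfrac{a + b}{2}$
The variance of this random variable is:
$Var[X] = \dfrac{(b - a)^2}{12}$
Resulting in a standard deviation of:
$SD[X] = \dfrac{b - a}{\sqrt{12}}$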
Example 1: The amount of revenue that a farmers market generates on a given Saturday is
uniformly distributed between $5,000 and $22,000.
a. Find the PDF for this random variable.

b. Find the probability that between $6,000 and $8,000 is generated.
c. Find the expected value of this random variable and explain its real-world
meaning.
d. Find the standard deviation of this random variable and explain its real-world
meaning.
SOLUTION:
a. The lower limit is $a = 5000$ and the upper limit is $b = 22000$. Thus,
$f(x) = \dfrac{1}{22000 - 5000} = \dfrac{1}{17000} \approx 0.0000588$
This constant function is only valid for values between 5000 and 22000. It is valued as
0 everywhere else.
[Figure: "Revenue PDF". Horizontal axis: Revenue ($), from 5000 to 22000. Vertical axis: Density (0 to 0.00007).]
b. We want $P(6000 \le X \le 8000)$. The probability will be the length times the width.
We get:
$P(6000 \le X \le 8000) = (8000 - 6000) \times \dfrac{1}{17000} \approx 0.1176$
There is about a 12% chance that revenue earned will fall between $6,000 and $8,000.
c. The expected value will be:
$E[X] = \dfrac{5000 + 22000}{2} = 13500$
This is a simple average. Thus, on average, the farmers market will make $13,500 on a
given Saturday.
d. The standard deviation will be:
$SD[X] = \dfrac{22000 - 5000}{\sqrt{12}} \approx 4907.48$
On average, revenue will vary by about $4,907 less or more than the mean.
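As a quick check of the arithmetic in this example, here is a minimal Python sketch of the uniform distribution formulas. The helper names `uniform_summary` and `uniform_probability` are hypothetical, chosen only for this illustration; the limits 5000 and 22000 come from the example.

```python
import math


def uniform_summary(a, b):
    """Density, expected value, and standard deviation of a uniform distribution on [a, b]."""
    density = 1.0 / (b - a)
    mean = (a + b) / 2.0
    std_dev = (b - a) / math.sqrt(12)
    return density, mean, std_dev


def uniform_probability(lower, upper, a, b):
    """Area under the flat density between lower and upper (assumes a <= lower <= upper <= b)."""
    return (upper - lower) / (b - a)


density, mean, std_dev = uniform_summary(5000, 22000)
print(f"{density:.7f}")                                         # 0.0000588
print(f"{uniform_probability(6000, 8000, 5000, 22000):.4f}")    # 0.1176
print(mean)                                                     # 13500.0
print(f"{std_dev:.2f}")                                         # 4907.48
```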
5.1.3 Other Distributions

Without going into detail here, continuous random variables have PDFs with area between the
function and the horizontal axis equal to 1. Clearly, densities cannot be negative, as it is not
possible to have negative probabilities.
As an example, a distribution might look like this:
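[Figure: triangular density curve that rises from 0 to a peak at 1 and falls back to 0 at 2. Horizontal axis: Random Variable Values (0 to 2). Vertical axis: Density (0 to 1.2).]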
Practically speaking, it appears to be most probable that the random variable will take on a value
around 1. It is less likely that the random variable will take on values close to 0 or close to 2.
This might be handy in situations where such behavior is desired.
Notice that the area is also 1. If you divide the triangle into 2 and use the area of a triangle
formula ($A = \frac{1}{2}bh$), each half has a base of 1 and a height of 1:
$A = \frac{1}{2}(1)(1) = 0.5$
Then the sum of the two triangular areas is:
$0.5 + 0.5 = 1$
In the next section, we will focus our attention on the most commonly used continuous random
variable: the normally distributed random variable.
Homework Problems – 5.1
The first two questions below involve discrete random variables. The aim of these questions is to
get you thinking in terms of the probabilities of ranges of values.
1. A pizza shop sells pizzas in four different sizes. The 1000 most recent orders for a single
pizza gave the following proportions for the various sizes:
With $X$ denoting the size of a pizza in a single-pizza order, the given table is an
approximation to the population distribution of $X$.
a. Construct a probability (relative frequency) histogram to represent the

approximate distribution of this variable.
b. Approximate ( ).
c. Approximate ( ).
d. Find the expected value of $X$. What does this value mean?
e. What is the approximate probability that $X$ is within 2 in. of this expected (mean)
value?
2. Airlines sometimes overbook flights. Suppose that for a plane with 100 seats, an airline
takes 110 reservations. Define the variable $X$ as the number of people who actually show
up for a sold-out flight. From past experience, the population distribution of $X$ is given in
the following table:

a. What is the probability that the airline can accommodate everyone who shows up
for the flight?
b. What is the probability that not all passengers can be accommodated?
3. A particular professor never dismisses class early. Let $X$ denote the amount of time past
the hour (in minutes) that elapses before the professor dismisses class. Suppose that the
density curve shown in the following figure is an appropriate model for the probability
distribution of $X$:
[Figure: density curve. Vertical axis ticks: 0.05 to 0.20. Horizontal axis ticks: 2 to 10 (minutes).]
a. Find the probability density function (PDF) for this random variable.
b. What is the probability that at most 5 minutes elapse before dismissal?
c. Find ( ). Explain what your answer means.
d. Find the expected value of this distribution and explain its real-world meaning.
e. Find the standard deviation of this distribution and explain its real-world meaning.
f. What is the probability that the instructor lets out class within one standard deviation
of the average overtime?
4. A delivery service charges a special rate for any package that weighs less than 1 lb. Let
$X$ denote the weight of a randomly selected parcel that qualifies for this special rate. The
probability distribution of $X$ is specified by the following density curve:
[Figure: density curve labeled "Density = 0.5 + x". Vertical axis ticks: 0.5 to 1.5. Horizontal axis ticks: 0.0 to 1.2.]

Use the fact that the figure can be broken up into the area of a rectangle and the area of a
triangle, where area of a triangle = $\frac{1}{2}(\text{base})(\text{height})$ and the area of a rectangle =
$(\text{base})(\text{height})$.
a. What is the probability that a randomly selected package of this type weighs at
most 0.5 lb.?
b. What is the probability that a randomly selected package of this type weighs
between 0.25 lb. and 0.5 lb.?
c. What is the probability that a randomly selected package of this type weighs at
least 0.75 lb.?
d. The probability is defined on the interval $0 \le X \le 1$. Verify that the area under
the curve in this region is 1.
5. A plumbing service is able to respond to off-site emergency calls uniformly between 15
and 45 minutes.
a. Find the PDF for this random variable, $X$.
b. Find ( )
c. Find ( )
d. Why are both of the above probabilities the same?
e. Find ( ).
f. Find and interpret the real-world meaning of the expected value.
g. Find and interpret the real-world meaning of the standard deviation.
h. What is the probability that the service responds within 1.5 standard deviations of
the expected time?

5.2 The Normal Distribution
5.2.1 The Normal Distribution as a Natural Phenomenon
The normal distribution (pictured above), much like the uniform distribution, is a continuous
distribution. In fact, this distribution is defined for all real numbers. The curve runs from $-\infty$ to
$\infty$. However, as you might observe, the most likely values occur close to where the density
function peaks. Values that occur in either one of the “tails” are highly unlikely and, as it
appears, the density function is very close to the horizontal axis as it extends farther to the left
and to the right.
Why do we use this distribution? Much like the infamous $\pi$ appears in many natural places,
many random variables tend to be normally distributed. That is to say, the bulk of values tend to
occur near the mean and median (both of which are located directly in the center of the
distribution, since it is perfectly symmetric). For instance, heights of individuals in the United
States (roughly) follow a normal distribution – there are many people whose heights are near
average. There are fewer extremely short and extremely tall people in the United States. Thus,
we would say that the bulk of people are “normal” with respect to their heights.
While certainly not all random variables are normally distributed, many are. Weights, IQ, new-
vehicle gas mileages (to name just a very few) are variables that have been known to follow a
normal distribution. As we will later see, any distribution can “become” a normal distribution.
This is a beautiful phenomenon that allows us to make some important conclusions (more on this
idea in a later section).
As before, the overall area under the normal curve is 1 (50% on either side of the mean/median,
as in the image). To find the area, we would need to use some rather unusual shapes in order to
apply the same methodology as before. The idea of an integral in calculus would actually allow
us to find the area exactly; however, the normal curve is modeled by the following PDF:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

As you can see, this is a difficult function to work with. Historically, tables have been developed
with calculated areas, as the calculus was once quite difficult to do. In order to use such a table, it was
often necessary to first convert the desired range of values to $z$-scores. Since every normal
distribution has a different mean and standard deviation, it would be impossible to create a table
for every possible combination. Instead, since each normal distribution is of the same shape, it
made sense to create just one table that represented a mean of $\mu = 0$ and a standard deviation of
$\sigma = 1$. That is, we can think about every distribution in terms of the number of standard deviations each
score is from the mean. The mean is 0 standard deviations away from the mean (it is the mean!)
and each unit represents 1 standard deviation. We can think about any normal distribution this way!
Normal Distribution Expected Value and Variance
A normal probability distribution can be modeled by the function
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
where the
expected value is $\mu$, defined as a standard mean,
and the variance is $\sigma^2$, defined as a standard variance,
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
IMPORTANT NOTE: $\mu$ and $\sigma^2$ represent the population mean and variance. $N$ represents the
population size. Recall that the sample variance has a divisor of $n - 1$, so that it is an unbiased
estimator of the population variance.
Below is an example of what a typical table would look like. We call this a standard normal
table, since it requires that the values between which we would like to know areas are
“standardized.” This means they are converted to $z$ scores prior to using the table:

As we notice, this table only shows positive $z$ scores. A similar table exists for negative $z$ scores,
that is, for values that are less than the mean. The image tells us that each of the entries in the
center of the table corresponds to the area to the left of the $z$ score we would look up.
1. In an Arizona town, suppose the heights of adult males is such that inches and
(so the standard deviation is the square root of this value, ). What is the
probability that a male is shorter than 72 inches (6 feet tall)?
SOLUTION: We wish to find $P(X < 72)$, where $X \sim N(\mu, \sigma^2)$. The normal

distribution would look like the following:

We wish to know the area of the shaded region below:
We first convert the value of 72 to a $z$ score:
$z = \dfrac{x - \mu}{\sigma} = \dfrac{72 - \mu}{\sigma} \approx 1.14$
We round to two decimal places, since the standard normal table can handle up to two decimal
places. Any additional decimal places would not make a substantial difference.
We locate $z = 1.14$ by first locating 1.1 along the rows and 0.04 along the columns (since 1.1 +
0.04 = 1.14).

The value we find is 0.8729. This means that $P(X < 72) = 0.8729$. There is an 87.29% chance
that a randomly selected individual will be less than 72 inches in height.
What if we wanted to know an area to the right, such as $P(X > 72)$? The table does not provide
these values. However, if we know that $P(X < 72) = 0.8729$, then the probability of a height
greater than 72 must be the remaining area, $1 - 0.8729 = 0.1271$.
Similarly, if we wish to find the area between two points, we must get creative.
Suppose we wish to know the probability that a height falls between two values, $P(a < X < b)$. We first
need to convert both endpoints to $z$ scores:
$z = 0.57$ and $z = 1.00$
We can easily find that the probability of a $z$ score less than 0.57 is: 0.7157
The probability of a $z$ score less than 1.00 is: 0.8413
The area between them is the difference in their areas:
$0.8413 - 0.7157 = 0.1256$
As technology progresses, there is much less need for by-hand computations of the sort
above. Instead, let us use the web applet from which the above PDFs came:
http://www.rossmanchance.com/applets/NormalCalcs/NormalCalculations.html
As you can see, we enter the mean and standard deviation in the first section. If we would like to
plot two functions over one another, we could check the box and enter a second mean and
standard deviation.

In the second section, we can check up to two boxes, in the event that we would like to find an
area between two points. We can either enter values as z-scores or as raw data values ($x$). To find
the probability of a value greater than the one entered, we click the grey box to select:
The probability of such an event is displayed in the “prob” box. If we have two values entered
and both boxes checked, then the “probability between” these two values is displayed. Isn‟t this
much more intuitive and convenient than using tables?
NOTE: One limitation of the above applet is that values rounded to two decimal places require
a bit of finagling.
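For readers who prefer code to a table or applet, the lookups in this section can also be sketched with Python's standard library. This is an illustrative alternative, not the method used in the text: `statistics.NormalDist` (Python 3.8 and later) plays the role of the standard normal table, and the helper names `area_left`, `area_right`, and `area_between` are made up for this sketch. The $z$ scores 1.14, 0.57, and 1.00 are the ones used in the example above.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_left(z):
    """Area under the standard normal curve to the left of z (what the table reports)."""
    return standard_normal.cdf(z)


def area_right(z):
    """Area to the right of z: the remaining area, 1 minus the area to the left."""
    return 1.0 - standard_normal.cdf(z)


def area_between(z_low, z_high):
    """Area between two z scores: the difference of the two left-hand areas."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


print(round(area_left(1.14), 4))           # 0.8729
print(round(area_right(1.14), 4))          # 0.1271
print(round(area_between(0.57, 1.00), 4))  # 0.1257
```

Because the code does not round each area to four decimal places before subtracting, the last result can differ from a table-based answer in the final digit.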
Homework Problems – 5.2
Use the applet mentioned in this section to complete these exercises. You are not required to use
the standard normal table.
1. In the United States, IQ‟s are normally distributed with and .

a. What is the probability that a person has an IQ lower than 130?
b. What is the probability that a person has an IQ between 80 and 110?
c. What is the probability that a person has an IQ between 50 and 70?
d. What is the probability that a person has an IQ above 120?
2. In the UK, birth weights are approximately normally distributed with lbs. and
lbs. (SOURCE: http://www.healthknowledge.org.uk).
a. Find and explain the real-world meaning of ( ).
b. Find and explain the real-world meaning of ( ).
c. Find and explain the real-world meaning of ( ).
d. Find and explain the real-world meaning of ( ).
e. What weight is such that 20% of infants weigh less than this amount? (HINT:
You can still use the calculator applet.)
3. In recent years, Scholastic Aptitude Test (SAT) scores for all college-bound seniors in
the United States were such that points and points (SOURCE:
http://www.collegeboard.com).
a. 50% of students scored less than how many points?
b. 50% of students scored more than how many points?
c. In order to be in the top 10% of SAT-takers, what score would one have to
achieve?
d. Below what score do the lowest 10% of students fall?
e. The middle 50% of students scored between what two values?
4. Sketch a normal distribution and . Label the mean, $\pm 1$ standard deviations,
$\pm 2$ standard deviations, and $\pm 3$ standard deviations.

a. Determine the probability that an observation falls within each of these standard
deviation ranges.
b. The Empirical Rule describes the probability of scores within 1, 2, and 3 standard
deviations of the mean. Do a web search on this topic and compare it to your
answer in the above part. Are the results the same?
5. Suppose a distribution is such that and .

a. What would happen to the distribution if $\mu$ was changed to 60?
b. What would happen to the distribution if $\sigma$ was changed to 10? There are two
effects to describe. Discuss why it makes practical sense that these two things
should happen to the curve.
c. What would happen to the distribution if $\sigma$ was changed to 2? There are two
effects to describe. Discuss why it makes practical sense that these two things
should happen to the curve.
d. Describe the effects, in general, of $\mu$ and $\sigma$ on the shape and location of a normal
distribution.

Chapter 6
Sampling Distributions and Estimation
When it is only our dataset that is of interest, we use descriptive statistics. This is precisely the
trouble we have been up to so far! Oftentimes, however, we cannot collect all elements in the
population. Take, for example, a poll to gauge Americans‟ opinion of a candidate in office.
Certainly, you cannot sample all voting-age adults. This is easily resolved with a manageable
random sample, but is further complicated by the following idea: sampling variability!
We will work to answer the following question:
How do we estimate true population parameters using a random sample, all the while taking into
account the fact that our sample statistic is variable from sample-to-sample?
This is the purpose of inferential statistics and is a very important aspect of understanding the
structure of an underlying population. With many advances in statistics, it is possible to make
precise claims about our population.
6.1 Sampling Distribution for $\bar{x}$
6.1.1 What is a Sampling Distribution?
The cold, hard truth is that, when working with statistical inference, we likely have no idea what
the underlying probability distribution for the population looks like. If we did, then we wouldn‟t
have to draw a random sample and would be nearly done with this course. Since we don‟t, we
can‟t in good conscience assume that the distribution is normal. So, why spend time studying
such a distribution? We will soon experience why.
Let‟s start with an example that is concrete.
Suppose we roll a die. Without too much effort, we can produce the probability distribution for
the population of all possible outcomes. Here it is:

[Figure: "Probability Distribution for Single Die Roll". Horizontal axis: Die Value (1 to 6). Vertical axis: Probability (0.00 to 0.18).]
In words, the probability of getting any one face value on a die roll is about 0.17 or 1/6. The
distribution is uniform.
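If we found the expected value (the average), we would get:
$E[X] = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = 3.5$
(NOTE: This is the same as $\frac{1 + 2 + 3 + 4 + 5 + 6}{6}$ since each event is equally likely)
The variance of this population requires us to use the population variance formula
(remember, division by $n - 1$ occurs if we are dealing with a sample, so that we have an
unbiased estimate for the population variance). That is:
$Var[X] = \sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$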
Using Excel, with the die values 1 through 6 entered in cells A2:A7, we find that:
=VAR.P(A2:A7) which gives: 2.916666667

Thus, the standard deviation would be $\sqrt{2.9167} \approx 1.708$, meaning that, on average, we would
expect the die value to deviate by 1.708, or nearly 2 units, from the average (1.5 to 5.5, which is
pretty much 1 to 6).
Thus, we have that:
$\mu = 3.5$ and $\sigma \approx 1.708$
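The same population values can be reproduced outside of Excel. The short Python sketch below is an illustrative alternative, not the book's method: `pvariance` and `pstdev` from the standard library divide by the population size, just as VAR.P does.

```python
from statistics import fmean, pvariance, pstdev

faces = [1, 2, 3, 4, 5, 6]           # the whole population of die outcomes

mu = fmean(faces)                    # population mean
variance = pvariance(faces, mu)      # population variance (divides by N, like VAR.P)
sigma = pstdev(faces, mu)            # population standard deviation

print(mu)                  # 3.5
print(round(variance, 4))  # 2.9167
print(round(sigma, 3))     # 1.708
```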
In reality, keep in mind that we would often not know much about our population. We get the
luxury of studying something we can fully explain. This is all in an effort to better understand
sampling distributions.
Suppose we conducted an experiment of rolling the die 10 times. For one random sequence, we
might obtain the following result:
4 6
3 4
4 1
3 4
1 2
Not surprisingly, we get a spread of values from 1 to 6. If we compute the average,
we obtain 3.2. That is, if every roll had come up as the same number while keeping the same total, each roll would be 3.2.
Suppose we asked 19 other people to roll a die 10 times and to then report back to us the mean.
Here is what we might find (based on a computer simulation of rolls):

20 Means
3.1
3.3
2.4
3.5
2.7
2.9
2.9
3.6
3
4.7
3.6
3.2
3.9
2.8
3.2
3.3
3.9
3.3
3.5
3.1
First off, we notice there is sampling variability. Not every person obtained the same average
outcome from 10 tosses each. This is expected, since the process is a random one.
The distribution of these means is called a sampling distribution.
Sampling Distribution
The distribution of sample statistics (such as $\bar{x}$) computed from repeated sampling is called a
sampling distribution.
6.1.2 The Central Limit Theorem
We do notice that the means tend to gravitate towards 3.5. Some, as expected, deviate from this
value.
Let us now consider a histogram for this sampling distribution of sample means:

[Figure: histogram "Sampling Distribution of x-bar" for the 20 sample means, with bins of width 0.25 running from 2.4 to 5.15.]
This is quite interesting… we have obtained a distribution (of means) that appears somewhat
bell-shaped.
Suppose now that we had a total of 1000 people roll a die 10 times each and then compute the
sample mean. Here is what a simulation of this process would look like:

[Figure: histogram of the 1000 sample means from 10 rolls each, with bins of width 0.1 running from 1.7 to 5.2. Vertical axis: frequency (0 to 100).]
Wow! Our distribution of means for 1000 individuals for experiments of 10 rolls each produces
something remarkably like a normal distribution. Additionally, it appears that the mean of this
distribution is around 3.5!
Let‟s try this again, but now, let‟s say that 1000 individuals each roll a die 20 times, and each
individual computes a sample mean. This simulated event would produce the following
distribution of die-roll average:

[Figure: histogram of the 1000 sample means from 20 rolls each, with bins of width 0.1 running from 2.2 to 4.7. Vertical axis: frequency (up to 120).]
The distribution looks a bit more normal. Upon closer inspection, we also see that the variability
of these averages is smaller. That is:
Approximate Range for Means of 10 Tosses: 2.1 to 5.2

Approximate Range for Means of 20 Tosses: 2.5 to 4.6
We notice that increasing the sample size ($n$) has decreased the sampling distribution's
variability.
In fact, the standard deviation for the distribution of means computed from 10 and 20 tosses is
about 0.52 and 0.38, respectively.
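The simulated experiments described above can be imitated with a short Python sketch. This is not the simulation used to produce the histograms in this section: the function name `simulate_sample_means` and the seed value are assumptions made for illustration, so the exact numbers will differ a little from the 0.52 and 0.38 reported above.

```python
import random
from statistics import fmean, pstdev


def simulate_sample_means(rolls_per_person, people=1000, seed=2012):
    """Each simulated person rolls a fair die rolls_per_person times
    and reports the mean of those rolls."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    means = []
    for _ in range(people):
        rolls = [rng.randint(1, 6) for _ in range(rolls_per_person)]
        means.append(fmean(rolls))
    return means


for n in (10, 20):
    means = simulate_sample_means(n)
    # Center and spread of the 1000 simulated sample means
    print(n, round(fmean(means), 2), round(pstdev(means), 2))
```

Running it shows the same pattern as the histograms: the means center near 3.5, and the spread of the means shrinks as the number of rolls per person grows.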
Let‟s do one more experiment. Let‟s say that 1000 individuals each roll a die 30 times, and each
individual computes the mean of his/her rolls. The sampling distribution of means would look
like this (based on simulation):

[Figure: histogram of the 1000 sample means from 30 rolls each, with bins of width 0.1 running from 2.4 to 4.9. Vertical axis: frequency (up to 140).]
Again, we notice the bell-curved shape and the decreased range of means (about 2.6 to 4.4)!
Let‟s summarize:
Distribution                              Type      Mean   Standard Deviation
Original Die Values                       UNIFORM   3.5    1.7
Sampling Distribution of 10-Roll Means    NORMAL    3.5    0.52
Sampling Distribution of 20-Roll Means    NORMAL    3.5    0.38
Sampling Distribution of 30-Roll Means    NORMAL    3.5    0.32
We can very easily see that the expected value of the sampling distribution is the same as $\mu$, the
expected value of the population distribution. That is:
$E[\bar{x}] = \mu$
But what is the relationship between the standard deviations of the means and the standard
deviation of the population of die roll values?
This is not so clear. Statisticians, after much research, found that the standard deviation of each
of the sampling distributions is related to the sample size in the following way:
$SD[\bar{x}] = \dfrac{\sigma}{\sqrt{n}}$
For example, for our sample of size 10,
$SD[\bar{x}] = \dfrac{1.708}{\sqrt{10}} \approx 0.54$
That is very close to the 0.52 we obtained!
Similarly, for our sample of size 20,
$SD[\bar{x}] = \dfrac{1.708}{\sqrt{20}} \approx 0.38$
This one happens to be fairly spot-on!
And finally, for our sample of size 30,
$SD[\bar{x}] = \dfrac{1.708}{\sqrt{30}} \approx 0.31$
This is again very close to our obtained 0.32!
The reason for this difference is simply due to randomness, and estimates can be improved more
(if desired) by increasing the number of “individuals rolling the die.”
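The three calculations above follow one pattern, which a minimal Python sketch makes explicit. The helper name `sd_of_sample_mean` is hypothetical; the population variance 35/12 (about 2.9167) is the die-roll variance found earlier in this section.

```python
import math

sigma = math.sqrt(35 / 12)   # population standard deviation of a fair die, about 1.708


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


for n in (10, 20, 30):
    print(n, round(sd_of_sample_mean(sigma, n), 2))
# 10 0.54
# 20 0.38
# 30 0.31
```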
What we have observed here is formally known as the Central Limit Theorem.
Central Limit Theorem
Regardless of the distribution of a random variable, $X$, if we take repeated random samples from
this distribution of size $n \ge 30$ and compute the mean, $\bar{x}$, for each sample, then the following will
hold:
1.) The distribution of $\bar{x}$ will be approximately normal

2.) $E[\bar{x}] = \mu$
3.) $SD[\bar{x}] = \dfrac{\sigma}{\sqrt{n}}$
(NOTE: A sample size of at least 30 is a rule-of-thumb and can vary slightly depending on the
severity of skews and abnormalities in the distribution. For even severely skewed distributions,
the approximate shape is typically normal.)
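To see the theorem at work on a population that is far from uniform, here is a hedged Python sketch. The skewed population is an assumption made purely for illustration: an exponential distribution with mean 1 and standard deviation 1, which does not appear elsewhere in this book. The function name and seed are likewise hypothetical.

```python
import math
import random
from statistics import fmean, pstdev


def sample_means_from_skewed(n, samples=2000, seed=1):
    """Draw `samples` random samples of size n from a right-skewed
    (exponential, mean 1, standard deviation 1) population and
    return the mean of each sample."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(samples)]


means = sample_means_from_skewed(30)
print(round(fmean(means), 2))        # should land near the population mean, 1
print(round(pstdev(means), 2))       # should land near sigma / sqrt(n)
print(round(1 / math.sqrt(30), 2))   # 0.18
```

Even though individual values from this population are heavily skewed to the right, the simulated sample means cluster around the population mean with a spread close to $\sigma/\sqrt{n}$, as the theorem predicts.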
6.1.3 Why the Central Limit Theorem?

The Central Limit Theorem (CLT) has some very powerful, but subtle results.
First of all, we do not need to understand the shape of the underlying distribution from which we
are sampling. This is an amazing result in and of itself, since we usually have little to no
information about the population itself (again, if we did, we wouldn't be wasting our time with
any of this!).
Secondly, since the resulting sampling distribution is approximately normally distributed, we can
proceed to calculate probabilities using the normal distribution. This is also great, since we
already have the background in that process!
Example 1: After experimentation, researchers believe that the mean lifespan of a strain of
bacteria is days with days. Due to the complexity of the bacteria, the shape
of the distribution of bacteria lifespans is unknown. A sample of 60 bacteria strains is
collected.
a. Does the CLT apply here?
b. Calculate the probability that the sample mean lifespan, $\bar{x}$, is less than 3 days.
SOLUTION:
a. Since the sample size is 60, we should be safe in assuming that the sampling distribution
of all means is approximately normally distributed with mean $\mu$ and standard deviation
$\dfrac{\sigma}{\sqrt{60}}$.
b. We want $P(\bar{x} < 3)$, which we find using our probability calculator.
Given the very small level of variability in the sampling distribution of lifespan means,
we would consider the probability of observing an average smaller than 3 to be practically 0.
6.1.4 Limitations of the CLT

One major oversight of our excitement with this idea is the notion that we would actually know
the true population mean, $\mu$, and the true population standard deviation, $\sigma$. If we have limited
information about our population, then we certainly would not know these values. In the next
parts of this chapter, we will learn how to use our sample to make these predictions about the
population. Though similar in conceptual nature, it is not as straightforward as replacing $\mu$ with $\bar{x}$
and $\sigma$ with $s$.
Homework Problems – 6.1
1. In your own words, what does the Central Limit Theorem tell us?
2. In your own words, why is the Central Limit Theorem a very powerful practical result?
3. A sample of size 36 is taken from a population distribution of unknown shape, though the
mean is believed to be 100 with a standard deviation of 18. What is the probability that
the sample mean is:
a. Greater than 102?
b. Less than 98?
c. Between 95 and 105?
d. Between what two values will the middle 90% of means be?
4. A stained glass company produces panes of glass with a mean thickness of 0.42 inches
and a standard deviation of 0.04 inches, if produced properly. Suppose a random sample
of windows reveals a sample mean of 0.43.
a. What is the probability of this average, or a larger average?
b. Given the probability you have computed, what can be said about recent
production standards?
5. Promote Marketing has a research team to research new marketing tactics to propose to
potential clients. A group of 40 clients have been invited for a conference to be put on by
the marketing firm. The research team usually generates in revenues for
each member of the team with .
a. What will be the shape of the distribution of $\bar{x}$? How do you know?
b. What is the probability that average sales will exceed $420,000 for this particular
event?
c. How would your answer change if 100 clients were to show up?
d. If the team (300 people) have an average revenue that is in the 90th percentile of
revenues, they will earn 4-days of paid vacation. What average sales would be
required for this?
6. A computer simulation reveals that a distribution of average incomes in a sample of 500

has a standard deviation of $130. What is the standard deviation for the population of all
incomes? Interpret the result you get in real-world terms.
7. Use the Excel Sampling Distribution Applet to address this problem. In a population, it is
found that 30% of homes have 5 rooms, 40% have 4 rooms, and 30% have 3 rooms. You
can set this up in our applet by having a “die” with 10 values: three 5‟s, four 4‟s, and
three 3‟s.
a. What is the average number of rooms a home has in this population? What is the
standard deviation in the number of rooms in this population?
b. Now, suppose you take a sample of size 30 from this population. What shape will
the distribution have and how do you know?
c. Take 1,000 random samples each of size and compute the 1,000 sample
means. According to the applet, what is the average of the average rooms in the
sample? What is the standard deviation in the average number of rooms in a
house? Compare these two results to what the Central Limit Theorem says we
should come up with. That is, find $E[\bar{x}]$ and $SD[\bar{x}]$.
d. Take 1,000 random samples each of size and compute the 1,000 sample
e. Take 1,000 random samples each of size and compute the 1,000 sample
f. Why do the values in the population have the highest standard deviation when
compared with the distribution of means in the last three parts?
g. What is the probability that, in a sample of 100 homes, the average number of
rooms is greater than 5?
h. Explain in practical terms why the standard deviation of any $\bar{x}$ distribution
decreases as the sample size increases.
6.2 Confidence Interval for $\bar{x}$
6.2.1 Confidence Interval for $\bar{x}$ Using Sampling Distributions
As discussed previously, our ultimate goal is to make inferences about the population parameter
$\mu$. Again, keep in mind that this is the only reason why we are spending time on this! Otherwise,
we would have completed our semester early!
When we generate our sampling distribution for $\bar{x}$ we see very vividly that our sample means are
subject to sampling variability, depending on which “die values” are “rolled” for each individual
sample of size $n$. Thus, we should be very skeptical of concluding that any single $\bar{x}$ is representative
of the true population mean. However, if we have many, many “individuals roll the die,” we
should get a fairly reasonable understanding of a range of values for the true value of $\mu$. Let's
consider an example.
Suppose we want to better understand a population of ages of people in a town.

1 1 18 22 25 27 30 18 21 2
3 19 20 32 20 25 29 32 33 40
29 25 29 24 23 29 29 26 27 1
31 32 31 31 35 33 30 32 31 33
19 20 22 21 20 20 19 22 22 9
$\mu = 23.46$
$\sigma = 9.250319$
But, wait! Let‟s pretend that we actually don‟t have access to the entire population of values
(yes, we clearly see them in the table above, but we normally do not have that luxury). Due to
limited time and money, you are only able to sample 30 of these values. After taking a random
sample, here is what you have chosen:
32 31 31 35 19 20 22 21 20 20
20 25 29 32 33 19 19 19 18 22
25 27 30 18 21 33 30 32 31 33
$\bar{x} = 25.56667$
$s = 5.870342$
Again, at this point, we would have no way of telling how close we are to the actual mean of
23.46.
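The summary values above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and it assumes the population figure was computed with the population formula (dividing by N) and the sample figure with the sample formula (dividing by n - 1), since that is what matches the numbers reported in the text.

```python
import statistics

# Population of 50 ages from the table above.
population = [
    1, 1, 18, 22, 25, 27, 30, 18, 21, 2,
    3, 19, 20, 32, 20, 25, 29, 32, 33, 40,
    29, 25, 29, 24, 23, 29, 29, 26, 27, 1,
    31, 32, 31, 31, 35, 33, 30, 32, 31, 33,
    19, 20, 22, 21, 20, 20, 19, 22, 22, 9,
]

# The random sample of 30 ages chosen in the text.
sample = [
    32, 31, 31, 35, 19, 20, 22, 21, 20, 20,
    20, 25, 29, 32, 33, 19, 19, 19, 18, 22,
    25, 27, 30, 18, 21, 33, 30, 32, 31, 33,
]

mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population SD (divides by N)
x_bar = statistics.mean(sample)         # sample mean
s = statistics.stdev(sample)            # sample SD (divides by n - 1)

print(f"mu = {mu:.2f}, sigma = {sigma:.6f}")
print(f"x_bar = {x_bar:.5f}, s = {s:.6f}")
```

Notice that the sample mean lands about two units above the population mean, which is exactly the kind of sampling variability the rest of this section tries to account for.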
To get a good estimate of $\mu$, we will come up with a confidence interval. A confidence
interval is a range of values such that there is a stated probability (the confidence level, such as
95%) that the true population mean, $\mu$, is between those values.
How do we calculate this? Here is our motivation for what is to come:
There are two ways to think about inferential statistics:

1) Use theoretical results and make conclusions using them
2) Build a sampling distribution for the statistic of choice ($\bar{x}$ or $\hat{p}$) using the Bootstrap
Method and make conclusions using this empirical data.
We will draw parallels between the two regularly.
Here is the basic idea of Bootstrap Sampling:
1) From the population, take a random sample, preferably of size 30 or greater. The larger
the random sample, the more power we have in making inferences about the population.
2) If this is a truly representative sample, then we can think of it as a “mini” population that
acts and behaves according to the population as a whole. This is a key ingredient!

3) We cannot use this sample to calculate the corresponding parameter because of sampling
variability. However, if this sample behaves like the population, then we can resample
from it and get an idea of the overall variability. That is, draw a sample of the same
sample size from this “mini” population, but do so with replacement. This is the same
idea as rolling a die a fixed number of times: we are sampling with replacement from
the population {1, 2, 3, 4, 5, 6}. What will this do? It will account for sampling variability, if
repeated.
4) Calculate the statistic from this sample and record it.
5) Repeat steps 3) and 4) 1,000 to 10,000 times. We now have a sampling distribution and
can make estimates about the true population parameter. And, guess what this distribution
will look like? You guessed it – it will be approximately normal, by the Central Limit
Theorem.
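Steps 3) through 5) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the simulator used in this course: the function name `bootstrap_means`, the seed, and the die-based "mini" population are all made up for the example.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return a list of resample means (steps 3 to 5 above)."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Step 3: draw n values WITH replacement from the "mini" population.
        resample = rng.choices(sample, k=n)
        # Step 4: calculate the statistic and record it.
        means.append(statistics.mean(resample))
    # Step 5: the recorded means form the bootstrap sampling distribution.
    return means

# Hypothetical sample of size 30: each face of a die observed five times.
die_sample = [1, 2, 3, 4, 5, 6] * 5
boot = bootstrap_means(die_sample, n_resamples=2000)

print("mean of resample means:", round(statistics.mean(boot), 3))
print("SD of resample means:  ", round(statistics.stdev(boot), 3))
```

The mean of the resample means should sit close to the mean of the original sample (3.5 here), while their standard deviation is much smaller than the spread of the individual die values.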
Below is a diagrammatic representation of steps 1) – 5):
Population → Random Sample → Sample 1, Sample 2, Sample 3, Sample 4, ..., Sample 10,000
Some of the assumptions we make are indeed dangerous. For example, do we really have a mini
population? If the answer is “no,” then theoretical results are equally worthless since they, too,
assume that the sample is representative.

Now, back to our example…
If we have truly collected a random sample, then we should be able to think about the sample as
a small population. If this is a small population, then we should be able to sample from it. We
will draw random samples of size $n = 30$ from the small “population” which is also of size
30. Sounds strange, but we will sample with replacement, so it is possible to resample the
same value multiple times.
We will draw 1,000 samples of size $n = 30$ from this “population” and, as you might have
figured, we will calculate the mean of each and build the sampling distribution for $\bar{x}$.

[Figure: histogram of the 1,000 bootstrap sample means. Bins of width 0.5 run from about 22.27 to 29.77; the frequency axis runs from 0 to 200.]
As we should expect based on CLT, the distribution of these 1,000 means is approximately
normal.
Let's suppose that we want to have an interval within which there is a 95% probability that the
true population mean, $\mu$, lies. This is the same as looking for the middle 95% of means!

Thus, we need to find the lower and upper limits for this interval by finding the 2.5 percentile
and the 97.5 percentile. In Excel, we can do this by using the percentile() function. We get:
Upper (97.5 percentile): 27.50

Lower (2.5 percentile): 23.60
Thus, we can say that we are 95% confident that the true population mean is between 23.6 years
and 27.5 years. In other words, there is a 95% probability that we have “trapped” the population
mean between our lower and upper limit. Said one other way, 95% of all sample means, when
the variability from sample to sample is taken into account, are between these lower and upper
limits. If this is representative of the population, then we should believe that 95% of the time, we
will have means between these two values.
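For readers working outside Excel, the percentile approach can be sketched in Python as follows. This is a hedged illustration: the helper `percentile`, the seed, and the interpolation rule are assumptions made for the example, so the endpoints will differ slightly from the 23.60 and 27.50 reported above (they would also differ from one simulation run to the next).

```python
import random
import statistics

# The random sample of 30 ages from the text.
sample = [
    32, 31, 31, 35, 19, 20, 22, 21, 20, 20,
    20, 25, 29, 32, 33, 19, 19, 19, 18, 22,
    25, 27, 30, 18, 21, 33, 30, 32, 31, 33,
]

def percentile(values, pct):
    """Percentile (pct on a 0-100 scale) with linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

rng = random.Random(2013)   # arbitrary seed so the run is repeatable
boot_means = [
    statistics.mean(rng.choices(sample, k=len(sample)))   # resample with replacement
    for _ in range(1000)
]

lower = percentile(boot_means, 2.5)    # cuts off the bottom 2.5% of means
upper = percentile(boot_means, 97.5)   # cuts off the top 2.5% of means
print(f"95% bootstrap interval: {lower:.2f} to {upper:.2f}")
```

Changing the two percentiles to 0.5 and 99.5 gives the corresponding 99% interval.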
What if we wanted to be 99% certain? We would need to find lower and upper limits so that
there is only 1% in the tails:
Thus, we would like 0.01/2 = 0.005 (or 0.5%) in each of the two tails. To find the lower and upper
limits, we would need to find the 0.5 percentile and the 100 - 0.5 = 99.5 percentile. We get:
Upper (99.5 percentile): 28.17
Lower (0.5 percentile): 22.83
Thus, we are 99% confident that the true population mean age, $\mu$, is between 22.83 years and
28.17 years. In other words, there is a 99% probability that the true mean age is between 22.83
and 28.17 years.
If we want to be more confident, we need to expand our interval of values!
Note that in only one of our confidence intervals (99%), we have captured the true mean within
our range. This is very likely, since our confidence percentage is very high. BUT, keep in mind
that we never know what the true mean is! Thus, we cannot say that it would have been better to
stick with the wider 99% interval. After all, there is a 1% chance we might have made an error.
The level of confidence that we desire depends on the situation and the interval width we
are willing to tolerate. More confidence means a wider interval. In general, we never know

whether or not we have captured the true mean in our interval. On the upside, there is a
probability associated with it!
As a final note, it is interesting that we actually missed the true mean in our 95% confidence
interval, since there is only a 5% chance of error. Keep in mind, however, that this interval was
based on simulation. It is based on 1,000 samples, and it may have been better to increase the
number of samples.
6.2.2 Confidence Interval for $\bar{x}$ Using Theoretical Results – When $\mu$ and $\sigma$ are Unknown
In the previous section, we found that the sampling distribution of $\bar{x}$ with $n \geq 30$ is
approximately normal with $E[\bar{x}] = \mu$ and $SD[\bar{x}] = \sigma/\sqrt{n}$. As a bit of notation, if a random variable
has a normal distribution with a given mean and standard deviation, we would write:
$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
This reads, “x-bar is normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.”
This, however, assumes that we know something that we probably don't: the population mean
and standard deviation!
As you might guess, we will use $\bar{x}$ and $s/\sqrt{n}$ to approximate these. This poses a problem: we are
introducing more error. In order to account for this, the normal distribution is not appropriate.
When using these approximations, we must use the theoretical Student's $t$-Distribution. This
distribution looks much like the normal distribution, but is constructed by sample size, not the
mean and standard deviation. Below is a plot of the $t$-distribution in comparison to the
standard normal distribution for size .

We see that the standard deviation of the $t$-distribution (in red) is just slightly larger than that of
the standard normal (in blue): it is about 1.0339. So, as sample size gets greater, the $t$-distribution
begins to look more like a standard normal. BUT, look at the one below where sample size is 10:
The variability is nearly 14% greater.
As we mentioned, this distribution's shape relies on the sample size. The relationship is called
the degrees of freedom and can be calculated as $df = n - 1$, that is, degrees of freedom is equal
to one less than the sample size.
So, in our previous example, we had a sample size of 30, so $df = 30 - 1 = 29$.
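The extra spread of the $t$-distribution can be checked numerically. The sketch below relies on a standard result that is not stated in this text, so treat it as an outside fact: a $t$-distribution with $df > 2$ degrees of freedom has standard deviation $\sqrt{df/(df-2)}$. The function name is illustrative.

```python
import math

def t_standard_deviation(df):
    """Standard deviation of Student's t-distribution (only finite for df > 2)."""
    if df <= 2:
        raise ValueError("the standard deviation is only finite for df > 2")
    return math.sqrt(df / (df - 2))

# Show how the spread shrinks toward 1 (the standard normal) as n grows.
for n in (10, 30, 100, 1000):
    df = n - 1
    print(f"n = {n:>4}, df = {df:>3}, SD = {t_standard_deviation(df):.4f}")
```

For a sample of size 10 (9 degrees of freedom) this gives about 1.134, in line with the "nearly 14% greater" variability noted above, and the value approaches 1 as the sample size increases.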
In a probability calculator, we would enter 29 for the degrees of freedom:

This will work much like the standard normal distribution. It, too, is measured in
standard deviations. That is, the mean is 0 standard deviations away from the mean. We want to
know the number of standard deviations to the left and to the right of the mean we need to travel
in order to “trap” 95% of the distribution.
We use the calculator:
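Thus, we would expect 95% of sample means to be within 2.045 standard deviations of the
mean. In other words:
$$\bar{x} \pm 2.045 \cdot \frac{s}{\sqrt{n}}$$
Or:
$$25.56667 \pm 2.045 \cdot \frac{5.870342}{\sqrt{30}}$$
The lower limit is: $25.56667 - 2.045(1.07177) \approx 23.4$
And the upper limit is: $25.56667 + 2.045(1.07177) \approx 27.8$
Thus, we are 95% confident that the true average age in this town is between 23.4 and 27.8.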
Notice that this is not very much different than our simulated confidence interval of 23.6 to 27.5.
So, which is more precise? This is arguable, but it is difficult to argue with empirical data.
Personally, I prefer the bootstrap confidence interval we ran earlier. My reasoning is that a
distribution of means is asymptotically normal, meaning that, under infinitely many sampled
units, the distribution would be exactly normal. This is very theoretical and not always valid.
For now, we will compare both.
For the 99% confidence interval, theory produces the following:
We would now simply adjust the number of standard deviations to 2.756:
Lower limit: $25.56667 - 2.756 \cdot \frac{5.870342}{\sqrt{30}} \approx 22.6$
Upper limit: $25.56667 + 2.756 \cdot \frac{5.870342}{\sqrt{30}} \approx 28.5$
Similarly, there is a 99% chance that the population mean age is between 22.6 and 28.5.
Compare this to our empirical result above of 22.8 to 28.2. We are, again, very close.
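The two theoretical intervals can be double-checked with a short calculation. This is a minimal sketch: the function name `t_interval` is illustrative, and the critical values 2.045 and 2.756 are simply copied from the probability calculator output quoted in the text rather than computed here.

```python
import math

x_bar = 25.56667   # sample mean from the text
s = 5.870342       # sample standard deviation from the text
n = 30             # sample size, so df = 29

def t_interval(x_bar, s, n, t_star):
    """Return (lower, upper) for x_bar +/- t_star * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

ci_95 = t_interval(x_bar, s, n, t_star=2.045)   # 95% critical value for df = 29
ci_99 = t_interval(x_bar, s, n, t_star=2.756)   # 99% critical value for df = 29

print(f"95%: {ci_95[0]:.1f} to {ci_95[1]:.1f}")
print(f"99%: {ci_99[0]:.1f} to {ci_99[1]:.1f}")
```

Only the critical value changes between the two intervals; the estimated standard deviation of the sample means, $s/\sqrt{n}$, stays the same.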
1. Describe, in your own words, what a bootstrap distribution is and why we would want to
use one. Be sure to mention the logical process behind building one, as well as the
assumptions we are making when we do so.
2. What is a confidence interval? Explain in your own words.
3. The following is a random sample of 10 labor costs associated with farming for civilian
consumers (in billions of dollars) since 1970.
Labor Costs (bill. $)

229.9 303.7
137.9 58.3
81.5 196.6
36.6 168.4
122.9 347.4
(SOURCE: Data randomly sampled from U.S. Statistical Abstract, Table 847)
a. Does the Central Limit Theorem apply for this data? Why or why not?
b. Using a bootstrap distribution, calculate a 95% confidence interval for $\mu$, the true
population average labor cost.
c. In a complete sentence, interpret the real-world meaning of this value.
d. Using the bootstrap distribution and percentiles, how likely is it that a sample of
labor costs has a mean greater than $190,000,000,000?
4. In Arizona, primarily the Phoenix Metropolitan area, the issue of red-light cameras used
to catch red-light runners and speeders was a prominent one for much of the early 2000‟s.
Many studies were carried out over this period of debate to determine whether or not they
were effective, and whether or not they used taxpayer money appropriately. Suppose the

following data was collected on the revenue generated by randomly sampled red-lights
across the valley. The goal is to have, on average, each camera generate $750 and no less
than $640 per day.
883 522 590 779 887 615 690 771 843 509
872 840 536 892 880 588 547 770 687 842
832 840 676 555 884 617 517 586 505 552
a. Can the state be 95% confident that the desired average is possible?
b. Generate a 99% confidence interval for $\mu$, the population average daily revenue
per camera. Explain in a complete sentence what this means.
c. Is the CLT valid in this problem? Explain.
d. Using the assumption that the distribution of $\bar{x}$ is normally distributed, calculate a
theoretical 95% confidence interval for $\mu$ (you will need $s/\sqrt{n}$ to estimate the
standard deviation of $\bar{x}$'s and $\bar{x}$ to estimate $\mu$).
e. In reality, anytime we estimate parameters, like you did above in part d), we
actually shouldn't assume a normal distribution. Instead, we should assume what
is known as a $t$-distribution, which is symmetrical, though has more variability to
account for the uncertainty in our estimates.
Watch this brief informative video:
http://www.youtube.com/watch?v=yV-0ReCXW64
Pull up the following applet: http://www.stat.tamu.edu/~west/applets/tdemo.html.

You can type in the percentile corresponding to means you want to consider.
$df$ stands for “degrees of freedom” and can be calculated by taking the sample size
minus 1 ($n - 1$). (From the video, we know that, if the sample size is really, really
big, then the difference between the normal distribution and t-distribution
becomes indistinguishable.) The output of this applet will give you the number of
standard deviations your endpoints will be on either side of the mean.
For example, you will find that a 99% confidence interval for a sample of size 100
has endpoints that are 2.626 standard deviations from the mean (left and right).
Let's say your sample mean is $\bar{x} = 40$ and standard deviation $s = 5$. Then, the
confidence interval will be an interval around the sample mean. That is, one
standard deviation is $s/\sqrt{n} = 5/\sqrt{100} = 0.5$ (remember, the standard deviation of means
requires that we take the standard deviation among individual $x$'s and divide by
the square root of the sample size). So, 2.626 standard deviations would be
2.626(0.5) = 1.313 units away from the mean. The endpoints would be 40 - 1.313
and 40 + 1.313, or 38.687 to 41.313.
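Formulaically, we found:
$$\bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
Where $t_{\alpha/2}$ is the number of standard deviations to the endpoints for a confidence
interval with total area $\alpha$ in the tails (for example, $\alpha = 0.01$ for a 99% confidence interval).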
Using this “crash course” in theoretical confidence interval-finding, compute the

95% confidence using these ideas. Do you get a similar result? How close?
6.3 Confidence Interval for $\hat{p}$
6.3.1 Confidence Interval for $\hat{p}$ Using Sampling Distributions
Suppose that it is of interest to estimate the proportion of recent customers that say they would
come back and shop at your store. You take a sample and determine that, of 30 people, 20 said
they would and 10 said they wouldn‟t. You would like to make an inference about the population
of all of your customers. In your sample, you know that
$$\hat{p} = \frac{20}{30} \approx 0.667$$
is the proportion of your sampled customers that will come back and purchase from you again. You are
looking to find a confidence interval for $\hat{p}$. How do we do that with the simulator if we have no
data?
In reality, we do. We just have to make it numerical. In reality, 20/30 is an average. It is the
average of 30 responses. If we let:
$$x = \begin{cases} 1 & \text{if the customer would come back} \\ 0 & \text{if the customer would not come back} \end{cases}$$
So, we have a set of twenty 1's and ten 0's. We enter these into our simulator.
We run the bootstrap sample on these 1‟s and 0‟s 1,000 times. We will get a variety of sample
proportions:

We see that this distribution is approximately normal. No surprise there!
[Figure: Sampling Distribution of p-hat. Histogram of the 1,000 bootstrap sample proportions; bins of width 0.05 run from about 0.43 to 0.98, and the frequency axis runs from 0 to 350.]
We calculate the 2.5- and 97.5-percentiles to get the middle 95% of sample proportions
generated in the bootstrap sample:
Percentile (as %)    Result
97.5                 0.833
2.5                  0.500
Thus, we are 95% confident that the proportion of the population of customers that will shop at
your store again will be between 0.50 and 0.83. This is quite a wide interval! At least you know
what to expect with 95% confidence!

DULY CAUTIONED: The assumptions here are the same as for bootstrapping with $\bar{x}$: a
random sample is drawn from the population and is representative of the population. If not, the
sample is worthless, in any case.
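The bootstrap for a proportion can be sketched the same way as for a mean, because $\hat{p}$ is just the average of the 1's and 0's. The following is an illustrative sketch only: the seed and variable names are assumptions, and the endpoints are found by sorting the 1,000 resample proportions and dropping the 25 smallest and 25 largest, which may differ slightly from a spreadsheet percentile function.

```python
import random
import statistics

# 20 customers said they would come back (1), 10 said they would not (0).
responses = [1] * 20 + [0] * 10

rng = random.Random(7)   # arbitrary seed so the run is repeatable
boot_props = sorted(
    statistics.mean(rng.choices(responses, k=len(responses)))   # one resample p-hat
    for _ in range(1000)
)

# Drop the 25 smallest and 25 largest values to keep the middle 95%.
lower = boot_props[25]
upper = boot_props[974]
print(f"95% bootstrap interval for the proportion: {lower:.3f} to {upper:.3f}")
```

Because each resample proportion is a multiple of 1/30, the endpoints can only take values such as 0.467, 0.500, 0.533, and so on, which is why the simulated limits look so "round."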
6.3.2 Confidence Interval for $\hat{p}$ Using Theoretical Results
Without providing the intuition for this method, we will simply state the results for the CLT
pertaining to the sampling distribution of $\hat{p}$:
Central Limit Theorem for $\hat{p}$
The sampling distribution of $\hat{p}$ (which is really just an average of 0's and 1's) is approximately
normal as long as the sample is large enough (similar idea as for the standard CLT):
$$\hat{p} \sim N\left(p, \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
With
$$E[\hat{p}] = p$$
$$SD[\hat{p}] = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
NOTE: the margin of error reported in polls is based on this standard deviation (commonly about
1.96 standard deviations, for 95% confidence).
The results above state that,
1. the average proportion of the sampling distribution is the true population proportion.
2. The standard deviation of proportions of the sampling distribution is the above, complex,
calculation.
AS LONG AS $n\hat{p}$ and $n(1-\hat{p})$ are both large enough. In our example, $n\hat{p} = 20$ and
$n(1-\hat{p}) = 10$, so both conditions hold. We can now proceed:
Here, we get to use the standard normal distribution to calculate the number of standard
deviations corresponding to the desired interval. So, we know that:
$$SD[\hat{p}] = \sqrt{\frac{(20/30)(10/30)}{30}} \approx 0.0861$$
The number of standard deviations corresponding to the middle 95% of a standard normal
distribution is calculated below:
Thus, these endpoints are approximately 1.96 standard deviations away from the mean. So, our
confidence interval would be:
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
In our case:
Lower limit: $0.6667 - 1.96(0.0861) \approx 0.498$
Upper limit: $0.6667 + 1.96(0.0861) \approx 0.835$
These limits are nearly identical to the simulation values!
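The theoretical interval is easy to verify in code. This is a minimal sketch using Python's standard library; the variable names are illustrative, and the critical value is looked up from the standard normal distribution instead of being typed in as 1.96.

```python
import math
from statistics import NormalDist

successes, n = 20, 30
p_hat = successes / n                              # sample proportion, about 0.667

# Number of standard deviations that trap the middle 95% of a standard normal.
z_star = NormalDist().inv_cdf(0.975)

std_error = math.sqrt(p_hat * (1 - p_hat) / n)     # SD of p-hat from the CLT result
lower = p_hat - z_star * std_error
upper = p_hat + z_star * std_error

print(f"z* = {z_star:.3f}, SD of p-hat = {std_error:.4f}")
print(f"95% interval: {lower:.3f} to {upper:.3f}")
```

Swapping 0.975 for 0.995 in `inv_cdf` would give the critical value for a 99% interval.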

1. In a sample of 55 students from Arizona State University taking a political science class,
30 say they would be interested in taking another political science class. The university is
interested in determining the proportion of all its students that are interested in taking
another political science class.
a. What is the population of interest in this study?
b. Construct a 90% bootstrap confidence interval for $p$, the true proportion.
c. Interpret the real-world meaning of your confidence interval.
2. A software company takes a random sample of recent orders and finds that, of the 250
sampled, 42 resulted in the return of a piece of purchased software.
b. Construct a 99% bootstrap confidence interval for $p$, the true proportion.
3. A batch of apples was inspected prior to shipment for any defects. Each apple was
marked as either pass (P), re-inspect (R) or fail (F). The following results were reported.
F P P P P P P P R R
P P R P R R P R P P
P R P R P F R R P P
P P P P P P P R P P
P P P F P R P P P R

b. Construct a 95% bootstrap confidence interval for $p$, the true proportion of
passing apples.
d. Using the CLT for $\hat{p}$'s, construct a 95% confidence interval (see blue box in this
section). How does it compare to the bootstrap confidence interval?
Chapter 7
Hypothesis Testing
We are often faced with uncertainty. Specifically, we often want to know whether one product is
better than the other, whether one group outperforms another in some type of task, or how one
manufacturing process compares to another, among many other things. How can we ever know?
The first step would be to conduct a study and collect data. The data must then be compared.

But, how do we do so if there exists variability from one sample to the next? This chapter will
address this question.
7.1 The Concept Behind Hypothesis Testing
So, you have a research question… what now? The question might at first seem obvious: let‟s
run a study. This question, however, needs some special treatment before anything else happens,
especially if the study comes at a significant cost.
For instance, suppose we‟re interested in determining whether pesticides damage the soil in
which we grow the majority of our food. This is a loaded curiosity. We first need to fully define
how it is that we would conduct such a study. For instance, will we be comparing two regions, one
that has been sprayed with pesticides and one that hasn't been sprayed? What is it, exactly, that
we will measure in order to determine the level of soil damage?
First and foremost, we need to formulate a hypothesis, or a belief about what it is that we expect
to see. For example,
Our hypothesis is that pesticides inflict serious damage on sprayed soils
Great, so we know what we believe. Did we just state what we wanted to happen? Probably not.
We‟ll usually formulate a hypothesis based on some existing observations. Perhaps we‟re seeing
that plants aren‟t producing as many edibles as previously thought. Or, maybe we‟re finding
rising levels of cancers. (By the way, all of the above are becoming prominent public concerns in
the U.S. and beyond.) So, based on these observations, we‟re forming an educated belief on the
effect of pesticides.
The next critical question:
How will we measure “soil damage?”
This can be a controversial question and may lack a consensus of an answer. Will it be measured
by the quantities of beneficial microbes present in the soil? By the soil‟s pH level? By the
amount of nitrogen it contains?
However we choose to measure “soil damage,” we want to be sure that we are being accurate.
That is, we need to be sure that we are actually measuring what we say we‟re measuring. This
sounds infantile, but it happens all the time that researchers say they‟re measuring something that
they‟re not actually measuring.
So, suppose we do some research and conclude that we test for soil damage by determining the
weight of vegetables harvested from these plants and comparing the average weight per plant for
the experimental group (some determined quantity of pesticides sprayed). We find that healthy
plants produce about 30 lbs. of some vegetable across their seasonal life span. Will the average
plant yield for plants sprayed with pesticides be lower?

Since this is a mathematical question, we would want to formulate our hypothesis into
mathematical statements.
Since we are dealing with an average in this scenario, the statistical symbol often used to
represent the average plant yield for the entire population of this particular vegetable is the
Greek letter Mu, $\mu$.
Now, our experimental hypothesis is that pesticides damage the soil, measured by the pounds of
vegetables yielded from these plants. If that is the case, we would expect to see a yield of less
than 30 lbs. of vegetables per plant. That is, our hypothesis is that
$$\mu < 30$$
Since this is the experimental hypothesis, we have no evidence yet to conclude that this is true. Thus,
we should probably assume that there is no difference between the yields of pesticide-sprayed
and non-sprayed plants. Thus, begin by assuming that:
$$\mu = 30$$
This second hypothesis is called the null hypothesis, that is, the hypothesis that is assumed until
there is sufficient evidence otherwise. Symbolically, this hypothesis is written $H_0$ and is typically
read as “null hypothesis,” or “h-naught.”
The hypothesis that we believe is called the alternative hypothesis, and is written $H_a$, or “h-ay.”
To write these two hypotheses, we would write:
$$H_0: \mu = 30$$
$$H_a: \mu < 30$$
When evidence is insufficient, we say
“Based on sample data, we fail to reject $H_0$ in favor of $H_a$”
When evidence is sufficient to conclude that the average is really below 30, we say
“Based on sample evidence, we reject $H_0$ in favor of $H_a$”
We are cautious to make these conclusions based on sample data. Certainly, we may have
obtained an oddball sample that doesn‟t represent the population.
Let‟s practice writing some hypotheses. First, off, let‟s make note of the variety of population
characteristics, called population parameters, that we can seek to describe in a study.

Population Parameters
In a study, we seek to gain information about the target population. There are a number of things
we can test about population parameters, the actual values that describe the population. Two
common ones are:
1) Population average, denoted by Greek Mu (“mew”), $\mu$
2) Population percentage, denoted by Greek Pi (“pie”), $\pi$
Unfortunately, we do not know the true values for $\mu$ and $\pi$ and realistically cannot, unless we
sample the entire population. We can only estimate them based on the sample we collect. The
values we collect from the sample are sample statistics and are estimators for the respective
population parameters. These estimators for the values above, respectively, are notated:
1) $\hat{\mu}$ (“mew-hat”)
2) $\hat{\pi}$ (“pie-hat”)
Example 1: Because of variation in the manufacturing process, tennis balls produced by a

particular machine do not have identical diameters. Let $\mu$ denote the true average diameter
for tennis balls currently being produced. Suppose that the machine was initially calibrated to
achieve the design specification $\mu = 3$ in. However, the manufacturer is now concerned that
the diameters no longer conform to this specification. If sample evidence suggests that the
true average diameter for tennis balls is not 3 inches, the production process will have to be
halted while the machine is recalibrated. Because stopping the production is costly, the
manufacturer wants to be quite sure that the true average diameter is not 3 inches before
undertaking recalibration. What are the competing hypotheses?
SOLUTION:
Under the original assumption, $\mu = 3$. The researcher wants to test whether $\mu \neq 3$. So:
$$H_0: \mu = 3$$
$$H_a: \mu \neq 3$$
Example 2: A long-used chemical in a particular carpet-cleaning product has been known to

successfully remove dark stains 70% of the time. After extensive research, the product's
formula is modified. The head of production must decide whether or not to sell the new
product. Write null and alternative hypotheses for conducting an experiment that might help
him decide.
SOLUTION:
Under original specifications, the proportion of time the product works is $\pi = 0.70$. He is
concerned that $\pi < 0.70$. If it is truly less effective, then he will not sell the new product. That is,
$$H_0: \pi = 0.70$$
$$H_a: \pi < 0.70$$

Example 3: Many older homes have electrical systems that use fuses rather than circuit
breakers. A manufacturer of 40-amp fuses wants to make sure that the mean amperage at
which its fuses burn out is in fact 40. If the mean amperage is lower than 40, customers will
complain because the fuses require replacement too often. If the mean amperage is higher
than 40, the manufacturer might be liable for damage to an electrical system as a result of
fuse malfunction. To verify the mean amperage of the fuses, a random sample of fuses is
selected and tested. If a hypothesis test is performed using the resulting data, what null and
alternative hypotheses would be of interest to the manufacturer?
SOLUTION:
The fuse is designed and assumed to be 40 amps. That is, on average, $\mu = 40$. He wants to make
sure it is not the case that $\mu \neq 40$. So,
$$H_0: \mu = 40$$
$$H_a: \mu \neq 40$$
So Your Average IS Different!
In our pesticide experiment, our target population is all plants of this particular variety. Thus, we
will take a random sample of plants from the pesticide group. Once we have that, we will find
the sample mean, which is called a sample statistic. That is, we can‟t possibly keep track of all
the plants in the population, so we will use the mean of the sample to help us describe the entire
population. Usually, this sample statistic is written as $\hat{\mu}$ (“mew-hat”). Suppose that you find,
from the pesticide group, that
̂
The claim has been proven, right? Maybe, maybe not.
We must remember that this is just one random sample from all plants. Certainly, this sample
average is lower, but can it not just be due to random variation that we‟re seeing a difference?
After all, not all no-pesticide plants will produce exactly 30 lbs. of the vegetable.
What if we collect a sample and

Without some sort of analysis, we might be tempted to say this is sufficiently lower. However,
we need to have some sort of formal way to determine:
When is “low,” low enough? Or, more generally
The Big Question
When making conclusions about the population based on sample data, we must first ask the
question,
When do we conclude that an “extreme” is extreme enough to reject $H_0$?
As you might guess, there is probability involved.
That is, if the probability of observing what we have just seen, or what is more extreme, is small
“enough,” then we will reject $H_0$ and conclude that $H_a$ might be a more valid conclusion.
Punchline: We shouldn't reject the null hypothesis unless the probability of seeing something as
or more extreme is very small.
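To make the punchline concrete, here is a minimal sketch of computing "the probability of seeing something as or more extreme" for the pesticide example. Every number other than the hypothesized mean of 30 lbs. is hypothetical (the sample mean, standard deviation, and sample size are made up for illustration), and the sketch assumes the sampling distribution of the sample mean is approximately normal, as the CLT suggests for larger samples.

```python
import math
from statistics import NormalDist

mu_0 = 30.0          # mean yield assumed under the null hypothesis (from the text)
sample_mean = 28.5   # hypothetical sample mean for the pesticide group
sample_sd = 4.0      # hypothetical sample standard deviation
n = 36               # hypothetical sample size

# Standard deviation of sample means if the null hypothesis were true.
std_error = sample_sd / math.sqrt(n)

# Sampling distribution of the mean under the null hypothesis.
null_distribution = NormalDist(mu=mu_0, sigma=std_error)

# "As or more extreme" here means a sample mean of 28.5 lbs. or lower.
prob_as_extreme = null_distribution.cdf(sample_mean)
print(f"Probability of a mean this low or lower: {prob_as_extreme:.4f}")
```

With these made-up numbers the probability is a little over 1%, so a sample mean this low would be quite unusual if the true average yield really were 30 lbs. Whether that is "small enough" depends on the cutoff chosen in advance.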
What Happens If I Reject $H_0$ When the Data Provides Insufficient Evidence?
Imagine a medical test to determine whether or not you have some disease. Let‟s call this
disease, Disease X.
As for having the condition, you have one of two possibilities: you have it or you don‟t.
As for the test, it will either say that you have it or you don‟t.
Now, realistically, we know that there is no way to be omniscient and really know whether or not
you have the condition. However, let‟s imagine that we are all-knowing and can judge the
validity of the test. There are four possibilities:
1) The test is positive, and you do have X (accurate)

2) The test is positive, and you don’t have X (inaccurate)
3) The test is negative, and you do have X (inaccurate)
4) The test is negative, and you don’t have X (accurate)
It is evident that possibilities 2) and 3) represent scenarios where there is an inaccurate result.
That is, it would be invalid for the test to tell you that you have the condition when, in fact, you
don‟t. It would also be invalid for the test to tell you that you don‟t have the condition when, in
fact, you do.
Contrarily, we do want the test to tell us positive when we do have the condition and negative
when we don‟t.

Medical researchers usually give these four instances names, as summarized in the following
table:

                      Truth: Have                     Truth: Don't Have
Test Says Positive    True Positive                   False Positive (Type II Error)
Test Says Negative    False Negative (Type I Error)   True Negative

As can be seen, the “true” cells (shaded green in the original table) represent accurate results
and the “false” cells (shaded red) represent inaccurate results.
As a patient, you would probably be quite upset (devastated, even) if you received false results
for a terrible condition, such as X!
In a hypothesis test, we are up against the same dilemma: our test result can be either positive or
negative. The truth may or may not be accurately represented. Let‟s modify our table slightly to
represent the hypothesis test scenario:

                                 Truth: $H_0$ True               Truth: $H_0$ False
Conclusion: Don't Reject $H_0$   True Positive                   False Positive (Type II Error)
Conclusion: Reject $H_0$         False Negative (Type I Error)   True Negative

In reality, we shouldn't reject $H_0$ (make it appear false) when it is true. If we do, we have a
false negative on our hands. Similarly, we shouldn't fail to reject $H_0$ (make it appear true)
when it is false. These are labeled Type I and Type II errors, respectively.
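The four combinations of truth and conclusion can be sketched in a few lines of Python. This is only an illustration: the function name `classify_outcome` and its arguments are hypothetical, and the sketch pretends we know whether $H_0$ is true, which we never do in practice.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Label a test decision, assuming (unrealistically) that the truth is known."""
    if h0_is_true and rejected_h0:
        return "Type I error"       # rejected a true null hypothesis
    if not h0_is_true and not rejected_h0:
        return "Type II error"      # failed to reject a false null hypothesis
    return "correct decision"       # the conclusion matches the truth


# Walk through all four combinations of truth and conclusion.
for truth in (True, False):
    for rejected in (False, True):
        print(f"H0 true: {truth!s:5}  rejected: {rejected!s:5}  ->",
              classify_outcome(truth, rejected))
```

Only two of the four combinations are errors, and each error can occur in only one of the two "truth" columns.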
How Do We Avoid Erroneous Conclusions?
Unfortunately, we are not omniscient. Thus, we can never be sure that our conclusions are
accurate. If we knew, there would be no testing necessary!
On the flipside, we can determine how large of an error rate we are willing to tolerate. Earlier,
we mentioned that we will reject $H_0$ when the probability of observing something as extreme as
or more extreme than what we have observed is “small.” This value of “small” fully determines our
probability of a Type I error. As researchers, it is our duty to set this value. This probability
of a Type I error is called the criterion, or alpha-level, and is denoted with the Greek letter
alpha, $\alpha$.
Criterion/Alpha-Level
Our chosen risk of a Type I error is called the criterion or alpha-level, and is denoted by $\alpha$.
Typical values for $\alpha$ include 0.01 and 0.05.
That is, rarely will we choose a very small or considerably large alpha-level.
Suppose that we reject $H_0$ when the probability of observing something as extreme as or more
extreme than what we have observed is 5% (or smaller). We have that $\alpha = 0.05$.
This means that, even when $H_0$ is true, there is still a 5% (or smaller) chance that we observe
a value (sample mean, sample proportion, etc.) extreme enough to reject it. That is, there is a 5%
chance that we falsely reject the null hypothesis when it is true. Probabilistically,
$P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha = 0.05$
To visualize this, consider the diagram below. Recall that a conditional probability statement
limits us to the event after the “pipe,” |, and then asks the question, “what percentage of the
time can we expect the event to occur, out of the times the specified condition occurs?” The
modified table below shows that.

                                 Truth: $H_0$ True
Conclusion: Don't Reject $H_0$   True Positive: 95%
Conclusion: Reject $H_0$         False Negative (Type I Error): 5%
Column total                     100%
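A small simulation can make the 95%/5% split concrete. The sketch below is an illustration under stated assumptions, not a procedure from this book: it assumes a normally distributed yield with a known standard deviation and uses a lower-tail z-test. The values `sigma=4.0`, `n=25`, the number of trials, and the function name are all hypothetical; only the 30 lb. null mean echoes the plant example.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, mu0=30.0, sigma=4.0, n=25, trials=20_000, seed=1):
    """Estimate how often a lower-tail z-test rejects H0 when H0 is actually true."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    std_error = sigma / n ** 0.5       # standard deviation of the sample mean
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 (mean = mu0) is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / std_error
        p_value = NormalDist().cdf(z)  # probability of a mean this low or lower
        if p_value <= alpha:
            rejections += 1            # a false rejection, i.e. a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Share of true null hypotheses rejected: {rate:.3f}")
```

Because every simulated sample comes from a world where $H_0$ is true, the rejection share is an estimate of the "Reject" cell in the single-column table above, and it should land near the chosen $\alpha$.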
At this point we might wonder: why shouldn't we set $\alpha$ extremely small so that we minimize
the Type I error risk?
Good question. Imagine that your alpha is 0.0001. This means you will reject $H_0$ only 0.01% of
the time (1 out of 10,000 times) when it is true. Certainly, your risk of a Type I error is
extremely small.

However, your decision criterion, the numerical cutoff that we later calculate to decide whether
or not to reject, will be extremely stringent and difficult to achieve. If this is the case, then
you will almost never reject the null hypothesis!
Okay, so if you very rarely reject the null hypothesis, then you are also potentially committing
another act of error: not rejecting the null hypothesis, even though it may be false. That is, you
increase the likelihood of a Type II error. Recall that,
$P(\text{Type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$
We can see here that rarely rejecting $H_0$ means potentially failing to reject it even when it
should be rejected! Unfortunately, there is no free lunch in hypothesis testing.

                                 Truth: $H_a$ True
Conclusion: Don't Reject $H_0$   False Negative (Type II Error)
Conclusion: Reject $H_0$         True Positive
Though we cannot yet easily provide numerical support for this claim (which certainly makes
sense), we will make the following preliminary conclusion:
Type II Error - $\beta$
The probability of a Type II error, denoted $\beta$, is inversely related to $\alpha$, the
probability of a Type I error. That is, for a fixed sample size, decreasing $\alpha$ will
increase $\beta$.
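The trade-off can be checked numerically. The sketch below is a hedged illustration rather than this book's method: it assumes a lower-tail test of a proportion whose count follows a binomial distribution, and the study size (`n = 60`), null value (`p0 = 0.30`), and alternative value (`p_alt = 0.20`) are hypothetical choices, as are the helper names.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for a binomial count X with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def lower_tail_cutoff(alpha, n, p0):
    """Largest count c with P(X <= c | p0) <= alpha; -1 means we never reject."""
    c = -1
    while c + 1 <= n and binom_cdf(c + 1, n, p0) <= alpha:
        c += 1
    return c


def beta_for(alpha, n, p0, p_alt):
    """P(fail to reject H0 | the true proportion is p_alt) for the lower-tail rule."""
    c = lower_tail_cutoff(alpha, n, p0)
    return 1 - binom_cdf(c, n, p_alt)   # we fail to reject whenever X > c


n, p0, p_alt = 60, 0.30, 0.20           # hypothetical study and alternative
alphas = [0.10, 0.05, 0.01, 0.0001]
betas = [beta_for(a, n, p0, p_alt) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:<7} beta = {b:.3f}")
```

Each smaller $\alpha$ pushes the rejection cutoff further into the tail, so the printed $\beta$ values can only stay the same or grow.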
Important Caution
Students are often confused by the fact that the probability of rejecting $H_0$ when $H_0$ is
true and the probability of failing to reject $H_0$ when $H_0$ is true sum to 1. After all, these
two possibilities are only two of the four possible results in a test decision.
However, keep in mind that these are the percentages of time we reject and fail to reject out of
all the times that $H_0$ is true! This is out of only one column total, not the entire sample
space.

The important caution brings up the following idea:
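If
$P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha$,
then
$P(\text{fail to reject } H_0 \mid H_0 \text{ is true}) = 1 - \alpha$.
Similarly, if
$P(\text{Type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false}) = \beta$,
then
$P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta$.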
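The probability that we reject the null hypothesis when it is false, $1 - \beta$, is referred to
as the power of the test. We summarize these in the table below:

                                 Truth: $H_0$ True         Truth: $H_0$ False
Conclusion: Don't Reject $H_0$   $1 - \alpha$              $\beta$ (Type II Error)
Conclusion: Reject $H_0$         $\alpha$ (Type I Error)   $1 - \beta$ (power)
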
Example 4: The college dropout rate for a particular county is known to be 30%. The
educational board of a city within the county believes its dropout rate is significantly lower.
The board follows 60 students and, of them, 15 drop out. The board wants to run a statistical
hypothesis test with $\alpha = 0.05$ to determine whether its belief is true. Describe the
hypothesis test by:
a. Writing competing hypotheses
b. Writing a decision rule for rejecting $H_0$
c. Writing a decision criterion rule
d. Writing a generic conclusion statement
SOLUTION:
a.) Under the null hypothesis, $p = 0.30$. We want to test to see if $p < 0.30$. Thus:
$H_0: p = 0.30$
$H_a: p < 0.30$
b.) We will reject $H_0$ if the probability of observing something as extreme as or more extreme
than 15 out of 60 dropouts ($X = 15$) under the assumption of the null hypothesis is less than or
equal to 0.05. That is, we reject $H_0$ if:
$P(X \leq 15 \mid p = 0.30) \leq 0.05$
c.) We will reject $H_0$ if the observed value of $X$ is smaller than some cutoff value of $X$.
That is, it might be the case that $X$ would have to be smaller than, say, 13 in order for us to
reject the null hypothesis.
d.) Based on sample evidence, we (choose from below)
a. Reject $H_0$ in favor of $H_a$
b. Fail to reject $H_0$. We do not accept $H_0$ as true, but we don't have evidence to
conclude otherwise.
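The probability in part b and the cutoff in part c can be computed directly. The sketch below is one possible way to do so and rests on an assumption the example does not state: that the 60 students drop out independently, so the count of dropouts follows a binomial distribution. The helper name `binom_cdf` is hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for a binomial count X with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


n, p0, alpha, observed = 60, 0.30, 0.05, 15

# Part b: probability of 15 or fewer dropouts if the true rate really is 30%.
p_value = binom_cdf(observed, n, p0)

# Part c: the largest count that is still "extreme enough" to reject H0.
cutoff = -1
while binom_cdf(cutoff + 1, n, p0) <= alpha:
    cutoff += 1

reject = p_value <= alpha
print(f"P(X <= {observed} | p = {p0}) = {p_value:.4f}")
print(f"Reject H0 only if X <= {cutoff}")
print("Conclusion:", "reject H0" if reject else "fail to reject H0")
```

Note that the cutoff depends only on $n$, $p$ under the null hypothesis, and $\alpha$, so it can be fixed before any data are collected.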
As we see from the above example, our hypothesis test needs to have a structured layout. We
need to know ahead of time what we‟ll do.
It is tempting, but we cannot determine our rejection criterion based on what the sample data
tells us! In practice, you can carry this type of philosophy, but you increase the error rate.
Consider, for example, the scenario wherein you take an exam for a biology class. You get the
results back and look at what you missed. You say, “oh, of course I should have put that! I knew
that!” If you told that to the instructor, she may say, “sorry, you didn‟t demonstrate that on the
exam.” Without surprise, we expect this response. Why? Because, it is the test that helps to
determine our level of understanding! It is not the other way around. If the instructor allowed
you to change your answer, then the test wouldn‟t really be demonstrating what you knew at that
time of the test. A hypothesis test is quite analogous. We carry one out because we have a hunch.
Always think back to this statement:
If you dig long enough in your data, you will find something!
This statement casts the digging process in a negative light, because digging through data is not
guided by a decision question that was posed in advance. In fact, it creates a high likelihood
that we are observing a coincidence and not a solid finding at all! Thus, we substantially
increase the probability of error.

Structure of a Hypothesis Test
The following should be included in all hypothesis tests:
1. A statement of competing hypotheses ($H_0$ vs. $H_a$)
2. A decision rule for rejecting $H_0$ (based on $\alpha$)
3. A decision criterion rule (the physical value of the random variable that represents the
required “extremeness” of our observed sample value)
4. A conclusion statement (what the sample data tells you to conclude)
As an important note: we never say, “accept $H_0$ as true.” Instead, we remain accurate and say
that there is simply not enough evidence to reject it. Think about this as “innocent” vs. “not
guilty.” Just because a court cannot prove that someone is guilty, it doesn't say that he is
innocent. Instead, it gives the verdict of “not guilty.”
1. In your own words, explain the difference between the null and alternative hypotheses.
Also, explain how to identify each in a research study.
2. Explain why we assume that the null hypothesis is true before testing a hypothesis.
3. It is believed that 7% ($p = 0.07$) of an organic corn crop is lost to insect infestations. An
organic farmer has devised a system that may result in less insect destruction. He would
like to test this idea with a hypothesis test. Write the competing hypotheses.
4. A high school statistics class typically earns a known average score, $\mu$, out of 5 on an
Advanced Placement (AP) exam. Over the recent several years, the instructor has found that his
students' scores were higher. He would like to test this hypothesis. Write the competing
hypotheses.
5. A snack dispenser has a failure rate of 15 out of 1000 machines ($p = 0.015$) over a 5-year
span. After changes to the machine, the manufacturer would like to know whether or not this has
changed. Write competing hypotheses.
6. What does it mean to say that $\alpha = 0.05$ when describing a Type I error?
7. Based on the “Structure of a Hypothesis Test” blue box, fully describe the hypothesis test
for the scenario in question 3, assuming $\alpha = 0.01$ and that he finds that only 52 out of
1000 bushels of his crop are lost to insect infestations.
8. Based on the “Structure of a Hypothesis Test” blue box, fully describe the hypothesis test
for the scenario in question 4, assuming $\alpha = 0.05$ and that he finds his students have
been averaging a sample mean of $\bar{x}$ on the test.
9. Based on the “Structure of a Hypothesis Test” blue box, fully describe the hypothesis test
for the scenario in question 5, assuming $\alpha = 0.01$ and that she finds the failure rate is
16 out of 1000 machines.
10. In real-world terms, describe what Type I and II errors would mean for each of questions
3, 4, and 5.
11. Why does the risk of a Type II error increase as we decrease $\alpha$?

APPENDIX A
Answers to Select Problems
1.1 Data and Their Uses
1.
a. Nominal; ice cream names cannot be ordered, in general.
b. Interval; temperatures have order and the differences in temperature can be
reasonably discussed. For example, to talk about a difference is meaningful.
c. Ratio: Absolute 0 exists since there can be no balance at all. Additionally, it
makes sense to talk about ratios. For instance, accounts receivable balances can
be, say, 20% higher this month as compared to last.
d. Ordinal; there is an ordering, though we can‟t talk about the number 1 candidate
as being 2 better than the number 3 candidate. This is because the difference of 1
might not necessarily be the same from 1 to 2 as it would be from 2 to 3. Maybe
candidate 3 is a far third.
2.
a. 2,121 elements in the sample
b. Length of time is a quantitative variable, since it is a numerical measure.
3.
a. 15,000 elements in the sample
b. A proportion is a quantitative variable, since it is a ratio.
4.
a. Observational; the number of animals a family have is not being assigned.
Instead, families are simply being asked about how many animals they have.
b. The study might have considered families with horses. People with horses likely
live on the outskirts of a big city, perhaps being exposed to less pollen. Also,
maybe more families have pets because their children do not seem to have
allergies to them.
5.
a. Observational; the researchers are looking at preexisting habits. They are not
attempting to alter the habits to determine what effect doing so might have on
measures of reading ability and short-term memory.
b. No; perhaps those who watch more television also have other habits that lead
them to scoring poorly on such assessments.
6.
a. Observational; the opinions of the doctors are not being altered in any way.
b. There is a nonresponse bias since not all participants responded. Thus, it might be
the case that those with the strongest opinions decided to come forward, whereas
the other 17,000 who didn‟t respond might have influenced the poll in a different
way.

1.2 Descriptive VS. Inferential Statistics
1.
a. $4 million/day
b. If all days had the same gross revenue, $4 million would be earned.
c. $7.6 million
d. The amount of gross revenue earned on a given day differs by as much as $7.6
million from that of another day.
e. The film has generated an average of $4 million/day. There is much instability in
this average in that the actual gross revenue has varied from $1.6 million to $9.2
million, a range of $7.6 million. It is dangerous to place too many bets on what
might happen next, due to the extreme variability in revenues.
2.
a. 18 randomly selected college students
b. All college students
c. Answers vary; spending on clothing, style preference, etc.
d. Inferential; they wish to make conclusions about the population of all college
students
3.
a. 250 packages of cheese selected
b. All packages of cheese produced by the company
c. 248 or more must pass
4. Consider the following two datasets with a range of 30:
0, 1, 2, 2, 3, 2, 28, 29, 30
0, 1, 2, 3, 4, 3, 4, 2, 1, 30
While both have a range of 30, the first dataset has a cluster of values at each end of the
range. In the second dataset, there appears to be tightly spaced data, followed by one
outlier of 30. The second dataset is, overall, less spread out.
5. The researchers are trying to use CGCC students as a representative sample of all
college students. This presents a bias, in that CGCC probably does not accurately
represent all college students.
2.4 Descriptive Statistics – Variability
1.
a. Standard deviation = 5.9; on average, beers in this sample are within 5.9 calories
of the average calorie content.

b. Q3 – Q1 = 4.75. The middle 50% of beer calories in this sample have a range of
4.75 calories. Specifically, they range from 29 calories (first quartile) to 33.75
calories (third quartile).
c. The skewness value is 0.14. This means the distribution is slightly skewed to the
right.
2.
a. Range = 64.3; Interquartile Range = 27.7 (71.9 – 44.2); Standard Deviation =
18.7. The difference between the highest and lowest percentage is 64.3%, telling
us that the percentage of school enrollees varies greatly across Central Africa.
However, this does not ensure that there is not a single outlier creating this wide
spread. The interquartile range is 27.7%, telling us that the middle 50% of
percentages span from 44.2% to 71.9%, still a considerable spread. The standard
deviation verifies that percentages are quite variable, since, on average, the
percentage of school enrollees varies by 18.7% points about the mean.
b. The interquartile range is 27.7%, telling us that the middle 50% of percentages
span from 44.2% to 71.9%, still a considerable spread. The standard deviation
verifies that percentages are quite variable, since, on average, the percentage of
school enrollees varies by 18.7% points about the mean.
c.
Enrollment
Mean 60.9
Standard Error 3.9
Median 61.9
Mode 61.9
Standard Deviation 18.7
Sample Variance 351.2
Kurtosis -0.4
Skewness 0.4
Range 64.3
Minimum 34.6
Maximum 98.9
Sum 1401.2
Count 23.0
d. Yes, it is skewed to the right, since the skewness value is 0.4, a positive value.
e.

[Histogram: Percent Enrolled; x-axis: Percentage; y-axis: Relative Frequency]
The majority of people in Central Africa are not enrolled in school, since it is
predominantly the case that fewer than 50% of people in each nation attend school.
f. We know that $\bar{x} = 60.9$ and $s = 18.7$. A percentage of 79.6% is
$z = (79.6 - 60.9)/18.7 = 1$ standard deviation from the mean. We would expect that at least
( )
of all enrollment percentages would be within one standard deviation of the mean.
This is considered to be a very normal percentage (it is still within the “average”
spread).
3.
a. The range is 5750, which tells us that there is a difference of 5,750 feet from the
shortest street to the longest street. The interquartile range is 2170, telling us that
the middle 50% of all street lengths range from 980 feet to 3,150 feet. The
standard deviation is 1634, telling us that, on average, a street varies by 1,634 feet
from the mean street length.
b. The interquartile range is 2170, telling us that the middle 50% of all street lengths
range from 980 feet to 3,150 feet. The standard deviation is 1634, telling us that,
on average, a street varies by 1,634 feet from the mean street length.
c.

Street Lengths
Mean 2231.4
Standard Error 238.4
Median 2100.0
Mode 960.0
Kurtosis -0.2
Skewness 0.8
Range 5750.0
Minimum 100.0
Maximum 5850.0
Sum 104874.0
Count 47.0
d. The distribution is strongly skewed to the right.

e. $z = (79.6 - 2231.4)/1634 \approx -1.3$
This means that a street length of 79.6 feet would be about 1.3 standard deviations
below the mean.
f.
[Histogram: Street Length; x-axis: Feet (bins 100-1099, 1100-2099, 2100-3099, 3100-4099,
4100-5099, 5100-6099); y-axis: Relative Frequency]
By Chebyshev's Theorem (C.T.), at least $1 - 1/1.3^2 \approx 41\%$ of all street lengths in the
sample are guaranteed to fall within 1.3 standard deviations of the mean. This is not unusual.
4. Answers vary;

Symmetric: [Histogram; bins 100 to 120, 120 to 140, 140 to 160, 160 to 180, 180 to 200]
Bimodal (two peaks): [Histogram; bins 100 to 120, 120 to 140, 140 to 160, 160 to 180, 180 to 200]
Right Skewed: [Histogram; bins 100 to 120, 120 to 140, 140 to 160, 160 to 180, 180 to 200]
Left Skewed: [Histogram; bins 100 to 120, 120 to 140, 140 to 160, 160 to 180, 180 to 200]
5.
a.

Repair Cost
Mean 971
Standard Error 382
Median 738
Mode -
Standard Deviation 1,207
Sample Variance 1,455,875
Kurtosis 7
Skewness 2
Range 4,194
Minimum -
Maximum 4,194
Sum 9,707
Count 10
Due to the great variability in repair costs, it would be most appropriate to use the
median as measure of center. It also reflects the fact that most repair costs, if there
are any, tend to be between $600 and $1000. Since the standard deviation
describes movement about the mean, it is not appropriate to be used in
combination with a median. Thus, we should probably use the interquartile range
to describe the middle 50% of repair costs.
b. $z = (4194 - 971)/1207 \approx 2.7$
The repair cost of $4,194 is nearly 3 standard deviations above the mean. This
means that it is an outlier cost.
c. According to C.T., at least $1 - 1/2.7^2 \approx 86\%$ of the data in this data set should be
within 2.7 standard deviations of the mean. Thus, at most about 14% of the data can lie
more than 2.7 standard deviations from the mean. This tells us that a
repair cost of $4,194 is fairly unusual.
6.

CC Ratios
Mean 12.35
Standard Error 0.62
Median 12.91
Mode #N/A
Standard Deviation 1.97
Sample Variance 3.90
Kurtosis -0.50
Skewness -0.60
Range 6.03
Minimum 8.81
Maximum 14.84
Sum 123.47
Count 10.00
There do not appear to be extreme outliers, since the mean and median are close. However,
based on the mean being smaller than the median, and the skewness value being negative, there
is a slight left-skew to the distribution. The standard deviation tells us that, on average, CC
ratios are within 1.97 of the mean. We verify these notions by considering the histogram:
[Histogram: CC Ratio Distribution; x-axis: CC Ratio; y-axis: Relative Frequency]
We should also be careful to note that there is not very much data available, which is why we
don‟t distinctly see a skew.
7.

Nitrous Oxide (thous. Tons)
Mean 46.35
Standard Error 9.395205
Median 36
Mode 40
Kurtosis 0.09474
Skewness 0.949789
Range 136
Minimum 0
Maximum 136
Sum 927
Count 20
[Histogram: Nitrous Oxide Distribution; x-axis: Nitrous Oxide (thous. Tons); y-axis: Relative
Frequency]
The distribution of nitrous oxide emissions is skewed to the right indicating that most states have
relatively low emissions, whereas fewer states have relatively high emissions. We note that the
median is a good measure, indicating that 36 thousand tons is the 50th percentile. There are two
outliers of 136 thousand tons. For this value, $z \approx 2.1$, indicating that at least around
75% of all values in the data set are within 2.1 standard deviations of the mean. Thus, 136 can be
considered a mild outlier.
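The z-score and Chebyshev calculations used throughout these answers follow one pattern, sketched below in Python. This is a minimal illustration: the function names are hypothetical, and the numbers are the repair-cost figures from answer 5 (mean 971, standard deviation 1,207, largest cost 4,194).

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


def chebyshev_lower_bound(k):
    """Minimum share of any data set lying within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("The bound is only informative for k > 1")
    return 1 - 1 / k**2


z = z_score(4194, 971, 1207)
bound = chebyshev_lower_bound(2.7)
print(f"z = {z:.2f}; at least {bound:.0%} of values lie within 2.7 standard deviations")
```

Because the bound holds for a distribution of any shape, it is a guarantee rather than an estimate, which is why the answers say "at least."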
3.2 Joint Probability

1. See Video Solution
2.
a. About 85% of all the past calls were for medical assistance.
b. P(call is not for medical assistance) = 1 – 0.85 = 0.15.
c. P(two successive calls are both for medical assistance) = (0.85)(0.85) = 0.7225.
d. P(first call is for medical assistance and second call is not for medical assistance)
= (0.85)(0.15) = 0.1275
e. P(exactly one of two calls is for medical assistance) = P(first call is for medical
assistance and the second is not) + P(first call is not for medical assistance but the
second is) = (0.85)(0.15) + (0.15)(0.85) = 0.255.
f. Probably not. There are likely to be several calls related to the same event -
several reports of the same accident or fire that would be received close together
in time.
3. (“ ” “ ” “ ”) . / . / . /
4. See Video Solution
5.
a. The "expert" assumed that the positions of the two valves were independent.
b. The position of the two valves is not independent but rather dependent. The
effect of the error makes the probability much smaller. The actual probability is
compared to .
6.
a. Assuming that whether Jeanie forgets to do one of her “to do” list items is
independent of whether or not she forgets any other of her “to do” list items, the
probability that she forgets all three errands = (0.1)(0.1)(0.1) = 0.001.
b. ( )
( )
c. P(remembers the first errand, but not the second or the third) = (0.9)(0.1)(0.1) =
0.009.
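The multiplication and addition steps in answers 2 and 6 can be verified with a short Python sketch. It assumes, as those answers do, that successive events are independent; the variable names are hypothetical.

```python
p_medical = 0.85                 # P(call is for medical assistance)
p_other = 1 - p_medical          # complement rule

both_medical = p_medical * p_medical            # independent events multiply
medical_then_other = p_medical * p_other
# "Exactly one" can happen in two mutually exclusive orders, so we add them.
exactly_one = p_medical * p_other + p_other * p_medical

p_forget = 0.1                   # P(Jeanie forgets any single errand)
forgets_all_three = p_forget ** 3
remembers_first_only = (1 - p_forget) * p_forget * p_forget

print(f"{both_medical:.4f} {medical_then_other:.4f} {exactly_one:.4f}")
print(f"{forgets_all_three:.3f} {remembers_first_only:.3f}")
```

If the independence assumption fails, as answer 2f suggests it might, these products no longer give the joint probabilities.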
5.1 The Ideas Behind the Continuous Distribution
1.
a.

[Bar chart: Pizza Size Distribution; x-axis: Size (inches) at 12, 14, 16, 18; y-axis: Probability]
b. ( )
c. ( )
d. , - ( ) ( ) ( ) ( ) inches per pizza, on
average.
e. ( ) (doesn‟t include the 12-inch pizza!)
2.
a. ( )
b. ( )
3.
a. , so ( ) for
b. ( )
c. ( )
d. $E[X] = 5$; on average, the professor dismisses class 5 minutes after the hour.
e. $\sigma = 2.9$; on average, the amount of time after the hour at which the professor
dismisses the class varies by 2.9 minutes about the mean.
f. ( ) ( )
4.
a. ( )

b. ( ) ( )
c. ( ) ( ) ( )
d. ( )
5.
a. , so ( ) for
b. ( )
c. ( )
d. Both ( ) ( ) because, in a continuous distribution, the
probability that is 0.

e. ( )
f. $E[X] = 26$; the average response time is 26 minutes.
g. $\sigma = 4.6$; on average, wait times deviate from the mean wait time by 4.6 minutes.
h. . Thus, we want ( ) (
) .
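The means and standard deviations reported in answers 3 and 5 are consistent with uniform distributions, which the sketch below checks. The intervals used here, 0 to 10 minutes for the dismissal times and 18 to 34 minutes for the response times, are assumptions inferred from the reported values and are not stated in this appendix; the function names are hypothetical.

```python
from math import sqrt


def uniform_mean(a, b):
    """Mean of a continuous uniform distribution on the interval [a, b]."""
    return (a + b) / 2


def uniform_sd(a, b):
    """Standard deviation of a continuous uniform distribution on [a, b]."""
    return (b - a) / sqrt(12)


# Assumed intervals (see the note above): (label, lower bound, upper bound).
for label, a, b in [("dismissal time", 0, 10), ("response time", 18, 34)]:
    print(f"{label}: mean = {uniform_mean(a, b):.1f}, sd = {uniform_sd(a, b):.1f}")
```

A quick check like this helps confirm that a reported mean and standard deviation belong together before interpreting them.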
5.2 The Normal Distribution
1.
a.
b.

c.
d.

2.
a. The long-run proportion of all children born in the U.K. expected to weigh more
than 10 lbs. is 0.0186.
b. The long-run proportion of all children born in the U.K. expected to weigh at
most 10 lbs. is 0.9814.
c. The long-run proportion of all children born in the U.K. expected to weigh
between 5 and 6.5 lbs. is 0.1837.
d. The long-run proportion of all children born in the U.K. expected to weigh
between 1 and 2 lbs. is 0.0000.

e. 20% of all children are expected to be born weighing less than 6.5 lbs.
6. In a recent years, Scholastic Aptitude Test (SAT) scores for all college-bound seniors in
the United States was such that points and points (SOURCE:
http://www.collegeboard.com) .
a. 50% of students scored less than how many points?
b. 50% of students scored more than how many points?
c. In order to be in the top 10% of SAT-takers, what score would one have to
achieve?
d. What score do the lowest 10% score between?
e. The middle 50% of students scored between what two values?

3.
a. 50% of students score less than 1518 on the test.
b. By complementary probability, 50% of students should score more than 1518.

c. You would have to score about 1913 points.
d. About 1123.

e. The middle 50% score between about 1310 and 1726.
4.
a.

b. The Empirical Rule is a summary of what we have done above. It is a nice rule-
of-thumb.
5.
a. The distribution would maintain its exact shape, though would be shifted 10 units
to the right.
b. The distribution would become wider and have a lower peak. This must happen to
make sure the area is still 1 when the distribution becomes wider.
c. The distribution would become narrower and have a higher peak. If a distribution
becomes narrower, its height must increase to maintain an area of 1.
d. The mean, $\mu$, determines where the distribution is centered without altering its
shape. The standard deviation, $\sigma$, will make a distribution wide and low-peaked if
it is large, and will make a distribution narrow and high-peaked if it is small.
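The SAT percentiles in answer 3 can be reproduced with Python's standard library. This is a sketch under an assumption: the mean of 1518 comes from answer 3a, but the standard deviation of 308 points is not given in this appendix and is inferred here only because it matches the reported answers.

```python
from statistics import NormalDist

# mu is taken from answer 3a; sigma = 308 is an assumed value (see the note above).
sat = NormalDist(mu=1518, sigma=308)

median = sat.inv_cdf(0.50)        # 50% score below this value
top_10 = sat.inv_cdf(0.90)        # score needed to be in the top 10%
bottom_10 = sat.inv_cdf(0.10)     # upper limit of the lowest 10%
q1, q3 = sat.inv_cdf(0.25), sat.inv_cdf(0.75)   # bounds of the middle 50%

print(f"median {median:.0f}, top 10% above {top_10:.0f}, bottom 10% below {bottom_10:.0f}")
print(f"middle 50% between {q1:.0f} and {q3:.0f}")
```

`inv_cdf` works in the opposite direction from a probability lookup: it takes an area to the left and returns the score that cuts off that area.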
6.1 The Sampling Distribution for $\bar{x}$
1. Answers vary
2. Answers vary – emphasis on the ability to have a population distribution with any
unknown shape.
3.
a. 0.2525
b. 0.2514
c. 0.9044
d. 95.1 and 104.9
4.
a. 0.0272

b. This might indicate that the production process is outside of the norm. This type
of average is unlikely in a sample of size where The company
should investigate why the average thickness of its glass samples is so thick.
5.
a. It should be approximately normal, regardless of the distribution of revenues.
b. 0.3869
c. The standard deviation of means would change from $6,957 to $4,400. This
would change ( ) . This makes sense, since the
distribution of means is less spread, and so there will be fewer mean sales
amounts beyond $420,000.
d. $421,255.50; If the team averages more than this amount for each team member,
then they will receive the paid vacation days.
6. We know that , ̅- , so , so √ A person‟s
√ √
income varies, on average, by about $2,906.89 from the population average of incomes.
7.
a. rooms and rooms (NOTE: be sure to use STDEV.P() since this is a
population standard deviation we want)
b. It should be approximately normal based on the Central Limit Theorem; the
sample size of 30 satisfies the minimum required sample size to meet normality
assumptions.
c. Answers will vary slightly due to sampling variability of the simulation process;
, ̅- and , ̅- . We see that , ̅ - as expected. We also see that
, which is what we obtained via simulation.
√ √
d. Answers will vary slightly due to sampling variability of the simulation process;
√ √
e. Answers will vary slightly due to sampling variability of the simulation process;
√ √
f. The population standard deviation can be thought of as the standard deviation of the
distribution of means from a sample of size $n = 1$. That is, $SD[\bar{x}] = \sigma/\sqrt{1} = \sigma$.
Since it is the smallest possible sample size, it will have the highest degree of variability.
g. 0.000 or about 0% chance
h. As with tossing a coin repeatedly, when something is repeated over-and-over
again, the amount of variation in the outcomes becomes relatively small. That is,
any mild outliers get averaged in to a large sample of typical values, and its effect
is dispersed. In small samples, the opposite holds – deviate values are highly
corrosive to the sample mean.
6.2 Confidence Interval for $\bar{x}$
1. Answers vary

2. Answers vary
3.
a. No, the sample size is 10, which is less than the minimum required (30).
b. ( )
c. We are 95% confident that the population average labor cost is between $109.6
billion and $227.6 billion.
d. About 0.213
4.
a. Yes, since they can be 95% confident that the average revenue per camera will be
between $654.51 and $752.44.
b. No, since they can be 99% confident that the average revenue per camera will be
between $637.42 and $768.01, which includes the possibility of the average being
lower than $640.
c. Yes, the sample size is 30, which is the minimum required sample size for the
CLT results to be applied.
d. We know that , ̅ - , which we are estimating by ̅ . That is, we are assuming
the sample mean is the population mean for the basis of our interval. Here,
̅ . Similarly , ̅ - √ . We are using to estimate . Thus, our
estimate of , ̅- . Using our probability calculator, we find:
√
Our 95% confidence interval would be 652.1 to 755.1, which is close to our
bootstrap confidence interval. It is a bit wider than we would like.
e. Here we have that . We have 5% to split between the tails.

Thus, in each tail. We find that (same number of standard
deviations from the mean to each tail, since the distribution is symmetric):

We have that ̅ and √ . So our interval will be
Where
( )

Thus, our interval is:
( )
Or
( )
This is a bit wider, accounting for the extra variability in estimating and .
6.3 Confidence Interval for $\hat{p}$
7.1 The Concept Behind Hypothesis Testing
1. The null hypothesis is assumed to be true and is usually based on what has been observed
before. The alternative hypothesis is what we would like to test, which is something that
would challenge past observations or assumptions about a population.
2. We assume it is true because it is based on past observations or research. For example, if

the Census Bureau finds that 35% of Americans enjoy hypothesis testing, then this is
typically based on some fairly extensive research. If a researcher believes this rate is
greater in his community, then he can test his alternative hypothesis.
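3. $H_0: p = 0.07$ vs. $H_a: p < 0.07$
4. $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$, where $\mu_0$ is the typical average score
5. $H_0: p = 0.015$ vs. $H_a: p \neq 0.015$
6. This is the probability that we reject the null hypothesis when it is, in fact, true. That is,
$P(\text{reject } H_0 \mid H_0 \text{ is true}) = 0.05$. This allows us to be 95% confident that we
fail to reject $H_0$ when it is true, a correct decision.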
7.
1) Hypotheses: $H_0: p = 0.07$ vs. $H_a: p < 0.07$
2) Decision Rule: We will reject the null hypothesis when the likelihood of
observing something as small or smaller than 52 out of 1000 bushels is no
larger than a 1% probability, under the assumption of the null hypothesis. That is,
$P(X \leq 52 \mid p = 0.07) \leq 0.01$
3) We will reject $H_0$ if the observed value of $X$ is smaller than some cutoff
value of $X$.
4) Based on the sample evidence, we will either:
a. Reject $H_0$ in favor of $H_a$: less than 7% insect-related crop destruction for the
farmer's new method.
b. Fail to reject $H_0$. We do not have sufficient evidence to conclude that the
farmer's new method is better than his old method.
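8.
1) Hypotheses: $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$, where $\mu_0$ is the typical average
score
2) Decision Rule: We will reject the null hypothesis when the likelihood of
observing something as large or larger than the observed $\bar{x}$ is no larger than a 5%
probability, under the assumption of the null hypothesis. That is,
$P(\bar{X} \geq \bar{x} \mid \mu = \mu_0) \leq 0.05$
3) We will reject $H_0$ if the observed average $\bar{x}$ is larger than some cutoff
value of $\bar{x}$.
4) Based on the sample evidence, we will either:
a. Reject $H_0$ in favor of $H_a$: on average, more than $\mu_0$ out of 5 is scored by
his students (as of recent observations).
b. Fail to reject $H_0$. We do not have sufficient evidence to conclude that the
instructor's more recent students do better on the AP exam than his former
students.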
9.
1) Hypotheses: $H_0: p = 0.015$ vs. $H_a: p \neq 0.015$
2) Decision Rule: We will reject the null hypothesis when the likelihood of
observing something as small/large or smaller/larger than 16 out of 1000
machines is no larger than a 1% probability, under the assumption of the null
hypothesis. That is,
$P(X \text{ as or more extreme than } 16 \mid p = 0.015) \leq 0.01$
3) We will reject $H_0$ if the observed value of $X$ is smaller or larger than some
cutoff values of $X$. That is, if it is smaller than some value, say $a$, or larger than
some value, say $b$, then we will reject $H_0$. Remember, we set up a hypothesis
first, then do the test. Even though 16 is larger than 15 out of 1000, we did not
know this to begin with. We are still testing whether or not this value is
significantly different and do not care about the direction of the difference.
4) Based on the sample evidence, we will either:
a) Reject $H_0$ in favor of $H_a$: a proportion other than 0.015 of machines fail. That is,
either significantly fewer of them fail, or significantly more
of them fail.
b) Fail to reject $H_0$. We do not have sufficient evidence to conclude that new
machines fail more or less when compared to the old machine.
10.
1) Type I: We conclude the farmer‟s method reduces crop destruction, when there is
no difference; Type II: We conclude the farmer‟s method is no different than the
old method, when in fact there is less than 7% crop destruction with his new
method.
2) Type I: We conclude the instructor's students perform better than his former
students, when in fact there is no difference; Type II: We conclude that his new
students perform just as well as his former students, when in fact they do better.
3) Type I: We conclude that the new machines fail more or less than the former
machines, when in fact there is no difference; Type II: We conclude that there is
no difference between the failure rates of the new and old machines, when in fact
there is a significant difference.
11. Decreasing $\alpha$ means we will reject $H_0$ less often, as we set more stringent conditions
upon the rejection process. If we reject less often, then there is an elevated likelihood that we
may fail to reject, when in fact we should. This is precisely what a Type II error is.
