Basic Concepts
If you roll a six-sided die, there are six possible outcomes, and each of these
outcomes is equally likely. A six is as likely to come up as a three, and likewise for the
other four sides of the die. What, then, is the probability that a one will come up? Since
there are six possible outcomes, the probability is 1/6. What is the probability that either
a one or a six will come up? The two outcomes about which we are concerned (a one or
a six coming up) are called favorable outcomes. Given that all outcomes are equally
likely, we can compute the probability of a one or a six using the formula:
Probability = (number of favorable outcomes) / (number of possible outcomes)
In this case there are two favorable outcomes and six possible outcomes. So the
probability of throwing either a one or six is 1/3. Don't be misled by our use of the term
"favorable," by the way. You should understand it in the sense of "favorable to the event
in question happening." That event might not be favorable to your well-being. You might
be betting on a three, for example.
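As a quick check of the favorable-over-possible formula, here is a small Python sketch; the `probability` helper is just illustrative, not part of the text:

```python
from fractions import Fraction

def probability(favorable, possible):
    # Probability = number of favorable outcomes / number of possible outcomes,
    # valid only when all outcomes are equally likely.
    return Fraction(favorable, possible)

die = range(1, 7)                            # six equally likely faces
favorable = [x for x in die if x in (1, 6)]  # a one or a six
print(probability(len(favorable), 6))        # 1/3
```

Using `Fraction` keeps the answer exact, matching the 1/3 computed by hand.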
The above formula applies to many games of chance. For example, what is the
probability that a card drawn at random from a deck of playing cards will be an ace?
Since the deck has four aces, there are four favorable outcomes; since the deck has 52
cards, there are 52 possible outcomes. The probability is therefore 4/52 = 1/13. What
about the probability that the card will be a club? Since there are 13 clubs, the
probability is 13/52 = 1/4.
Let's say you have a bag with 20 cherries: 14 sweet and 6 sour. If you pick a
cherry at random, what is the probability that it will be sweet? There are 20 possible
cherries that could be picked, so the number of possible outcomes is 20. Of these 20
possible outcomes, 14 are favorable (sweet), so the probability that the cherry will be
sweet is 14/20 = 7/10. There is one potential complication to this example, however. It
must be assumed that the probability of picking any of the cherries is the same as the
probability of picking any other. This wouldn't be true if (let us imagine) the sweet
cherries are smaller than the sour ones. (The sour cherries would come to hand more
readily when you sampled from the bag.) Let us keep in mind, therefore, that when we
assess probabilities in terms of the ratio of favorable to all potential cases, we rely
heavily on the assumption of equal probability for all outcomes.
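The equal-probability caveat can be illustrated with a short simulation. In this Python sketch the bias weights are hypothetical: we imagine each sour cherry is twice as likely to come to hand as each sweet one.

```python
import random

random.seed(1)  # reproducible simulation
cherries = ["sweet"] * 14 + ["sour"] * 6

# Equal-probability draws: the relative frequency approaches 14/20 = 0.7
fair = [random.choice(cherries) for _ in range(100_000)]
print(fair.count("sweet") / len(fair))      # close to 0.7

# Hypothetical bias: each sour cherry is twice as likely to be picked.
# The favorable/possible ratio 14/20 no longer applies; the true
# probability of a sweet cherry becomes 14/26.
weights = [1] * 14 + [2] * 6
biased = random.choices(cherries, weights=weights, k=100_000)
print(biased.count("sweet") / len(biased))  # close to 14/26, about 0.538
```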
Probability of A and B
When two events are independent, the probability of both occurring is the product
of the probabilities of the individual events. More formally, if events A and B are
independent, then the probability of both A and B occurring is:
P(A and B) = P(A) x P(B)
where P(A and B) is the probability of events A and B both occurring, P(A) is the
probability of event A occurring, and P(B) is the probability of event B occurring.
If you flip a coin twice, what is the probability that it will come up heads both
times? Event A is that the coin comes up heads on the first flip and Event B is that the
coin comes up heads on the second flip. Since both P(A) and P(B) equal 1/2, the
probability that both events occur is P(A and B) = 1/2 x 1/2 = 1/4.
Conditional Probabilities
Suppose you draw two cards from a deck, one after the other, without replacement. Once the first card chosen is an ace, the probability that the second card chosen
is also an ace is called the conditional probability of drawing an ace. In this case, the
"condition" is that the first card is an ace. Symbolically, we write this as:
P(ace on second draw | ace on first draw)
The vertical bar "|" is read as "given," so the above expression is short for: "The
probability that an ace is drawn on the second draw given that an ace was drawn on the
first draw." What is this probability? After an ace is drawn on the first draw, there
are 3 aces left out of 51 total cards. This means that the probability that one of these
aces will be drawn is 3/51 = 1/17.
If Events A and B are not independent, then P(A and B) = P(A) x P(B|A).
Applying this to the problem of two aces, the probability of drawing two aces from a
deck is 4/52 x 3/51 = 1/221.
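The two-aces result can be verified by brute force, enumerating every ordered two-card draw. A Python sketch (the rank-plus-suit card labels are an arbitrary encoding):

```python
from fractions import Fraction
from itertools import permutations

# Encode the deck as rank + suit, e.g. "AS" = ace of spades
deck = [rank + suit for rank in "A23456789TJQK" for suit in "SHDC"]  # 52 cards

pairs = list(permutations(deck, 2))  # every ordered two-card draw: 52 * 51
both_aces = [p for p in pairs if p[0][0] == "A" and p[1][0] == "A"]
print(Fraction(len(both_aces), len(pairs)))  # 1/221
```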
Birthday Problem
If there are 25 people in a room, what is the probability that at least two of them
share the same birthday? If your first thought is that it is 25/365 = 0.068, you will be
surprised to learn it is much higher than that. This problem requires the application of
the sections on P(A and B) and conditional probability.
This problem is best approached by asking what is the probability that no two
people have the same birthday. Once we know this probability, we can simply subtract it
from 1 to find the probability that two people share a birthday.
If we choose two people at random, what is the probability that they do not share
a birthday? Of the 365 days on which the second person could have a birthday, 364 of
them are different from the first person's birthday. Therefore the probability is 364/365.
Let's define P2 as the probability that the second person drawn does not share a
birthday with the person drawn previously. P2 is therefore 364/365. Now define P3 as
the probability that the third person drawn does not share a birthday with anyone drawn
previously given that there are no previous birthday matches. P3 is therefore a
conditional probability. If there are no previous birthday matches, then two of the 365
days have been "used up," leaving 363 non-matching days. Therefore P3 = 363/365. In
like manner, P4 = 362/365, P5 = 361/365, and so on up to P25 = 341/365.
In order for there to be no matches, the second person must not match any
previous person and the third person must not match any previous person, and the
fourth person must not match any previous person, etc. Since P(A and B) = P(A)P(B),
all we have to do is multiply P2, P3, P4 ...P25 together. The result is 0.431. Therefore
the probability of at least one match is 0.569.
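The chain of conditional probabilities above is easy to multiply out in code. A minimal Python sketch:

```python
p_no_match = 1.0
for k in range(1, 25):               # persons 2 through 25
    p_no_match *= (365 - k) / 365    # P2 = 364/365, ..., P25 = 341/365
print(round(p_no_match, 3))          # 0.431
print(round(1 - p_no_match, 3))      # 0.569
```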
Gambler's Fallacy
A fair coin is flipped five times and comes up heads each time. What is the
probability that it will come up heads on the sixth flip? The correct answer is, of course,
1/2. But many people believe that a tail is more likely to occur after throwing five heads.
Their faulty reasoning may go something like this: "In the long run, the number of heads
and tails will be the same, so the tails have some catching up to do."
In this section, we shall develop a few counting techniques. Such techniques will
enable us to count quantities such as the number of possible outcomes of an event, or
the number of ways several choices can be combined, without having to list all of the
items.
Before we learn some of the basic principles of counting, let's see some of the
notation we'll need.
Addition Rule
If two events E1 and E2 are mutually exclusive (they cannot both happen), then
n(E1 or E2) = n(E1) + n(E2)
where
n(E) = Number of outcomes of event E
Multiplication Rule
Now consider the case when two events E1 and E2 are to be performed and the
events E1 and E2 are independent events i.e. one does not affect the other's outcome.
Example
Say the only clean clothes you've got are 2 t-shirts and 4 pairs of jeans. How
many different combinations can you choose?
Answer
2 × 4 = 8 possible combinations
We could write this as n(E1) × n(E2) = 2 × 4 = 8, where E1 is choosing a t-shirt
and E2 is choosing a pair of jeans.
Suppose that event E1 can result in any one of n(E1) possible outcomes; and for
each outcome of the event E1, there are n(E2) possible outcomes of event E2.
Together there will be n(E1) × n(E2) possible outcomes of the two events.
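The multiplication rule can be checked by listing every pairing explicitly. A Python sketch of the t-shirts-and-jeans example:

```python
from itertools import product

tshirts = ["t-shirt 1", "t-shirt 2"]
jeans = ["jeans 1", "jeans 2", "jeans 3", "jeans 4"]

outfits = list(product(tshirts, jeans))  # every (t-shirt, jeans) pairing
print(len(outfits))  # 8 = n(E1) x n(E2) = 2 x 4
```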
Probability Rules
There are three main rules associated with basic probability: the addition rule,
P(A or B) = P(A) + P(B) - P(A and B); the multiplication rule, P(A and B) = P(A) x P(B|A);
and the complement rule, P(not A) = 1 - P(A). You can think of the complement rule as
the 'subtraction rule' if it helps you to remember it.
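As a sanity check, all three rules can be verified on a concrete experiment. This Python sketch uses one roll of a fair die, with two events chosen purely for illustration:

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die
A = {2, 4, 6}                # event A: an even number
B = {5, 6}                   # event B: a number greater than 4

def p(event):
    return Fraction(len(event), len(outcomes))

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert p(A | B) == p(A) + p(B) - p(A & B)

# Multiplication rule: P(A and B) = P(A) x P(B|A)
p_B_given_A = Fraction(len(A & B), len(A))
assert p(A & B) == p(A) * p_B_given_A

# Complement rule: P(not A) = 1 - P(A)
assert p(outcomes - A) == 1 - p(A)
print("all three rules check out")
```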
Types of Distributions
Bernoulli Distribution
A Bernoulli distribution has only two possible outcomes and a single trial, for
example one toss of a coin. Here, the occurrence of a head denotes success, and the
occurrence of a tail denotes failure.
For a fair coin, probability of getting a head = 0.5 = probability of getting a tail,
since the two outcomes are equally likely.
The probability of success (p) need not be the same as the probability of failure,
though. Consider an uneven contest, say a fight between me and Undertaker, where my
probability of winning is low.
Here, the probability of success = 0.15 and the probability of failure = 0.85. The
expected value is exactly what it sounds like. If I punch you, I may expect you to punch
me back. Basically, the expected value of any distribution is the mean of the distribution.
The expected value of a random variable X from a Bernoulli distribution is found as
follows:
E(X) = 1 x p + 0 x (1 - p) = p
There are many examples of the Bernoulli distribution, such as whether it is going
to rain tomorrow or not (where rain denotes success and no rain denotes failure), or
winning (success) versus losing (failure) a game.
Uniform Distribution
When you roll a fair die, the outcomes are 1 to 6. The probabilities of getting
these outcomes are equally likely and that is the basis of a uniform distribution. Unlike
Bernoulli Distribution, all the n number of possible outcomes of a uniform distribution are
equally likely.
The graph of a uniform distribution is flat: every outcome has the same
probability, so the shape of the curve is rectangular, which is why the uniform
distribution is also called the rectangular distribution.
The number of bouquets sold daily at a flower shop is uniformly distributed with a
maximum of 40 and a minimum of 10.
Let’s try calculating the probability that the daily sales will fall between 15 and 30.
The probability that daily sales will fall between 15 and 30 is (30-15)*(1/(40-10)) =
0.5. Similarly, the probability that daily sales are greater than 20 is (40-20)*(1/(40-10)) =
0.667.
The standard uniform density has parameters a = 0 and b = 1, so the PDF for the
standard uniform density is given by:
f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 otherwise
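The bouquet probabilities above can be reproduced with a one-line density calculation. A Python sketch (the `uniform_prob` helper is illustrative, with the shop's a = 10 and b = 40 as defaults):

```python
def uniform_prob(x1, x2, a=10, b=40):
    # P(x1 < X <= x2) = (x2 - x1) * 1/(b - a) for X uniform on [a, b]
    return (x2 - x1) / (b - a)

print(uniform_prob(15, 30))            # 0.5
print(round(uniform_prob(20, 40), 3))  # 0.667
```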
Binomial Distribution
Suppose that you won a coin toss today, and this indicates a successful event.
You toss again, but you lose this time. If you win a toss today, this does not mean that
you will win the toss tomorrow. Let's assign a random variable, say X, to the number of
times you won the toss. What can be the possible values of X? It can be any number
from 0 up to the number of times you tossed the coin.
There are only two possible outcomes. Head denoting success and tail denoting
failure. Therefore, probability of getting a head = 0.5 and the probability of failure can be
easily computed as: q = 1- p = 0.5.
A distribution where only two outcomes are possible, such as success or failure,
gain or loss, win or lose, and where the probability of success is the same for every
trial, is called a Binomial Distribution.
The outcomes need not be equally likely. Remember the example of a fight
between me and Undertaker? So, if the probability of success in an experiment is 0.2
then the probability of failure can be easily computed as q = 1 – 0.2 = 0.8.
Each trial is independent since the outcome of the previous toss doesn’t
determine or affect the outcome of the current toss. An experiment with only two
possible outcomes repeated n number of times is called binomial. The parameters of a
binomial distribution are n and p where n is the total number of trials and p is the
probability of success in each trial.
On the basis of the above explanation, the properties of a Binomial Distribution are:
1. Each trial is independent.
2. There are only two possible outcomes in a trial: success or failure.
3. A total of n identical trials are conducted.
4. The probability of success is the same for every trial.
When the probability of success does not equal the probability of failure, the
graph of the binomial distribution is skewed to one side; when the probability of
success equals the probability of failure, the graph is symmetric.
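The binomial probabilities behind these graphs can be computed directly from n and p. A Python sketch using the standard binomial PMF (the n = 10 example values are chosen for illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for n independent trials with success probability p
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(5, 10, 0.5), 3))  # 0.246 (peak of the symmetric case)
print(round(binom_pmf(2, 10, 0.2), 3))  # 0.302 (skewed case with p = 0.2)

# When p = 0.5 the distribution is symmetric: P(X = k) = P(X = n - k)
assert binom_pmf(3, 10, 0.5) == binom_pmf(7, 10, 0.5)
```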
Normal Distribution
A normal distribution is symmetric about its mean, with a bell-shaped curve. It is
characterized by two parameters, the mean (µ) and the standard deviation (σ), and
many natural measurements are approximately normally distributed.
Poisson Distribution
Suppose you work at a call center, approximately how many calls do you get in a
day? It can be any number. The total number of calls at a call center in a day can be
modeled by a Poisson distribution. Some more examples are the number of emergency
calls recorded at a hospital in a day, the number of customers arriving at a store in an
hour, or the number of printing errors on a page of a book.
You can now think of many examples following the same course. Poisson
Distribution is applicable in situations where events occur at random points of time and
space wherein our interest lies only in the number of occurrences of the event.
1. Any successful event should not influence the outcome of another successful
event.
2. The probability of exactly one success in a short interval is proportional to the
length of the interval.
3. The probability of more than one success in a very small interval approaches
zero as the interval shrinks.
Let µ denote the mean number of events in an interval of length t. Then µ = λt,
where λ is the rate at which events occur per unit of time or space. The mean µ is the
parameter of this distribution, and the probability of observing k events in the interval is
P(X = k) = e^(-µ) µ^k / k!
As the mean µ increases, the curve of the Poisson distribution shifts to the right.
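The Poisson PMF is straightforward to evaluate. A Python sketch (the mean of 2 calls per interval is a made-up value for illustration):

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    # P(X = k) = e^(-mu) * mu^k / k!  with mu = lambda * t
    return exp(-mu) * mu**k / factorial(k)

mu = 2  # assumed mean: an average of 2 calls per interval
print(round(poisson_pmf(0, mu), 3))  # 0.135
print(round(poisson_pmf(2, mu), 3))  # 0.271
```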
Exponential Distribution
Let's consider the call center example one more time. What about the interval of
time between the calls? Here, the exponential distribution comes to our rescue: it
models the interval of time between successive calls.
The exponential distribution is also widely used for survival analysis, from the
expected life of a machine to the expected life of a human. Its mean is 1/λ.
For survival analysis, λ is called the failure rate of a device at any time t, given
that it has survived up to t.
Also, the greater the rate λ, the faster the density curve drops; the lower the rate,
the flatter the curve.
P{X ≤ x} = 1 - e^(-λx), which corresponds to the area under the density curve to
the left of x.
P{X > x} = e^(-λx), which corresponds to the area under the density curve to the
right of x.
P{x1 < X ≤ x2} = e^(-λx1) - e^(-λx2), which corresponds to the area under the
density curve between x1 and x2.
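The three areas can be computed and cross-checked in a few lines. A Python sketch (λ = 0.5 is an arbitrary rate chosen for illustration):

```python
from math import exp

rate = 0.5  # lambda, an arbitrary rate chosen for illustration

def cdf(x):            # P(X <= x) = 1 - e^(-lambda * x)
    return 1 - exp(-rate * x)

def tail(x):           # P(X > x) = e^(-lambda * x)
    return exp(-rate * x)

def between(x1, x2):   # P(x1 < X <= x2) = e^(-lambda * x1) - e^(-lambda * x2)
    return tail(x1) - tail(x2)

# The left and right areas always cover the whole curve
assert abs(cdf(2) + tail(2) - 1) < 1e-12
print(round(between(1, 3), 3))  # 0.383
```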
Population vs sample
The population is the entire group that you want to draw conclusions about.
The sample is the specific group of individuals that you will collect data from.
The population can be very broad or quite narrow: maybe you want to make inferences about
the whole adult population of your country; maybe your research focuses on customers
of a certain company, patients with a specific health condition, or students in a single
school.
Sampling frame
The sampling frame is the actual list of individuals that the sample will be drawn
from. Ideally, it should include the entire target population (and nobody who is not part
of that population).
Example: You are doing research on working conditions in a company of 1,000
employees. Your population is all 1,000 employees of the company; your sampling
frame is the company's HR database, which lists the names and contact details of every
employee.
Sample size
The number of individuals in your sample depends on the size of the population,
and on how precisely you want the results to represent the population as a whole.
You can use a sample size calculator to determine how big your sample should
be. In general, the larger the sample size, the more accurately and confidently you can
make inferences about the whole population.
Probability sampling methods
Probability sampling means that every member of the population has a chance of
being selected. It is mainly used in quantitative research. If you want to produce results
that are representative of the whole population, you need to use a probability sampling
technique.
1. Simple Random Sampling
In a simple random sample, every member of the population has an equal chance of
being selected. Your sampling frame should include the whole population.
To conduct this type of sampling, you can use tools like random number generators or
other techniques that are based entirely on chance.
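Drawing a simple random sample with a random number generator might look like this in Python (the employee list is a hypothetical sampling frame):

```python
import random

random.seed(42)  # for reproducibility
# Hypothetical sampling frame: the company's 1,000 employees
employees = [f"employee_{i:04d}" for i in range(1, 1001)]

sample = random.sample(employees, k=100)  # each employee equally likely, no repeats
print(len(sample), len(set(sample)))      # 100 distinct people
```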
Example
You want to select a simple random sample of 100 employees. You assign a
number from 1 to 1000 to every employee in the company database, and use a random
number generator to select 100 numbers.
2. Systematic Sampling
Systematic sampling is similar to simple random sampling, but it is usually
slightly easier to conduct: every member of the population is listed with a number, but
instead of randomly generating numbers, individuals are chosen at regular intervals.
Example
All employees of the company are listed in alphabetical order. From the first 10
numbers, you randomly select a starting point: number 6. From number 6 onwards,
every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with
a sample of 100 people.
If you use this technique, it is important to make sure that there is no hidden
pattern in the list that might skew the sample. For example, if the HR database groups
employees by team, and team members are listed in order of seniority, there is a risk
that your interval might skip over people in junior roles, resulting in a sample that is
skewed towards senior employees.
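A sketch of systematic sampling matching the example above (the employee list is hypothetical; the starting point is drawn at random from the first 10 positions):

```python
import random

random.seed(6)
# Hypothetical alphabetical list of 1,000 employees
employees = sorted(f"employee_{i:04d}" for i in range(1, 1001))

start = random.randrange(10)   # random starting point among the first 10
sample = employees[start::10]  # every 10th person from there on
print(len(sample))             # 100
```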
3. Stratified Sampling
This sampling method is appropriate when the population has mixed characteristics,
and you want to ensure that every characteristic is proportionally represented in the
sample.
You divide the population into subgroups (called strata) based on the relevant
characteristic (e.g. gender, age range, income bracket, job role).
From the overall proportions of the population, you calculate how many people should
be sampled from each subgroup. Then you use random or systematic sampling to
select a sample from each subgroup.
Example
The company has 800 female employees and 200 male employees. You want to
ensure that the sample reflects the gender balance of the company, so you sort the
population into two strata based on gender. Then you use random sampling on each
group, selecting 80 women and 20 men, which gives you a representative sample of
100 people.
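The proportional allocation in this example can be sketched as follows (the stratum lists are hypothetical placeholders for the real employee records):

```python
import random

random.seed(0)
# Hypothetical strata matching the example: 800 women, 200 men
women = [f"woman_{i}" for i in range(800)]
men = [f"man_{i}" for i in range(200)]

sample_size = 100
total = len(women) + len(men)

# Sample each stratum in proportion to its share of the population
sample = (random.sample(women, sample_size * len(women) // total) +
          random.sample(men, sample_size * len(men) // total))
print(len(sample))  # 100: 80 women and 20 men
```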
4. Cluster Sampling
Cluster sampling also involves dividing the population into subgroups, but each
subgroup should have similar characteristics to the whole population. Instead of
sampling individuals from each subgroup, you randomly select entire subgroups.
If it is practically possible, you might include every individual from each sampled
cluster. If the clusters themselves are large, you can also sample individuals from within
each cluster using one of the techniques above.
This method is good for dealing with large and dispersed populations, but there is
more risk of error in the sample, as there could be substantial differences between
clusters. It’s difficult to guarantee that the sampled clusters are really representative of
the whole population.
Example
The company has offices in 10 cities across the country (all with roughly the
same number of employees in similar roles). You don’t have the capacity to travel to
every office to collect your data, so you use random sampling to select 3 offices – these
are your clusters.
Non-probability sampling methods
In a non-probability sample, individuals are selected based on non-random
criteria, and not every individual has a chance of being included.
1. Convenience Sampling
A convenience sample simply includes the individuals who happen to be most
accessible to the researcher.
This is an easy and inexpensive way to gather initial data, but there is no way to tell if
the sample is representative of the population, so it can’t produce generalizable results.
Example
You are researching opinions about student support services in your university,
so after each of your classes, you ask your fellow students to complete a survey on the
topic. This is a convenient way to gather data, but as you only surveyed students taking
the same classes as you at the same level, the sample is not representative of all the
students at your university.
2. Voluntary Response Sampling
Similar to a convenience sample, a voluntary response sample is mainly based
on ease of access; instead of the researcher choosing participants, people volunteer
themselves (for example, by responding to a public survey). Voluntary response
samples are always at least somewhat biased, as some people will inherently be more
likely to volunteer than others.
Example
You send out the survey to all students at your university and a lot of students
decide to complete it. This can certainly give you some insight into the topic, but the
people who responded are more likely to be those who have strong opinions about the
student support services, so you can’t be sure that their opinions are representative of
all students.
3. Purposive Sampling
This type of sampling involves the researcher using their judgement to select a
sample that is most useful to the purposes of the research.
It is often used in qualitative research, where the researcher wants to gain detailed
knowledge about a specific phenomenon rather than make statistical inferences. An
effective purposive sample must have clear criteria and rationale for inclusion.
Example
You want to know more about the opinions and experiences of disabled students
at your university, so you purposefully select a number of students with different support
needs in order to gather a varied range of data on their experiences with student
services.
4. Snowball Sampling
If the population is hard to access, snowball sampling can be used to recruit
participants via other participants. The number of people you have access to
“snowballs” as you get in contact with more people.
Example
You are researching experiences of homelessness in your city. Since there is no
list of all homeless people in the city, probability sampling is not possible. You meet one
person who agrees to participate, and they put you in contact with other people they
know in the same situation.
Sources:
https://www.investopedia.com/terms/p/probabilitydistribution.asp
https://www.analyticsvidhya.com/blog/2017/09/6-probability-distributions-data-science/
https://study.com/academy/lesson/basic-probability-theory-rules-formulas.html
http://onlinestatbook.com/2/probability/basic.html
https://www.scribbr.com/methodology/sampling-methods/
https://www.intmath.com/counting-probability/2-basic-principles-counting.php