Chapter 3
Exploring A Distribution
1. Always plot your data: make a graph
(histogram, stemplot, normal probability plot,
boxplot, dotplot)
2. Look for overall patterns (shape, center,
spread) and for striking deviations such as
outliers.
3. Calculate a numerical summary to briefly
describe center and spread.
4. (What we will be studying in this chapter)
Sometimes the overall pattern of a large
number of observations is so regular that we
can describe it by a smooth curve.
BPS - 5th Ed.
Density Curves
Example: here is a
histogram of vocabulary
scores of 947 seventh
graders.
The smooth curve
drawn over the
histogram is a
mathematical model for
the distribution.
Density Curves
Density curves are defined by probability
density functions. A probability density
function is a formula used to specify the
curve and compute areas under it. These
areas give us probabilities/proportions for
the random variable.
The function must have two properties:
1. The total area under the graph of the
function is equal to 1 (i.e. the total
probability is 1)
2. The function is always greater than or
equal to zero.
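The two properties can be checked numerically. The sketch below is only an illustration: the density f(x) = 2x on [0, 1] and the helper name `area_under` are assumptions made for this example, not part of the slides.

```python
def area_under(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def density(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


# Property 1: the total area under the curve is 1.
total_area = area_under(density, 0, 1)

# Property 2: the curve never dips below zero (checked on a grid of points).
never_negative = all(density(k / 100) >= 0 for k in range(-50, 151))

print(f"total area = {total_area:.4f}, never negative: {never_negative}")
```

A function that fails either check cannot serve as a density curve.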
Density Curves
Let's look at those two properties and what they
give us.
1. The total area under the graph of the function
is equal to 1 (i.e. the total probability is 1)
That will let us determine the probability
(proportion) a continuous random variable
takes on a value between two numbers.
The probability the variable takes on values
between the two numbers of interest will be
the area under the curve.
2. The function is always greater than or equal to
zero. This ensures that we have something called
a probability distribution (to be studied later).
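As a small worked illustration of "probability = area between two numbers", the sketch below uses a hypothetical density f(x) = 2x on [0, 1], whose area to the left of x is x squared. The density and the function names are assumptions chosen to keep the arithmetic simple.

```python
def cumulative_area(x):
    """Area under the hypothetical density f(x) = 2x (on [0, 1]) to the left of x."""
    x = min(max(x, 0.0), 1.0)  # no area lies outside [0, 1]
    return x * x


def probability_between(a, b):
    """Proportion of values between a and b: the area under the curve there."""
    return cumulative_area(b) - cumulative_area(a)


p = probability_between(0.2, 0.5)
print(f"P(0.2 < X < 0.5) = {p:.2f}")  # 0.25 - 0.04 = 0.21
```

Subtracting one left-hand area from another is the same move used later with a table of areas under the standard normal curve.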
Density Curves
Example: the areas of
the shaded bars in this
histogram represent the
proportion/probability of
scores in the observed
data that are less than
or equal to 6.0. This
probability is equal to
0.303.
Density Curves
Example: now the area
under the smooth curve
to the left of 6.0 is
shaded. If the scale is
adjusted so the total
area under the curve is
exactly 1, then this
curve is called a density
curve. The
probability/proportion of
scores to the left of 6.0
is now equal to 0.293.
Likelihood Interpretation
[Figure: a probability density function. The shaded area between 4 and 8 is the probability of being between 4 and 8; labels mark more likely and less likely values.]
Density Curves
The median of a density curve is the
equal-areas point, the point that divides
the area under the curve in half.
The mean of a density curve is the
balance point, at which the curve would
balance if made of solid material.
The mean and the median are the same
for a symmetric density curve. They both
lie at the center. The mean of a skewed
curve is pulled away from the median in
the direction of the long tail.
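To see the mean being pulled toward the long tail, here is a minimal sketch using a hypothetical right-skewed density, f(x) = 2(1 - x) on [0, 1]. The density, the helper `integrate`, and the bisection search are all assumptions made for illustration.

```python
def density(x):
    """Hypothetical right-skewed density: f(x) = 2(1 - x) on [0, 1]."""
    return 2 * (1 - x) if 0 <= x <= 1 else 0.0


def integrate(f, a, b, steps=2_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Mean: the balance point, computed as the integral of x * f(x).
mean = integrate(lambda x: x * density(x), 0, 1)

# Median: the equal-areas point, found by bisection on the area to its left.
low, high = 0.0, 1.0
for _ in range(40):
    mid = (low + high) / 2
    if integrate(density, 0, mid) < 0.5:
        low = mid
    else:
        high = mid
median = (low + high) / 2

print(f"median = {median:.3f}, mean = {mean:.3f}")
```

For this curve the long tail is on the right, and the mean (about 0.333) sits to the right of the median (about 0.293).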
[Figure: Normal curve with areas marked. Tail areas: 0.15%, 2.5%, 16%. Band areas: 16% - 2.5% = 13.5% and 68%/2 = 34%.]
Question
Data sets consisting of physical measurements
(heights, weights, lengths of bones, and so on) for
adults of the same species and sex tend to follow
a similar pattern. The pattern is that most
individuals are clumped around the average, with
numbers decreasing the farther values are from
the average in either direction. Describe what
shape a histogram (or density curve) of such
measurements would have.
Empirical Rule
The Empirical Rule is true for the Normal Distribution:
Approximately 68%
(more precisely 68.27%)
of the values lie between
(μ - σ) and (μ + σ).
Approximately 95%
(more precisely 95.45%)
of the values lie between
(μ - 2σ) and (μ + 2σ).
Approximately 99.7%
(more precisely 99.73%)
of the values lie between
(μ - 3σ) and (μ + 3σ).
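The percentages in the rule can be reproduced from the standard normal curve. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_within` is an assumption for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Area under the Normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.2%}")
```

Values worked out from a four-decimal printed table can differ from these in the last digit because of rounding.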
Approximation to Histogram
When we collect data on a continuous
variable, we can draw a histogram to
summarize its distribution.
However, using histograms has several
drawbacks:
Histograms are based on classes, i.e.,
grouped values of the variable, so there are
always grouping "errors".
It is difficult to make detailed calculations.
Instead of using a histogram, we can use a
probability density function that is an
approximation of the histogram.
Normal
Approximation
Approximation to Histogram
(cont.)
When we model a relative frequency
distribution with a normal probability
distribution, we use the area under the
normal curve to:
Approximate the areas of the bars in the
histogram being modeled.
Approximate proportions that are too detailed to
be computed from just the histogram.
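The two uses above can be sketched with simulated data. Everything here is an assumption for illustration: the data are generated, not observed, and the class boundaries 68 to 70 and the cut-off 68.5 are arbitrary choices.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the simulated data are reproducible

# Simulated measurements (not real data): 10,000 draws, mean 70, std dev 2.8.
values = [random.gauss(70.0, 2.8) for _ in range(10_000)]

# Area of one histogram bar: the relative frequency of the class 68 to 70.
bar_proportion = sum(68.0 <= v < 70.0 for v in values) / len(values)

# Area under a Normal curve fitted to the same data, over the same class.
model = NormalDist.from_samples(values)
curve_area = model.cdf(70.0) - model.cdf(68.0)

# A proportion the class boundaries cannot give directly: values below 68.5.
below_68_5 = model.cdf(68.5)

print(f"bar: {bar_proportion:.3f}  curve: {curve_area:.3f}  below 68.5: {below_68_5:.3f}")
```

The bar area and the curve area should be close, while the last figure falls inside a class and so cannot be read off the histogram.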
[Figure: Normal curve of heights with mean 70.0 and standard deviation 2.8, marking μ - σ = 70.0 - 2.8 and μ + σ = 70.0 + 2.8, with a 95% label.]
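[Figure: Normal curve of height values with mean 70, marking -1 and +1 standard deviations (+1 at 72.8). A tail area of 16% leaves the unknown area ? = 84%.]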
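For the heights with mean 70 and standard deviation 2.8, the 84% below 72.8 can be checked against the Normal curve itself. This is a sketch using the standard-library `statistics.NormalDist`; the variable names are assumptions.

```python
from statistics import NormalDist

# Mean and standard deviation of the height values, as given in the slides.
heights = NormalDist(mu=70.0, sigma=2.8)

area_below = heights.cdf(72.8)  # area to the left of 72.8 (one std dev above the mean)
area_above = 1 - area_below     # area in the upper tail

print(f"below 72.8: {area_below:.1%}, above 72.8: {area_above:.1%}")
```

The rule-of-thumb figures of 84% and 16% are roundings of these areas.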
[Figure: Normal curve of height values with 68 and 70 marked; the area at 68 is the unknown "?".]
[...] specified by their means and standard deviations.
[Figure: part of Table A, annotated to show where to enter the table and where to read the area; a later annotation reads "Included too much".]
Caution!
When the Z-score is off the standard normal table:
State the area under the standard normal curve to the left of Z =
-3.50 (or the right of Z = 3.50) as < 0.0002 (not 0).
P(Z ≤ z) < 0.0002 for any z ≤ -3.50
P(Z ≥ z) < 0.0002 for any z ≥ 3.50
State the area under the standard normal curve to the left of Z =
3.50 (or the right of Z = -3.50) as > 0.9998 (not 1).
P(Z ≤ z) > 0.9998 for any z ≥ 3.50
P(Z ≥ z) > 0.9998 for any z ≤ -3.50
Standardized Scores
Let's look back at our Health and Nutrition
Examination Study of 1976-1980 where the
mean was 70 and the standard deviation of
this normally distributed variable was 2.8.
How many standard deviations is 68 from
70?
standardized score =
(observed value minus mean) / (std dev)
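The formula can be applied directly to the question about 68. This is a minimal sketch; the function name `standardized_score` is an assumption for this example.

```python
def standardized_score(observed, mean, std_dev):
    """Number of standard deviations the observed value lies from the mean."""
    return (observed - mean) / std_dev


z = standardized_score(68, 70, 2.8)
print(round(z, 2))  # -0.71
```

The negative sign says that 68 lies below the mean, about 0.71 standard deviations away.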
[Figure: Normal curve with the unknown area at 68 marked "?". Height values 68 and 70 correspond to standardized values -0.71 and 0.]
Table A:
Standard Normal Probabilities
See pages 690-691 in text for Table A.
(the Standard Normal Table)
Look up the closest standardized score (z)
in the table.
Find the probability (area) to the left of
the standardized score.
Table A:
Standard Normal Probabilities
z     .00    .01    .02
-0.8  .2119  .2090  .2061
-0.7  .2420  .2389  .2358
-0.6  .2743  .2709  .2676
[Figure: the area to the left of 68 (standardized value -0.71) is .2389.]
[Figure: the area to the right of 68 is 1 - .2389 = .7611.]
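The table lookup for a height of 68 can be reproduced in software. This sketch uses the standard-library `statistics.NormalDist` in place of the printed table and rounds z to two decimals, as the table does; the variable names are assumptions.

```python
from statistics import NormalDist

# Standardize 68 using mean 70 and standard deviation 2.8, rounded as in the table.
z = round((68 - 70) / 2.8, 2)  # -0.71

area_left = NormalDist().cdf(z)  # area under the standard normal curve left of z
area_right = 1 - area_left       # the rest of the area lies to the right

print(f"z = {z}, left: {area_left:.4f}, right: {area_right:.4f}")
```

Using the unrounded z instead would change the result slightly in the third decimal place, which is why software output and table answers can differ a little.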
[Figure: Normal curve of height values with 70 marked and an unknown height "?" below it.]
Table A:
Standard Normal Probabilities
z     .07    .08    .09
-1.3  .0853  .0838  .0823
-1.2  .1020  .1003  .0985
-1.1  .1210  .1190  .1170
[Figure: the unknown height "?" has area .10 to its left; its standardized value is -1.28, with height 70 at standardized value 0.]
z = (x - μ) / σ
x = μ + zσ
observed value =
mean plus [(standardized score) × (std dev)]
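Going from an area back to an observed value can be sketched as follows, assuming the question is the height with area 0.10 to its left. The standard-library `NormalDist.inv_cdf` stands in for the reverse table lookup, and z is rounded to two decimals as the table would give it.

```python
from statistics import NormalDist

mean, std_dev = 70.0, 2.8

# Reverse lookup: the standardized score with area 0.10 to its left.
z = round(NormalDist().inv_cdf(0.10), 2)  # -1.28

# Unstandardize: observed value = mean + (standardized score) * (std dev).
x = mean + z * std_dev

print(f"z = {z}, height = {x:.1f}")  # height is about 66.4
```

Under these assumptions, about 10% of the heights fall below roughly 66.4.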