Picturing Distributions With Graphs: 1 BPS - 5th Ed

Chapter 1
Picturing Distributions with Graphs
BPS - 5th Ed.
Statistics
Statistics is a science that involves the extraction of
information from numerical data obtained during an
experiment or from a sample. It involves the design
of the experiment or sampling procedure, the
collection and analysis of the data, and making
inferences (statements) about the population based
upon information in a sample.
STATISTICS
So put another way
Statistics is the Science of Reasoning
with Data. It is the science of
collecting, organizing, summarizing,
and analyzing information to draw
conclusions or answer questions.
Individuals and Variables

Individuals
the objects described by a set of data
may be people, animals, or things
Variable
any characteristic of an individual
can take different values for different
individuals
[Figure: SAS Institute Inc. course-attendee data form, shown as an example of gathering data]
Data
Statistics is the
Science of
Reasoning with
Data
___________ First and Last Initials
___________ Gender (Female / Male)
___________ State of Birth (2-character abbreviation)
___________ Number of Years with Present Employer
___________ Profession
So what is Data?
Data is the Collection of
values of a set of
characteristics for a set
of individuals, collected
for a particular study.
Profession choices on the form:
Executive
Finance/Accounting
Business Analyst
Engineering
Sales/Marketing
Production
R&D
Statistician
Information Systems
Consultant
Trainer/Educator
Full-time Student
Other
SAS and all other SAS Institute Inc. product or service names are registered trademarks of SAS Institute Inc.
Components of a Data Set

Data Set (or Table): Collection of
all observations (i.e., values of all
variables) for a set of individuals,
collected for a particular study.
Individual (Object or Subject

or Experimental Unit): Entity
on which information is
collected.
Variable: Measurement or
characteristic of interest for an
individual. (For example:
name, address, customer
number, and so on.)
Observation: Set of all
measurements (i.e., variable
values) collected from a single
individual.
Row: Single observation
Column: Single variable

Initials  Gender  State of Birth  Years with Employer  Profession
JM        M       GA              17                   F
TM        M       CT              5                    E
GS        F       FL              2                    A
DW        F       AL              10                   D
BW        F       NC              12                   B
.         .       .               .                    .
.         .       .               .                    .

What does Statistics

Involve?
Statistics involves: Collecting, Organizing, Summarizing, and
Analyzing information to draw conclusions or answer questions.
A population:
Is the group to be studied.
Includes all of the individuals in the group.
A sample:
Is a subset of the population.
Is often used in analyses because getting access to the
entire population is impractical.
Descriptive Statistics: Statements about the Sample.
Organizing and summarizing the information collected.
Inferential Statistics: Generalization of sample results to
statements about the Population.
Process of Statistics
Process of statistics consists of 4 steps:

Step 1: Identify the research objective.
What questions are to be answered?
What group should be studied?
Step 2: Collect the information needed.

Can you access the entire population?
How can you collect a good sample?
What other methods are available and appropriate?
Step 3: Organize and summarize the information.

Descriptive statistics
Visual methods such as charts and graphs.
Numeric methods such as calculations.
Step 4: Draw conclusions from the information.

Inferential statistics
Various methods that are appropriate for different
questions and different types of data sets.
Qualitative vs. Quantitative

Variables
Categorical or Qualitative
Variable whose values are attributes or characteristics.
Allows researchers to categorize the individual.
Quantitative
Variable whose values are numerical measures that
arithmetic operations can meaningfully be performed
on. Allows researchers to summarize via averages.
Discrete variables - Variables that have a finite or a
countable number of possibilities. Often these
variables are counts.
Continuous variables - Variables that have an
infinite, not countable, number of possibilities.
Often these variables are measurements.
Descriptive Statistics
After we collect the raw data (from a sample survey or
a designed experiment), we can:
Describe the data using visual methods (Chapter 1)
Describe the data using numeric methods (Chapter
2)
Different methods are appropriate for different types
of data.
Descriptive Statistics: Statements about the Sample
Focus is on interpreting and presenting the data that has been
collected via the sample.
No attempt is made to generalize to other (larger) groups, such
as the population.
Case Study
The Effect of Hypnosis
on the
Immune System
reported in Science News, Sept. 4, 1993, p. 153
Case Study
The Effect of Hypnosis
on the
Immune System
Objective:
To determine if hypnosis strengthens the
disease-fighting capacity of immune cells.
Case Study
65 college students.
33 easily hypnotized
32 not easily hypnotized
white blood cell counts measured

all students viewed a brief video
about the immune system.
Case Study
Students randomly assigned to one
of three conditions
subjects hypnotized, given mental
exercise
subjects relaxed in sensory deprivation
tank
control group (no treatment)
Case Study
white blood cell counts re-measured after
one week
the two white blood cell counts are
compared for each group
results
hypnotized group showed larger jump in white
blood cells
easily hypnotized group showed largest
immune enhancement
Case Study
Variables measured
categorical
Easy or difficult to achieve hypnotic trance
Group assignment
quantitative
Pre-study white blood cell count
Post-study white blood cell count
Case Study
Weight Gain Spells
Heart Risk for Women
Weight, weight change, and coronary heart disease
in women. W.C. Willett, et al., vol. 273(6), Journal
of the American Medical Association, Feb. 8, 1995.
(Reported in Science News, Feb. 4, 1995, p. 108)
Case Study
Weight Gain Spells
Heart Risk for Women
Objective:
To recommend a range of body mass index
(a function of weight and height) in terms of
coronary heart disease (CHD) risk in women.
Case Study
Study started in 1976 with 115,818
women aged 30 to 55 years and
without a history of previous CHD.
Each woman's weight (body mass)
was determined.
Each woman was asked her weight at
age 18.
Case Study
The cohort of women was followed
for 14 years.
The number of CHD cases (fatal and
nonfatal) was counted (1292
cases).
Case Study
Variables measured
quantitative
Age (in 1976)
Weight in 1976
Weight at age 18
categorical
Incidence of coronary heart disease
Smoker or nonsmoker
Family history of heart disease
Distribution
Tells what values a variable takes
and how often it takes these values
Can be a table, graph, or function
Some Ways of Displaying

Distributions
Categorical variables
Pie charts
Bar graphs
Quantitative variables
Histograms
Stemplots (stem-and-leaf plots)
Examples of Displaying
Data
The next 6 slides will show different
ways to display the same
information.
The first 3 slides show different ways
to display information on the Class
Make-up on the First Day of Class
The next 3 slides display information
on U.S. solid waste.
Class Make-up on First Day

Data Table
Year        Count   Percent
Freshman    18      41.9%
Sophomore   10      23.3%
Junior      6       14.0%
Senior      9       20.9%
Total       43      100.1%
Frequency and Relative Frequency Distribution
[Figure: pie chart of the class make-up on the first day]
[Figure: bar graph of the class make-up on the first day]
Example: U.S. Solid Waste (2000)
Data Table
Material                    Weight (million tons)   Percent of total
Food scraps                 25.9                    11.2 %
Glass                       12.8                    5.5 %
Metals                      18.0                    7.8 %
Paper, paperboard           86.7                    37.4 %
Plastics                    24.7                    10.7 %
Rubber, leather, textiles   15.8                    6.8 %
Wood                        12.7                    5.5 %
Yard trimmings              27.7                    11.9 %
Other                       7.5                     3.2 %
Total                       231.9                   100.0 %
[Figure: pie chart of U.S. solid waste (2000)]
[Figure: bar graph of U.S. solid waste (2000)]
Organizing Qualitative Data

Organize qualitative data in tables.
Construct bar graphs.
Construct pie charts.
Reminder: Raw qualitative data comes as a list of
values; each value is one out of a set of categories.
Allows researcher to categorize or classify individuals
into groups.
Frequency/Relative Frequency
A frequency distribution lists:
Each of the categories.
The frequency, or the count, of the observations that
belong to each category.
(Frequency = Counts)
A relative frequency distribution lists:
Each of the categories.
The relative frequency, or the proportions (or
percents), of the observations out of the total that
belong to each category.
(Relative Frequency = Proportions or Percents)
Example
Consider the following data set:
blue, blue, green, red, red, blue, red, blue
The frequency distribution for this qualitative
data is:
Frequency Table
Color   Frequency
Blue    ____
Green   ____
Red     ____
The most commonly occurring color is ______

Example
Consider the following data set:
blue, blue, green, red, red, blue, red, blue
The frequency distribution for this
qualitative data is:
Frequency Table
Color   Frequency
Blue    4
Green   1
Red     3
The most commonly occurring color is Blue

Example (cont.)
The relative frequency distribution is computed as
follows:
Sum of all frequencies = _____
Blue has a relative frequency of __________________
Green has a relative frequency of _________________
Red has a relative frequency of ___________________
Relative Frequency Table
Color   Relative Frequency (Proportion)   Relative Frequency (Percent)
Blue    ____                              ____
Green   ____                              ____
Red     ____                              ____
Example (cont.)
The relative frequency distribution is computed as
follows:
Sum of all frequencies = 8
Blue has a relative frequency of 4/8 or 0.5 or 50%
Green has a relative frequency of 1/8 or 0.125 or 12.5%
Red has a relative frequency of 3/8 or 0.375 or 37.5%
Relative Frequency Table
Color   Relative Frequency (Proportion)   Relative Frequency (Percent)
Blue    4/8 or 1/2                        50%
Green   1/8                               12.5%
Red     3/8                               37.5%
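The color example can also be checked in code. The following is a minimal Python sketch, not part of the original slides (which use MINITAB); the variable names are illustrative assumptions. It counts each category and divides by the total to get the relative frequencies.

```python
from collections import Counter

# Data set from the slide.
colors = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]

# Frequency distribution: the count of observations in each category.
frequency = Counter(colors)

# Relative frequency distribution: each count divided by the total.
total = sum(frequency.values())
relative_frequency = {color: count / total for color, count in frequency.items()}

# The category with the largest count.
most_common_color = frequency.most_common(1)[0][0]

for color in sorted(frequency):
    print(f"{color:<6} {frequency[color]:>3} {relative_frequency[color]:>7.3f} "
          f"{relative_frequency[color]:>7.1%}")
print("Most common:", most_common_color)
```

The relative frequencies always sum to 1 (or 100%), apart from rounding.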
Bar Graphs
A bar graph:
Lists the categories on the horizontal axis.
Draws rectangles above each category where the
heights are equal to the category's frequency or
relative frequency.
A bar graph is a "picture" of a frequency/relative
frequency table for qualitative data.
Bar Graphs (cont.)
Good practices in constructing bar graphs:
The horizontal scale:
The categories should be equally spaced.
The rectangles should have the same widths and have some space
between them.
The qualitative variable associated with the categories should be
identified via a meaningful label.
The vertical scale should:

Begin with 0.
Be incremented in reasonable steps.
Go somewhat, but not significantly, beyond the largest frequency or
relative frequency.
Be labeled in such a way that it is clear whether it is a frequency or
relative frequency graph.
Overall Graph: Meaningful title identifying qualitative variable

whose distribution is being graphed.
Blood Type Example (MINITAB 1 in Blackboard in the

MINITAB Examples folder)
A phlebotomist draws the blood of a
random sample of 50 patients and
determines their blood types (data
found in MINITAB 1 Data).
We will use MINITAB to create some
graphs and charts.
Blood Type Example (MINITAB 1 in

Blackboard in the MINITAB Examples folder)
Meaningful Titles and Labels: Substitute the correct words (i.e.,
appropriate for the problem being solved) for the items in
italics.
Title: Distribution of Name of Variable for Describe the Items
for our blood example: Distribution of Blood Types for the 50 Patients
Horizontal Axis Label: Description of Variable
for our blood example: Blood Type
Vertical Axis Label
Frequency: Number of Describe the Items
for our blood example: Number of Patients
Relative Frequency: Proportion of Describe the Items
or Percent of Describe the Items
for our blood example: Proportion of Patients
or Percent of Patients
Blood Type Example (cont.)

To open a worksheet:
File > Open Worksheet > Select file (.mtw or .mpj)
But in Blackboard you will just
click on the file and it will open in
Blackboard.
Session Window
MTB > tally c1;
SUBC> counts;
SUBC> percents.
Filled in by opening worksheet.
Tally for Discrete Variables: Blood Type
Blood Type   Count   Percent
A            18      36.00
AB           4       8.00
B            6       12.00
O            22      44.00
N=           50

Session Window
MTB > chart c1;
SUBC> title "Distribution of Blood Types for the 50 Patients";
SUBC> percent;
SUBC> bar.
Data must be loaded into column c1 before issuing the commands.
[Figure: bar graph "Distribution of Blood Types for the 50 Patients"; horizontal axis: Blood Type; vertical axis: Percent (percent within all data)]
In MINITAB, double-click any title or label to change it.
The default label in MINITAB is Percent instead of Relative Frequency.

Session Window
MTB > chart c1;
SUBC> decreasing;
SUBC> bar.
[Figure: bar graph of blood types with categories arranged so the bars are decreasing in height; horizontal axis: Blood Type; vertical axis: Count]
The default label in MINITAB is Count instead of Frequency.
Pareto Chart: A Bar Graph whose bars are drawn in decreasing order of
frequency or relative frequency.
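The ordering behind a Pareto chart is simply a sort of the categories by frequency. The following is a minimal Python sketch, not part of the original MINITAB material, using the blood type counts from the tally slide; the names are illustrative assumptions.

```python
# Counts from the blood type tally (50 patients).
blood_type_counts = {"A": 18, "AB": 4, "B": 6, "O": 22}

# Pareto order: categories sorted by decreasing frequency.
pareto_order = sorted(blood_type_counts.items(),
                      key=lambda item: item[1], reverse=True)

# A rough text rendering of the chart, one '#' per patient.
for blood_type, count in pareto_order:
    print(f"{blood_type:<3} {'#' * count} {count}")
```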
MINITAB Frequency Table
1. Enable Commands from the Editor menu item.

2. Open the MINITAB file (i.e., worksheet) containing the data or enter the
data into a column in the worksheet.
3. Type the following command and subcommands:
MTB > tally cy;
SUBC> counts;
SUBC> percents.
where:
cy is the column containing the data
Note: Substitute the correct numbers from the specific problem
for the items in italics.
a) Subcommands in MINITAB sometimes end with a semicolon
and sometimes with a period.
b) To enter subcommands, end the command with a semicolon.
MINITAB Bar Graph

2. Open the MINITAB file (i.e., worksheet) containing the data
or enter the data into a column in the worksheet.
MTB > chart cy;
SUBC> title "Meaningful Title";
SUBC> percent;
SUBC> bar.
Note: Substitute the correct words/numbers from the specific
problem for the items in italics.
where:
a) Remember a meaningful title and labels; you can double-click any title or label
to change it.
b) If you leave out the percent subcommand, MINITAB produces a
frequency bar graph; the vertical axis is labeled Count instead of Frequency.
c) If you add the decreasing subcommand, MINITAB produces a Pareto chart.
Side-By-Side Bar Graph (p. 59)

A side-by-side bar graph draws multiple rectangles for each
category, one for each set of data. The frequencies (or relative
frequencies) for each category can then easily be compared.
[Figure: side-by-side bar graph comparing educational attainment in 1990 versus 2003]
Not covered, but good to know.
Pie Charts
A pie chart is a circle divided into sections, one for each
category.
The area (angle) of each sector is proportional to the
frequency/relative frequency of that category.
Pie charts are useful for showing the relative proportions of each
category, compared to the whole.
A pie chart is a "picture" of a relative

frequency table for qualitative data.
Pie Charts (cont.)
Good practices in constructing pie charts:

Category Slices:
Different colors (or different patterns) should be

used to distinguish the categories.
Each category should be labeled with the
category name and relative frequency (typically
as a percent).
Overall Graph: Meaningful title identifying qualitative variable
whose distribution is being graphed. (Same title as for a bar
graph.)
Not effective if there are too many categories or if some relative
frequencies are too small. May need to combine categories.
Pie Chart: Manual Solution
Construct a relative frequency distribution first:

List all categories of data.
Obtain frequencies.
Calculate relative frequencies.
Calculate the degree measure of each sector by multiplying
each relative frequency by 360 degrees.
Construct a pie chart by hand using a protractor.
Blood Type Example (from slide with tally of blood type)
Blood Type   Frequency   Relative Frequency   Degree Measure
A            18          ____                 ____
AB           4           ____                 ____
B            6           ____                 ____
O            22          ____                 ____
Total:       50          1.000                360
Pie Chart: Manual Solution
Construct a relative frequency distribution first:

List all categories of data.
Obtain frequencies.
Calculate relative frequencies.
Calculate the degree measure of each sector by multiplying
each relative frequency by 360 degrees.
Construct a pie chart by hand using a protractor.
Blood Type Example (from slide with tally of blood type)
Blood Type   Frequency   Relative Frequency   Degree Measure
A            18          0.36                 360*0.36 = 129.6, or 130
AB           4           0.08                 360*0.08 = 28.8, or 29
B            6           0.12                 360*0.12 = 43.2, or 43
O            22          0.44                 360*0.44 = 158.4, or 158
Total:       50          1.000                360
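The degree measures for the manual pie chart can be computed directly from the frequencies. The following is a minimal Python sketch, not part of the original slides; the names are illustrative assumptions. It multiplies each relative frequency by 360 degrees, as in the steps above.

```python
# Frequencies from the blood type tally.
counts = {"A": 18, "AB": 4, "B": 6, "O": 22}
total = sum(counts.values())

# Sector angle = relative frequency * 360 degrees.
sector_degrees = {blood_type: 360 * count / total
                  for blood_type, count in counts.items()}

# Whole-degree values for use with a protractor.
rounded_degrees = {blood_type: round(angle)
                   for blood_type, angle in sector_degrees.items()}

for blood_type in counts:
    print(f"{blood_type:<3} {counts[blood_type] / total:.2f} "
          f"{sector_degrees[blood_type]:6.1f} {rounded_degrees[blood_type]:4d}")
```

Rounded sector angles will not always add up to exactly 360, so it is worth checking the sum before drawing.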
Blood Type Example

Session Window
MTB > piechart c1;
SUBC> slabel;
SUBC> pcategory;
SUBC> percent.
[Figure: pie chart of blood types with slices labeled A 36.0%, AB 8.0%, B 12.0%, O 44.0%]
In MINITAB, double-click the title to change it.
MINITAB Pie Chart

2. Open the MINITAB file (i.e., worksheet) containing the data
or enter the data into a column in the worksheet.
MTB > piechart cy;
SUBC> title "Meaningful Title";
SUBC> slabel;
SUBC> pcategory;
SUBC> percent.
Note: Substitute the correct words/numbers from the specific
problem for the items in italics.
where:
Remember a meaningful title; you can double-click the title to change it.
Categorical Data (Summary)

Qualitative data can be organized in
several ways:
Tables are useful for listing the data, its
frequencies, and its relative frequencies.
Charts such as bar graphs, Pareto charts,
and pie charts are useful visual methods
for organizing data.
Side-by-side bar graphs are useful for
comparing two or more sets of qualitative
data.
Organizing Quantitative Data

Organizing Quantitative Data: The Popular Displays
Organize discrete data in tables and histograms.

Organize continuous data in tables and histograms.
Draw stem-and-leaf plots.
Identify the shape of a distribution.
Draw time-series plots.
Reminder: Raw quantitative data comes as a list of numeric
values; each value is a count or measurement, either
discrete or continuous.
Comparisons (one value being more than or less than another)
can be performed on the data values.
Mathematical operations (addition, subtraction, ...) can be
performed on the data values.
Discrete Quantitative Data
Discrete quantitative data can be presented in tables
and bar graphs in several of the same ways as
qualitative data.
Frequency/Relative Frequency Distribution
Values listed in a table - use the discrete values instead
of the category names.
List frequencies or relative frequencies.
Histogram (Bar Graph for Discrete Data)

Use the discrete values instead of the category names
and arrange the values in ascending order.
Unlike a bar graph for qualitative data, no space is left
between the bars and the width of the bars has
meaning.
Wendy's Example
Frequency and relative frequency distributions:
MTB > tally c1;
SUBC> counts;
SUBC> percents.
Data must be loaded into column c1 before
issuing the commands.
Tally for Discrete Variables: Num
Num   Count   Percent
1     1       2.50
2     6       15.00
3     1       2.50
4     4       10.00
5     7       17.50
6     11      27.50
7     5       12.50
8     2       5.00
9     2       5.00
11    1       2.50
N=    40
a) What is the most frequently occurring number of customers?
b) What percentage of the intervals have fewer than 3 customers?
c) What proportion had 9 customers?
d) How many had between 4 and 8 customers?
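The four questions can be answered straight from the tally. The following is a minimal Python sketch, not part of the original slides; the names are illustrative, and it assumes "between 4 and 8" includes both 4 and 8.

```python
# Tally from the slide: number of customers -> number of intervals.
tally = {1: 1, 2: 6, 3: 1, 4: 4, 5: 7, 6: 11, 7: 5, 8: 2, 9: 2, 11: 1}
n = sum(tally.values())

# a) Most frequently occurring number of customers.
most_frequent = max(tally, key=tally.get)

# b) Percentage of intervals with fewer than 3 customers.
percent_fewer_than_3 = 100 * sum(c for v, c in tally.items() if v < 3) / n

# c) Proportion of intervals with exactly 9 customers.
proportion_nine = tally[9] / n

# d) Number of intervals with 4 to 8 customers (inclusive, by assumption).
count_4_to_8 = sum(c for v, c in tally.items() if 4 <= v <= 8)

print(most_frequent, percent_fewer_than_3, proportion_nine, count_4_to_8)
```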
Wendy's Example
Histograms for discrete data:
[Figures: two histograms of the number of customers, one with vertical axis Number of Intervals and one with vertical axis Proportion of Intervals]
Can MINITAB be used to make histograms for discrete data?
Continuous Quantitative Data
Continuous data cannot be put directly into frequency
tables or displayed in histograms because continuous
data do not have any obvious categories.
Categories are created using classes, or intervals of
numbers. (No predefined categories.)
The continuous data is then grouped into the classes.
Just as for discrete data:
The classes and the number (or proportion) of values in
each can be put into a table to form a frequency (or
relative frequency) distribution.
A histogram can be created from the frequency/relative
frequency distribution.
Replace categories (qualitative data) or
individual values (discrete quantitative
data) with intervals of numbers, called
classes.
For discrete data with many different values,
sometimes also use classes.
Classes
For ages of adults, a possible set of classes is:
20-29
30-39
40-49
50-59
60 and older
Definitions:
Lower and Upper Class Limits (cutpoints)
For the class 30-39:
30 is the lower class limit: the smallest value in the class.
39 is the upper class limit: the largest value in the class.
Open-Ended Class: Class with either no lower limit or
no upper limit.
The class 60 and older has no upper limit.
Classes (cont.)
Class Width: Difference between consecutive lower class
limits (with the exception of open-ended classes).
For the class 30-39, the class width = 40 - 30 = 10.
(All the classes (20-29, 30-39, 40-49, 50-59) have the same widths.)
IMPORTANT: The class width is NOT the difference between upper
and lower class limits for a single class.
The class 30-39 years old actually is 30 years to 39 years 364 days old
or 30 years to just less than 40 years old.
The class width is 10 years, all adults in their 30s.
Class Midpoint: Sum of consecutive lower class limits divided
by 2.
For the class 30-39, the class midpoint = (30 + 40)/2 = 35.
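The two definitions translate directly into code. The following is a minimal Python sketch, not part of the original slides; the function names are illustrative assumptions. Both functions work from consecutive lower class limits, as the definitions require.

```python
# Lower class limits for the age classes 20-29, 30-39, 40-49, 50-59, 60 and older.
lower_limits = [20, 30, 40, 50, 60]


def class_width(lower, next_lower):
    """Difference between consecutive lower class limits."""
    return next_lower - lower


def class_midpoint(lower, next_lower):
    """Sum of consecutive lower class limits divided by 2."""
    return (lower + next_lower) / 2


# Width and midpoint of every class that has a following lower limit
# (the open-ended class "60 and older" is excluded).
for lower, next_lower in zip(lower_limits, lower_limits[1:]):
    print(f"{lower}-{next_lower - 1}: width {class_width(lower, next_lower)}, "
          f"midpoint {class_midpoint(lower, next_lower)}")
```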
Frequency Tables
The classes and the number of values in each can be put
into a table (called a frequency table):
1. Select number of classes desired.
2. Choose class width.
3. Choose class limits.
4. Count number of measurements per class.
5. List the classes and frequencies (or proportions or percents)
in a table.
Age            Number
20-29          533
30-39          1147
40-49          1090
50-59          493
60 and older   110
In this data set, there are 1147 subjects between 30
and 39 years old.
Frequency Tables (cont.)

Good practices for constructing tables for continuous
variables:
There is no unique "best" way of selecting classes! But, the
classes should:
Not overlap.
Not have any gaps between them.
Have the same width (except for possible open-ended classes at
the extreme low or extreme high ends).
Cover the range of the data.
The lower class limits should be reasonable numbers.

The class width should be a reasonable number.
Typically 5-20 classes, with larger number of classes used with
larger data sets.
Select classes to provide a meaningful overall summary of the data:
Too few classes cause the data to bunch.
Too many classes spread the data out so far that it is hard to detect patterns.
Histograms
A histogram is a "picture" of a frequency/relative frequency table for
quantitative data.
Classes
0-1.99
2-3.99
...
12-13.99
14-15.99
To construct a histogram:
1. Construct the frequency or relative frequency table desired.
2. Place the variable of interest on the horizontal axis.
3. Place the lower class limits for each interval on the axis.
4. Draw a rectangle above each interval.
5. The height of each rectangle is proportional to the frequency or
relative frequency for that class.
Histograms (cont.)
Important points of histogram construction:
Plot and label only the lower class limits, in between the
bars.
Provide a descriptive title.
Generic title: Distribution of Name of the Variable for
Describe the Items.
(You must substitute the correct words for the problem at
hand for the items in italics.)
Label the horizontal axis.

Generic label: Name of the Variable (in Units)
(You must substitute the correct words for the problem at
hand for the items in italics. YOU MUST INCLUDE UNITS!)
Label the vertical axis as for a bar graph.

Stock Example (MINITAB 2 in MINITAB

Examples folder on Blackboard)
Session Window
MTB > hist c1;
SUBC> title "Distribution of Daily Stock Volume for 35 Trading Days";
SUBC> axlabel 1 "Volume (in millions)".
MTB > desc c1
Minimum = 3.01
Maximum = 10.96
[Figure: histogram "Distribution of Daily Stock Volume for 35 Trading Days"; horizontal axis: Volume (in millions); vertical axis: Frequency]
In MINITAB, double-click any title or label to change it.
The default label in MINITAB is Frequency.
Problems with labeling of class limits? (And for that
matter, what are the classes that MINITAB used?)
Stock Example (cont.)

Double-click label; opens this dialog box.
[Figure: the stock volume histogram before and after changing the vertical-axis label from Frequency to Number of Days]
Stock Example (cont.)
Double-click any tick mark label;
opens this dialog box.
SUBC> cutpoint 3:11/1
Stock Example (cont.)

[Figure: the stock volume histogram, vertical axis Number of Days, before and after changing the horizontal-axis tick mark labels]
Double-click x-axis or tick mark label; opens this dialog box.
MINITAB Histogram

2. Open the MINITAB file (i.e., worksheet) containing the data or
enter the data into a column in the worksheet.
MTB > hist cy;
SUBC> title "Distribution of Name of the Variable for Describe the Items";
SUBC> axlabel 1 "Name of the Variable (in units)";
SUBC> cutpoint limit_1:limit_2/class_width.
Note: Substitute the correct words/numbers from the specific
problem for the items in italics.
where:
cy is the column containing the data.
limit_1 is the lower limit of the first class.
limit_2 is the next lower class limit if one more class were to
exist (or the last lower class limit plus class width).
class_width is the class width.
Use percent subcommand for relative frequency histogram.
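To see which classes a given cutpoint specification describes, the cutpoints can be listed out. The following is a minimal Python sketch, not MINITAB code; the function names are illustrative assumptions, and it assumes whole-number limits and integer-valued data so that each class ends one below the next lower limit.

```python
def cutpoints(limit_1, limit_2, class_width):
    """Cutpoints from limit_1 up to and including limit_2, in steps of class_width."""
    return list(range(limit_1, limit_2 + 1, class_width))


def class_labels(points):
    """Label each class by its lower limit and the largest whole number below the next limit."""
    return [f"{lower}-{upper - 1}" for lower, upper in zip(points, points[1:])]


# Specification 3:11/1 gives cutpoints 3, 4, ..., 11 (eight classes of width 1).
print(cutpoints(3, 11, 1))

# Specification 0:1800/200 gives nine classes of width 200.
print(class_labels(cutpoints(0, 1800, 200)))
```

Note that limit_2 itself is a cutpoint but not the start of a class, which matches the description of limit_2 above.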
MINITAB Histogram (cont.)
To make a histogram with tick marks you define, you can use
the subcommand cutpoint.
However, to use the cutpoint subcommand, you must figure
out values for limit_1, limit_2, and class_width; MINITAB
does not compute these for you. (Use DESCRIBE command
to find sample size, minimum, and maximum.)
Additional Example
a) Determine the class width.
b) Identify the classes.
c) Which class has the highest
frequency?
d) What MINITAB "cutpoint"
subcommand would you use
to generate this histogram?
e) Approximately how many
states had between 200 and
399 traffic fatalities?
f) Approximately what percent
of states had less than 200
traffic fatalities?
Additional Example
a) Determine the class width. 200
b) Identify the classes. 0-199, 200-399, 400-599, 600-799,
800-999, 1000-1199, 1200-1399, 1400-1599, 1600-1799
c) Which class has the highest frequency? 0-199
d) What MINITAB "cutpoint" subcommand would you use
to generate this histogram?
SUBC> cutpoint 0:1800/200
e) Approximately how many states had between 200 and
399 traffic fatalities? 15
f) Approximately what percent of states had less than 200
traffic fatalities? 20/50 = 40%
Case Study
Weight Data
Introductory Statistics class
Spring, 1997
Virginia Commonwealth University
Weight Data
Weight Data: Frequency Table
sqrt(53) ≈ 7.3, rounded up to 8 intervals; range (260 - 100 = 160) / 8 = 20 = class width
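The arithmetic on this slide can be reproduced as follows. This is a minimal Python sketch, not part of the original slides; it assumes the square-root rule of thumb shown above (number of classes is the square root of the sample size, rounded up) and uses the slide's values of 53 students, minimum 100, and maximum 260.

```python
import math

n = 53          # number of students (from the slide)
minimum = 100   # smallest weight used on the slide
maximum = 260   # largest weight used on the slide

# Rule of thumb from the slide: about sqrt(n) classes, rounded up.
num_classes = math.ceil(math.sqrt(n))

# Class width = range / number of classes.
data_range = maximum - minimum
class_width = data_range / num_classes

print(num_classes, data_range, class_width)
```

This is only one way of choosing classes; as noted earlier, there is no unique "best" choice.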
Weight Data: Histogram
[Figure: histogram of the weight data; horizontal axis: Weight, marked at 100, 120, 140, ..., 280; vertical axis: Number of students]
* Left endpoint is included in the group, right endpoint is not.
Stem-and-Leaf Plot
A stem-and-leaf plot is a different way to represent
quantitative data that is similar to a histogram.
To draw a stem-and-leaf plot, each data value must be
broken up into two components. In the simplest
scenario:
The stem consists of all the digits except for the
right-most one.
The leaf consists of the right-most digit.
Example: For the number 173, the stem would be 17
and the leaf would be 3.
Stem | Leaf
  17 | 3
Stem-and-Leaf Plot (cont.)

Raw Data
5.6 6.5 7.3 7.8 7.8
. . . . . . . . . .
16.8 17.0 17.6 17.8 18.0
The smallest value is 5.6.
The largest value is 18.0.
The second largest value is 17.8.
Decimal Point?
To draw a stem-and-leaf plot:
1. Write all the values in ascending order.
(optional)
2. Find the stems and write them vertically in
ascending order. (First stem is stem for
minimum; last stem corresponds to
maximum. Must include all "stems" in
between.)
3. Draw a vertical line to the right of the stems.
4. For each data value, write its leaf in the row
next to its stem.
5. The resulting leaves will also be in ascending
order (or arrange them in ascending order).
The list of stems with their corresponding
leaves is the stem-and-leaf plot.
Stem-and-Leaf Plot (cont.)

To read a stem-and-leaf plot:
Read the stem first.
Attach the leaf as the last digit of the stem.
The result is the original data value (after placement of the
decimal point in some cases).
Stem-and-leaf plots display the same visual patterns as
histograms. (Essentially a histogram turned on its side.)
Stems correspond to classes.
Leaves correspond to bars.
Advantages:
Contain more information than histograms; usually can recover the
"raw" data.
"Quick" way to sort data.
Disadvantages:
Best used only with small data sets.
Histogram more flexible in choice of "classes".
Stem-and-Leaf: Manual Solution
Construct a stem-and-leaf plot for the following data set:
8.6 11.7 9.4 9.1 10.2 8.1 8.3 11.0 8.8 7.8
1. Sort numbers into ascending order.
Sorted data:
7.8 8.1 8.3 8.6 8.8 9.1 9.4 10.2 11.0 11.7
2. Write the stems vertically in increasing order.
3. Write the leaves corresponding to their stems. Include zeros.
Stem | Leaf
   7 | 8
   8 | 1 3 6 8
   9 | 1 4
  10 | 2
  11 | 0 7
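The manual steps can be sketched in code. The following is a minimal Python example, not part of the original slides; the function name is an illustrative assumption, and it assumes data recorded to one decimal place so that the leaf is the tenths digit.

```python
from collections import defaultdict

# Data set from the slide.
data = [8.6, 11.7, 9.4, 9.1, 10.2, 8.1, 8.3, 11.0, 8.8, 7.8]


def stem_and_leaf(values):
    """Map each stem to its sorted list of leaves (leaf = tenths digit)."""
    # Work in whole tenths to avoid floating-point surprises, then sort.
    tenths = sorted(round(value * 10) for value in values)
    leaves = defaultdict(list)
    for t in tenths:
        leaves[t // 10].append(t % 10)
    # Include every stem from the minimum to the maximum, even if it has no leaves.
    stems = range(tenths[0] // 10, tenths[-1] // 10 + 1)
    return {stem: leaves.get(stem, []) for stem in stems}


plot = stem_and_leaf(data)
for stem, stem_leaves in plot.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in stem_leaves)}")
```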
Modifying Data
MTB > desc c1
Minimum = 0.05
Maximum = 14.48
MTB > stem c1

Truncated (Leaf Unit = 0.10)
  2    0  09
  2    1
  5    2  238
  7    3  07
 10    4  139
 13    5  399
 19    6  013568
 (2)   7  79
 19    8  23358
 14    9  1388
 10   10  0139
  6   11  69
  4   12  05
  2   13  8
  1   14  4

Rounded (Leaf Unit = 0.10)
  1    0  1
  2    1  0
  5    2  339
  7    3  17
 10    4  139
 11    5  4
 19    6  00114578
 (2)   7  89
 19    8  23369
 14    9  1489
 10   10  1139
  6   11  79
  4   12  15
  2   13  9
  1   14  5
Split Stems
Include a stem even when it has no leaves.
a) How many values exceed 30?
b) What percentage of values are 20 or below?
c) What value occurs most frequently?
d) What are the minimum and maximum values?
e) What value roughly divides the data set in half?
Modifications to Stem-and-Leaf Plots
If we wanted to compare two sets of data, we could draw
two stem-and-leaf plots using the same stem, with leaves
going left (for one set of data) and right (for the other set).
Compare to a side-by-side bar graph.
There are cases where constructing a descending stem-and-leaf plot could also be appropriate (for test scores, for
example).
Case Study
Weight Data
Introductory Statistics class
Spring, 1997
Virginia Commonwealth University
Weight Data
Weight Data: Stemplot (Stem & Leaf Plot)
Key: 20|3 means 203 pounds
Stems = 10s
Leaves = 1s
[Figure: partially built stemplot with stems 10 through 26; the values 192, 152, and 135 are shown beside the plot.]
Dot Plot
A dot plot is a graph where a dot is placed over the value
each time it is observed.
(Used with discrete data and small sets of continuous data.)
[Figure: dot plot of the number of arrivals at Wendy's.]
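A text version of a dot plot can be sketched in a few lines. The function name dot_plot and the arrival counts below are hypothetical, and the sketch assumes non-negative whole-number data.

```python
from collections import Counter


def dot_plot(values):
    """Text rows with one dot stacked over a value each time it is observed."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    cell = len(str(max(values))) + 1  # column width for each axis value
    rows = []
    # Build from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("." if counts[x] >= level else " ").rjust(cell) for x in axis)
        rows.append(row.rstrip())
    rows.append("".join(str(x).rjust(cell) for x in axis))
    return rows


# Hypothetical arrival counts, invented for illustration only.
arrivals = [2, 3, 3, 4, 4, 4, 5, 6, 6]
for row in dot_plot(arrivals):
    print(row)
```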
Examining the Distribution of Quantitative Data
Overall pattern of graph
Deviations from overall pattern
Shape of the data
Center of the data
Spread of the data (Variation)
Outliers
Identifying Shapes of the Distributions

A useful way to describe a quantitative variable
is by the shape of its distribution.
Some common shapes of distributions are:
Uniform
Symmetric
  bell shaped
  other symmetric shapes
Asymmetric
  right skewed
  left skewed
Unimodal, bimodal
Uniform
A variable has a uniform distribution
when:
Each of the values tends to occur with
the same frequency.
The histogram looks flat.
Bell-Shaped
A variable has a bell-shaped (or
mound-shaped) distribution when:
Most of the values fall in the middle.
The frequencies tail off to the left and to
the right.
It is symmetric (i.e., left half mirror
image of right half).
[Figure: symmetric, bell-shaped distribution]
[Figure: symmetric, mound-shaped distribution]
Right-Skewed
A variable has a skewed right
distribution when:
The distribution is not symmetric.
The tail to the right is longer than the
tail to the left.
The arrow from the middle to the long
tail points right.
[Figure: asymmetric distribution, skewed to the right]
Left-Skewed
A variable has a skewed left
distribution when:
The distribution is not symmetric.
The tail to the left is longer than the tail
to the right.
The arrow from the middle to the long
tail points left.
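The "longer tail" idea for right-skewed and left-skewed distributions can be turned into a crude numerical check. The sketch below is only a rough heuristic and not a method from the slides: the function name tail_direction, the 25% tolerance, and the data are all assumptions, and a single extreme value can mislead it, so a histogram remains the better guide.

```python
from statistics import median


def tail_direction(values, tolerance=0.25):
    """Crude shape label from how far each tail reaches from the median."""
    mid = median(values)
    left_tail = mid - min(values)
    right_tail = max(values) - mid
    longer = max(left_tail, right_tail)
    # Treat tails of similar length as roughly symmetric.
    if longer == 0 or abs(right_tail - left_tail) <= tolerance * longer:
        return "roughly symmetric"
    return "right-skewed" if right_tail > left_tail else "left-skewed"


# Hypothetical data sets, invented for illustration only.
print(tail_direction([1, 2, 2, 3, 3, 3, 4, 5, 9, 15]))
print(tail_direction([1, 7, 11, 12, 13, 13, 13, 14, 14, 15]))
print(tail_direction([1, 2, 3, 4, 5]))
```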
[Figure: asymmetric distribution, skewed to the left]
Summary: Organizing Quantitative Data
Quantitative data can be organized in several ways:
Histograms based on data values are good for discrete data.
Histograms based on classes (intervals) are good for
continuous data.
The shape of a distribution describes a variable;
histograms are useful for identifying the shapes.
Labels in "Middle" of Bars (midpoints)
Labels "In Between" Bars (cutpoints)
Time Plots
A time plot shows behavior over time.
Time is always on the horizontal axis, and the
variable being measured is on the vertical
axis.
Look for an overall pattern (trend), and
deviations from this trend. Connecting the
data points by lines may emphasize this trend.
Look for patterns that repeat at known regular
intervals (seasonal variations).
Time Plots
Time-Series Data: Variable is measured at different points in
time.
Time-Series Plot: Time-series data (vertical axis) plotted
against time (horizontal axis). Lines are then drawn
connecting the points.
Identify long term trends.
Identify regularly occurring patterns with time (seasonality).
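A rough text rendering of a time-series plot is sketched below, with time on the horizontal axis and the variable on the vertical axis. The function name time_plot, the quarterly labels, and the values are hypothetical; the points are not joined by lines, and values are snapped to a small number of rows, so this is only an approximation of a real plot.

```python
def time_plot(times, values, height=5):
    """Text time plot: time runs left to right, the variable runs bottom to top."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero for constant series
    # Map each value to a row index from 0 (bottom) to height - 1 (top).
    levels = [round((v - lo) / span * (height - 1)) for v in values]
    cell = max(len(str(t)) for t in times) + 1
    rows = []
    for level in range(height - 1, -1, -1):
        row = "".join(("*" if lv == level else " ").rjust(cell) for lv in levels)
        rows.append(row.rstrip())
    rows.append("".join(str(t).rjust(cell) for t in times))
    return rows


# Hypothetical quarterly measurements over two years (illustration only):
# the same within-year pattern repeats, shifted upward in the second year.
quarters = ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4"]
measurements = [10, 14, 18, 12, 12, 16, 20, 14]
for row in time_plot(quarters, measurements):
    print(row)
```

In this made-up series the third quarter is the high point in both years (a seasonal pattern), and the second year sits above the first (a trend).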
[Figure: Class Make-up On First Day (Fall Semesters: 1985-1993). Time plot of the percent of the class that are freshmen (vertical axis, 0% to 70%) against year of fall semester (horizontal axis, 1985 to 1993).]
[Figure: Average Tuition (Public vs. Private)]
Outliers
Extreme values that fall outside the
overall pattern
May occur naturally
May occur due to error in recording
May occur due to error in measuring
Observational unit may be
fundamentally different