
Chapter 1

Picturing Distributions with


Graphs

BPS - 5th Ed.

Chapter 1

Statistics
Statistics is a science that involves the extraction of
information from numerical data obtained during an
experiment or from a sample. It involves the design
of the experiment or sampling procedure, the
collection and analysis of the data, and making
inferences (statements) about the population based
upon information in a sample.


STATISTICS
So put another way
Statistics is the Science of Reasoning
with Data. It is the science of
collecting, organizing, summarizing,
and analyzing information to draw
conclusions or answer questions.


Individuals and Variables


Individuals
the objects described by a set of data
may be people, animals, or things

Variable
any characteristic of an individual
can take different values for different
individuals


Data
Statistics is the Science of Reasoning with Data.

So what is Data?
Data is the collection of values of a set of characteristics for a set of individuals, collected for a particular study.

Example data-collection form (the introductory text is cut off at the slide edge):
SAS Institute Inc. is gathering data related to the attendees of this course ...
provide the following information. This data is being gathered for purpose ...
certain concepts within this course and will not be used for any other purp...

___________ First and Last Initials
___________ Gender (F = Female, M = Male)
___________ State of Birth (2-character abbreviation)
___________ Number of Years with Present Employer
___________ Profession
    Executive
    Finance/Accounting
    Business Analyst
    Engineering
    Sales/Marketing
    Production
    R&D
    Statistician
    Information Systems
    Consultant
    Trainer/Educator
    Full-time Student
    Other

SAS and all other SAS Institute Inc. product or service names are registered trademarks of SAS Institute Inc.

Components of a Data Set

Data Set (or Table): Collection of all observations (i.e., values of all variables) for a set of individuals, collected for a particular study.
Individual (Object, Subject, or Experimental Unit): Entity on which information is collected.
Variable: Measurement or characteristic of interest for an individual. (For example: name, address, customer number, and so on.)
Observation: Set of all measurements (i.e., variable values) collected from a single individual.

Row: Single observation
Column: Single variable

Initials  Gender  State  Years  Profession (code)
JM        M       GA     17     F
TM        M       CT     5      E
GS        F       FL     2      A
DW        F       AL     10     D
BW        F       NC     12     B
.         .       .      .      .
.         .       .      .      .

What does Statistics Involve?
Statistics involves: Collecting, Organizing, Summarizing, and
Analyzing information to draw conclusions or answer questions.

A population:
Is the group to be studied.
Includes all of the individuals in the group.

A sample:
Is a subset of the population.
Is often used in analyses because getting access to the
entire population is impractical.
Descriptive Statistics: Statements about the Sample.
Organizing and summarizing the information collected.
Inferential Statistics: Generalization of sample results to
statements about the Population.

Process of Statistics

Process of statistics consists of 4 steps:


Step 1: Identify the research objective.
What questions are to be answered?
What group should be studied?

Step 2: Collect the information needed.


Can you access the entire population?
How can you collect a good sample?
What other methods are available and appropriate?

Step 3: Organize and summarize the information.


Descriptive statistics
Visual methods such as charts and graphs.
Numeric methods such as calculations.

Step 4: Draw conclusions from the information.


Inferential statistics
Various methods that are appropriate for different
questions and different types of data sets.

Qualitative vs. Quantitative Variables
Categorical or Qualitative
Variable whose values are attributes or characteristics.
Allows researchers to categorize the individual.
Quantitative
Variable whose values are numerical measures that
arithmetic operations can meaningfully be performed
on. Allows researchers to summarize via averages.
Discrete variables - Variables that have a finite or a
countable number of possibilities. Often these
variables are counts.
Continuous variables - Variables that have an
infinite, not countable, number of possibilities.
Often these variables are measurements.

Descriptive Statistics
After we collect the raw data (from a sample survey or
a designed experiment), we can:
Describe the data using visual methods (Chapter 1)
Describe the data using numeric methods (Chapter 2)
Different methods are appropriate for different types
of data.
Descriptive Statistics: Statements about the Sample
Focus is on interpreting and presenting the data that has been
collected via the sample.
No attempt is made to generalize to other (larger) groups, such
as the population.

Case Study
The Effect of Hypnosis
on the
Immune System
reported in Science News, Sept. 4, 1993, p. 153


Case Study
The Effect of Hypnosis
on the
Immune System
Objective:
To determine if hypnosis strengthens the
disease-fighting capacity of immune cells.


Case Study
65 college students.
33 easily hypnotized
32 not easily hypnotized

white blood cell counts measured


all students viewed a brief video
about the immune system.


Case Study
Students randomly assigned to one
of three conditions
subjects hypnotized, given mental
exercise
subjects relaxed in sensory deprivation
tank
control group (no treatment)


Case Study
white blood cell counts re-measured after
one week
the two white blood cell counts are
compared for each group
results
hypnotized group showed larger jump in white
blood cells
easily hypnotized group showed largest
immune enhancement


Case Study
Variables measured
categorical:
Easy or difficult to achieve hypnotic trance
Group assignment
quantitative:
Pre-study white blood cell count
Post-study white blood cell count

Case Study
Weight Gain Spells
Heart Risk for Women
Weight, weight change, and coronary heart disease
in women. W.C. Willett, et al., vol. 273(6), Journal
of the American Medical Association, Feb. 8, 1995.
(Reported in Science News, Feb. 4, 1995, p. 108)


Case Study
Weight Gain Spells
Heart Risk for Women
Objective:
To recommend a range of body mass index
(a function of weight and height) in terms of
coronary heart disease (CHD) risk in women.


Case Study
Study started in 1976 with 115,818
women aged 30 to 55 years and
without a history of previous CHD.
Each woman's weight (body mass)
was determined.
Each woman was asked her weight at
age 18.


Case Study
The cohort of women was followed for 14 years.
The number of CHD (fatal and nonfatal) cases was counted (1292 cases).


Case Study
Variables measured
quantitative:
Age (in 1976)
Weight in 1976
Weight at age 18
categorical:
Incidence of coronary heart disease
Smoker or nonsmoker
Family history of heart disease

Distribution
Tells what values a variable takes
and how often it takes these values
Can be a table, graph, or function


Some Ways of Displaying Distributions
Categorical variables
Pie charts
Bar graphs

Quantitative variables
Histograms
Stemplots (stem-and-leaf plots)


Examples of Displaying Data
The next 6 slides will show different
ways to display the same
information.
The first 3 slides show different ways
to display information on the Class
Make-up on the First Day of Class
The next 3 slides display information
on U.S. solid waste.

Class Make-up on First Day

Data Table (Frequency and Relative Frequency Distribution)

Year        Count   Percent
Freshman    18      41.9%
Sophomore   10      23.3%
Junior      6       14.0%
Senior      9       20.9%
Total       43      100.1%

The percents sum to 100.1% rather than 100% because each one is rounded to one decimal place.
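The percent column can be checked with a few lines of code. This is a minimal Python sketch, not part of the original slides; the Junior and Senior counts (6 and 9) are assumed here because they are the only counts consistent with the listed percents and the total of 43.

```python
# Counts for the class make-up table.
# Junior = 6 and Senior = 9 are assumptions inferred from the percents and the total.
counts = {"Freshman": 18, "Sophomore": 10, "Junior": 6, "Senior": 9}

total = sum(counts.values())  # total number of students

# Relative frequency of each class year, as a percent rounded to one decimal place.
percents = {year: round(100 * n / total, 1) for year, n in counts.items()}

for year, n in counts.items():
    print(f"{year:<10} {n:>3} {percents[year]:>6.1f}%")

# Rounded percents need not add up to exactly 100.
percent_sum = round(sum(percents.values()), 1)
print(f"{'Total':<10} {total:>3} {percent_sum:>6.1f}%")
```

Rounding each percent separately is what pushes the total slightly above 100.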

Class Make-up on First Day
Pie Chart

Class Make-up on First Day
Bar Graph

Example: U.S. Solid Waste (2000)

Data Table

Material                   Weight (million tons)   Percent of total
Food scraps                25.9                    11.2 %
Glass                      12.8                    5.5 %
Metals                     18.0                    7.8 %
Paper, paperboard          86.7                    37.4 %
Plastics                   24.7                    10.7 %
Rubber, leather, textiles  15.8                    6.8 %
Wood                       12.7                    5.5 %
Yard trimmings             27.7                    11.9 %
Other                      7.5                     3.2 %
Total                      231.9                   100.0 %

Example: U.S. Solid Waste (2000)
Pie Chart

Example: U.S. Solid Waste (2000)
Bar Graph

Organizing Qualitative Data


Organize qualitative data in tables.
Construct bar graphs.
Construct pie charts.

Reminder: Raw qualitative data comes as a list of values; each value is one of a set of categories.
This allows the researcher to categorize or classify individuals into groups.

Frequency/Relative Frequency

A frequency distribution lists:
Each of the categories.
The frequency, or the count, of the observations that belong to each category.
Frequency = Counts

A relative frequency distribution lists:
Each of the categories.
The relative frequency, or the proportions (or percents), of the observations out of the total that belong to each category.
Relative Frequency = Proportions (or Percents)

Example
Consider the following data set:
blue, blue, green, red, red, blue, red, blue
The frequency distribution for this qualitative data is:

Frequency Table
Color   Frequency
Blue    ____
Green   ____
Red     ____

The most commonly occurring color is ______

Example
Consider the following data set:
blue, blue, green, red, red, blue, red, blue
The frequency distribution for this qualitative data is:

Frequency Table
Color   Frequency
Blue    4
Green   1
Red     3

The most commonly occurring color is Blue.

Example (cont.)
The relative frequency distribution is computed as follows:
Sum of all frequencies = _____
Blue has a relative frequency of __________________
Green has a relative frequency of _________________
Red has a relative frequency of ___________________

Relative Frequency Table
Color   Relative Frequency (Proportion)   Relative Frequency (Percent)
Blue    ____                              ____
Green   ____                              ____
Red     ____                              ____

Example (cont.)
The relative frequency distribution is computed as follows:
Sum of all frequencies = 8
Blue has a relative frequency of 4/8 or 0.5 or 50%
Green has a relative frequency of 1/8 or 0.125 or 12.5%
Red has a relative frequency of 3/8 or 0.375 or 37.5%

Relative Frequency Table
Color   Relative Frequency (Proportion)   Relative Frequency (Percent)
Blue    4/8 or 1/2                        50%
Green   1/8                               12.5%
Red     3/8                               37.5%
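The same frequency and relative frequency tables can be built directly from the raw list of colors. This is a minimal Python sketch, not something the slides use (they work in MINITAB); the variable names are illustrative only.

```python
from collections import Counter

# Raw qualitative data from the example.
data = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]

freq = Counter(data)       # frequency distribution: category -> count
n = sum(freq.values())     # sum of all frequencies

# Relative frequency distribution: each count divided by the total.
rel_freq = {color: count / n for color, count in freq.items()}

print("Color   Freq  Proportion  Percent")
for color in sorted(freq):
    print(f"{color:<7} {freq[color]:>4}  {rel_freq[color]:>10.3f}  {rel_freq[color]:>7.1%}")

# The most commonly occurring category.
most_common_color = freq.most_common(1)[0][0]
print("Most common:", most_common_color)
```

The proportions always sum to 1, which is a quick check that no observation was missed.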

Bar Graphs

A bar graph:
Lists the categories on the horizontal axis.
Draws rectangles above each category where the heights are equal to the category's frequency or relative frequency.

A bar graph is a "picture" of a frequency/relative frequency table for qualitative data.

Bar Graphs (cont.)
Good practices in constructing bar graphs:
The horizontal scale:
The categories should be equally spaced.
The rectangles should have the same widths and have some space
between them.
The qualitative variable associated with the categories should be
identified via a meaningful label.

The vertical scale should:


Begin with 0.
Be incremented in reasonable steps.
Go somewhat, but not significantly, beyond the largest frequency or
relative frequency.
Be labeled in such a way that it is clear whether it is a frequency or
relative frequency graph.

Overall Graph: Meaningful title identifying qualitative variable


whose distribution is being graphed.


Blood Type Example (MINITAB 1 in Blackboard in the MINITAB Examples folder)

A phlebotomist draws the blood of a random sample of 50 patients and determines their blood types (data found in MINITAB 1 Data).
We will use MINITAB to create some graphs and charts.

Blood Type Example (MINITAB 1 in Blackboard in the MINITAB Examples folder)

Meaningful Titles and Labels: Substitute the correct words (i.e., appropriate for the problem being solved) for the generic items, shown in italics on the original slide and in [brackets] here.

Title: "Distribution of [Name of Variable] for [Describe the Items]"
for our blood example: "Distribution of Blood Types for the 50 Patients"

Horizontal Axis Label: "[Description of Variable]"
for our blood example: "Blood Type"

Vertical Axis Label
Frequency: "Number of [Describe the Items]"
for our blood example: "Number of Patients"
Relative Frequency: "Proportion of [Describe the Items]" or "Percent of [Describe the Items]"
for our blood example: "Proportion of Patients" or "Percent of Patients"

Blood Type Example (cont.)

To open a worksheet:
File > Open Worksheet > Select file (.mtw or .mpj)
But in Blackboard you will just click on the file and it will open in Blackboard.
The data column is filled in by opening the worksheet.

Session Window
MTB > tally c1;
SUBC> counts;
SUBC> percents.

Tally for Discrete Variables: Blood Type

Blood Type   Count   Percent
A            18      36.00
AB           4       8.00
B            6       12.00
O            22      44.00
N=           50

Blood Type Example (cont.)

Session Window
Data must be loaded into column c1 before issuing the commands.

MTB > chart c1;
SUBC> title "Distribution of Blood Types for the 50 Patients";
SUBC> percent;
SUBC> bar.

[Bar graph: "Distribution of Blood Types for the 50 Patients"; horizontal axis: Blood Type; vertical axis: Percent (percent within all data)]

Default label in MINITAB is Percent instead of Relative Frequency.
In MINITAB, double-click any title or label to change.

Blood Type Example (cont.)

Session Window
MTB > chart c1;
SUBC> title "Distribution of Blood Types for the 50 Patients";
SUBC> decreasing;
SUBC> bar.

[Bar graph: "Distribution of Blood Types for the 50 Patients"; horizontal axis: Blood Type; vertical axis: Count; categories arranged so bars are decreasing in height]

Default label in MINITAB is Count instead of Frequency.

Pareto Chart: A Bar Graph whose bars are drawn in decreasing order of frequency or relative frequency.
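The ordering that turns a bar graph into a Pareto chart is just a sort by frequency. This is a small Python sketch using the blood type counts from the tally; it prints a text-only bar per category and is not a substitute for the MINITAB chart.

```python
# Blood type counts from the MINITAB tally.
counts = {"A": 18, "AB": 4, "B": 6, "O": 22}

# Pareto ordering: categories sorted by decreasing frequency.
pareto_order = sorted(counts, key=counts.get, reverse=True)

# One row of '#' marks per category, longest bar first.
for blood_type in pareto_order:
    print(f"{blood_type:<2} {'#' * counts[blood_type]} {counts[blood_type]}")
```

Sorting changes only the order of the bars, not their heights, so the chart still shows the same distribution.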

MINITAB Frequency Table

1. Enable Commands from the Editor menu item.
2. Open the MINITAB file (i.e., worksheet) containing the data or enter the data into a column in the worksheet.
3. Type the following command and subcommands:

MTB > tally cy;
SUBC> counts;
SUBC> percents.

where:
cy is the column containing the data
Note: Substitute the correct numbers from the specific problem for the items in italics.

a) Each subcommand ends with a semicolon, except the last one, which ends with a period.
b) To enter subcommands, end the command with a semicolon.

MINITAB Bar Graph

1. Enable Commands from the Editor menu item.
2. Open the MINITAB file (i.e., worksheet) containing the data or enter the data into a column in the worksheet.
3. Type the following command and subcommands:

MTB > chart cy;
SUBC> title "Meaningful Title";
SUBC> percent;
SUBC> bar.

Note: Substitute the correct words/numbers from the specific problem for the items in italics.
where:
cy is the column containing the data
a) Remember meaningful title and labels; you can double-click any title or label to change it.
b) If you leave out the percent subcommand, MINITAB produces a frequency bar graph; the vertical axis is labeled Count instead of Frequency.
c) If you add the decreasing subcommand, MINITAB produces a Pareto chart.

Side-By-Side Bar Graph (p. 59)

A side-by-side bar graph draws multiple rectangles for each category, one for each set of data. The frequencies (or relative frequencies) for each category can then easily be compared.

[Figure: side-by-side bar graph comparing educational attainment in 1990 versus 2003]

Not covered, but good to know.

Pie Charts
A pie chart is a circle divided into sections, one for each
category.
The area (angle) of each sector is proportional to the
frequency/relative frequency of that category.
Pie charts are useful for showing the relative proportions of each
category, compared to the whole.

A pie chart is a "picture" of a relative frequency table for qualitative data.

Pie Charts (cont.)

Good practices in constructing pie charts:


Category Slices:

Different colors (or different patterns) should be


used to distinguish the categories.
Each category should be labeled with the
category name and relative frequency (typically
as a percent).
Overall Graph: Meaningful title identifying qualitative variable
whose distribution is being graphed. (Same title as for a bar
graph.)
Not effective if there are too many categories or if some relative
frequencies are too small. May need to combine categories.


Pie Chart: Manual Solution

Construct a relative frequency distribution first:
List all categories of data.
Obtain frequencies.
Calculate relative frequencies.
Calculate the degree measure of each sector by multiplying each relative frequency by 360 degrees.
Construct a pie chart by hand using a protractor.

Blood Type Example (frequencies from the slide with the tally of blood type)

Blood Type   Frequency   Relative Frequency   Degree Measure
A            18          ____                 ____
AB           4           ____                 ____
B            6           ____                 ____
O            22          ____                 ____
Total:       50          1.000                360

Pie Chart: Manual Solution

Construct a relative frequency distribution first:
List all categories of data.
Obtain frequencies.
Calculate relative frequencies.
Calculate the degree measure of each sector by multiplying each relative frequency by 360 degrees.
Construct a pie chart by hand using a protractor.

Blood Type Example (frequencies from the slide with the tally of blood type)

Blood Type   Frequency   Relative Frequency   Degree Measure
A            18          0.36                 360*0.36 = 129.6, about 130
AB           4           0.08                 360*0.08 = 28.8, about 29
B            6           0.12                 360*0.12 = 43.2, about 43
O            22          0.44                 360*0.44 = 158.4, about 158
Total:       50          1.000                360
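The degree measures in the manual solution follow from one multiplication per category. This is a minimal Python sketch of that arithmetic, using the counts from the blood type tally; the variable names are illustrative.

```python
# Blood type counts from the MINITAB tally.
counts = {"A": 18, "AB": 4, "B": 6, "O": 22}
total = sum(counts.values())

rows = []
for blood_type, n in counts.items():
    rel = n / total          # relative frequency
    degrees = 360 * rel      # sector angle in degrees
    rows.append((blood_type, n, rel, degrees))
    print(f"{blood_type:<3} {n:>3}  {rel:.2f}  {degrees:6.1f}  (about {round(degrees)})")

# The sector angles must account for the full circle.
total_degrees = sum(d for *_, d in rows)
print(f"Total {total}  {total_degrees:.1f} degrees")
```

If the rounded angles ever fail to add to 360, adjust the largest sector by a degree when drawing by hand.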

Blood Type Example

Session Window
Data must be loaded into column c1 before issuing the commands.

MTB > piechart c1;
SUBC> title "Distribution of Blood Types for the 50 Patients";
SUBC> slabel;
SUBC> pcategory;
SUBC> percent.

[Pie chart: "Distribution of Blood Types for the 50 Patients"; slices labeled A 36.0%, AB 8.0%, B 12.0%, O 44.0%]

In MINITAB, double-click title to change.

MINITAB Pie Chart

1. Enable Commands from the Editor menu item.
2. Open the MINITAB file (i.e., worksheet) containing the data or enter the data into a column in the worksheet.
3. Type the following command and subcommands:

MTB > piechart cy;
SUBC> title "Meaningful Title";
SUBC> slabel;
SUBC> pcategory;
SUBC> percent.

Note: Substitute the correct words/numbers from the specific problem for the items in italics.
where:
cy is the column containing the data
Remember meaningful title; you can double-click the title to change it.

Categorical Data (Summary)


Qualitative data can be organized in
several ways:
Tables are useful for listing the data, its
frequencies, and its relative frequencies.
Charts such as bar graphs, Pareto charts,
and pie charts are useful visual methods
for organizing data.
Side-by-side bar graphs are useful for
comparing two or more sets of qualitative
data.

Organizing Quantitative Data


Organizing Quantitative Data: The Popular Displays

Organize discrete data in tables and histograms.


Organize continuous data in tables and histograms.
Draw stem-and-leaf plots.
Identify the shape of a distribution.
Draw time-series plots.

Reminder: Raw quantitative data comes as a list of numeric values; each value is a count or measurement, either discrete or continuous.
Comparisons (one value being more than or less than another) can be performed on the data values.
Mathematical operations (addition, subtraction, ...) can be performed on the data values.

Discrete Quantitative Data
Discrete quantitative data can be presented in tables
and bar graphs in several of the same ways as
qualitative data.
Frequency/Relative Frequency Distribution
Values listed in a table - use the discrete values instead
of the category names.
List frequencies or relative frequencies.

Histogram (Bar Graph for Discrete Data)


Use the discrete values instead of the category names
and arrange the values in ascending order.
Unlike a bar graph for qualitative data, no space is left between the bars, and the width of the bars has meaning.

Wendy's Example
Frequency and relative frequency distributions:
Data must be loaded into column c1 before issuing the commands.

MTB > tally c1;
SUBC> counts;
SUBC> percents.

Tally for Discrete Variables: Num

Num   Count   Percent
1     1       2.50
2     6       15.00
3     1       2.50
4     4       10.00
5     7       17.50
6     11      27.50
7     5       12.50
8     2       5.00
9     2       5.00
11    1       2.50
N=    40

a) What is the most frequently occurring number of customers?
b) What percentage of the intervals have fewer than 3 customers?
c) What proportion had 9 customers?
d) How many had between 4 and 8 customers?
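Questions like a) through d) can be answered straight from the tally. This is a minimal Python sketch, not part of the slides; it assumes that "between 4 and 8" includes both endpoints, which the slide does not state.

```python
# Tally from the MINITAB output: number of customers -> number of intervals.
tally = {1: 1, 2: 6, 3: 1, 4: 4, 5: 7, 6: 11, 7: 5, 8: 2, 9: 2, 11: 1}
n = sum(tally.values())  # total number of intervals

# a) Most frequently occurring number of customers.
most_frequent = max(tally, key=tally.get)

# b) Percentage of intervals with fewer than 3 customers.
pct_fewer_than_3 = 100 * sum(c for v, c in tally.items() if v < 3) / n

# c) Proportion of intervals with exactly 9 customers.
prop_nine = tally[9] / n

# d) Number of intervals with between 4 and 8 customers (endpoints included: an assumption).
count_4_to_8 = sum(c for v, c in tally.items() if 4 <= v <= 8)

print(most_frequent, pct_fewer_than_3, prop_nine, count_4_to_8)
```

If "between" were read as excluding the endpoints, only the values 5, 6, and 7 would be counted in part d).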

Wendy's Example

Histograms for discrete data:
[Figures: two histograms, with vertical axes "Number of Intervals" and "Proportion of Intervals"]

Can MINITAB be used to make histograms for discrete data?

Continuous Quantitative Data
Continuous data cannot be put directly into frequency tables or displayed in histograms because continuous data do not have any obvious categories.
Categories are created using classes, or intervals of numbers. (No predefined categories.)
The continuous data is then grouped into the classes.
Just as for discrete data:
The classes and the number (or proportion) of values in each can be put into a table to form a frequency (or relative frequency) distribution.
A histogram can be created from the frequency/relative frequency distribution.

Replace categories (qualitative data) or individual values (discrete quantitative data) with intervals of numbers, called classes.
For discrete data with many different values, sometimes also use classes.

Classes

For ages of adults, a possible set of classes is:
20-29
30-39
40-49
50-59
60 and older

Definitions:
Lower and Upper Class Limits (cutpoints)
Lower class limit: smallest value in class.
Upper class limit: largest value in class.
For the class 30-39:
30 is the lower class limit
39 is the upper class limit

Open-Ended Class: Class with either no lower limit or no upper limit.
The class "60 and older" has no upper limit.

Classes (cont.)
Class Width: Difference between consecutive lower class limits (with the exception of open-ended classes).
For the class 30-39, the class width = 40 - 30 = 10.
(All the classes (20-29, 30-39, 40-49, 50-59) have the same widths.)

IMPORTANT: The class width is NOT the difference between upper and lower class limits for a single class.
The class 30-39 years old actually is 30 years to 39 years 364 days old, or 30 years to just less than 40 years old.
The class width is 10 years, all adults in their 30s.

Class Midpoint: Sum of consecutive lower class limits divided by 2.
For the class 30-39, the class midpoint = (30 + 40)/2 = 35.
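Both definitions use consecutive lower class limits, so they can be computed from the list of lower limits alone. This is a short Python sketch for the age classes; the open-ended class "60 and older" contributes a lower limit but gets no width or midpoint of its own.

```python
# Lower class limits for the classes 20-29, 30-39, 40-49, 50-59, 60 and older.
lower_limits = [20, 30, 40, 50, 60]

# Pair each lower limit with the next one: (20, 30), (30, 40), ...
pairs = list(zip(lower_limits, lower_limits[1:]))

# Class width: difference between consecutive lower class limits.
widths = [nxt - cur for cur, nxt in pairs]

# Class midpoint: sum of consecutive lower class limits divided by 2.
midpoints = [(cur + nxt) / 2 for cur, nxt in pairs]

for (cur, nxt), w, m in zip(pairs, widths, midpoints):
    print(f"class {cur}-{nxt - 1}: width {w}, midpoint {m}")
```

Using the upper limit of the same class instead (39 - 30 = 9) would understate the width, which is the mistake the slide warns against.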

Frequency Tables
The classes and the number of values in each can be put into a table (called a frequency table):
1. Select number of classes desired.
2. Choose class width.
3. Choose class limits.
4. Count number of measurements per class.
5. List the classes and frequencies (or proportions or percents) in a table.

Age            Number
20-29          533
30-39          1147
40-49          1090
50-59          493
60 and older   110

In this data set, there are 1147 subjects between 30 and 39 years old.

Frequency Tables (cont.)


Good practices for constructing tables for continuous
variables:
There is no unique "best" way of selecting classes! But, the
classes should:
Not overlap.
Not have any gaps between them.
Have the same width (except for possible open-ended classes at
the extreme low or extreme high ends).
Cover the range of the data.

The lower class limits should be reasonable numbers.


The class width should be a reasonable number.
Typically 5-20 classes, with larger number of classes used with
larger data sets.
Select classes to provide a meaningful overall summary of the data:
Too few classes cause the data to bunch.
Too many classes spread the data out so far that it is hard to detect patterns.

Histograms
A histogram is a "picture" of a frequency/relative frequency table for quantitative data.

Classes
0-1.99
2-3.99
...
12-13.99
14-15.99

To construct a histogram:
1. Construct the frequency or relative frequency table desired.
2. Place the variable of interest on the horizontal axis.
3. Place the lower class limits for each interval on the axis.
4. Draw a rectangle above each interval.
5. The height of each rectangle is proportional to the frequency or relative frequency for that class.

Histograms (cont.)
Important points of histogram construction:
Plot and label only the lower class limits, in between the
bars.
Provide a descriptive title.
Generic title: Distribution of Name of the Variable for
Describe the Items.
(You must substitute the correct words for the problem at
hand for the items in italics.)

Label the horizontal axis.


Generic label: Name of the Variable (in Units)
(You must substitute the correct words for the problem at
hand for the items in italics. YOU MUST INCLUDE UNITS!)

Label the vertical axis as for a bar graph.



Stock Example (MINITAB 2 in MINITAB Examples folder on Blackboard)

Session Window
Data must be loaded into column c1 before issuing the commands.

MTB > hist c1;
SUBC> title "Distribution of Daily Stock Volume for 35 Trading Days";
SUBC> axlabel 1 "Volume (in millions)".

MTB > desc c1
Minimum = 3.01
Maximum = 10.96

[Histogram: "Distribution of Daily Stock Volume for 35 Trading Days"; horizontal axis: Volume (in millions); vertical axis: Frequency]

Default label in MINITAB is Frequency.
In MINITAB, double-click any title or label to change.
Problems with labeling of class limits? (And for that matter, what are the classes that MINITAB used?)

Stock Example (cont.)

Double-click label; opens this dialog box.

[Figures: the histogram "Distribution of Daily Stock Volume for 35 Trading Days" before and after the vertical axis label is changed from "Frequency" to "Number of Days"; horizontal axis: Volume (in millions)]

Stock Example (cont.)
Double-click any tick mark label; opens this dialog box.

SUBC> cutpoint 3:11/1

Stock Example (cont.)

Double-click x-axis or tick mark label; opens this dialog box.

[Figures: two versions of the histogram "Distribution of Daily Stock Volume for 35 Trading Days" with different tick mark labels; horizontal axis: Volume (in millions); vertical axis: Number of Days]

MINITAB Histogram

1. Enable Commands from the Editor menu item.
2. Open the MINITAB file (i.e., worksheet) containing the data or enter the data into a column in the worksheet.
3. Type the following command and subcommands:

MTB > hist cy;
SUBC> title "Distribution of Name of the Variable for Describe the Items";
SUBC> axlabel 1 "Name of the Variable (in units)";
SUBC> cutpoint limit_1:limit_2/class_width.

Note: Substitute the correct words/numbers from the specific problem for the items in italics.

where:
cy is the column containing the data.
limit_1 is the lower limit of the first class.
limit_2 is the next lower class limit if one more class were to exist (or the last lower class limit plus class width).
class_width is the class width.
Use percent subcommand for relative frequency histogram.
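To see what a cutpoint specification such as 3:11/1 describes, the cutpoints can be listed explicitly. This is a Python sketch that mirrors the slide's description of limit_1, limit_2, and class_width; the helper expand_cutpoints is hypothetical and is not a MINITAB function, and it does not claim to reproduce MINITAB's internal behavior.

```python
def expand_cutpoints(limit_1, limit_2, class_width):
    """Hypothetical helper: list limit_1, limit_1 + width, ..., limit_2."""
    n_classes = round((limit_2 - limit_1) / class_width)
    return [limit_1 + i * class_width for i in range(n_classes + 1)]

# The stock example uses "cutpoint 3:11/1".
cutpoints = expand_cutpoints(3, 11, 1)

# Each pair of neighboring cutpoints bounds one class.
classes = list(zip(cutpoints, cutpoints[1:]))

print("cutpoints:", cutpoints)
for lower, upper in classes:
    print(f"class from {lower} up to (but not including) {upper}")
```

With a minimum of 3.01 and a maximum of 10.96, every value in the stock data falls inside one of these eight classes.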

MINITAB Histogram (cont.)
To make a histogram with tick marks you define, you can use the subcommand cutpoint.
However, to use the cutpoint subcommand, you must figure out values for limit_1, limit_2, and class_width; MINITAB does not compute these for you. (Use the DESCRIBE command to find sample size, minimum, and maximum.)

Additional Example
a) Determine the class width.
b) Identify the classes.
c) Which class has the highest
frequency?
d) What MINITAB "cutpoint"
subcommand would you use
to generate this histogram?
e) Approximately how many
states had between 200 and
399 traffic fatalities?
f) Approximately what percent
of states had less than 200
traffic fatalities?


Additional Example
a) Determine the class width. 200
b) Identify the classes. 0-199, 200-399, 400-599, 600-799, 800-999, 1000-1199, 1200-1399, 1400-1599, 1600-1799
c) Which class has the highest frequency? 0-199
d) What MINITAB "cutpoint" subcommand would you use to generate this histogram?
SUBC> cutpoint 0:1800/200
e) Approximately how many states had between 200 and 399 traffic fatalities? 15
f) Approximately what percent of states had fewer than 200 traffic fatalities? 20/50 = 40%

Case Study
Weight Data
Introductory Statistics class
Spring, 1997
Virginia Commonwealth University


Weight Data


Weight Data: Frequency Table

sqrt(53) is about 7.3, rounded up to 8 intervals; range (260 - 100 = 160) / 8 = 20 = class width
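The rule of thumb on this slide (number of classes near the square root of the sample size) is easy to reproduce. This is a minimal Python sketch using the numbers quoted on the slide (53 students, weights from 100 to 260); treating 100 as the first lower class limit is an assumption.

```python
import math

n = 53                        # number of students
minimum, maximum = 100, 260   # smallest and largest weights quoted on the slide

# Square-root rule: sqrt(53) is about 7.28, rounded up to 8 classes.
n_classes = math.ceil(math.sqrt(n))

# Class width: range divided by the number of classes, rounded up to a whole number.
class_width = math.ceil((maximum - minimum) / n_classes)

# Lower class limits, starting at the minimum (an assumption) and covering the maximum.
lower_limits = list(range(minimum, maximum + 1, class_width))

for lower in lower_limits:
    print(f"{lower} to just under {lower + class_width}")
```

Because the left endpoint of a class is included and the right endpoint is not, the largest value (260) starts a ninth class, so the classes run from 100 up to 280.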

Weight Data: Histogram

[Histogram: horizontal axis: Weight, with class limits at 100, 120, 140, ..., 280; vertical axis: Number of students]

* Left endpoint is included in the group, right endpoint is not.

Stem-and-Leaf Plot
A stem-and-leaf plot is a different way to represent
quantitative data that is similar to a histogram.
To draw a stem-and-leaf plot, each data value must be
broken up into two components. In the simplest
scenario:
The stem consists of all the digits except for the
right-most one.
The leaf consists of the right-most digit.

Example: For the number 173, the stem would be 17 and the leaf would be 3.

Stem | Leaf
  17 | 3

Stem-and-Leaf Plot (cont.)

Raw Data
5.6 6.5 7.3 7.8 7.8
. . . . . . . . . .
16.8 17.0 17.6 17.8 18.0

The smallest value is 5.6.
The largest value is 18.0.
The second largest value is 17.8.
Decimal Point?

To draw a stem-and-leaf plot:
1. Write all the values in ascending order. (optional)
2. Find the stems and write them vertically in ascending order. (First stem is stem for minimum; last stem corresponds to maximum. Must include all "stems" in between.)
3. Draw a vertical line to the right of the stems.
4. For each data value, write its leaf in the row next to its stem.
5. The resulting leaves will also be in ascending order (or arrange them in ascending order).

The list of stems with their corresponding leaves is the stem-and-leaf plot.

Stem-and-Leaf Plot (cont.)


To read a stem-and-leaf plot:
Read the stem first.
Attach the leaf as the last digit of the stem.
The result is the original data value (after placement of the
decimal point in some cases).

Stem-and-leaf plots display the same visual patterns as
histograms (essentially a histogram turned on its side):
Stems correspond to classes.
Leaves correspond to bars.
Advantages:
Contain more information than histograms; usually can recover the
"raw" data.
"Quick" way to sort data.
Disadvantages:
Best used only with small data sets.
Histogram more flexible in choice of "classes".

Stem-and-Leaf: Manual Solution

Construct a stem-and-leaf plot for the following data set:
8.6 11.7 9.4 9.1 10.2 8.1 8.3 11.0 8.8 7.8

1. Sort numbers into ascending order.
2. Write the stems vertically in increasing order.
3. Write the leaves corresponding to their stems.

Sorted data:
7.8 8.1 8.3 8.6 8.8 9.1 9.4 10.2 11.0 11.7

Stem | Leaf
7 | 8
8 | 1 3 6 8
9 | 1 4
10 | 2
11 | 0 7

Include zeros.
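
The three manual steps can be sketched in Python using the data set from this slide. This is one possible implementation, not the slide author's; the function name `stem_and_leaf` is an assumption, and values are passed as strings with one decimal place so that `Decimal` can split them exactly.

```python
from collections import defaultdict
from decimal import Decimal

def stem_and_leaf(values):
    """Build stem-and-leaf rows for values given as strings with one decimal place."""
    rows = defaultdict(list)
    for v in sorted(Decimal(s) for s in values):   # step 1: sort ascending
        tenths = int(v * 10)                       # 8.6 -> 86
        stem, leaf = divmod(tenths, 10)            # 86 -> stem 8, leaf 6
        rows[stem].append(leaf)
    lines = []
    # Step 2: list every stem from minimum to maximum, even those with no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in rows.get(stem, []))   # step 3: attach leaves
        lines.append(f"{stem:>2} | {leaves}")
    return lines

data = ["8.6", "11.7", "9.4", "9.1", "10.2", "8.1", "8.3", "11.0", "8.8", "7.8"]
print("\n".join(stem_and_leaf(data)))
```

The leaf 0 for 11.0 appears in the last row, which is the "include zeros" point.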

Modifying Data

MTB > desc c1
Minimum = 0.05
Maximum = 14.48

MTB > stem c1

Truncated
Leaf Unit = 0.10
  2   0  09
  2   1
  5   2  238
  7   3  07
 10   4  139
 13   5  399
 19   6  013568
 (2)  7  79
 19   8  23358
 14   9  1388
 10  10  0139
  6  11  69
  4  12  05
  2  13  8
  1  14  4

Rounded
Leaf Unit = 0.10
  1   0  1
  2   1  0
  5   2  339
  7   3  17
 10   4  139
 11   5  4
 19   6  00114578
 (2)  7  89
 19   8  23369
 14   9  1489
 10  10  1139
  6  11  79
  4  12  15
  2  13  9
  1  14  5
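
The difference between the truncated and rounded displays can be checked on the reported minimum and maximum. A minimal sketch, assuming a leaf unit of 0.1 and round-half-up for the rounded version (the function name `stem_leaf` is hypothetical):

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def stem_leaf(value, rounding):
    """Return (stem, leaf) for a leaf unit of 0.1 under the given rounding mode."""
    tenths = Decimal(value).quantize(Decimal("0.1"), rounding=rounding)
    return divmod(int(tenths * 10), 10)

for v in ("0.05", "14.48"):
    print(v, "truncated:", stem_leaf(v, ROUND_DOWN), "rounded:", stem_leaf(v, ROUND_HALF_UP))
```

Truncation puts 0.05 at 0|0 and 14.48 at 14|4, while rounding puts them at 0|1 and 14|5, matching the first and last rows of the two displays.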

Split Stems

Include stem even when no leaves.

a) How many values exceed 30?
b) What percentage of values are 20 or below?
c) What value occurs most frequently?
d) What are the minimum and maximum values?
e) What value roughly divides the data set in half?

Modifications to Stem-and-Leaf Plots
If we wanted to compare two sets of data, we could draw
two stem-and-leaf plots using the same stem, with leaves
going left (for one set of data) and right (for the other set).

Compare to a side-by-side bar graph.

There are cases where constructing a descending stem-and-leaf
plot could also be appropriate (for test scores, for example).
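
A back-to-back plot of the kind described above might be sketched as follows. The two sets of test scores are hypothetical and are not taken from the slides; the function name `back_to_back` is also an assumption.

```python
from collections import defaultdict

def back_to_back(left, right):
    """Return lines of a back-to-back stem-and-leaf plot for two lists of integers."""
    lrows, rrows = defaultdict(list), defaultdict(list)
    for v in sorted(left):
        lrows[v // 10].append(v % 10)
    for v in sorted(right):
        rrows[v // 10].append(v % 10)
    stems = range(min(min(lrows), min(rrows)), max(max(lrows), max(rrows)) + 1)
    width = max(len(lrows.get(s, [])) for s in stems)   # pad left side so stems align
    lines = []
    for s in stems:
        # Left-hand leaves are reversed so they increase outward from the shared stem.
        l = "".join(str(d) for d in reversed(lrows.get(s, [])))
        r = "".join(str(d) for d in rrows.get(s, []))
        lines.append(f"{l:>{width}} | {s} | {r}")
    return lines

# Hypothetical test scores for two class sections (NOT from the slides).
section_a = [67, 72, 75, 78, 81, 84, 90]
section_b = [58, 71, 73, 79, 85, 88, 92, 95]
print("\n".join(back_to_back(section_a, section_b)))
```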


Case Study
Weight Data
Introductory Statistics class
Spring, 1997
Virginia Commonwealth University


Weight Data


Weight Data: Stemplot (Stem & Leaf Plot)

Key
20|3 means 203 pounds
Stems = 10s
Leaves = 1s

[Stemplot figure: stems 10 through 26. The values 192, 152, and 135 are placed as 19|2, 15|2, and 13|5.]

Dot Plot
Number of arrivals at Wendy's Data
A dot plot is a graph where a
dot is placed over the value
each time it is observed.
(Used with discrete data and
small sets of continuous data.)
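
A text version of a dot plot can be sketched with a simple counter. The arrival counts below are hypothetical (the Wendy's data are not reproduced in the text), and the plot is drawn sideways, one row per value, rather than with dots stacked above a horizontal axis.

```python
from collections import Counter

def dot_plot(values):
    """Return one text row per integer value, with one '*' per observation."""
    counts = Counter(values)
    lines = []
    for x in range(min(counts), max(counts) + 1):
        # Values never observed still get a row, with no dots.
        lines.append((f"{x:>2} | " + "*" * counts[x]).rstrip())
    return lines

# Hypothetical arrival counts, NOT the Wendy's data from the slides.
arrivals = [0, 1, 1, 2, 2, 2, 3, 3, 5]
print("\n".join(dot_plot(arrivals)))
```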


Examining the Distribution of Quantitative Data

Overall pattern of graph
Deviations from overall pattern
Shape of the data
Center of the data
Spread of the data (Variation)
Outliers

Identifying Shapes of the Distributions


A useful way to describe a quantitative variable
is by the shape of its distribution.
Some common shapes of distributions are:
Uniform
Symmetric
bell shaped
other symmetric shapes
Asymmetric
right skewed
left skewed
Unimodal, bimodal


Uniform
A variable has a uniform distribution
when:
Each of the values tends to occur with
the same frequency.
The histogram looks flat.


Bell-Shaped
A variable has a bell-shaped (or
mound-shaped) distribution when:
Most of the values fall in the middle.
The frequencies tail off to the left and to
the right.
It is symmetric (i.e., left half mirror
image of right half).


Symmetric
Bell-Shaped


Symmetric
Mound-Shaped


Right-Skewed
A variable has a skewed right
distribution when:
The distribution is not symmetric.
The tail to the right is longer than the
tail to the left.
The arrow from the middle to the long
tail points right.


Asymmetric
Skewed to the Right


Left-Skewed
A variable has a skewed left
distribution when:
The distribution is not symmetric.
The tail to the left is longer than the tail
to the right.
The arrow from the middle to the long
tail points left.


Asymmetric
Skewed to the Left


Summary: Organizing Quantitative Data


Quantitative data can be organized in several ways:
Histograms based on data values are good for discrete data.
Histograms based on classes (intervals) are good for
continuous data.
The shape of a distribution describes a variable;
histograms are useful for identifying the shapes.

Labels in "Middle" of Bars (midpoints)
Labels "In Between" Bars (cutpoints)

Time Plots
A time plot shows behavior over time.
Time is always on the horizontal axis, and the
variable being measured is on the vertical
axis.
Look for an overall pattern (trend), and
deviations from this trend. Connecting the
data points by lines may emphasize this trend.
Look for patterns that repeat at known regular
intervals (seasonal variations).


Time Plots
Time-Series Data: Variable is measured at different points in
time.
Time-Series Plot: Time-series data (vertical axis) plotted
against time (horizontal axis). Lines are then drawn
connecting the points.
Identify long-term trends.
Identify regularly occurring patterns with time (seasonality).


Class Make-up on First Day (Fall Semesters: 1985-1993)

[Time plot: vertical axis Percent of Class That Are Freshmen, 0% to 70%; horizontal axis Year of Fall Semester, 1985 to 1993.]


Average Tuition (Public vs. Private)


Outliers
Extreme values that fall outside the
overall pattern
May occur naturally
May occur due to error in recording
May occur due to error in measuring
Observational unit may be
fundamentally different
