ISDS Chapter 2 Outline - Updated 0810

Chapter 2: Presenting Data in Tables and Charts
Objectives:
1. Understand that the variable type determines the analysis approach.
2. Recognize whether a variable is categorical or numeric.
3. Recognize whether a numeric variable is discrete or continuous.
4. Recognize which summaries are used for numeric data and which for categorical data.
5. Construct a frequency table, bar graph, and pie chart for qualitative data.
6. Convert raw data into a data array.
7. Construct a frequency table, relative and cumulative frequency tables, and a histogram for quantitative data.
8. Construct a stem-and-leaf display to represent quantitative data.
A. Types of Variables: In order to address statistical questions, one must FIRST be able to identify types of variables. See page 1 of text (before the Title Page) for the Roadmap.
The TWO types of variables are:
1. Categorical Variables (also known as Qualitative) have values that can be placed into categories (Yes/No; Fr/Soph/Jr/Sr; Rep/Dem/Indep; Defective/Not Defective).
2. Numeric Variables (also known as Quantitative) yield values that represent quantities (weight, salary, return-on-investment, GPA, # of children).
a. Discrete: values result from counting (e.g., # of children)
b. Continuous: values result from measuring and can take on infinitely many values within an interval (e.g., weight)
Example: Taken from an Excel spreadsheet containing data collected from the Fall 2006 ISDS 2000 Course Survey
GENDER  AGE*  CLASSIF  CREDIT HOURS*  INTERNET USAGE  SKIP CLASS    HRS WORK*  BUY ONLINE  GPA*
        19    JR       67             VERY OFTEN      VERY RARELY   20         YES         3.23
        20    JR       61             VERY OFTEN      VERY RARELY   17         YES         3.41
        19    SO       36             SOMEWHAT OFTEN  NEVER                    YES         2.85
        19    SO       30             VERY OFTEN      VERY RARELY   20         YES         3.98
        19    SO       42             VERY OFTEN      NEVER         15         YES         3.67
        20    SO       56             VERY OFTEN      NEVER         20         YES         3.29
        19    SO       34             VERY OFTEN      OCCASIONALLY  12         YES         3.36
        19    SR       115            VERY OFTEN      VERY RARELY   14         YES         2.92
Note: Columns represent variables (questions asked on the survey); Rows represent the observations (students); * = quantitative data (all other variables are qualitative); GPA = continuous numeric data
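As a rough illustration of objective 2, the sketch below labels a column as numeric or categorical by checking whether every value is a number. The helper name `variable_type` and the four sample columns (a few values copied from the survey excerpt) are assumptions made for illustration. The rule is only a heuristic: numeric codes such as zip codes would still be categorical.

```python
def variable_type(values):
    """Return 'numeric' if every value is an int or float, else 'categorical'."""
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)
    return "numeric" if all(is_number(v) for v in values) else "categorical"

# A few values per column, copied from the survey excerpt
survey = {
    "CLASSIF": ["JR", "JR", "SO", "SO"],
    "CREDIT HOURS": [67, 61, 36, 30],
    "SKIP CLASS": ["VERY RARELY", "VERY RARELY", "NEVER", "VERY RARELY"],
    "GPA": [3.23, 3.41, 2.85, 3.98],
}

types = {name: variable_type(column) for name, column in survey.items()}
for name, kind in types.items():
    print(f"{name}: {kind}")
```

The result agrees with the note: the starred columns come out numeric and the others categorical.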
B. Data Collection: When addressing a business question, you must collect data on the variable(s) of interest.
1. Data sources fall into two categories:
a. Primary Source (the data collector is the one using the data for analysis)
b. Secondary Source (the person performing the analysis is not the data collector)
2. Data sources are created in one of four ways:
a. data distributed by an organization or individual (internet, databases of private and government organizations, industry journals, etc.)
b. conducting and reporting the results of a designed experiment (example: a study is designed to see if sales increase when a company implements an advertising slogan)
c. responses from a survey (our class survey)
d. conducting an observational study (focus groups conducted by market researchers to elicit customer preferences)
C. Organizing Categorical Data (2.3)

1. Introduction: Data are usually
collected, entered, and saved into
some form of database. In this form,
trends and characteristics are not
easily detectable as there can
sometimes be millions of pieces of
data. We want to summarize/reduce
the data to a form which is more
easily interpreted and which will aid
in decision-making.
Many summaries are found in newspapers, magazines, the internet, annual reports, and research studies; therefore, it is important for you to understand how these summaries are constructed.
2. Summary Table - a tabular summary of data showing the frequency (or percent) of items in each of several distinct categories.
Example: I recorded the academic major of each student and wanted to summarize the number of students in each major:
MAJOR (raw data, one entry per student):
ACCT, ISDS, PBADM, ISDS, FIN, PBADM, PBADM, ISDS, ISDS, PBADM, MKT, MKT, PBADM, PBADM, FIN, PBADM, ACCT, ISDS, PBADM, ISDS, PBADM, ISDS, PBADM, PBADM, ..., MKT

MAJOR   FREQ   RELATIVE FREQ (% freq)
ISDS    24     0.253 (25.3)
FIN     9      0.095 (9.5)
MKT     15     0.158 (15.8)
ACCT    7      0.074 (7.4)
PBADM   40     0.421 (42.1)
TOTAL   95     1.001* (100.1)
* The relative frequencies sum to 1.001 rather than 1 because of rounding.
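The relative frequency column can be checked with a short calculation. This is a minimal sketch that starts from the frequency counts in the summary table (the raw list is elided in the notes, so it is not used); the variable names are illustrative.

```python
# Frequency counts taken from the summary table
freq = {"ISDS": 24, "FIN": 9, "MKT": 15, "ACCT": 7, "PBADM": 40}

total = sum(freq.values())  # number of students
# Relative frequency = frequency / total, rounded to 3 decimals as in the table
rel_freq = {major: round(count / total, 3) for major, count in freq.items()}
# Summing the rounded values reproduces the 1.001 total shown in the table
rounded_sum = round(sum(rel_freq.values()), 3)

for major, count in freq.items():
    print(f"{major:<6}{count:>4}  {rel_freq[major]:.3f} ({rel_freq[major] * 100:.1f})")
print(f"{'TOTAL':<6}{total:>4}  {rounded_sum:.3f}")
```

The unrounded proportions sum to exactly 1; only the rounded column overshoots.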
D. Visualizing Categorical Data (2.5)

1. Bar Graph: a graphical representation of data where each category is depicted by a bar representing the frequency or proportion of observations falling into a category. (Note: bars do not touch)
(Figure: bar graph, ISDS 2000 - FALL 2001, example from Course Survey. Horizontal axis: ACADEMIC MAJORS (ISDS, FIN, MKT, ACCT, PBADM); vertical axis: frequency, 0 to 45; data labels shown: 40, 24, 15.)
2. Pie Chart: a graphical representation of data where each slice of the pie has an angle, measured in degrees, proportional to the frequency or proportion of observations falling into a category.
(Figure: pie chart, ISDS 2000 - FALL 2001, ACADEMIC MAJORS. Legend: ISDS, FIN, MKT, ACCT; slice labels: 25%, 43%, 9%, 7%, 16%.)
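Since a full circle is 360 degrees, the angle of each slice is the category's proportion times 360. The sketch below applies that to the academic major counts from the summary table; the variable names are illustrative.

```python
# Frequency counts from the academic majors summary table
freq = {"ISDS": 24, "FIN": 9, "MKT": 15, "ACCT": 7, "PBADM": 40}
total = sum(freq.values())

# Slice angle in degrees = (frequency / total) * 360
angles = {major: 360 * count / total for major, count in freq.items()}

for major, angle in angles.items():
    print(f"{major:<6}{angle:6.1f} degrees")
```

The angles add up to 360 degrees, so the slices fill the whole pie.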
3. Pareto Chart: a chart where vertical bars are plotted in descending order of frequency, combined with a cumulative percentage line.
The Pareto Principle states that a majority of responses fall within a small number of categories/groups.
Conclusion: 87% of students have some agreement with the statement that salary potential matters when selecting a major.
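The two ingredients of a Pareto chart, bars in descending order and a cumulative percentage line, can be computed as follows. The survey responses behind the salary-potential conclusion are not given in these notes, so this sketch reuses the academic major counts from the summary table purely as stand-in data.

```python
# Stand-in data: academic major counts from the summary table
freq = {"ISDS": 24, "FIN": 9, "MKT": 15, "ACCT": 7, "PBADM": 40}
total = sum(freq.values())

# Bars are ordered from the most frequent category to the least frequent
ordered = sorted(freq.items(), key=lambda item: item[1], reverse=True)

# The cumulative percentage line accumulates the counts in that order
pareto = []
running = 0
for major, count in ordered:
    running += count
    pareto.append((major, count, round(100 * running / total, 1)))

for major, count, cum_pct in pareto:
    print(f"{major:<6}{count:>4}{cum_pct:>8.1f}%")
```

With these counts, the two largest majors already account for about two thirds of the students.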
E. Organizing Numerical Data (Sect 2.4)

1. Ordered Array - a sequence of raw
data in rank order from the smallest
to the largest observation.
Example: Suppose you are provided
with a data set containing the time in
days required to complete year-end
audits for a sample of 20 clients of a
particular accounting firm:
Year-End Audit Time (days)
12 14 19 18 15
15 18 17 20 27
22 23 22 21 33
28 14 18 16 13
Data Array: 12 13 14 14 15 15 16 17 18 18 18 19 20 21 22 22 23 27 28 33
(Note: you can see min=12, max=33, range=21, and 18 occurs most often)
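Sorting the raw audit times gives the data array, and the quantities in the note can be read off directly. A minimal sketch with illustrative variable names:

```python
from collections import Counter

# Raw year-end audit times (days) for the 20 clients
raw = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
       22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

data_array = sorted(raw)                 # ordered array, smallest to largest
minimum, maximum = data_array[0], data_array[-1]
data_range = maximum - minimum           # range = max - min
# most_common(1) returns the value with the highest count and that count
mode_value, mode_count = Counter(data_array).most_common(1)[0]

print(data_array)
print(f"min={minimum}, max={maximum}, range={data_range}, "
      f"mode={mode_value} (occurs {mode_count} times)")
```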
2. Frequency Distribution: sometimes we may prefer to arrange data into categories or class groups so that interpretation is more manageable; however, the original observations are lost in the grouping process.
A Frequency Distribution is a summary table of data showing the number of observations in each of the defined numerically-ordered categories (or classes).
Creating a Frequency Distribution:
a. Select Number of Classes: usually 5 to 15 classes. (Larger data sets require more classes, smaller data sets require fewer classes; this is a very subjective decision. You should try to avoid the pancake (wide/flat) and skyscraper (tall/thin) effects.) (In this example, let's use 5 classes for summarizing.)
b. Width of Class (approx):
Width = Range / Number of Classes = (33 - 12) / 5 = 4.2
We will round up to 5 as that value is commonly used and is easily read. (Note: each category has the same width)
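The width calculation can be sketched in a couple of lines. Rounding up with `math.ceil` happens to give 5 here; in general the choice of a convenient width is a judgment call, so treat the rounding rule as an assumption of this sketch.

```python
import math

smallest, largest = 12, 33   # min and max of the audit time data
num_classes = 5              # chosen in step a

approx_width = (largest - smallest) / num_classes  # range / number of classes
class_width = math.ceil(approx_width)              # round up to a whole number

print(f"approximate width = {approx_width}, class width used = {class_width}")
```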
c. Class Limits: the boundaries for each class. These are very subjective, but must be defined so that all observations are included and each observation falls into exactly one class. (Note: we must include the smallest value; however, instead of using 12 to begin the class definitions, we begin with 10 for ease of interpretation)
Frequency Distribution for Audit Time Data
Audit Time (Days)   Frequency
10 - under 15       4
15 - under 20       8
20 - under 25       5
25 - under 30       2
30 - under 35       1
Total               20
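The frequency distribution for the audit data can be reproduced by assigning each observation to its class. This sketch assumes the class limits chosen above (start at 10, width 5, five classes) and uses integer division to find the class index.

```python
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

start, width, num_classes = 10, 5, 5

counts = [0] * num_classes
for x in data:
    # Class i covers start + i*width up to (but not including) start + (i+1)*width
    counts[(x - start) // width] += 1

labels = [f"{start + i * width} - under {start + (i + 1) * width}"
          for i in range(num_classes)]

for label, count in zip(labels, counts):
    print(f"{label:<16}{count:>3}")
print(f"{'Total':<16}{sum(counts):>3}")
```

Because each class includes its lower limit and excludes its upper limit, every observation lands in exactly one class.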
d. Class Midpoint: the halfway point between the class boundaries (for example, the midpoint of the class 10 - under 15 is 12.5).
3. Relative Frequency Distribution: a tabular summary of a set of data showing the proportion of observations in each of the defined categories.
Relative Frequency = Frequency / n, where n is the total number of observations (here n = 20)
Relative Frequency Distribution - Audit Time Data
Audit Time (Days)   Relative Frequency (Proportion)
10 - under 15       0.20
15 - under 20       0.40
20 - under 25       0.25
25 - under 30       0.10
30 - under 35       0.05
Total               1.00
(Useful when comparing data sets of different sizes)
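Dividing each class frequency by n reproduces the relative frequency column. A minimal sketch using the audit time frequencies from the table:

```python
frequencies = [4, 8, 5, 2, 1]   # class frequencies for the audit time data
n = sum(frequencies)            # total number of observations

# Relative frequency = frequency / n
relative = [round(f / n, 2) for f in frequencies]

print(relative)
print(round(sum(relative), 2))  # the proportions add up to 1
```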
4. Cumulative Distribution: a tabular summary of a set of data that accumulates information from class to class. This type of tabular summary can be constructed from frequency and relative frequency distributions.
Cumulative Distribution - Audit Time Data
Audit Time (Days)   Cumulative Frequency   Cumulative Relative Frequency
Under 15            4                      0.20
Under 20            12                     0.60
Under 25            17                     0.85
Under 30            19                     0.95
Under 35            20                     1.00
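The cumulative columns are running totals of the class frequencies. The sketch below uses `itertools.accumulate` from the standard library; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [4, 8, 5, 2, 1]              # class frequencies, audit time data
upper_limits = [15, 20, 25, 30, 35]        # upper limit of each class

cum_freq = list(accumulate(frequencies))   # running total of frequencies
n = cum_freq[-1]                           # last running total is the sample size
cum_rel = [round(c / n, 2) for c in cum_freq]

for limit, cf, cr in zip(upper_limits, cum_freq, cum_rel):
    print(f"Under {limit}   {cf:>3}   {cr:.2f}")
```

For example, 17 of the 20 audits (0.85) took under 25 days.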
F. Visualizing Numeric Data (2.6)

1. Stem-and-Leaf Display: separates data into stems (leading digits) and leaves (trailing digits).
a. Right-most digits are leaves, remaining digits are stems.
Audit Data Example: 12 13 14 14 15 15 16 17 18 18 18 19 20 21 22 22 23 27 28 33
1 | 234455678889
2 | 0122378
3 | 3
b. Characteristics of Stem-and-Leaf
(1) Most effective for relatively small data sets
(2) Can be used to determine minimum, maximum, range, mode
(3) Gives an idea of how the individual values are distributed across the range of the data
(4) Retains all data - each observation remains distinctly identifiable
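The stem-and-leaf display for the audit data can be built by splitting each value into its tens digit (stem) and ones digit (leaf). This sketch assumes two-digit whole numbers, as in the audit example; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return display lines, using the ones digit as the leaf."""
    stems = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # e.g. 27 -> stem 2, leaf 7
        stems[stem].append(str(leaf))
    return [f"{stem} | {''.join(leaves)}" for stem, leaves in sorted(stems.items())]

audit = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
         22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

display = stem_and_leaf(audit)
for line in display:
    print(line)
```

Every observation can still be read back from the display, which is characteristic (4) above.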
2. Histogram: a vertical bar chart in which the rectangular bars are constructed at the boundaries of each class.
a. Horizontal Axis: represents the values of the random variable (in this case, the time of audit in days)
b. Vertical Axis: represents frequencies or proportions; the height of each bar represents the frequency (or proportion) of observations in that particular class
(Figure: Histogram. Vertical axis: Frequency, 0 to 10; horizontal axis: X = # of Audit Days, with class boundaries at 10, 15, 20, 25, 30, 35.)
(Note: this histogram illustrates skewed data)
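As a rough text stand-in for the chart, the sketch below draws one bar per class from the audit time frequency distribution. The bars run horizontally here only because that is easier to print; a histogram proper uses vertical bars.

```python
classes = ["10 - under 15", "15 - under 20", "20 - under 25",
           "25 - under 30", "30 - under 35"]
frequencies = [4, 8, 5, 2, 1]

# One '#' per observation in the class, followed by the count
bars = [f"{label} | {'#' * f} ({f})" for label, f in zip(classes, frequencies)]

for bar in bars:
    print(bar)
```

The bars peak in the second class and tail off toward the longer audit times, which is the skew mentioned in the note.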
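3. Frequency Polygon: formed by plotting the frequency of each class at its class midpoint and connecting the points with line segments.
(Figure titled Histogram. Vertical axis: Frequency, 0 to 10; horizontal axis: X = # of Audit Days.)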
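The points of a frequency polygon pair each class midpoint with its frequency. A minimal sketch for the audit time classes (lower limits 10 through 30, width 5), with illustrative variable names:

```python
lower_limits = [10, 15, 20, 25, 30]
width = 5
frequencies = [4, 8, 5, 2, 1]

# Class midpoint = halfway between the lower and upper class limits
midpoints = [lower + width / 2 for lower in lower_limits]

# Each polygon vertex is (class midpoint, class frequency)
points = list(zip(midpoints, frequencies))

for x, y in points:
    print(f"({x}, {y})")
```

Connecting these points in order with straight line segments gives the polygon.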
