
Chapter 2: Presenting Data in Tables and Charts
Objectives:
1. Understand that the variable type determines the analysis approach.
2. Recognize whether a variable is categorical or numeric.
3. Recognize whether a numeric variable is discrete or continuous.
4. Recognize which summaries are used for numeric data and which are used for categorical data.
5. Construct a frequency table, bar graph, and pie chart for qualitative data.
6. Convert raw data into a data array.
7. Construct frequency tables, relative and cumulative frequency tables, and histograms for quantitative data.
8. Construct a stem-and-leaf display to represent quantitative data.

A. Types of Variables - In order to address statistical questions, one must FIRST be able to identify the types of variables involved. See page 1 of the text (before the Title Page) for the Roadmap.
The TWO types of variables are:
1. Categorical Variables (also known as Qualitative) have values that can be placed into categories (Yes/No; Fr/Soph/Jr/Sr; Rep/Dem/Indep; Defective/Not Defective).
2. Numeric Variables (also known as Quantitative) yield values that represent quantities (weight, salary, return-on-investment, GPA, # of children).
a. Discrete - values result from counting
b. Continuous - values result from measurement and can take on infinitely many values within an interval

Example: Taken from an Excel spreadsheet containing data collected from the Fall 2006 ISDS 2000 Course Survey. The columns (variables) are GENDER, CLASSIF, AGE*, CREDIT HOURS*, INTERNET USAGE, SKIP CLASS, HRS WORK*, BUY ONLINE, and GPA*. Values recorded in the sample rows include:
CLASSIF: JR, SO, SR
AGE*: 19, 20
CREDIT HOURS*: 67, 61, 36, 30, 42, 56, 34, 115
INTERNET USAGE: VERY OFTEN, SOMEWHAT OFTEN
SKIP CLASS: NEVER, VERY RARELY, OCCASIONALLY
HRS WORK*: 20, 17, 15, 12, 14
BUY ONLINE: YES
GPA*: 3.23, 3.41, 2.85, 3.98, 3.67, 3.29, 3.36, 2.92

Note: Columns represent variables (questions asked on the survey); rows represent the observations (students); * = quantitative data (all other variables are qualitative); GPA = continuous numeric data.
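As a rough illustration of the categorical/numeric distinction, the sketch below guesses a column's type from its values. The function `classify_variable` and its rule (text values are categorical; numbers that are all whole are possibly discrete; numbers with fractional parts are continuous) are assumptions made for this example, not a method from the text.

```python
def classify_variable(values):
    """Guess a column's variable type from its values (a heuristic, not a rule)."""
    if all(isinstance(v, str) for v in values):
        return "categorical"
    if all(isinstance(v, (int, float)) for v in values):
        # Whole numbers often come from counting; fractional values suggest measurement.
        if all(float(v).is_integer() for v in values):
            return "numeric (possibly discrete)"
        return "numeric (continuous)"
    return "mixed"

# Values taken from the survey excerpt.
survey_columns = {
    "CLASSIF": ["JR", "SO", "SR"],
    "CREDIT HOURS": [67, 61, 36, 30, 42, 56, 34, 115],
    "SKIP CLASS": ["VERY RARELY", "NEVER", "OCCASIONALLY"],
    "GPA": [3.23, 3.41, 2.85, 3.98, 3.67, 3.29, 3.36, 2.92],
}

for name, values in survey_columns.items():
    print(f"{name}: {classify_variable(values)}")
```

The heuristic is only a starting point: whether a variable is discrete or continuous depends on whether its values come from counting or from measuring. Age recorded in whole years, for example, would be flagged "possibly discrete" even though age is a measurement.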

B. Data Collection - When addressing a business question, you must collect data on the variable(s) of interest.
1. Data fall into two categories:
a. Primary Source
b. Secondary Source
2. Data sources are created in one of four ways:
a. data distributed by an organization or individual (internet, databases of private and government organizations, industry journals, etc.)
b. conducting and reporting the results of a designed experiment (example: a study designed to see whether sales increase when a company implements an advertising slogan)
c. responses from a survey (our class survey)
d. conducting an observational study (focus groups conducted by market researchers to elicit customer preferences)

C. Organizing Categorical Data (2.3)

1. Introduction: Data are usually collected, entered, and saved in some form of database. In this raw form, trends and characteristics are not easily detected, because there can sometimes be millions of data values. We want to summarize/reduce the data to a form that is more easily interpreted and that will aid in decision-making.
Many summaries are found in newspapers, magazines, the internet, annual reports, and research studies; therefore, it is important for you to understand how these summaries are constructed.
2. Summary Table - a tabular summary of data showing the frequency (or percent) of items in each of several distinct categories.

Example: I recorded the academic major of each student and wanted to summarize the results:
MAJOR: ACCT, ISDS, PBADM, ISDS, FIN, PBADM, PBADM, ISDS, ISDS, PBADM, MKT, MKT, PBADM, PBADM, FIN, PBADM, ACCT, ISDS, PBADM, ISDS, PBADM, ISDS, PBADM, PBADM, . . . , MKT

MAJOR    FREQ    RELATIVE FREQ (% FREQ)
ISDS       24    0.253 (25.3)
FIN         9    0.095 (9.5)
MKT        15    0.158 (15.8)
ACCT        7    0.074 (7.4)
PBADM      40    0.421 (42.1)
TOTAL      95    1.001* (100.1)
*The total exceeds 1.000 (100%) only because each proportion was rounded to three decimal places.
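A summary table is a tally of how often each category occurs, divided by the total to get relative frequencies. The sketch below uses Python's `collections.Counter` for the tally; `summary_table` is a hypothetical helper name. Because the raw MAJOR list above is abbreviated, the sketch rebuilds a list with the same category counts as the table (order does not affect the tally).

```python
from collections import Counter

def summary_table(observations):
    """Tally categorical observations into (category, frequency, relative frequency) rows."""
    counts = Counter(observations)
    n = sum(counts.values())
    rows = [(category, freq, freq / n) for category, freq in counts.items()]
    return rows, n

# Rebuilt list with the same counts as the summary table (the raw list is abbreviated).
majors = ["ISDS"] * 24 + ["FIN"] * 9 + ["MKT"] * 15 + ["ACCT"] * 7 + ["PBADM"] * 40

rows, n = summary_table(majors)
for category, freq, rel in rows:
    print(f"{category:<6} {freq:>3}  {rel:.3f} ({rel * 100:.1f})")
# Summing the rounded proportions reproduces the 1.001 total from the table.
print(f"TOTAL  {n:>3}  {sum(round(rel, 3) for _, _, rel in rows):.3f}")
```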

D. Visualizing Categorical Data (2.5)

1. Bar Graph - a graphical representation of data in which each category is depicted by a bar whose length represents the frequency or proportion of observations falling into that category. (Note: the bars do not touch.)
[Figure: bar graph titled "ISDS 2000 - FALL 2001"; horizontal axis: ACADEMIC MAJORS (ISDS, FIN, MKT, ACCT, PBADM); vertical axis: 0 to 45]

(Example from Course Survey)
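Without a plotting library, a bar graph can be approximated in plain text. The sketch below (the function name `text_bar_graph` is hypothetical) draws one horizontal bar of `#` characters per category, one character per student, using the academic-major frequencies from the summary table. Each bar sits on its own row, so the bars stay separate, as in a bar graph.

```python
def text_bar_graph(counts):
    """Render one horizontal bar of '#' characters per category."""
    width = max(len(category) for category in counts)
    lines = []
    for category, freq in counts.items():
        lines.append(f"{category:<{width}} | {'#' * freq} {freq}")
    return "\n".join(lines)

major_counts = {"ISDS": 24, "FIN": 9, "MKT": 15, "ACCT": 7, "PBADM": 40}
print(text_bar_graph(major_counts))
```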

2. Pie Chart - a graphical representation of data in which slices of the pie, measured in degrees (out of 360), are associated with the frequency or proportion of observations falling into each category.

[Figure: pie chart titled "ISDS 2000 - FALL 2001", ACADEMIC MAJORS; slices: ISDS 25%, FIN 9%, MKT 16%, ACCT 7%, PBADM 43%]
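Each slice's angle is its relative frequency times 360 degrees. A minimal sketch using the academic-major counts from the summary table (`pie_slices` is a hypothetical helper name):

```python
def pie_slices(counts):
    """Return (category, percent, degrees) for each slice of a pie chart."""
    n = sum(counts.values())
    return [(cat, 100 * freq / n, 360 * freq / n) for cat, freq in counts.items()]

major_counts = {"ISDS": 24, "FIN": 9, "MKT": 15, "ACCT": 7, "PBADM": 40}
for category, percent, degrees in pie_slices(major_counts):
    print(f"{category:<6} {percent:5.1f}%  {degrees:6.1f} degrees")
```

For example, ISDS has 24/95 of the students, so its slice spans about 0.253 x 360 = 90.9 degrees; the slices together always sum to 360 degrees.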

3. Pareto Chart - a chart in which vertical bars are plotted in descending order of frequency, combined with a cumulative percentage line.
The Pareto Principle states that a majority of responses fall within a small number of categories/groups.


Conclusion: 87% of students have some agreement with the statement that salary potential matters when selecting a major.
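The two ingredients of a Pareto chart, bars sorted in descending order and a running cumulative percentage, can be computed as in the sketch below. The survey responses behind the 87% conclusion are not listed in these notes, so the sketch reuses the academic-major counts as stand-in data; `pareto_table` is a hypothetical helper name.

```python
def pareto_table(counts):
    """Sort categories by descending frequency and attach cumulative percentages."""
    n = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    table, running = [], 0
    for category, freq in ordered:
        running += freq                      # running total of observations so far
        table.append((category, freq, 100 * running / n))
    return table

major_counts = {"ISDS": 24, "FIN": 9, "MKT": 15, "ACCT": 7, "PBADM": 40}
for category, freq, cum_pct in pareto_table(major_counts):
    print(f"{category:<6} {freq:>3}  cumulative {cum_pct:5.1f}%")
```

Here the two largest categories (PBADM and ISDS) already account for about two-thirds of the students, which is the kind of concentration the Pareto Principle describes.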

E. Organizing Numerical Data (Sect 2.4)


1. Ordered Array - a sequence of raw
data in rank order from the smallest
to the largest observation.
Example: Suppose you are provided
with a data set containing the time in
days required to complete year-end
audits for a sample of 20 clients of a
particular accounting firm:

Year-End Audit Time (days): 12, 14, 19, 18, 15, 15, 18, 17, 20, 27, 22, 23, 22, 21, 33, 28, 14, 18, 16, 13
Data Array: 12 13 14 14 15 15 16 17 18 18 18 19 20 21 22 22 23 27 28 33
(Note: you can see that min = 12, max = 33, range = 33 - 12 = 21, and 18 occurs most often, so the mode is 18.)
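Building the data array is just a sort; the minimum, maximum, range, and mode can then be read off directly. A minimal sketch with the audit-time data:

```python
from collections import Counter

audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

data_array = sorted(audit_days)                # smallest to largest
minimum, maximum = data_array[0], data_array[-1]
value_range = maximum - minimum                # 33 - 12
mode, mode_count = Counter(data_array).most_common(1)[0]

print(data_array)
print(f"min={minimum}, max={maximum}, range={value_range}, mode={mode} ({mode_count} times)")
```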
2. Frequency Distribution - Sometimes we may prefer to arrange data into categories or class groups so that interpretation is more manageable; however, the original observations are lost in the grouping process.
A Frequency Distribution is a summary table of data showing the number of observations in each of the defined, numerically ordered categories (or classes).
Creating a Frequency Distribution:

a. Select Number of Classes - usually 5 to 15 classes. (Larger data sets require more classes and smaller data sets require fewer classes; this is a very subjective decision, and you should try to avoid the pancake (wide/flat) and skyscraper (tall/thin) effects.) (In this example, let's use 5 classes for summarizing.)
b. Width of Class (approx.)
$$\text{Width} \approx \frac{\text{Range}}{\text{Number of Classes}} = \frac{33 - 12}{5} = 4.2$$
We will round up to 5, as that value is commonly used and easily read. (Note: each class has the same width.)

c. Class Limits - the boundaries for each class. These are very subjective, but they must be defined so that all observations are included. (Note: the classes must include the smallest value; however, instead of using 12 to begin the class definitions, we begin with 10 to make the classes easier to interpret.)

Frequency Distribution for Audit Time Data
Audit Time (Days)    Frequency
10 - under 15            4
15 - under 20            8
20 - under 25            5
25 - under 30            2
30 - under 35            1
Total                   20

d. Class Midpoint - the halfway point between the class boundaries (e.g., (10 + 15)/2 = 12.5 for the class 10 - under 15).
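Steps a through d can be sketched in code as follows, using 5 classes starting at 10. The helper `frequency_distribution` is a hypothetical name; each class includes its lower limit and excludes its upper limit, matching "10 - under 15".

```python
import math

audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

def frequency_distribution(data, start, width, num_classes):
    """Return (lower, upper, midpoint, frequency) for classes 'lower - under upper'."""
    table = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)  # upper limit excluded
        table.append((lower, upper, (lower + upper) / 2, freq))
    return table

num_classes = 5
approx_width = (max(audit_days) - min(audit_days)) / num_classes  # (33 - 12) / 5 = 4.2
width = math.ceil(approx_width)                                   # rounded up to 5

table = frequency_distribution(audit_days, start=10, width=width, num_classes=num_classes)
for lower, upper, midpoint, freq in table:
    print(f"{lower} - under {upper}: frequency {freq}, midpoint {midpoint}")
print("Total:", sum(freq for *_, freq in table))
```

`math.ceil` happens to give the convenient width of 5 here; in general you may prefer to round the approximate width up to a "nice" number by judgment, as the notes do.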
3. Relative Frequency Distribution - a tabular summary of a set of data showing the proportion of observations in each of the defined categories.
$$\text{Relative Frequency} = \frac{\text{Frequency}}{n}$$
where n is the total number of observations (here, n = 20).
Relative Frequency Distribution - Audit Time Data
Audit Time (Days)    Relative Frequency (Proportion)
10 - under 15            0.20
15 - under 20            0.40
20 - under 25            0.25
25 - under 30            0.10
30 - under 35            0.05
Total                    1.00

(Useful when comparing data sets of different sizes.)
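Applying the formula class by class is a single division per class. A minimal sketch that starts from the frequencies in the frequency distribution table:

```python
classes = ["10 - under 15", "15 - under 20", "20 - under 25", "25 - under 30", "30 - under 35"]
frequencies = [4, 8, 5, 2, 1]

n = sum(frequencies)                            # total number of observations
relative = [freq / n for freq in frequencies]   # Relative Frequency = Frequency / n

for label, rel in zip(classes, relative):
    print(f"{label}  {rel:.2f}")
print(f"Total  {sum(relative):.2f}")
```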
4. Cumulative Distribution - a tabular summary of a set of data that accumulates information from class to class. This type of tabular summary can be constructed from frequency and relative frequency distributions.
Cumulative Distribution - Audit Time Data
Audit Time (Days)   Cumulative Frequency   Cumulative Relative Frequency
Under 15                   4                        0.20
Under 20                  12                        0.60
Under 25                  17                        0.85
Under 30                  19                        0.95
Under 35                  20                        1.00
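A cumulative distribution is a running total of the class frequencies, which `itertools.accumulate` computes directly; dividing each running total by n gives the cumulative relative frequencies. A minimal sketch using the frequency distribution above:

```python
from itertools import accumulate

upper_limits = [15, 20, 25, 30, 35]
frequencies = [4, 8, 5, 2, 1]
n = sum(frequencies)

cumulative = list(accumulate(frequencies))          # running totals: 4, 4+8, ...
cumulative_relative = [c / n for c in cumulative]   # running proportions

for upper, c, cr in zip(upper_limits, cumulative, cumulative_relative):
    print(f"Under {upper}: {c:>2}  {cr:.2f}")
```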

F. Visualizing Numeric Data (2.6)

1. Stem-and-Leaf Display - separates data into stems (leading digits) and leaves (trailing digits).
a. The right-most digits are leaves; the remaining digits are stems.
Audit Data Example: 12 13 14 14 15 15 16 17 18 18 18 19 20 21 22 22 23 27 28 33
1 | 234455678889
2 | 0122378
3 | 3

b. Characteristics of Stem-and-Leaf Displays
(1) most effective for relatively small data sets
(2) can be used to determine the minimum, maximum, range, and mode
(3) gives an idea of how the individual values are distributed across the range of the data
(4) retains all data: each observation remains distinctly identifiable
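For two-digit values, the stem is the tens digit and the leaf is the units digit, which `divmod(value, 10)` returns together. A minimal sketch, assuming non-negative integer data; `stem_and_leaf` is a hypothetical helper name:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Build display rows: the right-most digit is the leaf, the rest is the stem."""
    stems = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)   # assumes non-negative integers
        stems[stem].append(str(leaf))
    # Include empty stems between the smallest and largest so gaps stay visible.
    return [f"{stem} | {''.join(stems[stem])}" for stem in range(min(stems), max(stems) + 1)]

audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
print("\n".join(stem_and_leaf(audit_days)))
```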
2. Histogram - a vertical bar chart in which the rectangular bars are constructed at the boundaries of each class, so adjacent bars touch.
a. Horizontal Axis - represents the values of the random variable (in this case, the audit time in days)
b. Vertical Axis - represents frequencies or proportions; the height of each bar represents the frequency (or proportion) of observations in that particular class
[Figure: Histogram; horizontal axis: X = # of Audit Days (class boundaries 10, 15, 20, 25, 30, 35); vertical axis: Frequency (0 to 10)]

(Note: this histogram illustrates right-skewed data: the long tail extends toward the larger values.)
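What distinguishes a histogram from a bar graph is that each bar spans its class from one boundary to the next. The sketch below computes each bar's left edge, right edge, and height from the audit-time frequency distribution and prints a rough text version; `histogram_bars` is a hypothetical helper name.

```python
def histogram_bars(frequencies, start, width):
    """Return (left edge, right edge, height) for each bar; bars sit on class boundaries."""
    bars = []
    for i, freq in enumerate(frequencies):
        left = start + i * width
        bars.append((left, left + width, freq))
    return bars

bars = histogram_bars([4, 8, 5, 2, 1], start=10, width=5)
for left, right, height in bars:
    print(f"[{left}, {right}) {'*' * height}")

# Each bar's right edge is the next bar's left edge, so there are no gaps.
touching = all(bars[i][1] == bars[i + 1][0] for i in range(len(bars) - 1))
print("adjacent bars touch:", touching)
```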

3. Frequency Polygon - formed by plotting each class's frequency at the class midpoint and connecting these points with straight line segments.

[Figure: chart titled "Histogram"; horizontal axis: X = # of Audit Days; vertical axis: Frequency (0 to 10)]
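The points of a frequency polygon are (class midpoint, frequency) pairs, joined in order by straight lines. A minimal sketch for the audit-time classes (`frequency_polygon_points` is a hypothetical helper name):

```python
def frequency_polygon_points(frequencies, start, width):
    """Return (class midpoint, frequency) points to be joined by straight lines."""
    points = []
    for i, freq in enumerate(frequencies):
        lower = start + i * width
        points.append((lower + width / 2, freq))
    return points

points = frequency_polygon_points([4, 8, 5, 2, 1], start=10, width=5)
print(points)
```

A common convention, not stated in these notes, is to close the polygon by adding zero-frequency points at the midpoints just below the first class and just above the last (here 7.5 and 37.5).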
