Chapter 2

Exploring and Understanding Data

When looking at data we must identify the following aspects before we can analyze and draw
appropriate conclusions.



What? _______________________________________________________________________________
Qualitative (categorical)


Where? When? & How? _________________________________________________________________

Remember: Bad samples = Bad results

AP Statistics Chapter 3 Part I Displaying and Describing Categorical Data Pages 20-43

Frequency Tables
A. Define Frequency Table

B. Define Relative Frequency Table


The Area Principle

A. What is the Area Principle? _________________________________________________


Bar Charts
A. What is a bar chart?

B. What is a relative frequency bar chart?

C. Relative Frequency corresponds to _______________________________.


Pie Charts
A. What is a pie chart?


Contingency Tables
A. What is a contingency table?

B. What is marginal distribution?


Conditional Distributions
A. What is a conditional distribution?

B. How do we determine independence or no association between the variables in a
contingency table?


Segmented Bar Charts

A. What is a segmented bar chart?

Chapter 4 Displaying Distributions with Graphs continued

Displaying QUANTITATIVE DATA with graphs:
1. Dotplot: a graph which records each observation by a dot above the location
corresponding to its value on a horizontal measurement scale.
When to use: Small numerical data sets
How to construct: Locate each value in the data set along the measurement scale,
and represent it by a dot. If there are two or more observations with the same value,
stack the dots vertically.
What to look for: Center (typical value), Spread (the extent to which the data set
stretches), Shape (the nature of the distribution of values along the number line),
Deviations (the presence of unusual values in the data set).
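
The construction steps above can be sketched in a few lines of Python. This is only an illustration: the function name text_dotplot and the scores are made up, and it assumes integer data so that the measurement scale is simply every whole number from the minimum to the maximum.

```python
from collections import Counter

def text_dotplot(values):
    """Return a text dotplot: one '*' per observation, stacked above its value.

    Assumes integer data (an assumption of this sketch, not of dotplots in general).
    """
    counts = Counter(values)
    scale = list(range(min(values), max(values) + 1))   # horizontal measurement scale
    width = max(len(str(v)) for v in scale) + 1          # column width so labels line up
    rows = []
    # Build the rows from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("*" if counts[v] >= level else " ").rjust(width) for v in scale)
        rows.append(row.rstrip())
    rows.append("".join(str(v).rjust(width) for v in scale))
    return "\n".join(rows)

scores = [3, 5, 5, 6, 6, 6, 7, 9]   # made-up quiz scores
print(text_dotplot(scores))
```

In the printed plot the tallest stack sits at 6 (a rough center), the values run from 3 to 9 (spread), and the empty columns at 4 and 8 show up as gaps.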

2. Stem-and-leaf display: a graph in which each number in the data set is broken
into two pieces called a stem and a leaf. The stem is the first part of the number
and consists of the beginning digit(s). The leaf is the last part of the number and
consists of the final digit.
When to use: Numerical data sets with a small to moderate number of observations
(does not work for very large data sets)
(Back-to-back stemplots can be used to compare two distributions)
How to Construct:
1. Select one or more leading digits for the stem values. The trailing digits
become the leaves.
2. List possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem value
in order.
4. Create a key indicating the units for the stems and leaves.
What to look for: Center (mode, median), Shape (symmetry or clustering), Spread
(the extent of spread about the typical value, location of peaks), Deviations (the
presence of gaps or outliers).
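
A minimal sketch of the four construction steps, assuming non-negative whole numbers with the final digit as the leaf. The function name stemplot and the data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Text stem-and-leaf display for non-negative integers (leaf = final digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):                 # sorting keeps the leaves in order
        stem, leaf = divmod(v, 10)           # step 1: leading digits / trailing digit
        leaves[stem].append(leaf)
    lines = []
    # Step 2: list every possible stem, including stems with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: record the leaves beside their stem.
        row = f"{stem:>2} | " + " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(row.rstrip())
    lines.append("Key: 4 | 7 means 47")      # step 4: key for stems and leaves
    return "\n".join(lines)

ages = [47, 52, 55, 61, 63, 63, 68, 89]     # made-up data
print(stemplot(ages))
```

Listing the empty stem 7 is deliberate: leaving it out would hide the gap between 68 and 89.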

3. Histogram: for Discrete Numerical Data, a histogram looks like a bar chart.

Histogram: for Continuous Numerical Data is constructed from a frequency
distribution in which class intervals create groups of equal width to tally and display
data with bars which represent frequency for a class.
When to use: continuous numerical data. Works well, even for large data sets.
How to construct:
1. Mark the boundaries of the class intervals on a horizontal axis.
2. Use either the frequency or relative frequency on the vertical axis.
3. Draw a rectangle for each class directly above the corresponding interval (so
that the edges are at the class boundaries). Use frequency or relative
frequency to determine the heights of the rectangle.
What to look for: Central or typical value (can't be exact because actual values are
not retained), extent of spread or variation, general shape, location and number of
peaks, and presence of gaps and outliers.
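
The tallying behind a histogram can be sketched as below. The function names, the class boundaries (start 10, width 10), and the commute times are assumptions made for this example. Each class includes its left boundary and excludes its right boundary, which is one common convention.

```python
def frequency_distribution(values, start, width, n_classes):
    """Count how many values fall in each equal-width class [lo, lo + width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)        # index of the class containing v
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the class intervals")
        counts[k] += 1
    return counts

def text_histogram(values, start, width, n_classes):
    """One row per class: interval, bar of '#', frequency, relative frequency."""
    counts = frequency_distribution(values, start, width, n_classes)
    n = len(values)
    lines = []
    for k, c in enumerate(counts):
        lo = start + k * width
        hi = lo + width
        lines.append(f"[{lo:g}, {hi:g}) | {'#' * c} {c} ({c / n:.0%})")
    return "\n".join(lines)

times = [12.5, 18.0, 21.3, 22.8, 25.0, 27.4, 31.9, 33.2, 38.6, 44.1]  # made-up minutes
print(text_histogram(times, start=10, width=10, n_classes=4))
```

Once the data are tallied, only the class counts remain, which is why the center can be read only approximately from a histogram.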

4. Boxplot (Box-and-whisker display): a graph based on the median and quartiles. It is
compact, yet it provides information about the center, spread, and symmetry or
skewness of the data.
How to Construct:
1. Draw a horizontal (or vertical) measurement scale.
2. Construct a rectangular box whose left (or lower) edge is at the lower
quartile and whose right (or upper) edge is at the upper quartile (so the box
width is the Interquartile Range).
3. Draw a vertical (or horizontal) line segment inside the box at the location
of the median.
4. Extend horizontal (or vertical) line segments from each end of the box to
the smallest and largest observations in the data set (called whiskers).
What to look for: Center (median), shape, spread, deviations (long whiskers).

Chapter 1 Exploring Data


Displaying Distributions with Graphs

Statistics is the science of gaining information from numerical data. We study
statistics because the use of data has become ever more common in a growing
number of professions.
We can divide statistics in practice into two parts:
1. Descriptive Statistics
2. Inferential Statistics
Individuals are the objects described by a set of data. Individuals may be
people, but they may also be animals or things.
Variable is any characteristic of an individual. A variable can take different
values for different individuals.
Categorical variable or Qualitative variable records which of several groups
or categories an individual belongs to, like male or female.
Quantitative variable takes numerical values for which it makes sense to do
arithmetic operations like adding and averaging.
Distribution of a variable tells us what values it takes and how often it takes
these values.
To describe a distribution, begin with a graph.
Bar charts/Pie charts: These graphs are used for categorical/qualitative data and
display either the count or the percent of individuals who fall in each category.
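
As a small illustration of "what values it takes and how often", the sketch below tabulates the count and the percent for each category, which are the numbers a bar chart or pie chart would display. The function name frequency_table and the eye-color data are made up for this example.

```python
from collections import Counter

def frequency_table(categories):
    """Map each category to (count, relative frequency), most common first."""
    counts = Counter(categories)
    n = len(categories)
    return {cat: (count, count / n) for cat, count in counts.most_common()}

eye_colors = ["brown", "blue", "brown", "green",
              "brown", "blue", "brown", "hazel"]   # made-up data

table = frequency_table(eye_colors)
for cat, (count, rel) in table.items():
    # A row of '#' stands in for the bar of a bar chart.
    print(f"{cat:<6} {'#' * count:<5} {count} ({rel:.1%})")
```

The relative frequencies always add up to 1 (100%), which is what lets a pie chart divide one whole circle among the categories.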

1.2 Describing Distributions with Numbers


Measures of Center
A. Mean
1. most common measure of center
2. arithmetic average
3. sample mean, denoted $\bar{x}$
4. always exists
5. takes every data value into account
6. NOT resistant to outliers

B. Median
1. middle value
2. denoted Med
3. commonly used
4. always exists
5. RESISTANT to outliers
6. The Median may not be an actual data value, but an average
of the two middle values (when the number of observations is even).


C. Mode
1. does not always exist
2. may have more than one mode
3. most frequent value
4. the only measure of center that may be used with categorical data.


Symmetric: mean ≈ median
Right-skewed: usually mean > median
Left-skewed: usually mean < median
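
A short check of the resistance claims, using Python's standard statistics module. The salary figures are made up. Adding one extreme value moves the mean a long way but leaves the median where it was.

```python
from statistics import mean, median, multimode

salaries = [30, 32, 35, 35, 38, 40]      # made-up salaries, in thousands
with_outlier = salaries + [400]          # add one extreme value

print(mean(salaries), median(salaries))            # both are 35 for this data
print(mean(with_outlier), median(with_outlier))    # mean jumps, median stays at 35

# multimode returns every most-frequent value, so it also works for
# categorical data and for data with more than one mode.
print(multimode(salaries))                          # [35]
print(multimode(["red", "blue", "red", "blue"]))    # ['red', 'blue']
```

With the outlier included the data are strongly right-skewed, and the mean (about 87) ends up well above the median (35). When every value occurs equally often, multimode returns all of them, which corresponds to saying there is no meaningful mode.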


Measures of Spread
A. Range = max - min (NOT resistant)
B. Variance, $s^2$: (roughly) the average of the squares of the deviations of the
observations from their mean. (NOT resistant)
Formula: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ (sample variance)
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ (population variance)


Standard Deviation, s or $\sigma$: the square root of the variance; roughly the
average deviation from the mean

1. s = 0 only when there is no spread, and s gets larger as the spread
increases. s can never be negative (s ≥ 0).
2. The mean $\bar{x}$ and s are strongly influenced by outliers or skewness in
a distribution. They are good descriptions for symmetric
distributions and are most useful for normal distributions.
(NOT resistant)
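
A worked check of the variance and standard deviation, assuming the usual convention that the sample variance divides by n - 1 and the population variance divides by N. The helper names and the data are made up. The results are compared against Python's statistics module.

```python
from math import sqrt
from statistics import pvariance, stdev, variance

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def sample_std_dev(xs):
    """Standard deviation = square root of the variance."""
    return sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data; mean = 5, squared deviations sum to 32

print(sample_variance(data), variance(data))   # both 32/7
print(sample_std_dev(data), stdev(data))       # square roots of the line above
print(pvariance(data))                         # 32/8 = 4 (population version)
print(sample_variance([3, 3, 3]))              # 0.0: no spread at all
```

Replacing the 9 with 90 would inflate both results sharply, which is the sense in which these measures are not resistant.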


Quartiles (resistant)
1. Q1 = 25% mark, P25, the middle between the min and the median
2. Q2 = 50% mark, P50, the median
3. Q3 = 75% mark, P75, the middle between the median and the max


Interquartile Range, IQR

1. IQR = Q3 - Q1
2. The range for the middle 50%
3. Resistant
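
One way to compute the quartiles is the "median of each half" method sketched below, where the overall median is left out of both halves when the number of observations is odd. This is an assumption of the sketch: textbooks, calculators, and software use several slightly different quartile rules, so answers can differ a little. The function name and data are made up.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Needs at least two observations.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                 # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]     # skip the median itself when n is odd
    return median(lower), median(xs), median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]   # made-up data, n = 9

q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)   # 3.5 7 11.0 7.5
```

Changing the largest value from 15 to 150 leaves Q1, Q3, and the IQR unchanged, which illustrates why they are called resistant.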


Five-number summary
min, Q1, med, Q3, max

BOXPLOT: a graph based on the five-number summary. The box spans the
quartiles and shows the spread of the central half of the distribution. The median
is marked within the box. The whiskers extend to the extremes and show the
full spread of the data.
MODIFIED BOXPLOT: plots outliers as isolated points and pulls the
whiskers back to the highest/lowest data values that are not outliers.
IV. Deviations
Outlier: An observation is called an outlier if it falls more than 1.5 × IQR below Q1
or above Q3.
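
The 1.5 × IQR rule and the whiskers of a modified boxplot can be sketched as follows. The function names and data are made up, and the quartiles use the median-of-halves rule (an assumption; other quartile rules give slightly different fences).

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outlier_report(values):
    """Apply the 1.5 x IQR rule and find where modified-boxplot whiskers end."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in sorted(values) if x < low_fence or x > high_fence]
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "fences": (low_fence, high_fence),
        "outliers": outliers,                     # plotted as isolated points
        "whiskers": (min(inside), max(inside)),   # whiskers stop at non-outliers
    }

data = [1, 3, 4, 6, 7, 8, 10, 12, 40]   # made-up data with one unusual value

print(five_number_summary(data))   # (1, 3.5, 7, 11.0, 40)
print(outlier_report(data))
```

Here Q1 = 3.5 and Q3 = 11, so the IQR is 7.5 and the fences are -7.75 and 22.25. The value 40 lies above the upper fence, so it is flagged as an outlier and the upper whisker is pulled back to 12.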

