
Analyzing and Summarizing Data

Summarizing Data
Most sets of data show a distinct tendency to group around a central value (or central tendency).

The purpose of a measure of central tendency is to find a single value that best represents an entire distribution of scores.
When people talk about an average value, the middle value, or the most frequent value, they are talking informally about the mean, median, and mode: three measures of central tendency.

TERMINOLOGY
Central Tendency
- the extent to which the data values group around a typical or central value

Measures of Central Tendency


- numerical values that locate, in some sense, the center of a set of data

Variation
- the amount of dispersion, or scattering, of values away from a central value

Shape
- the pattern of the distribution of values from the lowest value to the highest value

IMPORTANCE
1) To find a representative value
It gives us one value for the distribution, and this value represents the entire distribution.

2) To condense data
An average converts the whole set of figures into just one figure and thus condenses the data.

3) To make comparisons
To compare two or more distributions, we have to find the representative values of these distributions.

4) Helpful in further statistical analysis


Many techniques of statistical analysis (Dispersion, Skewness, Correlation) are based on measures of central tendency.

Mean
- The average with which you are probably most familiar.
- The sample mean is represented by $\bar{x}$ (read "x-bar" or "sample mean").
- The mean is found by adding all the values of the variable x (this sum of x values is symbolized $\sum x$) and dividing the sum by the number of these values, n (the sample size).

Sample Mean = $\bar{x} = \frac{\sum x}{n}$

Population Mean = $\mu = \frac{\sum x}{N}$ (N is the population size)

Activity: Typical Time It Takes To Get Ready In The Morning


If you knew the typical time it takes you to get ready in the morning, you might be able to better plan your morning and minimize any excessive lateness (or earliness) going to your destination.

Find the Mean for the following times (in mins) collected for 10 consecutive days.
Day          1   2   3   4   5   6   7   8   9   10
Time (min)   39  29  43  52  39  44  40  31  44  35

Answer:

Mean = 39.6 minutes


Even though no individual day in the sample actually had the value 39.6 minutes, allotting about 40 minutes to get ready would be a good rule for planning your mornings.
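The calculation behind this answer can be checked with a short Python sketch. The function name `sample_mean` is chosen here for illustration and is not part of the original activity.

```python
# Getting-ready times (minutes) for the 10 days in the activity.
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]


def sample_mean(values):
    """Return the sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


mean_time = sample_mean(times)
print(round(mean_time, 1))  # 39.6
```

The sum of the ten times is 396, so dividing by n = 10 gives 39.6 minutes.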

What if on Day 4, the time you spent is 102 minutes instead of 52 minutes:
Day          1   2   3   4    5   6   7   8   9   10
Time (min)   39  29  43  102  39  44  40  31  44  35

Find the Mean.

Answer: Mean = 44.6 minutes


The one extreme value has increased the mean from 39.6 to 44.6 minutes. In contrast to the original mean that was in the middle, the new mean is greater than 9 of the 10 getting-ready times. Because of the extreme value, now the mean is not a good measure of central tendency.
Sorted times (min), with the mean shown in brackets at its position:
Original data:       29 31 35 39 39 [39.6] 40 43 44 44 52
With 102 on Day 4:   29 31 35 39 39 40 43 44 44 [44.6] 102
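The shift caused by the one extreme value can be reproduced with a small Python sketch. The helper names `sample_mean` and `count_below` are illustrative only.

```python
# Original times and the same data with Day 4 changed from 52 to 102 minutes.
original = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
with_outlier = [39, 29, 43, 102, 39, 44, 40, 31, 44, 35]


def sample_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def count_below(values, threshold):
    """Count how many values are strictly less than the threshold."""
    return sum(1 for v in values if v < threshold)


mean_original = sample_mean(original)
mean_outlier = sample_mean(with_outlier)

# How many of the 10 times lie below each mean?
below_original = count_below(original, mean_original)
below_outlier = count_below(with_outlier, mean_outlier)

print(round(mean_original, 1), below_original)  # 39.6 5
print(round(mean_outlier, 1), below_outlier)    # 44.6 9
```

With the original data the mean splits the times 5 and 5. With the extreme value, 9 of the 10 times fall below the mean.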

Mean
Use the mean to describe the middle of a set of data that does not have outliers (extreme values).
Advantages:
- Most popular measure in fields such as business, engineering and computer science.
- It is unique: there is only one answer.
- Useful when comparing sets of data.
Disadvantages:
- Affected by extreme values (outliers).

Median
- The value of the data that occupies the middle position when the data are ranked in order according to size.
- The sample median is represented by $\tilde{x}$ (read "x-tilde" or "sample median").
- The median is not affected by extreme values, so you can use the median when extreme values are present.

Steps in determining the Median:


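1) Rank the data.
2) Determine the depth of the median (rank of the median value): $d(\tilde{x}) = \frac{n + 1}{2}$
3) Determine the value of the median by counting its rank as given by the depth. If the depth ends in .5, the median is halfway between the two ranked values on either side of that position.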
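The three steps can be sketched in Python as follows, assuming the depth of the median is (n + 1) / 2, the usual position formula for ranked data. The function name `median_by_depth` is hypothetical.

```python
def median_by_depth(values):
    """Find the median by ranking the data and counting to the depth."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")

    ranked = sorted(values)            # Step 1: rank the data
    depth = (len(ranked) + 1) / 2      # Step 2: depth of the median

    # Step 3: count to the depth (ranks start at 1, list indexes at 0).
    lower = ranked[int(depth) - 1]
    if depth == int(depth):
        return lower                   # odd n: the depth is a whole rank

    upper = ranked[int(depth)]         # even n: the depth ends in .5
    return (lower + upper) / 2         # halfway between the two neighbors


print(median_by_depth([6, 3, 8, 5, 3]))       # 5
print(median_by_depth([9, 6, 7, 9, 10, 8]))   # 8.5
```

For five values the depth is 3, so the median is the 3rd ranked value. For six values the depth is 3.5, so the median is halfway between the 3rd and 4th ranked values.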

Activity:
A) Find the median for the set of data {6, 3, 8, 5, 3}. Median = 5 (3rd value)

B) Find the median of the sample 9, 6, 7, 9, 10, 8. Median = 8.5 (3.5th value)

C) Find the median for both cases:


a) Time (min): 29 31 35 39 39 40 43 44 44 52

b) Time (min): 29 31 35 39 39 40 43 44 44 102

a) Median = 39.5 (5.5th)

b) Median = 39.5 (5.5th)
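Both answers can be checked against Python's standard `statistics` module. The sketch below also prints the means for contrast, and the variable names are illustrative.

```python
import statistics

# Ranked getting-ready times (minutes) for the two cases.
case_a = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
case_b = [29, 31, 35, 39, 39, 40, 43, 44, 44, 102]

median_a = statistics.median(case_a)
median_b = statistics.median(case_b)
mean_a = statistics.mean(case_a)
mean_b = statistics.mean(case_b)

print(median_a, median_b)                    # 39.5 39.5
print(round(mean_a, 1), round(mean_b, 1))    # 39.6 44.6
```

Changing the largest value from 52 to 102 leaves the two middle values (39 and 40) where they are, so the median stays at 39.5 while the mean rises by 5 minutes.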

Median
Use the median to describe the middle of a set of data that does have an outlier.
Advantages:
- Extreme values (outliers) do not affect the median as strongly as they do the mean.
- Easy to calculate and, in some cases, can be obtained by inspection.
- It is unique: there is only one answer.
Disadvantages:
- Not capable of further algebraic treatment.
- Ranking a large number of data can be tedious.

Mode
- The value of x that occurs most frequently
- Can be used with categorical data
- As with the median, extreme values do not affect the mode
- Often, there is no mode or there are several modes in a set of data
- Distributions can be unimodal, bimodal, or multimodal

Activity: For Categorical Data
Find the mode.

Flavor          f
Vanilla         28
Chocolate       22
Strawberry      15
Neapolitan      8
Butter Pecan    12
Rocky Road      9
Fudge Ripple    6

Mode: Vanilla
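When the data are already summarized as frequencies, the mode is the category with the largest f. A minimal Python sketch, with the dictionary name `flavor_counts` chosen for illustration:

```python
# Frequency table from the activity: flavor -> frequency (f).
flavor_counts = {
    "Vanilla": 28,
    "Chocolate": 22,
    "Strawberry": 15,
    "Neapolitan": 8,
    "Butter Pecan": 12,
    "Rocky Road": 9,
    "Fudge Ripple": 6,
}

# The mode is every category whose frequency equals the highest frequency.
highest = max(flavor_counts.values())
flavor_modes = [flavor for flavor, f in flavor_counts.items() if f == highest]

print(flavor_modes)  # ['Vanilla']
```

Collecting the result in a list keeps the sketch correct even if two flavors were tied for the highest frequency.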

Activity: For Numerical Data


A) Find the Mode.
Day          1   2   3   4   5   6   7   8   9   10
Time (min)   39  29  43  52  39  44  40  31  44  35
Mode = 39, 44 --> bimodal

B) The bounced check fees ($) for a sample of 10 banks are:
26 28 20 21 22 25 18 23 15 30
Find the Mode.
Mode = no mode
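Both answers can be reproduced by counting how often each value occurs. The sketch below uses a hypothetical helper, `find_modes`, which follows the convention of this activity: if no value repeats, the data set has no mode.

```python
from collections import Counter


def find_modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
fees = [26, 28, 20, 21, 22, 25, 18, 23, 15, 30]

print(find_modes(times))  # [39, 44]
print(find_modes(fees))   # []
```

Note that the standard library's `statistics.multimode` handles the second case differently: when no value repeats it returns every value rather than an empty list, so the explicit check above is what matches the "no mode" answer.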

Mode
Use the mode when the data are non-numeric or when asked to choose the most popular item.
Advantages:
- Extreme values (outliers) do not affect the mode.
Disadvantages:
- Not necessarily unique: there may be more than one answer.
- When no values repeat in the data set, there is no mode, so the measure may seem useless.
- When there is more than one mode, it is difficult to interpret and/or compare.

Considerations for Choosing a Measure of Central Tendency:


- For nominal variables, the mode is the only measure that can be used.
- For ordinal variables, the mode and the median may be used. The median provides more information (taking into account the ranking of categories).
- For numerical variables, the mode, median and mean may all be calculated. The mean provides the most information about the distribution, but the median is preferred if the distribution has extreme values.

Midrange
- The number exactly midway between the lowest-valued data, L, and the highest-valued data, H:

Midrange = $\frac{L + H}{2}$

Activity:
Find the mean, median, mode and midrange.
{6, 7, 8, 9, 9, 10}
Mean = 8.17
Median = 8.5
Mode = 9
Midrange = 8
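The four answers can be checked with the sketch below. The mean, median and mode come from Python's standard `statistics` module. The `midrange` function is written here for illustration, assuming the midrange is the average of the lowest and highest values.

```python
import statistics


def midrange(values):
    """Return the value midway between the lowest and highest values."""
    return (min(values) + max(values)) / 2


data = [6, 7, 8, 9, 9, 10]

data_mean = statistics.mean(data)
data_median = statistics.median(data)
data_mode = statistics.mode(data)
data_midrange = midrange(data)

print(round(data_mean, 2))  # 8.17
print(data_median)          # 8.5
print(data_mode)            # 9
print(data_midrange)        # 8.0
```

The midrange uses only the two extreme values (6 and 10), so a single outlier would move it even more than it moves the mean.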

ASSIGNMENT: (1 whole sheet, due: THU, Dec. 12)


1) Compute the mean, median and mode for the set of scores shown in the following frequency distribution table.
X   7  6  5  4  3  2  1
f   1  1  1  1  4  3  1

2) Identify the circumstances where the median instead of the mean is the preferred measure of central tendency.

3) Under what circumstances will the mean, the median, and the mode all have the same value?

4) Under what circumstances is the mode the preferred measure of central tendency?

5) Explain why the mean is often not a good measure of central tendency for a skewed distribution.

6) Draw and determine the shape of the distribution when:
a) The mean, median and mode are equal
b) The mode is lowest, followed by the median and mean
c) The mean is lowest, followed by the median and mode
