Quantitative Techniques: Measures of Central Tendency and Dispersion

Quantitative Techniques
Measures of Central Tendency

Central Tendency
Measure of Central Tendency:
A single summary score that best describes the central location of an entire distribution of scores.
The typical score. The center of the distribution.
Three of the most common measures of central tendency are:
Mean
Median
Mode
One distribution can have multiple locations where scores cluster.
We must decide which measure is best for a given situation.
Mean
The mean is the most commonly used measure of central tendency.
When people ask about the average of a group of scores, they are usually referring to the mean.
The mean is the sum of all the scores in the distribution divided by the number of scores (the mathematical average).
It is the balance point of a distribution.
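Mean (cont)
Population: $\mu = \frac{\sum X}{N}$
$\mu$ (mu) is the population mean.
$\sum X$ (sigma, the sum of X): add up all scores.
$N$ is the total number of scores in a population.
Sample: $\bar{X} = \frac{\sum X}{n}$
$\bar{X}$ (X bar) is the sample mean.
$\sum X$ (sigma, the sum of X): add up all scores.
$n$ is the total number of scores in a sample.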
Mean (cont)
Exam Scores: 75, 82, 72, 68, 89, 91, 78, 94, 88, 75
$\bar{X} = \frac{\sum X}{n} = \frac{812}{10} = 81.2$
Mean Score = 81.2
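
The exam-score calculation can be checked with a few lines of Python. This is a minimal sketch; the names `exam_scores`, `mean`, and `exam_mean` are illustrative and not part of the slides.

```python
# Exam scores from the slide above.
exam_scores = [75, 82, 72, 68, 89, 91, 78, 94, 88, 75]


def mean(scores):
    """Return the arithmetic mean: the sum of the scores divided by their count."""
    return sum(scores) / len(scores)


exam_mean = mean(exam_scores)
print(sum(exam_scores))  # 812
print(exam_mean)         # 81.2
```
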
Mean (cont)
Performance and Memory Study, number of words recalled: 2, 2, 3, 3, 4, 4, 4, 4, 4, 10
$\bar{X} = \frac{\sum X}{n} = \frac{40}{10} = 4$
[Figure: histogram of the Performance and Memory Study; x-axis: Number of Words Recalled, y-axis: Frequency]
The mean includes the weight of every score.
Pros and cons of using mean
Pros
Mathematical center of a distribution.
It is just as far from the scores above it as it is from the scores below it.
Good for interval and ratio data.
Does not ignore any information.
Cons
Influenced by extreme scores.
May not exist in the data.
E.g., average salary at a company:
12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 20,000; 390,000
Mean = 44,167
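
The effect of the one extreme salary can be seen by computing the mean and the median of the same list. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Salaries from the slide: ten at 12,000, one at 20,000, one at 390,000.
salaries = [12_000] * 10 + [20_000, 390_000]

salary_mean = mean(salaries)      # pulled upward by the single 390,000 salary
salary_median = median(salaries)  # unaffected by the extreme value

print(round(salary_mean))  # 44167
print(salary_median)       # 12000.0
```

Eleven of the twelve employees earn less than half of the mean, which is why the mean is a poor summary of this distribution.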
Median
The middle score of the distribution when all the scores have been ranked in either ascending or descending order.
Represents the exact center or middle of the distribution.
Appropriate for variables that are at least at the ordinal level.
Odd number of cases: the median is the ((n + 1)/2)th score.
Even number of cases: the median is ((n/2)th score + (n/2 + 1)th score)/2, that is, the average of the two middle values.
What is the median suicide rate for the nine largest U.S. cities?
City           Rate
New York       7.44
Los Angeles   13.38
Chicago       10.00
Houston       14.11
Philadelphia  14.78
San Diego     12.61
Detroit       12.26
Dallas        14.30
Phoenix       18.37
Total (N)         9
First order the data
City           Rate
New York       7.44
Chicago       10.00
Detroit       12.26
San Diego     12.61
Los Angeles   13.38
Houston       14.11
Dallas        14.30
Philadelphia  14.78
Phoenix       18.37
Total (N)         9
n is odd: (9 + 1) / 2 = 5
Now, find the 5th case.
The median suicide rate for the nine largest U.S. cities is 13.38 (not 5).
Median (cont)
Number of Words Recalled in Performance Study: 2, 2, 3, 3, 4, 4, 4, 4, 4, 10
Median (even no. of cases) = ((n/2)th score + (n/2 + 1)th score)/2
= ((10/2)th score + (10/2 + 1)th score)/2
= (5th score + 6th score)/2
= (4 + 4)/2 = 4
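
The odd and even rules can be written as one small function. This is a sketch of the positional rule described above, applied to the city rates and the word-recall scores; the function and variable names are illustrative.

```python
def median(scores):
    """Return the median using the (n + 1)/2 rule for odd n and the
    average of the two middle scores for even n."""
    ordered = sorted(scores)           # rank the scores first
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1)/2)th score (subtract 1 for 0-based indexing).
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the (n/2)th and (n/2 + 1)th scores.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


city_rates = [7.44, 13.38, 10.00, 14.11, 14.78, 12.61, 12.26, 14.30, 18.37]
words_recalled = [2, 2, 3, 3, 4, 4, 4, 4, 4, 10]

print(median(city_rates))      # 13.38
print(median(words_recalled))  # 4.0
```
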
How satisfied are you with your health insurance?
Responses of 7 Individuals:
very dissatisfied
very satisfied
somewhat satisfied
very dissatisfied
somewhat dissatisfied
somewhat satisfied
very satisfied
Total (N): 7
To locate the median, arrange the responses in order from lowest to highest (or highest to lowest):
Response
very dissatisfied
very dissatisfied
somewhat dissatisfied
somewhat satisfied (the middle case = median)
somewhat satisfied
very satisfied
very satisfied
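
For ordinal responses the same procedure works once the categories are given an order. The sketch below assumes a four-level scale running from "very dissatisfied" to "very satisfied", as the responses suggest; the `levels` list and the variable names are assumptions made for illustration.

```python
# Assumed ordering of the response categories, lowest to highest.
levels = [
    "very dissatisfied",
    "somewhat dissatisfied",
    "somewhat satisfied",
    "very satisfied",
]

# The seven responses in the order they were collected.
responses = [
    "very dissatisfied",
    "very satisfied",
    "somewhat satisfied",
    "very dissatisfied",
    "somewhat dissatisfied",
    "somewhat satisfied",
    "very satisfied",
]

# Rank the responses by their position on the scale.
ranked = sorted(responses, key=levels.index)

# n = 7 is odd, so the median is the ((n + 1)/2)th = 4th response.
middle_position = (len(ranked) + 1) // 2
ordinal_median = ranked[middle_position - 1]

print(middle_position)  # 4
print(ordinal_median)   # somewhat satisfied
```
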
Pros and Cons of Median
Pros
Not influenced by extreme scores or skewed distributions.
Good with ordinal data.
Easier to compute than the mean.
Cons
May not exist in the data.
Doesn't take actual values into account.
Mode
The most frequent score in the distribution.
A distribution where a single score is most frequent has one mode and is called unimodal.
When there are ties for the most frequent score, the distribution is bimodal if two scores tie or multimodal if more than two scores tie.
Applications: printing, manufacturing, etc.
For example, it is important to print more of the most popular books, because printing different books in equal numbers would cause a shortage of some books and an oversupply of others.
Likewise, it is important to manufacture more of the most popular shoes, because manufacturing different shoes in equal numbers would cause a shortage of some shoes and an oversupply of others.
Mode (cont)
Number of Words Recalled in Performance Study: 2, 2, 3, 3, 4, 4, 4, 4, 4, 10
The mode is 4.
Mode (cont)
Scores: 72, 72, 73, 76, 78, 81, 83, 85, 85, 86, 87, 88, 90, 91, 92
This distribution is bimodal: 72 and 85 each occur twice.
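
Both mode examples can be reproduced with `multimode` from Python's standard `statistics` module, which returns every value tied for the highest frequency. This is a minimal sketch with illustrative variable names.

```python
from statistics import multimode

# Word-recall scores: a single most frequent value (unimodal).
words_recalled = [2, 2, 3, 3, 4, 4, 4, 4, 4, 10]

# Second data set: two values tie for most frequent (bimodal).
scores = [72, 72, 73, 76, 78, 81, 83, 85, 85, 86, 87, 88, 90, 91, 92]

print(multimode(words_recalled))  # [4]
print(multimode(scores))          # [72, 85]
```
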
Mode (cont)
The mode is the best measure of central tendency when data are not ordered, like the colors of cars in a parking lot.
For colors of cars:
You can't use the median: there is no order in the colors, no counting up from the bottom to find the middle score.
Also, you cannot add them together to find a mean (blue + red + white = ?).
Summary: the mode is the place where the greatest number of cases, observations, or scores occur.
Car colors: Red, Yellow, Green, Blue, Red, Red, Green, Yellow, Blue, Yellow, Red, Blue
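
For nominal data such as car colors, the mode is found by counting how often each category appears. The sketch below uses `collections.Counter` on the twelve colors listed above; the variable names are illustrative.

```python
from collections import Counter

# The twelve car colors listed on the slide.
car_colors = [
    "Red", "Yellow", "Green",
    "Blue", "Red", "Red",
    "Green", "Yellow", "Blue", "Yellow", "Red", "Blue",
]

# Count each category; no ordering or arithmetic on the colors is needed.
color_counts = Counter(car_colors)
modal_color, modal_count = color_counts.most_common(1)[0]

print(modal_color, modal_count)  # Red 4
```
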

Pros and Cons of Mode
Pros
Good for nominal data.
Good when there are two typical scores.
Easiest to compute and understand.
The score comes from the data set.
Cons
Ignores most of the information in a distribution.
Small samples may not have a mode.
The best measure of central tendency depends on:
The scale of measurement.
The shape of the distribution.
Scales of Measurement
Nominal scale = mode
Ordinal scale = median
Ratio scale = mean, median, or mode
Interval scale = mean, median, or mode
Shape of the Distribution

Skew refers to the general shape of a distribution when it is graphed.
Symmetrical = zero skew
Scores clustered on the high or low end of a distribution = skewed distribution
Symmetrical Distribution
[Figure: histogram of a symmetrical distribution; x-axis: Scores, y-axis: Frequency]
The mean, median, and mode are the same.
The normal distribution is the ideal symmetrical distribution.
Distributions that are skewed have one side where the data frequency tapers off.
Skewed Distribution
Positive Skew
[Figure: histogram of a positively skewed distribution; x-axis: Scores, y-axis: Frequency]
Tail points in the positive direction.
Skewed Distribution
Negative Skew
[Figure: histogram of a negatively skewed distribution; x-axis: Scores, y-axis: Frequency]
Tail points in the negative direction.
The mean will either underestimate or overestimate the center of skewed distributions.
[Figure: two histograms, Positive Skew and Negative Skew; x-axis: Scores, y-axis: Frequency; the mode, median, and mean are marked on each]
Dispersion
The spread of a set of scores around some central value.
Why it is important:
It gives us additional information that enables us to judge the reliability of our measure of central tendency.
Two datasets can have the same average but very different variability.
Applications: stock market, quality control.
[Figure: Data set A and Data set B]
Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Range
The difference between the highest and lowest score in a distribution.
Range = highest value - lowest value
Las Vegas Hotel Rates: 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
Range: 891 - 52 = 839
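
The range of the hotel rates takes one line to compute. This is a minimal sketch; `hotel_rates` and `rate_range` are illustrative names, and the list is copied as given on the slide.

```python
# Las Vegas hotel rates as listed on the slide (35 values).
hotel_rates = [
    52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280,
    282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402,
    417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891,
]

# Range = highest value - lowest value; only two scores are used.
rate_range = max(hotel_rates) - min(hotel_rates)

print(len(hotel_rates))  # 35
print(rate_range)        # 839
```
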
Pros and Cons of the Range
Pros
Very easy to compute.
Cons
Value depends only on two scores.
Very sensitive to outliers.
Influenced by sample size (the larger the sample, the larger the range).
Interquartile Range
Range of the middle half of scores.
IQR = Q3 (third quartile) - Q1 (first quartile)
Las Vegas Hotel Rates: 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
Position of Q1: (35 + 1)/4 = 9, so Q1 is the 9th ordered value, 257.
Position of Q3: 3(35 + 1)/4 = 27, so Q3 is the 27th ordered value, 472.
Interquartile Range: 472 (Q3) - 257 (Q1) = 215
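
The quartile positions used here follow the (n + 1) convention. A sketch with Python's `statistics.quantiles`, whose `"exclusive"` method uses that same convention, reproduces the result. Other tools may use a different quartile convention and give slightly different values, so treat this as one way to check the slide, not the only definition.

```python
from statistics import quantiles

# Las Vegas hotel rates as listed on the slide (35 values).
hotel_rates = [
    52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280,
    282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402,
    417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891,
]

# quantiles() sorts the data itself and returns the three cut points
# Q1, Q2 (the median), and Q3 when n=4.
q1, q2, q3 = quantiles(hotel_rates, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q3)  # 257.0 472.0
print(iqr)     # 215.0
```
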
Pros and Cons of the Interquartile Range
Pros
Fairly easy to compute.
Scores exist in the data set.
Eliminates influence of extreme scores.
Cons
Discards much of the data.
Variance
Mean of all squared deviations from the mean.
Reflects the average amount that a score deviates from the typical score.
Score - Mean = Difference Score
Average of Difference Scores = 0
To keep this number from being 0, square the difference scores (so that no negatives cancel out the positives).
Variance: Formula
Population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
($\sigma$ is the lowercase Greek letter sigma.)
Calculate the variance for the given sample
3, 4, 4, 4, 6, 7, 7, 8, 8, 9
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
$\bar{X} = \frac{\sum X}{n} = \frac{60}{10} = 6$
$S^2 = \frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{9}$
$S^2 = \frac{40}{9} \approx 4.44$
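
The steps of the sample-variance calculation can be sketched in Python as follows. It also shows why the difference scores are squared: left unsquared, they sum to zero. The variable names are illustrative.

```python
# Sample from the slide.
sample = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

n = len(sample)
sample_mean = sum(sample) / n                     # 60 / 10 = 6.0

# Difference scores (score - mean) always sum to zero.
deviations = [x - sample_mean for x in sample]
print(sum(deviations))                            # 0.0

# Squaring removes the signs so the deviations no longer cancel.
sum_of_squares = sum(d ** 2 for d in deviations)  # 40.0

# Sample variance divides by n - 1, not n.
sample_variance = sum_of_squares / (n - 1)
print(round(sample_variance, 2))                  # 4.44
```
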
Pros and Cons of Variance
Pros
Takes all data into account.
Lends itself to computation of other stable measures (and is a prerequisite for many of them).
Cons
Hard to interpret.
Can be influenced by extreme scores.
Standard Deviation
Square root of the average of the squared distances of the observations from the mean.
To undo the squaring of difference scores, take the square root of the variance.
This returns the measure to the original units rather than squared units.
Population: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Sample: $S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
Example
$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
$S = \sqrt{\frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{9}}$
$S = \sqrt{\frac{40}{9}} = 2.11$
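
The same example in Python, as a minimal sketch: the standard deviation is computed by hand from the sample variance and then compared with `statistics.stdev`, which also uses the n - 1 denominator. The variable names are illustrative.

```python
import math
from statistics import stdev

# Sample from the slide.
sample = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

n = len(sample)
sample_mean = sum(sample) / n
sample_variance = sum((x - sample_mean) ** 2 for x in sample) / (n - 1)

# Taking the square root returns the measure to the original units.
sample_sd = math.sqrt(sample_variance)
print(round(sample_sd, 2))  # 2.11

# The standard-library function agrees with the manual calculation.
library_sd = stdev(sample)
print(math.isclose(sample_sd, library_sd))  # True
```
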
Pros and Cons of Standard Deviation
Pros
Lends itself to computation of other stable measures (and is a prerequisite for many of them).
Approximates the average deviation of scores around the mean, in the original units.
Cons
Influenced by extreme scores.
Empirical Rule
If $\bar{X}$ = mean, S = standard deviation, and x is a value in an approximately normal data set, then:
about 68% of the data lie in the interval $\bar{X} - S < x < \bar{X} + S$
about 95% of the data lie in the interval $\bar{X} - 2S < x < \bar{X} + 2S$
about 99.7% of the data lie in the interval $\bar{X} - 3S < x < \bar{X} + 3S$
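
These percentages are rounded areas under the normal curve, so they apply to data that are approximately normally distributed. The sketch below computes the exact areas with `statistics.NormalDist`; the helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution lying within k standard
    deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```
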
Coefficient of Variation
It relates the standard deviation and the mean by expressing the standard deviation as a percentage of the mean.
Population: $CV = \frac{\sigma}{\mu}(100)$
Sample: $CV = \frac{S}{\bar{X}}(100)$
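Coefficient of Variation (cont)
Example: Which technician shows less variability?
Technician A: Average = 40, Standard deviation = 5
Technician B: Average = 160, Standard deviation = 15
Technician A: $CV = \frac{5}{40}(100) = 500/40 = 12.5\%$
Technician B: $CV = \frac{15}{160}(100) = 1500/160 = 9.4\%$
Technician B has the larger standard deviation but the smaller coefficient of variation, so Technician B shows less relative variability.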
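
The technician comparison can be sketched as a small function. The function name `coefficient_of_variation` and the variable names are illustrative.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Express the standard deviation as a percentage of the mean."""
    return standard_deviation / mean * 100


cv_a = coefficient_of_variation(5, 40)    # Technician A
cv_b = coefficient_of_variation(15, 160)  # Technician B

print(cv_a)  # 12.5
print(cv_b)  # 9.375

# The smaller CV indicates less variability relative to the mean.
less_variable = "A" if cv_a < cv_b else "B"
print(less_variable)  # B
```
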