
Quantitative Techniques

Measures of Central Tendency



Central Tendency
Measure of Central Tendency:
A single summary score that best describes the central location of an entire distribution of scores.
The typical score. The center of the distribution.

Three of the most common measures of central tendency are:


Mean Median Mode

One distribution can have multiple locations where scores cluster.


You must decide which measure is best for a given situation.

Mean
The most commonly used measure of central tendency.
When people ask about the average of a group of scores, they usually are referring to the mean.
The mean is the sum of all the scores in the distribution divided by the number of scores (the mathematical average).
It is the balance point of a distribution.

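Mean (cont)
Population

$\mu = \frac{\sum X}{N}$

$\mu$ (mu): the population mean
$\sum X$ (sigma): the sum of X, add up all scores
$N$: the total number of scores in a population

Sample

$\bar{X} = \frac{\sum X}{n}$

$\bar{X}$ (X bar): the sample mean
$\sum X$ (sigma): the sum of X, add up all scores
$n$: the total number of scores in a sample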

Mean (cont)
Exam Scores: 75, 82, 72, 68, 89, 91, 78, 94, 88, 75

$\bar{X} = \frac{\sum X}{n} = \frac{812}{10} = 81.2$

Mean Score = 81.2
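The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch: the names `exam_scores`, `mean`, and `exam_mean` are illustrative and not part of the slides.

```python
# Exam scores taken from the slide above.
exam_scores = [75, 82, 72, 68, 89, 91, 78, 94, 88, 75]


def mean(scores):
    """Return the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


exam_mean = mean(exam_scores)
print(sum(exam_scores), len(exam_scores), exam_mean)  # 812 10 81.2
```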

Mean (cont)
Performance and Memory Study
Number of Words Recalled: 2, 2, 3, 3, 4, 4, 4, 4, 4, 10

[Figure: histogram of Number of Words Recalled (x-axis) against Frequency (y-axis)]

$\bar{X} = \frac{\sum X}{n} = \frac{40}{10} = 4$

The mean includes the weight of every score.

Pros and cons of using mean


Pros
Mathematical center of a distribution. Just as far from scores above it as it is from scores below it. Good for interval and ratio data. Does not ignore any information.

Cons
Influenced by extreme scores. May not exist in the data.

E.g., average salary at a company:

12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 20,000; 390,000
Mean = 44,167
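To see how strongly the single 390,000 salary pulls the mean, the salary list can be checked with Python's standard `statistics` module. This is a sketch for illustration only; the variable names are assumptions, and the comparison against the median and against the mean without the top salary is an added check, not something computed on the slide.

```python
from statistics import mean, median

# Twelve salaries from the slide: ten at 12,000, one at 20,000, one at 390,000.
salaries = [12_000] * 10 + [20_000, 390_000]

mean_all = mean(salaries)                # about 44,166.67, which rounds to 44,167
mean_without_top = mean(salaries[:-1])   # mean after leaving out the 390,000 salary
median_all = median(salaries)            # middle of the ordered salaries

print(round(mean_all), round(mean_without_top), median_all)
```

One extreme score moves the mean far above what eleven of the twelve employees earn, while the middle of the ordered list stays at 12,000.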

Median
The middle score of the distribution when all the scores have been ranked in either ascending or descending order.
Represents the exact center or middle of the distribution.
Appropriate for variables that are at least at the ordinal level.
Odd number of cases: the median is the ((n + 1)/2)th score.
Even number of cases: the median is ((n/2)th score + (n/2 + 1)th score)/2, that is, average the two middle values together.

What is the median suicide rate for the nine largest U.S. cities?
City           Rate
New York        7.44
Los Angeles    13.38
Chicago        10.00
Houston        14.11
Philadelphia   14.78
San Diego      12.61
Detroit        12.26
Dallas         14.30
Phoenix        18.37
Total (N)       9

First order the data


City           Rate
New York        7.44
Chicago        10.00
Detroit        12.26
San Diego      12.61
Los Angeles    13.38
Houston        14.11
Dallas         14.30
Philadelphia   14.78
Phoenix        18.37
Total (N)       9

n is odd: (9 + 1) / 2 = 5
Now, find the 5th case in the ordered list.
The median suicide rate for the nine largest U.S. cities is 13.38 (the value of the 5th case, not 5 itself).

Median (cont)
Number of Words Recalled in Performance Study: 2, 2, 3, 3, 4, 4, 4, 4, 4, 10

Median (even no. of cases) = ((n/2)th score + (n/2 + 1)th score)/2
= ((10/2)th score + (10/2 + 1)th score)/2
= (5th score + 6th score)/2
= (4 + 4)/2
= 4
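The odd and even position rules can be sketched as one small Python function. The function name `median_by_position` is hypothetical; the two data sets are the city suicide rates and the words-recalled scores used in the median slides.

```python
def median_by_position(scores):
    """Median via the position rules: (n + 1)/2 for odd n, two middle scores for even n."""
    ordered = sorted(scores)          # rank the scores first
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1)/2)th score; subtract 1 for zero-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the (n/2)th and (n/2 + 1)th scores.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


suicide_rates = [7.44, 13.38, 10.00, 14.11, 14.78, 12.61, 12.26, 14.30, 18.37]
words_recalled = [2, 2, 3, 3, 4, 4, 4, 4, 4, 10]

median_rate = median_by_position(suicide_rates)    # 5th of 9 ordered rates
median_words = median_by_position(words_recalled)  # average of 5th and 6th scores
print(median_rate, median_words)  # 13.38 4.0
```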

How satisfied are you with your health insurance?


Responses of 7 Individuals
very dissatisfied
very satisfied
somewhat satisfied
very dissatisfied
somewhat dissatisfied
somewhat satisfied
very satisfied

Total (N): 7

To locate the median, arrange the responses in order from lowest to highest (or highest to lowest):

Response
very dissatisfied
very dissatisfied
somewhat dissatisfied
somewhat satisfied (the middle case = Median)
somewhat satisfied
very satisfied
very satisfied
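For ordinal responses the same idea works once the categories are given an explicit order. The sketch below assumes the scale has exactly the four levels that appear in the responses, ranked from lowest to highest; the name `SCALE` and the other identifiers are illustrative only.

```python
# Assumed ordering of the response categories, lowest to highest.
SCALE = [
    "very dissatisfied",
    "somewhat dissatisfied",
    "somewhat satisfied",
    "very satisfied",
]

responses = [
    "very dissatisfied",
    "very satisfied",
    "somewhat satisfied",
    "very dissatisfied",
    "somewhat dissatisfied",
    "somewhat satisfied",
    "very satisfied",
]

# Sort by rank on the scale, not alphabetically.
ordered = sorted(responses, key=SCALE.index)

# n = 7 is odd, so the median is the ((7 + 1)/2)th = 4th ordered response.
median_response = ordered[(len(ordered) + 1) // 2 - 1]
print(median_response)  # somewhat satisfied
```

No arithmetic is done on the labels themselves, which is why the median is usable at the ordinal level while the mean is not.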

Pros and Cons of Median


Pros
Not influenced by extreme scores or skewed distributions. Good with ordinal data. Easier to compute than the mean.

Cons
May not exist in the data. Doesn't take actual values into account.

Mode
The most frequent score in the distribution.
A distribution where a single score is most frequent has one mode and is called unimodal.
When there are ties for the most frequent score, the distribution is bimodal if two scores tie or multimodal if more than two scores tie.
Applications: printing, manufacturing, etc.
For example, it is important to print more of the most popular books, because printing different books in equal numbers would cause a shortage of some books and an oversupply of others.
Likewise, it is important to manufacture more of the most popular shoes, because manufacturing different shoes in equal numbers would cause a shortage of some shoes and an oversupply of others.

Mode (cont)
Number of Words Recalled in Performance Study: 2, 2, 3, 3, 4, 4, 4, 4, 4, 10
The mode is 4.

Mode (cont)

72, 72, 73, 76, 78, 81, 83, 85, 85, 86, 87, 88, 90, 91, 92

This distribution is bimodal: 72 and 85 each occur twice.
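A frequency count makes the unimodal and bimodal cases easy to tell apart. The helper below is a hypothetical sketch (the name `describe_modes` is not from the slides); it assumes a non-empty list and treats a data set in which no score repeats as having no mode.

```python
from collections import Counter


def describe_modes(scores):
    """Return the most frequent score(s) and a label for the distribution."""
    counts = Counter(scores)
    top = max(counts.values())                       # highest frequency
    modes = sorted(v for v, c in counts.items() if c == top)
    if top == 1:
        return [], "no mode"                         # assumption: no repeats means no mode
    if len(modes) == 1:
        return modes, "unimodal"
    if len(modes) == 2:
        return modes, "bimodal"
    return modes, "multimodal"


scores = [72, 72, 73, 76, 78, 81, 83, 85, 85, 86, 87, 88, 90, 91, 92]
words_recalled = [2, 2, 3, 3, 4, 4, 4, 4, 4, 10]

print(describe_modes(scores))          # ([72, 85], 'bimodal')
print(describe_modes(words_recalled))  # ([4], 'unimodal')
```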


Mode (cont)
Mode is the best measure of central tendency when data are not ordered, like the colors of cars in a parking lot.

For colors of cars:

You can't use the median: there is no order in the colors, no counting up from the bottom to find the middle score.
Also, you cannot add them together to find a mean (blue + red + white = ?).
Summary: the mode is the place where the greatest number of cases, observations, or scores occur.

Red, Yellow, Green, Blue, Red, Red, Green, Yellow, Blue, Yellow, Red, Blue
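For nominal data such as these car colors, the mode comes straight from a tally. A minimal sketch using `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# The twelve car colors listed on the slide.
car_colors = [
    "Red", "Yellow", "Green",
    "Blue", "Red", "Red",
    "Green", "Yellow", "Blue", "Yellow", "Red", "Blue",
]

color_counts = Counter(car_colors)

# most_common(1) returns a list holding the single most frequent (value, count) pair.
modal_color, modal_count = color_counts.most_common(1)[0]
print(modal_color, modal_count)  # Red 4
```

Counting is the only operation the colors support, which is why the mode is the one measure available here.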



Pros and Cons of Mode


Pros
Good for nominal data. Good when there are two typical scores. Easiest to compute and understand. The score comes from the data set.

Cons
Ignores most of the information in a distribution. Small samples may not have a mode.


The best measure of central tendency depends on:

The scale of measurement.
The shape of the distribution.

Scales of Measurement
Nominal scale = mode
Ordinal scale = median
Ratio scale = mean, median, or mode
Interval scale = mean, median, or mode

Shape of the Distribution


Skew refers to the general shape of a distribution when it is graphed.
Symmetrical = zero skew
Scores clustered on the high or low end of a distribution = skewed distribution

Symmetrical Distribution

[Figure: histogram of Scores (x-axis, 24.5 to 69.5) against Frequency (y-axis, 0 to 16)]

The mean, median, and mode are the same.

The normal distribution is the ideal symmetrical distribution


Skewed distributions have one side (the tail) where the data frequency tapers off.

Skewed Distribution
Positive Skew

[Figure: histogram of Scores (x-axis, 27 to 77) against Frequency (y-axis, 0 to 12)]

Tail points in the positive direction.

Skewed Distribution
Negative Skew

[Figure: histogram of Scores (x-axis, 27 to 77) against Frequency (y-axis, 0 to 12)]

Tail points in the negative direction.

The mean is pulled toward the tail of a skewed distribution, so it tends to overestimate the center under positive skew and underestimate it under negative skew.

[Figure: two histograms of Scores (x-axis, 27 to 77) against Frequency (y-axis, 0 to 12), one with Positive Skew and one with Negative Skew, each marked with the positions of the Mode, Median, and Mean]
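The ordering of the three measures under skew can be checked on a small example. The two data sets below are hypothetical, made up for illustration and not taken from the slides; the second is the mirror image of the first.

```python
from statistics import mean, median, mode

# Hypothetical scores with a long right tail (positive skew).
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the same scores: long left tail (negative skew).
negative_skew = [16 - x for x in positive_skew]

pos = (mode(positive_skew), median(positive_skew), mean(positive_skew))
neg = (mode(negative_skew), median(negative_skew), mean(negative_skew))

print(pos)  # mode, median, mean for the positively skewed scores
print(neg)  # mode, median, mean for the negatively skewed scores
```

In this example the mean sits furthest toward the tail in both cases: above the median and mode for the positive skew, below them for the negative skew.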



Dispersion
The spread of a set of scores around some central value.

Why it is important
It gives us additional information that enables us to judge the reliability of our measure of central tendency.
Two datasets can have the same average but very different variability.

Applications
Stock market
Quality control

[Figure: Data set A and Data set B]

Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation

Range
The difference between the highest and lowest score in a distribution.
Range = highest value - lowest value

Las Vegas Hotel Rates
52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891

Range: 891 - 52 = 839
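As a quick check, the range of the hotel rates can be computed directly. A minimal sketch; the variable names are illustrative. Note that `max` and `min` do not require the list to be sorted.

```python
# Las Vegas hotel rates as listed on the slide (35 values).
hotel_rates = [
    52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
    303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
    643, 693, 732, 749, 750, 791, 891,
]

# Range = highest value - lowest value.
rate_range = max(hotel_rates) - min(hotel_rates)
print(rate_range)  # 839
```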

Pros and Cons of the Range


Pros
Very easy to compute.

Cons
Value depends only on two scores. Very sensitive to outliers. Influenced by sample size (larger samples tend to have larger ranges).

Interquartile Range
Range of the middle half of scores.
IQR = Q3 (third quartile) - Q1 (first quartile)

Las Vegas Hotel Rates
52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891

Interquartile Range:
Q1 position: (35 + 1)/4 = 9, so Q1 is the 9th ordered score = 257
Q3 position: 3(35 + 1)/4 = 27, so Q3 is the 27th ordered score = 472
IQR = 472 (Q3) - 257 (Q1) = 215
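The quartile positions can be sketched in Python using the (n + 1)/4 rule from the slide, with 3(n + 1)/4 assumed as the matching rule for Q3. The function name `quartile_by_position` is hypothetical, and this sketch only handles the case where the position is a whole number, as it is for n = 35; other quartile conventions interpolate and may give slightly different values.

```python
def quartile_by_position(scores, k):
    """Return the k-th quartile (k = 1 or 3) as the score at position k(n + 1)/4."""
    ordered = sorted(scores)                  # quartiles are read from ordered scores
    position = k * (len(ordered) + 1) / 4
    if position != int(position):
        # Assumption: fractional positions are out of scope for this sketch.
        raise ValueError("position is not a whole number; interpolation needed")
    return ordered[int(position) - 1]         # convert to zero-based index


hotel_rates = [
    52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
    303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
    643, 693, 732, 749, 750, 791, 891,
]

q1 = quartile_by_position(hotel_rates, 1)     # 9th ordered score
q3 = quartile_by_position(hotel_rates, 3)     # 27th ordered score
iqr = q3 - q1
print(q1, q3, iqr)  # 257 472 215
```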

Pros and Cons of the Interquartile Range


Pros
Fairly easy to compute. Scores exist in the data set. Eliminates influence of extreme scores.

Cons
Discards much of the data.


Variance
Mean of all squared deviations from the mean.
The average squared amount that a score deviates from the typical score.
Score - Mean = Difference Score
Average of Difference Scores = 0
In order to make this number not 0, square the difference scores (no negatives to cancel out the positives).

Variance: Formula
Population

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

Sample

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

$\sigma$: sigma

Calculate the variance for the given sample

3, 4, 4, 4, 6, 7, 7, 8, 8, 9

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

$\bar{X} = \frac{\sum X}{n} = \frac{60}{10} = 6$

$S^2 = \frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{9}$

$S^2 = \frac{40}{9} = 4.44$
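The same steps can be traced in code: difference scores, their squares, and division by n - 1. A minimal sketch with illustrative names; `statistics.variance` from the standard library is used only as a cross-check, since it also computes the sample variance.

```python
from statistics import variance

sample = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

n = len(sample)
sample_mean = sum(sample) / n                       # 60 / 10 = 6

# Score - Mean = Difference Score; these always sum to zero.
differences = [x - sample_mean for x in sample]

# Squaring removes the signs so the deviations no longer cancel.
sum_of_squares = sum(d ** 2 for d in differences)   # 40

sample_variance = sum_of_squares / (n - 1)          # 40 / 9
print(sum(differences), sum_of_squares, round(sample_variance, 2))  # 0.0 40.0 4.44
```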


Pros and Cons of Variance


Pros
Takes all data into account. Lends itself to computation of other stable measures (and is a prerequisite for many of them).

Cons
Hard to interpret. Can be influenced by extreme scores.


Standard Deviation
Square root of the average of the squared distances of the observations from the mean.
To undo the squaring of difference scores, take the square root of the variance.
Return to original units rather than squared units.

Population

$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Sample

$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Example

$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

$S = \sqrt{\frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{9}}$

$S = \sqrt{\frac{40}{9}} = 2.11$
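The sketch below repeats this example in Python and, for comparison, also applies the population formula to the same ten scores. Treating the scores as a whole population is an assumption made here only to show the effect of dividing by N instead of n - 1; the names are illustrative.

```python
import math

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

n = len(scores)
mean_score = sum(scores) / n
sum_of_squares = sum((x - mean_score) ** 2 for x in scores)   # 40

# Sample formula: divide by n - 1, then take the square root.
sample_sd = math.sqrt(sum_of_squares / (n - 1))

# Population formula: divide by N, then take the square root.
population_sd = math.sqrt(sum_of_squares / n)

print(round(sample_sd, 2), population_sd)  # 2.11 2.0
```

Both results are in the original units of the scores, unlike the variance, which is in squared units.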


Pros and Cons of Standard Deviation


Pros
Lends itself to computation of other stable measures (and is a prerequisite for many of them). Average of deviations around the mean.

Cons
Influenced by extreme scores.


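Standard Deviation (cont)

If $\bar{X}$ = mean, $S$ = standard deviation, and $x$ is a value in the data set, then for approximately normal (bell-shaped) data:
about 68% of the data lie in the interval $\bar{X} - S < x < \bar{X} + S$
about 95% of the data lie in the interval $\bar{X} - 2S < x < \bar{X} + 2S$
about 99.7% of the data lie in the interval $\bar{X} - 3S < x < \bar{X} + 3S$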
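These percentages come from the normal distribution and can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch assumes an exactly normal distribution; real data will only match approximately, and the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Approximately 0.683, 0.954, and 0.997 respectively.
    print(k, round(share_within(k), 3))
```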

Coefficient of Variation
It relates the standard deviation and the mean by expressing the standard deviation as a percentage of the mean.

Population

$CV = \frac{\sigma}{\mu}(100)$

Sample

$CV = \frac{S}{\bar{X}}(100)$

Coefficient of Variation (cont)


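Example: Which technician shows less variability?
Technician A: Average = 40, Standard deviation = 5
Technician B: Average = 160, Standard deviation = 15

Technician A: $CV = \frac{5}{40}(100) = 500/40 = 12.5\%$

Technician B: $CV = \frac{15}{160}(100) = 1500/160 = 9.4\%$

Technician B shows less relative variability, even though B's standard deviation is larger in absolute terms.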
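The technician comparison can be written as a short function. A minimal sketch; the function name `coefficient_of_variation` and the variable names are illustrative.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Express the standard deviation as a percentage of the mean."""
    return standard_deviation / mean * 100


cv_a = coefficient_of_variation(5, 40)     # Technician A
cv_b = coefficient_of_variation(15, 160)   # Technician B

print(cv_a, cv_b)  # 12.5 9.375

# The lower coefficient of variation indicates less relative variability.
less_variable = "A" if cv_a < cv_b else "B"
print(less_variable)  # B
```

Because the two technicians have very different averages, comparing the raw standard deviations (5 versus 15) would point the other way; dividing by the mean puts them on a common scale.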
