
MEASURES OF CENTRAL TENDENCY

Applied Statistics and Computing Lab Indian School of Business


Learning Goals
Concept of central tendency
Various measures of central tendency
An appropriate measure in a given situation


Food For Thought


"While the individual man is an insolvable puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to." (Sherlock Holmes to Dr. Watson in The Sign of Four, Arthur Conan Doyle)

The two most important concepts of statistics: average and variation


Introducing the Concept of Central Tendency


Your friend, on returning to Hyderabad from a vacation, asks you, "How hot was this week?" You have data on this week's temperature (in degrees Celsius): 36, 35, 36, 34, 37, 40, 39. How do you combine these numbers into one intelligible number that conveys how hot this week was?

She further asks, "Was this week hotter than last week?"
You also have data on last week's temperature (in degrees Celsius): 28, 30, 30, 32, 33, 38, 42. Clearly, she would not be happy if you presented these numbers instead of a yes/no/same!

Concept of Central Tendency Continued


You need to summarize the data you have!
Central tendency is a measure that summarizes or condenses a large set of data into a single central value by identifying an "average" or typical value.

In addition, two (or more) sets of data can be compared simply by comparing their averages (central tendencies).


Arithmetic Mean: The Most Common Average


The arithmetic mean (AM) is defined as the sum of all numbers in the set divided by the total number of elements in the set. For numbers $y_1, \ldots, y_n$ the AM is given by:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

AM of this week's temperature = (36 + 35 + 36 + 34 + 37 + 40 + 39)/7 = 36.71. For last week, AM = (28 + 30 + 30 + 32 + 33 + 38 + 42)/7 = 33.29. So you tell your friend that, on average, this week was hotter!
The AM is the most common measure: the average GPA or marks in a semester, or the average expenditure for the last two months, most often refer to the AM. In statistical literature, however, "average" need not necessarily mean the AM.
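The two weekly averages can be checked with a few lines of R. This is only a minimal sketch; the vector names this_week and last_week are illustrative and do not come from the slides.

```r
# Daily temperatures (degrees Celsius) from the example above
this_week <- c(36, 35, 36, 34, 37, 40, 39)
last_week <- c(28, 30, 30, 32, 33, 38, 42)

# mean() computes sum(x) / length(x), i.e. the arithmetic mean
am_this <- mean(this_week)
am_last <- mean(last_week)

round(c(this_week = am_this, last_week = am_last), 2)
am_this > am_last   # TRUE: on average, this week was hotter
```

Comparing the two single numbers answers the friend's yes/no question directly.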


Measures Of Central Tendency


Various measures of central tendency:
Arithmetic Mean
Harmonic Mean
Geometric Mean
Median
Mode
Quartiles, percentiles

Why Different Measures of Central Tendency

Situation 1: Consider the following example of the salary break-up in a small firm visiting your campus for placement:

Table 1
Employee                          Salary (Monthly)
CEO (only 1)                      3,00,000
Senior Analyst (10 of them)       70,000
Junior Analyst (20 of them)       50,000
Computer Scientist (2 of them)    35,000
Intern (2)                        15,000

Situation 1 continued
Arithmetic Mean = (3,00,000 + 10*70,000 + 20*50,000 + 2*35,000 + 2*15,000)/35 = 60,000
However, more than half the employees, 24 out of 35, earn 50,000 or less! The AM does not seem representative: the salary of the CEO pulls it up. Also, as a college graduate, you know that you are not going to join as a CEO or an intern, so you are hardly interested in the extreme observations (too high a value for the CEO and too low a value for the intern).


Situation 2
Situation 2: Qualitative data. Data on the colors of flowers in your garden: 3 blue, 7 yellow, 8 purple, 15 red. Which is the most representative color? Clearly, you cannot compute an AM for this data!
Limitations of AM:
It is affected by extreme observations in a dataset (in Situation 1, the salary of the CEO). This can be corrected by using a trimmed mean or a winsorized mean. For further reading see http://en.wikipedia.org/wiki/Trimmed_estimator and http://en.wikipedia.org/wiki/Winsorising
It gives equal importance to all observations. This can be corrected by using a weighted arithmetic mean. For example, suppose your mid-term exam has a 40% weightage and your end-term exam a 60% weightage; then scores of 75% in the mid-term and 65% in the end-term yield a weighted average of (0.40*75 + 0.60*65) = 69.
It cannot be used to summarize qualitative data.
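Two of the corrections mentioned above are available in base R. The sketch below applies a trimmed mean to the Table 1 salaries and a weighted mean to the exam example. The 10% trimming fraction is an assumption made for illustration; the slides do not prescribe one.

```r
# Salaries from Table 1, one entry per employee
salary <- rep(c(300000, 70000, 50000, 35000, 15000),
              times = c(1, 10, 20, 2, 2))

plain_am   <- mean(salary)               # ordinary AM: 60000
trimmed_am <- mean(salary, trim = 0.1)   # drops the 3 lowest and 3 highest salaries first

# Weighted AM of the exam example: 40% mid-term, 60% end-term
exam_avg <- weighted.mean(c(75, 65), w = c(0.40, 0.60))

c(plain = plain_am, trimmed = trimmed_am, exam = exam_avg)
```

With the CEO and the two interns trimmed away, the average moves closer to what most employees actually earn.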

More Measures: Median


The median is defined as the middlemost observation in the dataset when the observations are arranged in ascending or descending order. In other words, it is the value such that 50% of the observations are less than or equal to it and the other 50% are greater than or equal to it.
In Table 1 there are 35 employees in total, so the median is the 18th salary in sorted order: half of the employees earn no more than it. Here the median salary is 50,000, which is more representative than the AM in this case. The median is less susceptible to extreme observations.
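The robustness of the median can be seen by recomputing both measures for the Table 1 salaries after inflating the CEO's pay. This is a rough sketch: the figure 3,000,000 is a hypothetical value chosen only for illustration.

```r
# Salaries from Table 1, one entry per employee (the CEO is the first entry)
salary <- rep(c(300000, 70000, 50000, 35000, 15000),
              times = c(1, 10, 20, 2, 2))

am  <- mean(salary)     # 60000
med <- median(salary)   # 50000, the 18th of the 35 sorted salaries

# Hypothetical change: multiply the CEO's salary by ten
salary_big <- replace(salary, 1, 3000000)
am_big  <- mean(salary_big)     # rises sharply
med_big <- median(salary_big)   # unchanged at 50000
```

Only the mean reacts to the change in a single extreme value.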


Some Comments:
If the total number of values, n, is odd, then the median is the ((n+1)/2)th value in sorted order. If n is even, the median is by convention the AM of the (n/2)th and ((n/2)+1)th values.
Scores of 9 students: 40, 37, 41, 38, 31, 37, 44, 45, 42. Arrange the scores in ascending order (31, 37, 37, 38, 40, 41, 42, 44, 45); the median is the (9+1)/2 = 5th score, which is 40. Now add the score of another student: 48. The median of the 10 scores is the AM of the 5th and 6th scores in sorted order = (40+41)/2 = 40.5.
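The odd and even cases can be traced step by step in R and compared with the built-in median(). A minimal sketch, with variable names chosen here for illustration:

```r
scores <- c(40, 37, 41, 38, 31, 37, 44, 45, 42)

# Odd n: take the ((n + 1) / 2)th value after sorting
sorted <- sort(scores)
n <- length(sorted)
med_odd <- sorted[(n + 1) / 2]                    # 40

# Even n: average the (n / 2)th and (n / 2 + 1)th values
sorted10 <- sort(c(scores, 48))
m <- length(sorted10)
med_even <- mean(sorted10[c(m / 2, m / 2 + 1)])   # 40.5

# The built-in function follows the same convention
c(median(scores), median(c(scores, 48)))
```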

Quantiles, Quartiles and Percentiles:


The median is a positional measure. There are other positional measures (measures which divide the data into equal parts): quartiles and percentiles. Both belong to a more general category called quantiles.
First quartile: the value below which one quarter of the observations lie
Third quartile: the value below which three quarters of the observations lie
Nth percentile: the value below which N percent (0 ≤ N ≤ 100) of the observations lie
In general, the qth quantile (0 ≤ q ≤ 1) is the value below which a fraction q of the observations lie.
In summary, dividing the data into 4 equal parts gives the quartiles and dividing it into 100 equal parts gives the percentiles. The median is the second quartile (below which 2/4 of the observations lie) and the 50th percentile.
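In R, quantile() returns any of these positional measures. The sketch below reuses the nine student scores from the median example. Note that quantile() supports several interpolation rules through its type argument; the default is used here, and other rules can give slightly different values on small samples.

```r
scores <- c(40, 37, 41, 38, 31, 37, 44, 45, 42)

# First quartile, median (second quartile) and third quartile
quartiles <- quantile(scores, probs = c(0.25, 0.50, 0.75))
quartiles   # 37, 40, 42

# Any other quantile is requested the same way, e.g. the 90th percentile
p90 <- quantile(scores, probs = 0.90)
p90
```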

Other Measures Continued: Mode


The mode of a variable is the value that has the highest frequency (the value observed the maximum number of times).
In Situation 2 there are 15 red flowers, the highest number of flowers of any one color, so red is the mode. A distribution can have more than one mode: if there were also 15 pink flowers, then pink and red would both be modes. If all the values in a distribution have the same frequency, the mode is not defined.
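Base R has no function for the statistical mode, so a small helper is sketched here for the flower counts. The helper name modes is hypothetical and not part of R.

```r
# Frequency of each flower color from Situation 2
flowers <- c(Blue = 3, Yellow = 7, Purple = 8, Red = 15)

# Hypothetical helper: return every value that attains the highest frequency
modes <- function(counts) names(counts)[counts == max(counts)]

modes(flowers)          # "Red"

# With 15 pink flowers as well, there are two modes
flowers_pink <- c(flowers, Pink = 15)
modes(flowers_pink)     # "Red" "Pink"
```

If every color had the same count, this helper would return all of them, which corresponds to the case where the mode is treated as undefined.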

Why different measures continued


Situation 3: Suppose an investment of 100 grows to the following values at the end of each of three years: 180, 210, 300. Hence the yearly growth rates are:
$$\frac{180-100}{100}\times 100 = 80\%, \quad \frac{210-180}{180}\times 100 = 16.67\%, \quad \frac{300-210}{210}\times 100 = 42.86\%$$
(Source: wikipedia.org)
If we compute the average growth rate as the AM, it is 46.51% (Check!). However, at this average growth rate we would have $100(1 + 0.4651)^3 \approx 314$ at the end of the third year, not 300! Clearly the AM is not appropriate for averaging rates of growth.
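The "Check!" can be carried out in R. The sketch below derives the three growth rates from the yearly values and shows that compounding at their AM overshoots the true final value; variable names are illustrative.

```r
# Value of the investment at the start and at the end of each year
value <- c(100, 180, 210, 300)

# Yearly growth rates: (new - old) / old
growth <- diff(value) / head(value, -1)
round(100 * growth, 2)                 # 80.00 16.67 42.86

# AM of the growth rates and the final value it would imply
am_rate  <- mean(growth)
final_am <- 100 * (1 + am_rate)^3

round(100 * am_rate, 2)                # 46.51
final_am                               # larger than the actual 300
```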


Other Measures Continued: Geometric Mean


The geometric mean (GM) of two positive numbers is defined as the square root of their product, i.e., the GM of a and b is $\sqrt{ab}$. In general, the GM of n positive numbers is the nth root of their product. The geometric mean of a data set $(a_1, a_2, \ldots, a_n)$ is given by:
$$\mathrm{GM} = \sqrt[n]{a_1 a_2 \cdots a_n}$$
Taking logarithms on both sides shows that the logarithm of the GM is the AM of the logarithms of the values.
Growing at 80% corresponds to multiplying by 1.80. So in Situation 3 we take the GM of the growth factors 1.80, 1.1667 and 1.4286:
$$\sqrt[3]{1.80 \times 1.1667 \times 1.4286} \approx 1.4422$$
Hence the average rate of growth by the GM is about 44.22%, and at this rate the value at the end of the third year is $100 \times 1.4422^3 \approx 300$, as observed.
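Both forms of the GM, the nth root of the product and the exponential of the mean logarithm, can be compared in R on the Situation 3 growth factors. A minimal sketch with illustrative variable names:

```r
# Growth factors for the three years (value at year end / value at year start)
factors <- c(180 / 100, 210 / 180, 300 / 210)

# GM as the nth root of the product
gm <- prod(factors)^(1 / length(factors))

# Equivalent form: exponential of the AM of the logarithms
gm_log <- exp(mean(log(factors)))

round(100 * (gm - 1), 2)   # average growth rate in percent
100 * gm^3                 # compounding at the GM recovers the final value of 300
```

The logarithmic form is usually preferred for long series, since a product of many factors can overflow or underflow.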

Why different measures continued.


Situation 4: A man walks along the four sides of a square ground at speeds of 10, 5, 8 and 6 km/hr respectively. What is his average speed? Clearly the AM of 10, 5, 8 and 6 does not yield the answer!
If each side has length d, his average speed = (Total Distance)/(Total Time) =

$$\frac{4d}{(d/10) + (d/5) + (d/8) + (d/6)}$$

Other Measures Continued: Harmonic Mean


The harmonic mean (HM) of a set of non-zero values of a variable is the reciprocal of the AM of the reciprocals of the values. The HM of non-zero values $a_1, a_2, \ldots, a_n$ is:
$$\mathrm{HM} = \frac{n}{(1/a_1) + (1/a_2) + \cdots + (1/a_n)}$$

The HM of 10, 5, 8 and 6 is
$$\frac{4}{(1/10) + (1/5) + (1/8) + (1/6)} \approx 6.76,$$
which is precisely the average speed (in km/hr) of this man! The HM may not be a very commonly used measure of central tendency, but it is the appropriate average when the variable is of the form "x per unit of y".
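The claim that the HM equals total distance over total time can be verified numerically. In the sketch below the side length d is a hypothetical value; any positive d gives the same average speed because it cancels out.

```r
speeds <- c(10, 5, 8, 6)   # km/hr along the four sides

# Harmonic mean: n divided by the sum of reciprocals
hm <- length(speeds) / sum(1 / speeds)

# Direct computation: total distance / total time
d <- 1                                     # side length in km (hypothetical)
avg_speed <- (4 * d) / sum(d / speeds)

round(c(HM = hm, avg_speed = avg_speed, AM = mean(speeds)), 2)
```

The AM of the speeds is 7.25 km/hr, which overstates the true average because more time is spent on the slower sides.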

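Relation between AM, GM and HM for Two Observations


For two positive numbers a and b:
$$\mathrm{AM} = \frac{a+b}{2}, \qquad \mathrm{GM} = \sqrt{ab}, \qquad \mathrm{HM} = \frac{2}{(1/a) + (1/b)} = \frac{2ab}{a+b}$$

Check: $\mathrm{AM} \times \mathrm{HM} = \mathrm{GM}^2$

In general, this identity holds only for two numbers. For any number of positive observations, AM ≥ GM ≥ HM, with equality holding when all observations are equal.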
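A quick numerical check of both statements is sketched below. The pair a = 4, b = 9 is a hypothetical choice made for illustration; the four-value example reuses the walking speeds from Situation 4.

```r
# Two observations: AM * HM equals GM^2
a <- 4; b <- 9                      # hypothetical values
am <- (a + b) / 2                   # 6.5
gm <- sqrt(a * b)                   # 6
hm <- 2 * a * b / (a + b)           # 72 / 13
all.equal(am * hm, gm^2)            # TRUE

# More than two positive observations: AM >= GM >= HM
x <- c(10, 5, 8, 6)
am_x <- mean(x)
gm_x <- exp(mean(log(x)))
hm_x <- 1 / mean(1 / x)
c(AM = am_x, GM = gm_x, HM = hm_x)
```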

Comparison of the different measures


Criteria compared: rigidity of definition; based on all values; not affected by extreme observations.

Mean: Affected by extreme observations (AM); GM and HM are less affected.
Median: Not rigidly defined when the number of observations is even. May remain unchanged even after several observations are altered.
Mode: May remain unchanged even after several observations are altered.

Some Applications
An investor is deciding whether to invest this year in a stock that yielded bimonthly returns of 15%, 4%, 5%, 7%, 10% and 10% last year. He computes the GM.
A computer sales representative sells the brands and numbers of computers shown below.
Brand             No. of computers sold
IBM PS(2)/M30     500
IBM PS(2)/M50     410
IBM PS(2)/M70     250
Compaq            506

The sales representative is interested in the most popular brand, so he computes the mode. Problem: the mode ignores the importance of the other brands.
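For the investor's example, the slides say only that he computes the GM. One way to do so, sketched here under the assumption that each return r is first converted to a growth factor 1 + r as in the geometric mean section, is:

```r
# Bimonthly returns from the investor example
returns <- c(0.15, 0.04, 0.05, 0.07, 0.10, 0.10)

# Assumption: average the growth factors (1 + r), then convert back to a rate
factors <- 1 + returns
gm_rate <- prod(factors)^(1 / length(factors)) - 1
am_rate <- mean(returns)

round(100 * c(GM = gm_rate, AM = am_rate), 2)

# Compounding six periods at the GM rate reproduces the year's total growth
all.equal((1 + gm_rate)^6, prod(factors))   # TRUE
```

The GM rate comes out slightly below the AM of the returns, consistent with AM ≥ GM.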

Applications continued
A manufacturing company claims: "On average we ship parts within 37 hours of order entry." But a careful look at the data shows that for the worst-off 10% of customers, parts were shipped only within 89 hours of order entry. Is the simple average misleading? Look at the data on the worst-off 1, 5, 10 or 25% of customers. Use quantiles!
Word of caution: a typical representation (average) can, in certain situations, result in gross misrepresentation. Choose the average according to the business situation at hand.
Source: Thriving on Chaos, Tom Peters

Illustration Using R
We take the age variable from the bodymeasurement dataset.
Objective: Compare the ages of males and females using various measures of central tendency.
R code:
    # Assumes the dataset is already loaded as a data frame named `age`
    # with columns Age and Gender, as in the original code.
    Age <- age$Age
    Gender <- age$Gender                      # Extract the variables
    Age.male <- Age[Gender == "Male"]         # Ages of men only
    summary(Age.male)                         # Min, quartiles, median, mean, max
    Age.female <- Age[Gender == "Female"]     # Ages of women only
    summary(Age.female)
    AM <- mean(Age.male); AM                  # Arithmetic mean
    GM <- exp(mean(log(Age.male))); GM        # Geometric mean via logarithms
    HM <- 1 / mean(1 / Age.male); HM          # Harmonic mean
    Mode <- names(sort(-table(Age.male)))[1]; Mode   # Most frequent age (one of them if tied)
    quantile(Age.male, c(.5, .25, .75, .46, .79))    # Any quantile can be requested
    dotchart(Age.male)                        # To spot extreme observations
    dotchart(Age.female)


Conclusion
But Sherlock Holmes did not just talk about the average: "you can say with precision what an average number will be up to." So what exactly is this precision that he is referring to? The investor in our example can calculate his average return, and in the ages data we can find the average age of men and women, but how precise a representation is this average? In the next module, on dispersion, you will be able to answer these questions and more.

Thank you
