
Mean Deviation

The mean of the distances of each value from their mean.

Yes, we use "mean" twice: Find the mean ... use it to work out distances ... then find the mean of
those!

Three steps:

 1. Find the mean of all values


 2. Find the distance of each value from that mean (subtract the mean from each value,
ignore minus signs)
 3. Then find the mean of those distances

Like this:

Example: the Mean Deviation of 3, 6, 6, 7, 8, 11, 15, 16

Step 1: Find the mean:

Mean = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16) / 8 = 72 / 8 = 9

Step 2: Find the distance of each value from that mean:

Value    Distance from 9
3        6
6        3
6        3
7        2
8        1
11       2
15       6
16       7


Step 3. Find the mean of those distances:

Mean Deviation = (6 + 3 + 3 + 2 + 1 + 2 + 6 + 7) / 8 = 30 / 8 = 3.75

So, the mean = 9, and the mean deviation = 3.75

It tells us how far, on average, all values are from the middle.

In that example the values are, on average, 3.75 away from the middle.
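The three steps can be sketched in a few lines of Python. This is only an illustration: the function name `mean_deviation` is our own choice and does not come from any library.

```python
def mean_deviation(values):
    """Return the mean of the distances of each value from the mean."""
    n = len(values)
    mean = sum(values) / n                        # Step 1: find the mean
    distances = [abs(x - mean) for x in values]   # Step 2: distances, minus signs ignored
    return sum(distances) / n                     # Step 3: mean of those distances


data = [3, 6, 6, 7, 8, 11, 15, 16]
print(mean_deviation(data))  # 3.75
```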

For deviation just think distance

Formula

The formula is:

Mean Deviation = Σ|x - μ| / N

Let's learn more about those symbols!

Firstly:

• μ is the mean (in our example μ = 9)
• x is each value (such as 3 or 16)
• N is the number of values (in our example N = 8)

Absolute Deviation

Each distance we calculated is called an Absolute Deviation, because it is the Absolute Value of
the deviation (how far from the mean).

To show "Absolute Value" we put "|" marks either side like this: |-3| = 3

For any value x:

Absolute Deviation = |x - μ|

From our example, the value 16 has Absolute Deviation = |x - μ| = |16 - 9| = |7| = 7

And now let's add them all up ...

Sigma

The symbol for "Sum Up" is Σ (called Sigma Notation), so we have:

Sum of Absolute Deviations = Σ|x - μ|

Divide by how many values N and we have:

Mean Deviation = Σ|x - μ| / N

Let's do our example again, using the proper symbols:

Example: the Mean Deviation of 3, 6, 6, 7, 8, 11, 15, 16

Step 1: Find the mean:

μ = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16) / 8 = 72 / 8 = 9

Step 2: Find the Absolute Deviations:


x     |x - μ|
3     6
6     3
6     3
7     2
8     1
11    2
15    6
16    7

Σ|x - μ| = 30

Step 3. Find the Mean Deviation:

Mean Deviation = Σ|x - μ| / N = 30 / 8 = 3.75

What Does It "Mean" ?

Mean Deviation tells us how far, on average, all values are from the middle.

Here is an example (using the same data as in the Standard Deviation section):

Example: You and your friends have just measured the heights of your dogs (in millimeters):

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.

Step 1: Find the mean:

μ = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

Step 2: Find the Absolute Deviations:

x      |x - μ|
600    206
470    76
170    224
430    36
300    94

Σ|x - μ| = 636

Step 3. Find the Mean Deviation:


Mean Deviation = Σ|x - μ| / N = 636 / 5 = 127.2

So, on average, the dogs' heights are 127.2 mm from the mean.

(Compare that with the Standard Deviation of 147 mm)

A Useful Check

The sum of the deviations on one side of the mean should equal the sum of the deviations on the
other side.

From our first example:

Example: 3, 6, 6, 7, 8, 11, 15, 16

The deviations are:

6+3+3+2+1 = 2+6+7

15 = 15

Likewise:

Example: Dogs

Deviations left of mean: 224 + 94 = 318

Deviations right of mean: 206 + 76 + 36 = 318
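This check is easy to automate. The sketch below uses a helper name of our own, `side_sums`, and returns the total distance below the mean and the total distance above it, which should match.

```python
def side_sums(values):
    """Return (sum of distances below the mean, sum of distances above the mean)."""
    mean = sum(values) / len(values)
    below = sum(mean - x for x in values if x < mean)
    above = sum(x - mean for x in values if x > mean)
    return below, above


print(side_sums([3, 6, 6, 7, 8, 11, 15, 16]))   # (15.0, 15.0)
print(side_sums([600, 470, 170, 430, 300]))     # (318.0, 318.0)
```

With data whose mean is not a whole number, floating-point rounding can make the two sums differ very slightly, so it is safer to compare them with `math.isclose` than with `==`.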

Standard Deviation and Variance


Deviation just means how far from the normal

Standard Deviation

The Standard Deviation is a measure of how spread out numbers are.

Its symbol is σ (the Greek letter sigma).

The formula is easy: it is the square root of the Variance. So now you ask, "What is the
Variance?"

Variance

The Variance is defined as:

The average of the squared differences from the Mean.

To calculate the variance follow these steps:

• Work out the Mean (the simple average of the numbers)
• Then for each number: subtract the Mean and square the result (the squared difference).
• Then work out the average of those squared differences. (Why Square?)

Example

You and your friends have just measured the heights of your dogs (in millimeters):

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.

Find out the Mean, the Variance, and the Standard Deviation.

Your first step is to find the Mean:


Answer:
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

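so the mean (average) height is 394 mm.

Now, we calculate each dog's difference from the Mean:

Height (mm)    Difference from Mean
600            206
470            76
170            -224
430            36
300            -94

To calculate the Variance, take each difference, square it, and then average the result:

Variance = (206² + 76² + (-224)² + 36² + (-94)²) / 5
         = (42,436 + 5,776 + 50,176 + 1,296 + 8,836) / 5
         = 108,520 / 5
         = 21,704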

So, the Variance is 21,704.

And the Standard Deviation is just the square root of Variance, so:

Standard Deviation: σ = √21,704 = 147.32... = 147 (to the nearest mm)
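The same calculation can be written out step by step in Python. This is a minimal sketch for the population case only (dividing by the number of values), with variable names chosen for this illustration.

```python
import math

heights = [600, 470, 170, 430, 300]  # dog heights in mm

mean = sum(heights) / len(heights)                    # work out the Mean
squared_diffs = [(h - mean) ** 2 for h in heights]    # subtract the Mean, then square
variance = sum(squared_diffs) / len(heights)          # average of the squared differences
std_dev = math.sqrt(variance)                         # square root of the Variance

print(mean, variance, round(std_dev))  # 394.0 21704.0 147
```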


And the good thing about the Standard Deviation is that it is useful. Now we can show which
heights are within one Standard Deviation (147mm) of the Mean:

So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what
is extra large or extra small.
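To see which heights fall within one Standard Deviation of the Mean, we can filter the list. The sketch below uses `statistics.mean` and `statistics.pstdev` (the population standard deviation) from Python's standard library. The names `within` and `outside` are our own.

```python
import statistics

heights = [600, 470, 170, 430, 300]  # dog heights in mm

mean = statistics.mean(heights)
sd = statistics.pstdev(heights)  # population standard deviation, about 147.32

# A height is "within one Standard Deviation" if its distance from the mean is at most sd
within = [h for h in heights if abs(h - mean) <= sd]
outside = [h for h in heights if abs(h - mean) > sd]

print(within)   # [470, 430, 300]
print(outside)  # [600, 170]
```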

Rottweilers are tall dogs. And Dachshunds are a bit short ... but don't tell them!


But ... there is a small change with Sample Data

Our example was for a Population (the 5 dogs were the only dogs we were interested in).

But if the data is a Sample (a selection taken from a bigger Population), then the calculation
changes!

When you have "N" data values that are:

• The Population: divide by N when calculating Variance (like we did)
• A Sample: divide by N-1 when calculating Variance

All other calculations stay the same, including how we calculated the mean.

Example: if our 5 dogs were just a sample of a bigger population of dogs, we would divide by 4
instead of 5 like this:

Sample Variance = 108,520 / 4 = 27,130

Sample Standard Deviation = √27,130 = 164.71... = 165 (to the nearest mm)

Think of it as a "correction" when your data is only a sample.
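Python's standard `statistics` module has separate functions for the two cases: `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by N-1. A short sketch using the dog heights:

```python
import statistics

heights = [600, 470, 170, 430, 300]  # dog heights in mm

# Population: divide the sum of squared differences by N
pop_var = statistics.pvariance(heights)
pop_sd = statistics.pstdev(heights)

# Sample: divide by N - 1 instead
sample_var = statistics.variance(heights)
sample_sd = statistics.stdev(heights)

print(pop_var, round(pop_sd, 2))        # 21704 and about 147.32
print(sample_var, round(sample_sd, 2))  # 27130 and about 164.71
```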


Formulas

Here are the two formulas:

The "Population Standard Deviation":

σ = √( (1/N) Σ (x_i - μ)² )

The "Sample Standard Deviation":

s = √( (1/(N-1)) Σ (x_i - x̄)² )

Here μ is the population mean and x̄ is the sample mean. Looks complicated, but the important
change is to divide by N-1 (instead of N) when calculating a Sample Variance.

*Footnote: Why square the differences?

If we just added up the differences from the mean ... the negatives would cancel the positives:

(4 + 4 - 4 - 4) / 4 = 0

So that won't work. How about we use absolute values?

(|4| + |4| + |-4| + |-4|) / 4 = (4 + 4 + 4 + 4) / 4 = 4

That looks good (and is the Mean Deviation), but what about this case:

(|7| + |1| + |-6| + |-2|) / 4 = (7 + 1 + 6 + 2) / 4 = 4

Oh No! It also gives a value of 4, even though the differences are more spread out!

So let us try squaring each difference (and taking the square root at the end):

√( (4² + 4² + 4² + 4²) / 4 ) = √(64 / 4) = 4

√( (7² + 1² + 6² + 2²) / 4 ) = √(90 / 4) = 4.74...

That is nice! The Standard Deviation is bigger when the differences are more spread out ... just
what we want!
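The comparison can be reproduced with a short sketch. The two helper names, `mean_abs` and `root_mean_square`, are our own labels for the two approaches.

```python
import math


def mean_abs(diffs):
    """Average of the absolute differences (the Mean Deviation approach)."""
    return sum(abs(d) for d in diffs) / len(diffs)


def root_mean_square(diffs):
    """Square each difference, average, then take the square root."""
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


even = [4, 4, -4, -4]
spread = [7, 1, -6, -2]

print(mean_abs(even), mean_abs(spread))                            # 4.0 4.0
print(root_mean_square(even), round(root_mean_square(spread), 2))  # 4.0 4.74
```

The absolute-value average cannot tell the two sets apart, while the squared version gives the more spread-out set a larger result.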

In fact this method is a similar idea to distance between points, just applied in a different way.

And it is easier to use algebra on squares and square roots than absolute values, which makes the
standard deviation easy to use in other areas of mathematics.

Quartiles
Quartiles are the values that divide a list of numbers into quarters.

• First put the list of numbers in order
• Then cut the list into four equal parts
• The Quartiles are at the "cuts"

Like this:

Example: 5, 8, 4, 4, 6, 3, 8

Put them in order: 3, 4, 4, 5, 6, 8, 8

Cut the list into quarters:

And the result is:

• Quartile 1 (Q1) = 4
• Quartile 2 (Q2), which is also the Median, = 5
• Quartile 3 (Q3) = 8

Sometimes a "cut" is between two numbers ... the Quartile is the average of the two numbers.

Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8

The numbers are already in order

Cut the list into quarters:

In this case Quartile 2 is half way between 5 and 6:

Q2 = (5+6)/2 = 5.5

And the result is:

• Quartile 1 (Q1) = 3
• Quartile 2 (Q2) = 5.5
• Quartile 3 (Q3) = 7
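The cutting procedure can be sketched in Python as "the median of each half". This is one reading of the method, chosen because it reproduces the examples here: Q2 is the median, and when the list has an odd number of values the middle value is left out of both halves. Other quartile conventions exist and can give slightly different answers. The function name `quartiles` is our own.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (needs at least 2 values)."""
    ordered = sorted(values)                 # first put the list in order
    half = len(ordered) // 2
    lower = ordered[:half]                   # values below the middle cut
    # For an odd count, skip the middle value itself
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


print(quartiles([5, 8, 4, 4, 6, 3, 8]))            # (4, 5, 8)
print(quartiles([1, 3, 3, 4, 5, 6, 6, 7, 8, 8]))   # (3, 5.5, 7)
```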

Interquartile Range

The "Interquartile Range" is from Q1 to Q3:

To calculate it just subtract Quartile 1 from Quartile 3, like this:

Example: 3, 4, 4, 5, 6, 8, 8 (the first list above, where Q1 = 4 and Q3 = 8)

The Interquartile Range is:

Q3 - Q1 = 8 - 4 = 4

Box and Whisker Plot

You can show all the important values in a "Box and Whisker Plot", like this:

A final example covering everything:

Example: Box and Whisker Plot and Interquartile Range for 4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11

Put them in order:

3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18

Cut it into quarters:

3, 4, 4 | 4, 7, 10 | 11, 12, 14 | 16, 17, 18

In this case all the quartiles are between numbers:

• Quartile 1 (Q1) = (4+4)/2 = 4
• Quartile 2 (Q2) = (10+11)/2 = 10.5
• Quartile 3 (Q3) = (14+16)/2 = 15

Also:

• The Lowest Value is 3
• The Highest Value is 18

So now we have enough data for the Box and Whisker Plot:

And the Interquartile Range is:

Q3 - Q1 = 15 - 4 = 11
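The five values a Box and Whisker Plot needs, plus the Interquartile Range, can be collected in one go. The sketch below assumes the same "median of each half" reading of the quartile method used in the examples. The function name `five_number_summary` and the dictionary keys are our own choices.

```python
from statistics import median


def five_number_summary(values):
    """Return lowest value, Q1, median, Q3 and highest value (needs at least 2 values)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd count, the middle value belongs to neither half
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return {
        "min": ordered[0],
        "q1": median(lower),
        "median": median(ordered),
        "q3": median(upper),
        "max": ordered[-1],
    }


summary = five_number_summary([4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11])
iqr = summary["q3"] - summary["q1"]  # Interquartile Range = Q3 - Q1

print(summary)  # min 3, q1 4.0, median 10.5, q3 15.0, max 18
print(iqr)      # 11.0
```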

HOW TO CALCULATE COEFFICIENT OF VARIATION

Instructions

 1. Calculate the sample mean, using the formula x̄ = Σx_i / n, where n is the number of data
    points x_i in the sample, and the summation is over all values of i. Read i as a subscript
    of x.

    For example, if a sample from a population is 4, 2, 3, 5, then the sample mean is
    14/4 = 3.5.

 2. Calculate the sample variance, using the formula Σ(x_i - x̄)^2 / (n-1).

    For example, in the above sample set, the sample variance is
    [0.5^2 + 1.5^2 + 0.5^2 + 1.5^2] / 3 = 1.667.

 3. Find the sample standard deviation by taking the square root of the result of step 2.
    Then divide by the sample mean. The result is the CV.

    Continuing with the above example, √(1.667)/3.5 = 0.369 (about 36.9%).
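The three steps collapse into a single line when using Python's standard `statistics` module, whose `stdev` function divides by n-1 as in step 2. The function name `coefficient_of_variation` is our own.

```python
import statistics


def coefficient_of_variation(sample):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample)


cv = coefficient_of_variation([4, 2, 3, 5])
print(round(cv, 3))  # 0.369
```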

Percentiles
Percentile: the value below which a percentage of data falls.

Example: You are the fourth tallest person in a group of 20

80% of people are shorter than you:

That means you are at the 80th percentile.

If your height is 1.85m then "1.85m" is the 80th percentile height in that group.
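The percentage can be worked out by counting how many values lie below yours. In the sketch below the actual heights are not known, so the numbers 1 to 20 are used as stand-ins (1 for the shortest person, 20 for the tallest). Both this stand-in data and the function name `percent_below` are assumptions made for illustration.

```python
def percent_below(values, x):
    """Percentage of the values that are strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Stand-in data: 1 = shortest person ... 20 = tallest person (not real heights)
group = list(range(1, 21))
fourth_tallest = sorted(group, reverse=True)[3]  # 17 in this stand-in data

print(percent_below(group, fourth_tallest))  # 80.0
```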

In Order

The data needs to be in order! So percentiles of height need to be in height order (sorted by
height). If they were percentiles of weight, they would need to be in weight order.

Deciles

A related idea is Deciles (sounds like decimal and percentile together), which splits the data into
10% groups:

• The 1st decile is the 10th percentile (the value that divides the data so that 10% is below it)
• The 2nd decile is the 20th percentile (the value that divides the data so that 20% is below it)
• etc!

Example: (continued)

You are at the 8th decile (the 80th percentile).

Quartiles

Another related idea is Quartiles, which splits the data into quarters:

Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8

The numbers are in order. Cut the list into quarters:

In this case Quartile 2 is half way between 5 and 6:

Q2 = (5+6)/2 = 5.5

And the result is:

• Quartile 1 (Q1) = 3
• Quartile 2 (Q2) = 5.5
• Quartile 3 (Q3) = 7

The Quartiles also divide the data into divisions of 25%, so:

• Quartile 1 (Q1) can be called the 25th percentile
• Quartile 2 (Q2) can be called the 50th percentile
• Quartile 3 (Q3) can be called the 75th percentile

Example: (continued)

For 1, 3, 3, 4, 5, 6, 6, 7, 8, 8:

• The 25th percentile = 3
• The 50th percentile = 5.5
• The 75th percentile = 7

Estimating Percentiles

We can estimate percentiles from a line graph.

Example: Shopping

A total of 10,000 people visited the shopping mall over 12 hours:

Time (hours)    People
0               0
2               350
4               1100
6               2400
8               6500
10              8850
12              10,000

a) Estimate the 30th percentile (when 30% of the visitors had arrived).

b) Estimate what percentile of visitors had arrived after 11 hours.

First draw a line graph of the data: plot the points and join them with a smooth curve:

a) The 30th percentile occurs when the visits reach 3,000.

Draw a line horizontally across from 3,000 until you hit the curve, then draw a line vertically
downwards to read off the time on the horizontal axis:

So the 30th percentile occurs after about 6.5 hours.


b) To estimate the percentile of visits after 11 hours: draw a line vertically up from 11 until you
hit the curve, then draw a line horizontally across to read off the number of visitors on the
vertical axis:

So the visits at 11 hours were about 9,500, which is the 95th percentile.
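Without a graph, a rough estimate can be made by joining the points with straight lines instead of a smooth curve (linear interpolation). This is a different technique from reading a hand-drawn curve, so the answers come out a little different from the graph readings. The helper name `interpolate` is our own.

```python
def interpolate(x, xs, ys):
    """Estimate y at x by joining the points (xs, ys) with straight lines; xs must increase."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the data range")


hours = [0, 2, 4, 6, 8, 10, 12]
people = [0, 350, 1100, 2400, 6500, 8850, 10000]

# a) Time when 30% of the 10,000 visitors (3,000 people) had arrived: swap the axes
t30 = interpolate(3000, people, hours)

# b) Percentile of visitors that had arrived after 11 hours
p11 = 100 * interpolate(11, hours, people) / people[-1]

print(round(t30, 2))  # 6.29 (hours)
print(round(p11, 2))  # 94.25 (percentile)
```

The straight-line estimates (about 6.3 hours and about the 94th percentile) are close to the readings taken from the smooth curve.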
