Yes, we use "mean" twice: Find the mean ... use it to work out distances ... then find the mean of
those!
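Three steps:
1. Find the mean of all values
2. Find the distance of each value from that mean
3. Find the mean of those distances
Like this: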
Mean = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16) / 8 = 72 / 8 = 9
Value    Distance from 9
3        6
6        3
6        3
7        2
8        1
11       2
15       6
16       7
Mean Deviation = (6 + 3 + 3 + 2 + 1 + 2 + 6 + 7) / 8 = 30 / 8 = 3.75
It tells us how far, on average, all values are from the middle.
In that example the values are, on average, 3.75 away from the middle.
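The three steps can be sketched in Python. This is only an illustrative sketch: the function name `mean_deviation` is our own choice, not something defined on this page.

```python
def mean_deviation(values):
    """Average absolute distance of the values from their mean."""
    mean = sum(values) / len(values)               # step 1: the mean
    distances = [abs(x - mean) for x in values]    # step 2: distance of each value from the mean
    return sum(distances) / len(distances)         # step 3: the mean of those distances

values = [3, 6, 6, 7, 8, 11, 15, 16]
print(mean_deviation(values))  # 3.75
```

The same function can be reused on any list of numbers, such as the dog heights used later on.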
Formula
Mean Deviation = Σ|x - μ| / N
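Firstly: μ is the mean (in our example μ = 9), x is each value (such as 3 or 16), and N is the number of values (in our example N = 8).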
Absolute Deviation
Each distance we calculated is called an Absolute Deviation, because it is the Absolute Value of
the deviation (how far from the mean).
To show "Absolute Value" we put "|" marks either side like this: |-3| = 3
Absolute Deviation = |x - μ|
From our example, the value 16 has Absolute Deviation = |x - μ| = |16 - 9| = |7| = 7
Sigma
Mean Deviation = Σ|x - μ| / N
μ = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16) / 8 = 72 / 8 = 9
x        |x - μ|
3        6
6        3
6        3
7        2
8        1
11       2
15       6
16       7
Σ|x - μ| = 30
Mean Deviation = Σ|x - μ| / N = 30 / 8 = 3.75
Mean Deviation tells us how far, on average, all values are from the middle.
Here is an example (using the same data as on the Standard Deviation page):
Example: You and your friends have just measured the heights of your dogs (in millimeters):
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.
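The mean is μ = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
x        |x - μ|
600      206
470      76
170      224
430      36
300      94
Σ|x - μ| = 636
Mean Deviation = Σ|x - μ| / N = 636 / 5 = 127.2
So, on average, the dogs' heights are 127.2 mm from the mean.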
A Useful Check
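The sum of the deviations below the mean should equal the sum of the deviations above the mean.
In our first example (values 3 to 8 are below the mean of 9, values 11 to 16 are above it):
6 + 3 + 3 + 2 + 1 = 2 + 6 + 7
15 = 15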
Likewise:
Example: Dogs
Below the mean: 224 + 94 = 318
Above the mean: 206 + 76 + 36 = 318
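The check can also be done by a short script. The sketch below is illustrative only, and `deviation_balance` is a name we made up for it: it adds up the deviations below the mean and above the mean separately so the two totals can be compared.

```python
def deviation_balance(values):
    """Return (total deviation below the mean, total deviation above the mean)."""
    mean = sum(values) / len(values)
    below = sum(mean - x for x in values if x < mean)
    above = sum(x - mean for x in values if x > mean)
    return below, above

print(deviation_balance([3, 6, 6, 7, 8, 11, 15, 16]))  # (15.0, 15.0)
print(deviation_balance([600, 470, 170, 430, 300]))    # (318.0, 318.0)
```

If the two totals differ, the mean or one of the deviations was probably miscalculated.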
Standard Deviation
The formula is easy: it is the square root of the Variance. So now you ask, "What is the
Variance?"
Variance
Example
You and your friends have just measured the heights of your dogs (in millimeters):
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.
Find out the Mean, the Variance, and the Standard Deviation.
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
so the mean (average) height is 394 mm.
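To calculate the Variance, take each difference from the Mean, square it, and then average the result:
Variance = (206² + 76² + (-224)² + 36² + (-94)²) / 5 = (42436 + 5776 + 50176 + 1296 + 8836) / 5 = 108520 / 5 = 21704
And the Standard Deviation is just the square root of the Variance, so:
Standard Deviation = √21704 = 147.32... ≈ 147 (to the nearest mm)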
So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what
is extra large or extra small.
Rottweilers are tall dogs. And Dachshunds are a bit short ... but don't tell them!
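As a rough sketch, the dog example can be reproduced in Python. The variable names are our own, and the last line simply lists the heights that lie within one Standard Deviation of the Mean.

```python
import math

heights = [600, 470, 170, 430, 300]   # dog heights in mm, from the example

mean = sum(heights) / len(heights)
squared_differences = [(h - mean) ** 2 for h in heights]
variance = sum(squared_differences) / len(heights)   # population: divide by N
standard_deviation = math.sqrt(variance)

print(mean, variance, round(standard_deviation, 2))  # 394.0 21704.0 147.32

# Heights no more than one Standard Deviation away from the Mean
within_one_sd = [h for h in heights if abs(h - mean) <= standard_deviation]
print(within_one_sd)  # [470, 430, 300]
```

The 600 mm and 170 mm dogs fall outside that range, which is the sense in which they are "extra large" or "extra small".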
Our example was for a Population (the 5 dogs were the only dogs we were interested in).
But if the data is a Sample (a selection taken from a bigger Population), then the calculation
changes: when calculating the Variance we divide by N - 1 instead of N.
All other calculations stay the same, including how we calculated the mean.
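Example: if our 5 dogs were just a sample of a bigger population of dogs, we would divide by 4
instead of 5 like this:
Sample Variance = 108520 / 4 = 27130
Sample Standard Deviation = √27130 = 164.71... ≈ 165 (to the nearest mm)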
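Here are the two formulas, explained at Standard Deviation Formulas if you want to know more:
Population Standard Deviation: σ = √( Σ(x - μ)² / N )
Sample Standard Deviation: s = √( Σ(x - x̄)² / (N - 1) )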
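The difference between the two calculations can be sketched with one small function. This is an illustration under our own naming (`variance` and its `sample` flag are not from this page): the only thing that changes is the divisor.

```python
import math

def variance(values, sample=False):
    """Population variance by default; sample variance (divide by N - 1) if sample=True."""
    n = len(values)
    mean = sum(values) / n            # the mean is calculated the same way in both cases
    total = sum((x - mean) ** 2 for x in values)
    return total / (n - 1 if sample else n)

heights = [600, 470, 170, 430, 300]
print(variance(heights))                                    # 21704.0
print(variance(heights, sample=True))                       # 27130.0
print(round(math.sqrt(variance(heights, sample=True)), 2))  # 164.71
```

Python's standard `statistics` module offers the same pair of choices as `pstdev` (population) and `stdev` (sample).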
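If we just added up the differences from the mean ... the negatives would cancel the positives:
(4 + 4 - 4 - 4) / 4 = 0
So that won't work. How about we use absolute values?
(|4| + |4| + |-4| + |-4|) / 4 = (4 + 4 + 4 + 4) / 4 = 4
That looks good (and is the Mean Deviation), but what about this case:
(|7| + |1| + |-6| + |-2|) / 4 = (7 + 1 + 6 + 2) / 4 = 4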
Oh no! It also gives a value of 4, even though the differences are more spread out!
So let us try squaring each difference (and taking the square root at the end):
√( (4² + 4² + 4² + 4²) / 4 ) = √( 64 / 4 ) = 4
√( (7² + 1² + 6² + 2²) / 4 ) = √( 90 / 4 ) = 4.74...
That is nice! The Standard Deviation is bigger when the differences are more spread out ... just
what we want!
In fact this method is a similar idea to distance between points, just applied in a different way.
And it is easier to use algebra on squares and square roots than absolute values, which makes the
standard deviation easy to use in other areas of mathematics.
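The comparison above can be tried out directly. In this sketch (helper names are our own) both sets of differences are run through the absolute-value method and the squaring method.

```python
import math

def mean_of_absolutes(differences):
    """Average of the absolute differences (the Mean Deviation approach)."""
    return sum(abs(d) for d in differences) / len(differences)

def root_mean_square(differences):
    """Square each difference, average, then take the square root (the Standard Deviation approach)."""
    return math.sqrt(sum(d * d for d in differences) / len(differences))

even = [4, 4, -4, -4]      # all differences the same size
spread = [7, 1, -6, -2]    # differences more spread out

print(mean_of_absolutes(even), mean_of_absolutes(spread))       # 4.0 4.0
print(root_mean_square(even), round(root_mean_square(spread), 2))  # 4.0 4.74
```

Only the squaring method tells the two cases apart.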
Quartiles
Quartiles are the values that divide a list of numbers into quarters.
Like this:
Example: 5, 8, 4, 4, 6, 3, 8
Put them in order: 3, 4, 4, 5, 6, 8, 8
Quartile 1 (Q1) = 4
Quartile 2 (Q2), which is also the Median, = 5
Quartile 3 (Q3) = 8
Sometimes a "cut" is between two numbers ... the Quartile is the average of the two numbers.
Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8
Q2 = (5+6)/2 = 5.5
Quartile 1 (Q1) = 3
Quartile 2 (Q2) = 5.5
Quartile 3 (Q3) = 7
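Interquartile Range
The Interquartile Range is the distance from Q1 to Q3. To calculate it, subtract Quartile 1 from Quartile 3.
Example: for 3, 4, 4, 5, 6, 8, 8 (where Q1 = 4 and Q3 = 8):
Interquartile Range = Q3 - Q1 = 8 - 4 = 4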
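One way to reproduce the quartiles in both examples is the "median of each half" method, sketched below. This is an assumption about the method, chosen because it matches the answers given here; other software may use different conventions and give slightly different quartiles. The function names are our own.

```python
def median(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # a "cut" between two numbers: average them

def quartiles(values):
    """Return (Q1, Q2, Q3); the middle value is left out of both halves when the count is odd."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([5, 8, 4, 4, 6, 3, 8])
print(q1, q2, q3)   # 4 5 8
print(q3 - q1)      # 4  (the Interquartile Range)
print(quartiles([1, 3, 3, 4, 5, 6, 6, 7, 8, 8]))  # (3, 5.5, 7)
```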
You can show all the important values in a "Box and Whisker Plot", like this:
Also:
So now we have enough data for the Box and Whisker Plot:
Interquartile Range = Q3 - Q1 = 15 - 4 = 11
Instructions
1. Calculate the sample mean, using the formula x̄ = Σx_i / n, where n is the number of data points x_i in the sample, and the summation is over all values of i. Read i as a subscript of x.
2. Calculate the sample variance, using the formula Σ(x_i - x̄)^2 / (n-1). For example, in the above sample set, the sample variance is [0.5^2 + 1.5^2 + 0.5^2 + 1.5^2] / 3 = 1.667.
3. Find the sample standard deviation by taking the square root of the result of step 2. Then divide by the sample mean. The result is the CV (coefficient of variation).
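The three steps can be sketched as a short Python function. The sample set the instructions refer to is not shown here, so the data `[1, 2, 3, 4]` below is a hypothetical stand-in, chosen only because its deviations from the mean (1.5, 0.5, 0.5, 1.5) give the same sample variance of 1.667. The function name is also our own.

```python
import math

def coefficient_of_variation(sample):
    """Sample standard deviation divided by the sample mean."""
    n = len(sample)
    mean = sum(sample) / n                                      # step 1: sample mean
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)   # step 2: sample variance
    return math.sqrt(variance) / mean                           # step 3: SD divided by mean

sample = [1, 2, 3, 4]   # hypothetical data, not from the original instructions
print(round(coefficient_of_variation(sample), 4))  # 0.5164
```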
Percentiles
Percentile: the value below which a percentage of data falls.
If your height is 1.85m then "1.85m" is the 80th percentile height in that group.
In Order
The data needs to be in order! So percentiles of height need to be in height order (sorted by
height). If they were percentiles of weight, they would need to be in weight order.
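As a small illustration of "the percentage of data below a value", the sketch below sorts the data first and then counts how many values sit below the one we ask about. The function name `percent_below` is our own, and the data is the list used in the quartile examples.

```python
from bisect import bisect_left

def percent_below(values, value):
    """Percentage of the data that is strictly below the given value."""
    ordered = sorted(values)               # the data needs to be in order
    below = bisect_left(ordered, value)    # position in the ordered list = count of smaller values
    return 100 * below / len(ordered)

data = [1, 3, 3, 4, 5, 6, 6, 7, 8, 8]
print(percent_below(data, 7))  # 70.0
```

Here 7 of the 10 values are below 7, so 7 sits at the 70th percentile of this list under this "strictly below" convention.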
Deciles
A related idea is Deciles (sounds like decimal and percentile together), which split the data into
10% groups:
The 1st decile is the 10th percentile (the value that divides the data so that 10% is below it)
The 2nd decile is the 20th percentile (the value that divides the data so that 20% is below it)
etc!
Example: (continued)
Quartiles
Another related idea is Quartiles, which split the data into quarters:
Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8
Q2 = (5+6)/2 = 5.5
Quartile 1 (Q1) = 3
Quartile 2 (Q2) = 5.5
Quartile 3 (Q3) = 7
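The Quartiles also divide the data into divisions of 25%, so:
Quartile 1 (Q1) can be called the 25th percentile
Quartile 2 (Q2) can be called the 50th percentile
Quartile 3 (Q3) can be called the 75th percentile
Example: (continued)
For 1, 3, 3, 4, 5, 6, 6, 7, 8, 8:
The 25th percentile = 3
The 50th percentile = 5.5
The 75th percentile = 7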
Estimating Percentiles
Example: Shopping
Time (hours)    Visitors so far
0               0
2               350
4               1,100
6               2,400
8               6,500
10              8,850
12              10,000
a) Estimate the 30th percentile (when 30% of the visitors had arrived).
First draw a line graph of the data: plot the points and join them with a smooth curve:
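The total is 10,000 visitors, so the 30th percentile occurs when the visits reach 3,000 (30% of 10,000).
Draw a line horizontally across from 3,000 until you hit the curve, then draw a line vertically
downwards to read off the time on the horizontal axis.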
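b) Estimate what percentile of visitors had arrived after 11 hours.
Draw a line vertically up from 11 hours until you hit the curve, then draw a line horizontally across to read off the visits.
So the visits at 11 hours were about 9,500 out of 10,000, which is the 95th percentile.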
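Without a graph, a rough estimate can be made by joining the table's points with straight lines instead of a smooth curve. The sketch below does that with linear interpolation, so this is a swapped-in technique and its answers differ a little from values read off a hand-drawn curve. The helper name `interpolate` is our own.

```python
hours = [0, 2, 4, 6, 8, 10, 12]
visitors = [0, 350, 1100, 2400, 6500, 8850, 10000]

def interpolate(x, xs, ys):
    """Straight-line estimate of y at x, given points (xs, ys) with xs in increasing order."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        y0, y1 = ys[i], ys[i + 1]
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the table")

total = visitors[-1]

# a) time at which 30% of the visitors had arrived (3,000 visits)
time_30 = interpolate(30 * total / 100, visitors, hours)
print(round(time_30, 2))  # 6.29

# b) percentile reached after 11 hours
visits_11 = interpolate(11, hours, visitors)
print(visits_11, 100 * visits_11 / total)  # 9425.0 94.25
```

The straight-line estimate for 11 hours (about the 94th percentile) is close to the 95th percentile read from the smooth curve.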