
# BOX-PLOT WITH FENCES

## Applied Statistics and Computing Lab

## Learning goals

- Why go beyond a basic box-plot?
- What are fences?
- How is a box-plot with fences constructed?
- How does one interpret such a plot?
- What are the gains and limitations?

## Box-plot with fences

Can we modify the basic box-plot so that it helps in detecting unusual observations? A box-plot with fences can be useful. What are fences? Let us take a look at a figure!


Figure source: http://en.wikipedia.org/wiki/Boxplot

## Basis for fences

From the previous figure, we see that for normally distributed data, about 99.3% of the data lie in the interval

$$\left(Q_1 - 1.5\,(Q_3 - Q_1),\; Q_3 + 1.5\,(Q_3 - Q_1)\right)$$

Also, only about 3 out of a million observations (roughly 0.0003%) are expected to lie outside the interval

$$\left(Q_1 - 3\,(Q_3 - Q_1),\; Q_3 + 3\,(Q_3 - Q_1)\right)$$
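The two coverage figures can be checked numerically. The following is a minimal sketch, assuming a standard normal population and using its exact quartiles; the helper name `outside_prob` is introduced here for illustration only.

```r
# Quartiles and interquartile range of the standard normal distribution
q1 <- qnorm(0.25)
q3 <- qnorm(0.75)
iqr <- q3 - q1

# Probability of falling outside (Q1 - k * IQR, Q3 + k * IQR)
# (outside_prob is a hypothetical helper, not part of base R)
outside_prob <- function(k) {
  lower <- q1 - k * iqr
  upper <- q3 + k * iqr
  pnorm(lower) + pnorm(upper, lower.tail = FALSE)
}

p_inner <- outside_prob(1.5)  # outside the inner fences
p_outer <- outside_prob(3)    # outside the outer fences
print(c(inner = p_inner, outer = p_outer))
```

The inner-fence probability comes out close to 0.007 (so about 99.3% lies inside), and the outer-fence probability is of the order of a few in a million.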

## Box-plot with fences

[Figure: box-plot with fences, with points labelled "Suspected outlier" and "Outlier"]
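As a rough illustration of how the two labels might be assigned, the sketch below computes inner and outer fences for a small data vector. The data are hypothetical, and the quartiles come from `quantile()` with its default method, which is an assumption: R's `boxplot` uses hinges, which can differ slightly.

```r
# Hypothetical data, made up for illustration
x <- c(2, 4, 5, 5, 6, 6, 7, 7, 8, 9, 15, 30)

# Quartiles and interquartile range
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Inner fences at 1.5 * IQR and outer fences at 3 * IQR beyond the quartiles
inner <- c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr)
outer <- c(q[1] - 3 * iqr, q[2] + 3 * iqr)

# Beyond the outer fences: outlier; between inner and outer: suspected outlier
label <- ifelse(x < outer[1] | x > outer[2], "outlier",
         ifelse(x < inner[1] | x > inner[2], "suspected outlier", "regular"))
print(data.frame(x = x, label = label))
```

With these values, 15 falls between the inner and outer fences and 30 falls beyond the outer fence.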

## Box-plot with fences

- Box-plots with fences are useful in identifying unusual observations. What are unusual observations?
- The box-plot serves only as a diagnostic. It is not a test of significance.
- Caution: even for a random sample from a normal distribution, about 7 out of a thousand sample points can lie outside the inner fences and about 3 out of a million can lie outside the outer fences. Thus, when dealing with large data sets, one has to be careful about declaring outliers on the basis of a box-plot.
- Sometimes simulation-based methods are used for this purpose. For more information, one may see Robert Dawson (2011).
- Sometimes only the inner fences are used; this is the default for the `boxplot` command in R.
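The caution about large data sets can be illustrated with a small simulation. This is only a sketch: the sample size and seed are arbitrary choices, and `boxplot.stats()` is used to apply the same fence rule as `boxplot()` without drawing anything.

```r
set.seed(1)            # arbitrary seed, for reproducibility only
x <- rnorm(100000)     # a large sample that is normal by construction

# Points beyond the inner fences (coef = 1.5 is the default)
out_inner <- boxplot.stats(x, coef = 1.5)$out
# Points beyond the outer fences
out_outer <- boxplot.stats(x, coef = 3)$out

frac_inner <- length(out_inner) / length(x)
print(c(flagged_inner = length(out_inner),
        flagged_outer = length(out_outer),
        fraction_inner = frac_inner))
```

Several hundred points are flagged by the inner fences even though none of them is anomalous. In a plot, `boxplot(x, range = 3)` would extend the whiskers to the outer fences instead.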

## Comparison of data

[Figures not reproduced: box-plots of the first and second semester exam scores and of the three minors]

## Interpretation of the Box-plot

In the box-plot corresponding to the scores in the second semester exam, we have 3 unusual observations among 50. Under a normal distribution, we would expect only about 7 in a thousand observations to lie outside the inner fences, that is, about 0.35 in a sample of 50. Thus one needs to probe into these unusual observations. The distribution of scores in the second semester exam appears to be symmetric, but may have slightly longer tails, in view of the unusual observations situated symmetrically below and above the fences.

From the box-plots corresponding to the three minors, it appears that:

- the distribution of scores in the First minor is skewed to the right;
- the distributions of scores in the Second and Third minors are symmetric and somewhat similar; and
- the median scores of the three minors seem to be close (we shall examine this further when we deal with notched box-plots).

There is an unusual observation in the box-plot of scores of the First semester exam, with a value of about 18. We know that the GPA is out of 10, so this value is not a feasible score. Thus this is an outlier!
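The comparison of "3 among 50" with "7 in a thousand" can be made concrete with a rough calculation. The sketch below treats each observation as independently falling outside the inner fences with the normal-theory probability. This is an approximation, since the fences in a real plot are estimated from the sample itself, and it is offered as a diagnostic aid rather than a test of significance.

```r
n <- 50  # sample size from the second semester example

# Normal-theory probability of a point lying outside the inner fences
iqr <- qnorm(0.75) - qnorm(0.25)
p_inner <- 2 * pnorm(qnorm(0.25) - 1.5 * iqr)

# Expected number of flagged points in a sample of n
expected_flagged <- n * p_inner

# Approximate probability of seeing 3 or more flagged points, P(X >= 3)
p_three_or_more <- pbinom(2, size = n, prob = p_inner, lower.tail = FALSE)

print(c(expected = expected_flagged, p_three_or_more = p_three_or_more))
```

Under these assumptions, three flagged points in a sample of 50 would be uncommon, which is consistent with the advice to probe them further.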

## Gain from a Box-plot with fence

As we saw:

- We can identify unusual observations.
- We can examine the tail behaviour.
- We can compare two or more variables or datasets more easily.

However, we cannot get modal information from these plots!


## R-codes

| Plot | R-code |
| --- | --- |
| Boxplot (of single variable) | `boxplot(variable name)` |
| Boxplot (of all the variables in a dataset) | `boxplot(name of data as input in R)` |
| Boxplot (of k distinct variables from a dataset) | `boxplot(dataname$variable 1 name, dataname$variable 2 name, ..., dataname$variable k name)` |
| Boxplot with means (can be drawn for one or many variables at the same time) | `boxplot(variable specification)` then `points(y = colMeans(variables specification), x = 1:(total number of variables in a box-plot))` |
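The templates above could be instantiated as follows. This is a sketch only: R's built-in `iris` data set stands in for the course data, which are not available here, and the object names `dat`, `b_all` and `means` are illustrative.

```r
# Stand-in data: the four numeric columns of the built-in iris data set
dat <- iris[, 1:4]

# Boxplot of a single variable
boxplot(dat$Sepal.Length, main = "Sepal.Length")

# Boxplot of all the variables in a dataset
b_all <- boxplot(dat, main = "All variables")

# Boxplot of k distinct variables from a dataset (here k = 2)
boxplot(dat$Sepal.Length, dat$Petal.Length,
        names = c("Sepal.Length", "Petal.Length"))

# Boxplot with means: draw the boxes, then overlay the column means
boxplot(dat, main = "Box-plots with means")
means <- colMeans(dat)
points(x = 1:ncol(dat), y = means, pch = 19, col = "red")
```

The `points()` call relies on the boxes being drawn at horizontal positions 1, 2, ..., one per variable, which is the default layout of `boxplot()`.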


Thank you