
Comparative Study of Outlier Detection Tests in Sampling from Cauchy Distribution

Alamgir, Amjad Ali, Sajjad Ahmad Khan, Umair Khalil and Dost Muhammad Khan

Department of Statistics, University of Peshawar, KPK, Pakistan


Email: profalamgir@yahoo.com

Abstract

Outliers have been of constant concern for statisticians. These outliers must be handled carefully
while analyzing data using statistical tools. In the present study, we introduce some more tests for
detecting one or more outliers, which make use of the more robust statistics, namely, median,
quartile deviation and the variance based on “sample free of suspected outlier(s)”. We simulate
critical values for the tests and also for those tests which are available in the literature while
sampling from Cauchy distribution. We compare their efficiency using power criteria.

Key Words: Outliers, Cauchy distribution, Critical Values, Powers

Introduction

Statistical analyses are greatly affected by the outliers present in the data. In practice, while
observing several measurements of chemical or physical quantity, one or more of the observed
values may differ significantly from the majority of the remaining observations. Such
observations are termed outliers and may be removed from the data set before performing
statistical analysis. Van der Loo (2010) argued that the detection of outliers is of immense
importance from statistical analysis point of view. This task can be accomplished if one uses some
test procedures for the detection of outliers.
In the single-sample case, researchers have studied many tests and procedures relating to different
probability distributions to detect and to handle such outliers. Dixon (1950) introduced several
tests for detection of one or more outliers. Several researchers have simulated critical values for
the tests based on sampling from normal distribution. For example, Grubbs and Beck (1972)
simulated critical values for the tests for k = 2, where k is the number of outliers. Barnett and Lewis
(1994) have simulated critical values for the tests when k = 2, 3 and 4. Fernando et al. (2000) have
given a comparison of efficiency of several tests for the detection of outliers in sampling from
normal population. Verma and Quiroz-Ruiz (2006) computed comparatively more precise and
accurate simulated critical values (with four decimal places) for six Dixon tests with various levels
of significance for detection of outliers while sampling from a normal population for sample sizes
up to 100. A good review of earlier work in this area is given by Beckman and Cook
(1983). For non-normal populations, the published work has concentrated on the uniform,
exponential and gamma distributions because of their simplicity and wide applications.
Fung and Paul (1985) conducted a simulation study to compare eight different tests for the
detection of outliers in sampling from Weibull or extreme-value distribution. In the current study
we simulate critical values for several tests and also compare their performance using power
criteria (Beckman and Cook, 1983) while sampling from Cauchy distribution.
The Test Statistics for the Study

Several tests are considered in this study. These tests are described in brief below.

Dixon-Type Tests

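Let x_1 ≤ x_2 ≤ ... ≤ x_n denote the ordered sample values. The three Dixon-type tests considered in this study are given by

$$T_1 = \frac{x_n - x_{n-1}}{x_n - x_1}, \qquad T_2 = \frac{x_n - x_{n-1}}{x_n - x_2}, \qquad T_3 = \frac{x_n - x_{n-1}}{x_n - x_3}.$$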
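We also consider some more tests involving the mean and the median. They are

$$T_4 = \frac{x_n - \bar{x}}{s}$$

An analog of the above test is obtained if the sample standard deviation is replaced by the mean absolute
deviation, given by

$$T_5 = \frac{x_n - \text{median}}{\hat{\sigma}}$$

where

$$\hat{\sigma} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \text{median} \right|,$$

$$T_6 = \frac{x_n - \text{median}}{Q.D}$$

where

$$Q.D = \text{Quartile Deviation} = (Q_3 - Q_1)/2,$$

$$T_7 = \frac{x_n - \text{median}}{\text{MAD}}$$

where

$$\text{MAD} = \left( \text{median} \left| x_i - \text{median} \right| \right) / 0.6457.$$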
The two tests T6 and T7 make use of Q.D and MAD, respectively, which are more robust estimates
than the sample standard deviation.
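As an illustration only, the statistics T1 to T7 can be computed as in the following Python sketch. The function name `one_sided_statistics`, the use of NumPy, the divisor n - 1 for s, and NumPy's default percentile interpolation for Q1 and Q3 are assumptions that the text does not state; the constant 0.6457 is taken as printed above.

```python
import numpy as np

def one_sided_statistics(sample):
    """Return T1..T7 for the largest observation of a sample (n >= 4)."""
    x = np.sort(np.asarray(sample, dtype=float))   # x[0] <= ... <= x[-1]
    gap = x[-1] - x[-2]                            # x_n - x_{n-1}
    med = np.median(x)
    s = x.std(ddof=1)                              # assumption: divisor n - 1
    sigma_hat = np.mean(np.abs(x - med))           # mean absolute deviation about the median
    q1, q3 = np.percentile(x, [25, 75])            # assumption: NumPy's default quantile rule
    qd = (q3 - q1) / 2.0                           # quartile deviation
    mad = np.median(np.abs(x - med)) / 0.6457      # constant as printed in the text
    return {
        "T1": gap / (x[-1] - x[0]),
        "T2": gap / (x[-1] - x[1]),
        "T3": gap / (x[-1] - x[2]),
        "T4": (x[-1] - x.mean()) / s,
        "T5": (x[-1] - med) / sigma_hat,
        "T6": (x[-1] - med) / qd,
        "T7": (x[-1] - med) / mad,
    }
```

For example, `one_sided_statistics([1, 2, 3, 4, 10])["T1"]` evaluates (10 - 4)/(10 - 1).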

The Proposed Test

We propose a test that uses the robust location estimate, the “median”, together with the standard deviation
based on the first “n-1” observations (when the largest observation is tested as a possible outlier). It
is defined as

$$T_8 = \frac{x_n - \text{median}}{s_{n-1}},$$

where

$$s_{n-1}^2 = \frac{\sum_{i=1}^{n-1} (X_i - \bar{X})^2}{n-1}.$$
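A minimal Python sketch of the proposed statistic T8 follows. It assumes that the median is taken over the full sample, that the mean in the variance is the mean of the retained n - 1 observations, and that the divisor is n - 1 as in the formula above; the function name `t8_statistic` is hypothetical.

```python
import numpy as np

def t8_statistic(sample):
    """T8: (largest value - median) / s_{n-1}, with s_{n-1} free of the suspect."""
    x = np.sort(np.asarray(sample, dtype=float))
    rest = x[:-1]                                  # first n - 1 ordered observations
    # Assumptions: mean of the retained values; divisor n - 1 (= rest.size).
    s_rest = np.sqrt(np.sum((rest - rest.mean()) ** 2) / rest.size)
    return (x[-1] - np.median(x)) / s_rest         # assumption: median of the full sample
```

Because the suspected observation is excluded from the scale estimate, a single extreme value does not inflate the denominator, which is the motivation given for the test.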
Two Sided Tests

We investigate four (4) two-sided tests for an extreme outlier (tests T9-T12), given
below:

$$T_9 = \max\left( \frac{x_n - \bar{x}}{s}, \frac{\bar{x} - x_1}{s} \right)$$

A two-sided analogue of T5 is given by

$$T_{10} = \max\left( \frac{x_n - \text{median}}{\hat{\sigma}}, \frac{\text{median} - x_1}{\hat{\sigma}} \right)$$

We define a two-sided Dixon test given by

$$T_{11} = \max\left( \frac{x_n - x_{n-1}}{x_n - x_1}, \frac{x_2 - x_1}{x_n - x_1} \right)$$

The following test is a two-sided analogue of T6:

$$T_{12} = \max\left( \frac{x_n - \text{median}}{Q.D}, \frac{\text{median} - x_1}{Q.D} \right)$$
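The two-sided statistics T9 to T12 can be sketched in Python in the same way. As before, the divisor n - 1 for s and NumPy's default percentile rule for the quartiles are assumptions, and the function name `two_sided_statistics` is hypothetical.

```python
import numpy as np

def two_sided_statistics(sample):
    """Return T9..T12, each the larger of the upper-tail and lower-tail ratios."""
    x = np.sort(np.asarray(sample, dtype=float))
    upper, lower = x[-1], x[0]
    mean, med = x.mean(), np.median(x)
    s = x.std(ddof=1)                              # assumption: divisor n - 1
    sigma_hat = np.mean(np.abs(x - med))
    q1, q3 = np.percentile(x, [25, 75])            # assumption: default quantile rule
    qd = (q3 - q1) / 2.0
    return {
        "T9": max(upper - mean, mean - lower) / s,
        "T10": max(upper - med, med - lower) / sigma_hat,
        "T11": max(x[-1] - x[-2], x[1] - x[0]) / (upper - lower),
        "T12": max(upper - med, med - lower) / qd,
    }
```

Since both ratios in each statistic share a positive denominator, taking the maximum of the numerators before dividing is equivalent to the maximum of the two ratios.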

Block Tests

The following block tests, available in the literature, are used to detect two or more (d > 1)
outliers:

$$T_{13} = \frac{x_{n-d+1} + \cdots + x_{n-1} + x_n - d\,\bar{x}}{s}$$

$$T_{14} = \frac{x_{n-d+1} + \cdots + x_n - d\,\text{Median}}{\hat{\sigma}}$$

Dixon-type tests for detecting more than one outlier are given by

$$T_{15} = \frac{x_n - x_{n-d}}{x_n - x_1}, \qquad T_{16} = \frac{x_n - x_{n-d}}{x_n - x_2}, \qquad T_{17} = \frac{x_n - x_{n-d}}{x_n - x_3}.$$
The Proposed Block test

We propose a block test, T18, which is a generalization of our proposed test T8 and is given by

$$T_{18} = \frac{x_{n-d+1} + \cdots + x_n - d\,\text{Median}}{s_{n-d}},$$

where $s_{n-d}$ is the standard deviation based on the first “n-d” observations (when the largest d observations
are tested for outliers).
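The block statistics T13 to T18 can be sketched as follows. This is an illustration under assumptions: the divisor n - 1 for s, the divisor n - d for the variance of the retained observations (by analogy with T8; the text does not give it), the median taken over the full sample, and the hypothetical function name `block_statistics`.

```python
import numpy as np

def block_statistics(sample, d):
    """Return T13..T18 for testing the d largest observations as a block."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    top_sum = x[n - d:].sum()                      # x_{n-d+1} + ... + x_n
    med = np.median(x)                             # assumption: median of the full sample
    s = x.std(ddof=1)                              # assumption: divisor n - 1
    sigma_hat = np.mean(np.abs(x - med))
    rest = x[:n - d]                               # first n - d ordered observations
    # Assumption: divisor n - d, mirroring the printed formula for T8.
    s_rest = np.sqrt(np.sum((rest - rest.mean()) ** 2) / rest.size)
    block_gap = x[-1] - x[n - d - 1]               # x_n - x_{n-d}
    return {
        "T13": (top_sum - d * x.mean()) / s,
        "T14": (top_sum - d * med) / sigma_hat,
        "T15": block_gap / (x[-1] - x[0]),
        "T16": block_gap / (x[-1] - x[1]),
        "T17": block_gap / (x[-1] - x[2]),
        "T18": (top_sum - d * med) / s_rest,
    }
```

With n = 5 and d = 2 the numerator and denominator of T17 coincide, so T17 is identically 1, which agrees with the first row of Table 8.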

Power Criteria

The power criteria used by many researchers are as follows.

Let Z1 denote the number of correctly identified outliers and Z2 denote the number of incorrectly
identified outliers (regular observations declared as outliers, called false positives). The criteria
they considered are given by:
(a) $P(Z_1 = d, Z_2 = 0)$, the probability of detecting all contaminants with no false positives
(correct identification).
(b) $P(Z_1 = 0, Z_2 = d)$, the probability that only regular observations are declared outliers and
no contaminant is correctly identified.
(c) $P(Z_1 = j, Z_2 = d - j)$, $j = 1, 2, \ldots, d$, the probability that some of the observations detected are
actual contaminants and some are false positives.
(d) The average number of correctly identified contaminants, $E(Z_1)$.
(e) The average number of incorrectly identified outliers (false positives), $E(Z_2)$.
(f) The expected proportion of actual outliers among all observations expected to be
declared outliers, $\dfrac{E(Z_1)}{E(Z_1) + E(Z_2)}$.
Fung and Paul (1985) and Childs (1996) have used the above criteria for comparison purpose.
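Given the counts Z1 and Z2 recorded in each simulated sample, criteria (a) to (f) reduce to simple averages. The Python sketch below shows one way to compute them; the function name `power_summary` and the returned keys are hypothetical, and the inputs are assumed to be integer arrays with one entry per replication.

```python
import numpy as np

def power_summary(z1, z2, d):
    """Summarise criteria (a)-(f) from per-replication counts z1 and z2."""
    z1 = np.asarray(z1)
    z2 = np.asarray(z2)
    e_z1, e_z2 = float(z1.mean()), float(z2.mean())
    total = e_z1 + e_z2
    return {
        "P(Z1=d, Z2=0)": float(np.mean((z1 == d) & (z2 == 0))),      # (a)
        "P(Z1=0, Z2=d)": float(np.mean((z1 == 0) & (z2 == d))),      # (b)
        # (c) for j = 1..d as listed in the text; j = d coincides with (a)
        "P(Z1=j, Z2=d-j)": {j: float(np.mean((z1 == j) & (z2 == d - j)))
                            for j in range(1, d + 1)},
        "E(Z1)": e_z1,                                               # (d)
        "E(Z2)": e_z2,                                               # (e)
        # (f) is undefined when nothing is ever declared an outlier
        "ratio": e_z1 / total if total > 0 else float("nan"),
    }
```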

Simulation Studies

An extensive simulation study is conducted to compute empirically the critical values for various
tests discussed above based on sampling from Cauchy distribution. All the critical values
computed for the tests are based on 10000 simulations. The sample sizes considered for all tests in
this study are n = 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 18, 20, 25, 30. For each test, critical values are
simulated at 1%, 5% and 10% level of significance.
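The simulation of critical values described above can be sketched in Python as follows: draw many samples of size n from the standard Cauchy distribution, evaluate the test statistic on each, and take the upper empirical quantiles. The function names, the seed, the use of NumPy's default quantile interpolation, and the choice of T1 as the example statistic are assumptions made for illustration; the paper does not describe its software.

```python
import numpy as np

def dixon_t1(x):
    """T1 = (x_n - x_{n-1}) / (x_n - x_1) on the ordered sample."""
    x = np.sort(x)
    return (x[-1] - x[-2]) / (x[-1] - x[0])

def simulate_critical_values(statistic, n, levels=(0.10, 0.05, 0.01),
                             reps=10000, seed=0):
    """Empirical upper-tail critical values of `statistic` under Cauchy sampling."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_cauchy(size=(reps, n))          # reps null samples of size n
    values = np.apply_along_axis(statistic, 1, samples)    # one statistic per sample
    # The critical value at level alpha is the (1 - alpha) empirical quantile.
    return {alpha: float(np.quantile(values, 1.0 - alpha)) for alpha in levels}
```

A call such as `simulate_critical_values(dixon_t1, n=5)` returns a dictionary keyed by significance level. The statistics considered here are unchanged by shifting and rescaling the sample, so sampling from the standard Cauchy distribution suffices.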
To compare the performance of the above tests (T1-T18), we use
the power criteria. For this purpose, we consider contaminated models. For the tests for upper extreme
outliers (T1-T8 and T13-T18), we use the following contamination model:
(I) Model based on Shift in Location Parameter
Let d be the number of contaminants. Here we assume that n - d observations are from a
Cauchy distribution with location parameter $\mu$ and scale parameter $\sigma$, whereas the d
contaminant observations come from $C(\mu + a\sigma, \sigma)$, where “a” is the amount of shift
in the location parameter.
For the two-sided tests considered in the study, we use a shift in the scale parameter,
that is, the scale shift model given below:
(II) Model based on Shift in Scale Parameter
Let “b” be the amount of shift in the scale parameter. Here we assume that n - 1 observations
are from a Cauchy distribution with location parameter $\mu$ and scale parameter $\sigma$, whereas
1 contaminant observation comes from $C(\mu, b\sigma)$.
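The two contamination models can be sampled as in the Python sketch below. The function names, the default location 0 and scale 1, and the convention of placing the contaminants in the first positions of the array are assumptions made for illustration.

```python
import numpy as np

def location_shift_sample(n, d, a, mu=0.0, sigma=1.0, rng=None):
    """Model (I): n - d values from C(mu, sigma), d values from C(mu + a*sigma, sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    x = mu + sigma * rng.standard_cauchy(n)
    is_contaminant = np.zeros(n, dtype=bool)
    is_contaminant[:d] = True                      # first d positions hold the contaminants
    x[is_contaminant] += a * sigma                 # shift the location by a * sigma
    return x, is_contaminant

def scale_shift_sample(n, b, mu=0.0, sigma=1.0, rng=None):
    """Model (II): n - 1 values from C(mu, sigma), one value from C(mu, b*sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    scales = np.full(n, float(sigma))
    scales[0] = b * sigma                          # position 0 holds the single contaminant
    is_contaminant = np.zeros(n, dtype=bool)
    is_contaminant[0] = True
    return mu + scales * rng.standard_cauchy(n), is_contaminant
```

Returning the boolean mask alongside the sample makes it possible to decide afterwards whether an observation flagged by a test is a true contaminant or a false positive.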

Numerical (simulation) Results and Discussions


We simulate critical values for all the tests under consideration along with the proposed tests when
sampling is done from Cauchy distribution.

Critical Values and Powers for the Tests

Simulated critical values for various levels of significance and different values of sample size, n,
for all tests considered in this study have been obtained. The critical values are calculated
to three (3) decimal places and are presented in Table 1 to Table 8.
Table 1. Critical values for T1 and T2
n T1(10%) T1(5%) T1(1%) T2(10%) T2(5%) T2(1%)
5 0.787 0.878 0.957 0.894 0.943 0.981
6 0.766 0.865 0.953 0.862 0.923 0.973
7 0.754 0.857 0.950 0.844 0.913 0.970
8 0.746 0.852 0.948 0.833 0.906 0.967
9 0.741 0.850 0.946 0.826 0.902 0.966
10 0.737 0.847 0.946 0.821 0.899 0.965
12 0.733 0.844 0.945 0.815 0.895 0.963
14 0.730 0.842 0.944 0.811 0.893 0.962
15 0.729 0.840 0.943 0.810 0.891 0.960
16 0.728 0.839 0.943 0.809 0.890 0.960
18 0.727 0.837 0.941 0.807 0.888 0.958
20 0.725 0.834 0.940 0.805 0.885 0.957
25 0.724 0.831 0.937 0.804 0.882 0.955
30 0.723 0.827 0.935 0.803 0.875 0.952

Table 2. Critical values for T3, T4 and T5


n T3(10%) T3(5%) T3(1%) T4(10%) T4(5%) T4(1%) T5(10%) T5(5%) T5(1%)
5 0.967 0.984 0.995 1.748 1.775 1.786 3.841 4.303 4.746
6 0.917 0.955 0.985 1.983 2.021 2.038 4.335 4.971 5.616
7 0.889 0.939 0.979 2.193 2.242 2.264 4.901 5.686 6.503
8 0.873 0.929 0.976 2.385 2.444 2.470 5.395 6.343 7.361
9 0.863 0.924 0.974 2.563 2.632 2.661 5.932 7.039 8.232
10 0.856 0.919 0.972 2.730 2.807 2.840 6.428 7.688 9.088
12 0.848 0.915 0.970 3.083 3.129 3.168 7.438 9.008 10.806
14 0.842 0.911 0.969 3.317 3.421 3.466 8.422 10.313 12.513
15 0.841 0.910 0.968 3.449 3.558 3.606 8.928 10.952 13.365
16 0.840 0.909 0.967 3.612 3.716 3.732 9.389 11.235 14.143
18 0.837 0.908 0.966 3.814 3.941 3.997 10.355 12.854 15.903
20 0.835 0.907 0.965 4.040 4.177 4.238 11.294 14.117 17.586
25 0.833 0.906 0.963 4.558 4.718 4.787 13.631 17.257 21.770
30 0.832 0.905 0.961 5.021 5.201 5.281 15.899 20.288 25.929

Table 3. Critical values for T6, T7 and T8


n T6(10%) T6(5%) T6(1%) T7(10%) T7(5%) T7(1%) T8(10%) T8(5%) T8(1%)
5 19.766 40.781 208.391 17.358 36.100 182.932 8.082 16.233 79.707
6 18.069 36.734 182.669 17.834 36.296 182.201 8.416 16.807 82.739
7 18.801 37.988 189.885 22.495 45.849 233.176 8.845 17.659 88.096
8 21.421 43.020 213.118 23.919 49.004 246.367 9.304 18.583 93.305
9 29.804 60.616 303.442 28.077 57.407 284.905 9.764 19.617 96.998
10 30.457 62.085 310.258 30.110 61.442 307.197 10.266 20.545 100.643
12 35.341 71.662 356.783 36.192 73.692 367.518 11.161 22.268 110.252
14 42.772 87.741 437.870 42.262 86.361 433.444 12.031 24.037 118.544
15 44.949 90.999 447.154 45.731 92.690 458.003 12.459 24.653 120.691
16 48.783 100.153 498.259 49.012 99.705 491.724 12.876 25.974 125.341
18 54.888 111.899 548.006 54.245 110.851 543.295 13.583 27.025 131.384
20 60.433 123.689 620.915 60.237 123.003 626.282 14.253 28.359 142.031
25 76.899 157.317 780.136 75.710 154.389 765.664 15.974 31.964 156.459
30 91.093 186.724 910.735 90.308 184.917 903.286 17.443 34.842 169.187

Table 4. Critical values for T9 and T10


n T9(10%) T9(5%) T9(1%) T10(10%) T10(5%) T10(1%)
5 1.756 1.780 1.788 4.384 4.684 4.930
6 1.983 1.992 1.996 4.973 5.453 5.978
7 2.245 2.267 2.282 5.752 6.347 6.992
8 2.404 2.456 2.475 6.493 7.187 7.817
9 2.568 2.673 2.683 7.121 7.867 8.762
10 2.755 2.822 2.845 7.912 8.867 9.757
12 3.066 3.147 3.174 9.285 10.501 11.709
14 3.215 3.336 3.372 10.290 11.743 12.946
15 3.483 3.582 3.614 11.335 12.979 14.590
16 3.596 3.623 3.784 12.145 13.985 15.763
18 3.742 3.823 3.924 12.935 14.925 16.978
20 4.089 4.207 4.247 14.745 17.101 19.345
25 4.613 4.748 4.798 17.971 20.922 24.044
30 5.066 5.238 5.293 21.125 24.996 28.926

Table 5. Critical values for T11 and T12


n T11(10%) T11(5%) T11(1%) T12(10%) T12(5%) T12(1%)
5 0.893 0.947 0.988 38.608 83.616 414.590
6 0.885 0.941 0.987 39.425 83.986 428.923
7 0.878 0.938 0.986 40.348 84.569 445.677
8 0.870 0.933 0.985 41.206 85.101 462.302
9 0.868 0.931 0.984 52.135 102.457 546.451
10 0.866 0.930 0.983 60.037 122.633 624.925
12 0.862 0.928 0.981 68.340 146.753 824.723
14 0.861 0.925 0.978 78.923 187.567 902.652
15 0.860 0.924 0.975 87.258 185.332 1001.572
16 0.858 0.923 0.974 94.563 202.784 1103.342
18 0.855 0.922 0.972 104.578 222.376 1216.934
20 0.854 0.921 0.969 121.246 250.043 1428.174
25 0.852 0.916 0.963 154.635 315.443 1703.966
30 0.848 0.913 0.960 176.390 381.384 2075.488

Table 6. Critical values for T13 and T14


n d T13(10%) T13(5%) T13(1%) T14(10%) T14(5%) T14(1%)
5 2 1.960 2.040 2.124 4.398 4.658 4.879
6 2 2.250 2.351 2.470 4.998 5.404 5.780
7 2 2.509 2.629 2.776 5.727 6.230 6.711
3 2.683 2.790 2.943 6.019 6.417 6.783
8 2 2.746 2.881 3.056 6.332 6.978 7.613
3 2.987 3.105 3.280 6.677 7.199 7.698
9 2 2.966 3.116 3.314 7.003 7.761 8.522
3 3.263 3.393 3.589 7.439 8.041 8.631
10 2 3.174 3.336 3.555 7.608 8.497 9.416
3 3.519 3.661 3.877 8.093 8.815 9.540
4 3.696 3.844 4.054 8.322 8.963 9.597
12 2 3.554 3.742 3.995 8.837 9.989 11.210
3 3.984 4.153 4.401 9.439 10.388 11.368
4 4.232 4.413 4.662 9.764 10.601 11.452
14 2 3.900 4.109 4.395 10.040 11.448 12.991
3 4.404 4.597 4.878 10.748 11.925 13.182
4 4.709 4.920 5.210 11.156 12.191 13.287
15 2 4.063 4.281 4.584 10.649 12.186 13.879
3 4.599 4.803 5.097 11.403 12.700 14.087
4 4.928 5.154 5.466 11.849 12.992 14.205
18 2 4.516 4.760 5.102 12.372 14.329 16.532
3 5.138 5.378 5.715 13.277 14.940 16.779
4 5.537 5.804 6.172 13.823 15.302 16.923
20 2 4.796 5.058 5.424 13.522 15.749 18.296
3 5.471 5.731 6.095 14.516 16.444 18.571
4 5.913 6.205 6.604 15.124 16.845 18.732
25 2 5.434 5.734 6.156 16.346 19.273 22.658
3 6.226 6.533 6.952 17.562 20.116 23.006
4 6.754 7.102 7.579 18.320 20.626 23.215
5 7.163 7.537 8.050 18.843 20.976 23.356
30 2 6.003 6.335 6.808 19.080 22.717 26.995
3 6.895 7.245 7.718 20.501 23.721 27.410
4 7.503 7.898 8.444 21.382 24.319 27.655
5 7.977 8.408 9.000 22.011 24.738 27.826

Table 7. Critical values for T15 and T16


n d T15(10%) T15(5%) T15(1%) T16(10%) T16(5%) T16(1%)
5 2 0.894 0.942 0.981 0.979 0.990 0.997
6 2 0.861 0.923 0.973 0.942 0.969 0.989
7 2 0.844 0.912 0.970 0.919 0.956 0.985
3 0.889 0.939 0.979 0.957 0.977 0.992
8 2 0.832 0.906 0.967 0.906 0.948 0.982
3 0.872 0.929 0.976 0.940 0.967 0.989
9 2 0.825 0.902 0.966 0.897 0.943 0.981
3 0.862 0.923 0.974 0.928 0.961 0.987
10 2 0.821 0.899 0.965 0.892 0.940 0.979
3 0.855 0.919 0.972 0.921 0.957 0.985
4 0.878 0.933 0.977 0.940 0.967 0.989
12 2 0.814 0.895 0.963 0.884 0.936 0.978
3 0.847 0.914 0.9708 0.912 0.951 0.983
4 0.867 0.926 0.974 0.928 0.961 0.986
14 2 0.810 0.892 0.962 0.879 0.933 0.977
3 0.842 0.911 0.969 0.906 0.948 0.982
4 0.860 0.922 0.973 0.921 0.957 0.985
15 2 0.809 0.891 0.962 0.878 0.932 0.977
3 0.840 0.910 0.969 0.904 0.947 0.981
4 0.858 0.921 0.973 0.919 0.956 0.985
18 2 0.806 0.890 0.962 0.875 0.930 0.976
3 0.837 0.908 0.968 0.900 0.945 0.981
4 0.853 0.918 0.972 0.914 0.953 0.984
20 2 0.805 0.889 0.962 0.873 0.929 0.976
3 0.835 0.907 0.968 0.898 0.944 0.981
4 0.851 0.917 0.972 0.972 0.952 0.984
25 2 0.804 0.889 0.961 0.872 0.929 0.976
3 0.833 0.906 0.968 0.896 0.943 0.980
4 0.849 0.916 0.971 0.910 0.950 0.983
5 0.859 0.921 0.973 0.918 0.955 0.985
30 2 0.802 0.888 0.961 0.870 0.928 0.975
3 0.831 0.905 0.967 0.895 0.942 0.980
4 0.847 0.915 0.971 0.908 0.949 0.983
5 0.856 0.920 0.973 0.916 0.954 0.984

Table 8. Critical values for T17 and T18


n d T17(10%) T17(5%) T17(1%) T18(10%) T18(5%) T18(1%)
5 2 1 1 1 21.220 42.995 203.335
6 2 0.985 0.993 0.998 18.472 36.290 167.711
7 2 0.957 0.977 0.992 18.149 35.347 164.817
3 0.990 0.995 0.998 25.435 49.489 223.902
8 2 0.940 0.967 0.989 18.247 35.490 165.278
3 0.971 0.984 0.995 24.299 46.775 211.213
9 2 0.928 0.961 0.987 18.721 36.092 166.158
3 0.957 0.977 0.992 24.344 46.203 204.560
10 2 0.921 0.957 0.985 19.239 37.023 170.897
3 0.948 0.972 0.990 24.690 46.615 208.139
4 0.965 0.981 0.993 29.012 54.671 240.222
12 2 0.912 0.951 0.983 20.339 38.892 178.321
3 0.937 0.965 0.988 25.716 48.229 212.617
4 0.952 0.974 0.991 29.652 55.100 237.277
14 2 0.906 0.948 0.982 21.479 41.172 188.708
3 0.931 0.962 0.987 26.988 50.512 223.503
4 0.944 0.970 0.990 30.843 57.026 247.585
15 2 0.904 0.947 0.982 22.094 41.968 190.595
3 0.928 0.961 0.987 27.620 51.545 225.588
4 0.942 0.968 0.989 31.480 58.020 249.466
18 2 0.901 0.945 0.981 23.733 45.186 204.994
3 0.924 0.958 0.986 29.522 54.991 239.356
4 0.937 0.965 0.988 33.466 61.519 261.530
20 2 0.899 0.944 0.981 24.754 47.203 221.246
3 0.922 0.957 0.985 30.752 57.513 259.732
4 0.935 0.964 0.988 34.792 64.109 282.850
25 2 0.896 0.943 0.981 27.283 52.237 237.597
3 0.919 0.956 0.985 33.802 62.979 278.831
4 0.932 0.963 0.987 38.219 70.047 304.136
5 0.939 0.967 0.989 41.361 74.941 320.881
30 2 0.895 0.942 0.980 29.593 56.555 255.904
3 0.918 0.955 0.985 36.506 68.136 296.786
4 0.930 0.962 0.987 41.116 75.637 321.566
5 0.937 0.966 0.988 44.388 80.653 338.286
Powers for the Tests
Tables 9-13 present powers computed from 10000 simulated random samples drawn from the
Cauchy distribution. For the location shift model, various values of n and d have been considered for
comparison, i.e., n = 5, 8, 10, 15, 20, 25, 30 and d = 1(1)5. The amounts of shift in the location
parameter considered are a = 10 and 15. All comparisons have been made at the commonly used 5%
level of significance. For the scale shift model, in the case of tests T9-T12, we have considered the
same values of n but d = 1. The amounts of shift in the scale parameter considered are b = 10 and
15.
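To make the power computation concrete, the Python sketch below estimates the single-outlier criteria for the Dixon test T1 under the location shift model with d = 1. It rests on an assumption that the text does not spell out: when T1 exceeds the critical value, the largest observation is declared an outlier, counted in Z1 if it is the contaminant and in Z2 otherwise. The function name and seed are hypothetical, the baseline distribution is taken as the standard Cauchy, and results are returned as proportions, not percentages.

```python
import numpy as np

def estimate_power_t1(n, a, critical_value, reps=10000, seed=0):
    """Estimate P(Z1=1, Z2=0), P(Z1=0, Z2=1) and E(Z1)/(E(Z1)+E(Z2)) for T1, d = 1."""
    rng = np.random.default_rng(seed)
    correct = 0
    false_positive = 0
    for _ in range(reps):
        x = rng.standard_cauchy(n)
        x[0] += a                                  # index 0 is the single contaminant
        order = np.argsort(x)
        xs = x[order]
        t1 = (xs[-1] - xs[-2]) / (xs[-1] - xs[0])
        if t1 > critical_value:                    # largest observation is flagged
            if order[-1] == 0:
                correct += 1                       # the flagged value is the contaminant
            else:
                false_positive += 1                # a regular observation was flagged
    p_correct = correct / reps
    p_false = false_positive / reps
    flagged = p_correct + p_false
    ratio = p_correct / flagged if flagged > 0 else float("nan")
    return p_correct, p_false, ratio
```

For instance, `estimate_power_t1(5, 10, 0.878)` uses the 5% critical value of T1 for n = 5 from Table 1.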
The results presented in Table 9 and Table 10 show that T5 performs very well compared with the
remaining tests for both a = 10 and 15 when n = 5, 8, 10, 12 and 15, as it correctly detects more
outliers than the other tests. At the same time, however, it declares more false positives
than the other tests. The other competitors of T5 are T1, T4 and T8, which perform similarly in
terms of correctly identifying the outliers. In contrast to T5, these tests declare fewer false positives
when n = 5, 8, 10, 12, 15 and 20 for both a = 10 and 15. For n ≥ 20, T1 and T8 perform better
than T5 and the remaining tests, as these two correctly identify outliers more often than T5 and the
remaining tests for a = 10. For larger sample sizes, T1 performs better than the other tests
for a = 10. For a = 15, T8 performs better than the remaining tests. The ratio EZ1/(EZ1+EZ2) is the
greatest for T1 compared with all other tests for a = 10. T6 and T7 are almost the poorest tests based
on the power criteria for both a = 10 and 15, irrespective of the sample size.
Table 9: Powers for tests T1-T8 when a = 10
n d Power T1 T2 T3 T4 T5 T6 T7 T8
5 1 P(Z1=0, Z2=1) 0.38 0.23 0.12 0.36 0.39 0.23 1.37 0.37
P(Z1=1, Z2=0) 51.87 47.14 26.47 52.26 53.28 47.46 49.03 52.19
EZ1/(EZ1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.99 0.97 0.99
8 1 P(Z1=0, Z2=1) 0.68 0.58 0.44 0.69 0.65 0.81 2.55 0.69
P(Z1=1, Z2=0) 38.08 37.71 36.67 38.51 39.15 37.60 37.73 38.50
EZ1/(EZ1+EZ2) 0.98 0.98 0.99 0.98 0.98 0.98 0.94 0.98


10 1 P(Z1=0, Z2=1) 0.69 0.56 0.44 0.72 1.07 2.55 3.09 0.73
P(Z1=1, Z2=0) 28.66 27.53 26.42 28.69 28.73 26.52 25.48 28.67

EZ1/(E Z1+EZ2) 0.98 0.98 0.98 0.98 0.96 0.91 0.89 0.98
12 1 P(Z1=0, Z2=1) 0.89 0.73 0.59 0.97 1.45 3.03 3.52 0.98
P(Z1=1, Z2=0) 20.87 20.84 19.93 21.27 22.55 16.56 15.44 21.29

EZ1/(E Z1+EZ2) 0.96 0.97 0.97 0.96 0.94 0.85 0.81 0.96
15 1 P(Z1=0, Z2=1) 0.95 0.78 0.69 1.00 1.57 3.54 3.71 1.00
P(Z1=1, Z2=0) 11.19 10.04 10.02 12.53 12.56 5.84 6.05 11.80

EZ1/(E Z1+EZ2) 0.93 0.93 0.94 0.92 0.89 0.62 0.62 0.92
20 1 P(Z1=0, Z2=1) 1.21 0.96 0.81 1.36 1.65 4.15 4.34 1.37
P(Z1=1, Z2=0) 5.86 4.66 4.36 4.78 4.71 1.01 1.03 4.78

EZ1/(E Z1+EZ2) 0.83 0.83 0.84 0.78 0.74 0.19 0.19 0.78
25 1 P(Z1=0, Z2=1) 1.41 1.05 0.90 1.59 1.77 4.23 4.43 1.60
P(Z1=1, Z2=0) 3.31 1.95 1.43 2.10 1.63 0.32 0.34 2.20

EZ1/(E Z1+EZ2) 0.70 0.65 0.61 0.57 0.48 0.07 0.07 0.58
30 1 P(Z1=0, Z2=1) 1.63 1.69 1.11 1.69 1.78 4.78 4.69 1.68
P(Z1=1, Z2=0) 1.35 0.80 0.66 0.86 0.64 0.22 0.22 0.89
EZ1/(E Z1+EZ2) 0.45 0.38 0.37 0.34 0.26 0.04 0.04 0.34

Table 10: Powers for tests T1- T8 when a = 15


n d Power T1 T2 T3 T4 T5 T6 T7 T8
5 1 P(Z1=0, Z2=1) 0.09 0.13 0.06 0.08 0.02 0.13 1.04 0.08
P(Z1=1, Z2=0) 74.66 71.40 44.91 74.53 76.45 71.84 70.72 74.77

EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.99 0.98 0.99
8 1 P(Z1=0, Z2=1) 0.42 0.30 0.23 0.42 0.06 0.58 2.02 0.43
P(Z1=1, Z2=0) 72.61 68.39 66.3 72.92 72.89 72.01 71.57 72.58

EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.99 0.97 0.99
10 1 P(Z1=0, Z2=1) 0.39 0.30 0.24 0.40 0.69 2.03 2.39 0.45
P(Z1=1, Z2=0) 65.76 60.06 59.49 67.51 67.57 67.44 62.07 67.23

EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.98 0.97 0.96 0.99
12 1 P(Z1=0, Z2=1) 0.47 0.38 0.36 0.51 0.88 2.50 2.98 0.51
P(Z1=1, Z2=0) 49.24 48.14 45.63 51.34 56.47 49.85 46.17 51.46

EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.98 0.95 0.94 0.99
15 1 P(Z1=0, Z2=1) 0.59 0.47 0.40 0.62 0.90 3.12 3.28 0.63
P(Z1=1, Z2=0) 43.97 41.05 40.10 42.88 44.48 40.84 40.74 43.86

EZ1/(E Z1+EZ2) 0.98 0.98 0.99 0.98 0.99 0.93 0.92 0.99
20 1 P(Z1=0, Z2=1) 0.73 0.58 0.49 0.77 1.28 3.92 4.15 0.71
P(Z1=1, Z2=0) 27.64 26.58 26.58 29.15 27.6 21.88 22.44 29.06

EZ1/(E Z1+EZ2) 0.97 0.97 0.98 0.97 0.95 0.84 0.84 0.98
25 1 P(Z1=0, Z2=1) 0.79 0.58 0.50 0.89 1.26 3.97 4.1 0.83
P(Z1=1, Z2=0) 20.25 18.63 17.31 20.87 19.52 6.59 5.76 21.15

EZ1/(E Z1+EZ2) 0.96 0.96 0.97 0.95 0.94 0.62 0.58 0.96
30 1 P(Z1=0, Z2=1) 0.97 0.75 0.6 1.08 1.95 1.93 4.68 0.89
P(Z1=1, Z2=0) 12.89 11.82 11.6 12.90 12.87 4.76 1.68 13.73

EZ1/(E Z1+EZ2) 0.93 0.94 0.95 0.92 0.86 0.71 0.26 0.94

Table 11: Powers for tests T9-T12 when b = 10 and b = 15
n d Power T9(b=10) T10(b=10) T11(b=10) T12(b=10) T9(b=15) T10(b=15) T11(b=15) T12(b=15)
5 1 P(Z1=0, Z2=1) 0.66 0.69 0.48 0.43 0.85 0.82 0.82 0.83
P(Z1=1, Z2=0) 62.20 65.22 61.82 59.46 76.52 76.55 76.66 75.84

EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99
8 1 P(Z1=0, Z2=1) 0.69 0.75 0.67 0.62 1.42 0.86 1.42 0.88
P(Z1=1, Z2=0) 66.51 67.15 65.18 63.61 69.42 73.89 69.91 73.83

EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.98 0.99 0.98 0.99
10 1 P(Z1=0, Z2=1) 0.82 0.83 0.79 0.75 0.88 0.99 1.39 1.73
P(Z1=1, Z2=0) 68.27 69.73 67.62 62.52 74.51 74.57 74.43 74.42

EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.98 0.99 0.99 0.98
12 1 P(Z1=0, Z2=1) 0.97 0.99 0.97 0.93 1.51 1.88 1.47 2.50
P(Z1=1, Z2=0) 73.27 74.05 72.47 70.53 75.34 75.47 75.24 76.85
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.98 0.98 0.98 0.97
15 1 P(Z1=0, Z2=1) 1.20 1.57 1.15 1.14 2.62 2.09 1.59 2.83
P(Z1=1, Z2=0) 73.00 73.01 72.19 68.84 76.37 76.48 76.27 78.84

EZ1/(E Z1+EZ2) 0.98 0.98 0.98 0.98 0.97 0.97 0.98 0.98
20 1 P(Z1=0, Z2=1) 1.66 1.87 1.53 1.25 1.77 2.28 1.73 3.92
P(Z1=1, Z2=0) 69.80 70.11 68.36 64.02 72.15 72.60 72.64 72.86

EZ1/(E Z1+EZ2) 0.98 0.97 0.98 0.98 0.97 0.97 0.98 0.95
25 1 P(Z1=0, Z2=1) 2.49 2.60 2.11 1.83 3.69 3.16 1.99 3.97
P(Z1=1, Z2=0) 66.12 66.53 64.33 63.34 65.81 62.52 69.25 69.52

EZ1/(E Z1+EZ2) 0.96 0.96 0.97 0.97 0.95 0.95 0.97 0.95
30 1 P(Z1=0, Z2=1) 3.37 3.79 3.09 2.78 4.09 4.99 2.92 6.73
P(Z1=1, Z2=0) 61.66 62.23 60.34 58.26 22.92 60.82 63.87 57.83
EZ1/(E Z1+EZ2) 0.95 0.94 0.95 0.95 0.85 0.92 0.96 0.90

As far as the two-sided tests (T9-T12) are concerned, the results presented in Table 11 reveal that
T10 performs better than the other tests for b = 10 and n = 5, 8, 10, 12, 15, 20, 25 and 30, since it
correctly identifies outliers more often than the other three tests. However, it also declares more false
positives than the other three tests for b = 10 and for all values of n. The performance of
T12, for b = 10 and for all values of n, is very poor, as it correctly detects fewer outliers than
the other three tests. Unlike T10, however, T12 declares the smallest number of false positives among
the tests for almost all values of n and b = 10. The ratio EZ1/(EZ1+EZ2) remains almost
the same for all four tests when b = 10.

For n = 5, 8 and 10, and b = 15, T10 performs better than the remaining tests, as it correctly identifies
more outliers than the other three tests and also declares fewer false positives. But as the
sample size increases (n ≥ 12), the performance of T11 seems more promising, as it correctly
identifies more outliers and, at the same time, declares fewer false positives. It also gives a greater
ratio EZ1/(EZ1+EZ2) than the other tests. Hence the test T11 seems the best for b = 15 and
large sample sizes.

Table 12 and Table 13 present simulated powers for six (6) tests, of which T18 is our proposed test.
The results in these two tables show that test T13 performs better than all other block tests, as it not
only correctly identifies outliers more often than the other tests but also declares the smallest number
of false positives among all competitors for all n and both a = 10 and 15. A close competitor of test
T13 is T18 in all cases. Test T13 also gives the highest EZ1 and the highest ratio EZ1/(EZ1+EZ2)
compared with all other tests.
In order of performance, there are two situations (for both a = 10 and 15):
1. T16 and T17 are the poorest among all block tests. Test T13 is the best, T14 is the second best
and T18 is the third best for d = 2 and 3 and for all n. T15 is better only than the other two
Dixon tests.
2. When d = 4 and 5, however, the proposed test T18 performs better than T14 for all n.
Consequently, T13 is the best, T18 is the second best and T14 is the third best. T15 ranks
fourth.
Table 12: Powers for tests T13-T18 when a = 10
n d Power T13 T14 T15 T16 T17 T18
5 2 P(Z1=0, Z2=2) 0.01 0.03 0.04 0.01 0.00
P(Z1=1, Z2=1) 0.05 0.14 0.14 0.28 0.16
P(Z1=2, Z2=0) 88.33 53.83 53.02 17.71 55.11

E(Z1) 176.71 107.80 106.18 35.70 110.38


E(Z2) 0.070 0.20 0.220 0.30 0.16
EZ1/(E Z1+EZ2) 0.99 0.99 0.9979 0.9916 0.99
8 2 P(Z1=0, Z2=2) 0.01 0.01 0.02 0.01 0.01 0.00
P(Z1=1, Z2=1) 0.07 0.72 0.33 0.40 0.68 0.55
P(Z1=2, Z2=0) 85.84 51.73 22.44 21.14 19.18 49.54

E(Z1) 171.75 104.18 45.21 42.68 38.36 99.63


E(Z2) 0.09 0.74 0.37 0.42 0.70 0.55
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.98 0.99
10 2 P(Z1=0, Z2=2) 0.01 0.02 0.03 0.01 0.01 0.02
P(Z1=1, Z2=1) 0.18 0.81 0.45 0.55 0.58 0.58
P(Z1=2, Z2=0) 82.72 43.12 32.46 14.42 12.36 39.93

E(Z1) 165.62 87.05 65.37 29.39 25.30 80.44


E(Z2) 0.20 0.85 0.51 0.57 0.60 0.62
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.98 0.97 0.99
12 2 P(Z1=0, Z2=2) 0.01 0.03 0.05 0.02 0.02 0.04
P(Z1=1, Z2=1) 0.30 1.02 0.44 0.58 0.84 0.66
P(Z1=2, Z2=0) 80.27 35.91 26.89 7.32 7.08 32.86

E(Z1) 160.84 72.84 54.22 15.22 15.00 66.38


E(Z2) 0.32 1.10 0.54 0.62 0.88 0.74
EZ1/(E Z1+EZ2) 0.99 0.98 0.99 0.96 0.94 0.99
15 2 P(Z1=0, Z2=2) 0.04 0.09 0.06 0.07 0.03 0.07
P(Z1=1, Z2=1) 0.49 1.21 0.58 0.59 0.87 0.75
P(Z1=2, Z2=0) 75.12 24.58 16.84 2.54 3.28 22.54

E(Z1) 144.73 50.37 34.26 5.67 7.43 45.83


E(Z2) 0.59 1.39 0.70 0.73 0.93 0.89
EZ1/(E Z1+EZ2) 0.99 0.97 0.97 0.88 0.89 0.98
20 2 P(Z1=0, Z2=2) 0.10 0.09 0.10 0.06 0.02 0.06
P(Z1=1, Z2=1) 0.74 1.39 0.65 0.65 0.94 0.88
P(Z1=2, Z2=0) 69.00 9.31 7.95 0.66 0.68 11.54

E(Z1) 138.74 21.01 16.55 1.97 2.30 23.96


E(Z2) 0.94 1.57 0.85 0.77 0.98 1.00
EZ1/(E Z1+EZ2) 0.99 0.93 0.95 0.7189 0.70 0.96
25 2 P(Z1=0, Z2=2) 0.15 0.16 0.12 0.12 0.07 0.11
P(Z1=1, Z2=1) 0.68 1.73 0.75 0.74 0.95 1.09
P(Z1=2, Z2=0) 62.48 3.71 1.41 0.38 0.44 5.70

E(Z1) 125.64 9.15 3.57 1.5 1.83 12.49


E(Z2) 0.98 2.05 0.99 0.98 1.09 1.31
EZ1/(E Z1+EZ2) 0.99 0.82 0.78 0.60 0.63 0.90
30 2 P(Z1=0, Z2=2) 0.27 0.22 0.13 0.20 0.09 0.13
P(Z1=1, Z2=1) 0.81 2.02 0.88 0.76 0.97 1.21
P(Z1=2, Z2=0) 54.90 1.14 0.83 0.20 0.28 2.39

E(Z1) 110.61 4.3 2.54 1.16 1.53 5.99


E(Z2) 1.35 2.28 1.14 1.16 1.15 1.47
EZ1/(E Z1+EZ2) 0.99 0.65 0.69 0.50 0.57 0.80
10 3 P(Z1=0, Z2=3) 0.0 0.01 0.01 0.01 0.01 0.00
P(Z1=1, Z2=2) 0.05 0.11 0.13 0.11 0.10 0.01
P(Z1=2, Z2=1) 0.64 0.92 0.43 0.51 0.51 0.59
P(Z1=3, Z2=0) 87.31 52.87 17.28 12.14 7.88 48.35

E(Z1) 263.82 159.6 52.83 37.55 24.76 146.24


E(Z2) 0.74 1.17 0.72 0.76 0.74 0.61
EZ1/(E Z1+EZ2) 0.99 0.99 0.98 0.98 0.97 1.00
12 3 P(Z1=0, Z2=3) 0.01 0.01 0.12 0.01 0.01 0.01
P(Z1=1, Z2=2) 0.06 0.13 0.14 0.11 0.12 0.02
P(Z1=2, Z2=1) 1.97 1.12 0.46 0.43 0.36 0.77
P(Z1=3, Z2=0) 85.89 47.22 9.89 5.17 4.13 43.42

E(Z1) 261.69 144.03 30.73 16.98 13.23 131.82


E(Z2) 2.12 1.41 1.10 0.68 0.63 0.84
EZ1/(E Z1+EZ2) 0.99 0.99 0.96 0.96 0.95 0.99
15 3 P(Z1=0, Z2=3) 0.04 0.05 0.05 0.05 0.04 0.01
P(Z1=1, Z2=2) 0.12 0.16 0.15 0.12 0.11 0.04
P(Z1=2, Z2=1) 3.33 1.28 0.59 0.50 0.40 0.75
P(Z1=3, Z2=0) 82.60 37.39 4.44 2.24 1.58 33.99

E(Z1) 254.58 114.89 14.65 7.84 5.65 103.51


E(Z2) 3.65 1.75 1.04 0.89 0.74 0.86
EZ1/(E Z1+EZ2) 0.98 0.98 0.93 0.89 0.88 0.99
20 3 P(Z1=0, Z2=3) 0.05 0.07 0.09 0.07 0.07 0.01
P(Z1=1, Z2=2) 0.20 0.22 0.21 0.14 0.12 0.08
P(Z1=2, Z2=1) 5.35 1.34 0.61 0.44 0.40 0.75
P(Z1=3, Z2=0) 77.31 22.75 1.29 0.48 0.46 20.50

E(Z1) 242.83 71.15 5.3 2.46 2.3 63.08


E(Z2) 5.90 1.24 1.3 0.93 0.85 0.91
EZ1/(E Z1+EZ2) 0.97 0.98 0.80 0.72 0.73 0.98
25 3 P(Z1=0, Z2=3) 0.11 0.10 0.11 0.11 0.10 0.00
P(Z1=1, Z2=2) 0.25 0.26 0.22 0.17 0.14 0.11
P(Z1=2, Z2=1) 6.49 1.63 0.76 0.53 0.46 0.92
P(Z1=3, Z2=0) 72.45 13.88 0.60 0.37 0.35 12.88

E(Z1) 230.58 46.16 3.54 2.34 2.11 40.59


E(Z2) 7.32 2.45 1.54 1.2 1.04 1.14
EZ1/(E Z1+EZ2) 0.96 0.94 0.69 0.66 0.66 0.97
30 3 P(Z1=0, Z2=3) 0.12 0.11 0.12 0.12 0.11 0.00
P(Z1=1, Z2=2) 0.39 0.33 0.23 0.19 0.16 0.13
P(Z1=2, Z2=1) 7.90 1.99 0.89 0.65 0.57 1.11
P(Z1=3, Z2=0) 67.17 6.17 0.34 0.31 0.29 6.18

E(Z1) 217.7 22.82 3.03 2.42 2.17 22.89


E(Z2) 9.04 2.98 1.71 1.39 1.22 1.37
EZ1/(E Z1+EZ2) 0.96 0.87 0.64 0.63 0.64 0.94
12 4 P(Z1=0, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=2) 0.18 0.23 0.24 0.23 0.21 0.24
P(Z1=3, Z2=1) 1.65 1.23 0.47 0.58 0.30 1.28
P(Z1=4, Z2=0) 86.89 50.92 13.49 10.25 3.32 52.41

E(Z1) 352.87 207.83 55.85 43.2 14.6 213.96


E(Z2) 2.01 1.69 0.95 0.74 0.72 1.76
EZ1/(E Z1+EZ2) 0.99 0.99 0.97 0.98 0.95 0.99
15 4 P(Z1=0, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=2) 0.26 0.27 0.26 0.23 0.22 0.29
P(Z1=3, Z2=1) 6.37 1.41 0.55 0.54 0.40 1.83
P(Z1=4, Z2=0) 85.72 48.73 6.60 3.61 1.30 49.94

E(Z1) 362.91 199.69 28.57 16.52 6.84 205.83


E(Z2) 6.89 1.95 1.07 1.00 0.84 2.41
EZ1/(E Z1+EZ2) 0.98 0.99 0.96 0.94 0.89 0.98
20 4 P(Z1=0, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=3) 0.01 0.02 0.01 0.02 0.01 0.00
P(Z1=2, Z2=2) 0.33 0.32 0.29 0.33 0.23 0.37
P(Z1=3, Z2=1) 10.68 1.54 0.68 0.64 0.49 1.84
P(Z1=4, Z2=0) 83.30 35.30 2.01 0.81 0.58 39.18

E(Z1) 365.91 146.48 10.67 5.84 4.26 162.98


E(Z2) 11.37 2.24 1.29 1.36 0.98 2.58
EZ1/(E Z1+EZ2) 0.97 0.98 0.89 0.81 0.81 0.98
25 4 P(Z1=0, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=3) 0.09 0.09 0.09 0.08 0.03 0.01
P(Z1=2, Z2=2) 0.65 0.36 0.32 0.27 0.24 0.42
P(Z1=3, Z2=1) 13.89 1.70 0.71 0.55 0.38 1.91
P(Z1=4, Z2=0) 80.57 21.22 0.83 0.54 0.39 22.99

E(Z1) 284.77 90.79 6.18 4.43 3.21 98.54


E(Z2) 15.46 2.69 1.62 1.33 0.95 2.78
EZ1/(E Z1+EZ2) 0.95 0.97 0.79 0.77 0.77 0.97
30 4 P(Z1=0, Z2=4) 0.03 0.02 0.01 0.01 0.00 0.00
P(Z1=1, Z2=3) 0.15 0.12 0.11 0.11 0.08 0.01
P(Z1=2, Z2=2) 1.19 0.44 0.34 0.30 0.26 0.44
P(Z1=3, Z2=1) 17.06 2.11 0.86 0.65 0.49 2.14
P(Z1=4, Z2=0) 77.16 12.95 0.51 0.43 0.28 13.76

E(Z1) 362.35 59.13 5.41 4.38 3.03 61.71


E(Z2) 20.01 3.43 1.91 1.62 1.25 3.05
EZ1/(E Z1+EZ2) 0.95 0.95 0.74 0.73 0.71 0.95
25 5 P(Z1=0, Z2=5) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=4) 0.02 0.03 0.04 0.05 0.01 0.00
P(Z1=2, Z2=3) 0.10 0.09 0.11 0.10 0.03 0.01
P(Z1=3, Z2=2) 0.60 0.23 0.20 0.17 0.12 0.28
P(Z1=4, Z2=1) 14.11 2.01 0.72 0.56 0.37 2.09
P(Z1=5, Z2=0) 81.13 31.55 1.18 0.75 0.48 37.23

E(Z1) 464.11 166.69 9.64 6.75 4.31 195.37


E(Z2) 15.69 2.86 2.9 1.4 0.74 2.68
EZ1/(E Z1+EZ2) 0.97 0.98 0.77 0.83 0.85 0.98
30 5 P(Z1=0, Z2=5) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=4) 0.03 0.04 0.06 0.07 0.05 0.04
P(Z1=2, Z2=3) 0.14 0.10 0.11 0.11 0.10 0.11
P(Z1=3, Z2=2) 1.12 0.33 0.21 0.18 0.15 0.41
P(Z1=4, Z2=1) 17.40 2.39 0.88 0.67 0.49 2.38
P(Z1=5, Z2=0) 77.74 21.50 0.63 0.52 0.30 22.52

E(Z1) 461.97 118.29 7.58 6.11 4.11 123.61


E(Z2) 20.18 3.51 1.87 1.64 1.64 3.69
EZ1/(E Z1+EZ2) 0.96 0.97 0.80 0.79 0.76 0.97

Table 13: Powers for tests T13 - T18 when a = 15


n d Power T13 T14 T15 T16 T17 T18
5 2 P(Z1=0, Z2=2) 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=1) 0.03 0.09 0.11 0.27 0.10
P(Z1=2, Z2=0) 93.99 74.78 62.23 32.12 75.78

E(Z1) 188.01 149.65 124.54 64.51 151.66


E(Z2) 0.03 0.09 0.11 0.27 0.10
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99
8 2 P(Z1=0, Z2=2) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=1) 0.07 0.49 0.29 0.59 0.63 0.29
P(Z1=2, Z2=0) 95.92 77.44 63.03 51.41 45.49 72.72

E(Z1) 191.91 155.37 126.35 102.87 91.61 145.73


E(Z2) 0.07 0.49 0.29 0.59 0.63 0.29
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.99
10 2 P(Z1=0, Z2=2) 0.01 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=1) 0.10 0.52 0.30 0.63 0.49 0.27
P(Z1=2, Z2=0) 91.59 73.94 44.44 44.29 40.00 67.01

E(Z1) 183.28 148.4 89.15 89.01 80.49 134.29


E(Z2) 0.12 0.52 0.27 0.43 0.49 0.27
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.98
12 2 P(Z1=0, Z2=2) 0.01 0.01 0.00 0.00 0.00 0.00
P(Z1=1, Z2=1) 0.15 0.65 0.35 0.64 0.47 0.35
P(Z1=2, Z2=0) 90.11 69.74 37.13 34.85 33.55 61.80

E(Z1) 180.37 140.13 74.61 70.15 67.57 123.95


E(Z2) 0.17 0.67 0.35 0.45 0.47 0.35
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.98
15 2 P(Z1=0, Z2=2) 0.02 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=1) 0.15 0.84 0.45 0.70 0.43 0.48
P(Z1=2, Z2=0) 87.75 62.81 27.36 23.18 23.23 53.59

E(Z1) 175.65 126.46 55.17 46.76 46.89 107.66


E(Z2) 0.19 0.84 0.45 0.40 0.43 0.48
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.98
20 2 P(Z1=0, Z2=2) 0.03 0.01 0.00 0.00 0.00 0.00
P(Z1=1, Z2=1) 0.31 1.01 0.55 0.86 0.31 0.58
P(Z1=2, Z2=0) 84.49 47.40 14.87 10.70 9.43 41.47

E(Z1) 169.29 95.81 30.29 21.76 19.17 83.52


E(Z2) 0.37 1.03 0.55 0.36 0.31 0.58
EZ1/(E Z1+EZ2) 0.99 0.98 0.98 0.98 0.98 0.98
25 2 P(Z1=0, Z2=2) 0.05 0.06 0.03 0.02 0.01 0.02
P(Z1=1, Z2=1) 0.33 1.30 0.58 0.90 0.31 0.64
P(Z1=2, Z2=0) 80.79 36.11 9.81 5.48 4.54 32.30

E(Z1) 161.91 73.52 20.2 11.36 9.39 65.24


E(Z2) 0.43 1.42 0.64 0.44 0.33 0.68
EZ1/(E Z1+EZ2) 0.99 0.98 0.96 0.96 0.98 0.98
30 2 P(Z1=0, Z2=2) 0.06 0.06 0.04 0.02 0.01 0.02
P(Z1=1, Z2=1) 0.43 1.45 0.65 0.99 0.38 0.70
P(Z1=2, Z2=0) 76.11 24.29 5.42 2.62 1.81 23.15

E(Z1) 152.65 50.03 .49 5.73 4.01 47.00


E(Z2) 0.55 1.57 0.73 0.53 0.40 0.74
EZ1/(E Z1+EZ2) 0.99 0.97 0.94 0.92 0.91 0.98
10 3 P(Z1=0, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=2) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=1) 0.17 0.49 0.20 0.52 0.52 0.26
P(Z1=3, Z2=0) 93.68 76.21 40.39 35.00 25.80 72.28

E(Z1) 288.17 224.44 81.66 70.76 52.25 217.36


E(Z2) 1.35 2.46 1.14 1.16 0.83 0.26
EZ1/(E Z1+EZ2) 0.99 0.98 0.98 0.98 0.98 0.99
12 3 P(Z1=0, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=2) 0.01 0.00 0.00 0.00 0.01 0.00
P(Z1=2, Z2=1) 1.02 0.71 0.34 0.43 0.47 0.42
P(Z1=3, Z2=0) 92.87 75.56 32.10 26.64 22.81 69.12

E(Z1) 280.66 231.1 96.98 80.78 69.38 208.20


E(Z2) 1.04 0.71 0.34 0.43 0.49 0.42
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.99
15 3 P(Z1=0, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=2) 0.11 0.10 0.10 0.09 0.04 0.00
P(Z1=2, Z2=1) 1.81 0.89 0.35 0.33 0.38 0.52
P(Z1=3, Z2=0) 91.45 71.42 22.28 16.58 13.46 62.48

E(Z1) 278.08 216.14 67.64 50.49 41.18 188.48


E(Z2) 2.03 1.09 0.55 0.51 0.46 0.52
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.99
20 3 P(Z1=0, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=2) 0.13 0.10 0.11 0.10 0.10 0.00
P(Z1=2, Z2=1) 3.20 1.14 0.44 0.26 0.26 0.58
P(Z1=3, Z2=0) 88.89 62.52 12.07 6.26 5.07 51.69

E(Z1) 273.20 189.94 37.20 19.4 15.83 156.23


E(Z2) 3.46 1.34 0.66 0.46 0.46 0.58
EZ1/(E Z1+EZ2) 0.98 0.99 0.98 0.97 0.97 0.99
25 3 P(Z1=0, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=2) 0.16 0.14 0.12 0.11 0.11 0.01
P(Z1=2, Z2=1) 4.18 1.42 0.47 0.25 0.23 0.64
P(Z1=3, Z2=0) 85.84 52.14 6.17 3.58 2.14 43.51

E(Z1) 266.04 159.4 19.57 11.43 6.99 131.82


E(Z2) 4.50 1.70 0.71 0.51 0.45 0.66
EZ1/(E Z1+EZ2) 0.98 0.98 0.96 0.96 0.94 0.99
30 3 P(Z1=0, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=2) 0.19 0.16 0.13 0.13 0.12 0.02
P(Z1=2, Z2=1) 5.39 1.61 0.27 0.19 0.22 0.70
P(Z1=3, Z2=0) 82.88 42.62 3.43 1.21 0.87 35.37

E(Z1) 259.61 131.24 11.56 4.52 3.4 107.53


E(Z2) 5.77 1.93 0.83 0.61 0.56 0.74
EZ1/(E Z1+EZ2) 0.97 0.98 0.93 0.88 0.86 0.99
12 4 P(Z1=0, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=2) 0.02 0.03 0.00 0.00 0.00 0.03
P(Z1=3, Z2=1) 1.92 0.75 0.30 0.46 0.45 0.85
P(Z1=4, Z2=0) 93.54 75.20 30.65 22.30 16.40 78.79

E(Z1) 379.96 303.11 123.5 90.58 66.95 317.77


E(Z2) 1.96 0.81 0.30 0.46 0.46 0.91
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.99
15 4 P(Z1=0, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=2) 0.04 0.06 0.01 0.00 0.00 0.05
P(Z1=3, Z2=1) 2.52 0.98 0.31 0.49 0.31 0.99
P(Z1=4, Z2=0) 92.43 73.42 21.46 14.11 10.27 79.90

E(Z1) 377.36 302.74 86.74 57.91 42.01 322.67


E(Z2) 2.6 1.10 0.33 0.49 0.31 1.09
EZ1/(E Z1+EZ2) 0.99 0.99 0.99 0.99 0.99 0.99
20 4 P(Z1=0, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=2) 0.05 0.08 0.02 0.01 0.00 0.07
P(Z1=3, Z2=1) 3.92 1.35 0.33 0.52 0.18 1.42
P(Z1=4, Z2=0) 89.98 71.55 10.31 5.22 3.47 73.36

E(Z1) 371.78 290.41 11.34 22.46 14.42 297.84


E(Z2) 4.02 1.51 0.37 0.54 0.18 1.56
EZ1/(E Z1+EZ2) 0.98 0.99 0.97 0.97 0.98 0.99
25 4 P(Z1=0, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=2) 0.11 0.09 0.03 0.01 0.01 0.06
P(Z1=3, Z2=1) 5.06 1.60 0.39 0.61 0.17 1.63
P(Z1=4, Z2=0) 87.27 63.21 5.05 1.90 1.27 64.12

E(Z1) 364.48 257.72 21.39 8.25 5.61 261.49


E(Z2) 5.28 0.128 0.41 0.23 0.19 1.75
EZ1/(E Z1+EZ2) 0.98 0.99 0.98 0.97 0.97 0.99
30 4 P(Z1=0, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=3) 0.01 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=2) 0.15 0.05 0.04 0.02 0.02 0.06
P(Z1=3, Z2=1) 6.46 1.99 0.47 0.36 0.11 1.94
P(Z1=4, Z2=0) 84.32 55.46 2.57 0.93 0.57 60.23

E(Z1) 356.97 227.91 11.73 4.8 3.21 246.86


E(Z2) 6.79 2.09 0.51 0.36 0.31 2.06
EZ1/(E Z1+EZ2) 0.98 0.99 0.96 0.93 0.91 0.99
25 5 P(Z1=0, Z2=5) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=3) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=3, Z2=2) 0.16 0.04 0.11 0.01 0.01 0.16
P(Z1=4, Z2=1) 5.42 1.84 0.29 0.36 0.13 1.92
P(Z1=5, Z2=0) 87.87 70.05 5.22 0.84 1.24 74.13

E(Z1) 461.16 358.03 27.59 5.67 6.75 378.81


E(Z2) 5.8 2.12 0.51 0.38 0.15 2.24
EZ1/(E Z1+EZ2) 0.98 0.99 0.98 0.94 0.98 0.99
30 5 P(Z1=0, Z2=5) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=1, Z2=4) 0.00 0.00 0.00 0.00 0.00 0.00
P(Z1=2, Z2=3) 0.01 0.00 0.00 0.00 0.00 0.00
P(Z1=3, Z2=2) 0.17 0.05 0.03 0.01 0.01 0.07
P(Z1=4, Z2=1) 6.85 2.30 0.45 0.19 0.26 2.13
P(Z1=5, Z2=0) 85.16 63.66 2.57 1.87 0.51 65.24

E(Z1) 453.73 327.65 14.74 10.14 3.62 334.93


E(Z2) 7.22 2.4 0.51 0.21 0.28 2.27
EZ1/(E Z1+EZ2) 0.9843 0.9927 0.9665 0.9797 0.9282 0.99
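
The summary rows of the power tables can be cross-checked against the joint-probability rows. The sketch below assumes that E(Z1) and E(Z2) are the probability-weighted sums of Z1 and Z2, with the probabilities expressed in percent; this reading is inferred from the tabulated numbers and is not stated in this section. The function name `summarize` is hypothetical.

```python
# Sketch: recompute E(Z1), E(Z2) and the ratio row from the joint
# probabilities (in percent) tabulated for one test and one (n, d) cell.
# Assumption: the expectations are plain probability-weighted sums.

def summarize(joint_pct):
    """joint_pct maps (z1, z2) -> tabulated probability in percent."""
    e_z1 = sum(z1 * p for (z1, _), p in joint_pct.items())
    e_z2 = sum(z2 * p for (_, z2), p in joint_pct.items())
    total = e_z1 + e_z2
    ratio = e_z1 / total if total > 0 else float("nan")
    return e_z1, e_z2, ratio

# Table 13, n = 8, d = 2, test T13 (values copied from the table).
t13_n8_d2 = {(0, 2): 0.00, (1, 1): 0.07, (2, 0): 95.92}

e_z1, e_z2, ratio = summarize(t13_n8_d2)
print(f"E(Z1) = {e_z1:.2f}, E(Z2) = {e_z2:.2f}, ratio = {ratio:.4f}")
```

For this cell the recomputed E(Z1) and E(Z2) agree with the tabulated 191.91 and 0.07. The same check can be applied to other cells to locate possible transcription errors in the tables.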

Examples for Illustration Purpose

In this section, we present two examples to illustrate how the above tests work for detecting one
or more outliers.

Example 1
Consider the following data containing n = 10 observations drawn from a Cauchy distribution:
4.859363, 5.116336, 5.120637, 5.257150, 6.055268, 6.207880, 6.777558, 6.912309, 9.847006,
25.000201.
A plot of the data indicates that the largest observation is a possible outlier. The QQ-plot also
suggests that the extreme upper observation is a possible outlier. We therefore test the largest
observation as an upper outlier by applying the tests designed to detect a single upper (or lower)
outlier.
Figure 1: QQ-plot for data in Example 1
Figure 2: Data plot for data in Example 1
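
The coordinates of a QQ-plot of this kind can be sketched as follows. The reference distribution (standard Cauchy) and the plotting positions (i - 0.5)/n are assumptions of this sketch, since the text does not say how the QQ-plot was constructed; the helper names `cauchy_quantile` and `qq_pairs` are hypothetical.

```python
import math

# Example 1 data, already in ascending order.
data = [4.859363, 5.116336, 5.120637, 5.257150, 6.055268,
        6.207880, 6.777558, 6.912309, 9.847006, 25.000201]

def cauchy_quantile(p, location=0.0, scale=1.0):
    """Quantile function of the Cauchy distribution."""
    return location + scale * math.tan(math.pi * (p - 0.5))

def qq_pairs(sample):
    """Pair each ordered observation with a standard Cauchy quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n are an assumption of this sketch.
    return [(cauchy_quantile((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

pairs = qq_pairs(data)
for q, x in pairs:
    print(f"{q:8.3f}  {x:10.6f}")
```

Plotting the second column against the first gives the QQ-plot; a point lying far from the line through the bulk of the points marks a candidate outlier.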

Table 14 presents the values of the tests (T1-T8) calculated from the above data, together with
their critical values. According to Table 14, the largest observation is declared an outlier by T1,
T4, T5 and T8 at the 10% level of significance but not at the 5% and 1% levels. The remaining tests
(T2, T3, T6 and T7) fail to declare the largest observation an outlier at any significance level.
Table 14: Calculated values and critical values of tests (T1-T8)
Test  Calculated value  Critical value (10%)  Critical value (5%)  Critical value (1%)
T1 0.752 0.737 0.847 0.946
T2 0.762 0.821 0.899 0.965
T3 0.763 0.856 0.919 0.972
T4 2.763 2.730 2.807 2.840
T5 6.658 6.428 7.688 9.088
T6 21.890 30.457 62.085 310.258
T7 20.018 30.110 61.442 307.197
T8 12.187 10.266 20.545 100.643
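
The decisions read off Table 14 can be reproduced mechanically. The sketch below assumes that every test rejects when its calculated value exceeds the critical value; this rule is inferred from the decisions reported in the text and is not defined in this section. The names `table14`, `LEVELS` and `rejections` are hypothetical.

```python
# Calculated value and critical values (10%, 5%, 1%) copied from Table 14.
table14 = {
    "T1": (0.752, (0.737, 0.847, 0.946)),
    "T2": (0.762, (0.821, 0.899, 0.965)),
    "T3": (0.763, (0.856, 0.919, 0.972)),
    "T4": (2.763, (2.730, 2.807, 2.840)),
    "T5": (6.658, (6.428, 7.688, 9.088)),
    "T6": (21.890, (30.457, 62.085, 310.258)),
    "T7": (20.018, (30.110, 61.442, 307.197)),
    "T8": (12.187, (10.266, 20.545, 100.643)),
}
LEVELS = ("10%", "5%", "1%")

def rejections(table):
    """List, per significance level, the tests that declare an outlier."""
    out = {level: [] for level in LEVELS}
    for test, (calculated, criticals) in table.items():
        for level, critical in zip(LEVELS, criticals):
            # Assumed rule: large values of the statistic are significant.
            if calculated > critical:
                out[level].append(test)
    return out

decisions = rejections(table14)
for level in LEVELS:
    print(level, decisions[level])
```

Under this rule the tests listed at the 10% level are T1, T4, T5 and T8, and no test rejects at the 5% or 1% level, which matches the discussion of Table 14.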

Let us examine how this sample of 10 observations was constructed. Nine of the observations were
generated from a Cauchy distribution with location parameter 10 and scale parameter 1, whereas one
observation was simulated from a Cauchy distribution with location parameter 20 and scale
parameter 1. Thus a = 10 (amount of shift) and d = 1 (number of outliers), and the largest
observation is in fact an outlier from a different Cauchy distribution. This observation is not
only declared an outlier by T1, T4, T5 and T8 but is also flagged by the plots. These results
agree with the power comparisons based on Table 9.
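
A contaminated sample of this kind can be generated as sketched below. The sketch treats the first Cauchy parameter as the location and the second as the scale, and draws variates by inverting the Cauchy distribution function; the function names, the seed and the use of Python's `random` module are assumptions for illustration, not the procedure used in the study.

```python
import math
import random

def cauchy_draw(rng, location, scale=1.0):
    """Draw one Cauchy variate by inverting the distribution function."""
    u = rng.random()
    return location + scale * math.tan(math.pi * (u - 0.5))

def contaminated_sample(n, d, location, shift, scale=1.0, seed=0):
    """Return n - d clean draws and d draws shifted in location by `shift`."""
    rng = random.Random(seed)
    clean = [cauchy_draw(rng, location, scale) for _ in range(n - d)]
    outliers = [cauchy_draw(rng, location + shift, scale) for _ in range(d)]
    return clean, outliers

# Same design as Example 1: n = 10, d = 1, a = 10, scale 1.
clean, outliers = contaminated_sample(n=10, d=1, location=10.0, shift=10.0, seed=1)
sample = sorted(clean + outliers)
print(sample)
print("contaminating observation:", outliers[0])
```

Because the Cauchy distribution has heavy tails, the contaminating observation is not guaranteed to be the sample maximum in every generated sample, so it is worth checking which observation came from the shifted distribution before interpreting the tests.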

Example 2
Consider the following data containing n = 25 observations drawn from a Cauchy distribution:
13.19576, 13.96621, 14.28533, 14.52214, 14.52261, 14.56306, 14.59187, 14.60249, 14.68260,
14.70289, 14.70781, 14.93240, 15.04186, 15.14828, 15.31356, 15.45453, 15.46260, 15.64192,
16.92745, 17.20134, 17.41059, 20.23015, 30.50170, 31.15637, 35.24563.
A plot of the data in Example 2 indicates that the three largest observations are possible
outliers, and the QQ-plot also suggests that the last three ordered observations are possible
outliers. Based on the results presented in Table 15, the three largest observations are clearly
declared outliers by test T13 at all three levels of significance (10%, 5% and 1%).
Tests T14 and T18 declare these three observations outliers at the 10% level only, not at the 5%
and 1% levels. The remaining three tests (T15, T16 and T17) fail to declare the three largest
observations outliers at any significance level. Test T13 therefore appears more likely to detect
three outliers than any of the other block tests.
Figure 3: QQ-plot for data in Example 2
Figure 4: Data plot for data in Example 2
Table 15: Calculated values and critical values of tests (T13-T18)
Test  Calculated value  Critical value (10%)  Critical value (5%)  Critical value (1%)
T13 7.665 6.226 6.533 6.952
T14 17.874 17.562 20.116 23.006
T15 0.680 0.833 0.906 0.968
T16 0.706 0.896 0.943 0.980
T17 0.716 0.919 0.956 0.985
T18 34.807 33.802 62.979 278.831

Let us examine how this sample of 25 observations was constructed. The first 22 ordered
observations were generated from a Cauchy distribution with location parameter 20 and scale
parameter 1, whereas the last three ordered observations were generated from a Cauchy
distribution with location parameter 30 and scale parameter 1. Thus a = 10 and d = 3, and the
three largest observations are in fact outliers from a different Cauchy distribution. These
observations are not only declared outliers by T13, T14 and T18 but are also flagged by the plots.
These results accord with the power comparisons based on Table 13 and Table 14, which clearly
indicate that T13 is the most powerful test, as it gives the maximum probability of detecting
three outliers compared with the other tests.

Conclusion
In the present study, we considered several tests for detecting one or more outliers and
proposed some additional tests for this purpose.
A simulation study was carried out to compare the performance of all the tests considered.
Among the single-outlier tests (T1-T8), T1 performs best for a = 10, whereas T8 performs best for
a = 15. Among the two-sided tests, for b = 10, T10 correctly identified outliers more often and
T12 declared fewer false positives. For b = 15, test T11 is found to be the best when the sample
size becomes large.
We also examined the performance of six block tests, which differed across scenarios. Test T13 is
found to be the best, T14 the runner-up and T18 the third best for d = 2 and 3 and for all n. For
d = 4 and 5, our proposed test T18 performed better than T14 for all n.
Finally, we presented two examples to illustrate how these tests work. In each example, we
artificially generated data from a Cauchy distribution and contaminated them with one and three
outliers, respectively, drawn from a different Cauchy distribution. The tests were applied and
their values were calculated from the sample data in each case. The results found in the examples
matched the simulation results well.

References
• Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data, Third edition, John Wiley
& Sons, Chichester, England.
• Beckman, R. J. and Cook, R. D. (1983). Outlier…..s, Technometrics, 25, 119-149.
• Childs, A. M. (1996). Advances in statistical inference and outlier related issues. PhD
Thesis. Open Access Dissertations and Theses. Paper 3693.
• Dixon, W. J. (1950). Analysis of extreme values, Annals of Mathematical Statistics, 21,
488-506.
• Fernando V., Surendra P. V. and Mirna G. (2000). Comparison of the Performance of
Fourteen Statistical Tests for Detection of Outlying Values in Geochemical Reference
Material Databases, Mathematical Geology, 32(4), 439-464.
• Fung, K. Y. and Paul, S. R. (1985). Comparisons of outlier detection procedures in Weibull
or extreme-value distribution, Communications in Statistics-Simulation and Computation,
14, 895-917.
• Grubbs, F. E. and Beck, G. (1972). Extension of sample sizes and percentage points for
significance tests of outlying observations, Technometrics, 14, 847-854.
• Van der Loo, M. P. J. (2010). Distribution based outlier detection in univariate data,
Statistics Netherlands, Henri Faasdreef 312, 2492 JP The Hague.
• Verma, S. P. and Quiroz-Ruiz, A. (2006). Critical values for six Dixon tests for outliers in
normal samples up to sizes 100, and applications in science and engineering, Revista
Mexicana de Ciencias Geológicas, 23(2), 133-161.
