The Annals of Mathematical Statistics, Vol. 13, No. 1 (Mar., 1942), pp. 1-13.
Published by: Institute of Mathematical Statistics.
Stable URL: http://www.jstor.org/stable/2236157
Institute of Mathematical Statistics is collaborating with JSTOR to digitize, preserve and extend access to The Annals of Mathematical Statistics.
$$ {}_LR_N = \frac{{}_LC_N}{V_N}, $$

$$ {}_LC_N = X_1X_{L+1} + X_2X_{L+2} + \cdots + X_{N-L}X_N + X_{N-L+1}X_1 + \cdots + X_NX_L - \frac{(\sum X_i)^2}{N}, $$

$$ V_N = \sum X_i^2 - \frac{(\sum X_i)^2}{N}, $$
where C and V are the covariance and variance respectively, and the X's are considered to be independently normally distributed about the same mean with unit variance.¹ If the population variance were known a priori, the variates could be transformed so that they would have unit variance; under such an unusual circumstance, the only distribution required would be that of the serial covariance. Tintner has given a test of significance for the serial covariance [6] and for the correlation coefficient [7] by using a method of selected items. The author has presented the distribution of the serial covariance and of the serial correlation coefficient not corrected for the mean in a recent doctoral thesis [1]. The distributions of LR_N not corrected for the mean will be mentioned in the sections which follow.

2. Small sample distributions for lag 1. W. G. Cochran has suggested that we use a result given in his article on quadratic forms to derive the distributions of the serial correlation coefficient for small samples [3]. If X_1, X_2, ..., X_N are independently normally distributed with variance 1 and mean 0, then "Every quadratic form Σ a_ij X_i X_j is distributed like

$$ \sum_{k=1}^{r} \lambda_k u_k, $$

where r is the rank of the matrix, A, of the quadratic form, the u's are independently distributed as χ², each with 1 d.f., and the λ's are the non-zero latent roots of the characteristic equation of A" [3, p. 179]. If each λ_i appears k_i times as a latent root, u_i will be distributed as χ² with k_i degrees of freedom.
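Cochran's result can be illustrated numerically. The sketch below (ours, not part of the paper; the matrix construction simply encodes the definitions above) checks that the latent roots of the lag-1 circular serial covariance form are cos(2πk/N) together with a zero root contributed by the mean correction:

```python
import numpy as np

def serial_covariance_matrix(N):
    """Matrix A with X'AX = sum_i X_i X_{i+1} (circular) - (sum X_i)^2 / N."""
    A = np.zeros((N, N))
    for i in range(N):
        A[i, (i + 1) % N] += 0.5      # symmetrized circular lag-1 products
        A[(i + 1) % N, i] += 0.5
    return A - 1.0 / N                # mean-correction term -(sum X_i)^2 / N

N = 6
eigs = np.sort(np.linalg.eigvalsh(serial_covariance_matrix(N)))
# Expected latent roots: cos(2*pi*k/N), k = 1..N-1, plus one zero root.
expected = np.sort([np.cos(2 * np.pi * k / N) for k in range(1, N)] + [0.0])
print(np.allclose(eigs, expected))
```

The zero root corresponds to the constant vector annihilated by the mean correction; the remaining roots are exactly the λ_k used throughout the derivations below.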
1 This circular definition of the serial correlation coefficient was suggested by H. Hotelling.
R. L. ANDERSON
If we set L = 1 in the above definition of the serial covariance, we note that the characteristic equation of 1C_N is

$$ {}_1F_N(\lambda) = \begin{vmatrix} a_1 & a_2 & a_3 & \cdots & a_N \\ a_N & a_1 & a_2 & \cdots & a_{N-1} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ a_2 & a_3 & a_4 & \cdots & a_1 \end{vmatrix} = 0, $$

where a_1 = −(λ + 1/N), a_2 = a_N = (N − 2)/2N, and all other a's = −1/N. The determinant can be evaluated by the method of circulants. We find that
$$ {}_1F_N(\lambda) = \prod_{k=1}^{N} \Big\{ \sum_{i=1}^{N} a_i \omega_k^{i-1} \Big\}, $$

where the ω_k are the Nth roots of unity. For k ≠ N,

$$ \sum_{i=1}^{N} a_i \omega_k^{i-1} = -\lambda + \tfrac{1}{2}(\omega_k + \omega_k^{N-1}) = -\lambda + \cos\frac{2\pi k}{N}, $$

while for k = N the sum reduces to −λ. Hence

$$ {}_1F_N = \prod_{k=1}^{N-1} \Big\{ -\lambda + \cos\frac{2\pi k}{N} \Big\} = 0, $$

so that λ_k = cos(2πk/N), k = 1, 2, ..., N − 1. Since cos(2πk/N) = cos(2π(N − k)/N), each root occurs twice, except that for N even the root λ = −1 (k = ½N) occurs only once; thus for N even

$$ {}_1C_N = \sum_{k=1}^{\frac{1}{2}(N-2)} \lambda_k u_k - u, $$
where u_k is distributed as χ² with 2 d.f. and u with 1 d.f. (For N odd the term −u is absent and the sum runs to ½(N − 1).) At the same time, we note that V_N = Σ(X_i − X̄)² is distributed as χ² with N − 1 d.f. The general procedure in deriving the distribution of 1R_N is as follows: We determine the joint density function of the u's which form the distributions of 1C_N (= 1R_N · V_N) and V_N. The u's are integrated out, leaving the joint density function of 1R_N and V_N. The distribution of 1R_N is obtained by integrating with respect to V_N from 0 to ∞. As examples, derivations of the distributions of 1R_6 and 1R_7 have been included. In order to simplify the results, the first subscripts have been dropped from 1R_N.

Distribution of R_6. R_6V_6 = λ_1u_1 + λ_2u_2 − u and V_6 = u_1 + u_2 + u, where u_1 and u_2 are distributed as χ² with 2 d.f. and u with 1 d.f., and λ_1 = ½, λ_2 = −½. Hence the density function of the u's is

$$ D(u_1, u_2, u) = (4\sqrt{2\pi})^{-1} u^{-1/2} e^{-V_6/2}. $$
Since

$$ u_1 = \frac{V_6(R_6 - \lambda_2) + u(1 + \lambda_2)}{\lambda_1 - \lambda_2}, \qquad u_2 = \frac{V_6(\lambda_1 - R_6) - u(1 + \lambda_1)}{\lambda_1 - \lambda_2}, $$

u must vary between 0 and V_6(λ_1 − R_6)/(1 + λ_1) for λ_2 ≤ R_6 < λ_1, and between V_6(λ_2 − R_6)/(1 + λ_2) and V_6(λ_1 − R_6)/(1 + λ_1) for −1 < R_6 < λ_2. After integrating with respect to u between these limits and then with respect to V_6 from 0 to ∞, we obtained the following density function for R_6:

$$ D(R_6) = \frac{3}{2}\left\{ \frac{(\lambda_1 - R_6)^{1/2}}{(1 + \lambda_1)^{1/2}(\lambda_1 - \lambda_2)} + \frac{(\lambda_2 - R_6)^{1/2}}{(1 + \lambda_2)^{1/2}(\lambda_2 - \lambda_1)} \right\}, $$

the second term appearing only for −1 < R_6 < λ_2.
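The R_6 derivation can be spot-checked by simulation. In this sketch (ours, not the paper's), the tail probability P(R_6 > 0) implied by D(R_6) — namely √(2/3)·(½)^{3/2} ≈ 0.289 — is compared with a direct Monte Carlo estimate of the circular serial correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
trials, N = 200_000, 6
X = rng.standard_normal((trials, N))
Xc = X - X.mean(axis=1, keepdims=True)              # correct for the mean
C = np.sum(Xc * np.roll(Xc, -1, axis=1), axis=1)    # circular lag-1 covariance
V = np.sum(Xc ** 2, axis=1)
R = C / V
# Tail probability implied by the density D(R6) above:
p_exact = np.sqrt(2 / 3) * 0.5 ** 1.5
p_mc = np.mean(R > 0.0)
print(round(p_exact, 3), round(p_mc, 3))
```

With 200,000 trials the two figures agree to about two decimal places.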
Distribution of R_7. R_7V_7 = λ_1u_1 + λ_2u_2 + λ_3u_3 and V_7 = u_1 + u_2 + u_3, where the u's are each distributed as χ² with 2 d.f. and λ_k = cos(2πk/7). Solving for u_1 and u_2 in terms of R_7, V_7, and u_3 as before, the limits are: for λ_2 < R_7 < λ_1, 0 < u_3 < V_7(λ_1 − R_7)/(λ_1 − λ_3); for λ_3 < R_7 < λ_2, V_7(λ_2 − R_7)/(λ_2 − λ_3) < u_3 < V_7(λ_1 − R_7)/(λ_1 − λ_3). Using these limits, we derived the density function for R_7:
$$ D(R_7) = 2\left\{ \frac{\lambda_1 - R_7}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)} + \frac{\lambda_2 - R_7}{(\lambda_2 - \lambda_1)(\lambda_2 - \lambda_3)} \right\}, $$

the second term appearing only for λ_3 < R_7 < λ_2. The cumulative probability function is similar, except that the exponent of each numerator is raised by one and the constant factor is dropped.
In general, for N odd the joint density function of R_N and V_N is

$$ D(R_N, V_N) = K e^{-\frac{1}{2}V_N} V_N^{\frac{1}{2}(N-3)} \sum_{i=1}^{m} (\lambda_i - R_N)^{\frac{1}{2}(N-5)}\big/a_i \qquad \text{for } \lambda_{m+1} \le R_N < \lambda_m, $$

where 1/K = 2^{\frac{1}{2}(N-1)} Γ[\frac{1}{2}(N − 3)] and a_i = Π_j (λ_i − λ_j) for j ≠ i.
Note that this formula holds for N = 5 and 7; we will show that it holds for N + 2, assuming it true for N. If we set k = ½(N + 1), then

$$ R_{N+2}V_{N+2} = R_N V_N + \lambda_k u_k \quad \text{and} \quad V_{N+2} = V_N + u_k; $$

hence

$$ R_N = \frac{R_{N+2}V_{N+2} - \lambda_k u_k}{V_{N+2} - u_k} \quad \text{and} \quad V_N = V_{N+2} - u_k. $$

If we substitute u_k = ū_k V_{N+2}, the joint density function of ū_k, V_{N+2}, and R_{N+2} becomes
$$ D(\bar u_k, V_{N+2}, R_{N+2}) = \tfrac{1}{2} K e^{-\frac{1}{2}V_{N+2}} V_{N+2}^{\frac{1}{2}(N-1)} \sum_{i=1}^{m} \big[(\lambda_i - R_{N+2}) - \bar u_k(\lambda_i - \lambda_k)\big]^{\frac{1}{2}(N-5)}\big/a_i. $$
FIG. 1. Graphs of the exact distributions of R_6 and R_7.
In order to obtain the distribution of V_{N+2} and R_{N+2}, we must integrate out ū_k. The limits of integration differ for different values of m. We note that ū_k = (R_N − R_{N+2})/(R_N − λ_k), and that ū_k cannot be negative. For R_{N+2} > λ_k, ū_k < 1; hence, if R_N is replaced by a larger (smaller) quantity, ū_k will be larger (smaller). For m = 1 (λ_2 < R_{N+2} < λ_1), we need to consider only that region for which λ_2 < R_N < λ_1. In this region, 0 < ū_k < (λ_1 − R_{N+2})/(λ_1 − λ_k), and the density function of R_{N+2} and V_{N+2} is
SERIAL CORRELATION COEFFICIENT
$$ \frac{K}{N-3}\, e^{-\frac{1}{2}V_{N+2}}\, V_{N+2}^{\frac{1}{2}(N-1)}\, \frac{(\lambda_1 - R_{N+2})^{\frac{1}{2}(N-3)}}{a_1(\lambda_1 - \lambda_k)}, $$

which is of the required form, since in passing from N to N + 2 the factor (λ_1 − λ_k) is absorbed into a_1 and the constant is adjusted accordingly. For m = 2 (λ_3 < R_{N+2} < λ_2), we must consider two regions in the R_N plane: for λ_2 < R_N < λ_1, (λ_2 − R_{N+2})/(λ_2 − λ_k) < ū_k < (λ_1 − R_{N+2})/(λ_1 − λ_k); for λ_3 < R_N < λ_2, 0 < ū_k < (λ_2 − R_{N+2})/(λ_2 − λ_k). If we combine the integrals for these two regions, we find that

$$ D(R_{N+2}, V_{N+2}) = \frac{K}{N-3}\, e^{-\frac{1}{2}V_{N+2}}\, V_{N+2}^{\frac{1}{2}(N-1)} \sum_{i=1}^{2} \frac{(\lambda_i - R_{N+2})^{\frac{1}{2}(N-3)}}{a_i(\lambda_i - \lambda_k)}. $$

Similar results can be obtained for the other values of m. Finally we conclude that for N odd,
$$ D({}_1R_N) = \tfrac{1}{2}(N-3) \sum_{i=1}^{m} (\lambda_i - {}_1R_N)^{\frac{1}{2}(N-5)}\big/a_i \qquad \text{for } \lambda_{m+1} \le {}_1R_N < \lambda_m, $$

$$ P({}_1R_N > R') = \sum_{i=1}^{m} (\lambda_i - R')^{\frac{1}{2}(N-3)}\big/a_i \qquad \text{for } \lambda_{m+1} \le R' < \lambda_m, $$

where a_i = \prod_{j=1}^{\frac{1}{2}(N-1)} (\lambda_i - \lambda_j), j ≠ i.
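A short sketch (ours; not part of the paper) that implements the cumulative formula for N odd and recovers a tabulated significance point by bisection:

```python
import math

# P(1R_N > r) = sum over lambda_i > r of (lambda_i - r)**((N-3)/2) / a_i,
# with a_i = prod_{j != i}(lambda_i - lambda_j), lambda_k = cos(2*pi*k/N).
def tail_prob(r, N):
    m = (N - 1) // 2
    lam = [math.cos(2 * math.pi * k / N) for k in range(1, m + 1)]
    total = 0.0
    for i in range(m):
        if lam[i] <= r:
            continue
        a_i = math.prod(lam[i] - lam[j] for j in range(m) if j != i)
        total += (lam[i] - r) ** ((N - 3) / 2) / a_i
    return total

# Recover the 5% point for N = 7 by bisection on the (decreasing) tail.
lo, hi = -1.0, math.cos(2 * math.pi / 7)
for _ in range(60):
    mid = (lo + hi) / 2
    if tail_prob(mid, 7) > 0.05:
        lo = mid
    else:
        hi = mid
print(f"{(lo + hi) / 2:.3f}")   # -> 0.370
```

The value 0.370 agrees with the Table I entry for N = 7.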
Distribution of 1R_N not corrected for the mean. For 1R_N not corrected for the mean (N odd), the distribution has the same general form: V_N now has N degrees of freedom and contributes the additional latent root +1, the exponents ½(N − 5) and ½(N − 3) are replaced by ½(N − 4) and ½(N − 2), and each a_i carries an additional factor (1 − λ_i)^{1/2} [1]. No general formulas were derived for N even.
3. Large sample distributions for lag 1. The simultaneous density of C and V, where we will drop the subscripts for convenience, is

$$ D(C, V) = (2\pi)^{-2} \iint \phi(s, t)\, e^{-sC - tV}\, ds\, dt, $$

where s and t are pure imaginaries and φ(s, t) = Δ^{−1/2}, Δ being the determinant of the quadratic form

$$ Q = 1 - 2t\big[\textstyle\sum (X_i - \bar X)^2\big] - 2s\big[X_1X_2 + \cdots + X_NX_1 - (\textstyle\sum X_i)^2/N\big]. $$

This determinant was evaluated by the method of circulants; we found that

$$ \Delta = \prod_{k=1}^{N-1} \{1 - 2(t + s\lambda_k)\}, \qquad \lambda_k = \cos 2\pi k/N. $$
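The circulant evaluation can be verified numerically. In the sketch below (ours; s and t are taken real merely for convenience), the determinant of the joint quadratic form is compared with the product over the roots:

```python
import numpy as np

N = 8
J = np.full((N, N), 1.0 / N)
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] += 0.5
    A[(i + 1) % N, i] += 0.5
A -= J                        # quadratic form of C (mean-corrected, circular)
B = np.eye(N) - J             # quadratic form of V = sum (X_i - Xbar)^2
s, t = 0.03, -0.02
lhs = np.linalg.det(np.eye(N) - 2 * (t * B + s * A))
lam = np.cos(2 * np.pi * np.arange(1, N) / N)
rhs = np.prod(1 - 2 * (t + s * lam))
print(abs(lhs - rhs) < 1e-10)
```

Both forms share the circulant eigenvectors, so the constant direction contributes a factor 1 and the remaining directions give the stated product.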
Hence the semi-invariants of C and V are given by

$$ K_{ij} = m!\, 2^m \sum_{k=1}^{N-1} \lambda_k^i, \qquad m = i + j - 1. $$

We need the summations Σλ_k = −1, Σλ_k² = ½(N − 2), Σλ_k³ = −1, Σλ_k⁴ = ⅛(3N − 8). Hence E(C) = K_{10} = −1, E(V) = K_{01} = N − 1, σ_C² = K_{20} = N − 2, σ_V² = K_{02} = 2(N − 1), K_{11} = −2, K_{21} = 4(N − 2), K_{12} = −8, K_{30} = −8, K_{03} = 8(N − 1), etc. If we let C' = C + 1 and V' = V − (N − 1), the K's will remain unchanged except that K_{10} = K_{01} = 0.
If we neglect terms of order less than 1/N, E(R) = −1/(N − 1), E(R − R̄)² = (N − 2)/(N − 1)², and E(R − R̄)^k ≈ 0 for k > 2. For N < 75, a more exact approximation may be desired. If the above approximation is used, 1R_N is normally distributed with mean −1/(N − 1) and variance (N − 2)/(N − 1)². The single-tail significance points can be found by substituting in the formulas

$$ R_{5\%} = \frac{-1 \pm 1.645\sqrt{N-2}}{N-1}, \qquad R_{1\%} = \frac{-1 \pm 2.326\sqrt{N-2}}{N-1}. $$

Refer to Fig. 2 for a comparison of the exact distribution and the normal approximation for N = 15. I have included the graphs of the exact distributions for N = 6 and 7 in Fig. 1. We might note a few comparisons between the approximate significance points and the exact ones:
              Positive tail                      Negative tail
  N      5%                1%               5%                 1%
       Exact  Approx.   Exact  Approx.   Exact   Approx.    Exact   Approx.
  45   0.218  0.223     0.314  0.324    -0.262  -0.268     -0.356  -0.369
  75   0.173  0.176     0.250  0.255    -0.199  -0.203     -0.276  -0.282
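As an arithmetic check (a sketch, not part of the original paper), the approximation formulas can be evaluated directly for the two sample sizes in the comparison:

```python
# Normal-approximation points (-1 + z*sqrt(N-2))/(N-1); the negative tail
# uses -z in place of z.
def approx_point(N, z):
    return (-1 + z * (N - 2) ** 0.5) / (N - 1)

for N in (45, 75):
    print(N,
          round(approx_point(N, 1.645), 3),    # positive tail, 5%
          round(approx_point(N, 2.326), 3),    # positive tail, 1%
          round(approx_point(N, -1.645), 3),   # negative tail, 5%
          round(approx_point(N, -2.326), 3))   # negative tail, 1%
```

The printed values agree with the tabulated approximate points to within about a unit in the third decimal place.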
For 1R_N not corrected for the mean, it was found that y = √(N + 2)·1R_N is asymptotically normally distributed with mean 0 and variance 1 [1].

4. Significance points of 1R_N. An example of the methods used in tabulating these significance points has been presented in the author's doctoral thesis [1]. The significance points for the values of N enclosed in parentheses have been obtained by graphical interpolation. Note that N is the number of observations (see Table I).
FIG. 2. Comparison of the exact distribution of 1R_N for N = 15 with the normal approximation.
For a general lag L, the constants in the characteristic equation for the covariance LC_N are a_1 = −(λ + 1/N), a_{L+1} = a_{N−L+1} = (N − 2)/2N, and all other a's = −1/N. Hence the characteristic equation is

$$ {}_LF_N = \prod_{k=1}^{N-1} \big[-\lambda + \cos(2\pi Lk/N)\big] = 0. $$
If L is prime to N, the numbers L, 2L, ..., (N − 1)L, when reduced modulo N, can be arranged to form the series 1, 2, ..., (N − 1); this proof can be found in most books on the theory of numbers, e.g. [4]. Hence we conclude that each term of the sequence {cos(2πLk/N)} reduces uniquely to one of the sequence {cos(2πk/N)}, k = 1, 2, ..., (N − 1), whenever L/N is a fraction in lowest terms.

TABLE I
  N     Positive tail        Negative tail
         5%      1%           5%       1%
  5     0.253   0.297       -0.753   -0.798
  6     0.345   0.447       -0.708   -0.863
  7     0.370   0.510       -0.674   -0.799
  8     0.371   0.531       -0.625   -0.764
  9     0.366   0.533       -0.593   -0.737
 10     0.360   0.525       -0.564   -0.705
 11     0.353   0.515       -0.539   -0.679
 12     0.348   0.505       -0.516   -0.655
 13     0.341   0.495       -0.497   -0.634
 14     0.335   0.485       -0.479   -0.615
 15     0.328   0.475       -0.462   -0.597
 20     0.299   0.432       -0.399   -0.524
 25     0.276   0.398       -0.356   -0.473
 30     0.257   0.370       -0.325   -0.433
 35     0.242   0.347       -0.300   -0.401
 40     0.229   0.329       -0.279   -0.376
 45     0.218   0.314       -0.262   -0.356
 50     0.208   0.301       -0.248   -0.339
 55     0.199   0.289       -0.236   -0.324
 60     0.191   0.278       -0.225   -0.310
 65     0.184   0.268       -0.216   -0.298
 70     0.178   0.259       -0.207   -0.287
 75     0.173   0.250       -0.199   -0.276
If L and N have a common factor a, L = qa and N = pa, where p and q are integers prime to one another. Hence

$$ {}_LF_N = \prod_{k=1}^{N-1} \Big\{ -\lambda + \cos\frac{2\pi qk}{p} \Big\} = \Big[\prod_{k=1}^{p-1}\Big\{-\lambda + \cos\frac{2\pi k}{p}\Big\}\Big]^a (1 - \lambda)^{a-1} = ({}_1F_p)^a (1 - \lambda)^{a-1} = 0, $$

where p = N/a. If a = L,

$$ {}_LF_N = ({}_1F_p)^L (1 - \lambda)^{L-1} = 0, \qquad p = N/L. $$
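Both reductions can be checked numerically. The sketch below (ours, not from the paper) verifies the root multiset for a lag prime to N, and for a lag sharing a factor with N:

```python
import math
from collections import Counter

# Multiset of latent roots {cos(2*pi*L*k/N)}, k = 1..N-1, rounded for hashing.
def roots(N, L):
    return Counter(round(math.cos(2 * math.pi * L * k / N), 9) for k in range(1, N))

# gcd(4, 15) = 1: the lag-4 roots are a rearrangement of the lag-1 roots.
assert roots(15, 4) == roots(15, 1)

# L = 4, N = 12: a = gcd = 4, p = 3; the roots are those of 1F_p, each a
# times, plus the root +1 repeated a - 1 times.
N, L = 12, 4
a = math.gcd(L, N)
p = N // a
expected = Counter()
for k in range(1, p):
    expected[round(math.cos(2 * math.pi * k / p), 9)] += a
expected[1.0] += a - 1
assert roots(N, L) == expected
print("root structure verified")
```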
When these results are applied to the large sample distribution of LR_N, we find that it is independent of L. For the more important case in which p = N/L, the semi-invariants K_ij of C and V are exactly the same for all L with a given N. We see that

$$ \log \phi = -\tfrac{1}{2}L \sum_{k=1}^{p-1} \log\{1 - 2(t + s\lambda_k)\} - \tfrac{1}{2}(L - 1)\log\{1 - 2(t + s)\}, $$

where λ_k = cos 2πk/p. But Σ_{k=1}^{p−1} λ_k = −1 and Σ_{k=1}^{p−1} λ_k² = ½(p − 2), so that, for example, K_{10} = −L + (L − 1) = −1 and K_{20} = L(p − 2) + 2(L − 1) = N − 2, exactly as for lag 1.
(b) Distributions of LR_N when N/L = p. These results indicate that the distributions of the serial correlation coefficients for which the number of observations is divisible by the lag, so that N/L = p, would include the distributions of all the serial correlation coefficients regardless of the values of N and L. We will designate any lag L as the primary lag for a given N if N/L = p, an integer. For example, 2R_6 and 4R_6 have the same density function, but we will derive only the density function for lag 2, which we will call the primary lag. The case of p = 1 is trivial, since it involves correlating a series with itself. To date, we have derived the exact density functions for p = 2 and p = 3 and the required integrals for p = 4. The significance points have been tabulated in Table II. For simplicity of notation, we will set LR_N = LR_p and V_N = V.
Case p = 2 (N = 2L). LR_2V = −u_1 + u_2 and V = u_1 + u_2, where u_1 is distributed as χ² with L d.f. and u_2 as χ² with L − 1 d.f. Hence

$$ D_L(u_1, u_2) = K u_1^{\frac{1}{2}(L-2)} u_2^{\frac{1}{2}(L-3)} e^{-V/2}, $$

where 1/K = 2^{L-\frac{1}{2}}\,\Gamma(\tfrac{1}{2}L)\,\Gamma[\tfrac{1}{2}(L-1)]. After substituting u_1 = V(1 − LR_2)/2 and u_2 = V(1 + LR_2)/2 and integrating with respect to V from 0 to ∞, we have
$$ D({}_LR_2) = \frac{(1 - {}_LR_2)^{\frac{1}{2}(L-2)}\,(1 + {}_LR_2)^{\frac{1}{2}(L-3)}}{2^{L-\frac{3}{2}}\, B[\tfrac{1}{2}L, \tfrac{1}{2}(L-1)]}. $$
If we set y = ½(1 + LR_2), then

$$ P({}_LR_2 > R') = \frac{1}{B[\tfrac{1}{2}L, \tfrac{1}{2}(L-1)]} \int_{\frac{1}{2}(1+R')}^{1} y^{\frac{1}{2}(L-3)}(1 - y)^{\frac{1}{2}(L-2)}\, dy. $$

Pearson has tabulated the values of these incomplete Beta functions [5]. In his notation, P = I_x[½L, ½(L − 1)], where x = ½(1 − R'). For LR_2 not corrected for the mean, P = I_x[½L, ½L] [1].

Case p = 3 (N = 3L). LR_3V = −½u_1 + u and V = u_1 + u, where u_1 is distributed as χ² with 2L d.f. and u with L − 1 d.f. Therefore,

$$ D_L(u_1, u) = K u_1^{L-1} u^{\frac{1}{2}(L-3)} e^{-V/2}. $$
After substituting u_1 = 2V(1 − LR_3)/3 and u = V(1 + 2 LR_3)/3 and integrating with respect to V from 0 to ∞, we find that
$$ D({}_LR_3) = \frac{2^L (1 - {}_LR_3)^{L-1}(1 + 2\,{}_LR_3)^{\frac{1}{2}(L-3)}}{3^{\frac{3}{2}(L-1)}\, B[L, \tfrac{1}{2}(L-1)]}, $$

and P(LR_3 > R') = I_x[L, ½(L − 1)], where x = 2(1 − R')/3. For LR_3 not corrected for the mean, P = I_x[L, ½L] [1].

Case p = 4 (N = 4L). LR_4V = −u_2 + u_4 and V = u_2 + u_4 + u, where u_2 is distributed as χ² with L d.f., u_4 with L − 1 d.f., and u with 2L d.f. The density function of the u's is

$$ D_L(u_2, u_4, u) = K u_2^{\frac{1}{2}(L-2)} u_4^{\frac{1}{2}(L-3)} u^{L-1} e^{-V/2}, $$

where 1/K = 2^{\frac{1}{2}(4L-1)}\,\Gamma(\tfrac{1}{2}L)\,\Gamma[\tfrac{1}{2}(L-1)]\,\Gamma(L). Since u_4 = [V(1 + LR_4) − u]/2 and u_2 = [V(1 − LR_4) − u]/2, we have 0 < u < V(1 − LR_4) for LR_4 ≥ 0 and 0 < u < V(1 + LR_4) for LR_4 < 0. For LR_4 ≥ 0,
$$ D({}_LR_4) = K' \int_0^{V(1 - {}_LR_4)} \big[V(1 + {}_LR_4) - u\big]^{\frac{1}{2}(L-3)} \big[V(1 - {}_LR_4) - u\big]^{\frac{1}{2}(L-2)}\, u^{L-1}\, du. $$

For LR_4 < 0, D(LR_4) is the same except that the upper limit for the integral is V(1 + LR_4). If we make the substitution y = u/(upper limit) in each case and then integrate with respect to V from 0 to ∞, we have these density functions:
$$ D({}_LR_4) = c_L (1 - {}_LR_4)^{\frac{1}{2}(3L-2)} \int_0^1 y^{L-1}(1 - y)^{\frac{1}{2}(L-2)} \big[(1 + {}_LR_4) - y(1 - {}_LR_4)\big]^{\frac{1}{2}(L-3)}\, dy \quad \text{for } {}_LR_4 \ge 0, $$

$$ D({}_LR_4) = c_L (1 + {}_LR_4)^{\frac{1}{2}(3L-3)} \int_0^1 y^{L-1}(1 - y)^{\frac{1}{2}(L-3)} \big[(1 - {}_LR_4) - y(1 + {}_LR_4)\big]^{\frac{1}{2}(L-2)}\, dy \quad \text{for } {}_LR_4 < 0. $$

The probability integrals must be evaluated for each L. The cumulative probability functions for L = 2 and 3 are:
for R' ≥ 0, P(LR_4 > R') = 1 − an expression in (1 − R')^{9/2}; for R' < 0, an additional term in (−R'/2)^{5/2}(22R'² + 36R' + 126) enters. The complete coefficients follow from the integrals above.
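The incomplete-Beta tail formulas for p = 2 and p = 3 can be checked by simulation. The sketch below (ours; numpy and scipy assumed available) compares each formula against its χ² decomposition:

```python
import numpy as np
from scipy.special import betainc

rng = np.random.default_rng(1)
n = 400_000

# p = 2, L = 4: P(R > R') = I_x(L/2, (L-1)/2), x = (1 - R')/2.
L, Rp = 4, 0.2
u1 = rng.chisquare(L, n)            # chi-square for the root -1
u2 = rng.chisquare(L - 1, n)        # chi-square for the root +1
R = (u2 - u1) / (u1 + u2)
p2 = betainc(L / 2, (L - 1) / 2, (1 - Rp) / 2)
assert abs(p2 - np.mean(R > Rp)) < 0.01

# p = 3, L = 4: P(R > R') = I_x(L, (L-1)/2), x = 2(1 - R')/3.
u1 = rng.chisquare(2 * L, n)        # chi-square for the root -1/2
u = rng.chisquare(L - 1, n)         # chi-square for the root +1
R = (-0.5 * u1 + u) / (u1 + u)
p3 = betainc(L, (L - 1) / 2, 2 * (1 - Rp) / 3)
assert abs(p3 - np.mean(R > Rp)) < 0.01
print("Beta-function tails agree with simulation")
```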
Since the density functions are much simpler for R' ≥ 0 when L is odd and for R' < 0 when L is even, we have derived only these significance points for L > 3 and interpolated for the intermediate points. It was noted that the significance points approach those given in Table I for the first lag. For these comparisons, see Table III below. Note that for L > 7 the 5% points are almost identical and the 1% points are nearly accurate to two decimal places.
TABLE II
Significance points for p = 2 and 3

p = 2 (N = 2L), negative tail:
  5%:  -0.99  -0.928  -0.848  -0.773  -0.712  -0.662  -0.620  -0.585  -0.554  -0.505  -0.467  -0.436  -0.410  -0.389  -0.347  -0.317  -0.273  -0.243
  1%:  -1.00  -0.994  -0.950  -0.902  -0.856  -0.812  -0.774  -0.739  -0.708  -0.656  -0.612  -0.577  -0.546  -0.520  -0.469  -0.431  -0.374  -0.335

p = 3 (N = 3L), positive tail:
  5%:   0.488   0.447   0.406   0.373   0.346   0.324   0.306   0.291   0.278   0.256   0.239   0.225   0.213   0.202   0.182   0.167   0.146   0.131
  1%:   0.762   0.677   0.610   0.559   0.518   0.485   0.457   0.433   0.413   0.380   0.353   0.332   0.314   0.298   0.268   0.245   0.212   0.191

p = 3 (N = 3L), negative tail:
  5%:  -0.496  -0.474  -0.439  -0.406  -0.377  -0.354  -0.334  -0.316  -0.301  -0.276  -0.256  -0.240  -0.227  -0.215  -0.193  -0.176  -0.153  -0.136
  1%:  -0.50   -0.496  -0.480  -0.461  -0.440  -0.420  -0.402  -0.387  -0.373  -0.347  -0.326  -0.308  -0.293  -0.280  -0.254  -0.234  -0.205  -0.184

TABLE III
Comparison of the significance points for p = 2, 3 and 4 with those of Table I:
  0.531  -0.653  -0.625  -0.818  -0.764
  0.505   0.528   0.516   0.692   0.655
  0.451   0.466   0.447   0.604   0.580
  0.402*  0.432   0.409   0.543*  0.524
  0.404   0.365   0.363   0.497   0.482
  0.380  -0.338*  0.337  -0.460* -0.448
Case p > 4. We have not set up any of the density functions for p > 4; however, it appears that the significance points given for lag 1 would be accurate enough for the higher lags. The exact significance points for lag 2 have been derived for p = 5 and 7. The reader may note the close approximation given by the significance points for lag 1 when p = 7. We hope to check the lag 1 approximation for other lags in the near future.

TABLE IV
Some significance points for lag 2

                    Positive tail        Negative tail
                     5%       1%          5%        1%
p = 5 (N = 10)
  Exact             0.342    0.540      -0.417    -0.595
  Approx.           0.360    0.525      -0.564    -0.705
p = 7 (N = 14)
  Exact             0.335    0.482      -0.479    -0.616
  Approx.           0.335    0.485      -0.479    -0.615
7. Summary. 1. The exact and large sample distributions have been derived for the serial correlation coefficient for lag 1, and the exact significance points tabulated for N, the number of observations, up to 75; for N > 75, the large sample approximations can be used.
2. It has been noted that the distributions for any lag L are the same as those for lag 1 when L and N are prime to each other. In general the distribution of the serial correlation coefficient can be derived for any L and N by using only those distributions for which L is a factor of N. The distributions and significance points have been derived for N/L = p = 2, 3 and 4. For p > 4 (N > 4L), the significance points given for lag 1 probably can be used when L is greater than 4 or 5. The accuracy of this approximation has been checked for lag 2.
3. These significance points should be useful in determining the methods of studying a time series, as suggested by Wold, and in the formulation of a better test of the significance of regression coefficients when we know that the observations are correlated in time. In addition, we now have a method of testing our assumptions of independence for any set of data.
REFERENCES
[1] R. L. ANDERSON, Serial Correlation in the Analysis of Time Series, unpublished thesis, Library, Iowa State College, Ames, Iowa, 1941.
[2] M. S. BARTLETT, "Some aspects of the time-correlation problem in regard to tests of significance," Roy. Stat. Soc. Jour., Vol. 98 (1935), pp. 536-543.