
Biometrika Trust

On the Systematic Fitting of Curves to Observations and Measurements


Author(s): Karl Pearson
Source: Biometrika, Vol. 1, No. 3 (Apr., 1902), pp. 265-303
Published by: Biometrika Trust
Stable URL: http://www.jstor.org/stable/2331540 .
Accessed: 18/06/2014 18:33


This content downloaded from 195.34.79.174 on Wed, 18 Jun 2014 18:33:03 PM


All use subject to JSTOR Terms and Conditions
VOLUME I APRIL, 1902 No. 3

ON THE SYSTEMATIC FITTING OF CURVES TO OBSERVATIONS AND MEASUREMENTS.

BY KARL PEARSON, F.R.S., University College, London.

CONTENTS.

Introductory Note.
(1) General Theorem. If the values of the constants of a curve be found by the method of moments, the fit will be good.
(2) On the discovery of the area and moments of a curve given by a series of isolated observations.
(3) Comparison in a special case of method of least squares and method of moments.
(4) On the discovery of the area and moments when the data consist of the frequencies falling within certain elementary ranges.
(5) Illustration I. To fit a curve of type y=yO(1+a)m 1 - ai). Calculation of moments and constants for the fecundity distribution of brood-mares.
(6) Illustration II. To fit a curve of type y = yO(1+ Pe a. Observations of Thiele.
(7) Illustration III. To fit a curve of type $y = a \sin(nx + \alpha)$, or a sine-curve, when a part only of the range has been observed. Observations of Chree.
(8) Illustration IV. To fit Makeham's curve to mortality statistics.


Introductory Note.

One of the most frequent tasks of the statistician, the physicist, and the engineer, is to represent a series of observations or measurements by a concise and suitable formula. Such a formula may either express a physical hypothesis, or on the other hand be merely empirical, i.e. it may enable us to represent by a few well selected constants a wide range of experimental or observational data. In the latter case it serves not only for purposes of interpolation, but frequently suggests new physical concepts or statistical constants.
In any given case the formula or curve to be fitted to the data is:
(i) Directly given by physical theory;
(ii) Chosen on the basis of a physical hint;
(iii) Although purely empirical, suggested by experience of goodness of fit in like cases;
(iv) Quite unknown and to be chosen solely by examination of the material.
Now, as I hope to indicate in this paper, half the difficulty of curve-fitting in practice lies in the choice of a suitable curve. Especially in Case (iv) it is only a very considerable experience in curve-fitting that can lead to a suitable choice among all the possible algebraic, exponential and trigonometrical curves that suggest themselves. The hasty assumption of some physicists and many engineers that a parabola of the form
$$y = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \dots$$
is always a good thing is to be deprecated, as may be seen at once by considering what a poor fit is obtained in this way to material really expressed by such curves as
$$y = y_0 e^{-cx^2}, \quad y = y_0 \sin nx, \quad y(x + c) = b^2, \text{ etc.}$$
To assume a curve of this form we must show the rapid convergency throughout the proposed range of the Maclaurin expansion, and this is not always feasible.
The present paper does not concern itself with the choice of suitable curves, but only with the determination of the constants, when the form of the curve has been selected. This I readily allow to be the easier half of the task.
So far I have not, however, been able to find any systematic treatise on curve-fitting. It is usually taken for granted that the right method for determining the constants is the method of least squares. But it is left to the unfortunate physicist or engineer to make the discovery that the equations for the constants found in this manner are in nine cases out of ten insoluble, or a solution so laborious that it cannot profitably be attempted.


The present paper endeavours to indicate a systematic method for fitting curves. It is not claimed for it:
(i) That it will succeed in giving the values of the constants in every conceivable case.
I can only say that after an experience of some eight years' use by my fellow-workers, students and myself we have found it applicable to a vast range of physical and statistical data.
(ii) That it will give absolutely the "best" values of the constants in all cases.
I endeavour to show that it must give good values. The definition of "best fit" is more or less arbitrary, and for practical purposes, I have found that with due precautions as to quadrature, it gives, when one can make a comparison, sensibly as good results as the method of least squares.
Finally it is an advantage to have a systematic method of approaching curve-fitting problems, which at any rate gives practically excellent values for the constants in a very great number of cases in which the method of least squares is admittedly of no service at all.

(1) General Theorem.

A series of measurements or observations of a variable y having been made, corresponding to a series of values of a second variable x, it is required to determine a good method of fitting a theoretical or empirical curve $y = \phi(x, c_1, c_2, c_3, \dots c_n)$, where $c_1, c_2, c_3, \dots c_n$ are arbitrary constants, to the observations for a given range $2l$ of the variable x.

Such problems in curve-fitting recur with great frequency in physical, biological and statistical investigations. The usual theoretical rule is to use the method of least squares, but if the constants $c_1, c_2, c_3, \dots c_n$ are involved in a complex manner the equations obtained by the method of least squares become unmanageable, and we find physicist and statistician remarking that "the increased accuracy of the result obtainable by least squares would not be an adequate return for the labour involved," and then falling back on some more or less questionable process of determining the constants. This process may be graphical or arithmetical, but it is usually unsystematic in character and elastic in result. The object of the present paper is to give a systematic method of fitting curves to observations, which I have reasonable ground for considering a good one, and which at any rate for a great variety of problems leads us to easy and simple results.
The assumption to be made solely for the proof, but not in practice, is that $y = \phi(x)$ can be expanded by Maclaurin's Theorem, and that the resulting series converges fairly rapidly. Let the expansion be:
$$y = \phi(0) + x\,\phi'(0) + \frac{x^2}{1 \cdot 2}\,\phi''(0) + \frac{x^3}{1 \cdot 2 \cdot 3}\,\phi'''(0) + \dots$$
$$y = \alpha_0 + \alpha_1 \frac{x}{l} + \alpha_2 \frac{x^2}{l^2} + \alpha_3 \frac{x^3}{l^3} + \dots, \text{ say.}$$

Here $\alpha_0, \alpha_1, \alpha_2, \dots$ etc. will be functions of the n constants $c_1, c_2, \dots c_n$ of the curve. Hence theoretically we can find the n c's in terms of $\alpha_0, \alpha_1, \alpha_2, \dots \alpha_{n-1}$. We should thus be able on substitution to express $\alpha_n, \alpha_{n+1}, \dots$ etc. in terms of $\alpha_0, \alpha_1, \dots \alpha_{n-1}$. Now consider the first n $\alpha$'s as the constants of our curve and it will be expressible in the form:
$$y = \alpha_0 + \alpha_1 \frac{x}{l} + \alpha_2 \frac{x^2}{l^2} + \dots + \alpha_{n-1} \frac{x^{n-1}}{l^{n-1}} + \frac{x^n}{l^n}\,\phi_n(\alpha_0, \alpha_1, \dots \alpha_{n-1}) + \frac{x^{n+1}}{l^{n+1}}\,\phi_{n+1}(\alpha_0, \alpha_1, \dots \alpha_{n-1}) + \text{etc.} \qquad \text{(i)}$$
Next let y' be the ordinate corresponding to x given by observation, then y − y' will be the distance between the theoretical and observed curves at the point corresponding to x, and our object is to make the values of this as small as possible by a proper choice of $\alpha_0, \alpha_1, \alpha_2, \dots \alpha_{n-1}$. This may be done by the method of least squares or making
$$\int (y - y')^2\,dx = \text{a minimum.}$$
This obviously gives a very good method, if not the "best," a term incapable of definition. The resulting equation, since y is the variable, is
$$\int (y - y')\,\delta y\,dx = 0 \qquad \text{(ii)}$$
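Now $\delta y$ depends on the variation of $\alpha_0, \dots \alpha_{n-1}$, or
$$\delta y = \delta\alpha_0 + \delta\alpha_1 \frac{x}{l} + \delta\alpha_2 \frac{x^2}{l^2} + \dots + \delta\alpha_{n-1} \frac{x^{n-1}}{l^{n-1}}$$
$$\quad + \left(\frac{d\phi_n}{d\alpha_0}\delta\alpha_0 + \frac{d\phi_n}{d\alpha_1}\delta\alpha_1 + \dots + \frac{d\phi_n}{d\alpha_{n-1}}\delta\alpha_{n-1}\right)\frac{x^n}{l^n}$$
$$\quad + \left(\frac{d\phi_{n+1}}{d\alpha_0}\delta\alpha_0 + \frac{d\phi_{n+1}}{d\alpha_1}\delta\alpha_1 + \dots + \frac{d\phi_{n+1}}{d\alpha_{n-1}}\delta\alpha_{n-1}\right)\frac{x^{n+1}}{l^{n+1}} + \text{etc.}$$
$$= \delta\alpha_0 \left(1 + \frac{d\phi_n}{d\alpha_0}\frac{x^n}{l^n} + \frac{d\phi_{n+1}}{d\alpha_0}\frac{x^{n+1}}{l^{n+1}} + \dots\right) + \delta\alpha_1 \left(\frac{x}{l} + \frac{d\phi_n}{d\alpha_1}\frac{x^n}{l^n} + \frac{d\phi_{n+1}}{d\alpha_1}\frac{x^{n+1}}{l^{n+1}} + \dots\right) + \text{etc.}$$
$$= \delta\alpha_0 \left\{1 + \frac{d}{d\alpha_0}\left(\frac{x^n}{n!}\phi^{(n)}(\theta x)\right)\right\} + \delta\alpha_1 \left\{\frac{x}{l} + \frac{d}{d\alpha_1}\left(\frac{x^n}{n!}\phi^{(n)}(\theta x)\right)\right\} + \text{etc.} \qquad \text{(iii)}$$
where $\theta$ lies between 0 and 1, and $\frac{x^n}{n!}\phi^{(n)}(\theta x) = R$, say, represents the remainder after n terms of Maclaurin's expansion.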

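Substituting in (ii) and rearranging, it becomes
$$\left\{\int (y - y')\left(1 + \frac{dR}{d\alpha_0}\right)dx\right\}\delta\alpha_0 + \left\{\int (y - y')\left(\frac{x}{l} + \frac{dR}{d\alpha_1}\right)dx\right\}\delta\alpha_1 + \left\{\int (y - y')\left(\frac{x^2}{l^2} + \frac{dR}{d\alpha_2}\right)dx\right\}\delta\alpha_2 + \dots = 0.$$
But the quantities $\alpha_0, \alpha_1, \alpha_2, \dots \alpha_{n-1}$ are at our choice, and therefore to satisfy this equation, the coefficients of their variations must be independently zero. Thus we have the following equations to find $\alpha_0, \alpha_1, \dots \alpha_{n-1}$:
$$\int (y - y')\left(1 + \frac{dR}{d\alpha_0}\right)dx = 0,$$
$$\int (y - y')\left(\frac{x}{l} + \frac{dR}{d\alpha_1}\right)dx = 0,$$
$$\int (y - y')\left(\frac{x^2}{l^2} + \frac{dR}{d\alpha_2}\right)dx = 0,$$
$$\dots\dots$$
$$\int (y - y')\left(\frac{x^{n-1}}{l^{n-1}} + \frac{dR}{d\alpha_{n-1}}\right)dx = 0 \qquad \text{(iv)}$$
Now let A be the area, $A\mu_1, A\mu_2, A\mu_3, A\mu_4, \dots$ etc. be the first, second, third, fourth, etc. moments of the theoretical curve, and A' be the area, $A'\mu_1', A'\mu_2', A'\mu_3', A'\mu_4', \dots$ the like moments for the observation curve, moments being taken round the axis of y (which is of course any axis). Then the above equations may be written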

$$A = A' - \int (y - y')\frac{dR}{d\alpha_0}\,dx,$$
$$A\mu_1 = A'\mu_1' - l \int (y - y')\frac{dR}{d\alpha_1}\,dx,$$
$$A\mu_2 = A'\mu_2' - l^2 \int (y - y')\frac{dR}{d\alpha_2}\,dx,$$
$$A\mu_3 = A'\mu_3' - l^3 \int (y - y')\frac{dR}{d\alpha_3}\,dx,$$
$$\dots\dots$$
$$A\mu_{n-1} = A'\mu_{n-1}' - l^{n-1} \int (y - y')\frac{dR}{d\alpha_{n-1}}\,dx \qquad \text{(v)}$$

Now the integral term in these equations must clearly be small because

(i) It involves the small factor y − y'.

(ii) R, the remainder $= \frac{x^n}{n!}\phi^{(n)}(\theta x)$, will by hypothesis be small, if n is at all considerable.

Hence neglecting the integral terms, we find
$$A = A', \quad \mu_1 = \mu_1', \quad \mu_2 = \mu_2', \quad \dots, \quad \mu_{n-1} = \mu_{n-1}' \qquad \text{(vi)}$$

Or, the constants of the theoretical curve are to be found by equating its area and first n − 1 moments to the area and first n − 1 moments of the observed curve. These results having been obtained we may at once replace $\alpha_0, \alpha_1, \alpha_2, \dots \alpha_{n-1}$ by the real constants $c_1, c_2, c_3, \dots c_n$ of the theoretical curve, and we obtain the rule:

To fit a good theoretical curve $y = \phi(x, c_1, c_2, c_3, \dots c_n)$ to an observed curve, express the area and moments of the curve for the given range of observation in terms of $c_1, c_2, c_3, \dots c_n$, and equate these to the like quantities for the observations.

The moments may be taken about any axis parallel to y likely to simplify the results, e.g. the mid-vertical of the range or in other cases the centroid vertical.
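The rule can be illustrated for the simplest case, a parabola of order n − 1 fitted over a range −l to +l about the mid-vertical. The following is only a minimal sketch under stated assumptions: the function name `fit_polynomial_by_moments` and the toy "observed" curve are hypothetical, and the observed area and moments are supposed already known (here integrated exactly rather than found by quadrature).

```python
import numpy as np

def fit_polynomial_by_moments(observed_moments, half_range):
    """Return c_0..c_{n-1} of y = c_0 + c_1 x + ... whose area and first
    n-1 moments over [-half_range, half_range] equal the observed values."""
    n = len(observed_moments)
    l = float(half_range)
    # gram[j][k] = integral of x^(j+k) over [-l, l] (zero when j+k is odd),
    # so row j is the j-th moment of the curve expressed in the constants.
    gram = np.array([[(l ** (j + k + 1) - (-l) ** (j + k + 1)) / (j + k + 1)
                      for k in range(n)] for j in range(n)])
    return np.linalg.solve(gram, np.asarray(observed_moments, dtype=float))

# Hypothetical "observed" curve y' = 1 + 2x + 3x^2 on [-1, 1]: its area and
# first two moments, integrated exactly, are 4, 4/3 and 28/15.
coefficients = fit_polynomial_by_moments([4.0, 4.0 / 3.0, 28.0 / 15.0], 1.0)
print(coefficients)
```

For curves whose constants enter non-linearly the moment equations are no longer linear, and this linear solve would have to be replaced by whatever solution the particular curve admits.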

Returning to equations (v) we see that the solution (vi) is an even closer approximation than might at first sight be imagined. For if we render identical the first n − 1 moments of the two curves, the higher moments of the curves become ipso facto more and more nearly identical the larger n may be. But such a term as
$$\int (y - y')\frac{dR}{d\alpha_s}\,dx$$
vanishes if the higher moments are equal; for we may write
$$R = \frac{x^n}{n!}\phi^{(n)}(0) + \frac{x^{n+1}}{(n+1)!}\phi^{(n+1)}(0) + \dots$$
and accordingly
$$\int (y - y')\frac{dR}{d\alpha_s}\,dx = \frac{d\phi^{(n)}(0)}{d\alpha_s}\frac{1}{n!}\left(A\mu_n - A'\mu_n'\right) + \frac{d\phi^{(n+1)}(0)}{d\alpha_s}\frac{1}{(n+1)!}\left(A\mu_{n+1} - A'\mu_{n+1}'\right) + \frac{d\phi^{(n+2)}(0)}{d\alpha_s}\frac{1}{(n+2)!}\left(A\mu_{n+2} - A'\mu_{n+2}'\right) + \text{etc.}$$
Thus if A = A', we have the factors $(\mu_n - \mu_n')$, $(\mu_{n+1} - \mu_{n+1}')$, $(\mu_{n+2} - \mu_{n+2}')$, etc.

Thus besides the smallness of the factors $\frac{1}{n!}\frac{d\phi^{(n)}(0)}{d\alpha_s}$, $\frac{1}{(n+1)!}\frac{d\phi^{(n+1)}(0)}{d\alpha_s}$, ... etc., depending on the hypothesis of convergency in Maclaurin's expansion, we have the smallness of the factors $\mu_n - \mu_n'$, $\mu_{n+1} - \mu_{n+1}'$, ... depending on the fact that if n − 1 moments of a curve are equal, the succeeding ones will be nearly equal.
We conclude accordingly that equality of moments gives a good method of fitting curves to observations. It is this method of moments which I venture to suggest as a good systematic process, preferable to those in ordinary use when the method of least squares is too laborious or impracticable, for determining the constants of empirical or theoretical curves from observation. This is really the method which has been long in constant use for fitting the normal curve of errors $y = y_0 e^{-x^2/2\sigma^2}$ to observations; it has been largely adopted by myself in fitting skew frequency curves to observations*; and it becomes identical with that of least squares when we fit parabolic curves of any order to observations. It is then no approximation, but the accurate solution, for the expansion by Maclaurin's Theorem is finite.
One great advantage of the method, as will be illustrated below, is that it enables us to determine in many cases the whole of the theoretical curve from a part, if the observations can only be made along a portion of the range.
There are three essentials to the application of the method:
(a) We must be able to ascertain the moments of the theoretical curve in terms of $c_1, c_2, c_3, \dots c_n$.
(b) We must know how to find the moments of any system of observations.
(c) The expressions for the moments in terms of $c_1, c_2, c_3, \dots c_n$ must be such that we can easily solve the system of equations (vi).
I propose now to consider these points in some detail, starting with the second.

* Phil. Trans., Vol. 186, A, pp. 343-414.

(2) On the discovery of the area and moments of a curve given by a series of isolated observations.

The isolated observations may be of two kinds:
(a) Actual measurements may have been made of the ordinates of the curve at p points.
This is the most common case in physical investigations, but it is not infrequent in economic and actuarial enquiries, e.g. the mean age of bridegroom for brides of a given age, or the mean number of years of insurance of those that die at a certain age.
(b) The actual measurements may represent the areas for certain base elements, p in number, of a given curve.
This latter is the usual case of frequency observations. We determine the number of individual cases which fall into each of a small series of ranges of some vital or economic variable, e.g. the number of deaths, which under certain circumstances occur in each year of life, the number of individuals which fall into each small range of a particular organ or character, etc. This is the type of data on which "frequency curves" are based. Actually (b) would sensibly coincide with (a) if we took our elementary ranges for classification extremely small. This, owing either to roughness and paucity of data, or to the immense labour involved, is very often practically impossible. Not uncommonly (a) is used for (b), and for a great majority of cases the work is close enough for the value of the observations. But for fine and important work it is desirable to keep the two classes of cases essentially distinct.

CASE (i). p ordinates of a curve are observed or measured, to find its area and moments.

What we require is clearly
$$A\mu_n = \int y x^n\,dx,$$
from a knowledge of $z = y x^n$ at p points. The answer to this problem is familiar, and consists in the choice of a good quadrature formula. Whether we are dealing with the ordinates y simply, or the more complex moments, $y x^n$, will make theoretically no difference, except that in the latter case we may have to go to higher differences for the purpose of reaching accuracy.

If I venture here to deal at some length with quadrature formulae, it is because the choice of a good formula is essential to the application of the method of moments. At the same time, although much will be familiar, there are new and novel points to which I want to draw special attention. For this portion of my paper I am chiefly indebted to the kindness of Mr W. F. Sheppard of Trinity College, Cambridge. I told him that I wanted the best correctional terms for the tangential and chordal areas, and the working-out of the system of formulae is entirely due to him.

An area may be looked at as given in two ways: (i) by extreme ordinates, or (ii) by mid-ordinates. The former we will represent by $z_0, z_1, z_2, z_3, \dots z_p$ and the latter by $z_{1/2}, z_{3/2}, z_{5/2}, \dots z_{p-3/2}, z_{p-1/2}$. These ordinates will be supposed taken at equal distances, h, and for the purposes of practical calculation, h can nearly always be taken as our horizontal unit. We have thus the two schemes:

[Diagram: the two schemes of equidistant ordinates on the base from $x_0$ to $x_p$.]

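For these cases we have respectively the Euler-Maclaurin formulae
$$\int_{x_0}^{x_p} z\,dx = A_C + h\left(\gamma_1\Delta - \gamma_2\Delta^2 + \gamma_3\Delta^3 - \gamma_4\Delta^4 + \dots\right)(z_0 + z_p) \qquad (\alpha),$$
$$\int_{x_0}^{x_p} z\,dx = A_T - h\left(\gamma_1'\Delta - \gamma_2'\Delta^2 + \gamma_3'\Delta^3 - \gamma_4'\Delta^4 + \dots\right)(z_{1/2} + z_{p-1/2}) \qquad (\beta).$$
Here
$$A_C = h\left(\tfrac{1}{2}z_0 + z_1 + z_2 + \dots + z_{p-1} + \tfrac{1}{2}z_p\right),$$
and
$$A_T = h\left(z_{1/2} + z_{3/2} + z_{5/2} + \dots + z_{p-3/2} + z_{p-1/2}\right),$$
are respectively the areas of the chordal and tangential series of trapezia. Thus the formulae ($\alpha$) and ($\beta$) give the corrections which are respectively to be added and subtracted from what we may term for brevity the chordal and tangential areas in order to obtain the curved area.

In the above formulae $\Delta$ operating on $z_p$ and $z_{p-1/2}$ must be taken backwards, i.e. $\Delta z_p = z_{p-1} - z_p$ and $\Delta z_{p-1/2} = z_{p-3/2} - z_{p-1/2}$, while
$$\Delta z_0 = z_1 - z_0, \quad \Delta z_{1/2} = z_{3/2} - z_{1/2}.$$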
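Formula (α) may be tried numerically on an integral of known value. What follows is a minimal sketch, not the author's own computation: the function names are hypothetical, the four coefficients are written as the exact fractions 1/12, 1/24, 19/720 and 3/160 (which agree with the tabulated decimals of the first four γ), and the test integrand 1/(1 + x) on the range 0 to 1 with 13 ordinates is chosen only for illustration.

```python
import math

# gamma_1 .. gamma_4 as exact fractions (assumed equal to the tabulated decimals)
GAMMA = [1 / 12, 1 / 24, 19 / 720, 3 / 160]

def leading_differences(values, order):
    """Return the differences of orders 1..order belonging to values[0]."""
    out, row = [], list(values)
    for _ in range(order):
        row = [b - a for a, b in zip(row, row[1:])]
        out.append(row[0])
    return out

def chordal_area(z, h):
    """A_C: half weight on the two bounding ordinates, full weight elsewhere."""
    return h * (0.5 * z[0] + sum(z[1:-1]) + 0.5 * z[-1])

def euler_maclaurin_alpha(z, h, order=4):
    """Formula (alpha): chordal area plus correction with `order` differences."""
    start = leading_differences(z, order)        # differences of z_0
    end = leading_differences(z[::-1], order)    # differences of z_p, taken backwards
    correction = sum((-1) ** k * GAMMA[k] * (start[k] + end[k])
                     for k in range(order))
    return chordal_area(z, h) + h * correction

h = 1 / 12
z = [1 / (1 + i * h) for i in range(13)]
print(chordal_area(z, h) - math.log(2))           # error of the bare chordal area
print(euler_maclaurin_alpha(z, h) - math.log(2))  # error after four differences
```

Reversing the list before differencing is what takes Δ "backwards" at the upper end, so that the first difference of the reversed list is $z_{p-1} - z_p$.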
The values of the coefficients $\gamma$ are as follows*:

    γ1  = .083,3333      γ1'  = .041,6667
    γ2  = .041,6667      γ2'  = .041,6667
    γ3  = .026,3889      γ3'  = .038,7153
    γ4  = .018,7500      γ4'  = .035,7639
    γ5  = .014,2692      γ5'  = .033,1918
    γ6  = .011,3674      γ6'  = .030,9989
    γ7  = .009,3563      γ7'  = .029,1253
    γ8  = .007,8925      γ8'  = .027,5110
    γ9  = .006,7858      γ9'  = .026,1066
    γ10 = .005,9241      γ10' = .024,8732
    γ11 = .005,2367      γ11' = .025,7807
    γ12 = .004,6775      γ12' = .022,8052
    γ13 = .004,2150
    γ14 = .003,8269

Now the Euler-Maclaurin formulae possess marked merits and defects:

(a) The correction terms being usually small, they equally weight all the observations in the bulk $A_C$ and $A_T$ of the formulae†. This is of much importance when the observations are liable to considerable error.
(b) They will give the best possible results if we go to the complete system of differences for the p ordinates.
But:
(c) To do this involves in most cases very great labour. The coefficients $\gamma$ do not converge very rapidly, and the $\Delta$'s in many practical cases, especially of frequency, do not get rapidly small.
(d) If we stop at the third or fourth difference, then the $\gamma$ coefficients are not the best coefficients by which to multiply the successive differences, but the best coefficients differ considerably from these if p be not very large.

In order to get over (c) a number of formulae have been used which depend upon the number of ordinates used being a multiple of 2, 3, 4, 6, etc. Thus we have the following rules:

Simpson's Rule (2p elements),
$$\int_{x_0}^{x_{2p}} z\,dx = \tfrac{1}{3}h\left\{z_0 + 2(z_2 + z_4 + \dots + z_{2p-2}) + z_{2p} + 4(z_1 + z_3 + \dots + z_{2p-1})\right\} \qquad (\gamma).$$

Newton's Rule (3p elements),
$$\int_{x_0}^{x_{3p}} z\,dx = \tfrac{3}{8}h\left\{z_0 + 3(z_1 + z_2 + z_4 + z_5 + z_7 + z_8 + \dots) + z_{3p} + 2(z_3 + z_6 + \dots + z_{3p-3})\right\} \qquad (\delta).$$

Boole's Rule* (4p elements),
$$\int_{x_0}^{x_{4p}} z\,dx = \tfrac{2}{45}h\left\{7z_0 + 14(z_4 + z_8 + \dots + z_{4p-4}) + 7z_{4p} + 32(z_1 + z_3 + z_5 + z_7 + \dots + z_{4p-1}) + 12(z_2 + z_6 + z_{10} + \dots + z_{4p-2})\right\} \qquad (\epsilon).$$

Weddle's Rule (6p elements),
$$\int_{x_0}^{x_{6p}} z\,dx = \tfrac{3}{10}h\left\{z_0 + z_2 + z_4 + z_8 + z_{10} + \dots + z_{6p-2} + z_{6p} + 2(z_6 + z_{12} + \dots + z_{6p-6}) + 5(z_1 + z_5 + z_7 + \dots + z_{6p-1}) + 6(z_3 + z_9 + z_{15} + \dots + z_{6p-3})\right\} \qquad (\zeta).$$

* Calculated from the formula given by De Morgan; Differential and Integral Calculus, Art. 61, p. 262.
† Except in the case of the first and last ordinates of $A_C$, which clearly can only be given half weight.
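The four rules differ only in the pattern of weights laid over each group of elements, which suggests one routine serving for all. The sketch below assumes as much; the dictionary `RULES`, the function `composite_rule` and the test integrand 1/(1 + x) with 12 elements are illustrative choices, not part of the original memoir.

```python
import math

# (elements per panel, multiplier of h, weights across one panel)
RULES = {
    "simpson": (2, 1 / 3, [1, 4, 1]),
    "newton": (3, 3 / 8, [1, 3, 3, 1]),
    "boole": (4, 2 / 45, [7, 32, 12, 32, 7]),
    "weddle": (6, 3 / 10, [1, 5, 1, 6, 1, 5, 1]),
}

def composite_rule(z, h, name):
    """Apply the named rule panel by panel to equidistant ordinates z_0..z_p."""
    width, factor, weights = RULES[name]
    elements = len(z) - 1
    if elements % width:
        raise ValueError(f"{name} needs a multiple of {width} elements")
    total = 0.0
    # Ordinates shared by two panels are counted once in each, which yields
    # the doubled interior weights (2, 2, 14, 2) of the formulae above.
    for start in range(0, elements, width):
        total += sum(w * z[start + i] for i, w in enumerate(weights))
    return factor * h * total

h = 1 / 12
z = [1 / (1 + i * h) for i in range(13)]
errors = {name: composite_rule(z, h, name) - math.log(2) for name in RULES}
for name, err in errors.items():
    print(f"{name:8s} {err:+.8f}")
```

With 7, 11 or 13 elements at least one of these rules raises an error, which is the first of the disadvantages the text goes on to list.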

All these rules give with increasing exactness the value of the integral, but they suffer under obvious disadvantages:
(a) The number of elements cannot often be selected beforehand, and if for example there be 7 or 11 or 13 a new rule has at once to be worked out.
(b) The multiplying different ordinates by different factors is a fruitful source of arithmetical slips.
(c) The multiplying of certain ordinates by factors much larger than others, multiplies the error made in the determination of certain ordinates largely. We do not give equal weight to all the ordinates.

Thus, while formulae like ($\epsilon$) or ($\zeta$) give extremely good results, especially for the integration of continuous mathematical functions, and this with less work than ($\alpha$) or ($\beta$), they do not seem advantageous for what we may term observation-curves. Accordingly Mr Sheppard has determined† the best coefficients for the corrections to the chordal and tangential areas when one, two or three differences only are used. He has provided the following quadrature formulae which seem to me of much interest and practical value.

* I do not know who originated this rule; it is given in Boole's Finite Differences.
† Mr Sheppard, since this memoir was written, has given the proofs of his formulae, L. Math. Soc. Proc., Vol. 32, p. 270.

Case (i). Bounding ordinates or chordal area known.

(a) One Difference:
$$\text{Area} = A_C + \frac{1}{12}\,\frac{p}{p-1}\left\{(z_1 - z_0) - (z_p - z_{p-1})\right\}h \qquad (\eta).$$
If we take p/(p − 1) to be approximately unity, this formula reduces to ($\alpha$) retaining only the first difference.

(b) Two Differences:
$$\text{Area} = A_C + \frac{1}{120}\,\frac{p(15p - 26)}{(p-1)(p-2)}\left\{(z_1 - z_0) - (z_p - z_{p-1})\right\}h - \frac{1}{120}\,\frac{p(5p - 6)}{(p-2)(p-3)}\left\{(z_2 - z_1) - (z_{p-1} - z_{p-2})\right\}h \qquad (\theta).$$

(c) Three Differences:
$$\text{Area} = A_C + \frac{1}{5040}\,\frac{p(763p^2 - 3444p + 3636)}{(p-1)(p-2)(p-3)}\left\{(z_1 - z_0) - (z_p - z_{p-1})\right\}h$$
$$\quad - \frac{1}{1260}\,\frac{p(119p^2 - 504p + 432)}{(p-2)(p-3)(p-4)}\left\{(z_2 - z_1) - (z_{p-1} - z_{p-2})\right\}h$$
$$\quad + \frac{1}{5040}\,\frac{p(133p^2 - 462p + 360)}{(p-3)(p-4)(p-5)}\left\{(z_3 - z_2) - (z_{p-2} - z_{p-3})\right\}h \qquad (\iota).$$
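The three chordal formulae of Case (i) lend themselves to a short numerical trial. The sketch below is offered under explicit assumptions: the function name `sheppard_chordal` is hypothetical, and the coefficients, in particular the numerators 15p − 26 and 5p − 6 of the two-difference formula, are a reading of a poorly legible print and ought to be checked against Mr Sheppard's own paper before being relied upon. The integrand 1/(1 + x) on 0 to 1 with 13 ordinates is again used for trial.

```python
import math

def sheppard_chordal(z, h, differences):
    """Chordal area corrected with 1, 2 or 3 first-difference brackets.
    Coefficients are assumed readings of the formulae of Case (i)."""
    p = len(z) - 1                       # number of elements
    area = h * (0.5 * z[0] + sum(z[1:-1]) + 0.5 * z[-1])

    def bracket(k):
        # {(z_k - z_{k-1}) - (z_{p-k+1} - z_{p-k})}
        return (z[k] - z[k - 1]) - (z[p - k + 1] - z[p - k])

    if differences == 1:
        terms = [p / (12 * (p - 1)) * bracket(1)]
    elif differences == 2:
        terms = [p * (15 * p - 26) / (120 * (p - 1) * (p - 2)) * bracket(1),
                 -p * (5 * p - 6) / (120 * (p - 2) * (p - 3)) * bracket(2)]
    elif differences == 3:
        terms = [p * (763 * p**2 - 3444 * p + 3636)
                 / (5040 * (p - 1) * (p - 2) * (p - 3)) * bracket(1),
                 -p * (119 * p**2 - 504 * p + 432)
                 / (1260 * (p - 2) * (p - 3) * (p - 4)) * bracket(2),
                 p * (133 * p**2 - 462 * p + 360)
                 / (5040 * (p - 3) * (p - 4) * (p - 5)) * bracket(3)]
    else:
        raise ValueError("differences must be 1, 2 or 3")
    return area + h * sum(terms)

h = 1 / 12
z = [1 / (1 + i * h) for i in range(13)]
errors = [sheppard_chordal(z, h, d) - math.log(2) for d in (1, 2, 3)]
print(errors)
```

Only first differences of neighbouring ordinates enter the correction, so every ordinate away from the two ends keeps the equal weight it has in the chordal area.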

Case (ii). Mid-ordinates or tangential area known.

(a) One Difference:
$$\text{Area} = A_T - \frac{1}{24}\,\frac{p}{p-2}\left\{(z_{3/2} - z_{1/2}) - (z_{p-1/2} - z_{p-3/2})\right\}h \qquad (\kappa).$$

(b) Two Differences:
$$\text{Area} = A_T - \frac{1}{960}\,\frac{p(80p - 177)}{(p-2)(p-3)}\left\{(z_{3/2} - z_{1/2}) - (z_{p-1/2} - z_{p-3/2})\right\}h + \frac{1}{960}\,\frac{p(40p - 57)}{(p-3)(p-4)}\left\{(z_{5/2} - z_{3/2}) - (z_{p-3/2} - z_{p-5/2})\right\}h \qquad (\lambda).$$

This formula has many advantages, it is more exact than ($\kappa$), and although less so than ($\mu$) is sufficient for most practical purposes. It weights in the bulk of the formula, $A_T$, all the ordinates equally and thus is superior to those of Case (i) which give only half-weight to the terminal ordinates. In order to facilitate its use, writing it in the form
$$\text{Area} = A_T - P\left\{(z_{3/2} - z_{1/2}) - (z_{p-1/2} - z_{p-3/2})\right\}h + Q\left\{(z_{5/2} - z_{3/2}) - (z_{p-3/2} - z_{p-5/2})\right\}h,$$
the values of P and Q have been tabulated for 8 to 20 ordinates inclusive by Mr Leslie Bramley-Moore. They are as follows:

     p        P             Q
     8    .128,6111     .109,5833
     9    .121,2054     .094,6875
    10    .115,8854     .085,0694
    11    .111,8779     .078,3668
    12    .108,7500     .073,4375
    13    .106,2405     .069,6644
    14    .104,1825     .066,6856
    15    .102,4639     .064,2756
    16    .101,0073     .062,2863
    17    .099,7569     .060,6170
    18    .098,6719     .059,1964
    19    .097,7214     .057,9731
    20    .096,8818     .056,9087
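The tabulated P and Q can be recomputed from closed expressions, which also serves as a check upon the reading of the two-difference formula for mid-ordinates. In this sketch the expressions P = p(80p − 177)/960(p − 2)(p − 3) and Q = p(40p − 57)/960(p − 3)(p − 4) are taken as assumptions to be tested against the table, and the function names are hypothetical.

```python
import math

def sheppard_p_q(p):
    """Assumed closed forms for the tabulated multipliers P and Q."""
    big_p = p * (80 * p - 177) / (960 * (p - 2) * (p - 3))
    big_q = p * (40 * p - 57) / (960 * (p - 3) * (p - 4))
    return big_p, big_q

def tangential_two_differences(z_mid, h):
    """Tangential area corrected by the P and Q brackets.
    z_mid holds the p mid-ordinates z_{1/2}, z_{3/2}, ..., z_{p-1/2}."""
    p = len(z_mid)
    big_p, big_q = sheppard_p_q(p)
    first = (z_mid[1] - z_mid[0]) - (z_mid[-1] - z_mid[-2])
    second = (z_mid[2] - z_mid[1]) - (z_mid[-2] - z_mid[-3])
    return h * sum(z_mid) - big_p * first * h + big_q * second * h

for p in range(8, 21):
    print(p, "%.7f %.7f" % sheppard_p_q(p))

h = 1 / 12
z_mid = [1 / (1 + (i + 0.5) * h) for i in range(12)]
error = tangential_two_differences(z_mid, h) - math.log(2)
print(error)
```

Agreement of the recomputed values with the printed table to seven places is some evidence that the expressions have been read correctly, though it proves nothing about the derivation.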
This formula will give results more close as a rule than Simpson's, and it possesses the great advantage of only weighting particular ordinates in the correctional terms.

(c) Three Differences:
$$\text{Area} = A_T - \frac{1}{80640}\,\frac{p(9842p^2 - 53970p + 70407)}{(p-2)(p-3)(p-4)}\left\{(z_{3/2} - z_{1/2}) - (z_{p-1/2} - z_{p-3/2})\right\}h$$
$$\quad + \frac{1}{40320}\,\frac{p(4802p^2 - 23016p + 22905)}{(p-3)(p-4)(p-5)}\left\{(z_{5/2} - z_{3/2}) - (z_{p-3/2} - z_{p-5/2})\right\}h$$
$$\quad - \frac{1}{80640}\,\frac{p(3122p^2 - 12222p + 10935)}{(p-4)(p-5)(p-6)}\left\{(z_{7/2} - z_{5/2}) - (z_{p-5/2} - z_{p-7/2})\right\}h \qquad (\mu).$$

Special and occasionally useful Cases.

Case (iii). Mid-ordinates and two extreme ordinates known.

(a) One Difference:
$$\text{Area} = A_T - \frac{1}{12}\,\frac{2p}{2p-1}\left\{(z_{1/2} - z_0) - (z_p - z_{p-1/2})\right\}h \qquad (\nu).$$
If 2p/(2p − 1) be taken as approximately unity, this becomes a formula well-known on the continent as Parmentier's.
(b) Two Differences:
2p(4Op-57)
Area = AT- 1 {( I -h
180(2p- 1) (2p - 3) (- ).
1
+ 180
2p (5p -6) _
(,-z)-(pi-z_)
(2p - 2) (2p - 3)t(i-z)-(1-z4}
........ (.0

Case (iv). Bounding ordinates with the two mid-ordinates only of the terminal elements.

(a) One Difference:
$$\text{Area} = A_C + \frac{1}{6}\,\frac{2p}{2p-1}\left\{(z_{1/2} - z_0) - (z_p - z_{p-1/2})\right\}h \qquad (o).$$

(b) Two Differences:


I
2p (30p -:1)
Area =AO+j
= + t(z*- zo) - (zp - zp-j)} h
T20(2p -1) (p-1)
1 2p-()Op- 9) RZ zi)-(zp- ............... h
zp-,)} (7r).
1'2-0(2p - 3) (p - 1){(1-z)-(p-z,)h . r)

If p be fairly large this is not very divergent from
$$\text{Area} = A_C + \tfrac{1}{4}\left\{(z_{1/2} - z_0) - (z_p - z_{p-1/2})\right\}h - \tfrac{1}{12}\left\{(z_1 - z_{1/2}) - (z_{p-1/2} - z_{p-1})\right\}h \qquad (\rho),$$
which may be obtained directly by a double application of Simpson's Formula, and is somewhat more exact than the latter.

It is, perhaps, worth while exhibiting the sort of relative exactness to be obtained by the whole series of formulae on a special example, say $\int_0^1 \frac{dx}{1+x}$ for 12 or 13 ordinates. We find
$$\int_0^1 \frac{dx}{1+x} = .693,147,18.$$

    Formula                          Divergence
    (α), with four differences       + .000,000,25
    (β), with four differences       − .000,000,59
    (γ)                              + .000,001,48
    (δ)                              + .000,003,28
    (ε)                              + .000,000,07
    (ζ)                              + .000,000,04
    (η)                              + .000,014,59
    (θ)                              + .000,000,93
    (ι)                              + .000,000,07
    (κ)                              − .000,014,93
    (λ)                              − .000,001,26
    (μ)                              − .000,000,12
    (ν)                              − .000,003,91
    (ξ)                              − .000,000,14
    (ο)                              + .000,008,12
    (π)                              + .000,000,22
    (ρ)                              + .000,001,27

It may be noticed also that*

$A_C$ = .693,580,83, or Δ = + .000,433,65,
$A_T$ = .692,930,49, or Δ = − .000,216,69.

The latter is less divergent from the true value than the former, but they differ by as much as 1 in 3200 and 1 in 1600 respectively from the true value. On the other hand the worst of the above quadrature formulae ($\kappa$) and ($\eta$) give results only about 1 in 48,000 in error, while the best, like Boole's or Weddle's Rules, or ($\iota$) and ($\mu$), vary from about 1 in 6,000,000 to 1 in 17,000,000, while ($\xi$) and ($\pi$) are almost as good. When we are dealing with frequency we probably never, and often when we are dealing with measurements, physical or economic, we do not, know our data with anything like the accuracy of 1 in 48,000. We conclude therefore that we may expect good results from most of these formulae. But some remarks on their relative goodness may be of service. In the first place the Euler-Maclaurin formulae ($\alpha$) and ($\beta$) with four differences are not nearly as good as Mr Sheppard's new formulae ($\iota$) and ($\mu$) using only three differences, and not so good as ($\xi$) or ($\pi$) with two differences. It seems to me accordingly that unless we are prepared to go to great labour and calculate high differences, ($\iota$), ($\mu$), ($\xi$) or ($\pi$) are the best formulae to use, and that for nearly all practical purposes ($\theta$) and ($\lambda$) are quite accurate enough for use. Boole's Rule ($\epsilon$) and Weddle's Rule ($\zeta$) give splendid results, but great care must be taken when we apply them to somewhat irregular observations of physical quantities and to frequencies, and not to the evaluation of mathematical integrals, for in the bulk of the formulae they weight and largely weight certain ordinates, and thus may tend to emphasise errors in particular observations.

(3) It seems well to illustrate the application of these formulae to a special case, although in doing so I anticipate some of the results to be reached later. Let us try and fit by the method of moments a parabola of the third order to the following data:

     x       y            x       y
     0      .382         .6     1.270
    .1      .674         .7     1.215
    .2      .923         .8     1.137
    .3     1.104         .9      .989
    .4     1.214        1.0      .819
    .5     1.273

These data are really a series of measurements on Aneroid Barometers published by Dr Chree in a paper in the Phil. Trans., Vol. 191, A., p. 448. They will serve as well as any others, however, as an illustration of method.

* Clearly: Area $= A_T - \tfrac{1}{3}(A_T - A_C)$, nearly. This is a very useful formula, based on an assumption as to parabolic segments like Simpson's, when both extreme and mid-ordinates are available.

We want to determine the values of the constants a, b, c, d, when a curve of type
$$y = a + bx + cx^2 + dx^3$$
is fitted to the above data.
In using the method of moments we require to evaluate $\int y x^n\,dx$ up to n = 3 from a knowledge of its value at a number of isolated points. In order to do this we require to use a quadrature formula, and the exactness of our results will increase as we use better formulae. The object of this illustration is to show the increasing accuracy of different quadrature formulae. The actual values of a, b, c, d are given in terms of the moments in the second part of this paper. In calculating the moments x = .5 was taken as origin, and in each case the same quadrature formula was used for the area and all the moments. The following methods were used (R.M.S. stands for root mean square of the error of ordinate at the 11 given points):
I. The curve was taken through four selected points. This method was adopted by Dr Chree, and I have merely transferred the result obtained by him to the centre of the range:
$$y = 1.269,100 + .024,000x - .027,320x^2 + .000,969x^3,$$
R.M.S. = .0126.

II. The area and moments were evaluated by treating $A_T$ as if it were A:
$$y = 1.270,290 + .033,402x - .026,806x^2 + .000,3279x^3,$$
R.M.S. = .0094.

III. The area and moments were evaluated by Parmentier's Rule, or ($\nu$) with 2p/(2p − 1) put unity:
$$y = 1.263,808 + .032,311x - .026,380x^2 + .000,4113x^3,$$
R.M.S. = .0089.

IV. The area and moments were evaluated by Simpson's Rule ($\gamma$):
$$y = 1.270,130 + .027,046x - .027,180x^2 + .000,7326x^3,$$
R.M.S. = .0070.

V. The area and moments were evaluated by Sheppard's Rule ($\lambda$):
$$y = 1.268,898 + .029,388x - .026,853x^2 + .000,5764x^3,$$
R.M.S. = .0057.

VI. The curve was fitted by the method of least squares:
$$y = 1.268,800 + .028,700x - .026,880x^2 + .000,6167x^3,$$
R.M.S. = .0055.
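Two of these results, II and VI, are easily recomputed from the eleven tabulated ordinates. The sketch below rests on assumptions that should be stated plainly: x is measured from the middle of the range in units of one-tenth, so that it runs from −5 to +5; for method II the plain sum of the ordinates times a power of x is taken as the observed moment, and the theoretical moments are integrated over −5.5 to +5.5, i.e. over eleven elements of unit width. The variable names are illustrative only.

```python
import numpy as np
from numpy.polynomial import polynomial as P

y = np.array([.382, .674, .923, 1.104, 1.214, 1.273,
              1.270, 1.215, 1.137, .989, .819])
t = np.arange(-5.0, 6.0)          # x from the mid-point, in units of 0.1

def rms(coeffs):
    """Root mean square error of ordinate at the 11 given points."""
    return float(np.sqrt(np.mean((y - P.polyval(t, coeffs)) ** 2)))

# VI: ordinary least squares cubic (coefficients lowest power first).
least_squares = P.polyfit(t, y, 3)

# II: method of moments, the ordinates being treated as mid-ordinates and
# the tangential sums used as if they were the true area and moments.
observed = np.array([np.sum(y * t ** k) for k in range(4)])
l = 5.5                            # assumed half-range: 11 elements of width 1
gram = np.array([[(l ** (j + k + 1) - (-l) ** (j + k + 1)) / (j + k + 1)
                  for k in range(4)] for j in range(4)])
by_moments = np.linalg.solve(gram, observed)

print(least_squares, rms(least_squares))
print(by_moments, rms(by_moments))
```

The moment fit differs from the least-squares fit here only because the crude quadrature misstates the moments; a better quadrature formula would bring the two together, as cases III to V indicate.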

Now these results show us at once that with ($\lambda$) we have a fit by the method of moments sensibly as good as that obtained by the method of least squares. Had we used ($\iota$) or ($\mu$) there could not have been any difference in the R.M.S. between the method of moments and the method of least squares. There is of course a distinction between the two methods which it is important to bear in mind, namely: the method of least squares takes a curve which passes with the least root mean square deviation from 11 observation points; the method of moments takes a curve which has the least root mean square deviation from all the points of some smooth curve with a moment system determined by the 11 points. Hence it is quite possible that the method of moments may actually give better results than the method of least squares in such a case as the above, if after the determination of the curve it becomes necessary to compare experience and observation at other points than the eleven used in the first determination.
Fig. 1 gives the theoretical curves and the points of observation in cases I, II, IV, V, and VI.

[Fig. 1. Fitting of Parabolas to Observations.]
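(4) CASE (ii). The frequency z of individuals falling within p elementary ranges of a total range ph is observed or measured, to determine the true mean and moments of the system.

Let $y = f(x)$ be the curve giving the frequency distribution, and $z_r$ the frequency observed within the range of the variable x from $x = x_{r-1}$ to $x = x_r$. Then what we actually observe are

$$z_1 = \int_{x_0}^{x_1} y\,dx,\quad z_2 = \int_{x_1}^{x_2} y\,dx,\ \ldots\ z_p = \int_{x_{p-1}}^{x_p} y\,dx.$$

Let N be the total number of observations, or

$$N = z_1 + z_2 + \ldots + z_p.$$

For the nth moment about a line through the origin perpendicular to the range we require

$$N\mu_n' = \int_{x_0}^{x_p} x^n y\,dx.$$

Now let

$$Z = \int_{x}^{x_p} y\,dx,$$

i.e. be all the frequency from $x = x$ to $x = x_p$, or above the value x. Then

$$Z_0 = \int_{x_0}^{x_p} y\,dx,\quad Z_1 = \int_{x_1}^{x_p} y\,dx,\ \ldots\ Z_p = \int_{x_p}^{x_p} y\,dx$$

are known and given by

$$N,\quad z_2 + z_3 + \ldots + z_p,\quad z_3 + z_4 + \ldots + z_p,\quad z_4 + z_5 + \ldots + z_p,\ \ldots\ z_p,\quad 0.$$

Since $dZ/dx = -y$, we have

$$N\mu_n' = -\int_{x_0}^{x_p} x^n\,dZ = \Big[-Zx^n\Big]_{x_0}^{x_p} + n\int_{x_0}^{x_p} x^{n-1}Z\,dx = Z_0x_0^n + n\int_{x_0}^{x_p} Zx^{n-1}\,dx.$$

Thus

$$\mu_n' = x_0^n + \frac{n}{N}\int_{x_0}^{x_p} Zx^{n-1}\,dx \quad \ldots\ldots\ldots \text{(vii)}.$$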

This is the fundamental formula for finding the true moments of frequency distributions from the grouped frequencies. The rule is clear. In order to evaluate $\int_{x_0}^{x_p} Zx^{n-1}\,dx$, since we know the value of $Zx^{n-1}$ for $x = x_0, x_1, x_2 \ldots x_p$, we have to find the area of a curve of which we are given p + 1 ordinates; we have accordingly to use the best available quadrature formula, taking care that the exactness of the formula corresponds to the degree of the moment investigated.

For practical working, since $Z_0 = N$ is large, it is convenient to take $x_0 = 0$, and our formula then becomes

$$\mu_n' = \frac{n}{N}\int_{0}^{x_p} Zx^{n-1}\,dx \quad \ldots\ldots\ldots \text{(viii)}.$$

Here we must be very careful to notice that our origin is the start of the base-element in which the frequency begins, that $Z_0 = N$ is the total frequency, and $Z_p$ is zero, and that $x_p$ is measured to the end of the last base-element h for which we are considering the frequency. Thus a length $x_p + h$, and not $x_p$, would be the total range we should obtain by plotting the frequencies z as if they were ordinates at the middle of the elements. This process therefore tends to exaggerate the range. As a rule it is convenient in frequency distributions to determine the μ's about the mean. In this case they may be found from the μ''s about any other line by the formulae

$$\begin{aligned}
\mu_1 &= 0,\\
\mu_2 &= \mu_2' - \mu_1'^2,\\
\mu_3 &= \mu_3' - 3\mu_1'\mu_2' + 2\mu_1'^3,\\
\mu_4 &= \mu_4' - 4\mu_1'\mu_3' + 6\mu_1'^2\mu_2' - 3\mu_1'^4,\\
\mu_5 &= \mu_5' - 5\mu_1'\mu_4' + 10\mu_1'^2\mu_3' - 10\mu_1'^3\mu_2' + 4\mu_1'^5,\\
\mu_6 &= \mu_6' - 6\mu_1'\mu_5' + 15\mu_1'^2\mu_4' - 20\mu_1'^3\mu_3' + 15\mu_1'^4\mu_2' - 5\mu_1'^6
\end{aligned} \quad \ldots\ldots \text{(ix)}.$$
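The transfer to the mean in (ix) is easily mechanised. The following is a minimal sketch, not part of the memoir, covering only the first four moments; the function name is illustrative, and the input values are the Sheppard-corrected moments about j quoted in Illustration I.

```python
def central_from_raw(m1, m2, m3, m4):
    """Transfer moments about an arbitrary line to the mean, following (ix)."""
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu2, mu3, mu4

# Moments about j from Illustration I (fecundity of brood mares).
mu2, mu3, mu4 = central_from_raw(0.4945, 5.694167, 4.732875, 91.933917)
print(round(mu2, 6), round(mu3, 6), round(mu4, 6))
```

Under these inputs the results agree with the values 5.449,636, -3.472,584 and 90.747,281 quoted in Illustration I to about the fifth decimal place.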

Should the frequency observations we are dealing with cover a complete distribution we can proceed somewhat differently. Let $y = f(x)$ be the frequency distribution and let it be absolutely confined within the range l of x. If we take $x = 0$ at one end of this range we have for the integral curve $Z = \int_x^l y\,dx$. Now, whatever be the form of the frequency distribution, whether it gives a curve of high contact or not at $x = 0$ and $x = l$, it must follow, if the range be absolutely limited, that $Z = 0$ for $x = l$, and $Z = \text{constant} = N$ at $x = 0$. But usually there is contact of a high order at one or both ends of the range. I shall therefore work out the modifications which must be made in the moments when there is high contact at one end at least, say at $x = l$.

Thus for $x = l$, we have

$$Z = 0,\quad \frac{dZ}{dx} = -y = 0,\quad \frac{d^2Z}{dx^2} = 0,\quad \frac{d^3Z}{dx^3} = 0,\ \text{etc.},$$

and for $x = 0$ we have

$$Z = N,\quad \frac{dZ}{dx} = 0,\quad \frac{d^2Z}{dx^2} = a_2,\quad \frac{d^3Z}{dx^3} = a_3,\ \ldots$$

where $a_2, a_3, \ldots a_s \ldots$ define the contact of the integral curve at the origin with the line $Z = N$, and will be supposed for the time being known.

The frequency curve and its integral curve will accordingly take the form indicated in the diagram below.

[Diagram: the frequency curve and its integral curve over the range x = 0 to x = l.]

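Now by the Euler-Maclaurin formula

$$\int_0^l Z'\,dx = \left(\tfrac12 Z_0' + Z_1' + Z_2' + \ldots + Z_{p-1}' + \tfrac12 Z_p'\right)h - \left[\frac{B_1h^2}{2!}\frac{dZ'}{dx} - \frac{B_3h^4}{4!}\frac{d^3Z'}{dx^3} + \frac{B_5h^6}{6!}\frac{d^5Z'}{dx^5} - \ldots\right]_0^l,$$

where $B_1 = \tfrac16$, $B_3 = \tfrac1{30}$, $B_5 = \tfrac1{42}$ are Bernoulli's numbers.

The expression in square brackets vanishes at the upper limit not only for $Z' = Z$, but for $Z' = Zx^s$, since every differential containing either Z or one of its differential coefficients is zero at $x = l$.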

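At the lower limit we have by applying Leibnitz's Theorem

$$\left[\frac{d^n(Zx^s)}{dx^n}\right]_0 = n(n-1)(n-2)\ldots(n-s+1)\left[\frac{d^{n-s}Z}{dx^{n-s}}\right]_0 = n(n-1)(n-2)\ldots(n-s+1)\,a_{n-s},$$

provided n be greater than s, otherwise it is zero, unless n = s, when we have the value $s!\,N$.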

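Now let $C_sh$ stand for the chordal area

$$\left(\tfrac12 Z_0x_0^s + Z_1x_1^s + Z_2x_2^s + \ldots + \tfrac12 Z_px_p^s\right)h,$$

where of course $x_0 = 0$, $x_1 = h$, $x_2 = 2h$, etc., if h be the base element. Then we easily find

$$\begin{aligned}
\int Z\,dx &= C_0h - \frac{h^4}{720}a_3 + \frac{h^6}{30240}a_5 - \ldots\\
\int Zx\,dx &= C_1h + \frac{h^2}{12}N - \frac{h^4}{240}a_2 + \frac{h^6}{6048}a_4 - \ldots\\
\int Zx^2\,dx &= C_2h + \frac{h^6}{1512}a_3 - \frac{h^8}{28800}a_5 + \ldots\\
\int Zx^3\,dx &= C_3h - \frac{h^4}{120}N + \frac{h^6}{504}a_2 - \frac{h^8}{5760}a_4 + \ldots\\
\int Zx^4\,dx &= C_4h - \frac{h^8}{1440}a_3 + \frac{h^{10}}{15840}a_5 - \ldots\\
\int Zx^5\,dx &= C_5h + \frac{h^6}{252}N - \frac{h^8}{480}a_2 + \frac{h^{10}}{3168}a_4 - \ldots
\end{aligned} \quad \ldots\ldots \text{(x)},$$

where the values of the Bernoulli numbers have been substituted. Now let us use (viii) and write $a_s/N = a_s'$, then we find

$$\begin{aligned}
\mu_1' &= \frac{C_0h}{N} - \frac{h^4}{720}a_3' + \frac{h^6}{30240}a_5' - \ldots\\
\mu_2' &= \frac{2C_1h}{N} + \frac{h^2}{6} - \frac{h^4}{120}a_2' + \frac{h^6}{3024}a_4' - \ldots\\
\mu_3' &= \frac{3C_2h}{N} + \frac{h^6}{504}a_3' - \frac{h^8}{9600}a_5' + \ldots\\
\mu_4' &= \frac{4C_3h}{N} - \frac{h^4}{30} + \frac{h^6}{126}a_2' - \frac{h^8}{1440}a_4' + \ldots\\
\mu_5' &= \frac{5C_4h}{N} - \frac{h^8}{288}a_3' + \frac{h^{10}}{3168}a_5' - \ldots\\
\mu_6' &= \frac{6C_5h}{N} + \frac{h^6}{42} - \frac{h^8}{80}a_2' + \frac{h^{10}}{528}a_4' - \ldots
\end{aligned} \quad \ldots\ldots \text{(xi)}.$$

If the base element be as usual unity, then we have simply to put h = 1 throughout.

If there be high contact at x = 0, then we have simply

$$\mu_s' = \frac{s}{N}C_{s-1}h - (-1)^{s/2}B_{s-1}h^s,$$

where $B_{2i} = 0$, for in this case $a_2' = a_3' = a_4' = \text{etc.} = 0$.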

We can modify (xi) in the following manner. Since $Zx^s$, s > 0, vanishes at both ends of the range we have, substituting the value of Z (see p. 282), and putting the base element unity:

$$\begin{aligned}
\text{Chordal area of } Zx^s &= (z_0 + z_1 + z_2 + \ldots + z_n)\,0^s\\
&\quad + (z_1 + z_2 + \ldots + z_n)\,1^s\\
&\quad + (z_2 + z_3 + \ldots + z_n)\,2^s\\
&\quad + \ldots\\
&\quad + (z_{n-1} + z_n)(n-1)^s\\
&\quad + z_nn^s\\
&= z_n(1^s + 2^s + 3^s + \ldots + n^s)\\
&\quad + z_{n-1}(1^s + 2^s + 3^s + \ldots + (n-1)^s)\\
&\quad + z_{n-2}(1^s + 2^s + 3^s + \ldots + (n-2)^s)\\
&\quad + \ldots\\
&\quad + z_2(1^s + 2^s) + z_1(1^s) \quad \ldots\ldots \text{(xii)}.
\end{aligned}$$

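Now $1^s + 2^s + 3^s + \ldots + n^s$ can be summed by a Bernoulli's numbers series, i.e.

$$1^s + 2^s + 3^s + \ldots + n^s = \frac{n^{s+1}}{s+1} + \tfrac12 n^s + \frac{s}{2!}B_1n^{s-1} - \frac{s(s-1)(s-2)}{4!}B_3n^{s-3} + \frac{s(s-1)(s-2)(s-3)(s-4)}{6!}B_5n^{s-5} - \ldots,$$

the series ending with a constant or n, according as s is odd or even.

Now we may write on the right-hand side $n + \tfrac12 - \tfrac12$ for n, and we find accordingly that

$$\begin{aligned}
2(1 + 2 + 3 + \ldots + n) &= (n+\tfrac12)^2 - \tfrac14,\\
3(1^2 + 2^2 + 3^2 + \ldots + n^2) &= (n+\tfrac12)^3 - \tfrac14(n+\tfrac12),\\
4(1^3 + 2^3 + 3^3 + \ldots + n^3) &= (n+\tfrac12)^4 - \tfrac12(n+\tfrac12)^2 + \tfrac1{16},\\
5(1^4 + 2^4 + 3^4 + \ldots + n^4) &= (n+\tfrac12)^5 - \tfrac56(n+\tfrac12)^3 + \tfrac7{48}(n+\tfrac12),\\
6(1^5 + 2^5 + 3^5 + \ldots + n^5) &= (n+\tfrac12)^6 - \tfrac54(n+\tfrac12)^4 + \tfrac7{16}(n+\tfrac12)^2 - \tfrac3{64}
\end{aligned} \quad \ldots\ldots \text{(xiii)}.$$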
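The identities (xiii) can be confirmed in exact rational arithmetic. The sketch below is an independent check and not part of the memoir; the function names are illustrative.

```python
from fractions import Fraction

def power_sum(s, n):
    """1^s + 2^s + ... + n^s computed directly."""
    return sum(Fraction(k) ** s for k in range(1, n + 1))

def half_shifted_form(s, n):
    """Right-hand sides of (xiii), written in powers of m = n + 1/2."""
    m = Fraction(2 * n + 1, 2)
    if s == 1:
        return m**2 - Fraction(1, 4)
    if s == 2:
        return m**3 - Fraction(1, 4) * m
    if s == 3:
        return m**4 - Fraction(1, 2) * m**2 + Fraction(1, 16)
    if s == 4:
        return m**5 - Fraction(5, 6) * m**3 + Fraction(7, 48) * m
    if s == 5:
        return m**6 - Fraction(5, 4) * m**4 + Fraction(7, 16) * m**2 - Fraction(3, 64)
    raise ValueError("only s = 1..5 are tabulated in (xiii)")

# (s + 1) * (power sum) should equal the half-shifted polynomial exactly.
checks = {(s, n): (s + 1) * power_sum(s, n) == half_shifted_form(s, n)
          for s in range(1, 6) for n in range(0, 13)}
print(all(checks.values()))
```

Because each side is a polynomial in n of degree s + 1, agreement at more than s + 1 values of n establishes the identity for all n.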

In these we can write n, n - 1, n - 2, n - 3, etc., successively for n, but $z_r(r+\tfrac12)^s$ is the sth moment of $z_r$ about one end of the range, and $\sum_0^n z_r(r+\tfrac12)^s$ is the sth moment of the system of grouped frequencies about one end of the range. Let us call this $N\nu_s'$. We can now rewrite equations (xi) in terms of the ν''s. We first note, however, that when s = 0:

$$\begin{aligned}
\frac1N\int Z\,dx &= (\text{chordal area})/N \text{ by (x)}\\
&= \left(\tfrac12 Z_0 + Z_1 + Z_2 + \ldots + Z_n + \tfrac12 Z_{n+1}\right)/N\\
&= \left\{(n+\tfrac12)z_n + (n-1+\tfrac12)z_{n-1} + (n-2+\tfrac12)z_{n-2} + \text{etc.}\right\}/N\\
&= \nu_1'.
\end{aligned}$$

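Hence finally reintroducing h for base unit, and substituting in (xi), we have, if $\nu_n' = S(z_rx_r^n)/N$:

$$\begin{aligned}
\mu_1' &= \nu_1' - \frac{h^4}{720}a_3' + \frac{h^6}{30240}a_5' - \ldots\\
\mu_2' &= \nu_2' - \frac{h^2}{12} - \frac{h^4}{120}a_2' + \frac{h^6}{3024}a_4' - \ldots\\
\mu_3' &= \nu_3' - \frac{h^2}{4}\nu_1' + \frac{h^6}{504}a_3' - \frac{h^8}{9600}a_5' + \ldots\\
\mu_4' &= \nu_4' - \frac{h^2}{2}\nu_2' + \frac{7}{240}h^4 + \frac{h^6}{126}a_2' - \frac{h^8}{1440}a_4' + \ldots\\
\mu_5' &= \nu_5' - \frac56h^2\nu_3' + \frac7{48}h^4\nu_1' - \frac{h^8}{288}a_3' + \frac{h^{10}}{3168}a_5' - \ldots\\
\mu_6' &= \nu_6' - \frac54h^2\nu_4' + \frac7{16}h^4\nu_2' - \frac{31}{1344}h^6 - \frac{h^8}{80}a_2' + \frac{h^{10}}{528}a_4' - \ldots
\end{aligned} \quad \ldots\ldots \text{(xiv)}.$$

These are the formulae for finding the first six moments about one end of the range when we have found the ν's or the "rough moments," i.e. the moments of the frequencies grouped at the mid-points of the elements, about the same end. Putting

$$a_2' = a_3' = a_4' = a_5' = 0,$$

we have the corrections first given by Mr W. F. Sheppard (Proceedings of the London Mathematical Society, Vol. XXIX. p. 368) for the case of high contact at both ends of the range.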
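With the a's put zero, the first four lines of (xiv) reduce to Sheppard's corrections. A minimal sketch follows, not part of the memoir; the function name is illustrative, and the rough moments used are the ν's of the fecundity data in Illustration I.

```python
def sheppard_corrected(nu1, nu2, nu3, nu4, h=1.0):
    """First four moments about an arbitrary origin from the rough moments,
    using (xiv) with all the a's put zero (high contact at both ends)."""
    mu1 = nu1
    mu2 = nu2 - h**2 / 12
    mu3 = nu3 - h**2 * nu1 / 4
    mu4 = nu4 - h**2 * nu2 / 2 + 7 * h**4 / 240
    return mu1, mu2, mu3, mu4

# Rough moments about j for the 2000 brood mares (unit = 1/15 of fecundity).
rough = (0.4945, 5.7775, 4.8565, 94.7935)
corrected = sheppard_corrected(*rough)
print([round(m, 6) for m in corrected])
```

These reproduce the values .4945, 5.694,167, 4.732,875 and 91.933,917 given in Illustration I.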
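It remains now to be considered how we can determine good values for the a's. I assume that the form of the integral curve near the origin can be closely given by a parabola of the 5th order. This, since $Z = N$ and $dZ/dx = 0$ for $x = 0$, must be

$$Z = N\left(1 + \frac{a_2'x^2}{2!} + \frac{a_3'x^3}{3!} + \frac{a_4'x^4}{4!} + \frac{a_5'x^5}{5!}\right),$$

whence we have as required $\dfrac1N\dfrac{d^sZ}{dx^s} = a_s'$ at the origin.

Now when $x = \epsilon$, $x = 2\epsilon$, $x = 3\epsilon$, $x = 4\epsilon$, let

$$Z = N(1-n_1),\quad N(1-n_1-n_2),\quad N(1-n_1-n_2-n_3),\quad N(1-n_1-n_2-n_3-n_4).$$

Thus we find:

$$\begin{aligned}
-n_1 &= \frac{a_2'\epsilon^2}{2!} + \frac{a_3'\epsilon^3}{3!} + \frac{a_4'\epsilon^4}{4!} + \frac{a_5'\epsilon^5}{5!},\\
-(n_1+n_2) &= \frac{a_2'\epsilon^2}{2!}2^2 + \frac{a_3'\epsilon^3}{3!}2^3 + \frac{a_4'\epsilon^4}{4!}2^4 + \frac{a_5'\epsilon^5}{5!}2^5,\\
-(n_1+n_2+n_3) &= \frac{a_2'\epsilon^2}{2!}3^2 + \frac{a_3'\epsilon^3}{3!}3^3 + \frac{a_4'\epsilon^4}{4!}3^4 + \frac{a_5'\epsilon^5}{5!}3^5,\\
-(n_1+n_2+n_3+n_4) &= \frac{a_2'\epsilon^2}{2!}4^2 + \frac{a_3'\epsilon^3}{3!}4^3 + \frac{a_4'\epsilon^4}{4!}4^4 + \frac{a_5'\epsilon^5}{5!}4^5.
\end{aligned}$$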

By actual solution of these equations I find*:

$$\begin{aligned}
a_2'\epsilon^2 = \gamma_2 &= -\frac{415n_1 - 161n_2 + 55n_3 - 9n_4}{72},\\
a_3'\epsilon^3 = \gamma_3 &= \frac{755n_1 - 493n_2 + 191n_3 - 33n_4}{48},\\
a_4'\epsilon^4 = \gamma_4 &= -\frac{119n_1 - 97n_2 + 47n_3 - 9n_4}{6},\\
a_5'\epsilon^5 = \gamma_5 &= \frac{125n_1 - 115n_2 + 65n_3 - 15n_4}{12}
\end{aligned} \quad \ldots\ldots \text{(xv)}.$$
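The solution (xv) may be verified by substituting it back into the four equations from which it was obtained. The sketch below is an independent check in exact arithmetic, not part of the memoir; the function name is illustrative, and the proportional frequencies used are those of the high-fecundity end in Illustration I.

```python
from fractions import Fraction
from math import factorial

def gammas(n1, n2, n3, n4):
    """Closed-form values of gamma_2 ... gamma_5 as given in (xv)."""
    g2 = -(415 * n1 - 161 * n2 + 55 * n3 - 9 * n4) / 72
    g3 = (755 * n1 - 493 * n2 + 191 * n3 - 33 * n4) / 48
    g4 = -(119 * n1 - 97 * n2 + 47 * n3 - 9 * n4) / 6
    g5 = (125 * n1 - 115 * n2 + 65 * n3 - 15 * n4) / 12
    return g2, g3, g4, g5

# Proportional frequencies in the first four elements (Illustration I).
n = [Fraction(19, 2000), Fraction(49, 2000),
     Fraction(127, 2000), Fraction(204, 2000)]
g = gammas(*n)

# Residuals of -(n_1 + ... + n_k) = sum_j gamma_j * k^j / j!  for k = 1..4.
residuals = []
for k in range(1, 5):
    lhs = -sum(n[:k])
    rhs = sum(g[j - 2] * Fraction(k**j, factorial(j)) for j in range(2, 6))
    residuals.append(lhs - rhs)

print([float(x) for x in g], residuals)
```

All four residuals vanish exactly, and the γ's agree with the values quoted in Illustration I.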

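These values of γ can be easily calculated. Now let $h/\epsilon = p$. Then if we take h equal unity as usual, 1/p will measure the fraction ε is of h. Of course very usually p = 1, but this is not necessary; thus in certain diseases the frequency of cases in each of the first five years of life is often recorded, but later only in five-year periods. Making these substitutions we may write our final results for the moments:

$$\begin{aligned}
\mu_1' &= \nu_1' - \frac{p^3\gamma_3}{720} + \frac{p^5\gamma_5}{30240},\\
\mu_2' &= \nu_2' - \frac1{12} - \frac{p^2\gamma_2}{120} + \frac{p^4\gamma_4}{3024},\\
\mu_3' &= \nu_3' - \frac14\nu_1' + \frac{p^3\gamma_3}{504} - \frac{p^5\gamma_5}{9600},\\
\mu_4' &= \nu_4' - \frac12\nu_2' + \frac7{240} + \frac{p^2\gamma_2}{126} - \frac{p^4\gamma_4}{1440},\\
\mu_5' &= \nu_5' - \frac56\nu_3' + \frac7{48}\nu_1' - \frac{p^3\gamma_3}{288} + \frac{p^5\gamma_5}{3168},\\
\mu_6' &= \nu_6' - \frac54\nu_4' + \frac7{16}\nu_2' - \frac{31}{1344} - \frac{p^2\gamma_2}{80} + \frac{p^4\gamma_4}{528}
\end{aligned} \quad \ldots\ldots \text{(xvi)}.$$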
(xvi) and (xv) form the solution of the problem. It is of course sometimes more convenient to use (xi) directly. In any individual case we must first find the ν''s (generally only ν_1' to ν_4' are needful) about one end of the range. Then we calculate the γ's and p and so determine the μ''s by (xvi). Then by (ix) we find the values of the μ's transferred to the centroid. Of course the process will be much simplified if there be high contact at both ends, for then all the γ's may be put zero. The methods here indicated seem of such importance that it is desirable to fully illustrate them in various special examples, each of which has been selected with a view of illustrating some peculiar point or difficulty. My first two examples are illustrations of the fitting of skew frequency curves; my third deals with the fitting of sine curves when only a portion of the observations are known; my fourth deals with the representation of vital statistics by Makeham's curve and my fifth with the deduction of the curve of errors from partial observations of frequency.

* The reader must remember that n_1, n_2, n_3 and n_4 are not the total frequencies in the first four elements, but the proportional frequencies.
(5) Illustration I. On the mean variability and distribution of fecundity in 2000 thoroughbred brood mares.

By fecundity of the mare is here meant the ratio of the number of yearling foals she has actually produced to the number of her opportunities. The base-elements were taken 1/30 on either side of 0, 1/15, 2/15, 3/15, ... 14/15, 1. Thus fecundity from 0 to 1 was divided into 16 grades, respectively denoted by a, b, c, d, ... l, m, n, p, q. The data were extracted from the stud-books, every mare having had at least 8 coverings.

I propose in this first illustration to go through the whole of the work of dealing with the frequency distribution as it may be unfamiliar to many of my readers, and yet it is really very simple. I shall first suppose the curve to have high contact at the terminals of the range, and work out the ν's and deduce the μ's by Sheppard's corrections: see p. 287. This, however, is not in this case legitimate from mere inspection of the curve, and therefore we ought to start by using (xiv). Working out the μ's in the latter way also we can compare the results actually obtained. It will be sufficient to go as far as μ_4.
Grade   Frequency z     x        zx         zx^2         zx^3          zx^4
  a         0          -9    -    0      +     0     -      0      +       0
  b         2          -8    -   16      +   128     -   1024      +    8192
  c         7.5        -7    -   52.5    +   367.5   -   2572.5    +   18007.5
  d        11.5        -6    -   69      +   414     -   2484      +   14904
  e        21.5        -5    -  107.5    +   537.5   -   2687.5    +   13437.5
  f        55          -4    -  220      +   880     -   3520      +   14080
  g       104.5        -3    -  313.5    +   940.5   -   2821.5    +    8464.5
  h       182          -2    -  364      +   728     -   1456      +    2912
  i       271.5        -1    -  271.5    +   271.5   -    271.5    +     271.5
  j       315           0        0             0            0               0
  k       337          +1    +  337      +   337     +    337      +     337
  l       293.5        +2    +  587      +  1174     +   2348      +    4696
  m       204          +3    +  612      +  1836     +   5508      +   16524
  n       127          +4    +  508      +  2032     +   8128      +   32512
  p        49          +5    +  245      +  1225     +   6125      +   30625
  q        19          +6    +  114      +   684     +   4104      +   24624

Sums     2000                + 2403      + 11555     +  26550      +  189587
                             - 1414                  -  16837
                             +  989                  +   9713

ν_1' = .4945,   ν_2' = 5.7775,   ν_3' = 4.8565,   ν_4' = 94.7935.
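The column sums and rough moments of the table can be reproduced in a few lines. This is a minimal sketch, not part of the memoir; the variable names are illustrative and the frequencies are those of grades a to q, with x measured from grade j.

```python
# Frequencies of the 16 fecundity grades a ... q and their abscissae about j.
freq = [0, 2, 7.5, 11.5, 21.5, 55, 104.5, 182,
        271.5, 315, 337, 293.5, 204, 127, 49, 19]
xs = list(range(-9, 7))

N = sum(freq)
# Rough moments: frequencies treated as concentrated at the mid-points.
sums = [sum(z * x**k for z, x in zip(freq, xs)) for k in range(1, 5)]
nu = [s / N for s in sums]

print(N, sums, nu)
```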

Using (xiv) with the a's zero to find the moments we have

μ_1' = .4945,   μ_2' = 5.694,167,   μ_3' = 4.732,875,   μ_4' = 91.933,917,

and hence by (ix)

μ_1 = 0,   μ_2 = 5.449,636,   μ_3 = -3.472,584,   μ_4 = 90.747,281.

It will thus be seen that the determination of the four μ's about the mean is neither a long nor a difficult process. I will now proceed to find them de novo by applying (viii).
Grade   Frequency       Z        x        Zx          Zx^2          Zx^3
                        I.                II.         III.          IV.
  a         0
                     2000        0          0             0              0
  b         2
                     1998        1       1998          1998           1998
  c         7.5
                     1990.5      2       3981          7962          15924
  d        11.5
                     1979        3       5937         17811          53433
  e        21.5
                     1957.5      4       7830         31320         125280
  f        55
                     1902.5      5       9512.5       47562.5       237812.5
  g       104.5
                     1798        6      10788         64728         388368
  h       182
                     1616        7      11312         79184         554288
  i       271.5
                     1344.5      8      10756         86048         688384
  j       315
                     1029.5      9       9265.5       83389.5       750505.5
  k       337
                      692.5     10       6925         69250         692500
  l       293.5
                      399       11       4389         48279         531069
  m       204
                      195       12       2340         28080         336960
  n       127
                       68       13        884         11492         149396
  p        49
                       19       14        266          3724          52136
  q        19
                        0       15          0             0              0

The chordal areas are here:

Chordal area of Z    =    17,989,
        ''      Zx   =    86,184,
        ''      Zx^2 =   580,828,
        ''      Zx^3 = 4,578,054.

Whence by (xi) with the a's zero, the moments about one end of the range:

μ_1' = 8.9945,
μ_2' = 86.350,667,
μ_3' = 871.242,
μ_4' = 9156.074,667.
This content downloaded from 195.34.79.174 on Wed, 18 Jun 2014 18:33:03 PM


All use subject to JSTOR Terms and Conditions
K. PEARSON 291

Using (ix):
1.2= 5*449,637,
/3=
- 3>472,590,
/4= 90 747,442.
The divergence between these results and those given by Mr Sheppard's
process is verysmall and solely due to the arithmeticbeing cut offat the sixth
place of decimals in the multiplication. Thus #4above really ends with 6, and
is sensible in the fourthplace of decimals of p whenwe multiply
this difference
/' by 6#42..
Now let us see if Mr Sheppard's process is in this case justified; let us no longer put the a's zero, i.e. no longer suppose high contact at the high fecundity end of the curve. We have

n_1 = 19/2000,   n_2 = 49/2000,   n_3 = 127/2000,   n_4 = 204/2000.

Hence we find from (xv)

γ_2 = -.035,729,   γ_3 = .080,344,   γ_4 = -.136,750,   γ_5 = .080,625.

This leads us by (xvi) to

μ_1' = ν_1' - .000,1089,
μ_2' = ν_2' - 1/12 + .000,2525,
μ_3' = ν_3' - ν_1'/4 + .000,1510,
μ_4' = ν_4' - ν_2'/2 + 7/240 - .000,1886,

or the μ''s are only influenced in the fourth place of decimals. Substituting the values of ν_1', ν_2', ν_3' and ν_4' about one end of the range just found, we have

μ_1' = .494,391,   μ_2' = 5.694,4195,
μ_3' = 4.733,026,   μ_4' = 91.933,7284,

which lead by (ix) to

μ_2 = 5.449,997,
μ_3 = -3.471,085,
μ_4 = 90.745,703.
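The corrections for imperfect contact can be evaluated numerically from the first four lines of (xvi). The sketch below is not part of the memoir; the function name is illustrative, it takes h = 1 and p = 1, and it uses the ν's and γ's of the fecundity data as quoted in this illustration.

```python
def corrected_moments(nu, gam, p=1.0):
    """First four moments by (xvi): Sheppard's terms plus the gamma terms
    that allow for want of high contact at the origin (h = 1)."""
    nu1, nu2, nu3, nu4 = nu
    g2, g3, g4, g5 = gam
    mu1 = nu1 - g3 * p**3 / 720 + g5 * p**5 / 30240
    mu2 = nu2 - 1 / 12 - g2 * p**2 / 120 + g4 * p**4 / 3024
    mu3 = nu3 - nu1 / 4 + g3 * p**3 / 504 - g5 * p**5 / 9600
    mu4 = nu4 - nu2 / 2 + 7 / 240 + g2 * p**2 / 126 - g4 * p**4 / 1440
    return mu1, mu2, mu3, mu4

nu = (0.4945, 5.7775, 4.8565, 94.7935)
gam = (-0.035729, 0.080344, -0.136750, 0.080625)
mu = corrected_moments(nu, gam)
print([round(m, 6) for m in mu])
```

The γ terms alter the moments only in the fourth decimal place, as stated.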
We see that modifications are in the third place of decimals of μ_3 and μ_4 and in the fourth of μ_2. Clearly we are not justified theoretically in assuming high contact at the high fecundity end of the frequency curve, but for most practical problems Sheppard's corrections would in this case have been quite sufficient. The actual slope of the tangent to the frequency curve at the end of the range would be

$$-\frac{d^2Z}{dx^2} = -\gamma_2 = .035{,}729,$$

which is of course fairly small. Thus if there be not high contact at one end of the curve, but the slope of the tangent be not large, Sheppard's corrections will still give the substantial part of the required correction. If, however, as in the mortality due to various diseases the curve meets the axis at a considerable angle, we must endeavour to determine in some such manner as the above the value of the corrective terms.

Suppose we attempt to fit a curve of the form

$$y = y_0\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2}$$

to the above data. The values of the constants in terms of the moments are given in Phil. Trans. Vol. 186, A., pp. 367 et seq., and we find

y= $42-187 1 + 47.13579) (1 - 12.11059)

The mean being at j + .4945, and the mode, which is the origin, at j + .7970. Here j denotes a fecundity of 9/15, and 1/15 is the unit of fecundity. Fig. 2 shows that we have a very reasonable fit, a curve which effectively represents the phenomenon.

[FIG. 2. Fecundity of 2000 Brood Mares, 8 or more coverings. Horizontal axis: Fecundity.]

(6) Illustration II. Half the battle of curve-fitting is to select a suitable type of curve for representing the observations. Mere increase of the number of constants will often give far less advantageous results than the choice of a more suitable form even with fewer constants. I will endeavour to illustrate this by the following system of frequencies due to data from a game of 'patience*':

Character   4  5  6  7  8   9   10  11  12  13  14  15  16  17  18  19  20  21  22-28
Frequency   -  -  -  3  7  35  101  89  94  70  46  30  15   4   5   1   -   -    -

The possible range is 4 to 28.

Now we find the mean of the character to be 11.86, and let us assume the form of the curve to be

$$y = y_0\left(1 + \frac{x}{a}\right)^p e^{-px/a}.$$

Here the origin is at the mode or maximum ordinate which is at a distance a/p from the mean. We have thus only three constants p, a and y_0 at our disposal. We shall show in the sequel that we get a better fit than if we disposed of seven constants in a curve of the form

$$y = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5 + a_6x^6.$$

The data appear to give high contact at both ends, and therefore Sheppard's modifications would give the best values of the moments. But for the purposes of illustration we will treat the data as giving a polygonal curve, and assume our object to be that of finding a curve going as close to this polygon as possible. Methods of finding the moments of a polygonal area will be given in the second part of this paper. Formulae for our present purpose will be found in Phil. Trans. Vol. 186, A., p. 350. There results for the moments about the mean in the present case

μ_2 = 4.3231,   μ_3 = 4.6804,   μ_4 = 59.683.
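The mean and the second moment quoted here can be recovered from the frequency table. The sketch below is not part of the memoir and rests on two assumptions: that the frequency for the character 16 is 15 (which reproduces the stated mean of 11.86 and a total of 500), and that the second moment of a frequency polygon exceeds that of the ordinates concentrated at their abscissae by h^2/6, a standard result that is not quoted from the text.

```python
# Characters 7 ... 19 and their frequencies (all others are empty).
# Assumption: the frequency for character 16 is read as 15.
values = list(range(7, 20))
freq = [3, 7, 35, 101, 89, 94, 70, 46, 30, 15, 4, 5, 1]

N = sum(freq)
mean = sum(f * x for f, x in zip(freq, values)) / N

# Second moment about the mean with frequencies concentrated at the integers.
raw_mu2 = sum(f * (x - mean) ** 2 for f, x in zip(freq, values)) / N

# Assumed polygon correction: each ordinate is spread as a triangle of
# half-base h = 1, which adds h^2 / 6 to the second moment.
polygon_mu2 = raw_mu2 + 1 / 6

print(N, mean, round(polygon_mu2, 4))
```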

Hence we deduce†

$$y = 98.762\left(1 + \frac{x}{7.4449}\right)^{13.7530} e^{-13.7530\,x/7.4449}.$$

The distance from the mean to the mode, which is the origin, is .5413. Thus the modal value is at 11.3187, and the range starts at 3.2931.
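The three constants follow from N, μ_2 and μ_3 once the form of the curve is granted. The memoir refers to Phil. Trans. for the formulae; the relations used in the sketch below are instead derived here from the stated form y = y_0 (1 + x/a)^p e^(-px/a), and are therefore an assumption of this example rather than a quotation: mean minus mode = a/p = μ_3/(2μ_2), p + 1 = μ_2 (p/a)^2, and y_0 fixed by making the area equal to N. The function name is illustrative.

```python
from math import exp, lgamma, log

def type3_from_moments(N, mu2, mu3):
    """Constants of y = y0 (1 + x/a)^p exp(-p x / a) from N, mu2 and mu3."""
    d = mu3 / (2 * mu2)      # distance a/p between mean and mode
    p = mu2 / d**2 - 1       # from mu2 = (p + 1) (a/p)^2
    a = p * d
    # Area condition: N = y0 * a * e^p * Gamma(p + 1) / p^(p + 1).
    log_y0 = log(N) + (p + 1) * log(p) - log(a) - p - lgamma(p + 1)
    return p, a, exp(log_y0), d

p, a, y0, d = type3_from_moments(500, 4.3231, 4.6804)
print(round(p, 4), round(a, 4), round(y0, 3), round(d, 4))
```

Under these assumptions the computed p, a, y_0 and a/p agree closely with 13.7530, 7.4449, 98.762 and .5413.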
The ordinates corresponding to the observations are given in column two of the Table in Art. 13 of this memoir. Fig. 3 shows a reasonable fit. The Table compares these results with the successive parabolas up to the sixth and shows how a well selected curve with three constants can easily be superior to one with seven constants‡.

* Thiele; Forelaesninger over almindelig Iagttagelseslaere, p. 12.
† The formulae for p, a, and y_0 in terms of the moments are given, Phil. Trans., Vol. 186, A, p. 373.
‡ This point is of special importance, for objections have been raised against the skew frequency curves just referred to on the ground that they give better fits than the normal curve because they have one or two more constants as the case may be. This is true, but they also give better fits than some other curves with double their number of constants!

[FIG. 3.]

Now either of the curves in Illustrations I. and II. is a good example of the impossibility of using the method of least squares for systematic curve-fitting. The reader need only attempt to write down the type-equations, which must be solved to find the constants, and he will realise the simply appalling amount of lengthy approximations which must be carried out even after rough values of these constants have been guessed by some one or other means.

But it is not only algebraic and exponential curves for which the method of least squares fails; it fails also for trigonometrical curves. I will now illustrate this in the very simplest case possible.

(7) Illustration III. Let it be required to fit the simplest sine-curve

$$y = a\sin(nx + \alpha)$$

to the aneroid barometer observations in the Illustration in § 3. Let us write the equation in the form

$$y = A\sin nx + B\cos nx \quad \ldots\ldots\ldots \text{(xvii)}.$$

Then the three type-equations to find n, A and B, arising from applying the method of least squares, are the following:

$$\begin{aligned}
A\,S(\sin^2 nx) + \tfrac12 B\,S(\sin 2nx) &= S(y\sin nx),\\
\tfrac12 A\,S(\sin 2nx) + B\,S(\cos^2 nx) &= S(y\cos nx),\\
A\,S(yx\cos nx) - B\,S(yx\sin nx) &= \tfrac12(A^2 - B^2)\,S(x\sin 2nx) + AB\,S(x\cos 2nx).
\end{aligned}$$

Here S denotes a summation with regard to the eleven values of x and y given on p. 279; after these have been substituted in the summations, we must eliminate A and B, and we shall then have an equation to determine n. Afterwards the values of A and B must be found by substituting the value of n in the two first equations. We may leave this as an exercise to those readers who have faith in the method of least squares applied to curve-fitting!*

Now let us turn to the method of moments. There are three constants to be found, so we must find the area and the first two moments of the observations and of the theoretical curve.

Taking the origin at the middle of the range 2l, and writing M_0, M_1, M_2 for the area and first two moments, we have

$$\begin{aligned}
M_0 &= 2B\,\frac{\sin nl}{n},\\
M_1 &= 2A\left(-\frac{l\cos nl}{n} + \frac{\sin nl}{n^2}\right),\\
M_2 &= 2B\left(\frac{l^2\sin nl}{n} + \frac{2l\cos nl}{n^2} - \frac{2\sin nl}{n^3}\right).
\end{aligned}$$

Put $\beta = \dfrac12\left(\dfrac{M_2}{M_0l^2} - 1\right)$ and $z = nl$, then we find

$$\frac1z + \beta z = \cot z \quad \ldots\ldots\ldots \text{(xviii)},$$

$$B = \frac{M_0}{2l}\,\frac{z}{\sin z},\qquad A = \frac{M_1}{2l^2}\,\frac{z^2}{\sin z - z\cos z} \quad \ldots\ldots\ldots \text{(xix)}.$$

[FIG. 4.]

* The equation to find n is intractable even if we place the origin at the centre of the range and evaluate by trigonometry the summations not involving y or x outside the trigonometrical terms.
These are the general equations for fitting the sine curve (xvii) by the method of moments to any series of observations. We see how simple is this method as compared with that of least squares; we must first find a root of (xviii) and this value of z substituted in (xix) will give us A and B.

For the special case of the barometer observations I suppose our ordinates placed at the middle of the elements so that the range 2l = 11. I then find the moments by an application of (λ) on p. 276, using the values of P and Q given for p = 11 on p. 277. This gives

M_0 = 10.979,4240,   M_1 = 4.420,0564,   M_2 = 86.783,0235,

and thence β = -.369,353.

To solve (xviii), the hyperbola

$$y = \frac1z - .369{,}353\,z,$$

and the transcendental curve $y = \cot z$ were roughly plotted and observed to intersect about z = 1.2. Using Newton's method of approximation, I found with Miss M. A. Lewenz's aid:

z = 1.1867,   z = 1.1844,   z = 1.184,4132,

which last value is practically exact. This gave

n = .215,348 = 12° 20' 19",
A = .213,545,   B = 1.276,288.

Whence the required curve is

$$y = .213{,}545\sin(.215{,}348x) + 1.276{,}288\cos(.215{,}348x),$$

or

$$y = 1.294{,}03\,\sin\{x\,(12^\circ\,20'\,19'') + 80^\circ\,30'\,5''\},$$

the latter form allowing of easy calculation of the ordinates.
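The steps just described (form β, solve the equation for z by Newton's method, then obtain n, A and B) may be sketched as below. The code is illustrative and not part of the memoir. It assumes the relations in the form cot z = 1/z + βz with β = (M_2/(M_0 l^2) - 1)/2, B = M_0 z/(2l sin z) and A = M_1 z^2/(2l^2 (sin z - z cos z)), which follow from the expressions for M_0, M_1 and M_2; the function name and the iteration controls are hypothetical.

```python
from math import atan2, cos, hypot, sin, tan

def fit_sine_by_moments(M0, M1, M2, l, z0=1.2, tol=1e-12, max_iter=50):
    """Fit y = A sin(nx) + B cos(nx) over (-l, +l) from area and moments."""
    beta = 0.5 * (M2 / (M0 * l**2) - 1)
    z = z0
    for _ in range(max_iter):
        f = 1 / tan(z) - 1 / z - beta * z       # cot z - 1/z - beta z
        df = -1 / sin(z) ** 2 + 1 / z**2 - beta  # derivative with respect to z
        step = f / df
        z -= step                                # Newton correction
        if abs(step) < tol:
            break
    n = z / l
    B = M0 * z / (2 * l * sin(z))
    A = M1 * z**2 / (2 * l**2 * (sin(z) - z * cos(z)))
    return beta, z, n, A, B

beta, z, n, A, B = fit_sine_by_moments(10.9794240, 4.4200564, 86.7830235, l=5.5)
amplitude = hypot(A, B)   # a in y = a sin(nx + alpha)
phase = atan2(B, A)       # alpha, in radians
print(beta, z, n, A, B, amplitude, phase)
```

Starting from the graphical estimate z = 1.2, a few Newton steps suffice, and the constants agree with those stated above.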
We have

   x     Observed y    Calculated y
  -5        .382           .417
  -4        .674           .669
  -3        .923           .891
  -2       1.104          1.071
  -1       1.214          1.201
   0       1.273          1.276
  +1       1.270          1.292
  +2       1.215          1.250
  +3       1.137          1.148
  +4        .989           .993
  +5        .819           .793

The root mean square error of the ordinates is .0233. Thus the fit is by no means so good as that of the parabola of the third order $y = a_0 + a_1x + a_2x^2 + a_3x^3$ dealt with on pp. 279-281. But there was no reason for supposing a priori the observations to be suitable to sine curve representation, and the sine curve has one less constant. The fit is illustrated in the accompanying Fig. 4, and is seen to be by no means bad. The data were merely selected, as equally good with any others, to exemplify the process of fitting a sine curve.
(8) Illustration IV. To fit Makeham's Curve to Mortality Statistics.

Given a mortality table, i.e. a table which gives the number of survivors out of n people born in the same year at each year of age of the group, then if $l_x$ denotes the number who attain the age of x, the table will be closely represented between the ages 20-25 to 85-90 by Makeham's formula, i.e.

$$l_x = k\,s^x\,(g)^{c^x} \quad \ldots\ldots\ldots \text{(xx)},$$

where k, s, g and c are constants to be determined from the data of the table.

Now there will be some 60 to 70 corresponding values of x and $l_x$ and it is a quite hopeless task to think of discovering the values of k, s, g and c for the equation as it stands. If we take logarithms the equation may be written:

$$L_x' = K' + xS' + G'c^x,$$

where the capitals are the logarithms of the small letter quantities. The determination of K', S', G' and c by the method of least squares is still impracticable. Of course four corresponding values of $L_x'$ and x would give K', S', G' and c, but such a selection of four arbitrary values out of 60 or 70 is unsatisfactory in the extreme. Accordingly Messrs G. King and G. F. Hardy have determined values of these constants by a process of averaging series of corresponding values of $L_x'$ and x, so that the final values of the constants shall depend on as much of the table as possible*. The values reached for the constants are good, but no doubt better ones could be found, and the process from the standpoint of systematic curve fitting is unsatisfactory. It involves empirical trials, e.g. "various groupings were tried, and the best was found to be, four groups of eighteen years of life each" (Text-book, p. 82), and therefore follows no general rule for curve fitting.

Accordingly it seems very desirable to indicate how the method of moments can be applied to Makeham's formula.

I shall take l for the range of the mortality table to be dealt with and my origin of x' at the mid-point of this range. If $x_0$ be the age corresponding to the origin, I shall write

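$$x = x_0 + x',$$

whence we see that:

$$l_x = k'\,s'^{\,x'/l}\,(g')^{c'^{\,x'/l}} \quad \ldots\ldots\ldots \text{(xxv)},$$

$$k' = k\,s^{x_0},\quad s' = s^l,\quad g' = g^{c^{x_0}},\quad c' = c^l$$

* Journal of Institute of Actuaries, Vol. xxii. p. 200, or G. King: Institute of Actuaries Text-book, Vol. ii. p. 79 et seq., especially p. 82.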

at once connect the ordinary constants s, c, g, k with my s', c', g', k'. Clearly k' is a number of living persons like k, and s', c' and g' are mere numerical quantities independent of what unit of duration of life we may select (day, month, year, etc.), while the s and c of the usual notation involve the unit in which life is measured, a theoretical, although scarcely practical disadvantage. Taking logarithms I now, dropping dashes, write the formula

$$L = K + S\,\frac{x}{l} + G\,e^{2nx/l} \quad \ldots\ldots\ldots \text{(xxvii)},$$

where

$$e^{2n} = c' \quad \ldots\ldots\ldots \text{(xxviii)}.$$

We must now proceed to find the area and first three moments A, Aμ_1, Aμ_2, Aμ_3 of (xxvii) about the middle of the range l. If we then equate these to the moments found from the table we shall have equations to determine K, S, G and n and therefore k, s, g and c in a perfectly direct and systematic manner, using all the data provided. We have

$$A = \int_{-l/2}^{+l/2} L\,dx = Kl + \frac{Gl}{2n}\left(e^n - e^{-n}\right).$$

Or if

$$\alpha_0 = A/l \quad \ldots\ldots \text{(xxix)},$$
$$\alpha_0 = K + G\,\frac{\sinh n}{n} \quad \ldots\ldots \text{(xxx)},$$

$$A\mu_1 = \int_{-l/2}^{+l/2} Lx\,dx.$$

Or if

$$\alpha_1 = 12A\mu_1/l^2 \quad \ldots\ldots \text{(xxxi)},$$
$$\alpha_1 = S + 6G\left\{\frac{\cosh n}{n} - \frac{\sinh n}{n^2}\right\} \quad \ldots\ldots \text{(xxxii)},$$

$$A\mu_2 = \int_{-l/2}^{+l/2} Lx^2\,dx.$$

Or if

$$\alpha_2 = 12A\mu_2/l^3 \quad \ldots\ldots \text{(xxxiii)},$$
$$\alpha_2 = K + 3G\left\{\frac{\sinh n}{n} - \frac{2\cosh n}{n^2} + \frac{2\sinh n}{n^3}\right\} \quad \ldots\ldots \text{(xxxiv)},$$

$$A\mu_3 = \int_{-l/2}^{+l/2} Lx^3\,dx.$$

Or if

$$\alpha_3 = 80A\mu_3/l^4 \quad \ldots\ldots \text{(xxxv)},$$
$$\alpha_3 = S + 10G\left\{\frac{\cosh n}{n} - \frac{3\sinh n}{n^2} + \frac{6\cosh n}{n^3} - \frac{6\sinh n}{n^4}\right\} \quad \ldots\ldots \text{(xxxvi)}.$$

From (xxxii) and (xxxvi) we have:

$$\alpha_3 - \alpha_1 = G\left\{\frac{4\cosh n}{n} - \frac{24\sinh n}{n^2} + \frac{60\cosh n}{n^3} - \frac{60\sinh n}{n^4}\right\} \quad \ldots\ldots \text{(xxxvii)}.$$

From (xxxiv) and (xxx) we have:

$$\alpha_2 - \alpha_0 = G\left\{\frac{2\sinh n}{n} - \frac{6\cosh n}{n^2} + \frac{6\sinh n}{n^3}\right\} \quad \ldots\ldots \text{(xxxviii)}.$$

Eliminating G between (xxxvii) and (xxxviii) and writing β for $(\alpha_3 - \alpha_1)/(\alpha_2 - \alpha_0)$, or:

$$\beta = \frac{4\,(20\mu_3 - 3l^2\mu_1)}{l\,(12\mu_2 - l^2)} \quad \ldots\ldots \text{(xxxix)},$$

a constant to be found from the moments of the mortality table, we have after some reductions:

$$\tanh n = \frac{2n^3 + 3\beta n^2 + 30n}{\beta n^3 + 12n^2 + 3\beta n + 30}.$$

Or, substituting for the hyperbolic tangent:

$$e^{2n} = \frac{(\beta+2)n^3 + 3(\beta+4)n^2 + 3(\beta+10)n + 30}{(\beta-2)n^3 - 3(\beta-4)n^2 + 3(\beta-10)n + 30} \quad \ldots\ldots \text{(xl)}.$$

This equation will give n to any degree of approximation required. Then (xxxviii) gives G, and (xxxii) gives S, and (xxx) K, whence all the constants of the solution can be found.

To solve (xl) an approximate value of $n = n_0$ is easily found; for $c = e^{2n/l}$ has been found from previous experience to have a logarithm very nearly .04. Starting from this value of $n_0$ successive approximations can be obtained by Newton's method. Thus, put $n_0 + h$ in (xl) and neglect $h^2$, we find if $e^{2n} = N/D$, where N stands for numerator and D for denominator:

$$h\left\{2e^{2n_0} - \frac{1}{D_0}\frac{dN_0}{dn} + \frac{N_0}{D_0^2}\frac{dD_0}{dn}\right\} = \frac{N_0}{D_0} - e^{2n_0} \quad \ldots\ldots \text{(xli)}.$$

Writing

$$Y_1 = 2n^3 + 3\beta n^2 + 30n \quad \ldots\ldots \text{(xlii)},$$
$$Y_2 = \beta n^3 + 12n^2 + 3\beta n + 30 \quad \ldots\ldots \text{(xliii)},$$

we have

$$\tanh n = \frac{Y_1}{Y_2} \quad \ldots\ldots \text{(xliv)},$$

$$D = Y_2 - Y_1,\quad N = Y_2 + Y_1,\quad \frac{dD}{dn} = \frac{dY_2}{dn} - \frac{dY_1}{dn},\quad \frac{dN}{dn} = \frac{dY_2}{dn} + \frac{dY_1}{dn} \quad \ldots\ldots \text{(xlv)},$$

where

$$\frac{dY_1}{dn} = 6n^2 + 6\beta n + 30,\qquad \frac{dY_2}{dn} = 3\beta n^2 + 24n + 3\beta \quad \ldots\ldots \text{(xlvi)}.$$
This content downloaded from 195.34.79.174 on Wed, 18 Jun 2014 18:33:03 PM


All use subject to JSTOR Terms and Conditions
K. PEARSON 301

Thus everypart of (xli) can be readily foundnumericallyby calculating Y}, Y2,


and their differentials,as given by (xlvi), for any value of n0. Of course for
accurate calculationiswe nmust go to 9 to 12 places ofdecimals and the ordinary
tables of logarithmsare of no service. For the numericalillustrationnow to be
given a large Brunsviga calculator was used, and exponentialsand reciprocals
foundso as to be true to twelveplaces of figures. The calculationswereof course
long and laborious,and I owe an immense amount of solid help to my former
colleague,Mr Leslie Bramley-Moore, forindependentarithmnetic and forverifica-
tioniof miyown calculations.
I selected the mortalitytable given in the Text-book for actuaries, but I
limited the range 1 of life to the 60 years from25 to 85 inclusive. I did this
because the data after8.5 is reallysparse,because the material before25 begins to
divergefromMakeham's law, and lastlybecause as a mereillustrationof method
big task to calculate area and momentsfor a systemof 61
it is a sufficienitly
ordinates. Usirngz0 to z61at equal distances I could apply Weddle's Qtuadrature
Rule (see (g) of p. 27.5), in which I have great confidencefor a fairlysmooth
curve like that given by the mortalitytable. The ordinates,of course,are
z = L = loglx, forthe area, xz for the first,X2Z for the second, and x3z forthe
third moment,where attention must be paid to the sign of x.
The followingvalues werefound:
A = 221-843,235
A,al= - 275 103,222
AFt2= 64,464-355,986
AF3= - 162,062316,564.
Whence
ao= 3-697,387,25O, a2= 3-581,353,110,3
a, = - *917,010,74, a%= - 1P000,384,670,i48.
These lead to
38= *718,529,308,595.
By a rougherquadratureprocessI got for,3 forthe wholerange from20 to 90
le = (801,086,783.
The value of,Bas foundb)y(xl) fromthe n which corresponids
to Messrs King and
Hardy'sc is:
/3= *804,162,5,
is
but theirrange from17 to 88 years.
The next point is the solution of (xl). Working in the manner indicated with
$$\beta = .718{,}529{,}309,$$
and calculating necessary terms to 12 places of decimals, we found the following
series of approximations to the value of $n$:
2.7, 2.8, 2.807,68, 2.807,312, 2.807,346,8, 2.807,343,62,
2.807,343,87 and 2.807,343,873.
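Equation (xl) is not reproduced in this passage, so the following is only a sketch under stated assumptions. It supposes that the fitted curve is of the form K + Hx + G e^{nx/l} on the range -l to +l, and that beta is the ratio (a3 - a1)/(a2 - a0), in which the constant and linear terms cancel. On those assumptions beta depends on n alone, through 2[(n^3 + 15n) cosh n - (6n^2 + 15) sinh n] / (n[(n^2 + 3) sinh n - 3n cosh n]). Plain bisection is used here in place of the author's own scheme of successive approximation, which is not described in this passage; `beta_of_n` and `solve_n` are hypothetical names.

```python
import math


def beta_of_n(n):
    """Assumed form of beta as a function of n (reconstructed, not quoted)."""
    s, c = math.sinh(n), math.cosh(n)
    numerator = (n ** 3 + 15.0 * n) * c - (6.0 * n ** 2 + 15.0) * s
    denominator = (n ** 2 + 3.0) * s - 3.0 * n * c
    return 2.0 * numerator / (n * denominator)


def solve_n(beta, lo=2.7, hi=2.9, iterations=60):
    """Bisection for beta_of_n(n) = beta on a bracket around the first trials."""
    f_lo = beta_of_n(lo) - beta
    f_hi = beta_of_n(hi) - beta
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = beta_of_n(mid) - beta
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


n = solve_n(0.718529308595)
print(n)
```

The bracket 2.7 to 2.9 is chosen to contain the first two printed approximations; the root returned should agree closely with the final value in the printed series.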

This value is correct to the last figure or we have
$$n = 2.807{,}343{,}873.$$
Hence by using the exponential theorem:
$$e^{n} = e^{2} \times \cdots = 16.565{,}858{,}706{,}268.$$
Similarly
$$e^{-n} = .060{,}365{,}117{,}060.$$
Thus
$$\sinh n = 8.252{,}746{,}794{,}593,$$
$$\cosh n = 8.313{,}111{,}911{,}675.$$
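These values can be checked with a small sketch that sums the exponential series in decimal arithmetic and multiplies the factors together. The intermediate factors of the product are not legible in this copy, so the split of n into 2 + .8 + .007,343,873 is an assumption made purely for illustration, and `exp_series` is a hypothetical helper.

```python
from decimal import Decimal, getcontext

getcontext().prec = 30  # carry well beyond the twelve places quoted


def exp_series(x, terms=40):
    """Sum the exponential series 1 + x + x^2/2! + ... to a fixed number of terms."""
    x = Decimal(x)
    term = Decimal(1)
    total = Decimal(1)
    for k in range(1, terms):
        term = term * x / k
        total += term
    return total


# Assumed split of n = 2.807343873 into convenient parts.
e_n = exp_series(2) * exp_series("0.8") * exp_series("0.007343873")
e_minus_n = 1 / e_n
sinh_n = (e_n - e_minus_n) / 2
cosh_n = (e_n + e_minus_n) / 2

print(e_n, e_minus_n)
print(sinh_n, cosh_n)
```

Splitting the exponent keeps each series argument small or simple, so that few terms are needed for the last factor, which may be the reason for proceeding by a product.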
Hence we determine from (xxxviii):
$$G = -.064{,}875{,}005{,}350,$$
and from (xxxii) the coefficient of $x$:
$$-.002{,}866{,}074{,}767.$$
Finally from (xxx)
$$K = 3.888{,}100{,}258.$$
Calculating $c = e^{n/l}$ we have:
$$c = 1.098{,}096{,}393{,}273,$$
which I believe is true to the last figure.
The value of $c$ as found by Messrs King and Hardy is:
$$c = 1.095{,}612{,}204.$$
The difference is partly due to difference of range, partly due to method of
calculation.
Thus finally we obtain for $L_x$, the logarithm of the number of survivors of
age $55 + x$ years:
$$L_x = 3.888{,}100{,}258 - x \times .002{,}866{,}074{,}767 - .064{,}875{,}005{,}350\,(1.098{,}096{,}393{,}273)^{x}.$$
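A minimal sketch of how the fitted formula may be evaluated, with x measured from age 55. The constant names `K`, `H`, `G`, `C` and the function `log_survivors` are hypothetical labels for the printed coefficients; the three observed values are taken from the life table of this section.

```python
K = 3.888100258
H = 0.002866074767   # printed coefficient of x, subtracted below
G = 0.064875005350   # printed coefficient of c^x, subtracted below
C = 1.098096393273
ORIGIN_AGE = 55


def log_survivors(age):
    """Fitted L_x for a given age, with x = age - 55."""
    x = age - ORIGIN_AGE
    return K - x * H - G * C ** x


# Observed values of L_x at the two ends and the middle of the range.
observed = {25: 3.96833, 55: 3.82363, 85: 2.73319}
for age, obs in observed.items():
    fitted = log_survivors(age)
    # Difference, calculated less observed, in units of the fifth decimal place.
    print(age, round(fitted, 5), round((fitted - obs) * 1e5))
```

At ages 25, 55 and 85 the differences come out as +184, -40 and -578 in the fifth decimal place, in agreement with the "by moments" column of the table.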

In comparing our formula with others of a like kind, it must be remembered
that our $x$ is measured from 55 years as origin. For use it may be noted that the
reciprocal of $c$ is
$$\frac{1}{c} = .910{,}666{,}865{,}065,$$
which will be wanted when $x$ is negative.

Clearly $c$ and $1/c$ are wanted to many places of figures as they have to be raised
to high powers. The values of $L_x$ for $x = -30$ to $+30$ were found by repeated
multiplication with a Brunsviga, so that in no part of the work has a table of
logarithms been used.
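The repeated multiplication can be imitated as follows. This is a sketch in ordinary floating-point arithmetic, not a reproduction of the machine work; the helper name is hypothetical. It also shows, by way of a comparison the source does not make, what happens to the thirtieth power if c is cut down to six decimals.

```python
C = 1.098096393273
C_INV = 0.910666865065


def powers_by_repeated_multiplication(base, count):
    """Return [base^0, base^1, ..., base^count], one multiplication per step."""
    powers = [1.0]
    for _ in range(count):
        powers.append(powers[-1] * base)
    return powers


positive = powers_by_repeated_multiplication(C, 30)      # c^0 ... c^30
negative = powers_by_repeated_multiplication(C_INV, 30)  # c^0 ... c^-30

# Same process with c rounded to six decimals, for comparison only.
rough = powers_by_repeated_multiplication(round(C, 6), 30)

print(positive[30], negative[30])
print(positive[30] - rough[30])
```

Since c raised to the power 30 is e^n, the last entries should agree with the values of e^n and e^{-n} found earlier, while the six-decimal value of c already disturbs the fourth decimal place of the thirtieth power.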
I give here a table of the observations and calculated values and add Messrs
King and Hardy's results*, asking the reader, however, to remember that these
were based on the range 17 to 88 and therefore are not strictly comparable with
mine based on a different range.

* Text-book of Institute of Actuaries, Part ii. p. 88.

Life Table ($L_x = \log l_x$).

Calculated differences* are in units of the fifth decimal place.

Age  Observed  By moments  By averages    Age  Observed  By moments  By averages
25   3.96833   +184        +43            56   3.81354   +46         +39
26   3.96609   +83         -45            57   3.80339   +75         +59
27   3.96307   +56         -59            58   3.79289   +71         +44
28   3.96025   +5          -97            59   3.78184   +47         +9
29   3.95684   +8          -81            60   3.77069   -50         -100
30   3.95363   -13         -91            61   3.75695   +21         -41
31   3.95003   -1          -67            62   3.74259   +55         -20
32   3.94682   -34         -89            63   3.72729   +73         -15
33   3.94319   -32         -76            64   3.71075   +95         -6
34   3.93957   -37         -72            65   3.69294   +112        -2
35   3.93578   -34         -59            66   3.67359   +138        +13
36   3.93219   -60         -76            67   3.65281   +148        +12
37   3.92833   -68         -76            68   3.63099   +87         -57
38   3.92416   -56         -56            69   3.60628   +123        -27
39   3.91967   -23         -16            70   3.57894   +212        +58
40   3.91503   +12         +25            71   3.55390   -161        -314
41   3.91072   0           +19            72   3.52602   -504        -652
42   3.90615   -1          +22            73   3.48994   -306        -443
43   3.90147   -8          +19            74   3.45432   -460        -579
44   3.89685   -40         -9             75   3.40598   +321        +229
45   3.89169   -38         -5             76   3.36295   +202        +147
46   3.88629   -34         0              77   3.31403   +266        +261
47   3.88082   -48         -13            78   3.26403   -7          +51
48   3.87463   -16         +18            79   3.20713   -80         +59
49   3.86847   -18         +15            80   3.14362   -29         +210
50   3.86179   +1          +31            81   3.07776   -332        +29
51   3.85456   +39         +65            82   3.00212   -306        +205
52   3.84693   +77         +99            83   2.92001   -343        +348
53   3.83947   +56         +72            84   2.81934   +694        +1600
54   3.83194   -5          +4             85   2.73319   -578        +584
55   3.82363   -40         -38

* Calculated less observed values.

By the method of moments the mean difference is .00116 and by the method
of averages it is .00126. The improvement is not very sensible, but the method
of fitting being brought under a general rule is an advantage of great importance.
The actuaries have adopted Makeham's formula to express the life-table, but it
cannot be considered to give good results for the down slope of old age, say 68
onwards. From the mathematical standpoint better results would be obtained by
the choice of other functions. I have selected Makeham's solely with a view of
illustrating the application of my general method to a somewhat complex
arithmetical problem.
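The two mean differences quoted above can be retraced from the tabulated differences. The sketch below takes "mean difference" to be the mean of the absolute differences over the 61 ages, an interpretation that is assumed here because it reproduces the printed figures. The lists give the differences for ages 25 to 85 in units of the fifth decimal place, and the variable names are hypothetical.

```python
# Calculated less observed, ages 25..85, units of the fifth decimal place.
by_moments = [
    184, 83, 56, 5, 8, -13, -1, -34, -32, -37, -34, -60, -68, -56, -23, 12,
    0, -1, -8, -40, -38, -34, -48, -16, -18, 1, 39, 77, 56, -5, -40,
    46, 75, 71, 47, -50, 21, 55, 73, 95, 112, 138, 148, 87, 123, 212, -161,
    -504, -306, -460, 321, 202, 266, -7, -80, -29, -332, -306, -343, 694, -578,
]
by_averages = [
    43, -45, -59, -97, -81, -91, -67, -89, -76, -72, -59, -76, -76, -56, -16,
    25, 19, 22, 19, -9, -5, 0, -13, 18, 15, 31, 65, 99, 72, 4, -38,
    39, 59, 44, 9, -100, -41, -20, -15, -6, -2, 13, 12, -57, -27, 58, -314,
    -652, -443, -579, 229, 147, 261, 51, 59, 210, 29, 205, 348, 1600, 584,
]


def mean_absolute(differences):
    """Mean of the absolute values, in the same units as the input."""
    return sum(abs(d) for d in differences) / len(differences)


mean_moments = mean_absolute(by_moments)
mean_averages = mean_absolute(by_averages)
print(round(mean_moments), round(mean_averages))
```

Rounded to the nearest unit these give 116 and 126, that is .00116 and .00126. Much of either total comes from the ages above 70, which bears on the remark about the down slope of old age.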
(To be continued.)
