Original Article
In a series of two papers, this paper and the one by Ozkok et al. (Modelling critical illness claim
diagnosis rates II: results), we develop statistical models to be used as a framework for estimating, and
graduating, Critical Illness (CI) insurance diagnosis rates. We use UK data for 1999-2005 supplied by
the Continuous Mortality Investigation (CMI) to illustrate their use. In this paper, we set out the basic
methodology. In particular, we set out some models, we describe the data available to us and we discuss
the statistical distribution of estimators proposed for CI diagnosis inception rates. A feature of CI
insurance is the delay, on average about 6 months but in some cases much longer, between the
diagnosis of an illness and the settlement of the subsequent claim. Modelling this delay, the so-called
Claim Delay Distribution, is a necessary first step in the estimation of the claim diagnosis rates and
this is discussed in the present paper. In the subsequent paper, we derive and discuss diagnosis rates for
CI claims from all causes and also from specific causes.
Keywords: critical illness insurance; diagnosis rates; statistical models; Burr generalised linear-type
model; Claim Delay Distribution; Continuous Mortality Investigation
1. Introduction
Critical Illness (CI) insurance is a type of long-term insurance, typically secured by
regular premiums throughout the term of the policy, that provides a lump sum on the
diagnosis of one of a specified list of critical illnesses within the policy conditions. In the
UK, there are two types of CI policy: Full Accelerated (FA), which covers both CI and
death and Stand Alone (SA), which covers only CI. The former is far more popular than
the latter. CI coverage includes, but is not limited to, cancer, heart attack, stroke, coronary
artery by-pass graft (CABG), kidney failure (KF), major organ transplant (MOT) and
multiple sclerosis (MS). Most policies also include total and permanent disability (TPD)
for completeness, essentially to cover disability arising from other causes not covered
explicitly in the policy.
CI insurance has been very popular in the UK since it was introduced in the 1980s.
Around 700,000 new policies were sold in 1998 (Dinani et al. (2000)) and more than
1 million new policies were issued in 2002 (CMI WP 50 (2011)), many of them linked to
*Corresponding author. E-mail: eozkok@hacettepe.edu.tr
© 2012 Taylor & Francis
E. Ozkok et al.
2. Models
Our most detailed model for CI insurance is a cause-specific model, which is represented in
Figure 1.
Points to note about the model represented in Figure 1 are:
(1) Healthy indicates that the individual has not yet been diagnosed with a CI or died.
(2) An individual exits the Healthy state on death or on the diagnosis of a CI, as specified
in the policy conditions.
(3) The model is specified in terms of transition intensities, labelled $\lambda^{j}_{x,\theta}$ and $\lambda^{D}_{x,\theta}$.
Transition intensities are analogous to the force of mortality and there are good
reasons for specifying the model in this way. See, for example, Waters (1984).
(4) A transition from Healthy to Dead means death before the diagnosis of a CI, so that,
numerically, we might expect $\lambda^{D}_{x,\theta}$ to be different from, possibly lower than, the total
force of mortality for a corresponding set of individuals.
(5) The transition intensities, $\lambda^{j}_{x,\theta}$ and $\lambda^{D}_{x,\theta}$, depend on the cause, on the current age of
the individual, $x$, and also on a set of other covariates, labelled $\theta$. These covariates are
the important characteristics of the individual and/or the policy which affect the
likelihood of the diagnosis of a CI or death; for example, Sex, Benefit amount, Office.
The set of covariates cannot include any characteristics which are not recorded in our
data, and a major part of the statistical modelling is to determine which of the
characteristics recorded in our data are important and hence should be included in $\theta$.
(6) This model can be used for both FA and SA policies. Each transition intensity would
be estimated separately and data from FA policies only would be used to estimate $\lambda^{D}_{x,\theta}$.
Figure 2 represents a simpler, all causes, model for CI insurance. We could use the model
in Figure 2 to model FA and SA policies separately. In this case, a transition from Healthy
to Insured event means:
(1) diagnosis with a CI or death before diagnosis with a CI (FA policies), or,
(2) diagnosis with a CI (SA policies).
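In this case, $\lambda_{x,\theta}$ in Figure 2 corresponds in terms of the model in Figure 1 to:
$$\lambda_{x,\theta} = \sum_{j=1}^{n}\lambda^{j}_{x,\theta} + \lambda^{D}_{x,\theta} \quad \text{for FA policies},$$
$$\lambda_{x,\theta} = \sum_{j=1}^{n}\lambda^{j}_{x,\theta} \quad \text{for SA policies}.$$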
Figure 2.
Figure 3. An all CI causes and death model for critical illness insurance (Ozkok et al. (2012a)).
Alternatively, we could use the model in Figure 2 to model FA and SA policies together.
In this case, Insured event has different meanings for FA and SA policies: for FA policies
it would include death before diagnosis with a CI whereas for SA policies it would not.
We might reasonably expect that Benefit type, FA or SA, would be an important covariate
in this model and that, other things being equal, the total claim rate for FA policies
would be higher than for SA policies.
A more satisfactory model, in terms of Benefit type, FA or SA, is illustrated in Figure 3.
The transition intensities $\lambda^{CI}_{x,\theta}$ and $\lambda^{D}_{x,\theta}$ in Figure 3 correspond to $\sum_{j=1}^{n}\lambda^{j}_{x,\theta}$ and $\lambda^{D}_{x,\theta}$ in
Figure 1. For FA policies, a transition to either of the two exit states would result in a
claim. For SA policies only a transition to Diagnosed with a CI would result in a claim;
death before diagnosis with a CI would terminate the policy.
We discuss the parameterisation of the models represented in Figures 1-3 in Paper II.
3. Data
3.1. Covariates
We were provided by the CMI with a set of CI data relating to UK policies in the seven
calendar years from 1999 to 2005. The data consisted of records of policies in force at the
start and at the end of each of the seven years and details of claims settled within the seven
years. The covariates included in each data record were as follows:
(1) Sex.
(2) Smoker status: non-smoker or smoker.
(3) Benefit type: FA or SA.
(4) Office (coded anonymously).
(5) Policy type: joint or single life.
(6) Benefit amount in pounds.
(7) Date of birth.
(8) Date of commencement of the policy.
The original data set contained details of 27,244 claims. Data from some offices could not
be used because of problems associated with missing claims information. Data from these
offices, both in-force and claims, were removed from our analyses, leaving us with data
from a total of 13 offices consisting of 19,127 claims and approximately 18,000,000 policy
years of exposure.
An additional covariate included in our data was Sales channel, which took one of five
possible values: Bancassurer, Direct sales, IFA, Other or Unknown. There was a very close
association between Sales channel and Office: 6 of the 13 offices used only one sales
channel, a further three offices used just two, and one office classified all its data as sales
channel Unknown. We decided that it was unnecessary to include both Sales channel and
Office as possible covariates so we excluded the former from our analyses.
For Joint Life policies, both lives are included in the in-force data, but only one claim
can occur.
The presence of duplicate policies in the data would not affect point estimates of the
claim diagnosis rates, but would affect the standard deviations (SD) of these estimates.
This would distort the goodness of fit statistics, making the fit appear to be worse. No
attempt was made to remove duplicate policies from either our in-force or our claims files.
An investigation by the CMI of their 1999-2004 data indicated that this was not likely to
be a serious problem [see CMI WP 33 (2008, paragraphs 4.10-4.12)].
Table 1.
                                        Number of claims (%)
Benefit type
  Full accelerated                      16,875 (88.2%)
  Stand alone                           2252 (11.8%)
Joint/single life
  Joint life                            9743 (50.9%)
  Single life                           9384 (49.1%)
Gender
  Female                                8173 (42.7%)
  Male                                  10,954 (57.3%)
Smoker status
  Non-smoker                            14,129 (73.9%)
  Smoker                                4998 (26.1%)
Cause of claim
  Coronary artery bypass graft          393 (2.1%)
  Cancer                                9381 (49.0%)
  Death                                 3371 (17.6%)
  Heart attack                          2220 (11.6%)
  Kidney failure                        110 (0.6%)
  Major organ transplant                36 (0.2%)
  Multiple sclerosis                    825 (4.3%)
  Other                                 1265 (6.6%)
  Stroke                                1027 (5.4%)
  Total and permanent disability        499 (2.6%)
Type of claim
  Critical illness                      15,756 (82.4%)
  Death                                 3371 (17.6%)
Table 2.
                                     Diagnosis to    Notification    Admission to    Diagnosis to
                                     notification    to admission    settlement      settlement
Mean delay (days)                    93              80              18              185
Number of observations               15,585          9190            9752            15,860
Percentage of observations
  having both dates                  81              48              51              83
can estimate missing dates of diagnosis by subtracting from the date of settlement the
median of the CDD. The sensitivity of our final results to the use of the median in this
context is discussed in Paper II.
All claim records had a year of settlement, but, in some cases, the exact date of
settlement was missing. For all these cases, the date of diagnosis was given and a date of
settlement was estimated using the median of the CDD.
The modelling of the CDD is discussed in Section 4 below [see also CMI WP 14
(2005)].
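$$f(u;\alpha,\sigma,\tau) = \frac{\alpha\tau\,(u/\sigma)^{\tau}}{u\,\left[1+(u/\sigma)^{\tau}\right]^{\alpha+1}}, \qquad u > 0 \qquad (1)$$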
where $f(\cdot)$ is the probability density function of the CDD, $\alpha$ and $\tau$ are (positive) shape
parameters and $\sigma$ is a (positive) scale parameter. With this parametrisation, the $k$th
moment of the delay between diagnosis and settlement is:
$$E[X^{k}] = \frac{\sigma^{k}\,\Gamma(1+k/\tau)\,\Gamma(\alpha-k/\tau)}{\Gamma(\alpha)} \qquad (2)$$
for $\alpha > k/\tau$, and $\infty$ otherwise.
Note that the CMI has also used Burr distributions to fit CDDs to CI data sets
(CMI WP 33 (2008)), but not in a GLM setting.
(3) The mean of our CDD is a log-linear function of a selected set of covariates, denoted
by the vector $\theta$, so that:
$$E[X] = \exp(\beta\theta^{T}) \qquad (3)$$
where $X$ denotes the delay and $\beta$ is a vector of regression coefficients. The equation for
the mean given by Equation (3) is achieved by modelling the parameter $\sigma$ as follows:
$$\sigma = \frac{\exp(\beta\theta^{T})\,\Gamma(\alpha)}{\Gamma(1+1/\tau)\,\Gamma(\alpha-1/\tau)}.$$
(4)
(5)
where $d_i$ is the delay for the $i$th claim and $n$ is the number of claims.
(4) The parameters $\alpha$ and $\tau$ and the regression coefficients $\beta$ were estimated by
Bayesian techniques using the full data set of 19,127 claims. Missing event
dates were treated as additional parameters and estimated using their posterior
predictive distributions. Truncation was used where appropriate so that, for example,
a missing date of diagnosis could not be before the date of commencement of the
policy or after the date of notification of the claim. The Bayesian analysis results in a
posterior distribution for each parameter and, in particular, for each missing date of
diagnosis. A point estimate of the missing date could be, for example, the median of
the posterior distribution.
(5) Gibbs variable selection was used to determine which covariates should be retained in
the models.
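As a minimal sketch (Python, standard library only; the function names are illustrative and are not the authors' code), the functions below evaluate the Burr density of Equation (1), its distribution function, the moments of Equation (2), the scale parameter implied by a target mean through the expression for $\sigma$ above, and the closed-form median used to impute missing dates. With the posterior means $\alpha = 0.618$ and $\tau = 2.570$ of Table 4, $\alpha > 1/\tau$ but $\alpha < 2/\tau$, which is why the CDD has a finite mean but an infinite standard deviation.

```python
import math


def burr_pdf(u: float, alpha: float, sigma: float, tau: float) -> float:
    """Equation (1): density of a delay u > 0."""
    z = (u / sigma) ** tau
    return alpha * tau * z / (u * (1.0 + z) ** (alpha + 1.0))


def burr_cdf(u: float, alpha: float, sigma: float, tau: float) -> float:
    """Distribution function F(u) = 1 - [1 + (u/sigma)^tau]^(-alpha)."""
    if u <= 0.0:
        return 0.0
    return 1.0 - (1.0 + (u / sigma) ** tau) ** (-alpha)


def burr_moment(k: int, alpha: float, sigma: float, tau: float) -> float:
    """Equation (2): E[X^k], which is infinite unless alpha > k/tau."""
    if alpha <= k / tau:
        return math.inf
    return (sigma ** k * math.gamma(1.0 + k / tau)
            * math.gamma(alpha - k / tau) / math.gamma(alpha))


def sigma_from_mean(mean: float, alpha: float, tau: float) -> float:
    """Scale parameter that makes E[X] equal a given mean, e.g. exp(beta theta^T)."""
    return mean * math.gamma(alpha) / (
        math.gamma(1.0 + 1.0 / tau) * math.gamma(alpha - 1.0 / tau))


def burr_median(alpha: float, sigma: float, tau: float) -> float:
    """Solves F(m) = 0.5 in closed form."""
    return sigma * (2.0 ** (1.0 / alpha) - 1.0) ** (1.0 / tau)


alpha, tau = 0.618, 2.570                        # posterior means, Table 4
sigma = sigma_from_mean(174.0, alpha, tau)       # a mean delay of 174 days
mean_check = burr_moment(1, alpha, sigma, tau)   # recovers 174 days
second = burr_moment(2, alpha, sigma, tau)       # infinite, since alpha < 2/tau
median = burr_median(alpha, sigma, tau)          # well below the mean: heavy right tail
```

Linking $\sigma$ to the mean, rather than regressing $\sigma$ directly, keeps the regression coefficients interpretable as effects on the expected delay.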
4.2. Allowing for business growth
The analysis in Ozkok et al. (2012b) did not take into account one significant factor:
business growth. For almost all the offices contributing to our data and for almost all
years, the number of CI policies in force increased year on year. If this is not taken into
account, it can introduce bias into the modelling of the CDD. Recall that our claims data
consist of claims settled in the years 1999-2005. For claims settled in any of these 7 years,
those with relatively short delays relate to claims from policies in force in more recent
years; those with relatively longer delays relate to policies in force in earlier years. The
growth in the numbers of policies in force means that claims with shorter delays are likely
to be relatively over-represented in our data. We allow for business growth in our
modelling of the CDD as follows:
(1) For each office and each year of diagnosis we assign a growth rate, denoted GR. This
depends only on office and year of diagnosis and not on any other characteristic of
the claim, for example type of policy (FA or SA) or age of the policyholder.
(2) For the most recent year for which the office contributed data, 2005 for all but 2 of the
13 offices, GR is set at 1. For each earlier year of diagnosis, GR is set equal to the ratio
of the average number of policies in force in the following year to the average number
in force in the year in question. In this context, average number in force is the average
of the numbers of policies in force at the start and at the end of the year. For years of
diagnosis prior to the earliest for which the office contributed data, the growth rate is
assumed to be the same as in the earliest year for which data exists.
(3) For each office and each year of diagnosis we assign a growth factor, denoted GF. The
growth factor is the product of the growth rates for that year of diagnosis and all
subsequent years of diagnosis up to the final year for which the office contributed
data.
(4) For the purposes of parameter estimation, the parameter $\sigma$ in the three-parameter
Burr distribution is replaced in the loglikelihood function (5) by $\sigma_w$, where:
$$\sigma_w = \sigma/\sqrt{GF}.$$
The effect of this is to decrease the variance for this claim by a factor GF, as can be
seen from Equation (4), so giving more weight to data from years where relatively few
policies were in force. Using weights inversely proportional to the variance is common
in weighted least squares estimation (see, for example, Greene (1990)).
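Steps (1)-(4) can be sketched as follows for a single office. This is a minimal illustration, assuming the office's contributed years are consecutive; the function names and the in-force figures are hypothetical. Because each GR is a ratio of consecutive averages, GF telescopes to the ratio of the final year's average in force to that of the year of diagnosis.

```python
import math


def growth_rates(avg_in_force: dict[int, float]) -> dict[int, float]:
    """Steps (1)-(2): GR for each year in which one office contributed data.

    avg_in_force maps year -> average of the start- and end-of-year numbers of
    policies in force. Contributed years are assumed to be consecutive.
    """
    years = sorted(avg_in_force)
    gr = {years[-1]: 1.0}                        # most recent year: GR = 1
    for year, following in zip(years[:-1], years[1:]):
        gr[year] = avg_in_force[following] / avg_in_force[year]
    return gr


def growth_factor(gr: dict[int, float], year_of_diagnosis: int) -> float:
    """Step (3): product of GR from the year of diagnosis to the final year.

    Years before the first contributed year reuse the earliest GR (step (2)).
    """
    first, last = min(gr), max(gr)
    gf = 1.0
    for year in range(year_of_diagnosis, last + 1):
        gf *= gr[max(year, first)]
    return gf


def weighted_sigma(sigma: float, gf: float) -> float:
    """Step (4): sigma_w = sigma / sqrt(GF); sigma^2 (and any finite variance) falls by GF."""
    return sigma / math.sqrt(gf)


# Hypothetical office whose portfolio grows from 1000 to 1500 policies in force
gr = growth_rates({2003: 1000.0, 2004: 1200.0, 2005: 1500.0})
gf_2003 = growth_factor(gr, 2003)   # equals 1500/1000: the rates telescope
gf_2002 = growth_factor(gr, 2002)   # 2002 precedes the data, so the 2003 rate is reused
```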
The procedure described above requires the date of diagnosis to be known so that GF
can be estimated. For claims where the date of diagnosis was not known, an iterative
procedure was used. A CDD was parameterised without allowing for business growth and
a preliminary estimate of the year of diagnosis was calculated from the date of settlement
minus the median of the CDD. A value for GF was calculated using this preliminary
estimate, a revised CDD was parameterised and a revised estimate of the year of diagnosis
was calculated. The process ended when two consecutive estimates of the year of diagnosis
were the same; this never took more than three iterations.
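The iteration can be sketched per claim as below. This is a simplification under stated assumptions: in the procedure described above the CDD is refitted to all claims at once, whereas here a hypothetical callable `median_delay_days` stands in for "refit the CDD with GF for this year of diagnosis and return its median". The stopping rule is the one in the text: stop when two consecutive estimates of the year of diagnosis agree.

```python
from datetime import date, timedelta
from typing import Callable, Optional


def estimate_diagnosis_year(
    settlement: date,
    median_delay_days: Callable[[Optional[int]], float],
    max_iter: int = 10,
) -> Optional[int]:
    """Iterate: year of diagnosis -> GF -> refitted CDD -> year of diagnosis.

    median_delay_days is hypothetical: given the current estimate of the year
    of diagnosis (None for the first fit, which ignores business growth) it
    returns the median of the corresponding CDD, in days.
    """
    year: Optional[int] = None
    for _ in range(max_iter):
        new_year = (settlement - timedelta(days=median_delay_days(year))).year
        if new_year == year:              # two consecutive estimates agree
            break
        year = new_year
    return year


def toy_median(year: Optional[int]) -> float:
    # Hypothetical: the fit without growth gives 30 days, refits give 170 days.
    return 30.0 if year is None else 170.0


estimate = estimate_diagnosis_year(date(2003, 2, 15), toy_median)
```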
4.3. Details of the covariates
Details of the covariates used in the modelling of the CDD are given in Table 3. These
covariates are labelled $x$ and $\theta_1, \ldots, \theta_9$. The values of $x$ and $\theta_1, \ldots, \theta_7$ for each claims record
have been standardised by subtracting the mean and dividing by the standard deviation,
calculated from the claims data. This makes sense for covariates where the non-standardised value can be very large, for example, benefit amount, and has been done
for consistency for the other covariates. For example, Sex has been coded 0 for females and 1 for
males and then standardised for each record by subtracting the mean, 0.573, and
dividing by the standard deviation, 0.495, so that a claims record for a female has the
value $\theta_1 = (0 - 0.573)/0.495 = -1.158$.
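A short sketch of the standardisation, reproducing the Sex example. The text does not say whether the sample or the population standard deviation was used; the sketch assumes the sample version, which makes no practical difference with 19,127 claims. Function names are illustrative.

```python
import statistics


def standardise(value: float, mean: float, sd: float) -> float:
    """Centre and scale one covariate value, as done for x and theta_1..theta_7."""
    return (value - mean) / sd


def standardise_column(values: list[float]) -> list[float]:
    """Standardise a whole column by its own mean and sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [standardise(v, mean, sd) for v in values]


# Sex: 0 = female, 1 = male; mean and SD from Table 3
theta1_female = standardise(0, 0.573, 0.495)   # about -1.158
theta1_male = standardise(1, 0.573, 0.495)     # about 0.863
```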
Note that Settlement year is a covariate but Year of diagnosis is not. It would not be
appropriate to have both as covariates. We used the former because we have full
information about Settlement year whereas we have had to estimate the latter in some
cases. This causes minor complications in the estimation of diagnosis rates (see Section 5.4).
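Equation (3) can then be written in more detail as follows:
$$E[X] = \exp(\beta\theta^{T}) = \exp\Big(\beta_0 + \beta_1 x + \sum_{j=1}^{7}\beta_{j+1}\theta_j + \beta_{9;\theta_8} + \beta_{10;\theta_9}\Big) \qquad (6)$$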
Table 3.
Covariate  Name                     Number of levels  Additional information     Mean     SD
x          Age                                        Age last birthday          44.424   9.478
θ1         Sex                      2                 F = 0, M = 1               0.573    0.495
θ2         Benefit type             2                 FA = 0, SA = 1             0.118    0.322
θ3         Smoker status            2                 N = 0, S = 1               0.261    0.439
θ4         Policy type              2                 JL = 0, SL = 1             0.491    0.499
θ5         Settlement year          7                 1999 = 1, 2000 = 2, ...    4.917    1.786
θ6         Benefit amount (£)                         Continuous                 55,397   56,988
θ7         Policy duration (days)                     Continuous                 1167     946
θ8         Office                   13
θ9         Cause of claim           10                1. CABG
                                                      2. Cancer
                                                      3. Death
                                                      4. Heart attack
                                                      5. Kidney failure
                                                      6. Major organ transplant
                                                      7. Multiple sclerosis
                                                      8. Other
                                                      9. Stroke
                                                      10. Total and permanent disability
Table 4.
Covariate          Parameter      Mean    SD
Intercept          β0             5.469   0.025
Benefit type       β3             0.023   0.006
Policy type        β5             0.034   0.006
Benefit amount     β7             0.032   0.007
Policy duration    β8             0.098   0.007
Office             β9;Office1     0.303   0.022
                   β9;Office2     0.215   0.020
                   β9;Office3     0.205   0.061
                   β9;Office4     0.249   0.050
                   β9;Office5     0.090   0.037
                   β9;Office6     0.050   0.085
                   β9;Office7     0.129   0.118
                   β9;Office8     0.106   0.021
                   β9;Office9     0.315   0.025
                   β9;Office10    0.201   0.033
                   β9;Office11    0.158   0.017
                   β9;Office12    0.209   0.023
                   β9;Office13    0.581   0.047
Cause of claim     β10;Cause1     0.145   0.040
                   β10;Cause2     0.101   0.018
                   β10;Cause3     0.542   0.026
                   β10;Cause4     0.029   0.023
                   β10;Cause5     0.129   0.079
                   β10;Cause6     0.194   0.116
                   β10;Cause7     0.152   0.033
                   β10;Cause8     0.006   0.028
                   β10;Cause9     0.215   0.027
                   β10;Cause10    0.121   0.056
                   α              0.618   0.015
                   τ              2.570   0.034
the delay between diagnosis and settlement for each of these 11 scenarios is shown in
Table 6, together with the standard deviation and some percentage points of the estimate
of the mean. Note that these are not the standard deviation and percentage points of the
posterior distribution itself: the standard deviation is infinite in every case, as pointed
out in comment (4) earlier in this section. The means in Table 6 can be obtained from
Equation (6), noting that for this model $\theta = (\theta_2, \theta_4, \theta_6, \theta_7, \theta_8, \theta_9)$, and using the information
in Table 3 and the parameters in Table 4. For example, the mean delay for scenario 1 is
calculated as follows:
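$$E[X] = \exp\Big(5.469 - 0.023\times\frac{0-0.118}{0.322} + 0.034\times\frac{0-0.491}{0.499} - 0.032\times\frac{50{,}000-55{,}397}{56{,}988} - 0.098\times\frac{1460-1167}{946} - 0.158 - 0.101\Big) \approx 174 \text{ days}.$$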
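The same calculation for all eleven scenarios of Table 5 can be sketched as below. The coefficient magnitudes are the posterior means in Table 4, but the extracted table lost its minus signs: the signs used here are inferred, chosen so that the predictions reproduce the Table 6 means, and only the offices and causes that appear in Table 5 are included. Plugging in posterior means of the coefficients is not identical to taking the posterior mean of the delay, so a few scenarios (for example 5 and 7) differ from Table 6 by about a day.

```python
import math

# Posterior means from Table 4 with INFERRED signs (see lead-in).
BETA0 = 5.469
BETA = {"benefit_type": -0.023, "policy_type": 0.034,
        "benefit_amount": -0.032, "policy_duration": -0.098}
OFFICE = {6: -0.050, 10: 0.201, 11: -0.158}              # offices used in Table 5
CAUSE = {"Cancer": -0.101, "Death": -0.542, "TPD": 0.121}
# (mean, SD) from Table 3, used to standardise each covariate
SCALE = {"benefit_type": (0.118, 0.322), "policy_type": (0.491, 0.499),
         "benefit_amount": (55_397.0, 56_988.0), "policy_duration": (1167.0, 946.0)}


def mean_delay(benefit_type, policy_type, amount, duration, office, cause):
    """Equation (6) for the best fitting model: exp(beta theta^T), in days."""
    raw = {"benefit_type": benefit_type, "policy_type": policy_type,
           "benefit_amount": amount, "policy_duration": duration}
    eta = BETA0 + OFFICE[office] + CAUSE[cause]
    for name, value in raw.items():
        mean, sd = SCALE[name]
        eta += BETA[name] * (value - mean) / sd
    return math.exp(eta)


# Table 5: (FA=0/SA=1, J=0/S=1, benefit amount, duration in days, office, cause)
SCENARIOS = {
    1: (0, 0, 50_000, 1460, 11, "Cancer"),
    2: (1, 0, 50_000, 1460, 11, "Cancer"),
    3: (0, 1, 50_000, 1460, 11, "Cancer"),
    4: (0, 0, 10_000, 1460, 11, "Cancer"),
    5: (0, 0, 250_000, 1460, 11, "Cancer"),
    6: (0, 0, 50_000, 365, 11, "Cancer"),
    7: (0, 0, 50_000, 3650, 11, "Cancer"),
    8: (0, 0, 50_000, 1460, 6, "Cancer"),
    9: (0, 0, 50_000, 1460, 10, "Cancer"),
    10: (0, 0, 50_000, 1460, 11, "Death"),
    11: (0, 0, 50_000, 1460, 11, "TPD"),
}

for scenario, args in SCENARIOS.items():
    print(scenario, round(mean_delay(*args)))
```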
Table 5. Scenarios for prediction of the CDD under the best fitting model.
Scenario  Benefit type  Joint/single life  Benefit amount (£)  Policy duration (days)  Office code  Cause of claim
1         FA            J                  50,000              1460                    11           Cancer
2         SA            J                  50,000              1460                    11           Cancer
3         FA            S                  50,000              1460                    11           Cancer
4         FA            J                  10,000              1460                    11           Cancer
5         FA            J                  250,000             1460                    11           Cancer
6         FA            J                  50,000              365                     11           Cancer
7         FA            J                  50,000              3650                    11           Cancer
8         FA            J                  50,000              1460                    6            Cancer
9         FA            J                  50,000              1460                    10           Cancer
10        FA            J                  50,000              1460                    11           Death
11        FA            J                  50,000              1460                    11           TPD
Table 6. The mean of the posterior distribution of the CDD under the different scenarios given in Table 5 using
the best fitting model, and the standard deviation and some percentage points of the estimate of the mean.
Scenario   Mean   SD     2.5%   50%   97.5%
1          174    4.0    167    174   182
2          162    4.8    153    162   172
3          186    4.3    178    186   195
4          178    4.1    170    178   186
5          156    5.1    146    155   166
6          195    4.4    187    194   204
7          139    4.2    131    139   147
8          194    17.8   162    194   231
9          249    10.0   230    249   270
10         112    3.2    106    112   119
11         217    12.7   193    217   243
Table 7. Coefficients for the CDD with all covariates except cause.
Covariate          Parameter      Mean    SD
Intercept          β0             5.288   0.020
Age                β1             0.006   0.006
Sex                β2             0.022   0.005
Benefit type       β3             0.010   0.005
Smoker status      β4             0.015   0.005
Policy type        β5             0.033   0.005
Settlement year    β6             0.008   0.006
Benefit amount     β7             0.026   0.006
Policy duration    β8             0.083   0.007
Office             β9;Office1     0.279   0.020
                   β9;Office2     0.203   0.018
                   β9;Office3     0.184   0.056
                   β9;Office4     0.279   0.046
                   β9;Office5     0.112   0.035
                   β9;Office6     0.025   0.068
                   β9;Office7     0.086   0.120
                   β9;Office8     0.122   0.019
                   β9;Office9     0.302   0.023
                   β9;Office10    0.201   0.030
                   β9;Office11    0.170   0.017
                   β9;Office12    0.226   0.021
                   β9;Office13    0.581   0.030
                   α              0.543   0.011
                   τ              2.958   0.036
Table 8.
Covariate          Parameter      Mean    SD
Intercept          β0             5.206   0.022
Age                β1             0.014   0.006
Sex                β2             0.010   0.005
Benefit type       β3             0.026   0.005
Smoker status      β4             0.011   0.005
Policy type        β5             0.030   0.005
Settlement year    β6             0.116   0.006
Benefit amount     β7             0.036   0.006
Policy duration    β8             0.103   0.006
Office             β9;Office1     0.217   0.019
                   β9;Office2     0.095   0.017
                   β9;Office3     0.209   0.053
                   β9;Office4     0.177   0.043
                   β9;Office5     0.190   0.033
                   β9;Office6     0.391   0.062
                   β9;Office7     0.344   0.112
                   β9;Office8     0.004   0.019
                   β9;Office9     0.193   0.022
                   β9;Office10    0.178   0.029
                   β9;Office11    0.120   0.016
                   β9;Office12    0.197   0.020
                   β9;Office13    0.587   0.028
Cause of claim     β10;Cause1     0.137   0.036
                   β10;Cause2     0.120   0.018
                   β10;Cause3     0.498   0.019
                   β10;Cause4     0.026   0.021
                   β10;Cause5     0.106   0.067
                   β10;Cause6     0.149   0.109
                   β10;Cause7     0.137   0.029
                   β10;Cause8     0.003   0.024
                   β10;Cause9     0.203   0.025
                   β10;Cause10    0.182   0.047
                   α              0.660   0.015
                   τ              2.850   0.034
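where $g_r(x)$ and $f_s(x)$ are polynomials in age $x$ (last birthday) with $r$ and $s$ terms,
respectively (that is, of degree $r-1$ and $s-1$), so that:
$$\lambda_{x,\theta} = \sum_{i=1}^{r}\kappa_i\,x^{i-1} + \exp\Big(\sum_{j=1}^{s}\delta_j\,x^{j-1} + \beta\theta^{T}\Big) \qquad (8)$$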
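A minimal sketch of Equation (8) follows. The numerical values in the usage line are hypothetical, and whether age enters raw or standardised (Table 9 reports its mean and SD) is not shown in this section, so the sketch simply takes $x$ as supplied. With $r = 0$ the additive polynomial is empty and the intensity is purely log-linear, which is the form the BIC search below starts from.

```python
import math
from typing import Sequence


def intensity(x: float, kappa: Sequence[float], delta: Sequence[float],
              beta: Sequence[float], theta: Sequence[float]) -> float:
    """Equation (8): diagnosis intensity lambda_{x,theta} at age x.

    kappa = (kappa_1, ..., kappa_r), empty when r = 0
    delta = (delta_1, ..., delta_s)
    beta, theta: regression coefficients and covariate values for one cell
    """
    if len(beta) != len(theta):
        raise ValueError("beta and theta must have the same length")
    additive = sum(k * x ** i for i, k in enumerate(kappa))   # sum kappa_i x^(i-1)
    exponent = sum(d * x ** j for j, d in enumerate(delta))   # sum delta_j x^(j-1)
    exponent += sum(b * t for b, t in zip(beta, theta))       # beta theta^T
    return additive + math.exp(exponent)


# Hypothetical r = 0, s = 2 model with one covariate
rate = intensity(45.0, kappa=[], delta=[-9.0, 0.1], beta=[0.2], theta=[1.0])
```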
Table 9. Definitions of the covariates used in the modelling of the intensity rates.
Covariate             Number of levels / integer values              Additional information
x    Age                                                             Age: mean 39.75, SD 11.21
                                                                     Age²: mean 1705, SD 930
θ1   Sex              2 (F & M)                                      F is the base category
θ2   Benefit type     2 (FA & SA)                                    FA is the base category
θ3   Smoker status    2 (N & S)                                      N is the base category
θ4   Policy type      2 (Joint/Single life)                          J is the base category
θ5   Year             Numerical (1999, ..., 2005)                    Calendar year of exposure/diagnosis
                                                                     Year: mean 2002.36, SD 1.86
θ6   Benefit amount   1: Benefit amount < 25,000
                      2: 25,000 < Benefit amount < 50,000
                      3: 50,000 < Benefit amount < 75,000
                      4: Benefit amount > 75,000
θ7   Policy duration  Duration 0: Policy duration < 1 year           Duration between the commencement
                      Duration 1: 1 year < Policy duration ≤ 2 years  of the policy and the beginning of
                      Duration 2: 2 years < Policy duration ≤ 3 years the year of exposure or diagnosis
                      Duration 3: 3 years < Policy duration ≤ 4 years
                      Duration 4: 4 years < Policy duration ≤ 5 years
                      Duration 5: Policy duration > 5 years
θ8   Office           13
(3) Two covariates, Benefit amount and Policy duration, were treated as continuous in the
modelling of the CDD but are now categorised as shown in Table 9. The reason for
this in both cases is computational convenience.
(4) The regression coefficients for each of the covariates $\theta_6$-$\theta_8$ were constrained to sum to 0.
(5) The regression coefficients for the covariates $\theta_1$-$\theta_4$ were chosen so that the base
category, as indicated in Table 9, has coefficient zero and the alternative category has,
if appropriate, a non-zero coefficient.
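The two coding conventions in points (4) and (5) can be sketched as follows; the helper names and values are hypothetical. Under base-category coding the alternative level's coefficient is a shift in the exponent relative to the base level, while under sum-to-zero coding only $k-1$ of $k$ level coefficients are free, so the constant term describes an unweighted average level (for example, an average office) rather than any particular one.

```python
from typing import Sequence


def treatment_coefficients(alternative: float) -> list[float]:
    """Binary covariates theta_1..theta_4: base category fixed at 0."""
    return [0.0, alternative]


def sum_to_zero_coefficients(free: Sequence[float]) -> list[float]:
    """Categorical covariates theta_6..theta_8: the last level is minus the sum
    of the k - 1 freely estimated levels, so all k coefficients sum to 0."""
    values = list(free)
    return values + [-sum(values)]


def covariate_term(coefficients: Sequence[float], level: int) -> float:
    """Contribution of one covariate at a given level index to beta theta^T."""
    return coefficients[level]


# Hypothetical: 13 offices need only 12 free parameters
office = sum_to_zero_coefficients([0.05] * 12)
smoker = treatment_coefficients(0.4)
```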
time $u$ from the start of a given calendar year, for a given office, classified by $x$ and $\theta$. In
conventional actuarial terminology, this is a central exposure. Note that this exposure does
not depend on whether we are estimating cause-specific or all-causes diagnosis rates.
If we knew the number of critical illnesses (cause-specific or all causes, including or
excluding deaths, as appropriate) diagnosed in this year, for this office, classified by $x$ and
$\theta$, say $D(x;\theta)$, then, using standard methodology (see, for example, Macdonald (1996)), we
could write:
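$$D(x;\theta) \sim \text{Poisson}\Big(\lambda_{x,\theta}\int_{u=0}^{1}E(x,u;\theta)\,du\Big)$$
$$\hat{\lambda}_{x,\theta} = \frac{D(x;\theta)}{\int_{u=0}^{1}E(x,u;\theta)\,du} \qquad (9)$$
$$\operatorname{Var}(\hat{\lambda}_{x,\theta}) = \frac{\lambda_{x,\theta}}{\int_{u=0}^{1}E(x,u;\theta)\,du}.$$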
The difficulty with this approach is that we do not know the number of critical illnesses
diagnosed in this year; what we know is the number of critical illnesses settled in this year,
and in the subsequent years within the observation period for which this office
contributed data.
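probability that a CI diagnosed at time $u$ will be settled by the end of the last year of
contribution is $F(t-u;x,\theta)$. Hence, we can write:
$$N(x;\theta) \sim \text{Poisson}\Big(\lambda_{x,\theta}\int_{u=0}^{1}E(x,u;\theta)\,F(t-u;x,\theta)\,du\Big)$$
so that our estimator for the diagnosis rate, $\hat{\lambda}_{x,\theta}$, is given by:
$$\hat{\lambda}_{x,\theta} = \frac{N(x;\theta)}{\int_{u=0}^{1}E(x,u;\theta)\,F(t-u;x,\theta)\,du} \qquad (10)$$
$$\operatorname{Var}(\hat{\lambda}_{x,\theta}) = \frac{\lambda_{x,\theta}}{\int_{u=0}^{1}E(x,u;\theta)\,F(t-u;x,\theta)\,du}. \qquad (11)$$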
Comparing Equations (9) and (10), we can see that the numerators are different, as
explained above, and that the denominator of the latter has been reduced by the inclusion
of the term $F(t-u;x,\theta)$ to allow for the probability that a CI diagnosed in the specific year
will be settled within the observation period.
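A minimal numerical sketch of the delay-adjusted estimator follows, assuming a midpoint rule for the integral over the year, a Burr CDD with a hypothetical scale of 71 days (shape values from Table 4), and a hypothetical cell with 1000 policies in force all year. The variance line plugs $\hat{\lambda}$ in for $\lambda$ in the Poisson variance, which is an assumption of the sketch. Comparing $t = 1$ (the office's final year) with $t = 7$ (the first of seven years) shows how much more the exposure is reduced for recent years of diagnosis.

```python
from typing import Callable

ALPHA, TAU = 0.618, 2.570     # posterior means, Table 4
SIGMA_DAYS = 71.0             # hypothetical scale parameter, in days


def burr_cdf(u: float, alpha: float, sigma: float, tau: float) -> float:
    """CDD distribution function; zero for non-positive delays."""
    if u <= 0.0:
        return 0.0
    return 1.0 - (1.0 + (u / sigma) ** tau) ** (-alpha)


def cdf_years(d: float) -> float:
    """Probability of settlement within d years of diagnosis."""
    return burr_cdf(365.25 * d, ALPHA, SIGMA_DAYS, TAU)


def adjusted_exposure(exposure: Callable[[float], float], t: float,
                      cdf: Callable[[float], float], n_steps: int = 1000) -> float:
    """Midpoint-rule value of the integral of E(x,u;theta) F(t-u;x,theta) over [0, 1].

    exposure(u): number in force at time u (years) from the start of the year
    t: years from the start of that year to the end of the last year of contribution
    """
    h = 1.0 / n_steps
    return sum(exposure((i + 0.5) * h) * cdf(t - (i + 0.5) * h) * h
               for i in range(n_steps))


def diagnosis_rate(claims: float, adj_exposure: float) -> tuple[float, float]:
    """Equation (10) and a plug-in Poisson variance estimate."""
    rate = claims / adj_exposure
    return rate, rate / adj_exposure


last_year = adjusted_exposure(lambda u: 1000.0, 1.0, cdf_years)
early_year = adjusted_exposure(lambda u: 1000.0, 7.0, cdf_years)
rate, var = diagnosis_rate(5, last_year)
```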
Points to note about this estimation methodology are:
(1) As a starting point, the exposure, $E(x,u;\theta)$, and the claims count, $N(x;\theta)$, are classified
by every combination of all possible covariates, as listed in Table 9. It is
computationally convenient, but not essential, that the CDD also includes each of
these covariates. If the claims count relates to a specific cause, then it is convenient for
the model for the CDD to incorporate cause of claim. If it is found that a covariate is
statistically unimportant for the modelling of the diagnosis rates, then the claims
count, $N(x;\theta)$, and the adjusted exposure, $\int_{u=0}^{1}E(x,u;\theta)\,F(t-u;x,\theta)\,du$, can be
aggregated over the values of that covariate.
(2) The estimator in Equation (10) is based on critical illnesses diagnosed in a particular
year. This year is specified in the covariate $\theta_5$ for the exposure and the claims count
(see Table 9). However, the CDD used in the estimator has Year of settlement rather
than Year of diagnosis as a covariate (see Table 3). This slight mismatch is unfortunate
but is not likely to be of any numerical significance since:
(i) Year of settlement was not an important covariate for the best fitting CDD, and,
(ii) many claims are settled in, or very soon after the end of, their Year of diagnosis.
(3) The two CDDs in Section 4.5 incorporate Benefit amount ($\theta_6$) and Policy duration ($\theta_7$)
as continuous covariates, whereas for the estimation of the diagnosis rates these
covariates have been categorised as shown in Table 9. The value of the CDD in the
calculation of the estimator in Equation (10) uses a mid-point value for these two
covariates, as shown in Table 10, although the mid-point for the upper end is fixed
somewhat arbitrarily. The categories for Benefit amount correspond approximately to
the quartiles from the data.
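Points (1) and (3) can be sketched together: the Table 10 mid-points supply the continuous value at which the CDD is evaluated for each category, and cells are aggregated over a covariate found to be unimportant by summing both the claims and the adjusted exposures before dividing. The cell keys and figures below are hypothetical.

```python
from collections import defaultdict

# Table 10 mid-points used when evaluating the CDD for each category
BENEFIT_MIDPOINT = {1: 12_500, 2: 37_500, 3: 62_500, 4: 100_000}
DURATION_MIDPOINT_DAYS = {0: 183, 1: 548, 2: 913, 3: 1278, 4: 1643, 5: 2585}

# Standardised as in Table 3 before entering the CDD, e.g. benefit category 2
theta6_cat2 = (BENEFIT_MIDPOINT[2] - 55_397) / 56_988


def aggregate_over(cells: dict[tuple, tuple[float, float]],
                   drop: int) -> dict[tuple, tuple[float, float]]:
    """Sum claims N and adjusted exposure over the covariate at position `drop`."""
    totals: dict[tuple, list[float]] = defaultdict(lambda: [0.0, 0.0])
    for key, (claims, adj_exposure) in cells.items():
        reduced = key[:drop] + key[drop + 1:]
        totals[reduced][0] += claims
        totals[reduced][1] += adj_exposure
    return {key: (n, e) for key, (n, e) in totals.items()}


# Hypothetical cells keyed by (age, sex, benefit category): (claims, adjusted exposure)
cells = {(40, 0, 1): (2, 100.0), (40, 0, 2): (3, 150.0), (40, 1, 1): (1, 80.0)}
by_age_sex = aggregate_over(cells, drop=2)
rates = {key: n / e for key, (n, e) in by_age_sex.items()}
```

Summing numerators and denominators separately, rather than averaging cell rates, keeps the aggregated estimator of the same form as Equation (10).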
$$\mathrm{BIC} = -2\log L(\hat{\kappa},\hat{\delta},\hat{\beta}) + p\log n$$
where $L(\cdot)$ is the likelihood function, $\hat{\kappa}$, $\hat{\delta}$ and $\hat{\beta}$ are the (vectors of) estimates of the model
parameters, $p$ is the total number of estimated parameters, and $n$ is the number of data
points.
In principle, we could try to minimise the BIC as a function of the complete set of
parameters. In practice, this would cause computational difficulties and so a pragmatic
approach was employed. We used the following procedure to determine the best model(s):
(1) First we set $r = 0$ and $s = 1$. We then choose the value of $\delta_1$ and the set of covariates, $\theta$,
together with their parameter values, $\beta$, which minimise the BIC. In choosing the
optimal set of covariates, we allow for an interaction only if there is a prima facie case
for including it. In practice, the only interaction investigated (and, in some cases,
included) was Age × Smoker.
(2) Keeping $r = 0$, we then increase $s$ by 1 and choose the values of $\delta_1$ and $\delta_2$, $\theta$, and the
corresponding parameter values, $\beta$, which minimise the BIC.
(3) We repeat step (2) until the BIC increases. The value of $s$ and the corresponding
values of $\delta_1, \ldots, \delta_s$, the set of covariates, $\theta$, and the parameters, $\beta$, which minimise the BIC,
at least locally, are then our selected values.
(4) For the selected values of $s$ and $\theta$ we increase the value of $r$ by 1 and check whether,
by optimising over the $\kappa$s, $\delta$s and $\beta$s, the BIC decreases or not. If it decreases, we
repeat step (4). If it increases, we choose the value of $r$ which (locally) minimises the
BIC. In almost all cases, the optimal value of $r$ was zero. The only exception was the
diagnosis rate for death for the models in Figures 1 and 3, where the optimal value of
$r$ was 1.
The calculations were carried out using the statistical package R.
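The authors worked in R; purely as an illustration, the greedy search over $(r, s)$ in steps (1)-(4) can be sketched in Python. The callable `fit` is hypothetical: it stands for "choose the covariates and parameters that minimise the BIC for this $(r, s)$ and return that BIC", which is where the real model fitting would happen. The limits `max_r` and `max_s` are safeguards added for the sketch.

```python
import math
from typing import Callable


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 log L + p log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_r_s(fit: Callable[[int, int], float],
               max_r: int = 5, max_s: int = 10) -> tuple[int, int, float]:
    """Greedy search over (r, s) following steps (1)-(4)."""
    r, s = 0, 1                                   # step (1)
    best = fit(r, s)
    while s < max_s:                              # steps (2)-(3): grow s
        candidate = fit(r, s + 1)
        if candidate >= best:                     # BIC no longer decreases
            break
        s, best = s + 1, candidate
    while r < max_r:                              # step (4): grow r, s fixed
        candidate = fit(r + 1, s)
        if candidate >= best:
            break
        r, best = r + 1, candidate
    return r, s, best


# Hypothetical BIC surface with its minimum at r = 0, s = 3
chosen = select_r_s(lambda r, s: (s - 3) ** 2 + 2 * r + 100)
```

Like the procedure it mirrors, this search only guarantees a local minimum of the BIC.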
Table 10. Values of benefit amount and policy duration used in the CDDs for the estimation of
CI diagnosis rates.
Benefit amount                       Policy duration
Category            Mid-point        Category          Mid-point
1: ≤25,000          12,500           0: <1 year        183 days
2: 25,000-50,000    37,500           1: 1-2 years      548 days
3: 50,000-75,000    62,500           2: 2-3 years      913 days
4: ≥75,000          100,000          3: 3-4 years      1278 days
                                     4: 4-5 years      1643 days
                                     5: ≥5 years       2585 days
The results of our modelling are set out and discussed in Paper II. More details of the
procedures and results can be found in Ozkok (2011).
Acknowledgements
The authors are grateful to the Continuous Mortality Investigation for supplying the data
and for advice and support throughout the course of this research, and also to Hacettepe
University for their financial support for one of the authors, Erengul Ozkok, while this
research was being carried out.
References
Association of British Insurers (2011). Statement of best practice for critical illness. London: ABI.
CMI WP 14 (2005). Continuous Mortality Investigation Committee Working Paper 14: Methodology underlying
the 1999-2002 CMI critical illness experience investigation. Institute of Actuaries and Faculty of Actuaries.
CMI WP 33 (2008). Continuous Mortality Investigation Committee Working Paper 33: A new methodology for
analysing CMI critical illness experience. Institute of Actuaries and Faculty of Actuaries.
CMI WP 43 (2010). Continuous Mortality Investigation Committee Working Paper 43: CMI critical illness
diagnosis rates for accelerated business, 1999-2004. Institute of Actuaries and Faculty of Actuaries.
CMI WP 50 (2011). Continuous Mortality Investigation Committee Working Paper 50: CMI critical illness
diagnosis rates for accelerated business, 2003-2006. Institute and Faculty of Actuaries.
CMI WP 52 (2011). Continuous Mortality Investigation Committee Working Paper 52: Cause-specific CMI
critical illness diagnosis rates for accelerated business, 2003-2006. Institute and Faculty of Actuaries.
Dinani, A., Grimshaw, D., Robjohns, N., Somerville, S., Spry, A. & Staffurth, J. (2000). A critical review: report of
the critical illness healthcare study group. Presented to the Staple Inn Actuarial Society.
Greene, W. H. (1990). Econometric analysis. New York: Macmillan.
Macdonald, A. S. (1996). An actuarial survey of statistical models for decrement and transition data. I: multiple
state, binomial and Poisson models. British Actuarial Journal 2, 129-155.
Ozkok, E. (2011). A stochastic model for critical illness insurance. PhD thesis. Heriot-Watt University, 213 p.
Ozkok, E., Streftaris, G., Waters, H. R. & Wilkie, A. D. (2012a). Modelling critical illness claim diagnosis rates II:
results. Scandinavian Actuarial Journal, DOI:10.1080/03461238.2012.728538.
Ozkok, E., Streftaris, G., Waters, H. R. & Wilkie, A. D. (2012b). Bayesian modelling of the time delay between
diagnosis and settlement for critical illness insurance using a Burr generalised-linear-type model. Insurance:
Mathematics and Economics 50, 266-279.
Waters, H. R. (1984). An approach to the study of multiple state models. Journal of the Institute of Actuaries 111,
363-374.