We will use maximum likelihood estimation to obtain parameter estimates for the desired models.
Here, we will fit a Poisson distribution to this empirical data. The Poisson log-likelihood function can be written as
follows:
We use the maxLik function to obtain our estimate for the single parameter of the following distribution:
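The report obtains this estimate with R's maxLik; as a cross-check, the same maximisation can be sketched in Python with scipy (the claim counts below are hypothetical stand-ins for the empirical data):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

# Hypothetical claim counts standing in for the empirical data
y = np.array([0] * 1500 + [1] * 100 + [2] * 12 + [3] * 2)

def poisson_negloglik(lam):
    # -sum log f(y|lam), with f(y|lam) = exp(-lam) lam^y / y!
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1))

res = minimize_scalar(poisson_negloglik, bounds=(1e-8, 10), method="bounded")
lam_hat = res.x
# For the Poisson, the MLE coincides with the sample mean
print(lam_hat, y.mean())
```

As the printout confirms, the single-parameter Poisson MLE is simply the sample mean, which is a useful check on whatever numerical optimiser is used.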
Next, we will fit a negative binomial distribution to the data using the parameterisation given in the question. We
define the pdf and log-likelihood functions as follows:
Again, we use the maxLik function to obtain estimates for the parameters r and β.
The results are summarised below:
𝑟 = 1.157110
𝛽 = 0.062878
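The report fits this with maxLik in R; a minimal Python sketch of the same (r, β) parameterisation, on simulated data rather than the report's, would be:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(1)
# Simulate from the (r, beta) parameterisation: success probability p = 1/(1+beta)
r_true, beta_true = 1.2, 0.06
y = rng.negative_binomial(r_true, 1.0 / (1.0 + beta_true), size=20000)

def nb_negloglik(params):
    r, beta = params
    # log f(y|r,beta) = log C(y+r-1, y) + y log(beta/(1+beta)) + r log(1/(1+beta))
    ll = (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
          + y * np.log(beta / (1 + beta)) - r * np.log(1 + beta))
    return -ll.sum()

res = minimize(nb_negloglik, x0=[1.0, 0.1], method="L-BFGS-B",
               bounds=[(1e-6, None), (1e-6, None)])
r_hat, beta_hat = res.x
print(r_hat, beta_hat)
```

A handy check: the first-order condition in β forces the fitted mean r̂β̂ to equal the sample mean exactly at the joint MLE.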
The final distribution we fit to the data is the zero-inflated Poisson distribution, with pdf and log-likelihood
functions as follows:
𝜙 = 0.450715
𝜃 = 0.132458
We now compare the models using at least two selection criteria to determine the best fit. First, we consider the
likelihood ratio test. It can be shown that the Poisson distribution is a limiting case of the negative binomial
distribution, so the Poisson model is nested within the negative binomial model. Hence, we can use the likelihood ratio test as follows:
This test yields a p-value of 2.427 × 10⁻²⁴, so we reject the Poisson model in favour of the Negative Binomial model.
Similarly, we can perform a likelihood ratio test to compare the zero inflated Poisson and Poisson distributions. The
test is outlined below:
This test yields a p-value of 3.08 × 10⁻²³, so we reject the Poisson model in favour of the Zero-Inflated Poisson
model.
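A sketch of the likelihood-ratio computation (the maximised log-likelihoods below are placeholders, not the report's fitted values):

```python
from scipy.stats import chi2

# Placeholder maximised log-likelihoods (not the report's actual values)
loglik_poisson = -1105.3   # 1 parameter
loglik_negbin  = -1052.1   # 2 parameters

# LR statistic: 2 * (loglik_general - loglik_nested); df = difference in parameter counts
lr_stat = 2 * (loglik_negbin - loglik_poisson)
p_value = chi2.sf(lr_stat, df=1)
print(lr_stat, p_value)
```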
We now consider Akaike's Information Criterion to compare the Zero-Inflated Poisson and Negative Binomial models. The
results are shown below:
As the Negative Binomial model has the lower AIC, we conclude that it is a better fit to the data than
the Zero-Inflated Poisson model. Based on the above selection criteria, we suggest that the Negative Binomial
model fits the data best.
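The AIC comparison itself is a one-line computation, sketched here with placeholder log-likelihood values rather than the report's:

```python
# AIC = 2k - 2*loglik; lower is better.  Placeholder log-likelihoods, not the report's values.
def aic(loglik, k):
    return 2 * k - 2 * loglik

aic_negbin = aic(-1052.1, 2)  # parameters: r and beta
aic_zip    = aic(-1055.8, 2)  # parameters: phi and theta
print(aic_negbin, aic_zip)
# The model with the lower AIC is preferred
best = "negative binomial" if aic_negbin < aic_zip else "zero-inflated Poisson"
print(best)
```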
Question B
First, we must modify the given data so that it is in the format specified in the question. This is shown in the code
below:
We use a log link function because the response counts are non-negative: with the log link, the predicted mean
exp(η) is positive regardless of whether the linear predictor η is positive or negative.
Fitting a Poisson GLM to this dataset using the required covariates yields the following:
VehValue is positively associated with the number of claims made, while VehAge and DrivAge are negatively
associated with it. Thus, we suggest that older drivers make fewer claims, drivers of older
vehicles make fewer claims, and drivers of more expensive vehicles make more claims. All estimates are significant at
the 1% significance level, which suggests that the Poisson GLM fits the data well.
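The fit in the report uses R's glm; a self-contained sketch of the same Poisson GLM with log link, via iteratively reweighted least squares on simulated data (the VehValue, VehAge and DrivAge columns here are simulated stand-ins, not the report's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Simulated stand-ins for the covariates named in the report
veh_value = rng.uniform(0.5, 5.0, n)
veh_age   = rng.integers(1, 5, n).astype(float)
driv_age  = rng.integers(1, 7, n).astype(float)
X = np.column_stack([np.ones(n), veh_value, veh_age, driv_age])

beta_true = np.array([-2.0, 0.3, -0.1, -0.15])
y = rng.poisson(np.exp(X @ beta_true))

# IRLS for a Poisson GLM with log link (what glm(..., family = poisson) does)
beta = np.zeros(X.shape[1])
for _ in range(50):
    mu = np.exp(X @ beta)
    W = mu                        # Poisson weights: Var(mu) = mu, h'(eta) = mu
    z = X @ beta + (y - mu) / mu  # working response
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new
print(beta)
```

At convergence the score X'(y − μ) is numerically zero, mirroring the first-order condition that glm solves.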
Question C
$$f(y\mid r,\beta)=\binom{y+r-1}{y}\left(\frac{1}{1+\beta}\right)^{\!r}\left(\frac{\beta}{1+\beta}\right)^{\!y}$$

$$\Leftrightarrow\ \ln f(y\mid r,\beta)=\ln\binom{y+r-1}{y}+y\ln\!\left(\frac{\beta}{1+\beta}\right)-\left(-r\ln\!\left(\frac{1}{1+\beta}\right)\right)$$

In which:

$$\theta=\ln\!\left(\frac{\beta}{1+\beta}\right),\qquad \omega=\phi=1,\qquad b(\theta)=-r\ln\!\left(1-e^{\theta}\right)$$

Therefore:

$$b'(\theta)=r\left(\frac{e^{\theta}}{1-e^{\theta}}\right)=r\beta,\qquad b''(\theta)=r\left(\frac{e^{\theta}}{\left(1-e^{\theta}\right)^{2}}\right)=r\beta(1+\beta)$$

So:

$$\mathrm{E}[Y]=b'(\theta)=r\beta,\qquad \operatorname{Var}(Y)=\phi\,b''(\theta)=r\beta(1+\beta)$$
Question D
Substituting $\beta=\mu/r$ (so that $\mathrm{E}[Y]=\mu$) gives

$$\ln f(y\mid r,\mu)=\ln\binom{y+r-1}{y}+r\ln\!\left(\frac{r}{r+\mu}\right)+y\ln\!\left(\frac{\mu}{\mu+r}\right)$$

$$\Rightarrow\ \ln f(y\mid r,\mu)=\ln\binom{y+r-1}{y}+y\ln\!\left(\frac{\mu}{\mu+r}\right)-\left(-r\ln\!\left(\frac{r}{r+\mu}\right)\right)$$

so that, in the exponential-dispersion form $\ln f(y\mid r,\mu)=\big(y\theta-b(\theta)\big)/\phi+c(y,\phi)$,

$$\theta=\ln\!\left(\frac{\mu}{\mu+r}\right),\qquad b(\theta)=-r\ln\!\left(1-e^{\theta}\right),\qquad c(y,\phi)=\ln\binom{y+r-1}{y}$$

$$\Rightarrow\ b'(\theta)=r\left(\frac{e^{\theta}}{1-e^{\theta}}\right)=\mu,\qquad b''(\theta)=r\left(\frac{e^{\theta}}{\left(1-e^{\theta}\right)^{2}}\right)=\frac{\mu(r+\mu)}{r}$$
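These identities for b′(θ) and b″(θ) can be verified numerically by finite differences (a quick sanity check with arbitrary parameter values, not part of the original report):

```python
import numpy as np

r, mu = 1.157, 0.85
theta = np.log(mu / (mu + r))
b = lambda t: -r * np.log1p(-np.exp(t))  # b(theta) = -r ln(1 - e^theta)

h = 1e-6
b1 = (b(theta + h) - b(theta - h)) / (2 * h)              # numerical b'(theta)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2  # numerical b''(theta)
print(b1, mu)                  # b'(theta)  should equal mu
print(b2, mu * (r + mu) / r)   # b''(theta) should equal mu(r+mu)/r
```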
Question E
Using the parameter values previously derived from the Poisson model and the negative binomial model (for r) as initial
values for the Fisher-Scoring algorithm:

β₀ = −3.060606
β₁ = 0.03392496
β₂ = −0.08425887
β₃ = −0.1801847
r_init = 1.157218
Based on the parametrisation in Question D and using the log link, we can derive the following equations:
$$-\mathrm{E}\!\left(\frac{\partial^{2}\ell}{\partial\beta_{j}\,\partial\beta_{k}}\right)=\sum_{i=1}^{n}\frac{h'(\eta_{i})^{2}}{V(\mu_{i})}\,x_{ij}x_{ik}=\sum_{i=1}^{n}\frac{r\mu_{i}}{r+\mu_{i}}\,x_{ij}x_{ik}$$
This is because:

$$h(\cdot)=\exp(\cdot),\qquad h'(\eta_{i})=\exp(\eta_{i})=\mu_{i}\ \text{(because of the log link)},\qquad V(\mu_{i})=b''(\theta_{i})=\frac{\mu_{i}(r+\mu_{i})}{r}$$
From this, we construct the algorithm from the following steps:
1. Set the initial parameters for the regressors and the tolerance level
2. Set the coefficients for the mean as i0, i1, i2, i3, i4 from the regressors
3. Set the covariate matrix from the data provided (a constant column of ones, vehicle value, driver age and vehicle age)
4. Initialise the score vector and iteration number
5. The Fisher-Scoring algorithm then proceeds as follows:
a. Initialise the regressor parameters based on the previous values (the exposure term is always initialised as 1).
b. Find 𝜇𝑖 𝑠 from the regressors defined above.
c. Find the new parameter of 𝑟 using the new 𝜇𝑖 𝑠 found above.
d. Find each element of the score vector and the information matrix using what we have derived
above.
e. Invert the Fisher Information Matrix and find the new parameters using the formula:
$$\beta_{t+1}=\beta_{t}+\left[\mathrm{E}\!\left(-\frac{\partial^{2}\log L_{n}(\beta\mid y)}{\partial\beta_{j}\,\partial\beta_{k}}\right)\Bigg|_{\beta_{t}}\right]^{-1}\left[\frac{\partial\log L_{n}(\beta\mid y)}{\partial\beta_{j}}\right]\Bigg|_{\beta_{t}}$$
f. Input values for the score vector and find the new value for 𝜇𝑖 based on the new parameters found
in (e).
g. The standard error for each parameter is the square root of the (i, i) entry of the inverse Fisher Information
Matrix.
h. Reset the value of r to the value found in (c).
Note that the algorithm runs until every component of the score vector is smaller in absolute value than the tolerance level.
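The steps above can be sketched as follows (in Python rather than the report's R; the data are simulated, and for brevity r is held fixed rather than re-estimated at each pass as in step (c)):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4000
X = np.column_stack([np.ones(n), rng.uniform(0.5, 5.0, n),
                     rng.integers(1, 5, n), rng.integers(1, 7, n)]).astype(float)
beta_true = np.array([-2.0, 0.25, -0.08, -0.12])
r = 1.157  # held fixed here; the report's step (c) re-estimates r each iteration
mu_true = np.exp(X @ beta_true)
# Draw NB responses with mean mu and size r: success probability p = r/(r+mu)
y = rng.negative_binomial(r, r / (r + mu_true))

beta = np.zeros(4)
for _ in range(100):
    mu = np.exp(X @ beta)
    w = r * mu / (r + mu)                    # Fisher weights from the derivation above
    score = X.T @ ((y - mu) * r / (r + mu))  # score vector, d loglik / d beta
    info = X.T @ (w[:, None] * X)            # expected (Fisher) information matrix
    beta = beta + np.linalg.solve(info, score)
    if np.max(np.abs(score)) < 1e-8:         # stop when all score components are small
        break
se = np.sqrt(np.diag(np.linalg.inv(info)))   # standard errors from inverse information
print(beta, se)
```

The stopping rule and the standard errors correspond directly to steps (g) and the note above: iterate until the score vector is negligible, then read errors off the inverted information matrix.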
Question F
Performing a likelihood ratio test to compare the models, we obtain a p-value of 6.988 × 10⁻¹⁰, indicating that the
indicator variable for the age of the driver is statistically significant at the 5% level. Therefore, we conclude that the
general model is preferred over the nested one, based on the hypotheses and results below:
The hypothesis is:
H₀: The smaller/nested model is preferred (without DrivAge)
H₁: The larger/general model is preferred (with DrivAge)
Question G
Comparing the models with respect to AIC suggests that the negative binomial model is a better fit.
Comparing the models with respect to BIC yields the same conclusion.
Question H
Question J
The following code gives us the training and validation datasets, as well as the Q-Q plot for each:
The graphs show that the randomised quantile residuals follow the normal distribution closely, indicating a good fit.
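Randomised quantile residuals are computed as rᵢ = Φ⁻¹(uᵢ) with uᵢ drawn uniformly on (F(yᵢ − 1), F(yᵢ)); a sketch for a Poisson model on simulated data (not the report's dataset):

```python
import numpy as np
from scipy.stats import poisson, norm

rng = np.random.default_rng(7)
lam = 0.8
y = rng.poisson(lam, size=5000)

# Randomised quantile residuals: u ~ U(F(y-1), F(y)), res = Phi^{-1}(u).
# If the model is correct, these are exactly standard normal.
F_lo = poisson.cdf(y - 1, lam)   # F(y-1); equals 0 for y = 0
F_hi = poisson.cdf(y, lam)
u = rng.uniform(F_lo, F_hi)
res = norm.ppf(u)
print(res.mean(), res.std())
```

Plotting `res` against normal quantiles gives exactly the kind of Q-Q plot described above; here the sample mean and standard deviation being close to 0 and 1 serve the same check.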
Question K
Question L
> qlnorm(c(.9,.95,.99), meanlog = 6.80919, sdlog = 1.18776)
[1] 4152.084 6392.504 14361.883
> qinvgauss(c(.9,.95,.99), mean = 2002.7 , dispersion = 727.2)
[1] 0.08708379 0.34970820 8.75274887
These commands give the VaR at the 90%, 95% and 99% levels for the two distributions as:
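The same quantiles can be cross-checked in Python. Note that scipy's inverse Gaussian uses a (mu, scale) parameterisation, so statmod's mean/dispersion pair must be converted (shape = 1/dispersion):

```python
import numpy as np
from scipy.stats import lognorm, invgauss

p = [0.90, 0.95, 0.99]

# Log-normal: scipy's s is sdlog and scale is exp(meanlog)
var_ln = lognorm.ppf(p, s=1.18776, scale=np.exp(6.80919))

# Inverse Gaussian: statmod's (mean, dispersion) maps to scipy's
# invgauss(mu = mean * dispersion, scale = 1/dispersion), since shape = 1/dispersion
mean, disp = 2002.7, 727.2
var_ig = invgauss.ppf(p, mu=mean * disp, scale=1 / disp)
print(var_ln)
print(var_ig)
```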
Question M
Question N
Advantages of the AD test
The Anderson-Darling test is much more sensitive to the tails of the distribution, whereas the Kolmogorov-Smirnov test is more
sensitive to the centre of the distribution.
Also, the KS statistic is defined through a maximum over all points, which can make it computationally expensive.
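Both tests are available in scipy; for illustration, on simulated normal data (not the report's dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=1.0, size=500)

# Anderson-Darling: weights discrepancies in the tails more heavily
ad = stats.anderson(x, dist='norm')
print(ad.statistic, ad.critical_values)

# Kolmogorov-Smirnov: statistic is the maximum gap between empirical and model CDFs
ks_stat, ks_p = stats.kstest(x, 'norm')
print(ks_stat, ks_p)
```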