We will use maximum likelihood estimation to obtain parameter estimates for the desired models.
Here, we will fit a Poisson distribution to this empirical data. The Poisson log-likelihood function can be written as
follows:
We use the maxLik function to obtain our estimate for the single parameter of the following distribution:
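The report obtains this estimate with R's maxLik; as a cross-check, the same maximisation can be sketched in Python with scipy (the claim counts below are hypothetical stand-ins for the empirical data):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

# Hypothetical claim counts standing in for the empirical data
y = np.array([0] * 1500 + [1] * 100 + [2] * 12 + [3] * 2)

def poisson_negloglik(lam):
    # -sum log f(y|lam), with f(y|lam) = exp(-lam) lam^y / y!
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1))

res = minimize_scalar(poisson_negloglik, bounds=(1e-8, 10), method="bounded")
lam_hat = res.x
# For the Poisson, the MLE coincides with the sample mean
print(lam_hat, y.mean())
```

As the printout confirms, the single-parameter Poisson MLE is simply the sample mean, which is a useful check on whatever numerical optimiser is used.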
Next, we will fit a negative binomial distribution to the data using the parameterisation given in the question. We
define the pdf and log-likelihood functions as follows:
Again, we use the maxLik function to obtain estimates for the parameters r and β.
The results are summarised below:
𝑟 = 1.157110
𝛽 = 0.062878
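The report fits this with maxLik in R; a minimal Python sketch of the same (r, β) parameterisation, on simulated data rather than the report's, would be:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(1)
# Simulate from the (r, beta) parameterisation: success probability p = 1/(1+beta)
r_true, beta_true = 1.2, 0.06
y = rng.negative_binomial(r_true, 1.0 / (1.0 + beta_true), size=20000)

def nb_negloglik(params):
    r, beta = params
    # log f(y|r,beta) = log C(y+r-1, y) + y log(beta/(1+beta)) + r log(1/(1+beta))
    ll = (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
          + y * np.log(beta / (1 + beta)) - r * np.log(1 + beta))
    return -ll.sum()

res = minimize(nb_negloglik, x0=[1.0, 0.1], method="L-BFGS-B",
               bounds=[(1e-6, None), (1e-6, None)])
r_hat, beta_hat = res.x
print(r_hat, beta_hat)
```

A handy check: the first-order condition in β forces the fitted mean r̂β̂ to equal the sample mean exactly at the joint MLE.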
The final distribution we fit to the data is the zero-inflated Poisson distribution, with pdf and log-likelihood
functions as follows:
𝜙 = 0.450715
𝜃 = 0.132458
We now compare the models using at least two selection criteria to determine the best fit. First, we consider the
likelihood ratio test. It can be shown that the Poisson distribution is a limiting case of the negative binomial
distribution, so the Poisson model is nested within the negative binomial model. Hence, we can use the likelihood ratio test as follows:
This test yields a p-value of 2.427 × 10⁻²⁴, so we reject the Poisson model in favour of the Negative Binomial model.
Similarly, we can perform a likelihood ratio test to compare the zero inflated Poisson and Poisson distributions. The
test is outlined below:
This test yields a p-value of 3.08 × 10⁻²³, so we reject the Poisson model in favour of the Zero-Inflated Poisson
model.
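A sketch of the likelihood-ratio computation (the maximised log-likelihoods below are placeholders, not the report's fitted values):

```python
from scipy.stats import chi2

# Placeholder maximised log-likelihoods (not the report's actual values)
loglik_poisson = -1105.3   # 1 parameter
loglik_negbin  = -1052.1   # 2 parameters

# LR statistic: 2 * (loglik_general - loglik_nested); df = difference in parameter counts
lr_stat = 2 * (loglik_negbin - loglik_poisson)
p_value = chi2.sf(lr_stat, df=1)
print(lr_stat, p_value)
```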
We now consider Akaike's Information Criterion to compare the Zero-Inflated Poisson and Negative Binomial models. The
results are shown below:
As the Negative Binomial model has the lower AIC, we conclude that it is a better fit to the data than
the Zero-Inflated Poisson model. Based on the above selection criteria, we suggest that the Negative Binomial
model fits the data best.
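The AIC comparison itself is a one-line computation, sketched here with placeholder log-likelihood values rather than the report's:

```python
# AIC = 2k - 2*loglik; lower is better.  Placeholder log-likelihoods, not the report's values.
def aic(loglik, k):
    return 2 * k - 2 * loglik

aic_negbin = aic(-1052.1, 2)  # parameters: r and beta
aic_zip    = aic(-1055.8, 2)  # parameters: phi and theta
print(aic_negbin, aic_zip)
# The model with the lower AIC is preferred
best = "negative binomial" if aic_negbin < aic_zip else "zero-inflated Poisson"
print(best)
```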
Question B
First, we must modify the given data so that it is in the format specified in the question. This is shown in the code
below:
We use a log link function because the response counts are non-negative: with the log link, the predicted mean
exp(η) is positive regardless of whether the linear predictor η is positive or negative.
Fitting a Poisson GLM to this dataset using the required covariates yields the following:
VehValue is positively associated with the number of claims made, while VehAge and DrivAge are negatively
associated with it. Thus, we suggest that older drivers make fewer claims, drivers of older
vehicles make fewer claims, and drivers of more expensive vehicles make more claims. All estimates are significant at
the 1% significance level, which suggests that the Poisson GLM fits the data well.
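The fit in the report uses R's glm; a self-contained sketch of the same Poisson GLM with log link, via iteratively reweighted least squares on simulated data (the VehValue, VehAge and DrivAge columns here are simulated stand-ins, not the report's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Simulated stand-ins for the covariates named in the report
veh_value = rng.uniform(0.5, 5.0, n)
veh_age   = rng.integers(1, 5, n).astype(float)
driv_age  = rng.integers(1, 7, n).astype(float)
X = np.column_stack([np.ones(n), veh_value, veh_age, driv_age])

beta_true = np.array([-2.0, 0.3, -0.1, -0.15])
y = rng.poisson(np.exp(X @ beta_true))

# IRLS for a Poisson GLM with log link (what glm(..., family = poisson) does)
beta = np.zeros(X.shape[1])
for _ in range(50):
    mu = np.exp(X @ beta)
    W = mu                        # Poisson weights: Var(mu) = mu, h'(eta) = mu
    z = X @ beta + (y - mu) / mu  # working response
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new
print(beta)
```

At convergence the score X'(y − μ) is numerically zero, mirroring the first-order condition that glm solves.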
Question C
$$f(y\mid r,\beta)=\binom{y+r-1}{y}\left(\frac{1}{1+\beta}\right)^{\!r}\left(\frac{\beta}{1+\beta}\right)^{\!y}$$

$$\Leftrightarrow\ \ln f(y\mid r,\beta)=\ln\binom{y+r-1}{y}+y\ln\!\left(\frac{\beta}{1+\beta}\right)-\left(-r\ln\!\left(\frac{1}{1+\beta}\right)\right)$$

In which:

$$\theta=\ln\!\left(\frac{\beta}{1+\beta}\right),\qquad \omega=\phi=1,\qquad b(\theta)=-r\ln\!\left(1-e^{\theta}\right)$$

Therefore:

$$b'(\theta)=r\left(\frac{e^{\theta}}{1-e^{\theta}}\right)=r\beta,\qquad b''(\theta)=r\left(\frac{e^{\theta}}{\left(1-e^{\theta}\right)^{2}}\right)=r\beta(1+\beta)$$

So:

$$\mathrm{E}[Y]=b'(\theta)=r\beta,\qquad \operatorname{Var}(Y)=\phi\,b''(\theta)=r\beta(1+\beta)$$
Question D
Substituting $\beta=\mu/r$ (so that $\mathrm{E}[Y]=\mu$) gives

$$\ln f(y\mid r,\mu)=\ln\binom{y+r-1}{y}+r\ln\!\left(\frac{r}{r+\mu}\right)+y\ln\!\left(\frac{\mu}{\mu+r}\right)$$

$$\Rightarrow\ \ln f(y\mid r,\mu)=\ln\binom{y+r-1}{y}+y\ln\!\left(\frac{\mu}{\mu+r}\right)-\left(-r\ln\!\left(\frac{r}{r+\mu}\right)\right)$$

so that, in the exponential-dispersion form $\ln f(y\mid r,\mu)=\big(y\theta-b(\theta)\big)/\phi+c(y,\phi)$,

$$\theta=\ln\!\left(\frac{\mu}{\mu+r}\right),\qquad b(\theta)=-r\ln\!\left(1-e^{\theta}\right),\qquad c(y,\phi)=\ln\binom{y+r-1}{y}$$

$$\Rightarrow\ b'(\theta)=r\left(\frac{e^{\theta}}{1-e^{\theta}}\right)=\mu,\qquad b''(\theta)=r\left(\frac{e^{\theta}}{\left(1-e^{\theta}\right)^{2}}\right)=\frac{\mu(r+\mu)}{r}$$
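These identities for b′(θ) and b″(θ) can be verified numerically by finite differences (a quick sanity check with arbitrary parameter values, not part of the original report):

```python
import numpy as np

r, mu = 1.157, 0.85
theta = np.log(mu / (mu + r))
b = lambda t: -r * np.log1p(-np.exp(t))  # b(theta) = -r ln(1 - e^theta)

h = 1e-6
b1 = (b(theta + h) - b(theta - h)) / (2 * h)              # numerical b'(theta)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2  # numerical b''(theta)
print(b1, mu)                  # b'(theta)  should equal mu
print(b2, mu * (r + mu) / r)   # b''(theta) should equal mu(r+mu)/r
```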
Question E
Using the parameter values previously derived from the Poisson model and the negative binomial model (for r) as initial
values for the Fisher-Scoring algorithm:

β₀ = −3.060606
β₁ = 0.03392496
β₂ = −0.08425887
β₃ = −0.1801847
r_init = 1.157218
Based on the parametrisation in Question D and using the log link, we can derive the following equations:
$$-\mathrm{E}\!\left(\frac{\partial^{2}\ell}{\partial\beta_{j}\,\partial\beta_{k}}\right)=\sum_{i=1}^{n}\frac{h'(\eta_{i})^{2}}{V(\mu_{i})}\,x_{ij}x_{ik}=\sum_{i=1}^{n}\frac{r\mu_{i}}{r+\mu_{i}}\,x_{ij}x_{ik}$$
This is because:

$$h(\cdot)=\exp(\cdot),\qquad h'(\eta_{i})=\exp(\eta_{i})=\mu_{i}\ \text{(because of the log link)},\qquad V(\mu_{i})=b''(\theta_{i})=\frac{\mu_{i}(r+\mu_{i})}{r}$$
From this, we construct the algorithm from the following steps:
1. Set the initial parameters for the regressors and the tolerance level
2. Set the coefficients for the mean as i0, i1, i2, i3, i4 from the regressors
3. Set the covariate matrix from the data provided (a constant column of ones, vehicle value, driver age and vehicle age)
4. Initialise the score vector and iteration number
5. The Fisher-Scoring algorithm then proceeds as follows:
a. Initialise the regressor parameters based on the previous values (the exposure term is always initialised as 1).
b. Find 𝜇𝑖 𝑠 from the regressors defined above.
c. Find the new parameter of 𝑟 using the new 𝜇𝑖 𝑠 found above.
d. Find each element of the score vector and the information matrix using what we have derived
above.
e. Invert the Fisher Information Matrix and find the new parameters using the formula:
$$\beta_{t+1}=\beta_{t}+\left[\mathrm{E}\!\left(-\frac{\partial^{2}\log L_{n}(\beta\mid y)}{\partial\beta_{j}\,\partial\beta_{k}}\right)\Bigg|_{\beta_{t}}\right]^{-1}\left[\frac{\partial\log L_{n}(\beta\mid y)}{\partial\beta_{j}}\right]\Bigg|_{\beta_{t}}$$
f. Input values for the score vector and find the new value for 𝜇𝑖 based on the new parameters found
in (e).
g. The standard error for each parameter is the square root of the (i, i) entry of the inverse Fisher Information
Matrix.
h. Reset the value of r to the value found in (c).
Note that the algorithm runs until every component of the score vector is smaller in absolute value than the tolerance level.
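The steps above can be sketched as follows (in Python rather than the report's R; the data are simulated, and for brevity r is held fixed rather than re-estimated at each pass as in step (c)):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4000
X = np.column_stack([np.ones(n), rng.uniform(0.5, 5.0, n),
                     rng.integers(1, 5, n), rng.integers(1, 7, n)]).astype(float)
beta_true = np.array([-2.0, 0.25, -0.08, -0.12])
r = 1.157  # held fixed here; the report's step (c) re-estimates r each iteration
mu_true = np.exp(X @ beta_true)
# Draw NB responses with mean mu and size r: success probability p = r/(r+mu)
y = rng.negative_binomial(r, r / (r + mu_true))

beta = np.zeros(4)
for _ in range(100):
    mu = np.exp(X @ beta)
    w = r * mu / (r + mu)                    # Fisher weights from the derivation above
    score = X.T @ ((y - mu) * r / (r + mu))  # score vector, d loglik / d beta
    info = X.T @ (w[:, None] * X)            # expected (Fisher) information matrix
    beta = beta + np.linalg.solve(info, score)
    if np.max(np.abs(score)) < 1e-8:         # stop when all score components are small
        break
se = np.sqrt(np.diag(np.linalg.inv(info)))   # standard errors from inverse information
print(beta, se)
```

The stopping rule and the standard errors correspond directly to steps (g) and the note above: iterate until the score vector is negligible, then read errors off the inverted information matrix.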
Question F
Performing a likelihood ratio test to compare the models, we obtain a p-value of 6.988 × 10⁻¹⁰, indicating that the
indicator variable for the age of the driver is statistically significant at the 5% level. Therefore, we conclude that the
general model is preferred over the nested one, based on the hypotheses and results below:
The hypothesis is:
H₀: The smaller/nested model is preferred (without DrivAge)
H₁: The larger/general model is preferred (with DrivAge)
Question G
Comparing the models with respect to AIC suggests that the negative binomial model is a better fit.
Comparing the models with respect to BIC yields the same conclusion.
Question H
Question J
The following code gives us the training and validation datasets, as well as the Q-Q plot for each:
The graphs show that the randomised quantile residuals follow the normal distribution closely, indicating a good fit.
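Randomised quantile residuals are computed as rᵢ = Φ⁻¹(uᵢ) with uᵢ drawn uniformly on (F(yᵢ − 1), F(yᵢ)); a sketch for a Poisson model on simulated data (not the report's dataset):

```python
import numpy as np
from scipy.stats import poisson, norm

rng = np.random.default_rng(7)
lam = 0.8
y = rng.poisson(lam, size=5000)

# Randomised quantile residuals: u ~ U(F(y-1), F(y)), res = Phi^{-1}(u).
# If the model is correct, these are exactly standard normal.
F_lo = poisson.cdf(y - 1, lam)   # F(y-1); equals 0 for y = 0
F_hi = poisson.cdf(y, lam)
u = rng.uniform(F_lo, F_hi)
res = norm.ppf(u)
print(res.mean(), res.std())
```

Plotting `res` against normal quantiles gives exactly the kind of Q-Q plot described above; here the sample mean and standard deviation being close to 0 and 1 serve the same check.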
Question K
Question L
> qlnorm(c(.9,.95,.99), meanlog = 6.80919, sdlog = 1.18776)
[1] 4152.084 6392.504 14361.883
> qinvgauss(c(.9,.95,.99), mean = 2002.7 , dispersion = 727.2)
[1] 0.08708379 0.34970820 8.75274887
These commands give the VaR at the 90%, 95% and 99% levels for the two distributions as:
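The same quantiles can be cross-checked in Python. Note that scipy's inverse Gaussian uses a (mu, scale) parameterisation, so statmod's mean/dispersion pair must be converted (shape = 1/dispersion):

```python
import numpy as np
from scipy.stats import lognorm, invgauss

p = [0.90, 0.95, 0.99]

# Log-normal: scipy's s is sdlog and scale is exp(meanlog)
var_ln = lognorm.ppf(p, s=1.18776, scale=np.exp(6.80919))

# Inverse Gaussian: statmod's (mean, dispersion) maps to scipy's
# invgauss(mu = mean * dispersion, scale = 1/dispersion), since shape = 1/dispersion
mean, disp = 2002.7, 727.2
var_ig = invgauss.ppf(p, mu=mean * disp, scale=1 / disp)
print(var_ln)
print(var_ig)
```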
Question M
Question N
Advantages of the AD test
The Anderson-Darling test is much more sensitive to the tails of the distribution, whereas the Kolmogorov-Smirnov test is more
sensitive to the centre of the distribution.
Also, the KS statistic is defined through a maximum over all points, which can make it computationally expensive.
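Both tests are available in scipy; for illustration, on simulated normal data (not the report's dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=1.0, size=500)

# Anderson-Darling: weights discrepancies in the tails more heavily
ad = stats.anderson(x, dist='norm')
print(ad.statistic, ad.critical_values)

# Kolmogorov-Smirnov: statistic is the maximum gap between empirical and model CDFs
ks_stat, ks_p = stats.kstest(x, 'norm')
print(ks_stat, ks_p)
```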