
The 3rd Global Virtual Conference

April, 6. - 10. 2015, www.gv-conference.com

Automobile insurance pricing with Generalized Linear Models
Mihaela David
Faculty of Economics and Business Administration
Alexandru Ioan Cuza University of Iasi
Iasi, Romania
Abstract - The fundamental purpose of insurance is to provide financial protection, offering an equitable method of transferring the risk of a contingent or uncertain loss in exchange for payment. Considering that not all risks are equal, an insurance company should not apply the same premium to all insured risks in the portfolio. A common method of calculating the insurance premium is to multiply the conditional expectation of the claim frequency by the expected cost of claims. In this paper, Generalized Linear Models are employed to estimate the two components of the premium given the observed characteristics of the policyholders. A numerical illustration based on the automobile insurance portfolio of a French insurance company is included to support this approach.

Keywords: insurance pricing, insurance premium, frequency of claims, cost of claims, Generalized Linear Models

I. INTRODUCTION

The fundamental role of insurance is to provide financial safety and security against a possible loss arising from a particular event. The entire process of insurance consists of offering an equitable method of transferring the risk of a contingent or uncertain loss in exchange for payment. Considering that not all risks are equal, an insurance company should not apply the same premium to all insured risks in the portfolio. The necessity of charging different tariffs is emphasized by the heterogeneity of the insurance portfolio, which leads directly to the so-called concept of adverse selection. Adverse selection arises when the same tariff is charged for the entire portfolio: the unfavourable risks are then insured at too low a price and, as an adverse effect, the insurance of average risks is discouraged. The idea behind non-life insurance pricing comes precisely as an attempt to combat adverse selection. Therefore, it is extremely important for the insurer to divide the insurance portfolio into sub-portfolios based on certain influence factors. In this way, the policyholders with a similar risk profile will pay the same reasonable insurance premium.
A usual method to calculate the premium is to find the conditional expectation of the claim frequency given the risk characteristics and to combine it with the expected cost of claims. The process of measuring and constructing a fair tariff structure is performed by actuaries, who over time have proposed and applied different statistical models. In the context of actuarial science, linear regression was initially employed to evaluate the insurance premium. Considering the complexity of the phenomenon to be modelled and some methodological aspects related to insurance data, the assessment of the insurance premium no longer fits within the framework of linear regression. Antonio and Valdez [2] point out that, after decades dominated by statistically unsophisticated models, it is now recognized that Generalized Linear Models (GLMs) constitute an efficient tool for risk classification. Kaas, Goovaerts, Dhaene and Denuit [14] state that these models allow the random deviations from the mean to have a distribution different from the normal, and the mean of the random variable may be a linear function of the explanatory variables on some other scale. GLMs therefore allow modelling a non-linear behaviour and a non-Gaussian distribution of residuals. This aspect is very useful for the analysis of non-life insurance, where claim frequency and costs follow asymmetric densities that are clearly non-Gaussian. The development of GLMs has contributed to improving the quality of risk prediction models and the process of establishing a fair tariff or premium given the nature of the risk.
This paper presents an example based on real-life insurance data in order to illustrate several techniques in the framework of GLMs. These illustrations are relevant for insurers wishing to implement the techniques in practice in order to obtain equitable and reasonable premiums. To this end, the structure of the paper is as follows. Section 2 presents the basic distributions that can be used to model the two components of the pure premium, namely the frequency and the cost of claims. In this part of the paper, the reasons for using these distributions are explained and a specific test concerning the difference between claim frequency models is also described. Section 3 is dedicated to an empirical application using a French automobile insurance portfolio. This is followed by a discussion and an interpretation of the obtained results. Concluding remarks are summarized in Section 4.
II. METHODOLOGICAL APPROACH

The methodological section of this paper aims to present some specific issues related to GLMs and the role of these models within the non-life insurance business. The main focus is on the definition, interpretation and presentation of the properties and limitations of the insurance premium calculation models.
A. Generalized Linear Models (GLMs)
The implementation merits of Generalized Linear Models, both in actuarial science and in statistics, go to the British statisticians John Nelder and Robert Wedderburn. In their paper published in 1972, they demonstrate that the generalization of linear modeling allows a deviation from the assumption of normality, extending the Gaussian model to a particular family of distributions, namely the exponential family. Members of this family include, but are not limited to, the Normal, Poisson, Binomial and Gamma distributions.
Nelder and Wedderburn [17] suggest that the estimation of the GLM parameters be performed through the maximum likelihood method, so that the parameter estimates are obtained through an iterative algorithm. The contribution of Nelder to developing and completing the GLM theory continued through his collaboration with the Irish statistician Peter McCullagh, whose joint book [16] offers detailed information on the iterative algorithm and the asymptotic properties of the parameter estimates.
Since the introduction of GLM techniques, a remarkable body of work has accumulated, with many researchers managing to highlight, develop or improve the assumptions imposed by the practical application of these models in non-life insurance. Among the precursors of the GLM approach as the main statistical tool for determining insurance tariffs is Lemaire [15]. Resorting to these models, he aims to estimate the probability of risk occurrence in automobile insurance, to establish the insurance premium and also to measure the effectiveness of the models used to estimate it. Charpentier and Denuit [6] have made a significant contribution to the actuarial area, succeeding in covering, from a modern perspective, all the aspects of insurance mathematics. Boucher, Denuit and Guillen [4] provide a comprehensive reference on several aspects of a priori risk modeling, with an emphasis on claim frequency. Frees [10] employs the main statistical regression models for insurance, illustrating several case studies. Other useful references include the contributions of de Jong and Heller [13], Kaas, Goovaerts, Dhaene and Denuit [14] and Ohlsson and Johansson [18], who highlight the particularities of GLMs in non-life insurance risk modeling.
GLMs are defined as an extension of the Gaussian linear model framework to response distributions derived from the exponential family. The purpose of these models is to estimate a variable of interest (Y) depending on a certain number of explanatory variables (X). In the actuarial analysis, considering that the exogenous variables represent information about the insured or his assets, the dependent variable can be one of the following:

- a binary variable, taking only the values zero or one, the phenomenon studied in this case being the probability of a risk occurrence, for which the binomial regression models apply (logit, probit and complementary log-log models);

- a discrete variable, with values belonging to the set of natural numbers, when modeling the frequency of the risk occurrence; in this case the Poisson regression model is applied;

- a continuous variable, with values belonging to the set of positive real numbers, when modeling the cost of the risk occurrence; in this case the Gamma regression model is considered.

Conditional on the explanatory variables X_i, the random variables Y_i are assumed to be independently, but not identically, distributed, with probability density given by the following function, specializing to a probability density function in the continuous case and a probability mass function in the discrete case, as in [17]:

f(y_i \mid \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{\phi} + c(y_i, \phi) \right\},

where y_i takes values in a subset of \mathbb{N} or \mathbb{R}, \theta_i is the natural parameter and \phi is the scale parameter. For the binomial and Poisson distributions the scale parameter has the value 1, while for the Gamma distribution \phi is unknown and has to be estimated.
Similar to the Gaussian model approach, the purpose of the econometric modeling is to obtain the expected values of the dependent variable through conditional means, given independent observations. In this case, the searched parameters \beta allow writing a function g(\cdot) of the mean \mu_i = E(Y_i \mid X_i) of the variable Y_i as a linear combination of the exogenous variables x_i:

g(\mu_i) = x_i^{\top}\beta = \eta_i,

where the monotonic and differentiable function g is known as the link function, because it connects the linear predictor \eta_i with the mean \mu_i.
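For count and cost data the logarithmic link is the usual choice, and it is consistent with the way the coefficients are interpreted later in the paper; as a brief worked form (an illustration, not reproduced from the paper):

g(\mu_i) = \ln(\mu_i) = x_i^{\top}\beta \quad\Longleftrightarrow\quad \mu_i = E(Y_i \mid X_i = x_i) = \exp(x_i^{\top}\beta),

so that each tariff variable acts multiplicatively on the expected frequency or cost.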
De Jong and Heller [13] cover in a practical and rigorous manner the standard exponential family distributions, focusing on issues related to insurance data and discussing techniques illustrated on data sets relevant to insurance. As the objective of this paper is to establish the insurance premium, only the models employed to estimate the frequency and cost of claims are introduced and detailed further.
B. Estimation models of claim frequency
Poisson model
The statistical analysis of count data, known in the econometric literature as rare events, has a long and rich history. The Poisson distribution was derived as a limiting case of the binomial distribution by Poisson (1837) and exemplified later by Bortkiewicz (1898) in the famous study regarding the annual number of deaths caused by horse kicks in the Prussian army. Cameron and Trivedi [5] have made an important contribution to the development of count regression models, managing to highlight the particularities of the Poisson regression approach in estimating the claim frequency as a particular case of GLMs.
Within the non-life insurance business, the use of GLM techniques to estimate the frequency of claims typically relies on an a priori Poisson structure. In the actuarial literature, the Poisson model is presented as the modeling archetype for event counts, as in [2], also known in insurance as the frequency of claims. In many papers, as in [7, 8, 9, 11, 20], the Poisson model is considered the main tool for the estimation of claim frequency in non-life insurance.


The discrete random variable N_i (claim frequency or observed number of claims), conditioned by the vector of explanatory variables X_i (the insured characteristics), is assumed to be Poisson distributed. Therefore, for the insured i, the probability that the random variable N_i takes the value n_i (n_i = 0, 1, 2, ...) is given by the density:

P(N_i = n_i \mid X_i = x_i) = \frac{e^{-\lambda_i} \lambda_i^{n_i}}{n_i!}, \qquad \lambda_i = \exp(x_i^{\top}\beta).

The Poisson distribution implies a particular form of heteroskedasticity, leading to the equidispersion hypothesis, i.e. the equality of the mean and the variance of the claim frequency. Thus, the Poisson parameter has a double meaning, representing at the same time the mean and the variance of the distribution [3]:

E(N_i \mid X_i) = V(N_i \mid X_i) = \lambda_i.

The standard estimator for this model is the maximum likelihood estimator. The likelihood function is defined as follows:

L(\beta) = \prod_{i=1}^{n} \frac{e^{-\lambda_i} \lambda_i^{n_i}}{n_i!}.

Taking the logarithm of both sides of the previous equation, the log-likelihood function is obtained:

\ell(\beta) = \sum_{i=1}^{n} \left[ n_i \ln(\lambda_i) - \lambda_i - \ln(n_i!) \right].

The main limitation of the Poisson model is that the equidispersion assumption is generally not respected in practice, leading to overdispersion, meaning that the conditional variance is greater than the mean of the claim frequency. One of the most important implications of overdispersion is the underestimation of the standard errors of the regression parameters, which means that some risk factors could appear to be significant when actually they have no considerable influence on the variation of claim frequency. In this regard, the literature presents the quasi-Poisson model as one of several enhanced models proposed in order to correct for overdispersion in Poisson data.
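As a practical illustration of the Poisson frequency model described above, the following minimal Python sketch (using statsmodels rather than the SAS GENMOD procedure employed later in the paper) fits a log-link Poisson regression; the data file and variable names are hypothetical:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical portfolio file: one row per policy, with the observed number
# of claims and the tariff variables used as risk factors.
policies = pd.read_csv("policies.csv")

# Poisson GLM with the (default) log link: E(N|X) = exp(x'beta).
freq_fit = smf.glm(
    "nb_claims ~ age + C(occup) + C(brand) + C(gps) + bonus + poldur",
    data=policies,
    family=sm.families.Poisson(),
).fit()

print(freq_fit.summary())   # estimates, standard errors, Wald 95% limits
print(freq_fit.llf)         # maximized log-likelihood, used later for LR tests
```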
Quasi-Poisson model
McCullagh and Nelder [16], based on data provided by Lloyd's Register of Ships, apply the quasi-Poisson model to explain the frequency of damage incidents suffered by cargo ships. Allain and Brenac [1] also support the use of the quasi-Poisson model in the presence of a high level of dispersion, arguing that the Poisson model could lead to the acceptance of some explanatory variables as apparently significant factors which in reality do not have any important impact on the studied phenomenon.
In a road accident study, the overdispersion is modelled through the quasi-Poisson regression model, which involves a dispersion parameter \phi describing the discrepancy between the variance and the mean, as in [1]:

V(N_i \mid X_i) = \phi \lambda_i.

The principle of this model is to estimate the regression parameters by maximizing the quasi-likelihood (equivalently, minimizing the scaled deviance):

Q(\beta; \phi) = -\frac{1}{2\phi} D(n; \hat{\lambda}),

where D(n; \hat{\lambda}) is the deviance function of the Poisson model, determined as follows (with the convention that n_i \ln(n_i/\hat{\lambda}_i) = 0 when n_i = 0):

D(n; \hat{\lambda}) = 2 \sum_{i=1}^{n} \left[ n_i \ln\left(\frac{n_i}{\hat{\lambda}_i}\right) - (n_i - \hat{\lambda}_i) \right].

It can easily be verified that the first two partial derivatives of the log-likelihood function exist and are expressed as follows:

\frac{\partial \ell(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} (n_i - \lambda_i) x_{ij}, \qquad \frac{\partial^2 \ell(\beta)}{\partial \beta_j \partial \beta_k} = -\sum_{i=1}^{n} \lambda_i x_{ij} x_{ik}.

The maximum likelihood estimators are the solutions of the likelihood equations obtained by differentiating the log-likelihood with respect to the regression coefficients and setting the derivatives to zero. The equations forming the system do not have explicit solutions and therefore have to be solved numerically using an iterative algorithm. As underlined in [6], the most common iterative methods are Newton-Raphson and Fisher scoring. Hilbe [12] explains at length that this type of algorithm works by updating the estimates based on the value of the log-likelihood function.

As mentioned in McCullagh and Nelder [16], the overdispersion parameter \phi is estimated by equating the Pearson \chi^2 statistic to the residual degrees of freedom, as follows:

\hat{\phi} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{(n_i - \hat{\lambda}_i)^2}{\hat{\lambda}_i},

where n represents the number of observations and p is the number of parameters from the regression model.
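A short sketch of this dispersion estimate, assuming `freq_fit` is the fitted statsmodels Poisson GLM from the earlier example:

```python
import numpy as np

# Pearson chi-square statistic: sum of squared Pearson residuals.
pearson_chi2 = np.sum(freq_fit.resid_pearson ** 2)

# Dispersion estimate: Pearson chi-square divided by residual degrees of freedom.
phi_hat = pearson_chi2 / freq_fit.df_resid

# Quasi-Poisson inference keeps the Poisson estimates but inflates the
# standard errors by the square root of the dispersion estimate.
quasi_poisson_se = freq_fit.bse * np.sqrt(phi_hat)
```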
The parameter estimates (\hat{\beta}) are identical to those of the Poisson model, but the standard errors of the estimators for the quasi-Poisson model are inflated by the dispersion factor \sqrt{\hat{\phi}}. McCullagh and Nelder [16] show that the equality of the parameter estimators of the two models derives from the shape of the likelihood function corresponding to the Poisson distribution and of the quasi-likelihood function belonging to the adjusted Poisson distribution, respectively. In other words, maximizing the two functions yields the same values for the parameter estimates; only the calculation of their standard errors differs. According to the literature, although the quasi-Poisson distribution is often considered a reliable alternative, it has the disadvantage of not providing additional information, compared with the Poisson distribution, regarding the estimated frequency of claims.
Models goodness of fit

Once the independent variables that significantly explain the variation of the dependent variable are established and the maximum likelihood estimators (\hat{\beta}) are obtained, the goodness of fit of the regression models has to be assessed. This evaluation is performed using the likelihood ratio described below:

LR = D(n; \hat{\lambda}_0) - D(n; \hat{\lambda}),

where:

1. D(n; \hat{\lambda}_0) is the deviance of the regression model that includes only the constant term, without any explanatory variable, expressed as follows:

D(n; \hat{\lambda}_0) = 2 \sum_{i=1}^{n} \left[ n_i \ln\left(\frac{n_i}{\bar{n}}\right) - (n_i - \bar{n}) \right],

where n is the number of observations;

2. D(n; \hat{\lambda}) represents the deviance corresponding to the considered regression model and is defined below:

D(n; \hat{\lambda}) = 2 \sum_{i=1}^{n} \left[ n_i \ln\left(\frac{n_i}{\hat{\lambda}_i}\right) - (n_i - \hat{\lambda}_i) \right],

where n is the number of observations and p is the number of variables included in the model.

The decision rule related to the models goodness of fit is based on accepting or rejecting the null hypothesis H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0, according to which none of the independent variables provides relevant information about the risk occurrence, against the alternative hypothesis H_1: \exists j, \beta_j \neq 0, which states that the model explains a significant relationship between the dependent variable and the explanatory variables; under H_0, the LR statistic asymptotically follows a \chi^2 distribution with p degrees of freedom.

C. Estimation model of claim costs

The claims amount (or the economic compensation to be paid) is more difficult to predict than the claim frequency. In this case the analysis is less clear, because there is no obvious distribution for positive real values. The literature argues that the classical methods allowing the econometric modeling of claim costs are the Gamma and lognormal models, specifying that the lognormal distribution does not belong to the exponential family.

Charpentier and Denuit [6] conclude that the results obtained after applying the Gamma and lognormal models to an automobile insurance study are slightly different. They also state that the divergence between the two models is explained by the restrictive behaviour of the Gamma model with respect to extreme values. Thereby, the advantage of using the lognormal model is underlined: taking the logarithm of the costs reduces the weight of exceptionally large claims.

Since the introduction of GLM techniques in non-life insurance pricing is the fundamental topic of this paper, the focus below is only on defining the Gamma regression model.

Gamma model

For the modeling of claim costs in automobile insurance, Pinquet [19] describes a simple but realistic parametric approach based on the Gamma distribution, which is another member of the exponential family.

Denoting by C_i the cost of the claims caused by insured i and assuming that the costs are independently Gamma distributed, the probability density function is given by:

f(c_i \mid \mu_i, \nu) = \frac{1}{\Gamma(\nu)} \left(\frac{\nu}{\mu_i}\right)^{\nu} c_i^{\nu - 1} \exp\left(-\frac{\nu c_i}{\mu_i}\right), \qquad c_i > 0,

verifying the mean E(C_i \mid X_i) = \mu_i and the variance V(C_i \mid X_i) = \mu_i^2 / \nu.

The log-likelihood function for the Gamma model is given as follows:

\ell(\beta, \nu) = \sum_{i=1}^{n} \left[ \nu \ln\left(\frac{\nu c_i}{\mu_i}\right) - \frac{\nu c_i}{\mu_i} - \ln c_i - \ln \Gamma(\nu) \right],

or, keeping only the terms that depend on the regression coefficients, it can be simplified as

\ell(\beta) \propto -\nu \sum_{i=1}^{n} \left[ \ln \mu_i + \frac{c_i}{\mu_i} \right].

Defining \hat{\mu}_i = \exp(x_i^{\top}\hat{\beta}) as the estimated cost of a claim for the insured i, the maximum likelihood estimates are the solution of the following equation:

\sum_{i=1}^{n} x_i \frac{c_i - \hat{\mu}_i}{\hat{\mu}_i} = 0.


The previous expression is interpreted as an orthogonality relationship between the explanatory variables and the residuals.

The actuarial literature argues that the main advantage of applying the Gamma model is due to the parameters \mu_i and \nu, through which more flexibility is obtained when estimating the cost of claims.
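A minimal Python sketch of such a Gamma severity model, fitted with a log link on the positive claim costs (statsmodels is used instead of the paper's SAS GENMOD; the data file and column names are hypothetical):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical claims file: one row per claim with a strictly positive cost.
claims = pd.read_csv("claims.csv")

# Gamma GLM with a log link: E(C|X) = exp(x'beta).
cost_fit = smf.glm(
    "claim_cost ~ age + C(occup) + C(brand)",
    data=claims,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit(scale="X2")   # dispersion estimated from the Pearson chi-square statistic

print(cost_fit.summary())
```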
Gamma models goodness of fit

The goodness of fit of the Gamma regression model is assessed by means of a Fisher (F) statistic constructed on the basis of the difference between the deviance of the model without explanatory variables (D_0) and the deviance of the model that includes all the significant risk factors (D). Considering the estimate of the dispersion parameter (\hat{\phi}) for the latter model, the statistic is obtained through the expression below:

F = \frac{D_0 - D}{(p - p_0)\,\hat{\phi}},

which approximately follows the Fisher distribution with (p - p_0, n - p) degrees of freedom, where p_0 is the number of parameters corresponding to the model without variables, p represents the number of parameters of the model that includes the significant risk factors, and n is the number of observations used.
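A small sketch of this statistic as reconstructed above (the deviances, parameter counts and dispersion estimate would be taken from the fitted null and full Gamma models):

```python
from scipy import stats

def gamma_f_statistic(d_null, d_full, p_null, p_full, n_obs, phi_hat):
    """F statistic comparing the null and the full Gamma regression model."""
    f_value = (d_null - d_full) / ((p_full - p_null) * phi_hat)
    # Critical value of the F distribution at the 5% level.
    f_critical = stats.f.ppf(0.95, dfn=p_full - p_null, dfd=n_obs - p_full)
    return f_value, f_critical
```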
D. Pure premium calculation
In non-life insurance, the pure premium represents the expected cost of all claims declared by the policyholders during the insured period. The calculation of the premium is based on statistical methods that incorporate all available information about the accepted risk, thereby aiming at a more accurate assessment of the tariffs attributed to each insured.

The basis for calculating the pure premium is the econometric modeling of the frequency and cost of claims depending on the characteristics that define the insurance contracts. The pure premium is the mathematical expectation of the annual cost of claims declared by the policyholders and is obtained by multiplying the two components, the estimated frequency and the estimated cost of claims:

PP_i = E(N_i \mid X_i) \times E(C_i \mid X_i),

for claim amounts (C_i) independent of their number (N_i).

Within the context of insurance pricing, the separate evaluation of the frequency and cost of claims is particularly relevant, since the risk factors that influence the two components of the insurance premium are usually different. Essentially, the separate analysis of the two elements provides a clearer perspective on how the risk factors influence the insurance tariff.
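Assuming the two GLMs sketched earlier (`freq_fit` for the frequency and `cost_fit` for the cost), the pure premium of a tariff cell can be obtained as the product of the two fitted means; a minimal illustration with hypothetical variable names:

```python
# One row per tariff cell, built from the risk factors used in the two models.
tariff_cells = policies[["age", "occup", "brand", "gps", "bonus", "poldur"]].drop_duplicates()

expected_frequency = freq_fit.predict(tariff_cells)   # E(N | X)
expected_cost = cost_fit.predict(tariff_cells)        # E(C | X)

pure_premium = expected_frequency * expected_cost     # PP = E(N|X) * E(C|X)
```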
III. EMPIRICAL APPLICATION

The empirical part of the paper includes a brief presentation of the data used, based on which a numerical illustration of the described techniques is performed.

A. Data Used
In this paper, the data used constitute a French automobile portfolio insured against theft of the vehicle and possibly damage to the vehicle, comprising 50,000 policies registered during the year 2009. An insurance policy corresponds to one policyholder, and the elements included in the policies are the analysis factors presented below. Apart from the explained variables, the frequency and cost of claims, the other variables are considered risk factors, known a priori by the insurer and used to customize the profile of each insured. These exogenous variables reflect the insured characteristics: age (18-75 years) and profession (employed, housewife, retired, self-employed, unemployed); the vehicle features: category (large, medium, small), brand (A, B, C, D, E, F), GPS (Yes, No) and purpose of vehicle usage (private, professional); and the insurance contract characteristics: duration (0-15 years) and bonus-malus coefficient (50-150, in steps of 10).
Among the explanatory variables introduced in the analysis, the bonus-malus coefficient presents a particular interest, assuming the increase or decrease of the insurance premium depending on the number of claims registered by an insured during a reference period. Therefore, if the policyholder does not cause any responsible accident, he receives a bonus, meaning that the insurance premium will be reduced by 5%. Conversely, if the insured is responsible for an accident, he is penalized by applying a malus of 25% for each claim declared, which has the consequence of a premium increase. The implementation of the bonus-malus system differs from one country to another, but the principle remains the same, namely to encourage prudent insureds and to discourage those who, for various reasons, declare many claims and thereby present a high degree of risk for the insurance company.
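A small numerical illustration of the rule described above, under the assumption (not stated explicitly in the paper) that the coefficient is updated multiplicatively and kept within the 50-150 range of the data:

```python
def update_bonus_malus(coefficient, responsible_claims):
    """One-period update of the bonus-malus coefficient (illustrative only)."""
    if responsible_claims == 0:
        coefficient *= 0.95                        # bonus: premium reduced by 5%
    else:
        coefficient *= 1.25 ** responsible_claims  # malus: +25% per declared claim
    return min(max(coefficient, 50.0), 150.0)      # assumed floor and cap

print(update_bonus_malus(100, 0))   # 95.0 after a claim-free year
print(update_bonus_malus(100, 1))   # 125.0 after one responsible claim
```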
B. Estimation results

The results obtained by applying the models described above, based on which the pure premium is determined, are presented and interpreted below. The variables introduced previously are taken into consideration as risk factors, and the models are fitted with the SAS 9.3 software by means of the GENMOD procedure. This procedure enables the use of the Type 3 analysis, which allows the assessment of the contribution of each risk factor, given all the other explanatory variables. The Type 3 analysis provides the value of a Chi-Square statistic for each variable by calculating two times the difference between the log-likelihood of the model which includes all the independent variables and the log-likelihood of the model obtained by deleting the specified variable. This test assesses the impact of each risk factor on the studied phenomenon and asymptotically follows the \chi^2 distribution with df degrees of freedom, df representing the number of parameters associated with the analyzed variable.
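The Type 3 idea can be mimicked outside SAS by refitting the model without one variable and comparing log-likelihoods; a rough Python sketch (not the GENMOD procedure itself, variable names hypothetical and reusing the `policies` data frame from the earlier example):

```python
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

full = smf.glm("nb_claims ~ age + C(occup) + C(brand) + C(gps) + bonus + poldur",
               data=policies, family=sm.families.Poisson()).fit()
reduced = smf.glm("nb_claims ~ age + C(occup) + C(brand) + C(gps) + bonus",
                  data=policies, family=sm.families.Poisson()).fit()

lr_chi2 = 2 * (full.llf - reduced.llf)   # LR Chi-Square for 'poldur'
df = full.df_model - reduced.df_model    # parameters associated with 'poldur'
p_value = stats.chi2.sf(lr_chi2, df)
print(lr_chi2, df, p_value)
```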
Poisson model
By employing the Poisson model to estimate the frequency
of claims, the results obtained are shown in Table 1.



TABLE I. LR STATISTICS FOR TYPE 3 ANALYSIS

         | Poisson Regression(*)       | Poisson Regression(**)
Source   | Chi-Square | Pr > ChiSq     | Chi-Square | Pr > ChiSq
Age      | 87.99      | <.0001         | 91.64      | <.0001
Occup    | 63.86      | <.0001         | 63.81      | <.0001
Categ    | 4.16       | 0.1252         | -          | -
Brand    | 46.92      | <.0001         | 47.05      | <.0001
GPS      | 84.52      | <.0001         | 84.55      | <.0001
Bonus    | 451.80     | <.0001         | 452.06     | <.0001
Poldur   | 35.55      | <.0001         | 35.53      | <.0001
Use      | 1.25       | 0.2644         | -          | -

*Poisson regression including all the explanatory variables
**Poisson regression including only the significant explanatory variables

The obtained results suggest that, after excluding from the regression model the vehicle category and the purpose of vehicle usage, which have p-values higher than the 0.05 risk level, the other predictors remain significant and clearly exert an important influence on the claim frequency.

In order to verify whether the data are overdispersed, the most common approach is to examine the deviance and Pearson statistics divided by their degrees of freedom. If these ratios exceed 1, the overdispersion hypothesis is supported. Analyzing the results presented in Table III, it can be noted that the deviance divided by the number of degrees of freedom (value/df) is less than 1, but for the Pearson statistic the ratio is 1.0797. Although the difference appears not to be large, it could indicate an inequality between the mean and the variance of the claim frequency, and thus overdispersion. In order to correct the overdispersion, the alternative model presented in the methodological part of the paper is used.
Quasi-Poisson model

After applying the quasi-Poisson model, the values of the regression coefficients do not change, but the standard errors of the estimators are now inflated by the square root of the dispersion parameter of 1.039 (Table II). It can also be noted that the limits of the confidence intervals widen and the values of the probabilities associated with the test statistics are higher in comparison with the results obtained for the Poisson regression. This finding is expected, given that the estimated value of the dispersion parameter exceeds 1. However, the two models include the same explanatory variables for the frequency of claims, which means that the quasi-Poisson model does not provide further details in comparison with the Poisson regression and its use is not justified. Therefore, in determining the insurance premium, either the results obtained with the Poisson model or those obtained by means of the quasi-Poisson model can be used.
TABLE II. ANALYSIS OF PARAMETER ESTIMATES

                      | Poisson Regression                       | Quasi-Poisson Regression
Parameter             | Estimate | Std Error | Wald 95% Limits    | Estimate | Std Error | Wald 95% Limits
Intercept             | -0.301   | 0.074     | -0.445, -0.157     | -0.301   | 0.076     | -0.451, -0.152
Age                   | -0.043   | 0.001     | -0.046, -0.041     | -0.043   | 0.002     | -0.046, -0.041
Occup - employed      | -0.336   | 0.036     | -0.407, -0.265     | -0.336   | 0.038     | -0.410, -0.262
Occup - housewife     | -0.411   | 0.043     | -0.495, -0.328     | -0.411   | 0.044     | -0.498, -0.325
Occup - retired       | -0.045   | 0.065     | -0.171,  0.081     | -0.045   | 0.067     | -0.177,  0.086
Occup - self-employed | -0.015   | 0.038     | -0.091,  0.060     | -0.015   | 0.040     | -0.094,  0.063
Brand - A             | -0.356   | 0.055     | -0.464, -0.248     | -0.356   | 0.057     | -0.468, -0.244
Brand - B             | -0.357   | 0.057     | -0.468, -0.246     | -0.357   | 0.059     | -0.473, -0.242
Brand - C             | -0.308   | 0.060     | -0.426, -0.190     | -0.308   | 0.063     | -0.431, -0.185
Brand - D             | -0.112   | 0.056     | -0.222, -0.003     | -0.112   | 0.058     | -0.226,  0.001
Brand - E             | -0.039   | 0.060     | -0.156,  0.079     | -0.039   | 0.062     | -0.160,  0.083
GPS - No              |  0.179   | 0.029     |  0.122,  0.237     |  0.179   | 0.030     |  0.120,  0.239
Bonus-Malus           |  0.007   | 0.001     |  0.007,  0.007     |  0.007   | 0.001     |  0.007,  0.008
Poldur                | -0.025   | 0.003     | -0.031, -0.020     | -0.025   | 0.004     | -0.031, -0.020
Scale                 |  1.000   | 0.000     |  1.000,  1.000     |  1.039   | 0.000     |  1.028,  1.056

In order to assess the goodness of fit of the Poisson models, the likelihood ratio approach is used; the results are shown in Table III.
TABLE III. CRITERIA FOR ASSESSING GOODNESS OF FIT

                        | Poisson regression              | Gamma regression
Criterion               | DF    | Value       | Value/DF  | DF   | Value      | Value/DF
Deviance                | 50000 | 38486.723   | 0.257     | 5500 | 5598.940   | 1.017
Pearson Chi-Square      | 50000 | 161961.992  | 1.080     | 5500 | 5130.767   | 0.930
Log Likelihood (full)*  |       | -24548.229  |           |      | -47901.574 |
Log Likelihood (null)** |       | -25651.178  |           |      | -48480.555 |

*the log-likelihood for the regression model including all the significant risk factors
**the log-likelihood for the regression model without explanatory variables

Based on the difference between the deviance of the model including all the significant risk factors and that of the model without explanatory variables, the value of the likelihood ratio is obtained (LR = 2205.9 > \chi^2_{0.05} = 12.592). Therefore, the result highlights that the Poisson regression model is qualitatively significant for estimating the claim frequency within the studied insurance portfolio.
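The reported likelihood ratio can be checked directly from the log-likelihoods in Table III; a short arithmetic sketch (the 6 degrees of freedom are inferred from the reported critical value of 12.592):

```python
from scipy import stats

ll_full = -24548.229                   # model with all the significant risk factors
ll_null = -25651.178                   # model without explanatory variables

lr = 2 * (ll_full - ll_null)           # 2205.898, reported as 2205.9
critical = stats.chi2.ppf(0.95, df=6)  # 12.592, the critical value quoted in the text
print(lr > critical)                   # True: the model is globally significant
```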
Gamma model

The next step in establishing the insurance premium resides in estimating the cost of claims based on the risk factors considered by the insurance company. For the Gamma model, the obtained results (Table IV) suggest that, for the analyzed portfolio, the cost of claims is influenced by the age and profession of the insured, and also by the vehicle brand.
TABLE IV. LR STATISTICS FOR TYPE 3 ANALYSIS

            | Gamma Regression*          | Gamma Regression**
Source      | Chi-Square | Pr > ChiSq    | Chi-Square | Pr > ChiSq
Age         | 181.64     | <.0001        | 189.84     | <.0001
Occup       | 90.66      | <.0001        | 91.73      | <.0001
Brand       | 78.98      | <.0001        | 79.91      | <.0001
Categ       | 2.11       | 0.3485        | -          | -
GPS         | 21.84      | <.0001        | 22.62      | <.0001
Bonus-Malus | 0.13       | 0.7136        | -          | -
Poldur      | 0.46       | 0.4980        | -          | -

*Gamma model including all the explanatory variables
**Gamma model including only the significant explanatory variables

The influence factors of the claim costs are different from the factors corresponding to the frequency of claims, which confirms the assumption suggested by the actuarial literature regarding the separate analysis of these two elements. Based on the cost of claims alone, it is not possible to obtain conclusive information regarding the probability of risk occurrence, and the insurance company cannot properly segment the policyholders. Nevertheless, the amount of the cost is a fundamental component considered when establishing the insurance premium.
The last step of the claim cost analysis consists in measuring the quality of the Gamma regression model by means of the Fisher statistic test detailed previously in the paper. The results obtained are shown in Table III. Within the studied portfolio, for the final regression model, the obtained value of the Fisher test statistic (F = 98.445) is much higher than the theoretical value (F_{critical} = 1.831), meaning that the proposed Gamma model fits the data well and its employment is significant in order to explain the variation of the claim cost.
Pure premium model

The process of establishing the insurance premium resides in using the same GENMOD procedure as in the previous cases, the obtained results being summarized in Table V. In this stage of non-life insurance pricing, the explained variable is the product between the estimated frequency and the estimated cost of claims:

\widehat{PP}_i = \hat{E}(N_i \mid X_i = x_i) \times \hat{E}(C_i \mid X_i = x_i),

the calculated value representing the insurance pure premium established for insured i, characterized by the variables vector x_i.
TABLE V. ANALYSIS OF PARAMETER ESTIMATES

                      | Poisson Regression   | Gamma Regression     | Pure Premium
Parameter             | Estimate | Std Error | Estimate | Std Error | Estimate
Intercept             | -0.301   | 0.074     | 8.456    | 0.108     | 6.268
Age                   | -0.043   | 0.001     | -0.012   | 0.002     | -0.031
Occup - employed      | -0.336   | 0.036     | -0.167   | 0.064     | -0.433
Occup - housewife     | -0.411   | 0.043     | 0.024    | 0.077     | -0.394
Occup - retired       | -0.045   | 0.065     | 0.023    | 0.111     | -0.220
Occup - self-employed | -0.015   | 0.038     | 0.297    | 0.069     | 0.349
Brand - A             | -0.356   | 0.055     | -0.421   | 0.093     | -0.778
Brand - B             | -0.357   | 0.057     | -0.446   | 0.096     | -0.875
Brand - C             | -0.308   | 0.060     | -0.300   | 0.103     | -0.674
Brand - D             | -0.112   | 0.056     | -0.172   | 0.096     | -0.359
Brand - E             | -0.039   | 0.060     | -0.203   | 0.102     | -0.246
GPS - no              | 0.179    | 0.029     | -        | -         | 0.422
Bonus-Malus           | 0.007    | 0.001     | -        | -         | 0.008
Poldur                | -0.025   | 0.003     | -        | -         | -0.029

Considering this relationship within the analyzed insurance portfolio, the pure premium for each category of policyholders is established based on a Gamma regression model including all the statistically relevant tariff variables that explain the variation of the claim frequency and costs. More explicitly, the relation between the premium and the risk factors is expressed through the regression model written as follows (using the Pure Premium column of Table V):

\ln(\widehat{PP}_i) = 6.268 - 0.031\,Age_i - 0.433\,Occup(employed)_i - 0.394\,Occup(housewife)_i - 0.220\,Occup(retired)_i + 0.349\,Occup(self\text{-}employed)_i - 0.778\,Brand(A)_i - 0.875\,Brand(B)_i - 0.674\,Brand(C)_i - 0.359\,Brand(D)_i - 0.246\,Brand(E)_i + 0.422\,GPS(no)_i + 0.008\,BonusMalus_i - 0.029\,Poldur_i.

Therefore, this regression model allows obtaining the pure premium corresponding to each tariff class through the expression \widehat{PP}_i = \exp(x_i^{\top}\hat{\beta}). For example, by using these results, the highest risk profile of policyholder can be established. Taking into consideration the signs of the coefficients, it can be observed that the highest risk profile corresponds to policyholders aged 18 years, self-employed, with a vehicle of brand F, without a GPS device, with a bonus-malus coefficient of 150 and being a client of the company for less than a year.

In summary, the obtained results lead to tariffs corresponding to the proper risk levels induced by the insureds for the insurance company. The practical purpose of non-life insurance pricing follows from the idea that new policies will mostly be issued to drivers that fit one of the profiles generated when establishing the insurance tariffs. Thereupon, the pure premium will be applied to new policyholders, who will be classified into one of the tariff categories already defined by the insurance company.
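As an illustration of how the tariff would be read off for the highest-risk profile mentioned above, the following sketch plugs the Pure Premium coefficients of Table V into the exponential expression. It assumes, as the dashes and the listed categories suggest, that brand F and the "unemployed" occupation act as reference levels; this is an interpretation rather than something stated in the paper.

```python
import math

# Pure Premium coefficients from Table V.
intercept, b_age, b_self_employed = 6.268, -0.031, 0.349
b_gps_no, b_bonus, b_poldur = 0.422, 0.008, -0.029

# Highest-risk profile from the text: 18 years old, self-employed, brand F
# (assumed reference level, contribution 0), no GPS, bonus-malus 150, new client.
linear_predictor = (intercept
                    + b_age * 18
                    + b_self_employed
                    + b_gps_no
                    + b_bonus * 150
                    + b_poldur * 0)

pure_premium = math.exp(linear_predictor)
print(round(linear_predictor, 3), round(pure_premium, 1))
```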
IV. CONCLUSIONS

In this paper, an analysis based on Generalized Linear Models was considered in order to establish the pure premium given the characteristics of the policyholders. As a first stage, using the Poisson and quasi-Poisson models within the GLM framework, we found that the estimated claim frequency decreases as the insured's age and the age of the insurance contract increase, and increases along with the bonus-malus coefficient. These results are consistent with the reality of the studied phenomenon, so their interpretation is considered to be logical and valid.

After comparing these two regression models, we observed that, although the quasi-Poisson model corrects the overdispersion, the risk factors included in the Poisson regression appear to be significant for both models. Therefore, the regression coefficients do not change and no change is required in terms of establishing the expected frequency of claims. In the next stage of the analysis, by using the Gamma regression model, we obtained the estimated average level of the cost of claims corresponding to each category of policyholders.
Eventually, the empirical results have shown that, for new customers, the insurance premium will be established considering a series of risk factors, such as age, profession, brand, purpose of vehicle usage, GPS, bonus-malus coefficient and age of the insurance contract. Based on the signs of the regression coefficients, the profile of the riskier policyholders can be established. Taking these elements into consideration, the insurance company can establish a fair and reasonable premium associated with each insured profile. Moreover, the company could implement a pricing policy that fairly discriminates within the portfolio, thereby allowing a better understanding of the insureds' behavior and an accurate assessment of the risks to be insured.

The conclusions of this study are representative and useful for the insurance company's business, but they do not have a general character; therefore they cannot be applied to all portfolios or insurance companies. On the one hand, this aspect is justified by the data used and the risk factors considered during the analysis, meaning that every insurer can use different information on the insured to its own benefit. On the other hand, the data used were not obtained through a random selection from the entire population of policyholders.

ACKNOWLEDGMENT

This work was supported by the European Social Fund through the Sectoral Operational Programme Human Resources Development 2007-2013, project number POSDRU/159/1.5/S/ 34 97, project title "Performance and Excellence in Doctoral and Postdoctoral Research in Economic Sciences Domain in Romania".

REFERENCES

[1] E. Allain and T. Brenac, "Modèles linéaires généralisés appliqués à l'étude des nombres d'accidents sur des sites routiers : le modèle de Poisson et ses extensions," Recherche Transports Sécurité, vol. 72, pp. 3-18, 2012.
[2] K. Antonio and E.A. Valdez, "Statistical concepts of a priori and a posteriori risk classification in insurance," Advances in Statistical Analysis, vol. 96(2), pp. 187-224, 2012.
[3] L. Asandului, Metode statistice de analiză a datelor categoriale. București: Wolters Kluwer, 2010.
[4] J.P. Boucher, M. Denuit and M. Guillen, "Risk classification for claims counts - A comparative analysis of various zero-inflated mixed Poisson and hurdle models," North American Actuarial Journal, vol. 11(4), pp. 110-131, 2007.
[5] A.C. Cameron and P.K. Trivedi, Regression Analysis of Count Data, Econometric Society Monograph. New York: Cambridge University Press, 1998.
[6] A. Charpentier and M. Denuit, Mathématiques de l'assurance non-vie, Tome II : Tarification et provisionnement. Paris: Economica, 2005.
[7] M. Denuit and S. Lang, "Nonlife ratemaking with Bayesian GAMs," Insurance: Mathematics and Economics, vol. 35(3), pp. 627-647, 2004.
[8] G. Dionne and C. Vanasse, "A generalization of automobile insurance rating models: the negative binomial distribution with a regression component," ASTIN Bulletin, vol. 19(2), pp. 199-212, 1989.
[9] G. Dionne and C. Vanasse, "Automobile insurance ratemaking in the presence of asymmetrical information," Journal of Applied Econometrics, vol. 7(2), pp. 149-165, 1992.
[10] E.W. Frees, Regression Modeling with Actuarial and Financial Applications. New York: Cambridge University Press, 2010.
[11] C. Gourieroux and J. Jasiak, "Heterogeneous INAR(1) model with application to car insurance," Insurance: Mathematics and Economics, vol. 34(2), pp. 177-192, 2004.
[12] J.M. Hilbe, Modeling Count Data. New York: Cambridge University Press, 2014.
[13] P. de Jong and G.Z. Heller, Generalized Linear Models for Insurance Data. New York: Cambridge University Press, 2008.
[14] R. Kaas, M. Goovaerts, J. Dhaene and M. Denuit, Modern Actuarial Risk Theory. London: Springer Verlag, 2009.
[15] J. Lemaire, Automobile Insurance: Actuarial Models, Huebner International Series on Risk, Insurance and Economic Security, 1985.
[16] P. McCullagh and J.A. Nelder, Generalized Linear Models, 2nd ed. London: Chapman and Hall, 1989.
[17] J.A. Nelder and R.W.M. Wedderburn, "Generalized linear models," Journal of the Royal Statistical Society, Series A, vol. 135(3), pp. 370-384, 1972.
[18] E. Ohlsson and B. Johansson, Non-life Insurance Pricing with Generalized Linear Models. London: Springer Verlag, 2010.
[19] J. Pinquet, "Allowance for cost of claims in bonus-malus systems," ASTIN Bulletin, vol. 27, pp. 33-57, 1997.
[20] K. Yip and K. Yau, "On modeling claim frequency data in general insurance with extra zeros," Insurance: Mathematics and Economics, vol. 36(2), pp. 153-163, 2005.